CN112052396A - Course matching method, system, computer equipment and storage medium - Google Patents

Info

Publication number
CN112052396A
Authority
CN
China
Prior art keywords
course
keywords
employee
keyword
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011041524.7A
Other languages
Chinese (zh)
Inventor
马丹
曾增烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202011041524.7A priority Critical patent/CN112052396A/en
Publication of CN112052396A publication Critical patent/CN112052396A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources

Abstract

The invention provides a course matching method, a system, computer equipment and a storage medium. The course matching method comprises: obtaining an enterprise course library and obtaining course keywords in the enterprise course library; receiving a work report of an employee, and mining employee keywords from the work report by using a keyword mining model; calculating matching values between the course keywords and the employee keywords by using a similarity algorithm; and automatically recommending courses with high matching values to the employee according to the matching values. By applying NLP technology, the invention can therefore automatically mine and analyze information from existing data and automatically give course recommendations personalized to each individual employee. The invention also relates to blockchain technology.

Description

Course matching method, system, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a course matching method, system, computer device, and storage medium.
Background
Employee training is a crucial activity for enterprises large and small, because it bears on reshaping the enterprise's culture and values.
At present, training for most employees is planned by the enterprise's human resources department. Because of limits on labor cost and time, the following problems often arise: 1) training offerings are narrow: usually only new employees receive onboarding training, while long-term development and learning are lacking; 2) training is poorly targeted: designing courses tailored to each individual is difficult, and courses cannot be designed according to an employee's professional knowledge base, current project requirements and the like. These problems often lead to poor training results and make the purpose of the training hard to achieve. We find that they arise because course recommendation relies too heavily on manual work: a human resources department cannot cover everything and cannot know the background, current post and so on of every employee, so course design and recommendation cannot be timely, accurate and targeted.
Disclosure of Invention
Based on the above, the invention provides a course matching method, a course matching system, computer equipment and a storage medium, which address the problem that information cannot be mined and analyzed automatically, so that course recommendations personalized to each employee can be given automatically.
In order to achieve the above object, the present invention provides a course matching method based on NLP technology, wherein the course matching method comprises:
acquiring an enterprise course library and acquiring course keywords in the enterprise course library;
receiving a work report of an employee, and mining an employee keyword of the work report by adopting a keyword mining model;
calculating matching values of the course keywords and the employee keywords by adopting a similarity algorithm;
and automatically recommending courses with high matching values to the employees according to the matching values.
Preferably, before calculating the matching value of the course keyword and the employee keyword by using the similarity algorithm, the method further includes:
receiving the weight of the set employee keywords;
and selecting effective employee keywords according to the weights of the employee keywords.
Preferably, the employee keywords include post keywords, work content keywords, evaluation keywords, and indicator keywords, and the weights of the post keywords and the work content keywords are higher than the weights of the evaluation keywords and the indicator keywords.
Preferably, the step of calculating the matching value of the course keyword and the employee keyword by using the similarity algorithm includes:
calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
and multiplying the employee keyword weight by the similarity value obtained from the similarity calculation to obtain the matching value.
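The two sub-steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the function name `matching_value` and the sample weights and similarity values are hypothetical.

```python
def matching_value(weight: float, similarity: float) -> float:
    """Matching value = employee keyword weight multiplied by the similarity value."""
    return weight * similarity


# Hypothetical inputs: a weight per employee keyword, and the similarity of each
# employee keyword to one course keyword.
weights = {"sales skill": 0.35, "training": 0.13}
similarities = {"sales skill": 0.5, "training": 1.0}

scores = {kw: matching_value(weights[kw], similarities[kw]) for kw in weights}
```

A keyword with a modest similarity but a high weight can therefore outrank a perfectly matching keyword with a low weight.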
Preferably, the step of mining the employee keywords of the work report by using the keyword mining model includes:
preprocessing a training sample set to obtain the BIO input data format required by a sequence labeling model;
extracting the features of each sequence by using the sequence labeling model to obtain the semantic features of the text corresponding to each sequence;
classifying the obtained semantic features of the text by using a softmax classifier, and training and testing the model to obtain output data in the BIO format;
and post-processing the output data in the BIO format to obtain the employee keywords.
Preferably, after courses with high matching values are automatically recommended to the employee according to the matching values, the course matching result is uploaded to a blockchain, so that the blockchain encrypts and stores the course matching result.
In order to achieve the above object, the present invention further provides a course matching system based on NLP technology, wherein the course matching system comprises:
the course keyword mining module is used for acquiring an enterprise course library and acquiring course keywords in the enterprise course library;
the employee keyword mining module is used for receiving the work report of the employee and mining the employee keywords of the work report by adopting a keyword mining model;
the computing module is used for computing the matching value of the course keyword and the employee keyword by adopting a similarity algorithm;
and the matching module is used for automatically recommending courses with high matching values to the employees according to the matching values.
Preferably, the system further comprises a weighting module, wherein the weighting module is used for receiving the weight of the set employee keyword and selecting an effective employee keyword according to the weight of the employee keyword; in the calculation module, the step of calculating the matching value of the course keyword and the employee keyword by using a similarity algorithm includes:
calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
and multiplying the employee keyword weight by the similarity value obtained from the similarity calculation to obtain the matching value.
To achieve the above object, the present invention further provides a computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the course matching method as described above.
To achieve the above object, the present invention also provides a computer storage medium storing computer readable instructions, which when executed by a processor, implement the steps of the course matching method as described above.
The invention provides a course matching method, a system, computer equipment and a storage medium. The course matching method obtains an enterprise course library and course keywords in the enterprise course library; receives a work report of an employee and mines employee keywords from the work report by using a keyword mining model; calculates matching values between the course keywords and the employee keywords by using a similarity algorithm; and automatically recommends courses with high matching values to the employee according to the matching values. By applying NLP technology, the invention can therefore automatically mine and analyze information from existing data and automatically give course recommendations personalized to each individual employee.
Drawings
FIG. 1 is a diagram of an environment in which a course matching method is implemented, as provided in one embodiment;
FIG. 2 is a block diagram showing an internal configuration of a computer device according to an embodiment;
FIG. 3 is a flow diagram of a course matching method in one embodiment;
FIG. 4 is a flow diagram for mining employee keywords using a keyword mining model in one embodiment;
FIG. 5 is a flow diagram of employee keyword setting weights in one embodiment;
FIG. 6 is a flow diagram that illustrates the calculation of a match value for a course keyword and an employee keyword, under an embodiment;
FIG. 7 is a flow diagram of a method for optimizing course matching in one embodiment;
FIG. 8 is a schematic diagram of a course matching system in one embodiment;
FIG. 9 is another diagram of a course matching system in one embodiment;
FIG. 10 is a schematic diagram of a computer apparatus in one embodiment;
FIG. 11 is a schematic diagram of a storage medium in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
Fig. 1 is a diagram of an implementation environment of the course matching method based on NLP technology provided in an embodiment. As shown in fig. 1, the implementation environment includes a computer device 110 and a terminal device 120.
The computer device 110 may be a computer device such as a computer used by a user, and the computer device 110 is installed with a course matching system based on the NLP technology. When calculating, the user can calculate the matching value in accordance with the course matching method based on the NLP technology at the computer device 110, and notify the matched course to the employee through the terminal device 120.
It should be noted that the computer device 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like; the terminal device 120 is mainly a smart phone, a tablet computer, a notebook computer, or a desktop computer, and the course matching system based on the NLP technology is an APP or an application installed in the terminal device 120. The description of the computer device 110 and the terminal device 120 is not limited thereto.
FIG. 2 is a diagram showing an internal configuration of a computer device according to an embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize a course matching method based on the NLP technology when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, cause the processor to perform a course matching method based on NLP techniques. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
As shown in fig. 3, in an embodiment, a course matching method based on NLP technology is provided. The course matching method may be applied to the computer device 110 and the terminal device 120, and specifically includes the following steps:
Step 31, acquiring an enterprise course library, and acquiring course keywords in the enterprise course library.
Specifically, the enterprise needs to prepare an enterprise course library, and to extract and set keywords for each course in the library in advance. For example, the course "PPT production techniques and common templates" can be given the course keywords: PPT, production, template, technique, etc. If the number of courses is large, natural language processing can be used to extract course keywords from text such as the course subtitles and the course PPT. More specifically, because Chinese, unlike English, does not separate words with spaces, basic NLP techniques are needed to segment the text into independent words; commonly used Chinese word segmentation tools include ansj, jieba, and the Baidu or Haugh word segmentation toolkits. After word segmentation, the course keywords are extracted with tools such as TF-IDF and term-weighting.
Specifically, most courses already come with keywords. If a course has none, they can be extracted with tools such as TF-IDF and term-weighting; these tools are mature, so course keywords can be extracted and set reliably. TF-IDF (term frequency-inverse document frequency) is taken here as an example of how to extract course keywords. It is a weighting technique commonly used in information retrieval and data mining, often for mining keywords from articles; the algorithm is simple and efficient, and industry often uses it for initial text data cleaning. TF-IDF has two parts: "Term Frequency" (TF) and "Inverse Document Frequency" (IDF). Suppose we have a long article titled "Quantitative System Architecture Design". The words with the highest frequency in an article are often stop words such as "what" and "yes"; they are the most common words in the document but do not help the result, so they need to be filtered out, and TF can be used to count and filter them. After the stop words are filtered, only the remaining meaningful words need to be considered. This raises another problem: we may find that the three words "quantitative", "system" and "architecture" occur equally often. Does this mean that they are equally important as keywords? In fact, "system" is common in other articles as well, so in the keyword ranking "quantitative" and "architecture" should be placed ahead of "system". This requires IDF, which gives common words a smaller weight: the weight is inversely related to how common the word is. Once TF and IDF are available, the two values are multiplied to obtain the TF-IDF value of a word.
Generally, the larger the TF-IDF value of a word in an article, the more important the word is in that article. By calculating the TF-IDF value of each word in the article and sorting from largest to smallest, the top-ranked words are taken as the keywords of the article.
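The TF-IDF ranking described above can be sketched in a few lines. This is a minimal illustration rather than the tool the embodiment actually uses: the stop-word list, the toy corpus and the function name `tfidf_keywords` are assumptions, the input is taken to be already segmented into words, and a base-10 logarithm with 1 added to the denominator is assumed for IDF.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "of", "a", "is", "and"}


def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Rank the words of one tokenized document by TF-IDF against a corpus."""
    words = [w for w in doc_tokens if w not in STOP_WORDS]
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        tf = count / total                                # term frequency
        df = sum(1 for d in corpus if word in d)          # documents containing the word
        idf = math.log10(len(corpus) / (df + 1))          # +1 avoids a zero denominator
        scores[word] = tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy corpus: "system" is frequent in the target document but also common elsewhere.
doc = ["quantitative", "system", "architecture", "the",
       "quantitative", "system", "architecture"]
corpus = [doc, ["system", "design"], ["system", "test"],
          ["system", "report"], ["budget", "report"], ["sales", "plan"]]

top_two = tfidf_keywords(doc, corpus, top_k=2)
ranked = tfidf_keywords(doc, corpus, top_k=3)
```

In this toy corpus all three words have the same term frequency, but "system" ranks last because it appears in most of the other documents.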
Furthermore, course keywords are extracted one course at a time rather than in batches, which ensures correctness; after extraction, the keywords are set for the course in the form of labels.
Further, if content such as the course introduction is missing, the following can be done: 1) if the course is in video format, the subtitle text in the video file can be extracted and arranged into text format; 2) if courseware in PPT format exists, the text in the PPT courseware can be extracted and arranged into text format. Text format makes it convenient to extract and process course keywords with a keyword extraction tool.
Step 32, receiving the work report of the employee, and mining the employee keywords of the work report by using a keyword mining model.
Specifically, after the enterprise has prepared the course library data, the work reports of the enterprise's employees are collected. Enterprises generally have written summaries and reports that employees produce regularly. Now that digitization is widespread, most of these summaries and reports are stored as text in the enterprise's database, so the data is readily available. As a supplementary case, if an enterprise does not require employees to provide summaries and reports periodically, the employees may be asked to cooperate with the mining of employee keywords.
Further, the employee keywords include work content keywords, post keywords, evaluation keywords, and index keywords. In practice, the post keywords and the work content keywords are the main ones, and the other two types play only an auxiliary role. Depending on the situation, some small enterprises with limited staff may use only the post keywords and the work content keywords for matching, whereas large enterprises may use all four types, with the post keywords and the work content keywords as the main keywords and the other two types as auxiliary keywords. In one embodiment, if each of the four types has related employee keywords, then after the similarity calculation the course keywords of several courses may correspond to the work content, post, evaluation and index keywords respectively; if only one course is to be selected for learning, preference is given to the courses corresponding to the work content keywords and the post keywords. Further, an enterprise can also set weights for employee keywords according to their type (the four types above) and their meaning (for example positive evaluation, negative evaluation, or KPI values). Weight settings differ between enterprises, and the weight calculation is described in detail later.
Further, the evaluation keywords are keywords that express a positive or negative evaluation; the index keywords are keywords related to the enterprise's KPI indexes.
Furthermore, these four types of key information are mined from the employee's work report, and together they give a comprehensive picture of the employee's post, knowledge base, requirements and so on. To mine this information, a sequence labeling method can be used, for example a BERT model. The pre-trained model used is Google's Chinese BERT pre-trained model, which allows more accurate information to be mined and classifies well.
Referring to fig. 4, the specific process of mining the employee keywords of the work report with the keyword mining model is as follows: a labeled corpus is used as the training sample set and is input into a preset sequence labeling model, such as the BERT Chinese pre-trained model, for training, so as to obtain data with keyword labels. The specific steps are as follows:
S41, preprocessing the training sample set to obtain the BIO input data format required by the sequence labeling model.
Specifically, the BIO scheme uses the labels "begin", "inside", and "outside". Each character of the original text data is labeled in the sequence labeling format, yielding thousands of first corpora.
For example, the sentence "the completion rate of the sales skill series training reaches 90%, target not completed" is labeled character by character. With each character of the original sentence shown by its translated gloss, the labels are: "sell, B-work content", "sell, I-work content", "skill, I-work content", "energy, I-work content", "tie, O", "cultivate, B-work content", "train, I-work content", "finish, B-work content", "finish, I-work content", "rate, I-work content", "reach, O", "9, B-index", "0, I-index", "%, I-index", ",, O", "not, B-evaluation", "finish, I-evaluation", "subject, I-evaluation".
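The preprocessing in S41 can be illustrated with a small helper that turns annotated spans into per-character BIO tags. This is a sketch under assumptions: the function name `to_bio`, the `(start, end, label)` span format and the short English sample string are hypothetical and stand in for the annotated corpus.

```python
def to_bio(text, spans):
    """Tag each character of `text` with a B-/I-/O label.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Characters outside every span keep the tag "O".
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"           # first character of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"           # remaining characters of the span
    return list(zip(text, tags))


# Hypothetical sample: "rate" is work content, "90%" is an index value.
tagged_sample = to_bio("rate 90%", [(0, 4, "content"), (5, 8, "index")])
```

Each pair in `tagged_sample` has the same shape as the character/label pairs listed above.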
S42, extracting the features of each sequence by using the sequence labeling model to obtain the semantic features of the text corresponding to each sequence.
It should be noted that the core module of the BERT model is the Transformer, whose key part is the attention mechanism, combined here in multiple attention layers with position embeddings. Common BERT models have 12 or 24 attention layers; this embodiment uses the 12-layer version to extract deeper semantic features of the text.
S43, classifying the obtained semantic features of the text by using a softmax classifier, and training and testing the model to obtain output data in the BIO format.
S44, post-processing the output data in the BIO format to obtain the employee keywords.
Specifically, for the example sentence above, the resulting employee keywords are:
work content keywords: sales skill, training, completion rate;
index keywords: 90%;
evaluation keywords: target not completed.
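The post-processing in S44 amounts to merging consecutive B-/I- tags of the same type back into keyword strings. The following is a minimal sketch of that idea; the function name `decode_bio` and the sample tag sequence are hypothetical, and stray I- tags that do not continue a span are simply treated as outside.

```python
def decode_bio(tagged):
    """Collect (label, text) keyword spans from (character, tag) pairs."""
    keywords, current, label = [], [], None
    for char, tag in tagged:
        if tag.startswith("B-"):
            if current:                                   # close the previous span
                keywords.append((label, "".join(current)))
            current, label = [char], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(char)                          # continue the open span
        else:
            if current:                                   # "O" or a stray I- tag ends the span
                keywords.append((label, "".join(current)))
            current, label = [], None
    if current:
        keywords.append((label, "".join(current)))
    return keywords


# Hypothetical model output for a short fragment.
model_output = [("9", "B-index"), ("0", "I-index"), ("%", "I-index"),
                (",", "O"), ("o", "B-evaluation"), ("k", "I-evaluation")]
decoded = decode_bio(model_output)
```

Grouping the decoded pairs by label gives keyword lists of the kind shown above.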
In addition, the post keywords are extracted from the post name by using regular-expression matching rules. For example, from the post name "XX district XX city XX branch Xth business district sales assistant", the post keywords obtained are: business district, sales.
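One way such a matching rule might look is a regular expression built from a vocabulary of post terms. The source does not give the actual rules, so the pattern, the vocabulary and the function name `post_keywords` below are purely illustrative assumptions.

```python
import re

# Hypothetical vocabulary of post terms; real rules would be enterprise-specific.
POST_PATTERN = re.compile(r"business district|sales|finance|telemarketing")


def post_keywords(post_name):
    """Return the vocabulary terms found in a post name, in order of appearance."""
    return POST_PATTERN.findall(post_name.lower())


found = post_keywords(
    "XX district XX city XX branch Xth business district sales assistant"
)
```

Because "district" on its own is not in the vocabulary, the administrative parts of the post name are ignored and only the business-related terms are returned.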
Furthermore, the post keywords can have a special use. In some larger enterprises and groups, employees may share the same work content keywords yet have completely different learning needs, mainly because their posts differ. In this case, the enterprise can use the post keywords to screen the courses further. For example, the keyword "sales" appears in the work reports of both employee A and employee B of a company, but the post keyword of employee A is "telemarketing" and that of employee B is "distribution"; although both do sales work, they clearly need different courses. In practice, the posts are matched on top of the course results already matched by work content, using the Jaccard similarity calculation explained in step 33.
Step 33, calculating the matching values of the course keywords and the employee keywords by using a similarity algorithm.
Specifically, matching the course keywords against the employee keywords means that each employee keyword is compared with each course keyword. The Jaccard similarity may be used, for example:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Further, to illustrate the similarity calculation, suppose one keyword in the work report of employee Zhao Si is "finance" and the keyword of course X is "administration". A and B are the sets of characters that make up each keyword (each keyword has two characters, shown here by their translated glosses):
A = {"financial", "business"}; B = {"executive", "political"};
|A ∩ B| = 0; |A ∪ B| = 4; J(A, B) = 0/4 = 0;
so the similarity is 0.
Similarly, if the keyword in Zhao Si's work report is "finance" and the keyword of course X is "financial accounting":
A = {"financial", "business"}; B = {"financial", "party"};
|A ∩ B| = 1; |A ∪ B| = 3; J(A, B) = 1/3 ≈ 0.33;
so the similarity is 0.33.
Both "0" and "0.33" are matching values for the course keyword and the employee keyword. Of course, if the employee keyword is "finance" and the course keyword is also "finance", then the similarity calculation has a match value of 1.
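The three cases above (similarity 0, 0.33 and 1) can be reproduced with a character-level Jaccard function. This is a minimal sketch: the function name `jaccard` is an assumption, and the two-letter strings are placeholders standing in for the two-character keywords of the example.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the character sets of two keywords."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:                      # both keywords empty: define similarity as 0
        return 0.0
    return len(set_a & set_b) / len(union)


# Placeholder keywords: each letter stands for one character of a keyword.
no_overlap = jaccard("ab", "cd")       # no shared character
one_shared = jaccard("ab", "ac")       # one shared character out of three
identical = jaccard("ab", "ab")        # same keyword
```

Treating each keyword as a set of characters means partial overlaps between related terms still receive a non-zero score.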
Step 34, automatically recommending courses with high matching values to the employee according to the matching values.
Continuing the example in step 33, if the matching values of the two courses after the similarity calculation are "0" and "0.33", the course with the matching value "0.33" is clearly the one to recommend to employee Zhao Si.
Further, if many courses can be matched for employee Zhao Si, a threshold on the matching value can be set in advance. For example, with the threshold set to "0.33", every course whose matching value is greater than or equal to "0.33" is recommended to employee Zhao Si, which makes the process automatic.
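The threshold rule can be sketched as a simple filter over precomputed matching values. The function name `recommend` and the course names are hypothetical; only the threshold of 0.33 and the matching values 0, 0.33 and 1 come from the example above.

```python
def recommend(match_values, threshold=0.33):
    """Return courses whose matching value meets the threshold, best match first."""
    selected = [(course, v) for course, v in match_values.items() if v >= threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [course for course, _ in selected]


# Hypothetical matching values for one employee keyword against three courses.
match_values = {
    "administration course": 0.0,
    "financial accounting course": 0.33,
    "finance course": 1.0,
}
recommended = recommend(match_values)
```

Courses at exactly the threshold are kept, matching the "greater than or equal to" wording of the text.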
Further, a complete example covering steps 31 to 34 is described below.
Employee Zhang San, a financial assistant, has the following work report: "Today I completed the reimbursement of the group's March travel expenses and collected and sorted everyone's invoices. However, because of a bank system problem, the reimbursements of 10% of the staff have not been paid, so the work was not completed on time."
The four types of keywords extracted from the work report are:
work content keywords: reimbursement work, invoices, reimbursement items, bank system;
post keywords: financial assistant;
evaluation keywords: not completed on time -> negative evaluation;
index keywords: 10%.
By analyzing the employee keywords, the negative evaluation indicates that employee Zhang San's reimbursement work was not completed well enough and that the employee may need training.
Based on the information mined from Zhang San's report, a targeted course recommendation is made: a course on financial reimbursement procedures and handling is found in the course library together with its course keywords, the matching value between the course keywords and the employee keywords is calculated with the similarity algorithm and found to be high, and the course is therefore recommended to Zhang San.
In this example, the work content keywords and the post keywords are highly relevant to the course, and the evaluation keywords also play an important auxiliary role by further identifying the course that employee Zhang San most urgently needs to study.
Further, similar to the evaluation keywords, some enterprises may have more uniform KPI requirements for the index keywords. For example, the financial reimbursement completion rate is required to reach more than 98%. Then, index screening is performed on all employees who have keywords such as "financial reimbursement", and if the index is lower than 98%, the "financial reimbursement" is given a higher keyword weight similarly to "negative evaluation".
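The index screening can be sketched as a rule that raises a keyword's weight when the reported index falls below the KPI target. The 98% target comes from the example above; the function name `adjust_weight` and the boost factor of 1.5 are assumptions, since the source does not say how much higher the weight should be.

```python
def adjust_weight(weight, index_value, kpi_target=0.98, boost=1.5):
    """Raise a keyword's weight when the reported index misses the KPI target.

    `boost` is a hypothetical multiplier; the source only says the weight is "higher".
    """
    return weight * boost if index_value < kpi_target else weight


# An employee reporting a 90% completion rate against a 98% target.
boosted = adjust_weight(0.2, 0.90)
# An employee who meets the target keeps the original weight.
unchanged = adjust_weight(0.2, 0.99)
```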
Referring to fig. 5, in an embodiment, after step 32 ("collecting work reports of employees and mining employee keywords by using a keyword mining model") and before step 33 ("calculating matching values of the course keywords and the employee keywords by a similarity algorithm"), the method further includes:
S51, receiving the set weights of the employee keywords;
S52, selecting valid employee keywords according to the weights of the employee keywords.
Specifically, priorities can be set among the employee keywords. After the employee keywords are extracted with the keyword mining model, their weights can be calculated with tools such as tf-idf or term weighting.
Specifically, taking tf-idf as an example, the calculation formula is tf-idf = tf × idf, where tf (term frequency) is the number of times the word appears in a document divided by the total number of words in that document, and idf (inverse document frequency) is the logarithm of the total number of documents divided by the number of documents containing the word plus 1.
For example, if the word "training" appears 3 times in a certain report of a certain employee, and the report has a total of 30 words, then
$$\mathrm{tf} = \frac{3}{30} = 0.1$$
If 1,000 employee reports contain the word "training" and there are 20,000 employee reports in total, then
$$\mathrm{idf} = \log_{10}\frac{20000}{1000 + 1} \approx 1.3$$
The 1 is added to the denominator to avoid a denominator of 0. Further, tf-idf = 0.1 × 1.3 = 0.13, and 0.13 is the value of the weight.
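The worked example above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the base-10 logarithm is an assumption inferred from the stated result of about 1.3.

```python
import math


def term_frequency(term_count, total_terms):
    """Occurrences of the word divided by the total words in the report."""
    return term_count / total_terms


def inverse_document_frequency(total_docs, docs_with_term):
    """log10(total documents / (documents containing the word + 1)).

    The +1 keeps the denominator from being 0. Base 10 is assumed
    because it reproduces the value of about 1.3 in the example.
    """
    return math.log10(total_docs / (docs_with_term + 1))


tf_value = term_frequency(3, 30)                     # "training": 3 of 30 words
idf_value = inverse_document_frequency(20000, 1000)  # 1,000 of 20,000 reports
weight = tf_value * idf_value                        # tf-idf weight of "training"
```

Rounded to two decimal places, `weight` agrees with the 0.13 given in the text.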
Take the case in S21 of step 32, "the completion rate of the sales skill training series reached 90%; the target was not met", and the three example words "sales skill", "training", and "completion rate". Their calculated tf-idf weights are 0.35, 0.13, and 0.05, respectively. The values 0.35 and 0.05 are assumed for this hypothetical case, and their calculation is not shown here. The most important word is therefore "sales skill", and "training" is less important.
In this embodiment, tf-idf is used to calculate the weight because it measures the importance of a word: in general, the more frequently a word appears in a document, the more important it is, and the more documents the word appears in, the less important it is. For example, the word "training" appears many times, but it also appears in the work tasks of many employees, so it is not specific enough and its importance is reduced. Valid employee keywords can thus be further selected.
In one embodiment, erroneous keywords produced by errors of the keyword mining model can also be removed by calculating weights. An erroneous keyword is an incomplete keyword, for example the keyword "financial assistant" extracted only in truncated form. When the weight of such a keyword is calculated with tf-idf, the weight is particularly low because the keyword is erroneous. Keywords with particularly low weights can therefore be checked automatically: a threshold is set, and employee keywords whose weights are lower than the threshold are automatically deleted.
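The threshold-based cleanup can be sketched as a simple filter. The threshold of 0.01, the truncated keyword "financial assis", and its weight of 0.001 are hypothetical values chosen for illustration; the other three weights are the ones used in the text.

```python
def select_valid_keywords(weights, threshold):
    """Keep only employee keywords whose weight is not below the threshold."""
    return {kw: w for kw, w in weights.items() if w >= threshold}


weights = {
    "sales skill": 0.35,
    "training": 0.13,
    "completion rate": 0.05,
    "financial assis": 0.001,  # hypothetical truncated (erroneous) keyword
}

valid = select_valid_keywords(weights, threshold=0.01)
```

In practice the threshold would have to be tuned on real reports, since a value that is too high would also discard valid but common keywords.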
In one embodiment, the employee keywords include position keywords, work content keywords, evaluation keywords, and indicator keywords, the position keywords and the work content keywords having a higher weight than the evaluation keywords and the indicator keywords.
Referring to fig. 6, in an embodiment, the step of calculating the matching value of the course keyword and the employee keyword by using the similarity algorithm includes:
S61, calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
S62, multiplying the employee keyword weight by the calculated similarity value to obtain a matching value.
Specifically, the priority of course recommendation and learning is calculated according to the weight of the employee keyword and the two quantitative indicators of the similarity value of the course keyword and the employee keyword.
More specifically, the employee keyword weight and the similarity calculation are combined as follows: matching value = employee keyword weight × similarity value.
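A minimal sketch of this calculation is given below, using Jaccard similarity as the similarity algorithm. Comparing keywords as sets of whitespace-separated words is an assumption made for English text (for Chinese keywords a character-level comparison may be more natural), and the course keyword "sales skill training" is hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def matching_value(weight, employee_keyword, course_keyword):
    """matching value = employee keyword weight x similarity value."""
    similarity = jaccard(employee_keyword.split(), course_keyword.split())
    return weight * similarity


# {"sales", "skill"} vs {"sales", "skill", "training"}: similarity 2/3
partial = matching_value(0.35, "sales skill", "sales skill training")
# identical keywords: similarity 1
exact = matching_value(0.13, "training", "training")
```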
Again taking the case in S21 of step 32 as an example, the Jaccard similarity is calculated in turn between each employee keyword and each keyword of courses A and B, and for each employee keyword only the course keyword with the maximum similarity is kept, as shown in the following table:
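TABLE 1 Matching values (similarity value × employee keyword weight)
Employee keyword    Course A               Course B
Training            1 × 0.13 = 0.13        0.5 × 0.13 = 0.065
Sales skill         0.5 × 0.35 = 0.175     1 × 0.35 = 0.35
Completion rate     0.3 × 0.05 = 0.015     0.6 × 0.05 = 0.03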
From the results in Table 1, the keyword with the highest matching value for both courses is "sales skill" (0.5 × 0.35 = 0.175 for course A and 1 × 0.35 = 0.35 for course B). Course B has the higher matching value and is therefore the better match, so course B is selected for recommendation.
The above case also shows that the keyword most similar to course A is "training", but because the weight of "training" is low, its importance is reduced. In other words, a course containing the more important keyword "sales skill" is preferred for recommendation.
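The selection in this case can be reproduced with the similarity values and weights given in Table 1. Scoring each course by its single highest matching value is one reading of the text, not a confirmed rule, and the names `best_match` and `recommended` are illustrative.

```python
# Employee keyword weights and per-course similarity values from Table 1.
weights = {"training": 0.13, "sales skill": 0.35, "completion rate": 0.05}
similarity = {
    "Course A": {"training": 1.0, "sales skill": 0.5, "completion rate": 0.3},
    "Course B": {"training": 0.5, "sales skill": 1.0, "completion rate": 0.6},
}


def best_match(course_similarity, keyword_weights):
    """Return (keyword, matching value) with the highest weighted similarity."""
    scores = {kw: sim * keyword_weights[kw] for kw, sim in course_similarity.items()}
    keyword = max(scores, key=scores.get)
    return keyword, scores[keyword]


# Best keyword per course, then the course whose best matching value is highest.
best = {course: best_match(sims, weights) for course, sims in similarity.items()}
recommended = max(best, key=lambda course: best[course][1])
```

Under these assumptions `recommended` is course B, in agreement with the conclusion drawn from the table.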
Referring to FIG. 7, in one embodiment, after a course match has been completed and some time has passed, for example one quarter, the enterprise needs to perform course matching again, and the matching process includes:
S71, collecting and analyzing the employee's course learning status, and recording the courses in which the employee is not interested;
S72, masking the courses in which the employee is not interested when steps 31 to 34 are repeated.
This embodiment is an optimized technical solution. It collects the employee's behavior after seeing the recommended course, such as whether the course was started and how long was spent learning it, and analyzes these data to judge how appropriate the recommendation was, so as to further optimize employee course matching. In practice, because employees write summaries every year or every quarter, when steps 31 to 34 are repeated the system can automatically mask the courses in which the employee is not interested, so that they do not appear in the next round of course matching, further optimizing the course matching method based on NLP technology.
Furthermore, courses that the employee has already completed can be removed to avoid recommending them again.
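The masking step can be sketched as follows. The rule used to decide that an employee is not interested (course never started, or less than 5 minutes of learning time) and all course names are hypothetical; the text names the signals to collect but does not define a decision rule.

```python
def is_not_interested(started, minutes_learned, min_minutes=5):
    """Hypothetical rule: never started, or barely any learning time."""
    return (not started) or minutes_learned < min_minutes


def filter_candidates(ranked_courses, not_interested, already_learned):
    """Drop masked and completed courses while keeping the ranking order."""
    excluded = set(not_interested) | set(already_learned)
    return [course for course in ranked_courses if course not in excluded]


# Hypothetical learning records from the previous round: (started, minutes).
records = {"Course A": (False, 0), "Course B": (True, 120), "Course C": (True, 2)}
not_interested = [c for c, (s, m) in records.items() if is_not_interested(s, m)]

next_round = filter_candidates(
    ["Course B", "Course C", "Course D", "Course A"],
    not_interested=not_interested,
    already_learned=["Course B"],
)
```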
In an alternative embodiment, the result of the course matching method based on NLP technology may also be uploaded to a blockchain.
Specifically, summary information is obtained from the result of the NLP-based course matching method by hashing the result, for example with the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to the user. The user can download the summary information from the blockchain to verify whether the result of the NLP-based course matching method has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
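The hashing and verification steps can be illustrated with Python's standard `hashlib` module. This sketch covers only digest creation and comparison; uploading to and downloading from a blockchain is not shown, and the result layout and the JSON serialization are assumptions.

```python
import hashlib
import json


def summarize(result):
    """Return the SHA-256 hex digest of a course matching result.

    The result is serialized as JSON with sorted keys so that the same
    content always yields the same digest (an assumed convention).
    """
    payload = json.dumps(result, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify(result, stored_digest):
    """Check a result against the digest previously stored on the chain."""
    return summarize(result) == stored_digest


# Hypothetical matching result for one employee.
result = {"employee": "Zhang San", "course": "Course B", "matching_value": 0.35}
digest = summarize(result)

tampered = dict(result, course="Course A")
```

Here `verify(result, digest)` succeeds, while `verify(tampered, digest)` fails because any change to the result changes the digest.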
As shown in fig. 8, the present invention further provides a course matching system based on NLP technology, which includes a course keyword mining module 20, an employee keyword mining module 30, a calculating module 40, and a matching module 50.
The course keyword mining module 20 is configured to obtain an enterprise course library, and obtain course keywords in the enterprise course library;
the employee keyword mining module 30 is configured to receive a work report of an employee, and mine an employee keyword of the work report by using a keyword mining model;
the calculating module 40 is configured to calculate matching values of the course keywords and the employee keywords by using a similarity algorithm;
and the matching module 50 is used for automatically recommending courses with high matching values to the employees according to the matching values.
Referring to fig. 9, in an embodiment, the course matching system further includes a weighting module 60, where the weighting module 60 is configured to receive the set weight of the employee keyword and select a valid employee keyword according to the weight of the employee keyword.
In one embodiment, in the calculating module 40, the step of calculating the matching value of the course keyword and the employee keyword by using the similarity algorithm includes:
calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
and multiplying the employee keyword weight by the calculated similarity value to obtain a matching value.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. As shown in fig. 10, the apparatus 200 includes a processor 201 and a memory 202 coupled to the processor 201.
The memory 202 stores program instructions for implementing the course matching method based on NLP technology described in any of the above embodiments.
The processor 201 is used to execute program instructions stored by the memory 202.
The processor 201 may also be referred to as a Central Processing Unit (CPU). The processor 201 may be an integrated circuit chip having signal processing capabilities. The processor 201 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or any conventional processor.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention. The computer storage medium of the embodiment of the present invention stores computer readable instructions which, when executed by a processor, implement the steps described above. That is, the computer storage medium stores a program file 301 capable of implementing all the methods described above. The program file 301 may be stored in the storage medium in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, or a tablet.
The invention provides a course matching method, a system, computer equipment and a storage medium, wherein the course matching method comprises the steps of obtaining an enterprise course library and obtaining course keywords in the enterprise course library; receiving a work report of an employee, and mining employee keywords of the work report by adopting a keyword mining model; calculating matching values of the course keywords and the employee keywords by adopting a similarity algorithm; and automatically recommending courses with high matching values to the employee according to the matching values. Therefore, by applying NLP technology, the invention can automatically mine and analyze information from existing data and thus automatically provide course recommendations personalized to each individual employee. In addition, the whole system model runs quickly and is suitable for enterprises of various sizes.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (10)

1. A course matching method based on NLP technology is characterized by comprising the following steps:
acquiring an enterprise course library and acquiring course keywords in the enterprise course library;
receiving a work report of an employee, and mining an employee keyword of the work report by adopting a keyword mining model;
calculating matching values of the course keywords and the employee keywords by adopting a similarity algorithm;
and automatically recommending courses with high matching values to the employees according to the matching values.
2. The course matching method as claimed in claim 1, wherein before said calculating the matching value between the course keyword and the employee keyword using the similarity algorithm, the method further comprises:
receiving the weight of the set employee keywords;
and selecting effective staff keywords according to the weights of the staff keywords.
3. The course matching method of claim 2, wherein said employee keywords comprise position keywords, work content keywords, evaluation keywords, and indicator keywords, said position keywords and work content keywords having a higher weight than said evaluation keywords and indicator keywords.
4. The course matching method as claimed in claim 2, wherein the step of calculating the matching value of the course keyword and the employee keyword using the similarity algorithm comprises:
calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
and multiplying the employee keyword weight by the calculated similarity value to obtain a matching value.
5. The course matching method of claim 1, wherein said step of mining employee keywords of the job report using a keyword mining model comprises:
preprocessing a training sample set to obtain a BIO input data format required by a serialized labeling model;
extracting the characteristics of each sequence by adopting a serialization labeling model to obtain the semantic characteristics of the text corresponding to each sequence;
classifying the semantic features of the obtained text by using a softmax classifier, and training and testing the model to obtain output data in a BIO format;
and carrying out post-processing on the output data in the BIO format to obtain the employee keywords.
6. The course matching method as claimed in claim 1, wherein after automatically recommending the course with high matching value to the employee according to the matching value, uploading the course matching result to the blockchain, so that the blockchain encrypts and stores the course matching result.
7. A course matching system based on NLP technology, said course matching system comprising:
the course keyword mining module is used for acquiring an enterprise course library and acquiring course keywords in the enterprise course library;
the employee keyword mining module is used for receiving the work report of the employee and mining the employee keywords of the work report by adopting a keyword mining model;
the computing module is used for computing the matching value of the course keyword and the employee keyword by adopting a similarity algorithm;
and the matching module is used for automatically recommending courses with high matching values to the employees according to the matching values.
8. The course matching system of claim 7, further comprising a weighting module for receiving a set weight of the employee keyword and selecting a valid employee keyword based on the weight of the employee keyword; in the calculation module, the step of calculating the matching value of the course keyword and the employee keyword by using a similarity algorithm includes:
calculating similarity values of the course keywords and the employee keywords by adopting a similarity algorithm;
and multiplying the employee keyword weight by the calculated similarity value to obtain a matching value.
9. A computer device comprising a memory and a processor, said memory having stored therein computer readable instructions which, when executed by said processor, cause said processor to perform the steps of the course matching method as claimed in any one of claims 1 to 6.
10. A computer storage medium having stored thereon computer readable instructions which, when executed by a processor, carry out the steps of the course matching method as claimed in any of claims 1-6.
CN202011041524.7A 2020-09-28 2020-09-28 Course matching method, system, computer equipment and storage medium Pending CN112052396A (en)

Publications (1)

Publication Number Publication Date
CN112052396A true CN112052396A (en) 2020-12-08

Family

ID=73604894

Country Status (1)

Country Link
CN (1) CN112052396A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706921A (en) * 2009-12-03 2010-05-12 上海一佳一网络科技有限公司 Intelligent curriculum matching system and method
CN108510307A (en) * 2018-02-25 2018-09-07 心触动(武汉)科技有限公司 A kind of course recommendation method and system
CN110060027A (en) * 2019-04-16 2019-07-26 深圳市一览网络股份有限公司 With the recommended method and equipment and storage medium of the matched career development course of resume
CN110377804A (en) * 2019-06-20 2019-10-25 平安科技(深圳)有限公司 Method for pushing, device, system and the storage medium of training course data
CN111160017A (en) * 2019-12-12 2020-05-15 北京文思海辉金信软件有限公司 Keyword extraction method, phonetics scoring method and phonetics recommendation method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732892A (en) * 2020-12-30 2021-04-30 平安科技(深圳)有限公司 Course recommendation method, device, equipment and storage medium
WO2022142043A1 (en) * 2020-12-30 2022-07-07 平安科技(深圳)有限公司 Course recommendation method and apparatus, device, and storage medium
CN112732892B (en) * 2020-12-30 2022-09-20 平安科技(深圳)有限公司 Course recommendation method, device, equipment and storage medium
CN112860851A (en) * 2021-01-22 2021-05-28 平安科技(深圳)有限公司 Course recommendation method, device, equipment and medium based on root cause analysis
CN112860851B (en) * 2021-01-22 2022-05-06 平安科技(深圳)有限公司 Course recommendation method, device, equipment and medium based on root cause analysis
CN113554316A (en) * 2021-07-26 2021-10-26 李园园 Staff training system based on Internet of things
CN116957870A (en) * 2023-09-18 2023-10-27 山西美分钟信息科技有限公司 Control method, device, equipment and medium for clinical skill assessment management system
CN116957870B (en) * 2023-09-18 2023-12-22 山西美分钟信息科技有限公司 Control method, device, equipment and medium for clinical skill assessment management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination