CN117171428B

CN117171428B - Method for improving accuracy of search and recommendation results

Info

Publication number: CN117171428B
Application number: CN202310981457.4A
Authority: CN
Inventors: 时迎超; 王杨
Original assignee: Beijing Wangpin Information Technology Co ltd
Current assignee: Beijing Wangpin Information Technology Co ltd
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2024-04-05
Anticipated expiration: 2043-08-04
Also published as: CN117171428A

Abstract

The invention discloses a method for improving the accuracy of search and recommendation results, and belongs to the technical field of data processing. The method comprises the following steps: S10, improving the data quality of the knowledge graph through double-chain data, and cleaning the data with a clustering method to improve the data accuracy of the knowledge graph; S20, training with a pre-training model into which knowledge of the job tree has been integrated in advance; S30, using multi-task training to reduce the confusion of the pre-training model; and S40, recommending preferred job classes, including the most probable job class and similar job classes. To improve retrieval and matching performance for job descriptions (JD) and resumes (CV), the invention upgrades the system in terms of data quality, the tagging model and the vector model.

Description

Method for improving accuracy of search and recommendation results

Technical Field

The invention belongs to the technical field of data processing and relates to a method for improving the accuracy of search and recommendation results, in particular to such a method based on a knowledge graph.

Background

The technical and application value of big data is widely recognized, and the knowledge graph, a core technology for the future, has developed rapidly as large internet technology companies have adopted it. Amazon uses big data to recommend product information to customers, forming a comprehensive relationship between people and products; Microsoft developed the "person cube", which forms a three-dimensional person-to-person relationship and realizes six-degrees-of-separation search between people; Baidu developed Baidu Brain, redefining the search engine in China and providing users with comprehensively expanded search results; Google set out long ago to "take over the world" with big data, developed an internet search engine early on, ushered in the internet era, and on that basis developed Google Brain, which led to the popularization of the concept and technology of the knowledge graph.

The knowledge graph is a knowledge base with a graph structure and belongs to the field of knowledge engineering. Unlike an ordinary knowledge base, a knowledge graph fuses all disciplines: knowledge units of different sources, types and structures are linked into a graph, and a broader and deeper knowledge system, built on the metadata of each discipline and continuously expanded, is provided to the user. In essence, a knowledge graph systematizes and relates the knowledge data of a field and visualizes that knowledge as a graph. In short, a knowledge graph can be understood as a knowledge system built on an information system; through technologies such as data acquisition, data mining, information processing, knowledge measurement and graph drawing, it presents a complex knowledge domain systematically and reveals how that domain develops over time.

The job class is one of the most important pieces of information in the recruitment industry. On the client side, job class information appears throughout the user's usage and work flow; on the strategy side, job class information is also an important ranking and recall strategy. The job trees of different recruitment platforms are large and differ in content, so accurately understanding and memorizing them is costly for users. According to statistics, only slightly more than 80% of users can understand and remember the target job class for recruitment and select it correctly from the large job tree. An accurate job classification task is therefore of great significance for improving user efficiency, improving the quality of basic data, and contributing features to the business.

For example, the prior-art application CN202310528124.6 discloses a knowledge-graph-based method, system and device for recommending network hotspot information. That invention acquires hotspot events and decision information and builds a network hotspot knowledge graph after keyword extraction and knowledge extraction; acquires an emergency event and extracts its keywords and knowledge; evaluates keyword similarity, entity attribute similarity and relationship similarity according to the keywords, entity attributes and relationships in the emergency event and the network hotspot knowledge graph; and recommends hotspot events and decision information according to the evaluated similarities. However, existing technologies of this kind have the following problems: data quality cannot be controlled and is uneven, data distribution cannot be controlled, and the data model often deviates from the actual data; as the number of job classes grows (currently more than 1380 regular job positions), model iteration becomes inefficient because of the larger number of targets; from the perspective of training targets, the similarity between some job classes makes it difficult for the model to fit accurately; and from the perspective of training data, most positions can be assigned to several job classes and therefore have a multi-label nature. A better data recommendation model is therefore needed to solve these problems.

Disclosure of Invention

Problems to be solved

In view of the problems in the prior art, the invention provides a method for improving the accuracy of search and recommendation results. To improve retrieval and matching performance for JD and CV, the invention upgrades the system in terms of data quality, the tagging model and the vector model.

Technical proposal

In order to solve the problems, the invention adopts the following technical scheme.

A method for improving the accuracy of search and recommendation results comprises the following steps:

S10: the data quality of the knowledge graph is improved through double-chain data, and the data are cleaned with a clustering method to improve the data accuracy of the knowledge graph;

S20: training is performed with a pre-training model into which knowledge of the job tree is integrated in advance;

S30: multi-task training is used to reduce the confusion of the pre-training model;

S40: preferred job classes are recommended, including the most probable job class and similar job classes.

The method for improving the accuracy of the search and recommendation results,

the double-chain data in the step S10 are grouped by using the occurrence frequency of the keywords in the basic data;

the weight formula for the groups of double-chain data in step S10 is as follows:

$W_{pf}(i) = pf_i \cdot idf_i / if_i$;

where $W_{pf}(i)$ represents the weight value of the i-th group of keywords, $pf_i$ represents the occurrence frequency of the i-th group of keywords, $idf_i$ represents the ratio between the number of groups of structured double-chain data and the number of groups of unstructured double-chain data, and $if_i$ represents the inverse frequency.
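
As an illustration only, the weight formula above can be evaluated as in the following minimal sketch. The function name `keyword_group_weight` and the sample numbers are hypothetical and do not come from the patent, and $if_i$ is taken as a precomputed input because its own formula is defined separately.

```python
def keyword_group_weight(pf_i: float, idf_i: float, if_i: float) -> float:
    """Return W_pf(i) = pf_i * idf_i / if_i for one keyword group."""
    if if_i == 0:
        raise ValueError("if_i must be non-zero")
    return pf_i * idf_i / if_i


# Hypothetical inputs for three keyword groups: (pf_i, idf_i, if_i).
groups = [(12.0, 0.5, 2.0), (3.0, 0.5, 1.5), (7.0, 0.5, 3.5)]
weights = [keyword_group_weight(pf, idf, inv) for pf, idf, inv in groups]
```

With these made-up inputs, a frequent group with a small inverse frequency receives the largest weight.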

The method for improving the accuracy of the search and recommendation results,

in step S10, $if_i$ is calculated as follows:

where $N$ represents the total number of basic data items and $df_i$ represents the number of occurrences of the i-th group of keywords in the basic data.

The method for improving the accuracy of the search and recommendation results,

the clustering method in step S20 is as follows:

feature representation processing is performed on the structured double-chain data and the semi-structured double-chain data, in which the weight value of the i-th group of keywords is processed by the following algorithm:

where $P(S)$ represents the distribution probability of the weight values of the keywords of all groups, $S$ represents the overall sequence of the weight values of the keywords of all groups, and $w_i$ ($1 \le i \le n$) represents the sequence number of the i-th group of keywords.

The method for improving the accuracy of the search and recommendation results,

the method of pre-training the model described in step S20 is as follows:

the structured double-chain data and the semi-structured double-chain data, after feature representation processing, are sent to an NLP service center;

finally, the NLP service center applies an optimized BP network model to the screened, rule-conforming, feature-represented double-chain data to obtain a first entity relationship;

the optimization algorithm using the BP network model is as follows:

where $G_i$ represents the optimized first entity relationship degree value, $N$ represents the sum of the statistics of the keywords of all groups, and $P_i^n$ represents the distribution probability of the weight values of the keywords of all groups.

The method for improving the accuracy of the search and recommendation results,

the manner of integration in step S20 is as follows:

the scheduling layer is optimized: a model is established with a max-min mathematical algorithm, and each single objective function is determined to obtain the blended scheduling benefit.

The method for improving the accuracy of the search and recommendation results,

the multi-task training method described in step S30 is as follows:

each single objective function is determined, including a first objective function, a second objective function, a third objective function, and a fourth objective function.

The method for improving the accuracy of the search and recommendation results,

the first objective function is $F_1(q_{v1}) = (q_{v11} - q_{v0}) / q_{v12}$, where $q_{v0}$, $q_{v1}$, $q_{v11}$ and $q_{v12}$ are planned task values for different periods;

the second objective function is [formula not reproduced in the source text], where $F_i(q_{vi})$ is calculated by the weight coefficient transformation method, which assigns weights to the reference flow values;

the third objective function is $V_{i,j+1} = V_{i,j} + (Q_{i,j} - q_{i,j} - Q_{loss\,i,j})$, where $V_{i,j}$ is the data volume computed by the i-th cloud in the j-th time period, $V_{i,j+1}$ is the data volume computed by the i-th cloud in the (j+1)-th time period, $Q_{i,j}$ is the data inflow of the i-th cloud in the j-th time period, $q_{i,j}$ is the data outflow computed by the i-th cloud in the j-th time period, and $Q_{loss\,i,j}$ is the data loss computed by the i-th cloud in the j-th time period;

the fourth objective function is [formula not reproduced in the source text], where $E_{sm}$ is the energy value of the cloud computation, $V_{m,T}$ is the effective storage capacity of the scheduling period, $\gamma_{m,T}$ is the data quantity of the scheduling period, and $m$ and $T$ are, respectively, the number of cloud computations and the total number of time periods.
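
The first and third objective functions are fully specified in the text and can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `simulate_volume` helper and all sample values are hypothetical, and the second and fourth objective functions are omitted because their formulas are not available.

```python
def first_objective(q_v0: float, q_v11: float, q_v12: float) -> float:
    """F1(q_v1) = (q_v11 - q_v0) / q_v12."""
    if q_v12 == 0:
        raise ValueError("q_v12 must be non-zero")
    return (q_v11 - q_v0) / q_v12


def next_volume(v_ij: float, q_in: float, q_out: float, q_loss: float) -> float:
    """V_{i,j+1} = V_{i,j} + (Q_{i,j} - q_{i,j} - Q_loss_{i,j})."""
    return v_ij + (q_in - q_out - q_loss)


def simulate_volume(v0: float, periods: list) -> list:
    """Apply the recurrence over a list of (Q, q, Q_loss) tuples for one cloud."""
    volumes = [v0]
    for q_in, q_out, q_loss in periods:
        volumes.append(next_volume(volumes[-1], q_in, q_out, q_loss))
    return volumes


# Hypothetical example: one cloud, two time periods.
trace = simulate_volume(100.0, [(20.0, 5.0, 1.0), (10.0, 15.0, 0.0)])
```

Here `trace` holds the data volume at the start of each period, so a period with more outflow than inflow lowers the stored volume.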

The method for improving the accuracy of the search and recommendation results,

the algorithm recommended in step S40 is as follows:

where $FD_{Qk}$ represents the recommended quantized complexity level value, $d_{kij}$ represents the knowledge-graph data of the recommended k-th component set in the column and row directions, $p_{ki}$ represents the complexity value of the knowledge-graph data in the column direction of the recommended k-th component set, and $p_{kj}$ represents the complexity value of the knowledge-graph data in the row direction of the recommended k-th component set.

Advantageous effects

Compared with the prior art, the invention has the beneficial effects that:

The invention uses double-chain data to improve data quality and a clustering method to clean the data, which enhances data accuracy; hierarchical features are integrated into the large pre-training model, which improves iteration efficiency; and multi-task joint training is adopted to reduce confusion. The invention also achieves the following: GPU monitoring is converted into CPU monitoring; the API connects point-to-point with the model process, achieving load balancing; three times the traffic of the spring recruitment season can be handled; and the system can be scaled out in step with other services, further improving service performance.

Drawings

FIG. 1 is a flow chart of a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 2 is an interface diagram of a post of a method of improving accuracy of search and recommendation results according to the present invention;

FIG. 3 is a model calculation diagram of a method for improving accuracy of search and recommendation results according to the present invention, taking a financial accounting position as an example;

FIG. 4 is a resume position display diagram of a method for improving accuracy of search and recommendation results according to the present invention, taking a financial accounting position as an example;

FIG. 5 is a JD dimension total feature map of a method of the present invention for improving accuracy of search and recommendation results;

FIG. 6 is a CV dimension overall characteristic diagram of a method for enhancing the accuracy of search and recommendation results according to the present invention;

FIG. 7 is a keyword recognition flowchart of NLP of a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 8 is a keyword sample of a method of improving accuracy of search and recommendation results according to the present invention;

FIG. 9 is a keyword cluster map of a method of improving accuracy of search and recommendation results according to the present invention;

FIG. 10 is a scoring criteria diagram of a method of improving accuracy of search and recommendation results according to the present invention;

FIG. 11 is a diagram showing an example of JD and CV inputs and the output score produced according to the above scoring criteria in a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 12 is a diagram of one sample example of a method of improving the accuracy of search and recommendation results in accordance with the present invention;

FIG. 13 is a vector model diagram of a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 14 is a vector model result diagram of a method of improving accuracy of search and recommendation results according to the present invention;

FIG. 15 is a diagram of a system architecture employed in one method of the present invention for improving the accuracy of search and recommendation results;

FIG. 16 is a diagram of a deployment framework of a system employed by the method of the present invention for improving accuracy of search and recommendation results;

FIG. 17 is a diagram of a physical architecture of a system employed in a method of enhancing accuracy of search and recommendation results in accordance with the present invention;

FIG. 18 is a content resolution diagram of the outcome of one method of the present invention for improving the accuracy of search and recommendation results;

FIG. 19 is a diagram of a resume (CV) effect in a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 20 is a diagram of a resume (CV) effect in a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 21 is a diagram showing a Job Description (JD) effect in a method for improving accuracy of search and recommendation results according to the present invention;

FIG. 22 is a diagram showing a Job Description (JD) effect in a method for improving accuracy of search and recommendation results according to the present invention.

Detailed Description

The invention is further described below in connection with specific embodiments.

Example 1

As shown in fig. 1, the method for improving the accuracy of the search and recommendation results comprises the following steps:

S10: the data quality of the knowledge graph is improved through double-chain data, and the data are cleaned with a clustering method to improve the data accuracy of the knowledge graph.

It should be noted that the present invention handles structured and semi-structured data in a completely different manner from unstructured data.

The prior art often performs iterative training with knowledge extraction models in NLP based on BiLSTM (bidirectional long short-term memory network) and CRF (conditional random field); both the BiLSTM model and the CRF model have shortcomings. The present method and system instead use a mature NLP service center. After word segmentation, part-of-speech tagging, syntactic analysis, semantic analysis and similar NLP processing, knowledge is extracted from the passages of scientific documents; sentences written in natural language are then converted through knowledge representation into a form a computer can understand and are stored in a knowledge base. The knowledge extraction system has two major parts: natural language processing and knowledge extraction. Natural language processing analyzes the content from a linguistic perspective and comprises eight modules: sentence segmentation, automatic word segmentation, part-of-speech tagging, word-sense tagging, syntactic analysis, sentence-meaning analysis, passage analysis and discourse analysis. The first four modules are the foundation, syntactic analysis and sentence-meaning analysis are the core, and passage analysis and discourse analysis are extensions. These eight modules rely on six types of resources: a keyword library, a probability dictionary, a semantic dictionary, syntax rules, a domain thesaurus and a domain ontology.
The NLP-based knowledge extraction system adopts MVC as its design pattern and is implemented in Java for object-oriented programming; ObjectStore is used as the object-oriented database and Oracle as the relational database; automatic word segmentation uses a maximum vector matching algorithm, part-of-speech tagging uses a maximum probability algorithm, syntactic analysis uses an LR parsing algorithm, and semantic analysis uses predicate logic; the system interface uses XML.

The method for improving the accuracy of the search and recommendation results,

$W_{pf}(i) = pf_i \cdot idf_i / if_i$;

Further, in the method for improving accuracy of search and recommendation results described above, $if_i$ in step S10 is calculated as follows:

Further, NLP feature representation processing is performed on the structured double-chain data and the semi-structured double-chain data, in which the weight value of the i-th group of keywords is processed by the following algorithm:

The method for improving the accuracy of the search and recommendation results, disclosed by the invention, further comprises the following steps:

S20: training is performed with a pre-training model into which knowledge of the job tree is integrated in advance.

The method for improving the accuracy of the search and recommendation results,

the method of pre-training the model described in step S20 is as follows:

the optimization algorithm using the BP network model is as follows:

This is also one of the inventive points of the application. BP neural networks are generally applied to modeling; here the MSE in a projection algorithm is optimized and a comparison of the relationship degree values of two entities is added. Compared with conventional iterative training using a BiLSTM+CRF knowledge extraction model in NLP, the processing effect improves by about 12.4%, and a set of data values can be obtained in about 24 hours. The optimization of the projection algorithm is based on the nonlinear diffusion filtering principle: a nonlinear scale space is constructed with a fast explicit diffusion scheme to obtain the contour structure of the data projection, so that feature extraction is scale-invariant, and the contour corner points of a projection block are extracted from the gray-level differences between the projection under test and the pixels on its neighborhood circle, in both the data projection domain and the scale domain. Finally, feature description vectors are computed with the FREAK algorithm, matching points of the projection image are searched according to the epipolar constraint criterion, and the contour corner points of obstacles are accurately extracted and matched.

The method for improving the accuracy of the search and recommendation results,

the manner of integration in step S20 is as follows:

the method for improving the accuracy of the search and recommendation results,

the multi-task training method described in step S30 is as follows:

The method for improving the accuracy of the search and recommendation results,

The third objective function is $V_{i,j+1} = V_{i,j} + (Q_{i,j} - q_{i,j} - Q_{loss\,i,j})$, where $V_{i,j}$ is the data volume computed by the i-th cloud in the j-th time period, $V_{i,j+1}$ is the data volume computed by the i-th cloud in the (j+1)-th time period, $Q_{i,j}$ is the data inflow of the i-th cloud in the j-th time period, $q_{i,j}$ is the data outflow computed by the i-th cloud in the j-th time period, and $Q_{loss\,i,j}$ is the data loss computed by the i-th cloud in the j-th time period;

The database can be optimized by the following method:

constructing a big-data generative adversarial network, cycle D2GAN, which comprises two big-data generators and four big-data discriminators, namely a small sample generator G, a large sample generator F, small sample discriminators D1s and D2s, and large sample discriminators D1b and D2b;

constructing an optimization objective function for the big-data generative adversarial network, and iteratively training the two generators and the four discriminators based on that objective function to obtain a small sample generation parameter model;

wherein the training of the small sample generator G and the training of the small sample discriminators D1s and D2s form one adversarial process, and the training of the large sample generator F and the training of the large sample discriminators D1b and D2b form another adversarial process.

In the above method for improving accuracy of search and recommendation results, the algorithm recommended in step S40 is as follows:

Specifically, to improve the user experience and the centralized management of knowledge graph data, a cloud platform is added. The cloud platform comprises a user login unit, an identity library, a display unit, a processor, a data grabbing unit, a deviation data analysis unit, a data collection unit and a data temporary storage unit. The user login unit is used to enter the user's identity information and corresponding key information; the identity library stores the standard identity information of approved users and their corresponding approved key information. The user login unit transmits the identity information and corresponding key information to the processor, and the processor verifies them against the identity library to generate a pass signal or a device error signal. When the processor generates a device error signal, it drives the display unit to display "the device in use is not trusted, please check"; when the processor generates an initial error signal, it drives the display unit to display "identity key error, please verify". When a pass signal is generated, the processor uses the personal library to grab data for the identity information. The data collection unit collects access information groups, each formed from multiple pieces of a user's access information, where access information is the content the user accesses when visiting websites; the data collection unit transmits each access information group, together with the corresponding identity information, to the data temporary storage unit for storage. The deviation data analysis unit analyzes the access information groups stored in the data temporary storage unit and their corresponding identity information to obtain all the sequenced access information corresponding to the identity information. The data grabbing unit is connected to the Internet and acquires Internet information in real time. The deviation data analysis unit transmits the sequenced access information to the personal library, and the processor makes recommendations for the identity information by combining the sequenced access information in the personal library with the data grabbing unit. The above is intended to enhance individual access to the cloud platform after centralized processing.
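
The login verification step of the cloud platform could look roughly like the sketch below. It rests on assumptions the description does not settle: the conditions that trigger each error signal are not spelled out, so this example treats an unknown identity as a device error and a mismatched key as an identity-key error. The names `Signal`, `MESSAGES` and `verify` are hypothetical.

```python
from enum import Enum


class Signal(Enum):
    PASS = "pass"
    DEVICE_ERROR = "device_error"
    KEY_ERROR = "key_error"


# Display messages paraphrased from the description of the display unit.
MESSAGES = {
    Signal.DEVICE_ERROR: "The device in use is not trusted; please check.",
    Signal.KEY_ERROR: "Identity key error; please verify.",
}


def verify(identity: str, key: str, identity_library: dict) -> Signal:
    """Check submitted credentials against the approved identity library.

    Assumption: an unknown identity yields DEVICE_ERROR and a wrong key
    yields KEY_ERROR; the source does not define these conditions.
    """
    approved_key = identity_library.get(identity)
    if approved_key is None:
        return Signal.DEVICE_ERROR
    if approved_key != key:
        return Signal.KEY_ERROR
    return Signal.PASS


# Hypothetical identity library with one approved user.
library = {"alice": "k1"}
result = verify("alice", "k1", library)
```

Only a `Signal.PASS` result would let the processor go on to grab data for the identity; the other two results map to the display messages.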

Example 2

Formal application

FIG. 2 shows the job posting interface of the product, on which the job class, industry requirements, and education and experience requirements can be set easily.

FIG. 3 is a model calculation diagram of the present invention, taking a financial accounting position as an example; FIG. 4 is a resume position display diagram of the present invention, taking a financial accounting position as an example.

As shown, taking the accounting job as an example, the job description is as follows:

1. responsible for collecting the financial statements for the company's products, bookkeeping, archiving and tax filing;

2. able to cooperate well on and handle matters relating to finance and business;

3. coordinating with other departments;

4. completing other tasks assigned on a temporary basis;

5. proficient in financial software such as Yonyou and Kingdee.

Meanwhile, the number of job classes that the model confuses for the financial accounting post is given at three levels:

First-level job class features

Second-level job class features

Third-level job class features.

FIG. 5 is a JD dimension summary feature map of the present invention; FIG. 6 is a CV dimension overall characteristic diagram of the present invention.

Next, as shown in FIG. 5 and FIG. 6, the JD/CV information currently submitted by users includes cases where the job name or third-level job class is inconsistent with the job description; such information may come from users maliciously posting fake listings. To reduce the successful release rate of such low-quality JD/CV, a low-quality job/resume checking process is run, with the aim of improving the overall quality of JD/CV on the platform.

Based on the existing JD and CV libraries, the invention first obtains the thresholds for low-quality JD and CV through data statistics and designs a screening flow for low-quality JD and CV; low-quality data are then removed from the sample library by data scripts with manual assistance, clearing obstacles for subsequent machine learning. The check covers JD and CV generated by online B-end and C-end users at the submit and publish nodes.

The correlation x is the similarity between the job name and the job description;

x1 = the minimum value at which a JD/CV can be judged not to hit the low-quality tag, i.e., when x > x1, it is directly judged to be a normal JD/CV;

x2 = the maximum value at which a JD/CV can be judged to hit the low-quality tag, i.e., when x < x2, it is directly judged to be a low-quality JD/CV.
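
The two-threshold screening can be sketched as a small triage function. This is a hedged illustration, not the patented flow: it assumes that a higher similarity means a better match, that x2 < x1, and that scores between the two thresholds are sent to manual review (the source only mentions manual assistance). The name `classify_quality`, the label strings and the sample thresholds are hypothetical.

```python
def classify_quality(x: float, x1: float, x2: float) -> str:
    """Triage one JD/CV by the title-description similarity x.

    Assumes x2 < x1. Scores at or above x1 are treated as normal, scores
    at or below x2 as low quality, and anything in between is left for
    manual review (an assumption, not stated in the source).
    """
    if x2 >= x1:
        raise ValueError("expected x2 < x1")
    if x >= x1:
        return "normal"
    if x <= x2:
        return "low_quality"
    return "manual_review"


# Hypothetical thresholds; the real values come from data statistics.
labels = [classify_quality(x, x1=0.7, x2=0.3) for x in (0.9, 0.5, 0.1)]
```

Treating the boundary values as inclusive is a design choice of this sketch; the description only states the strict comparisons.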

As shown in FIG. 5 and FIG. 6, JD and CV content understanding gives priority to supporting the B-end push vector recall experiments. For both the offline and real-time versions of the established scheme, the data warehousing requirements are as follows:

Scheme one: offline model

The output of content understanding for this phase must be written to Hive tables; see the JD/CV understanding output specification. The high-priority part is used for offline model training of B-end JD and C-end CV whole vectors.

Scheme two: real-time model

FIG. 7 shows the NLP keyword recognition flow in the tagging process.

FIG. 8 is a keyword sample of the present invention; FIG. 9 is a keyword cluster map of the present invention; fig. 10 is a scoring standard chart of the present invention.

The application of NLP job-description keywords (hereinafter NLP keywords) is extended on the client side and the strategy side, and the NLP keywords are further optimized on the basis of current capabilities to obtain the expected benefit: NLP keyword accuracy improves by 10%. Other results are as follows: the top-3 accuracy is 77.4% and the top-10 accuracy is 66%.

The vector model is shown in FIG. 13 and FIG. 14. Vectors are trained with a two-tower model: the two-tower structure is trained on query-title data, and the title tower is used for computation in JD/CV understanding. The words extracted by the word model are represented as vectors, and inputs are packed into batches so that the model can compute in parallel to improve performance. The model structure is as follows:

1. The title and description of the job position or work experience are concatenated and fed into the model.

2. The input is encoded with the pre-training model BERT to obtain a document vector.

3. Softmax multi-class classification is applied to the document vector and the loss is calculated.

4. The final output is the probability distribution over job classes for the current input. The job class with the highest probability is selected and enters a post-processing flow, and the post-processed job class is used as the final output.
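
Steps 3 and 4 of the model structure can be sketched in plain Python as follows. This is an illustration under assumptions: the BERT encoder and the classification layer are omitted, the `logits` are made-up scores standing in for the output of a linear layer over the document vector, and the label names and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the labeled job class (training loss)."""
    return -math.log(probs[target_index])


def predict(logits, labels):
    """Return (most probable job class, full probability distribution)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical job classes and classifier scores for one input.
job_classes = ["accountant", "cashier", "auditor"]
top_class, distribution = predict([2.0, 0.5, 1.0], job_classes)
```

The full `distribution` is kept alongside `top_class` because step S40 recommends similar job classes as well as the most probable one.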

FIG. 15 is a system architecture diagram of the present invention; fig. 16 is a deployment framework of the present invention.

As shown in fig. 15, the system architecture: the system is divided into three layers from a technical platform layer, a business service layer and an end layer based on the existing micro-service system. The mobile terminal is divided into a C terminal, a B terminal, a sales terminal, a management terminal, an operation terminal and other independent apps according to the group of users. The server performs new expansion and optimization according to new requirements on the basis of the existing architecture, and meets the basic public services of front-end data service, information encryption, privacy protection, authority authentication and the like. The basic platform is based on the existing number bin and machine learning platform, the marked data is imported into a training system, and continuous training is carried out through a new sample model and attributes. And the trained model is released to a verification environment for AB test, and the model is optimized and adjusted through the feedback test effect. And universal interfaces such as a face recognition service interface, a sesame credit interface and the like of a third party are uniformly packaged on a basic platform layer, and service interfaces of other projects and product line body standards can be used for serving the project. And the expandability and maintainability of the system are improved.

FIG. 17 is a physical architecture diagram of the present invention. As shown in FIG. 17, the physical architecture is as follows: the system is deployed in the company's unified cloud environment; while the service resources of the earlier system are reused, additional service nodes are added to meet the system's testing and gray (canary) release requirements. During business peak periods, the internal operation and maintenance management system is required to provide automatic scale-in and scale-out.

FIG. 18 is a content analysis chart of the results of the present invention. As shown in FIG. 18, content analysis of the core unstructured fields is as follows. Content: the core unstructured fields are deeply analyzed, keywords are extracted, and weight calculation and vector representation are performed; the keywords include three-level job classes, job names, company names, skill keywords, and the like. Application: the results are used in downstream scenarios such as search and recommendation, and are applied at the recall and ranking layers to improve head results and enlarge the recall volume.

Content association

Relationship processing is performed on the analyzed content, supplemented by analysis of non-core unstructured information:

KG capability improvement:

Content: a long-term, sustainable KG information production pipeline is built from the analyzed information to improve the KG, e.g., company aliases, skills, schools, majors, and job classes.

Application: KG improvement

Information verification:

Content: the consistency of features in the same dimension is judged, e.g., the consistency between the jd title and the jd three-level job class, between the jd education requirement and the education requirement extracted from the jd description, and between the job name or job class and the work content in the historical work experience of the user's cv.

Application: (1) ranking optimization in the strategy or model to improve head results; (2) form-filling guidance, error-correction reminders, and the like on the business side.
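As one illustration of the information verification described above, the sketch below compares the structured education requirement of a jd with the level that can be extracted from its free-text description. It is an assumption-laden example rather than the method of the invention: the keyword table, the level names, and both function names are hypothetical, and the naive substring matching would need to be replaced by the NLP extraction the document describes.

```python
# Hypothetical keyword table mapping an education level to trigger phrases.
EDUCATION_KEYWORDS = {
    "doctorate": ("phd", "doctorate", "doctoral"),
    "master": ("master",),
    "bachelor": ("bachelor", "undergraduate"),
    "college": ("associate degree", "junior college"),
}

def extract_education(description):
    """Return the first education level whose keyword occurs in the text, or None."""
    text = description.lower()
    for level, keywords in EDUCATION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return level
    return None

def education_is_consistent(structured_level, description):
    """True if the levels agree, False if they conflict, None if nothing was extracted."""
    extracted = extract_education(description)
    if extracted is None:
        return None
    return extracted == structured_level

jd_description = "Bachelor degree or above in computer science"
print(education_is_consistent("bachelor", jd_description))
print(education_is_consistent("master", jd_description))
```

A conflict (`False`) could feed the error-correction reminder, while `None` simply means the description gives no evidence either way.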

Content mining

Building on the first two working stages, the method can combine the historical behavior data and chat content of JD/CV to perform further mining, such as:

Quality evaluation:

Content: the quality of jd and cv content is evaluated along comprehensive dimensions such as filling conformity, completeness, update time and frequency, and chat content; e.g., jd job-seeking risk assessment, cv black-market assessment, and cv & jd content quality assessment (e.g., identifying low-quality cv).

Application: (1) recall restriction, rank demotion, or traffic control of low-quality, high-risk jd & cv in a strategy or model; (2) jd & cv competitiveness assessment, content rewriting guidance, and the like on the business side.

Preference mining and prediction:

Content: for example, factory preference, competition preference, stability preference, etc. in jd recruitment; distance preference, city preference, factory preference, etc. of cv.

Application: (1) preferences are applied in strategies or models to promote the positive chain; (2) job-seeking path prediction.

As one of the largest specialized recruitment service platforms in China, Zhilian Zhaopin has collected a large amount of JD and CV data, has carried out NLP semantic analysis and sample labeling on the text content of JD and CV, and, through NLP analysis and machine learning, has applied artificial intelligence technology to fields such as resume search and job recommendation. Judging from the current operating results, however, the job class prediction rate and the precision and recall of the system remain consistently lower than those of competing products.

The current system already performs NLP analysis on jd, but its semantic analysis is not accurate enough. In particular, its understanding of three-level job classes, including job names, company names, and skill keywords, still suffers from considerable ambiguity and error, which lowers accuracy in downstream recall and ranking scenarios.

The knowledge graphs for jd and cv are inconsistent, which results in very low accuracy of the front-end search and recommendation algorithms; the problem is especially serious for special posts (three-level job classes) in specific industries.

Mining of unstructured data from jd and cv communication scenarios is currently insufficient: key information such as chat frequency, chat content, black-market activity, and matching degree is not adequately mined and analyzed, so data assets are wasted and left idle.

In order to increase the application value of the data assets, improve NLP analysis accuracy, and improve the consistency of the knowledge graph, the invention upgrades and reforms the current NLP and KG (knowledge graph) so as to improve retrieval efficiency and matching accuracy.

Project goal

The following project targets are expected to be realized through means of KG upgrading, labeling and learning of incremental data samples, NLP algorithm optimization and the like:

a) CV job class prediction indices:

the top-1 precision/recall reaches 90.4% (top-1 precision/recall = number of times the top-1 job class prediction result appears in the labeled job class set / total number of valid samples);

the top-4 accuracy reaches 96.8% (top-4 accuracy = number of times the top-4 job class prediction result appears in the labeled job class set / total number of valid samples);

the user-selection hit rate reaches 86.4% (user-selection hit rate = number of times the secondary job class selected by the user appears in the labeled job class set / total number of valid samples);

the rate at which the user's selection hits the top-1 job class prediction reaches 60.1% (number of times the three-level job class selected by the user appears in the top-1 job class prediction result / total number of valid samples);

the rate at which the user's selection hits the top-4 job class prediction reaches 80.7% (number of times the three-level job class selected by the user appears in the top-4 job class prediction result / total number of valid samples);

b) JD job class prediction indices:

the top-1 precision/recall reaches 92.7% (top-1 precision/recall = number of times the top-1 job class prediction result appears in the labeled job class set / total number of valid samples);

the top-4 accuracy reaches 98.3% (top-4 accuracy = number of times the top-4 job class prediction result appears in the labeled job class set / total number of valid samples);

the user-selection hit rate reaches 89.7% (user-selection hit rate = number of times the three-level job class selected by the user appears in the labeled job class set / total number of valid samples);

the rate at which the user's selection hits the top-1 job class prediction reaches 68.0% (number of times the three-level job class selected by the user appears in the top-1 job class prediction result / total number of valid samples);

the rate at which the user's selection hits the top-4 job class prediction reaches 87.6% (number of times the three-level job class selected by the user appears in the top-4 job class prediction result / total number of valid samples).
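The top-1 and top-4 indices defined above can be computed as in the following sketch. It assumes that a prediction counts as a hit when at least one of its top-k job classes appears in the labeled job class set, and that a sample without any labeled job class is not a valid sample; both readings, as well as the function name and the toy data, are assumptions made for illustration.

```python
def top_k_hit_rate(predictions, labeled_sets, k):
    """Share of valid samples whose top-k predictions hit the labeled job class set.

    predictions:  one ranked list of job classes per sample (best first)
    labeled_sets: one set of labeled job classes per sample
    """
    # Assumption: samples with an empty labeled set are excluded as invalid.
    valid = [(ranked, labels) for ranked, labels in zip(predictions, labeled_sets) if labels]
    if not valid:
        return 0.0
    hits = sum(
        1 for ranked, labels in valid
        if any(job_class in labels for job_class in ranked[:k])
    )
    return hits / len(valid)

# Toy data with hypothetical job class identifiers.
predictions = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["C", "D", "E", "F"],
    ["A", "B", "C", "D"],
]
labeled_sets = [{"A"}, {"A"}, {"Z"}, set()]

top1 = top_k_hit_rate(predictions, labeled_sets, k=1)
top4 = top_k_hit_rate(predictions, labeled_sets, k=4)
print(round(top1, 3), round(top4, 3))
```

The user-selection indices follow the same pattern, with the user-selected job class substituted for either the prediction or the labeled set.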

As can be seen from a comparison of the results shown schematically in FIGS. 19 and 20, the resume prediction effect is good; likewise, the comparison results shown schematically in FIGS. 21 and 22 show that the matching effect between resumes and job positions is good.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims

1. The method for improving the accuracy of the search and recommendation results is characterized by comprising the following steps:

S40: recommending preferred job classes, including the most probable job class and similar job classes;

the double-chain data in step S10 are grouped by using the occurrence frequency of the keywords in the basic data;

$W_{pf}(i) = pf_i \cdot idf_i / if_i$;

wherein $W_{pf}(i)$ represents the weight value of the i-th group of keywords, $pf_i$ represents the occurrence frequency of the i-th group of keywords, $idf_i$ represents the ratio between the number of groups of structured double-chain data and the number of groups of unstructured double-chain data, and $if_i$ represents the inverse frequency;

wherein the calculation method of $if_i$ in step S10 is as follows:

wherein $N$ represents the total number of basic data, and $df_i$ represents the number of occurrences of the i-th group of keywords in the basic data;
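A minimal sketch of the keyword weight calculation is given below. The weight formula follows the text, $W_{pf}(i) = pf_i \cdot idf_i / if_i$. The expression for the inverse frequency is not reproduced in this text, so the code assumes the common form $if_i = \ln(N / df_i)$; that form, the function names, and the sample values are assumptions and may differ from the claimed formula.

```python
import math

def inverse_frequency(total_items, df):
    """Assumed form if_i = ln(N / df_i); the claimed expression may differ."""
    if not 0 < df < total_items:
        # df == N would give if_i = 0 and make the weight undefined.
        raise ValueError("df must satisfy 0 < df < N")
    return math.log(total_items / df)

def keyword_weight(pf, idf, inv_freq):
    """W_pf(i) = pf_i * idf_i / if_i, as stated in the text."""
    return pf * idf / inv_freq

# Hypothetical values for one keyword group.
N = 1000      # total number of basic data items
df_i = 100    # occurrences of the i-th keyword group in the basic data
pf_i = 0.2    # occurrence frequency of the i-th keyword group
idf_i = 1.5   # ratio of structured to unstructured double-chain data groups

if_i = inverse_frequency(N, df_i)
w_i = keyword_weight(pf_i, idf_i, if_i)
print(round(w_i, 4))
```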

the clustering method in step S10 is as follows:

wherein $P(S)$ represents the distribution probability of the weight values of the keywords of all groups, $S$ represents the total sequence of the weight values of the keywords of all groups, and $w_i$ represents the serial number of the i-th group of keywords, with $1 \le i \le n$;

the method of pre-training the model in step S20 is as follows:

the optimization algorithm using the BP network model is as follows:

wherein $G_i$ represents the optimized first entity relationship degree value, $N$ represents the sum of the statistics of all groups of keywords, and $P_i^n$ represents the distribution probability of the weight values of the keywords of all groups;

the manner of integration in step S20 is as follows:

optimizing a scheduling layer, establishing a model by using a max-min mathematical algorithm, and determining each single objective function to obtain blended scheduling benefit;

the multi-task training method in step S30 is as follows:

determining each single objective function, including a first objective function, a second objective function, a third objective function and a fourth objective function;

wherein the first objective function is $F_1(q_{v1}) = (q_{v11} - q_{v0}) / q_{v12}$, wherein $q_{v0}$, $q_{v1}$, $q_{v11}$ and $q_{v12}$ are the planning task values for different periods;

the second objective function is as follows:

wherein $F_i(q_{vi})$ is the reference flow value, which is calculated by applying a weight coefficient transformation method and assigning weights;

the third objective function is $V_{i,j+1} = V_{i,j} + (Q_{i,j} - q_{i,j} - Q_{loss,i,j})$, wherein $V_{i,j}$ is the data amount of the i-th cloud computation in the j-th time period, $V_{i,j+1}$ is the data amount of the i-th cloud computation in the (j+1)-th time period, $Q_{i,j}$ is the data input amount of the i-th cloud computation in the j-th time period, $q_{i,j}$ is the data leakage amount of the i-th cloud computation in the j-th time period, and $Q_{loss,i,j}$ is the data loss amount of the i-th cloud computation in the j-th time period;

the fourth objective function is as follows:

wherein $E_{sm}$ is the energy value of the cloud computation, $V_{m,T}$ is the effective storage capacity of the scheduling period, $\gamma_{m,T}$ is the data amount in the scheduling period, $m$ is the number of the cloud computation in period $T$, and $T$ is the total number of cloud computations $m$ in period $T$;
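For illustration, the first and third objective functions stated above can be evaluated as in the sketch below. Only these two functions are covered, because their expressions are given explicitly in the text; the function names and all numeric values are hypothetical.

```python
def first_objective(q_v11, q_v0, q_v12):
    """F_1(q_v1) = (q_v11 - q_v0) / q_v12."""
    if q_v12 == 0:
        raise ValueError("q_v12 must be non-zero")
    return (q_v11 - q_v0) / q_v12

def next_volume(volume, data_in, data_leak, data_loss):
    """V_{i,j+1} = V_{i,j} + (Q_{i,j} - q_{i,j} - Q_loss,i,j) for one cloud computation i."""
    return volume + (data_in - data_leak - data_loss)

def simulate_volumes(initial_volume, periods):
    """Apply the balance equation period by period and return every V_{i,j}."""
    volumes = [initial_volume]
    for data_in, data_leak, data_loss in periods:
        volumes.append(next_volume(volumes[-1], data_in, data_leak, data_loss))
    return volumes

# Hypothetical inputs: (data input, data leakage, data loss) for two periods.
volumes = simulate_volumes(100.0, [(30.0, 10.0, 5.0), (20.0, 25.0, 0.0)])
f1 = first_objective(q_v11=12.0, q_v0=2.0, q_v12=5.0)
print(volumes, f1)
```

The balance equation acts as a state-transition constraint between consecutive periods, so in a multi-objective setting it would typically be checked for every period of the scheduling horizon.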

the algorithm recommended in step S40 is as follows:

wherein $FD_{Qk}$ represents the recommended quantization complexity level value, $d_{kij}$ represents the knowledge-graph data of the recommended k-th component set in the column and row directions, $p_{ki}$ represents the complexity value of the knowledge-graph data in the column direction of the recommended k-th component set, and $p_{kj}$ represents the complexity value of the knowledge-graph data in the row direction of the recommended k-th component set.