CN113886562A - AI resume screening method, system, equipment and storage medium - Google Patents

AI resume screening method, system, equipment and storage medium

Info

Publication number
CN113886562A
Authority
CN
China
Prior art keywords
resume
layer
classifier
text
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111169078.2A
Other languages
Chinese (zh)
Inventor
孙红升
蒋华
刘建华
邢继风
王超
姚凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhilian Wuxi Information Technology Co ltd
Original Assignee
Zhilian Wuxi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhilian Wuxi Information Technology Co ltd filed Critical Zhilian Wuxi Information Technology Co ltd
Priority to CN202111169078.2A
Publication of CN113886562A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/335: Filtering based on additional data, e.g. user or group profiles
    • G06F 16/35: Clustering; Classification
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/105: Human resources
    • G06Q 10/1053: Employment or hiring

Abstract

The invention discloses an AI resume screening method, which comprises the following steps: inputting the plain text content of all resumes into an improved BERT model; the BERT model adopts a multi-layer Transformer architecture; a teacher classifier is arranged after the Transformer stack; a student classifier is connected after each Transformer layer of the BERT model; the classification confidence of the output of each layer's student classifier is calculated; when the classification confidence is higher than a preset threshold, the next Transformer layer is run and the previous step is repeated; when the classification confidence is lower than the preset threshold, the result is output directly. The screening method enables a recruitment website to perform a simple and fast preliminary screening of the massive resume content entered by job seekers: suspected low-quality resumes are screened out intelligently by the neural network, computation is fast, the model parameter scale is small, and the subsequent accurate screening of low-quality resumes gains operability and convenience.

Description

AI resume screening method, system, equipment and storage medium
Technical Field
The invention relates to the field of online recruitment, in particular to a method, a system, equipment and a storage medium for completing intelligent resume screening by utilizing a neural network model.
Background
Compared with traditional offline recruitment, online recruitment has great advantages in convenience and information transparency. Recruitment websites and APP clients store a large amount of job seekers' resume information, and as new job seekers keep joining, this resume information forms a huge job-hunting database. In general, recruitment websites design different resume templates for job seekers to fill in, according to the breadth and depth of information required by enterprises, which standardizes resume-filling behavior to a great extent. However, to let job seekers highlight their unique experience and to help recruiters understand their character and background comprehensively, recruitment websites always keep a certain proportion of content that job seekers fill in freely, such as self-evaluation, experience description, character description and project introduction modules. In practice, it has been found that some job seekers exhibit abnormal behaviors during resume entry, such as pasting large amounts of content irrelevant to job hunting or entering sensitive or vulgar phrases. If such resumes are pushed to recruiters, they waste the valuable time of every enterprise recruiter who views them, or cause psychological discomfort; they also leave the enterprise with the impression that the recruitment website's auditing is unprofessional or lax, creating a lose-lose situation. Therefore, it is necessary for a recruitment website to perform a technical preliminary screening of the massive job-hunting resumes, delete or intercept most low-quality resumes, and create a good recruitment environment for enterprises.
The prior art discloses a resume quality judgment system based on machine learning, which includes the following steps. Step 1: acquire the target resume text from the background and store it in a resume database document. Step 2: preprocess the text in the resume database documents, use Chinese word segmentation to construct a data-type format for the resume text, and screen the word vectors to be extracted. Step 2 comprises the following substeps. Step 2.1: construct the data-type format from special proper nouns obtained by Chinese word segmentation, and classify resumes by these proper nouns. Step 2.2: label categories for skill mastery, experience, proper nouns and seniority. Step 2.3: screen the label categories. Step 3: extract text features, extract the labeled nouns, and match them against a database of preset word vectors to form a quality-judgment word library and talent-judgment standard class labels. Step 4: label word vectors from documents of known quality, count word frequencies, train parameter models from the existing word vectors and labeled categories, obtain the data-feature relation between the learned word vectors and the labeled categories as a lightweight training model, verify the model's accuracy on an unclassified resume document library, and record the model's learning efficiency and accuracy on a test set. Step 5: obtain the resume-quality judgment result from the trained model and feed it back to the background. However, this method focuses on comparing the words used in a new resume with keywords extracted in advance, and derives a quality judgment from the word-matching degree. In fact, the method does not focus on screening out low-quality resumes, but on selecting high-quality resumes that meet preset requirements. In addition, although the method uses a neural network model to compare matching degrees, the network is not selected, optimized or tuned for the purpose of screening low-quality resumes, so its modeling and training are unsuited to that requirement and cannot fulfil the corresponding function.
A second prior art discloses an integrated resume information extraction method based on machine learning and fuzzy rules, comprising the following steps. Step 1: extract the features of resume, suspected-resume and non-resume texts, and screen resume texts by this feature information. Step 2: perform word-frequency statistics on the screened resume texts to obtain common keywords and generate fuzzy matching rules. Step 3: segment the resume text with the fuzzy matching rules, sample and check the segmentation result, and verify the segmentation accuracy. Step 4: according to the data-distribution characteristics of each block of the segmented resume text, extract the specific resume information using fuzzy matching rules, sequence labeling or classification. Step 5: correct unreasonable results with a correction strategy and output the structured resume information. Step 1 specifically comprises: 101. labeling resume, suspected-resume and non-resume samples; 102. extracting the features of the three sample types with a feature extractor and training a classification model; 103. inputting text into the model, which outputs one of resume, suspected resume or non-resume. Step 2 specifically comprises: 201. segmenting the resume text content by spaces, counting word frequencies, and screening candidate keywords from the Top N by frequency; 202. generating fuzzy rules from the candidate keywords. The method is based on machine-learning natural language processing: it first extracts the features of resume, suspected-resume and non-resume texts, then counts keywords and uses fuzzy rule matching to partition the resume text, then applies different information-extraction or classification methods according to the data distribution of each partition, and finally completes the structuring of the resume data. The method touches on some issues relevant to low-quality resumes, but it is in essence a resume content extraction method: it addresses the problems that resume information extraction is easily disturbed by invalid information, that word-segmentation granularity is too small, and that entity references are unclear. Although it uses a neural network model to divide the resume into modules, neither the model nor the training method is optimized for screening low-quality resumes, and the model's output is a structured resume, which does not particularly contribute to that screening.
Existing AI resume screening methods thus mainly aim to extract the main content and core parts of resumes, and do not use technical means to filter out the low-quality resumes of online recruitment websites. Therefore, a method for screening the massive resumes of an online recruitment website is needed, so that the website can use a neural network model to perform a preliminary screening of the resume content entered by job seekers, filter out suspected low-quality resumes, and provide operability and convenience for the subsequent accurate screening of low-quality resumes and for targeted measures.
Disclosure of Invention
In order to solve the above problems, the invention creatively provides a new screening mode that performs low-quality filtering on the massive resumes of an online recruitment website, improving on existing AI resume screening modes in line with the characteristics of recruitment websites.
The invention provides an AI resume screening method, which comprises the following steps:
(a) extracting a certain number of resumes of job seekers to be screened; (b) acquiring the text content data of all resumes; (c) screening and preliminarily processing the text data to obtain the optimized plain text content of all resumes; (d) inputting the plain text content of all resumes into an improved BERT model; (e) the BERT model adopts a multi-layer Transformer architecture; (f) a teacher classifier is arranged after the Transformer stack; (g) a student classifier is connected after each Transformer layer of the BERT model; (h) calculating the classification confidence of the output of each layer's student classifier; (i) when the classification confidence is higher than a preset threshold, continuing to run the next Transformer layer and repeating step (h); (j) when the classification confidence is lower than the preset threshold, outputting the result directly; (k) judging whether the resume meets the requirements according to the output result.
Further, the method comprises a training method of the BERT model: (1) in sample collection, the detection range of text content is enlarged, and the free-text input parts of the resume are collected and detected; these parts include both long texts and short texts, the annotated resume samples must cover these special texts, and the model must handle both text structures at the same time; (2) the text content data of the training-set resumes is first extracted with a data acquisition system, the data is screened and preliminarily processed, and the text content of all resumes is then used directly for BERT model training; (3) for the model handling long samples, the output has three classes: normal, too simple and meaningless; for the model handling short samples, the output has two classes: normal and meaningless.
Further, in the training method, the balanced sample collection process includes: (1) because the number of positive samples is far greater than that of negative samples, random down-sampling of the positive samples is adopted for sample balance, with sampling ratios of 1:1, 1:2, 1:5 and 1:10, and the sampled sets are then validated separately; (2) because the number of negative samples is small, the negative samples are up-sampled by an interpolation method; (3) because the negative samples include a portion of text typed randomly by users with various input methods, and only a small number of such negatives are available for the model to learn from, training samples are constructed by turning positive samples into negatives: after character and word segmentation, the word order of positive samples above a certain length is randomly shuffled, or a certain proportion of words is randomly extracted, to form new negative samples; (4) when negative sample types are insufficient, full-type negative samples are generated manually: meaningless words for each resume item are delineated from the corpus, and the selected meaningless words are randomly intercepted and combined; (5) actual low-quality resume samples found and labeled manually or by the model are transformed to construct new training negatives with similar content; (6) in the total sample collection, the long-text category accounts for 80% and the short-text category for 20%, and redundant short texts are discarded.
Further, in steps (g) and (h), the original BERT model is called the trunk, and each externally connected student classifier is called a branch, where the student classifiers are obtained by self-distillation from the teacher classifier at the last layer of the trunk.
Furthermore, the self-distillation method is a knowledge distillation method in which a trained teacher network guides the training of a student network; the two networks have the same task and can achieve the same purpose, while the teacher network's parameter setting and computation are more complex than the student's. Only the trunk parameters are updated in the pre-training and fine-tuning stages; after fine-tuning the trunk parameters are frozen, ensuring that the knowledge learned in pre-training and fine-tuning is not affected, the probability distribution of the trunk classifier is distilled by the branch classifiers, and the branch classifiers are used only to fit the distribution of the trunk classifier.
Further, in step (g), the student classifier is used to calculate and output the final class probabilities of each layer from the output data of that layer's Transformer; the student classifier is configured as a neural network comprising three fully connected layers and one self-attention layer, and converts the multi-class output values through a Softmax function into a probability distribution whose values lie in [0, 1] and sum to 1.
Further: (1) the student classifier connected to Transformer layers 0-2 is configured as a neural network comprising three fully connected layers and one self-attention layer placed between the first and second fully connected layers; (2) the student classifier connected to Transformer layers 3-5 is configured with four fully connected layers and two self-attention layers, placed between the first and second and between the third and fourth fully connected layers; (3) the student classifier connected to Transformer layers 6-10 may be configured with two to three fully connected layers and one self-attention layer; (4) the network's parameters, layer counts and proportions are adjusted dynamically according to the loss value and effect during training, where the parameters to be adjusted include the head size and dropout ratio of the multi-head attention mechanism, the number of layers, the hidden-layer size and dropout ratio, the maximum position-encoding vector length, the pooling-layer type, and the number of fully connected layers.
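As a reading aid, the layer-dependent branch configurations listed above can be expressed as data, as in the minimal Python sketch below. The field names, the helper function and the exact dictionary layout are illustrative assumptions, not part of the patent.

```python
# Illustrative per-layer branch-classifier configurations (values taken from the
# description above; field names are assumptions). Layer 11, the last layer, has
# the teacher classifier instead of a branch.
BRANCH_CONFIGS = {
    range(0, 3):  {"fc_layers": 3, "attention_layers": 1},   # Transformer layers 0-2
    range(3, 6):  {"fc_layers": 4, "attention_layers": 2},   # Transformer layers 3-5
    range(6, 11): {"fc_layers": 2, "attention_layers": 1},   # layers 6-10 (2-3 FC layers allowed)
}

# Hyperparameters tuned dynamically during training according to the loss value:
TUNABLE = ["attention_head_size", "attention_dropout", "num_layers",
           "hidden_size", "hidden_dropout", "max_position_embeddings",
           "pooling_type", "num_fc_layers"]

def branch_config(layer_idx: int) -> dict:
    """Look up the branch-classifier configuration for a given Transformer layer."""
    for layers, cfg in BRANCH_CONFIGS.items():
        if layer_idx in layers:
            return cfg
    raise ValueError(f"no branch classifier configured for layer {layer_idx}")
```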
Further, in step (h), the confidence is measured with an uncertainty formula applied to the output result: the lower the uncertainty value, the higher the classifier's confidence. The formula is:

Uncertainty = ( Σ_{i=1..N} Ps(i) · log Ps(i) ) / log(1/N)

where Ps is the probability distribution output by the student classifier and N is the number of classification categories.
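A small Python sketch of this uncertainty measure, i.e. the normalized entropy of the student classifier's distribution as reconstructed above, is shown below; the function name and the example values are illustrative only.

```python
import math
import torch

def output_uncertainty(probs: torch.Tensor, eps: float = 1e-8) -> float:
    """Normalized entropy of a softmax distribution Ps over N categories.

    Returns a value in [0, 1]; a low uncertainty corresponds to a high classifier
    confidence, as stated in step (h).
    """
    n = probs.shape[-1]
    entropy = -(probs * (probs + eps).log()).sum().item()
    return entropy / math.log(n)

# Example: a peaked distribution is certain, a uniform one is maximally uncertain.
print(output_uncertainty(torch.tensor([0.96, 0.02, 0.02])))  # ~0.18
print(output_uncertainty(torch.tensor([1/3, 1/3, 1/3])))     # ~1.0
```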
In addition, the invention also discloses a system for AI resume screening, which comprises: (a) a resume text acquisition module, used to extract a certain number of job seekers' resumes and acquire the text content data of all resumes; (b) a resume text screening and preliminary processing module, used to screen and preliminarily process the text data to obtain the optimized plain text content of all resumes; (c) an AI neural network module comprising a BERT model, wherein the BERT model adopts a multi-layer Transformer architecture, a teacher classifier is arranged after the Transformer stack, and a student classifier is connected after each Transformer layer of the BERT model; (d) a confidence calculation module, used to calculate the classification confidence of the output of each layer's student classifier; (e) a resume quality judgment module, used to judge whether a resume meets the requirements according to a preset rule: when the classification confidence is higher than a preset threshold, the next Transformer layer continues to run and the confidence of that layer is recalculated; when the classification confidence is lower than the preset threshold, the result is output directly.
And, an apparatus, characterized in that it comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the AI resume screening method of any of the preceding claims.
And a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, performing the AI resume screening method according to any one of the preceding claims.
The invention has the following beneficial technical effects:
1. focusing on screening and filtering of low quality calendars. The conventional AI resume screening mode mainly extracts core information and key words of resumes, and does not pay attention to filtering and screening of low-quality resumes. However, in recent years, it has been found through our practice that contents completely inconsistent with the purpose of resume delivery, such as large-segment meaningless pasted contents, a large number of repeatedly copied phrases or short sentences, unintelligent words, sensitive words, etc., are increasingly appearing in resumes input by job seekers. The resumes can cause time waste and psychological discomfort of enterprise recruiters browsing the contents, and can also cause the recruiting websites to leave an inexpert or untight impression on the enterprise, thereby causing a double-output situation. According to the invention, the neural network model is used for autonomous screening to enable the recruiter to obtain normal resume information, the primary screening of the mass job-seeking resumes is technically carried out, most of low-quality resumes are deleted or retained, the accuracy rate is basically equal to that of manual screening, and a good recruitment environment is created for vast recruitment enterprises. Meanwhile, the job seeker with malicious input irrelevant information can be marked in the later period, and the occurrence frequency of the malicious input is reduced. The time for acquiring the information of the recruiter is shortened, the information acquisition efficiency is improved, and a large amount of meaningless information cannot enter the qualified resume database.
2. A new method for AI screening of low-quality resumes is presented. Traditional methods depend on an index of an illegal-word database: if a word is not in the database it cannot be matched, so non-compliant words such as newly emerging sensitive words or short phrases cannot be screened out. The neural network method adopted here, by contrast, exhibits strong autonomous supervised learning; after targeted training, the network model can judge whether a resume has problems according to its own learned cognition, which improves the redundancy of the judgment to a great extent. In addition, traditional methods have no ability to distinguish large pasted blocks of meaningless content, such as a large number of ancient poems pasted into the self-evaluation section, or the same passage of content copied and pasted repeatedly; they have no answer to this, because the words in an ancient poem cannot all be placed in the comparison database as sensitive words. The neural network, however, can approach the judgment level of an ordinary person through continuous self-learning and continuous training on samples, and has a certain ability to discover newly emerging problems.
3. The BERT model is creatively modified. On the basis of a standard BERT model, a classifier is attached after each Transformer layer, and confidence calculation and judgment are performed after each layer runs; once the confidence falls below a certain threshold, the judgment is considered finished and the result is output directly. This fully exploits the strengths of different BERT layers in judging text and semantics: the whole judgment can often be completed with only a few Transformer layers, all 12 layers do not have to run for every judgment, and screening time is greatly saved. In addition, the invention improves the training method and training parameters to raise model accuracy, and even gives each layer's student classifier a different network structure with specifically tuned parameters, so that the network can complete the judgment while running only its shallow layers. Whether the input resume contains too much meaningless content, too little content, or only meaningful content, it can be filtered through the neural network, meeting the various needs of recruiters and the website and effectively improving filtering efficiency and screening accuracy.
In summary, the method solves the problem that conventional AI resume screening over-emphasizes keyword matching while ignoring meaningless or prohibited input that degrades overall resume quality. With a well-designed training method and network structure, it brings the advantages of the BERT model fully into play in the resume screening field, accurately identifies low-quality resumes, and presents resumes whose core and important information is complete directly to the recruiter. This greatly improves enterprises' information-acquisition efficiency, avoids wasting large amounts of time filtering meaningless information, and focuses on the most essential requirement of the online recruitment scenario.
Drawings
FIG. 1 is a schematic diagram of the AI resume screening method of the invention comprising steps and flow;
FIG. 2 is a schematic diagram of the operation of the AI resume screening method of the invention;
FIG. 3 is a schematic diagram of the neural network architecture used in the AI resume screening method of the invention;
FIG. 4 is a schematic diagram of the system of the present invention.
Detailed Description
The following embodiments of the present invention will be described in detail with reference to the accompanying drawings and embodiments, which are implemented on the premise of the technical solutions of the present invention, and it is to be understood that the specific embodiments described herein are only used for explaining the embodiments of the present invention, and do not limit the present invention. It should be further noted that, for convenience of description, only some structures, not all structures, relating to the embodiments of the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Referring to FIGS. 1 to 3, the present invention provides an AI resume screening method, including: (a) extracting a certain number of resumes of job seekers to be screened; (b) acquiring the text content data of all resumes; (c) screening and preliminarily processing the text data to obtain the optimized plain text content of all resumes; (d) inputting the plain text content of all resumes into an improved BERT model; (e) the BERT model adopts a multi-layer Transformer architecture; (f) a teacher classifier is arranged after the Transformer stack; (g) a student classifier is connected after each Transformer layer of the BERT model; (h) calculating the classification confidence of the output of each layer's student classifier; (i) when the classification confidence is higher than a preset threshold, continuing to run the next Transformer layer and repeating step (h); (j) when the classification confidence is lower than the preset threshold, outputting the result directly; (k) judging whether the resume meets the requirements according to the output result.
The low-quality AI resume screening method is mainly aimed at governing resume information entered by job seekers that does not conform to standard text content. For example, in the work description or self-evaluation of a resume, a job seeker may write text unrelated to the position, including uncivilized words, sensitive words, advertising copy, contact details or meaningless text. The purpose of this work is to screen out such resumes, grade resume quality, eliminate resumes that seriously fail the requirements, or remind users to correct the corresponding text, thereby improving the platform experience of both job seekers and recruiters. It also helps combat black-market and gray-market activity, preventing users from being exposed to fraud-related information and suffering unnecessary losses.
The screening idea of the invention differs from the traditional AI resume screening idea. Traditional methods focus on comparing resumes to be screened against high-quality resumes, and resumes with a high matching degree are considered likely to be high quality. The result is often a high proportion of noise in what gets filtered out: the set of resumes flagged as problematic contains a large number of actually normal resumes.
In existing practice, a word-frequency screening method has also been created: all resumes in an existing database are collected as samples, and the contents of all samples are traversed and segmented, so that almost every word that may appear in a resume leaves a trace in a word-frequency cloud database used for comparison. In the invention, by contrast, a neural network is used as the discrimination tool; neural network judgment is a model-based method (artificial intelligence / machine learning) that handles the problem through the strong fitting capability of modern neural networks and the computing power of machines.
The full name of BERT is Bidirectional Encoder Representations from Transformers; this bidirectional representation model can use information from both the preceding and the following words when processing a word. The source of this bidirectionality is that BERT, unlike conventional language models, does not predict the most likely current word given all preceding words; instead it randomly masks some words and predicts them using all of the unmasked words. The core procedure of BERT is compact. It first extracts two sentences from the data set, where with 50% probability the second sentence really follows the first, so that relations between sentences can be learned. Second, some words are randomly removed from the two sentences and the model is required to predict what those words are, so that relations inside sentences can be learned. Finally, the processed sentences are passed into a large Transformer model and both objectives are learned simultaneously through two loss functions to complete training.
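A simplified Python sketch of the two pre-training objectives just described (next-sentence pairing plus random masking) follows. The 15% masking rate and the [MASK]/[SEP] conventions follow the original BERT paper; the helper name and the word-level splitting are illustrative simplifications, not the patent's implementation.

```python
import random

def make_pretraining_example(sentences, mask_prob=0.15, mask_token="[MASK]"):
    """Build one (tokens, masked-token labels, is_next) pre-training example.

    With 50% probability sentence B really follows sentence A (next-sentence
    prediction); a fraction of tokens is replaced by [MASK] for the model to
    recover (masked language modelling). Simplified: real BERT also keeps or
    randomly replaces some of the selected tokens, and uses subword tokens.
    """
    i = random.randrange(len(sentences) - 1)
    if random.random() < 0.5:
        a, b, is_next = sentences[i], sentences[i + 1], True
    else:
        a, b, is_next = sentences[i], random.choice(sentences), False

    tokens = a.split() + ["[SEP]"] + b.split()
    labels = [None] * len(tokens)
    for pos, tok in enumerate(tokens):
        if tok != "[SEP]" and random.random() < mask_prob:
            labels[pos] = tok          # the model must predict the original token here
            tokens[pos] = mask_token
    return tokens, labels, is_next
```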
Google released BERT, a large-scale pre-trained language model based on bidirectional Transformers, in 2018. The pre-trained model can extract text information efficiently and be applied to various NLP tasks, and at release it set new state-of-the-art results on 11 NLP tasks. Practical tests show that many NLP tasks achieve good results after fine-tuning on only a small amount of data, and BERT has become a genuine backbone network.
For the specific purpose of resume identification, the BERT model and its multi-layer Transformer are used; the invention does not change BERT's basic multi-layer Transformer framework. In the concrete application, however, the existing BERT model is improved and optimized for the characteristics of resumes entered on an online recruitment website and for the special purpose of low-quality resume screening, focusing on three aspects: (1) because positive and negative samples are unbalanced, the samples of website-entered resumes receive special processing, such as random down-sampling of positives, interpolation up-sampling of negatives, and generation of negatives; (2) for the task of processing resume text, the operation flow and the concrete network structure of the BERT model are improved: a different branch classifier is added to every Transformer layer except the last, so that a judgment result can be obtained quickly and the model's judgment time is shortened; (3) for the characteristics of the recruitment website's resume identification service, the BERT model is trained with different parameter settings to suit the specificity of website-entered resume samples, which can also reduce the model's scale to a certain extent.
In a preferred embodiment, a software system is used to extract the text content data of the resumes and to screen and preliminarily process the data. The system is essentially a data development and processing platform: it uses SQL for integrated batch and stream processing and supports popular big-data components such as Pulsar, Redis, HBase, Elasticsearch and Druid. Developers only need SQL development skills, and the platform offers fast development of requirements, simple task submission, automatic allocation and scheduling of task resources, and automatic job optimization and monitoring. Of course, any other software system with the basic extraction functions can equally be applied in the present invention to complete the corresponding work. The preliminary screening of the text mainly removes obviously unreasonable sample content according to preset rules, such as special characters and line-break symbols with no textual meaning, and auxiliary words and modal particles listed in a preset index table. After this preliminary screening, the optimized plain text content is obtained. In practice there are many optimization methods with different emphases and little overall difference in effect; the invention does not depend on the specific method and steps, so any suitable system can complete the related tasks.
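A minimal Python sketch of such rule-based preliminary cleaning is shown below. The regular expression and the stop-word index table are placeholders chosen for illustration, not the actual DataMax configuration or rules.

```python
import re

# Placeholder preset index table of auxiliary words / modal particles to drop.
STOP_WORDS = {"的", "了", "啊", "呢", "吧"}

def clean_resume_text(raw: str) -> str:
    """Strip line breaks, meaningless special characters and indexed stop words,
    returning the optimized plain-text content that is fed to the BERT model."""
    text = raw.replace("\r", " ").replace("\n", " ")
    # keep Chinese characters, letters, digits and basic punctuation only (assumed rule)
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9，。,.!?：:;；\s]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return "".join(ch for ch in text if ch not in STOP_WORDS)
```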
Although BERT works well for NLP, it requires a very large amount of computation; its designers also note in their paper that only 15% of words are predicted at a time, so the model converges slowly and pre-training demands substantial compute. As a pre-trained language model, BERT has proven to deliver very high performance, but in many practical scenarios it is computationally expensive, and such a heavy model is hard to deploy with limited resources.
In order to preserve model performance while improving efficiency, a speed-adjustable improved BERT with adaptive inference time is provided. Under different requirements the inference speed can be adjusted flexibly while redundant computation on easy samples is avoided. In addition, the model adopts a self-distillation mechanism during fine-tuning, giving it higher computational efficiency with minimal performance loss. Given different acceleration thresholds, a speed-performance trade-off can accelerate it from one to several times relative to standard BERT.
The BERT model consists of a multi-layer neural network (the default is usually 12 layers; a 24-layer version also exists); the more layers, the more parameters, and the better the data distribution can be fitted. What is stored is mainly the network's parameters, which comprise two parts: the weights and the biases. For a BERT model the total number of parameters exceeds 100 million, and the storage needed for the parameters and the intermediate values during training is at least about 1.2 GB.
Because the BERT model is a deep neural network with a huge number of parameters, the trained model reaches 1.2 GB; the model is therefore compressed to reduce the space it occupies and to shorten its inference time on CPU and GPU.
The following compression effects are reported only under our company's resume collection rules: (1) model size: the model shrinks from 1.2 GB to 399 MB, greatly reducing the storage needed for deployment; (2) inference time: without loss of precision, the average inference time per record over 4631 records drops on CPU from 0.1706 to 0.0588, a 65.53% reduction, and on GPU from 0.0162 to 0.0125, a 22.84% reduction.
For Chinese, the default BERT architecture has 12 Transformer layers, which can serve many functions such as text classification, question answering and named entity recognition. For a simple task such as text classification, however, the full 12 layers are not needed to reach a conclusion: fewer Transformer layers can still complete the given classification task with good results. When the parameters of a shallow network are sufficient to model the data distribution of the classification task, the model is considered sufficient; with fewer layers the parameters naturally shrink while the confidence remains satisfactory. That is, the conclusion given by the shallow network is essentially identical to the conclusion reached by running through all 12 layers.
Therefore, the BERT network architecture is improved: except for the last layer, a classifier is attached independently after each Transformer layer, and by computing the classifier's confidence value the system judges whether the calculation can end immediately. If a shallow classifier's confidence is high, the later layers of the model need not run, and inference time is shortened.
Finally, whether a resume is a suspected low-quality resume is judged against a preset confidence threshold. When the classification confidence is higher than the preset threshold, the next Transformer layer continues to run and the previous calculation step (h) is repeated; when the classification confidence is lower than the preset threshold, the result is output directly. That is, a resume whose overall confidence falls below a certain preset threshold is automatically judged to be a suspected low-quality resume and marked, and in the subsequent process dedicated staff can further screen and judge the suspected low-quality resumes precisely.
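To make the early-exit flow concrete, below is a minimal PyTorch-style Python sketch assuming a backbone whose embedding layer and individual Transformer layers are accessible. All names, the threshold value and the exit criterion (the normalized-entropy uncertainty described for step (h), where a low value means high confidence) are illustrative assumptions, not the literal implementation of the patent.

```python
import math
import torch

def screen_resume(text, tokenizer, embeddings, encoder_layers, student_classifiers,
                  teacher_classifier, threshold=0.3):
    """Early-exit screening of one resume text.

    encoder_layers is the list of Transformer layers; student_classifiers[i] is the
    branch classifier attached after layer i (no branch after the last layer, which
    is followed by the teacher classifier). Names and threshold are illustrative.
    """
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)["input_ids"]
    hidden = embeddings(ids)
    with torch.no_grad():
        for layer, branch in zip(encoder_layers[:-1], student_classifiers):
            hidden = layer(hidden)
            probs = branch(hidden)                              # softmax distribution over classes
            n = probs.shape[-1]
            # normalized entropy: 0 = fully confident, 1 = uniform (maximally uncertain)
            uncertainty = float(-(probs * probs.clamp_min(1e-8).log()).sum() / math.log(n))
            if uncertainty < threshold:                         # confident enough: exit early
                return probs.argmax(-1).item()
        hidden = encoder_layers[-1](hidden)                     # run the final layer
        return teacher_classifier(hidden).argmax(-1).item()     # fall back to the teacher
```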
Because the preliminary screening is done by the BERT network, the labor intensity of staff in subsequent procedures is greatly reduced, the handling of low-quality resumes becomes more targeted, resume content is purified, and resume quality improves. The tables below show the results of one test; since collection standards and modes, sample sizes and sample types differ, they are intended only to qualitatively illustrate the screening effect obtainable with the invention.
The model training effect is as follows:
[Table of model training results, giving per-class accuracy, precision, recall and F1-score; provided as figures in the original publication.]
The actual effect after going online is as follows:
Resume module | Work description | Self-evaluation | Project description | School name
Misjudgment rate | 0.45% | 0.42% | 0% | 1.01%
The model effect is as follows:
Resume module | Project name | Company name | Job title | Major name
Misjudgment rate | 1.91% | 1.08% | 6.38% | 2.06%
The accuracy of normal samples indicates how many of the normal samples are predicted correctly, and likewise for abnormal samples. The precision of normal resumes indicates how many of the resumes predicted as normal are truly normal, and likewise for abnormal resumes. The recall of normal resumes indicates how many of the normal resumes in the sample set are predicted as normal, and likewise for abnormal resumes. The F1-score of normal resumes is the harmonic mean of precision and recall and measures the two jointly, and likewise for abnormal resumes. The closer these indices are to 1, the better the model.
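For clarity, a small Python sketch of how the precision, recall and F1 indices explained above are computed for the "normal" class is given below (the "abnormal" class is symmetric); the function is purely illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive="normal"):
    """Precision, recall and F1 for one class, as described above."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0   # predicted-normal that are truly normal
    recall = tp / (tp + fn) if tp + fn else 0.0      # truly-normal that were predicted normal
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```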
Furthermore, the invention also comprises a training method of the BERT model: (1) in sample collection, the detection range of text content is enlarged, and the free-text input parts of the resume are collected and detected; these parts include both long texts and short texts, the annotated resume samples must cover these special texts, and the model must handle both text structures at the same time; (2) the text content data of the training-set resumes is first extracted with a data acquisition system, the data is screened and preliminarily processed, and the text content of all resumes is then used directly for BERT model training; (3) for the model handling long samples, the output has three classes: normal, too simple and meaningless; for the model handling short samples, the output has two classes: normal and meaningless.
Every industry has its own textual particularities, and enterprises within an industry differ in style; the resumes entered on our website are essentially free-form descriptions, with few fixed selectable items. Some fields allow long or medium-length text, while others restrict the subject text to be short. The model must therefore handle long texts, short texts and mixed long-and-short structures at the same time: some long texts are meaningless as a whole, some may hide partially non-compliant passages, and some short texts, while meaningful in themselves, constitute non-standard input under a specific item; all of these are specificities. To find and distinguish them, context semantics must be analyzed, and the annotated resume samples must contain texts covering exactly these specificities. Because the recognized samples all come from user input, and user input consistently shows diversity, the samples are extremely varied, and in some of them the hidden non-compliant content cannot be found without careful reading; this poses a challenge to sample selection. To simulate this diversity as far as possible, we deliberately design the proportions of sample types so that they suit the actual input samples to be discriminated.
Therefore, the collected plain text content is expanded from simple work descriptions and self-evaluations to almost all textual resume content, such as work description, self-evaluation, project description, school name, major name, character traits and job-hunting expectations. Content tied to fixed formats and check-box items is mainly excluded. In practice other text-input parts may also be included, determined chiefly by the resume template's specification. The text content data of the resumes is first extracted with the self-developed DataMax system or another existing system, the detection of text content is expanded as described above, the data is screened and preliminarily processed, and the text content of all resumes is then used directly for model training.
In the existing business, the three input types of work description, project description and self-introduction form the model containing long samples, whose output has three classes: normal, too simple and meaningless. For the model containing short samples, such as character descriptions and school names, the output has two classes: normal and meaningless. Previous sample collections usually cared only about long texts; short texts tended to be screened by other methods or simply not screened. A series of experiments showed that when the content of an input item is very limited, reducing the collected items easily produces unexpected deviations in the neural network's results: the network needs to receive enough sample types and content to work well, and with only similar sample content the screening of low-quality resumes cannot be completed accurately.
Further, in the training method, the balanced sample collection process includes: (1) because the number of positive samples is far greater than that of negative samples, random down-sampling of the positive samples is adopted for sample balance, with sampling ratios of 1:1, 1:2, 1:5 and 1:10, and the sampled sets are then validated separately; (2) because the number of negative samples is small, the negative samples are up-sampled by an interpolation method; (3) because the negative samples include a portion of text typed randomly by users with various input methods, and only a small number of such negatives are available for the model to learn from, training samples are constructed by turning positive samples into negatives: after character and word segmentation, the word order of positive samples above a certain length is randomly shuffled, or a certain proportion of words is randomly extracted, to form new negative samples; (4) when negative sample types are insufficient, full-type negative samples are generated manually: meaningless words for each resume item are delineated from the corpus, and the selected meaningless words are randomly intercepted and combined; (5) actual low-quality resume samples found and labeled manually or by the model are transformed to construct new training negatives with similar content; (6) in the total sample collection, the long-text category accounts for 80% and the short-text category for 20%, and redundant short texts are discarded.
Class imbalance refers to an uneven distribution of classes in the training set used to train a classifier. For example, in a binary classification problem with 1000 training samples, the ideal case is that the numbers of positive and negative samples do not differ much; if there are 995 positive samples and only 5 negative samples, class imbalance exists. In practice, class imbalance in a data set does not by itself require special processing; one should also look at how a model trained without any processing performs on the validation set. For low-quality resume screening, however, the scarcity of low-quality resume samples has a very significant impact on the judgment, so we try to avoid the class imbalance problem.
Sampling is a relatively simple and common way to resolve an unbalanced sample distribution; it includes over-sampling and under-sampling. Under-sampling, also called down-sampling, is used when a class has many samples. It balances the samples by reducing the number of majority-class samples; the most direct approach is to randomly discard majority-class samples to shrink that class, with the drawback that important information in the majority class may be lost. To address this drawback, the sampling ratios 1:1, 1:2, 1:5 and 1:10 are used and the sampled sets are validated separately, which avoids the loss of important information caused by a single sampling method and ratio.
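A minimal Python sketch of this multi-ratio random down-sampling of positive samples follows; each ratio produces a candidate training set that is then validated separately. The ratio is interpreted here as negatives:positives, which is an assumption, since the text does not state the direction explicitly.

```python
import random

def downsample_positives(positives, negatives, ratios=(1, 2, 5, 10), seed=42):
    """For each ratio r, randomly keep roughly r positive samples per negative
    (interpreting the 1:1 / 1:2 / 1:5 / 1:10 ratios as negatives:positives, an
    assumption) and return the candidate training sets keyed by ratio label."""
    rng = random.Random(seed)
    candidate_sets = {}
    for r in ratios:
        k = min(len(positives), r * len(negatives))
        candidate_sets[f"1:{r}"] = rng.sample(positives, k) + list(negatives)
    return candidate_sets
```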
Over-sampling, also called up-sampling, is used when a class has few samples. It balances the samples by increasing the number of minority-class samples; the most direct approach is simply to duplicate minority-class samples into multiple records, with the drawback that over-fitting may occur if the samples have few features. Improved over-sampling methods generate new synthetic samples by adding random noise or perturbed data within the minority class, or by following certain rules, such as the SMOTE algorithm, which synthesizes new minority-class samples based on interpolation.
From the perspective of training the model, if a class has few samples, the information it provides is too small. Empirical risk minimization (the model's average loss over the training set) is used as the learning criterion. Assuming a 0-1 loss (a typical equal-cost loss function), the optimization objective is equivalent to minimizing the error rate, i.e. maximizing accuracy. Consider the extreme case: of 1000 training samples, 999 are positive and 1 is negative. After some iterations the model classifies every sample as positive; although the negative sample is misclassified, the loss it causes is negligible and the accuracy is 99.9%, so under normal conditions no further optimization is needed and training could stop there, yet the end result is that the model never learns to distinguish the minority negative class. For this reason the number of distinct negative samples must be increased artificially, and several approaches are used (see the sketch after this paragraph). First, negative samples are manufactured by modifying positive ones: after character and word segmentation, positive samples longer than a certain length have their word order randomly shuffled to construct negatives with scrambled order and semantics, or a certain proportion of words is randomly extracted to form new negatives. Second, existing negatives are reworked and new negatives created: for each resume item, meaningless words are delineated from the existing corpus, and the selected meaningless words are randomly intercepted and combined into new negatives, for example words, internet phrases, colloquialisms, ancient poems and classical expressions that never appear, or appear extremely rarely, in positive samples. In addition, actual low-quality resume samples found and labeled manually or by the model are transformed into new negatives with similar content, which amounts to a simulated creation of real negatives; and for low-quality resume types that are missing, training negatives are created manually, so that negative types never encountered in actual screening can be produced directly to make up for the insufficient variety of real samples.
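The sketch below illustrates the two text-level ways of manufacturing negative samples mentioned above: shuffling or subsampling the tokens of a long positive sample, and recombining randomly intercepted meaningless words from a corpus. jieba is used here only as an example Chinese segmenter, and all names and parameters are illustrative; the interpolation-style (SMOTE-like) over-sampling is not shown.

```python
import random
import jieba  # example Chinese word segmenter; any tokenizer would do

def negative_from_positive(positive_text: str, keep_ratio: float = 0.6) -> str:
    """Shuffle the word order of a sufficiently long positive sample, or keep a
    random subset of its words, producing a semantically broken negative sample."""
    words = list(jieba.cut(positive_text))
    if not words:
        return ""
    if random.random() < 0.5:
        random.shuffle(words)                       # scramble word order and semantics
    else:
        words = random.sample(words, max(1, int(len(words) * keep_ratio)))
    return "".join(words)

def negative_from_corpus(nonsense_words, min_len=10, max_len=40) -> str:
    """Randomly intercept and combine words marked as meaningless for a resume item
    (e.g. rarely-seen phrases, internet slang, lines of ancient poems)."""
    n = random.randint(min_len, max_len)
    return "".join(random.choice(nonsense_words) for _ in range(n))
```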
In actual operation, sample collection is not done in a completely random manner, because the entered resume content has certain characteristics, and the sampling method and manner are determined by them. After trial and error, our sampling focuses on collecting long-text content, supplemented by short texts. In the total sample collection, the long-text category accounts for 80% and the short-text category for 20%; short texts outnumber long texts in both category and entry count, and their wording within a given description is highly homogeneous, so redundant short texts can ultimately be discarded. The core aim is a mixed long-and-short text sample set with the preset proportion, so that the training effect approaches real conditions and becomes more accurate.
Further, in steps (g) and (h), the original BERT model is called the trunk, and each externally connected student classifier is called a branch, where the student classifiers are obtained by self-distillation from the teacher classifier at the last layer of the trunk.
Furthermore, the self-distillation method is a knowledge distillation method in which a trained teacher network guides the training of a student network; the two networks have the same task and can achieve the same purpose, while the teacher network's parameter setting and computation are more complex than the student's. Only the trunk parameters are updated in the pre-training and fine-tuning stages; after fine-tuning the trunk parameters are frozen, ensuring that the knowledge learned in pre-training and fine-tuning is not affected, the probability distribution of the trunk classifier is distilled by the branch classifiers, and the branch classifiers are used only to fit the distribution of the trunk classifier.
The classical BERT model backbone consists of three parts: an embedding layer, an encoder and a teacher classifier. The structures of the embedding layer and the encoder follow the original BERT, while the number of layers executed for each sample varies with its complexity. The invention describes a sample-level adaptive mechanism in which student classifiers are added. The original BERT model may be referred to here as the trunk (backbone) and each classifier as a branch. Taking a batch of inputs (batch size 4) as an example, Transformer layer 0 and student classifier 0 infer the labels as probability distributions and calculate the uncertainty of each case. Cases with lower uncertainty are removed from the batch immediately, while cases with higher uncertainty are sent to the next layer for further inference.
In step (g), the Self-Distillation method may also be referred to as Knowledge Distillation: a trained teacher classifier guides the training of a student classifier; the two classifiers perform the same task and achieve the same purpose, while the teacher classifier's parameter settings and computation are larger and more complex than the student's. Note that the student (branch) classifiers are distilled from the last-layer classifier. The invention trains the backbone and the student classifiers in separate steps: in principle, the parameters of one module are always frozen while the other module is trained. Training comprises three parts: pre-training of the backbone, fine-tuning of the whole backbone, and self-distillation of the student classifiers. The backbone is first pre-trained exactly as in the traditional BERT model; the whole backbone model is then fine-tuned, during which the teacher classifier is trained; finally the teacher classifier is distilled into the student classifiers. The pre-training and fine-tuning processes do not differ from the classical BERT model; the differences are concentrated in the self-distillation part. The biggest difference from traditional distillation is that the teacher and the student live within one and the same model, whereas the traditional approach usually requires a separately designed student model. Specifically, only the trunk parameters are updated in the pre-training and fine-tuning stages, the trunk parameters are frozen after fine-tuning, and a branch (student) classifier distills the probability distribution of the trunk (teacher) classifier. It is worth noting that the trunk is kept frozen during distillation mainly to ensure that the knowledge learned in the pre-training and fine-tuning stages is not affected; the student classifiers are used only to fit the distribution of the teacher classifier as closely as possible.
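The following sketch shows one self-distillation step under the scheme described above: the trunk and the teacher classifier are frozen and only the branch (student) classifiers are updated to fit the teacher's distribution. It uses PyTorch and a KL-divergence fitting loss as an assumed concrete choice; the names `hidden_states`, `teacher_logits` and `student_heads` are placeholders.

```python
import torch
import torch.nn.functional as F

def self_distillation_step(hidden_states, teacher_logits, student_heads, optimizer):
    """hidden_states: list with one (batch, seq_len, hidden) tensor per layer,
    produced by the frozen trunk; teacher_logits: output of the frozen teacher
    classifier; only the student heads' parameters live in `optimizer`."""
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)   # frozen target
    loss = torch.tensor(0.0)
    for hidden, head in zip(hidden_states, student_heads):
        student_log_probs = F.log_softmax(head(hidden[:, 0]), dim=-1)
        # fit the teacher's probability distribution (KL divergence)
        loss = loss + F.kl_div(student_log_probs, teacher_probs,
                               reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```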
Since this process only requires the teacher's output, we are free to use an unlimited amount of unlabeled data rather than being limited to labeled data. This provides sufficient resources for self-distillation, which means that student performance can be improved as far as the teacher allows. In addition, unlike existing distillation methods, the teacher and students produce their outputs within the same model, and the learning process needs no additional pre-trained structure, so the distillation is entirely a self-learning process.
Further, in step (g), the student classifier is used to calculate the final class probabilities output at each layer from that layer's Transformer output data; the student classifier is provided as a neural network comprising three fully connected layers and one self-attention mechanism layer, and converts the multi-class output values into a probability distribution whose values lie in [0, 1] and sum to 1 by means of a Softmax function.
Whichever Transformer layer's output is taken as the result output by BERT, it is equivalent to feature extraction, and the extracted features must then be classified, which generally requires several fully connected layers. With too few layers the corresponding distribution cannot be learned; with too many, the parameters, computation and complexity grow, making it difficult to improve the inference rate. In our experiments, a three-layer fully connected network works well with low complexity. The SoftMax function here plays its usual role in neural networks: it organizes the high-dimensional output into class scores and normalizes them into probabilities summing to 1; for example, three raw output scores are converted into per-class probabilities such as 0.2, 0.3 and 0.5, whose sum is 1, which is what is commonly called normalization. Using SoftMax at the back end also makes the final processing and output of the data convenient, yields the desired form of output, simplifies the later computation of the confidence value, and, through its internal mechanism, further emphasizes the weight of the important elements.
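A minimal computation of the Softmax normalization, for illustration only (the standard definition, not anything specific to the patent):

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities in [0, 1] that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))            # ≈ [0.090, 0.245, 0.665], sums to 1
```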
Further, the system also comprises: (1) the student classifiers connected to Transformer layers 0-2, each set as a neural network comprising three fully connected layers and one self-attention mechanism layer placed between the first and second fully connected layers; (2) the student classifiers connected to Transformer layers 3-5, each set as a network comprising four fully connected layers and two self-attention mechanism layers, placed between the first and second and between the third and fourth fully connected layers; (3) the student classifiers connected to Transformer layers 6-11, which may be set as two to three fully connected layers plus one self-attention mechanism layer; (4) the parameters, layer counts and ratios of the network are adjusted dynamically according to the loss values and evaluation results during training, and the parameters and structures to be adjusted include the head size and dropout ratio in the multi-head attention mechanism, the number and size of the hidden layers, the maximum vector length of the position coding, the pooling-layer type and the number of fully connected layers.
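A sketch of the branch classifier attached to the shallow Transformer layers (three fully connected layers with one self-attention layer between the first and second, followed by Softmax), written with PyTorch; the hidden sizes, head count and three output classes are illustrative assumptions, and the deeper variants in items (2) and (3) above differ only in the number of fully connected and attention layers.

```python
import torch
import torch.nn as nn

class ShallowStudentClassifier(nn.Module):
    """Branch classifier sketch: fc1 -> self-attention -> fc2 -> fc3 -> Softmax."""
    def __init__(self, hidden_size=768, bottleneck=128, num_classes=3, num_heads=2):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, bottleneck)
        self.attn = nn.MultiheadAttention(bottleneck, num_heads, batch_first=True)
        self.fc2 = nn.Linear(bottleneck, bottleneck)
        self.fc3 = nn.Linear(bottleneck, num_classes)

    def forward(self, layer_output):                  # (batch, seq_len, hidden_size)
        x = torch.relu(self.fc1(layer_output))
        x, _ = self.attn(x, x, x)                     # self-attention: Q = K = V
        x = torch.relu(self.fc2(x[:, 0]))             # first-token representation
        return torch.softmax(self.fc3(x), dim=-1)     # class probabilities, sum to 1
```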
Some conventional network probing methods have been introduced in the prior art; with these probing methods it is possible to see what knowledge BERT or the Transformer has learned. Summarizing current research: after BERT training is completed, the low (shallow) Transformer layers mainly learn surface features of natural language, the middle layers encode syntactic information, and the high layers encode the semantic features needed for NLP. Experimental probing work supports this conclusion, and what we need to do is exploit the model's efficiency to the greatest extent according to this known conclusion, the characteristics of BERT and the concrete task of resume screening.
The invention applies the classical scheme from the paper to an actual industrial production project. In the specific application, the improvements or adjustments to the classical architecture are mainly focused on the classifiers, including the number of fully connected layers and the number of self-attention mechanism layers added between them; in general, the classifier network structure is adjusted appropriately so that it better suits the work of resume screening.
What do you see first when you look at a picture? When an overload of information reaches the retina, the brain focuses on the main information; this is the brain's attention mechanism. Likewise, when we read a sentence, the brain first remembers the important words. Borrowing from how the human brain handles information overload, the Attention mechanism was proposed, and it can be applied to natural language processing tasks. In essence, the attention mechanism screens out a small amount of important information from a large amount of information and focuses on it, ignoring most of the unimportant information: the larger the weight, the more focus is placed on its corresponding Value; the weight represents the importance of a piece of information, and the Value is that information. The self-attention mechanism (self-attention) can be regarded as a special case of the general attention mechanism in which Q, K and V come from the same sequence and every element performs the attention computation with all elements of the sequence. The multi-head attention mechanism proposed by Google captures relevant information in different subspaces by computing attention several times. Self-attention computes dependencies directly regardless of the distance between words, can learn the internal structure of a sentence, is simple to implement, and can be computed in parallel. As a layer, self-attention can be used together with RNN, CNN, FNN and the like and applied to most NLP tasks.
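A numerical sketch of scaled dot-product self-attention, in which every element of the sequence attends to all elements (Q, K and V are projections of the same input); the projection matrices are illustrative placeholders.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                   # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v     # the larger the weight, the more its Value contributes
```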
The network structure of the student classifier is carefully designed to balance model accuracy and inference speed: an overly simple network may compromise performance, while a large number of attention modules severely slows down inference. Our classifier has proved to be lightweight while preserving screening accuracy. A branch classifier is added after each Transformer layer, but this only means that running all 12 layers remains possible; in practice it is very rare for the 12-layer Transformer network to run to completion, and most samples never reach the last layer. Inference time is thereby reduced and the inference rate increased, and the model parameters are reduced by removing the parameters stored only for training and keeping only those needed for inference prediction.
In some embodiments, we redesign the network structure of the student classifiers. (1) Behind the shallow Transformer layers 0-2 we use a neural network of three fully connected layers with one self-attention mechanism layer between the first and second fully connected layers. Since the shallow Transformers perform most of the text processing, are very effective at handling text data and produce a large amount of output, we place a relatively simple but capable student classifier behind the first three Transformer layers so that most of the computation is completed quickly; in practice the three-layer fully connected network has high throughput with assured accuracy, and placing the self-attention layer near the front lets it take full effect, which facilitates fast and accurate processing of large amounts of data. (2) In the middle-section Transformers we use a four-layer fully connected network with two self-attention mechanism layers, one between the first and second and one between the third and fourth fully connected layers. Although, viewed in isolation, this classifier takes longer, its accuracy is higher, the probability of meeting the threshold requirement rises, and it serves well the purpose of shortening the overall judgment time. Letting layers 3-5 end the calculation in time and give a result, through the cooperation of the four fully connected layers and the self-attention layers, follows from our design goal: we expect the whole resume screening task to be completed at the middle Transformer layers at the latest, which addresses the overall processing time and reduces the parameter scale of the later Transformer layers. (3) After layer 6 the design is comparatively simple, the parameters and network structure are streamlined, and the model size is reduced to a certain extent. In some embodiments, almost all tasks output their judgment result at or before layers 4-5.
The parameters and structures that the network needs to adjust comprise (1) the head size and the dropout ratio in the multi-head attention mechanism, (2) the number of layers, the size and the dropout ratio of the hidden layers, (3) the maximum vector length of the position coding, and (4) the pooling-layer type, the size of the fully connected layers and their number. All of these parameters, layer counts and ratios are adjusted dynamically during training; the concrete needs are determined from the loss values and evaluation indicators observed during training, and the direction and magnitude of each change are decided by the engineer according to the actual training situation. The key is deciding which parameters to adjust, and associated parameters can be adjusted in linkage. As for concrete values, because the training samples differ for every concrete model and the network structure may differ, there is no definite reference value; in principle, the head size in the multi-head attention mechanism and the number of hidden layers tend to be reduced together, which sets the overall trend of shrinking the model's parameter scale, and adjusting the other related parameters on this basis gives a good balance between achieving the model's purpose and reducing its size. Parameters are modified by increasing or decreasing them within a certain range, and related parameters can be modified together; the aim is to find a parameter combination better suited to the characteristics of the actually collected samples, and the key criterion is whether the accuracy of the final output improves. For the same text, one attention mechanism yields one representation space; with multiple attention mechanisms, multiple different representation spaces are obtained, i.e., the multi-head attention mechanism provides the attention mechanism with multiple 'representation subspaces'.
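Purely as an illustration of the tunable items listed above, a hypothetical starting configuration might look as follows; every value is a placeholder to be adjusted according to the loss and evaluation metrics observed during training, and none of these numbers come from the patent itself.

```python
# Hypothetical starting point for the tunable items; values are placeholders.
classifier_config = {
    "num_attention_heads": 2,        # head count in the multi-head attention
    "attention_head_size": 64,       # size of each head
    "attention_dropout": 0.1,        # dropout ratio inside the attention block
    "num_hidden_layers": 3,          # hidden (fully connected) layers in the branch
    "hidden_size": 128,              # size of those hidden layers
    "hidden_dropout": 0.1,
    "max_position_embeddings": 512,  # maximum vector length of the position coding
    "pooling": "first_token",        # pooling-layer type (e.g. first token or mean)
    "num_fc_layers": 3,              # number of fully connected layers
}
```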
The performance of the BERT model is widely recognized, but on the other hand its resource consumption rises significantly, with numerous and complex parameters, so we try to compress the BERT model for convenient use. Common model-compression approaches are: Pruning - removing parts of the model; Quantization - lowering numerical precision, for example converting floating-point weights to low-bit integers; Distillation - "teaching" a small model. Our compression method essentially falls into the last class and is exactly knowledge distillation; by comparison, the student classifiers are lighter in weight, more resource-saving and faster at inference.
In the present invention, we propose an improved version of BERT. Specifically, the goal of higher efficiency with little loss of precision is achieved by using a self-distillation mechanism in the training phase and an adaptive mechanism in the inference phase. A very practical feature in an industrial setting is that the inference speed is adjustable. The data show that our experiments achieved good results on 12 NLP datasets; empirical results show that, without performance degradation, the model is 2-3 times faster than conventional BERT, and if a tolerable loss in accuracy is allowed, the speed-up can be adjusted freely between a factor of 1 and 12.
Further, in step (h), the confidence is measured through an uncertainty measure of the output result: the lower the uncertainty measure, the higher the classifier's confidence. The formula is as follows:
$$\mathrm{Uncertainty} = \frac{\sum_{i=1}^{N} P_s(i)\,\log P_s(i)}{\log \frac{1}{N}}$$
wherein Ps is the output distribution of the student classifier and N is the number of classification categories. The formula is an entropy: the larger the entropy, the less reliable the result, and when the uncertainty is smaller than the threshold, the result is output, which improves the inference speed. If the accuracy of the shallow layers is low, there is no way to raise it later to the desired level. Accuracy generally increases with the number of layers: for example, if a shallow layer reaches an accuracy of 0.7, continued optimization in the later layers may raise it step by step to above 0.9 by the last layer, which generally meets the service requirements; but if the shallow-layer accuracy is relatively low, say 0.2, the later layers cannot raise it above 0.9 by the last layer - at most it reaches about 0.6-0.7 - and such accuracy generally does not meet the service requirements, so if the shallow-layer accuracy is low, an accurate judgment cannot be given later. Once the threshold is determined, so is where the model exits: an uncertainty measure, lying between 0 and 1, is calculated at each Transformer layer. For example, if the experimentally determined threshold is 0.7, then when a layer's uncertainty measure is less than 0.7 the inference stops and that layer's result is used as the final output. Because this is a classification task, if the probabilities of the classes do not differ much, the discrimination is insufficient and the sample cannot be judged to belong to any of the three categories; if one category's probability is far higher than the others, the classification is very definite, the situation can be stated with assurance, the uncertainty measure of the result is low, and that layer's result can be used as the final output. The calculation is performed once after each student classifier until the result meets the threshold requirement, and the content and number of the categories classified by every student classifier are the same.
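A minimal computation of the uncertainty measure and the exit decision, matching the normalized-entropy formula above; the example distributions and the 0.7 threshold are illustrative.

```python
import math

def uncertainty(probs):
    """Normalized entropy of a student classifier's output distribution:
    0 = completely certain, 1 = uniform (completely uncertain)."""
    n = len(probs)
    entropy = sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(1.0 / n)

THRESHOLD = 0.7
print(uncertainty([0.98, 0.01, 0.01]) < THRESHOLD)  # True  -> output this layer's result
print(uncertainty([0.40, 0.35, 0.25]) < THRESHOLD)  # False -> continue to the next layer
```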
In addition, the AI resume screening method according to the embodiment of the invention described in conjunction with fig. 1 to 3 may be implemented by a corresponding electronic device. Fig. 4 is a diagram illustrating a hardware architecture 300 according to an embodiment of the invention.
The invention also discloses a system for AI screening of resumes, comprising: (a) a resume text acquisition module, used for extracting a certain number of job seekers' resumes and acquiring the text content data of all the resumes; (b) a resume text screening and preliminary processing module, used for screening and preliminarily processing the text data to obtain the optimized plain text content of all resumes; (c) an AI neural network module comprising a BERT model, wherein the BERT model adopts a multi-layer Transformer architecture, a teacher classifier (teacher classifier) is arranged behind the Transformer stack, and a student classifier (student classifier) is connected behind each Transformer layer of the BERT model; (d) a confidence calculation module, used for calculating the classification confidence of the output result of each Transformer layer's student classifier; (e) a resume quality judgment module, used for judging whether a resume meets the requirements according to a preset rule: when the classification confidence is higher than the preset threshold, the next Transformer layer continues to run and the confidence of that layer is recalculated; when the classification confidence is lower than the preset threshold, the result is output directly.
And, an apparatus, characterized in that it comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, causing the one or more processors to perform the AI resume screening method described in any of the foregoing embodiments.
And a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, performing the AI resume screening method described in any of the foregoing embodiments.
As shown in fig. 4, the apparatus 300 for implementing the present invention in this embodiment includes: the device comprises a processor 301, a memory 302, a communication interface 303 and a bus 310, wherein the processor 301, the memory 302 and the communication interface 303 are connected through the bus 310 and complete mutual communication.
In particular, the processor 301 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
That is, the device 300 may be implemented to include: a processor 301, a memory 302, a communication interface 303, and a bus 310. The processor 301, memory 302 and communication interface 303 are coupled by a bus 310 and communicate with each other. The memory 302 is used to store program code; the processor 301 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 302 for performing the method in any embodiment of the present invention, thereby implementing the method and apparatus described in conjunction with fig. 1 to 3.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (11)

1. An AI resume screening method, comprising:
(a) extracting a certain number of job seekers' resumes to be screened;
(b) acquiring text content data of all resumes;
(c) screening and primarily processing the text data to obtain all resume optimized plain text contents;
(d) inputting the plain text content of all resumes into an improved BERT model;
(e) the BERT model adopts a multi-layer Transformer architecture;
(f) a teacher classifier (teacher classifier) is arranged behind the Transformer stack;
(g) connecting a student classifier (student classifier) behind each layer of Transformer architecture of the BERT model;
(h) calculating the classification confidence of the output result of the student classifier of each layer of the Transformer;
(i) when the classification confidence coefficient is higher than a preset threshold value, continuing to operate the next layer of Transformer, and repeating the step h;
(j) when the classification confidence coefficient is lower than a preset threshold value, directly outputting the result;
(k) and judging whether the resume meets the requirements or not according to the output result.
2. The AI resume screening method of claim 1, further comprising a method of training the BERT model,
(1) in sample collection, the detection range of the text content is enlarged: collection and detection are carried out on the text-input parts of the resume, which include both long texts and short texts, and the labelled resume samples must cover these texts while both special text structures are handled; (2) the text content data of the training-set resumes are first extracted with a data acquisition system, the data are screened and preliminarily processed, and BERT model training is then carried out directly with the text content of all resumes; (3) for models containing long samples, the output has three classes, namely normal, too simple and meaningless; for models containing short samples, the output has two classes, namely normal and meaningless.
3. The AI resume screening method of claim 2, wherein the training method further comprises the step of collecting balanced samples: (1) because the number of positive samples is far greater than the number of negative samples, random down-sampling of the positive samples is adopted for sample balance, with sampling ratios of 1: 1, 1: 2, 1: 5 and 1: 10 respectively, and the sampled sets are then verified separately; (2) because the number of negative samples is small, the negative samples are sampled by an interpolation up-sampling method; (3) because the negative samples include some texts entered randomly by users with various input methods, and such negative samples are few in number but must be recognized by the model, training samples are constructed by building negative samples from positive samples, realized by randomly shuffling the order of positive samples of a certain length after character and word segmentation, or by randomly extracting a certain proportion of words to form new negative samples; (4) when the negative-sample types are insufficient, a manual generation method is adopted for full-type negative samples: meaningless words for each resume item are delineated from the corpus, and the selected meaningless words are randomly intercepted and combined; (5) actual low-quality resume samples identified by manual or model screening and already labelled are transformed to construct new training negatives with similar content; (6) in the total sample collection, the long-text categories account for 80%, the short-text categories account for 20%, and the surplus short texts are discarded.
4. The AI resume screening method of claim 1, wherein in steps (h) and (g), the original BERT model is called the trunk, and each externally connected student classifier is called a branch, wherein the student classifiers are derived by distillation from the teacher classifier at the last layer of the trunk.
5. The AI resume screening method of claim 4, wherein the self-distillation method is a knowledge distillation method in which a trained teacher network guides the training of a student network, the two networks perform the same task and achieve the same purpose, and the teacher network's parameter settings and computation are more complex than the student network's; only the trunk parameters are updated in the pre-training and fine-tuning stages, the trunk parameters are frozen after fine-tuning to ensure that the knowledge learned in the pre-training and fine-tuning stages is not affected, the probability distribution of the trunk classifier is distilled by a branch classifier, and the branch classifier is used only to fit the distribution of the trunk classifier.
6. The AI resume screening method of claim 1, wherein in step (g), the student classifier is configured to calculate the final class probabilities output at each layer from that layer's Transformer output data; the student classifier is provided as a neural network comprising three fully connected layers and one self-attention mechanism layer, and converts the multi-class output values into a probability distribution whose values lie in [0, 1] and sum to 1 by means of a Softmax function.
7. The AI resume screening method of claim 6, further comprising: (1) the student classifiers connected to Transformer layers 0-2, each configured as a neural network comprising three fully connected layers and one self-attention mechanism layer placed between the first and second fully connected layers; (2) the student classifiers connected to Transformer layers 3-5, each configured as four fully connected layers and two self-attention mechanism layers placed between the first and second and between the third and fourth fully connected layers; (3) the student classifiers connected to Transformer layers 6-10, which may be configured as two to three fully connected layers plus one self-attention mechanism layer; (4) the parameters, layer counts and ratios of the network are adjusted dynamically according to the loss values and evaluation results during training, and the parameters and structures to be adjusted include the head size and dropout ratio in the multi-head attention mechanism, the number and size of the hidden layers, the maximum vector length of the position coding, the pooling-layer type and the number of fully connected layers.
8. The AI resume screening method of claim 1, wherein in step (h), the confidence is measured through an uncertainty measure of the output result, a low uncertainty measure indicating a high classifier confidence, the formula being as follows:
$$\mathrm{Uncertainty} = \frac{\sum_{i=1}^{N} P_s(i)\,\log P_s(i)}{\log \frac{1}{N}}$$
wherein, Ps is the output result of the student classifier, and N is the number of classification categories.
9. A system for AI screening of resumes, comprising:
(a) a resume text acquisition module, used for extracting a certain number of job seekers' resumes and acquiring the text content data of all the resumes;
(b) the resume text screening and primary processing module is used for screening and primary processing the text data to obtain all resume optimized plain text contents;
(c) an AI neural network module comprising a BERT model, wherein the BERT model adopts a multi-layer Transformer architecture, a teacher classifier (teacher classifier) is arranged behind the Transformer stack, and a student classifier (student classifier) is connected behind each Transformer layer of the BERT model;
(d) a confidence calculation module: the system is used for calculating the classification confidence of the output result of the student classifier of each layer of the Transformer;
(e) resume quality judgment module: the system is used for judging whether the resume meets the requirements according to a preset rule, and when the classification confidence coefficient is higher than a preset threshold value, continuing to operate the next layer of Transformer and recalculating the confidence coefficient of the next layer; and when the classification confidence coefficient is lower than a preset threshold value, directly outputting the result.
10. An apparatus, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, causing the one or more processors to perform the AI resume screening method of any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, performs the AI resume screening method of any one of claims 1-8.
CN202111169078.2A 2021-10-02 2021-10-02 AI resume screening method, system, equipment and storage medium Pending CN113886562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111169078.2A CN113886562A (en) 2021-10-02 2021-10-02 AI resume screening method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113886562A true CN113886562A (en) 2022-01-04

Family

ID=79005528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111169078.2A Pending CN113886562A (en) 2021-10-02 2021-10-02 AI resume screening method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113886562A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169111A1 (en) * 2020-02-28 2021-09-02 平安国际智慧城市科技股份有限公司 Resume screening method and apparatus, computer device and storage medium
CN113220892A (en) * 2021-06-15 2021-08-06 苏州大学 BERT-based self-adaptive text classification method and device
CN113468317A (en) * 2021-06-26 2021-10-01 北京网聘咨询有限公司 Resume screening method, system, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation
CN117236647A (en) * 2023-11-10 2023-12-15 贵州优特云科技有限公司 Post recruitment analysis method and system based on artificial intelligence
CN117236647B (en) * 2023-11-10 2024-02-02 贵州优特云科技有限公司 Post recruitment analysis method and system based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 214000 room 706, 7 / F, building 8 (Wuxi talent financial port), east of Hongxing Duhui, economic development zone, Wuxi City, Jiangsu Province
Applicant after: Zhilian Wangpin Information Technology Co.,Ltd.
Address before: 214000 room 706, 7 / F, building 8 (Wuxi talent financial port), east of Hongxing Duhui, economic development zone, Wuxi City, Jiangsu Province
Applicant before: Zhilian (Wuxi) Information Technology Co.,Ltd.