Disclosure of Invention
The invention aims to provide a semantic-based interpretable person-job matching method and a semantic-based interpretable person-job matching system. They address the problems that existing recommendation algorithms cannot process unstructured recruitment information and resumes well and cannot provide suitable recommendation reasons during recommendation.
To achieve this purpose, the invention adopts the following technical solution:
The invention provides a semantic-based interpretable person-job matching method, which comprises the following steps:
Step S1, receiving the resume text from the job seeker and the recruitment information text from the recruiter, and performing word embedding processing on the resume text and the recruitment information text according to a pre-trained word embedding model to obtain a word vector representation of each word in the resume text and the recruitment information text, thereby obtaining a vectorized representation of the resume text and a vectorized representation of the recruitment information text.
Step S2, modeling each word in each text through a recurrent neural network encoder based on hierarchical attention to obtain the state representation of each sentence in the text and the overall state representation of each text;
specifically, the vectorized resume text and the vectorized recruitment information text are respectively input into a bidirectional recurrent neural network encoder to obtain the state representation of each word in the resume text and the state representation of each word in the recruitment information text;
for the state representation of each word in the resume text, a multi-layer attention mechanism is introduced to calculate the state representation of each sentence in the resume text and the overall state representation of the whole resume text;
and for the state representation of each word in the recruitment information text, a multi-layer attention mechanism is introduced to calculate the state representation of each sentence in the recruitment information text and the overall state representation of the whole recruitment information text.
Step S3, fitting the relationship between each pair of corresponding sentences in the resume text and the recruitment information text through a similarity matrix and a bias matrix to obtain a semantic similarity matrix, extracting the maximum similarity of each sentence in the resume text and the recruitment information text from the semantic similarity matrix, and calculating the targeted representation of each sentence and the overall targeted representation of each text.
Step S4, splicing the overall state representation and the overall targeted representation of each text to obtain a final representation of each text, splicing the final representation of the resume text and the final representation of the recruitment information text, and calculating the matching degree of the resume and the recruitment information through a multilayer perceptron.
Step S5, for the job seeker, recommending a plurality of recruitment information texts with the highest matching degree in the recruitment information base and giving a sentence-level explanation; and for the recruiter, recommending a plurality of resume texts with the highest matching degree in the resume library and giving a sentence-level explanation.
Compared with the prior art, the matching method has the following beneficial effects: the method captures the state representations of words and sentences across the whole text directly from unstructured text, and captures the degree of association between each pair of corresponding sentences in the resume text and the recruitment information text as the basis for recommendation, which improves the expressive capacity of the model; it splices the overall state representation and the overall targeted representation of each text to obtain the final representation of each text, and splices the final representations of the resume text and the recruitment information text to calculate the matching degree of the resume and the recruitment information.
In the aforementioned semantic-based interpretable person-job matching method, in step S2, the vectorized resume text and the vectorized recruitment information text are respectively input into a bidirectional gated recurrent neural network encoder to obtain the state representation of each word in the resume text and the recruitment information text, and a multi-layer attention mechanism is then applied to the state representations of the words to calculate the state representation of each sentence in each text and the overall state representation of each text.
In the aforementioned semantic-based interpretable person-job matching method, obtaining the state representation of each word in the resume text and the recruitment information text includes: respectively inputting the vectorized resume text and the vectorized recruitment information text into a bidirectional gated recurrent neural network encoder to obtain the forward state and the backward state of each word, and splicing the forward state, the backward state and the word vector of each word to obtain the state representation of each word.
In the aforementioned semantic-based interpretable person-job matching method, calculating the state representation of each sentence in each text and the overall state representation of each text includes: calculating the attention weight of the state representation of each word in each sentence through a one-layer fully connected network and a softmax function, multiplying the state representation of each word in the sentence by its attention weight and summing the results to obtain the state representation of the sentence, and summing the state representations of the sentences to obtain the overall state representation of each text.
In the aforementioned semantic-based interpretable person-job matching method, in step S3, obtaining the semantic similarity matrix includes: padding the number of sentences of the resume text and the number of sentences of the recruitment information text to the same number, and combining the sentence state representations into a sentence matrix of the resume and a sentence matrix of the recruitment information respectively; and multiplying the sentence matrix of the recruitment information by the similarity matrix, multiplying the result by the transpose of the sentence matrix of the resume, and adding a bias matrix to obtain the semantic similarity matrix.
In the aforementioned semantic-based interpretable person-job matching method, in step S3, calculating the targeted representation of each sentence and the overall targeted representation of each text includes:
for a given sentence in the resume text, finding in the semantic similarity matrix the value corresponding to the sentence in the recruitment information text that is most strongly related to it, and multiplying the sentence by that value to obtain the targeted representation of the sentence;
for a given sentence in the recruitment information text, finding in the semantic similarity matrix the value corresponding to the sentence in the resume text that is most strongly related to it, and multiplying the sentence by that value to obtain the targeted representation of the sentence;
and summing the targeted representations of the sentences in the resume text and in the recruitment information text respectively to obtain the overall targeted representation of the resume text and the overall targeted representation of the recruitment information text.
In the aforementioned semantic-based interpretable person-job matching method, in step S5, for a job seeker user, the matching degree between the resume text of the user and each recruitment information text in the recruitment information base is calculated, and the recruitment information text with the highest matching degree is preferentially recommended; the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of this pair of resume text and recruitment information text, and the following explanation is given to the job seeker user: these three sentences in the recruitment information respectively best match these three sentences in the resume.
In the aforementioned semantic-based interpretable person-job matching method, in step S5, for a recruiter user, the matching degree between the recruitment information text of the user and each resume text in the resume library is calculated, and the resume text with the highest matching degree is preferentially recommended; the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of this pair of recruitment information text and resume text, and the following explanation is given to the recruiter user: these three sentences in the resume respectively best match these three sentences in the recruitment information.
In a second aspect of the present invention, there is provided a semantic-based interpretable person-job matching system, comprising:
the word embedding module is used for carrying out word embedding operation on the resume text and the recruitment information text according to a pre-trained word embedding model to obtain word vector representation of each word in the resume text and the recruitment information text and obtain vectorized representation of the resume text and vectorized representation of the recruitment information text;
the text coding module is used for inputting the vectorized representations of the resume text and the recruitment information text into the bidirectional recurrent neural network encoder to obtain the state representation of each word in the text, and calculating the state representation of each sentence and the overall state representation of the resume text and of the recruitment information text by introducing a multi-layer attention mechanism;
the semantic similarity calculation module is used for fitting the relationship between each pair of corresponding sentences in the resume text and the recruitment information text through the similarity matrix and the bias matrix to obtain a semantic similarity matrix, extracting the maximum similarity of each sentence in the resume text and the recruitment information text from the semantic similarity matrix, and calculating the targeted representation of each sentence and the overall targeted representation of the resume text and of the recruitment information text;
the text matching degree prediction module is used for splicing the overall state representation and the overall targeted representation of the resume text and of the recruitment information text, and inputting the result into a multilayer perceptron to predict the matching degree of the resume text and the recruitment information text;
the interpretable recommendation module is used for giving recommendations and interpretations, selecting resume texts with the highest matching degree from the resume library according to the recruitment information texts submitted by the recruiter for recommendation and giving a recommendation reason; and aiming at the job seeker, selecting a recruitment information text with the highest matching degree from the recruitment information base according to the resume text submitted by the job seeker, recommending the selected recruitment information text and providing a recommendation reason.
In the aforementioned semantic-based interpretable person-job matching system, the text coding module includes:
the word state coding module is used for inputting the vectorized text into the bidirectional gated recurrent neural network to calculate the forward and backward states of each word, and splicing the forward and backward states with the vector representation of the word to obtain the state representation of the word;
the sentence state coding module is used for calculating a weight for each word through a one-layer fully connected network and a softmax function, and computing the weighted sum to obtain the state representation of a sentence;
and the overall state coding module is used for calculating a weight for each sentence through a one-layer fully connected network and a softmax function, and computing the weighted sum to obtain the overall state representation of the text.
In the aforementioned semantic-based interpretable person-job matching system, the interpretable recommendation module includes:
the recommendation module is used for recommending resume texts or recruitment information texts most relevant to the user according to the matching degree;
and the interpretation module is used for giving the explanation of the recommendation: for a recruiter user, the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of the recommended resume text, and the following explanation is given to the user: these three sentences in the resume respectively best match these three sentences in the recruitment information; for a job seeker user, the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of the recommended recruitment information text, and the following explanation is given to the user: these three sentences in the recruitment information respectively best match these three sentences in the resume.
Compared with the prior art, the matching system has the following beneficial effects: the invention provides a functional framework for the semantic-based interpretable person-job matching method, addressing the problems that existing recommendation algorithms cannot process unstructured recruitment information and resumes well and cannot provide suitable recommendation reasons during recommendation.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not to be construed as limiting the invention.
Example 1: a semantic-based interpretable person-job matching method. The problem addressed by the application is first described. In this embodiment, a recruitment information text $p$ is described as a combination of $n_p$ sentences, $p = \{s^p_1, s^p_2, \ldots, s^p_{n_p}\}$, and the ith sentence $s^p_i$ is described as a combination of $n_{p,i}$ words, $s^p_i = \{w^p_{i,1}, w^p_{i,2}, \ldots, w^p_{i,n_{p,i}}\}$, where $w^p_{i,k}$ indicates the kth word in the ith sentence of the recruitment information text $p$. Each sentence in the recruitment information text describes a skill or capability required for the job. Similarly, a resume text $r$ is described as a combination of $n_r$ sentences, $r = \{s^r_1, s^r_2, \ldots, s^r_{n_r}\}$, and the ith sentence $s^r_i$ is described as a combination of $n_{r,i}$ words, $s^r_i = \{w^r_{i,1}, w^r_{i,2}, \ldots, w^r_{i,n_{r,i}}\}$, where $w^r_{i,k}$ represents the kth word in the ith sentence of the resume text $r$. Each sentence in the resume text describes a skill or experience that the job seeker has acquired. Given a resume text $r$ or a recruitment information text $p$, the method aims to recommend the recruitment information text or resume text in the recruitment information base or resume library that the user is most likely to be interested in, and to give an explanation. The specific method, whose brief flow chart is shown in figure 1, comprises the following steps:
Step S1: acquiring and preprocessing the data:
A word embedding operation is performed on the resume text and the recruitment information text whose matching degree is to be predicted, using the pre-trained Chinese word embedding model BERT-wwm, so that each word in the resume text and the recruitment information text is encoded into a 300-dimensional embedding vector. The resume text and the recruitment information text are divided by periods and paragraphs so that each text consists of 20 sentences and each sentence consists of 100 words; texts and sentences that fall short are completed by zero-padding, yielding the vector representations of the resume text and the recruitment information text.
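The segmentation and zero-padding described in step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the helper names `split_sentences`, `toy_embed` and `vectorize` are hypothetical, whitespace tokenization is assumed, `toy_embed` merely stands in for the pre-trained embedding model, and truncating texts longer than 20 sentences or sentences longer than 100 words is an assumption the text does not state. Only the sizes 20, 100 and 300 come from the embodiment.

```python
import re

MAX_SENTENCES = 20   # sentences per text (from the embodiment)
MAX_WORDS = 100      # words per sentence (from the embodiment)
EMBED_DIM = 300      # embedding size (from the embodiment)


def split_sentences(text):
    """Split on periods (Western or Chinese) and line breaks; drop empty pieces."""
    pieces = re.split(r"[.。\n]+", text)
    return [p.strip() for p in pieces if p.strip()]


def toy_embed(word):
    """Hypothetical stand-in for a pre-trained embedding model:
    a deterministic 300-dimensional vector derived from the characters."""
    seed = sum(ord(c) for c in word)
    return [((seed * (i + 1)) % 97) / 97.0 for i in range(EMBED_DIM)]


def vectorize(text, embed=toy_embed):
    """Return a MAX_SENTENCES x MAX_WORDS x EMBED_DIM nested list, zero-padded."""
    zero = [0.0] * EMBED_DIM
    grid = []
    # Truncation of over-long input is an assumption, not stated in the text.
    for sentence in split_sentences(text)[:MAX_SENTENCES]:
        words = sentence.split()[:MAX_WORDS]      # whitespace tokenization (assumption)
        row = [embed(w) for w in words]
        row += [zero] * (MAX_WORDS - len(row))    # pad short sentences with zero vectors
        grid.append(row)
    while len(grid) < MAX_SENTENCES:              # pad short texts with all-zero sentences
        grid.append([zero] * MAX_WORDS)
    return grid
```

Calling `vectorize` on a three-sentence text returns 20 rows of 100 vectors each, of which the rows from index 3 onward and the unused word slots are all zeros.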
Step S2: a recurrent neural network encoder based on hierarchical attention models more effective state representations of the words and sentences in the resume text and the recruitment information text and calculates the overall state representation of each text. This involves calculating the overall state representation of the resume text and that of the recruitment information text; the two calculations are essentially the same, so the calculation of the overall state representation of the recruitment information text is taken as an example:
For a recruitment information text $p$, there are 20 × 100 word embedding vectors. Since the representation of a word should not be determined by its own meaning alone but should also capture its meaning in the entire sentence, each sentence in the recruitment information text $p$ is input into the bidirectional gated recurrent neural network BiGRU. For example, the ith sentence $s^p_i$ of the recruitment information $p$, in the form of the concatenation of the embedding vectors of its 100 words $[e^p_{i,1}, e^p_{i,2}, \ldots, e^p_{i,100}]$, is entered into the BiGRU to obtain the forward and backward state of each word in the sentence. For the kth word this gives the forward state vector $\overrightarrow{h}^p_{i,k}$ and the backward state vector $\overleftarrow{h}^p_{i,k}$. Splicing these with the embedding vector $e^p_{i,k}$ of the kth word, the state representation of the kth word is calculated as follows:
$h^p_{i,k} = [\overrightarrow{h}^p_{i,k}; \overleftarrow{h}^p_{i,k}; e^p_{i,k}]$
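The bidirectional encoding and splicing can be sketched as below. This is an illustrative sketch only: the class `GRUCell` and the function `encode_sentence` are hypothetical names, the weights are random and untrained, biases are omitted for brevity, and the gate equations follow one common GRU formulation rather than anything specified in the text. Toy dimensions are used; with 300-dimensional embeddings, a hidden size of 300 per direction (an inference from the 900-dimensional parameters mentioned in the embodiment) would give 300 + 300 + 300 = 900 dimensions per word.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell with random (untrained) weights and no biases."""

    def __init__(self, input_dim, hidden_dim, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        # One input matrix and one recurrent matrix per gate:
        # update (z), reset (r) and candidate (n).
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}
        self.hidden_dim = hidden_dim

    def _gate(self, name, x, h, fn):
        return [fn(a + b) for a, b in zip(matvec(self.W[name], x), matvec(self.U[name], h))]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = self._gate("n", x, reset_h, math.tanh)
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode_sentence(embeddings, fwd, bwd):
    """Return one state per word: [forward state; backward state; embedding]."""
    h = [0.0] * fwd.hidden_dim
    forward = []
    for x in embeddings:                # left-to-right pass
        h = fwd.step(x, h)
        forward.append(h)
    h = [0.0] * bwd.hidden_dim
    backward = []
    for x in reversed(embeddings):      # right-to-left pass
        h = bwd.step(x, h)
        backward.append(h)
    backward.reverse()                  # realign with the word order
    return [f + b + e for f, b, e in zip(forward, backward, embeddings)]
```

The forward state of a word sees only the words up to and including it, while the backward state sees the words from it to the end of the sentence, so the spliced vector carries context from both sides plus the word's own embedding.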
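Since different words in a sentence have different degrees of importance, an attention mechanism is introduced to calculate the weight coefficient of each word in the sentence. For example, the weight coefficient $a_k$ of the kth word in the ith sentence and the state representation $s^p_i$ of the ith sentence are calculated as follows:
$a_k = \mathrm{softmax}(\tilde{a}_k)$
$s^p_i = \sum_k a_k \, h^p_{i,k}$
where $\tilde{a}_k$ is the unnormalized score that a one-layer fully connected network assigns to the state representation $h^p_{i,k}$ of the kth word, the parameter vector of that network (denoted $v_1$ here) and $b_1$ are 900-dimensional trainable parameter vectors, and $W_1$ is a trainable parameter matrix of 900 × 900 dimensions.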
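A minimal sketch of this attention pooling follows. The function names `softmax` and `attention_pool` are hypothetical, and the scoring form `v . tanh(W h + b)` is an assumption: the text only specifies a one-layer fully connected network followed by softmax, with a 900 × 900 matrix and two 900-dimensional vectors. Because the embodiment applies the same kind of mechanism to sentences, the same function could also be reused on sentence representations to obtain the overall state representation of a text.

```python
import math


def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, W, b, v):
    """Weighted sum of `states` using softmax attention weights.

    score_k = v . tanh(W h_k + b)   (the tanh activation is an assumption)
    """
    scores = []
    for h in states:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
                  for row, bi in zip(W, b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(wt * h[d] for wt, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights
```

With all-zero `W` and `b`, every score is zero, the weights become uniform, and the pooled vector reduces to the plain average of the states, which is a convenient sanity check.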
Since different sentences in the recruitment information text have different degrees of importance, an attention mechanism is introduced to calculate the weight coefficient of each sentence in the text. For example, the weight coefficient $B_i$ of the ith sentence in the recruitment information text $p$ and the overall state representation $t_p$ of the recruitment information text $p$ are calculated as follows:
$B_i = \mathrm{softmax}(\tilde{B}_i)$
$t_p = \sum_i B_i \, s^p_i$
where $\tilde{B}_i$ is the unnormalized score that a one-layer fully connected network assigns to the state representation $s^p_i$ of the ith sentence, the parameter vector of that network (denoted $v_2$ here) and $b_2$ are 900-dimensional trainable parameter vectors, and $W_2$ is a trainable parameter matrix of 900 × 900 dimensions.
The state representation $s^r_i$ of the ith sentence in the resume text and the overall state representation $t_r$ of the resume text can be calculated in the same way.
Step S3: a similarity matrix is introduced to capture the relationship between each pair of corresponding sentences in a resume text and recruitment information text pair, and the overall targeted representations of the resume text and the recruitment information text are calculated, as follows:
The resume text consists of 20 sentences, and a 20 × 900-dimensional matrix $R_s$ is used to represent the 20 sentences in the resume text.
Likewise, a 20 × 900-dimensional matrix $J_s$ is used to represent the 20 sentences in the recruitment information text.
The relationship between each pair of sentences in the resume text and the recruitment information text can be captured through the similarity matrix, and the calculation formula is as follows:
$A = J_s \cdot W_a \cdot R_s^{T} + B_a$
where $W_a$ is a trainable parameter matrix of 900 × 900 dimensions, and $B_a$ is a trainable parameter matrix of 20 × 20 dimensions.
Each entry in the matrix $A$ represents the semantic similarity between a sentence in the resume text and a sentence in the recruitment information text, calculated as the bilinear product of the two sentence representations. For example, the element $a_{ij}$ of the matrix $A$ represents the semantic similarity between the jth sentence in the resume text and the ith sentence in the recruitment information text.
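The bilinear product behind the semantic similarity matrix can be sketched as follows, using small toy dimensions in place of 20 × 900. The function names `matmul`, `transpose` and `similarity_matrix` are hypothetical, and the parameter values in the usage are made up for illustration; in the embodiment the similarity matrix and the bias matrix are trained.

```python
def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def similarity_matrix(job_sents, resume_sents, W, B):
    """Compute A = Js . Wa . Rs^T + Ba.

    A[i][j] scores recruitment sentence i against resume sentence j.
    """
    scores = matmul(matmul(job_sents, W), transpose(resume_sents))
    return [[s + b for s, b in zip(srow, brow)] for srow, brow in zip(scores, B)]


# Toy usage: 2 sentences per text, 3-dimensional sentence states.
Js = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Rs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero_bias = [[0.0, 0.0], [0.0, 0.0]]
A = similarity_matrix(Js, Rs, identity, zero_bias)
```

With an identity similarity matrix and zero bias the result is simply the dot product of each sentence pair; a trained `W` lets the model relate different dimensions of the two sentence representations to each other.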
The semantic similarity matrix $A$ can be used to calculate the overall targeted representations of the resume text and the recruitment information text. For the ith sentence $s^p_i$ in the recruitment information text $p$, the maximum value of the ith row of the semantic similarity matrix $A$ is calculated as its corresponding weight. In this way the most relevant sentence in the resume text is captured and merged into the representation of the sentence to obtain the targeted representation $d^p_i$ of the sentence. The specific calculation is as follows:
$d^p_i = \mathrm{MaxL}(A_i) \cdot s^p_i$
$g_p = \sum_i d^p_i$
where $\mathrm{MaxL}(A_i)$ is the operation of taking the maximum value of the ith row of the matrix $A$, and $g_p$ is the overall targeted representation of the recruitment information text $p$.
For the ith sentence $s^r_i$ in the resume text $r$, the maximum value of the ith column of the semantic similarity matrix $A$ is calculated as its corresponding weight. In this way the most relevant sentence in the recruitment information text is captured and merged into the representation of the sentence to obtain the targeted representation $d^r_i$ of the sentence. The specific calculation is as follows:
$d^r_i = \mathrm{MaxR}(A_i) \cdot s^r_i$
$g_r = \sum_i d^r_i$
where $\mathrm{MaxR}(A_i)$ is the operation of taking the maximum value of the ith column of the matrix $A$, and $g_r$ is the overall targeted representation of the resume text $r$.
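The row-wise and column-wise maximum operations can be sketched as follows. The function name `targeted_representations` and the toy values are hypothetical; the logic follows the description above, in which each sentence state is scaled by its strongest similarity and the scaled states are summed per text, with rows of the similarity matrix indexing recruitment sentences and columns indexing resume sentences.

```python
def targeted_representations(A, job_sents, resume_sents):
    """Scale each sentence by its best similarity, then sum per text.

    Rows of A index recruitment sentences; columns index resume sentences.
    """
    row_max = [max(row) for row in A]             # best match for each recruitment sentence
    col_max = [max(col) for col in zip(*A)]       # best match for each resume sentence
    job_targeted = [[w * x for x in s] for w, s in zip(row_max, job_sents)]
    resume_targeted = [[w * x for x in s] for w, s in zip(col_max, resume_sents)]

    def total(vectors):
        return [sum(col) for col in zip(*vectors)]

    return total(job_targeted), total(resume_targeted), row_max, col_max


A = [[0.2, 0.9], [0.5, 0.1]]
job = [[1.0, 0.0], [0.0, 1.0]]
resume = [[1.0, 1.0], [2.0, 0.0]]
g_p, g_r, row_max, col_max = targeted_representations(A, job, resume)
```

Sentences that have no strong counterpart in the other text receive a small weight and contribute little to the overall targeted representation.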
Step S4: calculating the matching degree of the resume text and the recruitment information text, as follows:
The overall state representation and the overall targeted representation of the resume text and of the recruitment information text are spliced to obtain the final representations $T_r$ and $T_p$ of the resume text and the recruitment information text. These are then spliced together and input into a multilayer perceptron with a sigmoid activation function to predict the matching degree, using the following formula:
$y = \mathrm{MLP}([T_r, T_p])$
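The splicing and prediction in step S4 can be sketched as follows. The function name `mlp_match_score` is hypothetical, and the number of layers and the ReLU activation on hidden layers are assumptions; the text only states that a multilayer perceptron with a sigmoid activation predicts the matching degree from the spliced final representations.

```python
import math


def mlp_match_score(t_r, g_r, t_p, g_p, layers):
    """Predict a matching degree in (0, 1).

    t_r, g_r: overall state and overall targeted representation of the resume.
    t_p, g_p: the same for the recruitment information text.
    layers: list of (W, b) pairs; ReLU on hidden layers (assumption),
            sigmoid on the single output unit.
    """
    # T_r = [t_r; g_r] and T_p = [t_p; g_p], spliced into one input vector.
    x = t_r + g_r + t_p + g_p
    for index, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return 1.0 / (1.0 + math.exp(-x[0]))


# Toy usage with 1-dimensional representations and hand-picked weights.
layers = [
    ([[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
y = mlp_match_score([1.0], [0.0], [0.0], [0.0], layers)
```

Because the output passes through a sigmoid, the score can be read as a matching degree between 0 and 1 and compared across candidate pairs.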
Step S5: for a recruiter user, the resume library is traversed according to the recruitment information text provided by the user to find the resume with the highest matching degree with the user's recruitment information, and the three pairs of sentences with the strongest relevance are stored according to the semantic similarity matrix A. Finally the resume text is recommended, and the following explanation is given to the user: these three sentences in the resume text r respectively best match these three sentences in the recruitment information text.
For a job seeker user, the recruitment information base is traversed according to the resume text provided by the user to find the recruitment information with the highest matching degree with the user's resume, and the three pairs of sentences with the strongest relevance are stored according to the semantic similarity matrix A. Finally the recruitment information text is recommended, and the following explanation is given to the user: these three sentences in the recruitment information text p respectively best match these three sentences in the resume text.
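The traversal and sentence-level explanation can be sketched as follows. The names `top_sentence_pairs` and `recommend` and the callback `match` are hypothetical; `match` stands for the whole model, returning the matching degree and the semantic similarity matrix for a query and a candidate. Taking the three largest entries of the matrix is one reading of "three pairs of sentences with the strongest relevance"; the text does not say whether the pairs must involve distinct sentences.

```python
def top_sentence_pairs(A, k=3):
    """Return the k entries of A with the highest score as (row, column, score)."""
    entries = [(score, i, j) for i, row in enumerate(A) for j, score in enumerate(row)]
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(i, j, score) for score, i, j in entries[:k]]


def recommend(query, candidates, match, top_n=1, k=3):
    """Rank candidates by matching degree and attach the explaining sentence pairs.

    match(query, candidate) -> (matching_degree, similarity_matrix)
    """
    results = []
    for candidate in candidates:
        degree, A = match(query, candidate)
        results.append((degree, candidate, top_sentence_pairs(A, k)))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_n]
```

Each returned index pair identifies one sentence in the recruitment information text and one in the resume text, which can then be shown to the user as the reason for the recommendation.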
A diagram of the network model described in embodiment 1 as a whole is shown in fig. 2.
Example 2: a semantic-based interpretable person-job matching system, the functional block diagram of which is shown in fig. 3, comprising:
the word embedding module 1 is used for performing word embedding operation on the resume text and the recruitment information text according to a pre-trained word embedding model to obtain word vector representation of each word in the resume text and the recruitment information text and obtain vectorized representation of the resume text and vectorized representation of the recruitment information text.
The text coding module 2 is used for inputting the vectorized representations of the resume text and the recruitment information text into the bidirectional recurrent neural network encoder to obtain the state representation of each word in the text, and calculating the state representation of each sentence and the overall state representation of the resume text and of the recruitment information text by introducing a multi-layer attention mechanism;
the text encoding module 2 includes:
the word state coding module is used for inputting the vectorized text into the bidirectional gated recurrent neural network to calculate the forward and backward states of the word and splicing the forward and backward states with the vector representation of the word to obtain the state representation of the word;
the sentence state coding module is used for calculating a weight for each word through a one-layer fully connected network and a softmax function, and computing the weighted sum to obtain the state representation of a sentence;
and the overall state coding module is used for calculating a weight for each sentence through a one-layer fully connected network and a softmax function, and computing the weighted sum to obtain the overall state representation of the text.
The semantic similarity calculation module 3 is used for fitting the relationship between each pair of corresponding sentences in the resume text and the recruitment information text through the similarity matrix and the bias matrix to obtain a semantic similarity matrix, extracting the maximum similarity of each sentence in the resume text and the recruitment information text from the semantic similarity matrix, and calculating the targeted representation of each sentence and the overall targeted representation of the resume text and of the recruitment information text.
The text matching degree prediction module 4 is used for splicing the overall state representation and the overall targeted representation of the resume text and of the recruitment information text, and inputting the result into the multilayer perceptron to predict the matching degree of the resume text and the recruitment information text.
The interpretable recommendation module 5 is used for giving recommendations and interpretations, selecting resume texts with the highest matching degree from the resume library according to the recruitment information texts submitted by the recruiter for recommendation and giving a recommendation reason; aiming at job seekers, selecting a recruitment information text with the highest matching degree from a recruitment information base according to resume texts submitted by the job seekers for recommendation and giving a recommendation reason;
the interpretable recommendation module 5 includes:
the recommendation module is used for recommending resume texts or recruitment information texts most relevant to the texts provided by the user according to the matching degree;
the explanation module is used for giving the explanation of the recommendation: for a recruiter user, the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of the recommended resume text, and the explanation is given to the user; for a job seeker user, the three pairs of sentences with the strongest relevance are found in the semantic similarity matrix of the recommended recruitment information text, and the explanation is given to the user.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned examples, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.