CN113268560A - Method and device for text matching - Google Patents
Method and device for text matching
- Publication number: CN113268560A (application CN202010097271.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- resume
- vector
- keywords
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3344—Query execution using natural language analysis
- G06F16/338—Presentation of query results
- G06F16/367—Ontology
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Abstract
The invention discloses a method and a device for text matching, and relates to the field of computer technology. The method comprises the following steps: extracting keywords from a resume text and a position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, generating a vector of the resume text and a vector of the position text based on a trained first neural network model; and determining the similarity between the vector of the resume text and the vector of the position text, and judging, according to the similarity, whether the resume text matches the position text. Through these steps, the matching accuracy of resume texts and position texts can be improved, and the accuracy of position or resume recommendation services improved in turn.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for text matching.
Background
With the development of internet technology, online recruitment and job hunting have become important channels for recruiters and job seekers. Faced with the large number of positions published on the internet, job seekers often find it difficult to quickly locate a suitable position; likewise, recruiters often find it difficult to quickly locate resumes that suit their posts. Accurately matching resumes with position information, and thereby providing accurate recommendation services, is the key to solving these problems.
In the process of implementing the invention, the inventors found that in the prior art, position or resume recommendation is usually made according to a user's browsing history, and rarely by calculating whether a resume matches position information. Even where such matching is calculated, the matching accuracy between resumes and position information is low, and the accuracy of position or resume recommendation is correspondingly poor.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for text matching, which can improve matching accuracy of a resume text and a job text, and further improve accuracy of job or resume recommendation service.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for text matching.
The method for text matching of the invention comprises the following steps: extracting keywords from the resume text and the position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, generating a vector of the resume text and a vector of the position text based on a trained first neural network model; and determining the similarity between the vector of the resume text and the vector of the position text, and judging, according to the similarity, whether the resume text matches the position text.
Optionally, the method further comprises: when the number of keywords in the resume text or the number of keywords in the position text is not greater than the preset threshold, generating the vector of the resume text and the vector of the position text based on a trained second neural network model, where the second neural network model is a pre-trained model.
Optionally, the method further comprises: constructing a first twin network based on the first neural network model, and constructing a second twin network based on the second neural network model; carrying out supervised training on the first twin network according to a first training sample set to obtain the trained first neural network model; carrying out supervised training on the second twin network according to a second training sample set to obtain the trained second neural network model; wherein the first training sample set and the second training sample set comprise: sample pairs of resume text and job text with category labels; the class label is used to indicate whether the sample pair matches.
Optionally, the method further comprises: acquiring a sample pair of a resume text and a position text with category labels, performing data augmentation on the sample pair, and constructing a first training sample set and a second training sample set according to the sample pair subjected to data augmentation.
Optionally, the first neural network model is a TextCNN model, and the second neural network model comprises a BERT model and a fully connected layer.
Optionally, generating the vector of the resume text and the vector of the position text based on the trained first neural network model includes: vectorizing each keyword in the resume text to obtain a word vector for each keyword; inputting the word vectors of the keywords in the resume text into the trained TextCNN model to obtain the vector of the resume text; vectorizing each keyword in the position text to obtain a word vector for each keyword; and inputting the word vectors of the keywords in the position text into the trained TextCNN model to obtain the vector of the position text.
Optionally, generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into a trained BERT model to obtain a vector for each paragraph text; inputting the vectors of the paragraph texts into a trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, then inputting it sequentially into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
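A minimal sketch of this paragraph-splitting scheme, with a stub standing in for the trained BERT model. The paragraph length, the 768-dimensional paragraph vectors, the 32-dimensional output, and the mean pooling used to combine paragraph vectors before the fully connected layer are all illustrative assumptions, not details from the patent:

```python
import numpy as np

PARA_LEN = 128  # assumed paragraph length in tokens

def split_into_paragraphs(tokens, para_len=PARA_LEN):
    """Divide the resume token sequence into equal-length paragraph texts,
    padding the last paragraph so all of them have the same length."""
    pad = (-len(tokens)) % para_len
    tokens = tokens + ["[PAD]"] * pad
    return [tokens[i:i + para_len] for i in range(0, len(tokens), para_len)]

def bert_stub(paragraph):
    """Stand-in for the trained BERT model: maps a paragraph to a 768-d vector."""
    rng = np.random.default_rng(abs(hash(tuple(paragraph))) % (2 ** 32))
    return rng.normal(size=768)

def text_vector(tokens, w_fc):
    """Encode each paragraph with the BERT stub, mean-pool the paragraph
    vectors, and pass the result through the fully connected layer
    (here a plain linear map)."""
    paragraphs = split_into_paragraphs(tokens)
    para_vecs = np.stack([bert_stub(p) for p in paragraphs])
    return para_vecs.mean(axis=0) @ w_fc

w_fc = np.random.default_rng(0).normal(size=(768, 32))  # fully connected layer
vec = text_vector(["java", "spring"] * 100, w_fc)       # 200-token toy resume
assert vec.shape == (32,)
```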
To achieve the above object, according to a second aspect of the present invention, there is provided a resume recommendation method.
The resume recommendation method of the invention comprises the following steps: extracting keywords from the resume text and the position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, generating a vector of the resume text and a vector of the position text based on a trained first neural network model; determining the similarity between the vector of the resume text and the vector of the position text, and judging, according to the similarity, whether the resume text matches the position text; and generating a resume recommendation list from the resume texts that match the position text, and sending the resume recommendation list to the recruiting user terminal corresponding to the position text.
To achieve the above object, according to a third aspect of the present invention, a job recommendation method is provided.
The position recommendation method of the invention comprises the following steps: extracting keywords from the resume text and the position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, generating a vector of the resume text and a vector of the position text based on a trained first neural network model; determining the similarity between the vector of the resume text and the vector of the position text, and judging, according to the similarity, whether the resume text matches the position text; and generating a position recommendation list from the position texts that match the resume text, and sending the position recommendation list to the job-seeking user terminal corresponding to the resume text.
To achieve the above object, according to a fourth aspect of the present invention, there is provided an apparatus for text matching.
The device for text matching of the invention comprises: an extraction module for extracting keywords from the resume text and the position text based on a knowledge graph; a generation module for generating a vector of the resume text and a vector of the position text based on a trained first neural network model when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold; and a judgment module for determining the similarity between the vector of the resume text and the vector of the position text and judging, according to the similarity, whether the resume text matches the position text.
To achieve the above object, according to a fifth aspect of the present invention, there is provided a resume recommending apparatus.
The resume recommending device of the invention comprises: an extraction module for extracting keywords from the resume text and the position text based on a knowledge graph; a generation module for generating a vector of the resume text and a vector of the position text based on a trained first neural network model when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold; a judgment module for determining the similarity between the vector of the resume text and the vector of the position text and judging, according to the similarity, whether the resume text matches the position text; and a resume recommendation module for generating a resume recommendation list from the resume texts that match the position text and sending it to the recruiting user terminal corresponding to the position text.
To achieve the above object, according to a sixth aspect of the present invention, there is provided a position recommending apparatus.
The position recommendation device of the invention comprises: an extraction module for extracting keywords from the resume text and the position text based on a knowledge graph; a generation module for generating a vector of the resume text and a vector of the position text based on a trained first neural network model when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold; a judgment module for determining the similarity between the vector of the resume text and the vector of the position text and judging, according to the similarity, whether the resume text matches the position text; and a position recommendation module for generating a position recommendation list from the position texts that match the resume text and sending it to the job-seeking user terminal corresponding to the resume text.
To achieve the above object, according to a seventh aspect of the present invention, there is provided an electronic apparatus.
The electronic device of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for text matching, the resume recommendation method, or the job recommendation method of the invention.
To achieve the above object, according to an eighth aspect of the present invention, there is provided a computer-readable medium.
The computer-readable medium of the invention stores a computer program which, when executed by a processor, implements the method for text matching, the resume recommendation method, or the job recommendation method of the invention.
One embodiment of the above invention has the following advantage or benefit: keywords are extracted from the resume text and the position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, a vector of the resume text and a vector of the position text are generated based on a trained first neural network model; and the similarity between the two vectors is determined and used to judge whether the resume text matches the position text. In this way, the matching accuracy of resume texts and position texts can be improved, and the accuracy of position or resume recommendation services improved in turn.
Further effects of the above optional embodiments are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a main flow diagram of a method for text matching according to a first embodiment of the present invention;
FIG. 2 is a partial flow diagram of a method for text matching according to a second embodiment of the present invention;
FIG. 3 is a partial flow diagram of a method for text matching according to a second embodiment of the present invention;
FIG. 4 is a schematic main flow chart of a resume recommendation method according to a third embodiment of the present invention;
fig. 5 is a schematic main flow chart of a job recommendation method according to a fourth embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of an apparatus for text matching according to a fifth embodiment of the present invention;
FIG. 7 is a schematic diagram of the main blocks of a resume recommendation apparatus according to a sixth embodiment of the present invention;
fig. 8 is a schematic view of the main blocks of a position recommending apparatus according to a seventh embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 10 is a block diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 1 is a main flow diagram of a method for text matching according to a first embodiment of the present invention. As shown in fig. 1, the method for text matching according to the embodiment of the present invention includes:
and S101, extracting key words in the resume text and the position text based on the knowledge graph.
A knowledge graph is a graph-based data structure consisting of nodes and edges. Each node represents an entity existing in the real world, and each edge represents a relationship between entities. In general, a knowledge graph is a relational network obtained by connecting heterogeneous information of all kinds.
For example, step S101 may specifically include: 1. identifying keywords in the resume text or the position text based on the knowledge graph, where the keywords are mainly skill vocabulary; for instance, the word "vue" in a resume text appears in the knowledge graph as a skill word and can therefore be extracted; 2. merging synonyms based on the knowledge graph; for instance, "vue", "vue.js", "vuejs", "vue framework" and "vue front end framework" all have the same meaning and can be merged, e.g. uniformly represented as "vue".
In a specific implementation, the knowledge graph can be stored in a graph database (such as Neo4j), and keywords in the resume text and the position text can then be extracted by accessing the knowledge graph in the graph database. Alternatively, to facilitate fast online access, the knowledge graph can be stored in memory in document form, and keywords can be extracted by accessing the documents corresponding to the knowledge graph. These documents may comprise: a vocabulary index table, a relation index table, and a relation table. The vocabulary index table may contain an index for each word in the knowledge graph, for example Vue with an index of 1 and vue.js with an index of 2. The relation index table may contain an index for each relation in the knowledge graph, for example "synonym" with an index of 1, "same range" with an index of 2, "child skill" with an index of 3, and "parent skill" with an index of 4. The relation table may contain the relationships between words in "subject-relation-object" form, such as "112" representing that Vue and vue.js are synonyms.
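The document form described above can be sketched as follows. The index values, the vocabulary, and the helper names (`vocab_index`, `extract_keywords`, and so on) are illustrative assumptions, not taken from the patent:

```python
# Hypothetical in-memory document form of the knowledge graph: a vocabulary
# index table, a relation index table, and a relation table of
# "subject-relation-object" triples.
vocab_index = {"vue": 1, "vue.js": 2, "vuejs": 3, "react": 4}
relation_index = {"synonym": 1, "same_range": 2, "child_skill": 3, "parent_skill": 4}
relation_table = [(1, 1, 2), (1, 1, 3)]  # e.g. (1, 1, 2): "vue" and "vue.js" are synonyms

# Build a synonym-merge map: every synonym is replaced by a canonical word.
index_to_word = {i: w for w, i in vocab_index.items()}
canonical = {}
for subj, rel, obj in relation_table:
    if rel == relation_index["synonym"]:
        canonical[index_to_word[obj]] = index_to_word[subj]

def extract_keywords(text_tokens):
    """Keep only tokens present in the knowledge graph, merging synonyms."""
    found = [t for t in text_tokens if t in vocab_index]
    return sorted({canonical.get(t, t) for t in found})

print(extract_keywords(["vue", "vuejs", "css", "react"]))  # → ['react', 'vue']
```

Here "vuejs" is merged into "vue" through the synonym relation, while "css" is dropped because it is not in the (toy) vocabulary.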
In the embodiment of the invention, because resumes span many industry fields, word segmentation based on a general-purpose dictionary has difficulty maintaining accuracy in every field. Since the knowledge graph carries clear industry-field information, introducing it for keyword extraction solves this problem. Moreover, the knowledge graph contains relationships between words, such as synonym relationships; introducing it allows words with the same meaning to be merged, which improves subsequent matching accuracy, and allows more keywords to be extracted through the synonym relationships.
Step S102: when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, generating a vector of the resume text and a vector of the position text based on the trained first neural network model.
For example, in a specific application scenario, the preset threshold may be set to 5. When the number of the keywords extracted from the resume text is more than 5 and the number of the keywords extracted from the position text is more than 5, the number of the keywords extracted through the knowledge graph is considered to be enough, and then the vectors of the resume text and the position text can be generated based on the trained first neural network model. The preset threshold value related to the number of keywords can be flexibly set without affecting the implementation of the present invention.
Illustratively, the trained first neural network model may be obtained by: constructing a first twin network based on a first neural network model, and carrying out supervised training on the first twin network according to a first training sample set to obtain the trained first neural network model; the first set of training samples comprises: sample pairs of resume text and job text with category labels; the class label is used to indicate whether the sample pair matches.
The first twin network (Siamese network) has two sub-networks with the same structure and shared weights. Each sub-network receives one element of a sample pair (one receives the resume text and the other the position text) and converts it into a vector. During training, a loss function is calculated from the two output vectors and back-propagated to optimize the twin network, so that matched position and resume texts end up close together in the vector space and unmatched ones far apart.
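A minimal sketch of such a twin network with a contrastive loss, assuming a single shared linear layer as a stand-in for the first neural network model. The input/output dimensions, batch size, and margin value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32)) * 0.1  # weights shared by both sub-networks

def encode(x):
    """Shared encoder applied by both branches of the twin network
    (one linear layer stands in for the first neural network model)."""
    return x @ W

def contrastive_loss(v1, v2, labels, margin=1.0):
    """labels: 1 for a matching resume/position pair, 0 for a non-matching one.
    Matching pairs are pulled together; non-matching pairs are pushed apart
    until their distance exceeds the margin."""
    d = np.linalg.norm(v1 - v2, axis=1)
    return float(np.mean(labels * d ** 2 + (1 - labels) * np.maximum(margin - d, 0.0) ** 2))

resumes = rng.normal(size=(8, 64))    # batch of resume-text features
positions = rng.normal(size=(8, 64))  # batch of position-text features
labels = rng.integers(0, 2, size=8)   # category labels of the sample pairs
loss = contrastive_loss(encode(resumes), encode(positions), labels)
assert loss >= 0.0
```

Back-propagating this loss and updating `W` (omitted here) is what makes matched texts close and unmatched texts distant in the vector space.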
In an alternative example, the first neural network model is a TextCNN model. The TextCNN model is a convolutional neural network model. In this alternative example, a vector of resume text, and a vector of position text, may be generated based on the trained TextCNN model. In addition, when implemented, the first neural network model may also employ an LSTM (long short term memory network) model, or other models that may be used to generate a vector representation of text.
Step S103, determining the similarity of the vectors of the resume texts and the vectors of the position texts, and then judging whether the resume texts are matched with the position texts according to the similarity.
In this step, the cosine similarity between the vector of the resume text and the vector of the position text may be calculated and taken as the similarity result. Alternatively, the similarity may be calculated in other ways; for example, the Euclidean distance between the two vectors may be calculated and used as the similarity result.
Illustratively, judging whether the resume text and the position text match according to the similarity includes: comparing the similarity of the two vectors with a preset similarity threshold; determining that the resume text matches the position text when the similarity is greater than the threshold; and otherwise determining that they do not match. The threshold can be set flexibly, for example to 0.9, 0.8, or another value.
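The similarity computation and threshold comparison can be sketched as follows; the threshold value 0.8 and the function names are illustrative:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two text vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_match(resume_vec, position_vec, threshold=0.8):
    """Judge whether a resume matches a position by comparing the cosine
    similarity of their vectors with a preset similarity threshold."""
    return cosine_similarity(resume_vec, position_vec) > threshold

u = np.array([1.0, 0.0, 1.0])  # toy resume vector
v = np.array([1.0, 0.1, 0.9])  # toy position vector
print(is_match(u, v))  # → True (the vectors point in nearly the same direction)
```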
In the embodiment of the invention, keywords are extracted from the resume text and the position text based on a knowledge graph; when the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold, a vector of the resume text and a vector of the position text are generated based on a trained first neural network model; and the similarity between the two vectors is determined and used to judge whether the resume text matches the position text. In this way, the matching accuracy of resume texts and position texts can be improved, and the accuracy of position or resume recommendation services improved in turn.
Fig. 2 is a partial flow diagram of a method for text matching according to a second embodiment of the present invention. As shown in fig. 2, the method for text matching according to the embodiment of the present invention includes:
step S201, extracting keywords in the resume text and the position text based on the knowledge graph.
A knowledge graph is a graph-based data structure consisting of nodes and edges. Each node represents an entity existing in the real world, and each edge represents a relationship between entities. In general, a knowledge graph is a relational network obtained by connecting heterogeneous information of all kinds.
For example, step S201 may specifically include: 1. identifying keywords in the resume text or the position text based on the knowledge graph, where the keywords are mainly skill vocabulary; for instance, the word "vue" in a resume text appears in the knowledge graph as a skill word and can therefore be extracted; 2. merging synonyms based on the knowledge graph; for instance, "vue", "vue.js", "vuejs", "vue framework" and "vue front end framework" all have the same meaning and can be merged, e.g. uniformly represented as "vue".
In a specific implementation, the knowledge graph can be stored in a graph database (such as Neo4j), and keywords in the resume text and the position text can then be extracted by accessing the knowledge graph in the graph database. Alternatively, to facilitate fast online access, the knowledge graph can be stored in memory in document form, and keywords can be extracted by accessing the documents corresponding to the knowledge graph. These documents may comprise: a vocabulary index table, a relation index table, and a relation table. The vocabulary index table may contain an index for each word in the knowledge graph, for example Vue with an index of 1 and vue.js with an index of 2. The relation index table may contain an index for each relation in the knowledge graph, for example "synonym" with an index of 1, "same range" with an index of 2, "child skill" with an index of 3, and "parent skill" with an index of 4. The relation table may contain the relationships between words in "subject-relation-object" form, such as "112" representing that Vue and vue.js are synonyms.
In the embodiment of the invention, because resumes span many industry fields, word segmentation based on a general-purpose dictionary has difficulty maintaining accuracy in every field. Since the knowledge graph carries clear industry-field information, introducing it for keyword extraction solves this problem. Moreover, the knowledge graph contains relationships between words, such as synonym relationships; introducing it allows words with the same meaning to be merged, which improves subsequent matching accuracy, and allows more keywords to be extracted through the synonym relationships.
Step S202: judging whether the number of keywords in the resume text and the number of keywords in the position text are both greater than a preset threshold. If so, step S203 is performed; otherwise, step S204 is performed.
In practice, the knowledge graph is usually difficult to make complete, so when keywords are extracted based on it, the number of extracted keywords may be insufficient to support processing by the first neural network model. In view of this, in the embodiment of the invention, step S202 determines whether the number of extracted keywords is sufficient, and step S203 or step S204 is performed accordingly.
For example, in a specific application scenario, the preset threshold may be set to 5. When the number of the keywords extracted from the resume text is more than 5 and the number of the keywords extracted from the position text is more than 5, the number of the keywords extracted through the knowledge graph is considered to be enough; in the case where the number of keywords extracted from the resume text is less than or equal to 5, or the number of keywords extracted from the position text is less than or equal to 5, it is considered that the number of keywords extracted through the knowledge graph is insufficient. In specific implementation, the preset threshold can be flexibly set according to requirements.
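A minimal sketch of this sufficiency check; the function name is hypothetical, and the default threshold of 5 is taken from the example scenario, not fixed by the method:

```python
def enough_keywords(resume_keywords, position_keywords, threshold=5):
    """Return True when BOTH texts yield more keywords than the threshold,
    i.e. when the first (TextCNN-based) model can be used (step S203);
    otherwise the second (BERT-based) model is used (step S204)."""
    return len(resume_keywords) > threshold and len(position_keywords) > threshold
```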
And S203, generating a resume text vector and a position text vector based on the trained first neural network model.
Wherein, the first neural network model can adopt a TextCNN model. The TextCNN model is a convolutional neural network model.
In the embodiment of the present invention, generating the vector of the resume text and the vector of the position text based on the trained TextCNN model may specifically include: step a1 to step a 4.
Step a1, performing vectorization processing on each keyword in the resume text to obtain a word vector corresponding to each keyword. For example, the keywords extracted from the resume text may be input into a word2vec model to obtain the word vector corresponding to each keyword.
Step a2, inputting the word vector corresponding to each keyword in the resume text into the trained TextCNN model to obtain the vector of the resume text. In this step, the word vectors corresponding to the keywords in the resume text form a matrix, and the matrix can be input into the trained TextCNN model to output a vector of the resume text with a specified length (for example, 32 dimensions).
Step a3, performing vectorization processing on the keywords in the position text to obtain word vectors corresponding to the keywords. For example, keywords extracted from the job text may be input into the word2vec model to obtain a word vector corresponding to each keyword.
Step a4, inputting the word vector corresponding to each keyword in the position text into the trained TextCNN model to obtain the vector of the position text. In this step, the word vectors corresponding to the keywords in the position text form a matrix, and the matrix can be input into the trained TextCNN model to output a vector of the position text with a specified length (for example, 32 dimensions).
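The flow of steps a1 to a4 can be sketched with a toy NumPy stand-in for the TextCNN: convolution filters slide over the keyword matrix and max-over-time pooling yields a fixed-length vector (e.g. 32 dimensions) regardless of how many keywords were extracted. Random weights stand in for a trained model, and the filter count and dimensions are illustrative:

```python
import numpy as np

def textcnn_encode(word_vectors, filters, kernel_size=3):
    """Minimal TextCNN-style encoder: 1-D convolutions over the keyword
    matrix followed by max-over-time pooling, producing one value per
    filter, so the output length equals the number of filters."""
    n, d = word_vectors.shape              # n keywords, d-dim word vectors
    feats = []
    for w in filters:                      # each filter has shape (kernel_size, d)
        conv = [np.sum(word_vectors[i:i + kernel_size] * w)
                for i in range(n - kernel_size + 1)]
        feats.append(max(conv))            # max-over-time pooling
    return np.array(feats)

rng = np.random.default_rng(0)
words = rng.standard_normal((8, 100))          # 8 keywords, 100-dim vectors
filters = rng.standard_normal((32, 3, 100))    # 32 filters -> 32-dim text vector
vec = textcnn_encode(words, filters)
```

A real TextCNN would use several kernel sizes and trained weights, but the shape behavior — variable-length input, fixed-length output — is the property the method relies on.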
And S204, generating a vector of the resume text and a vector of the position text based on the trained second neural network model.
The second neural network model is a pre-trained model. In specific implementation, the trained second neural network model can be obtained by fine-tuning the pre-trained model. Illustratively, the second neural network model may include: a BERT model, and a fully connected layer. In this example, generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes steps b1 to b3. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model developed by Google. In specific implementation, different BERT models can be selected according to the NLP (natural language processing) task.
Step b1, dividing the resume text into a plurality of paragraph texts with equal length, and inputting the paragraph texts into the trained (or fine-tuned) BERT model to obtain the vector corresponding to each paragraph text.
Considering that the resume text usually contains far more words than the position text, this difference in length is likely to affect subsequent matching accuracy. In view of this, in the embodiment of the present invention, the resume text is divided into a plurality of paragraph texts of equal length, and the position text is converted into a text of the same length as the paragraph texts. For example, the resume text may be divided into 4 paragraph texts of 150 characters each: the job, profession and self-description parts of the resume may form one paragraph text, the work description part another, the work duties part a third, and the project description part a fourth. The four paragraph texts may then be input into the trained BERT model to obtain their corresponding vectors (for example, 4 vectors of 768 dimensions each).
Step b2, inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text.
In this step, the vectors corresponding to the paragraph texts can be merged into the vector of the resume text by inputting them into the trained fully connected layer. For example, a 32-dimensional resume text vector can be obtained by inputting the vectors corresponding to 4 paragraph texts of 768 dimensions each into a 4 × 768 × 32 fully connected layer.
Step b3, converting the position text into a text of the same length as the paragraph texts, and then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
For example, the job text may be converted to a paragraph text of 150 characters in length, which is sequentially input into the trained BERT model and a trained 768 × 32 fully-connected layer, resulting in a 32-dimensional job text vector.
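The merge in step b2 amounts to one affine map from the concatenated paragraph vectors to the resume vector. A NumPy sketch with random weights standing in for the trained 4 × 768 × 32 layer (names are illustrative):

```python
import numpy as np

def merge_paragraphs(paragraph_vecs, weight, bias):
    """Merge BERT paragraph vectors into one resume vector via a single
    fully connected layer: concatenate, then apply weight @ x + bias."""
    flat = np.concatenate(paragraph_vecs)   # (4 * 768,)
    return weight @ flat + bias             # (32,)

rng = np.random.default_rng(1)
paras = [rng.standard_normal(768) for _ in range(4)]   # 4 paragraph vectors
W = rng.standard_normal((32, 4 * 768))                 # untrained stand-in weights
b = np.zeros(32)
resume_vec = merge_paragraphs(paras, W, b)             # 32-dim resume vector
```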
And S205, determining the similarity between the vector of the resume text and the vector of the position text.
In this step, the cosine similarity between the vector of the resume text and the vector of the position text may be calculated and used as the similarity result. Alternatively, the similarity between the two vectors may be calculated in other ways; for example, the Euclidean distance between the vector of the resume text and the vector of the position text may be calculated and used as the similarity measure.
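Both similarity options mentioned here are one-liners over the two text vectors; a NumPy sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 for parallel vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; 0.0 for identical vectors."""
    return float(np.linalg.norm(a - b))

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])
sim = cosine_similarity(v1, v2)   # parallel vectors -> similarity of 1.0
```

Note the two measures point in opposite directions: higher cosine similarity means a better match, while a smaller Euclidean distance does.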
And step S206, judging whether the similarity is greater than a preset threshold value. If the similarity is greater than the preset threshold, step S207 may be performed; in the case where the similarity is less than or equal to the preset threshold, step S208 may be performed.
And the preset threshold value can be flexibly set. For example, the preset threshold may be set to 0.9, 0.8, or other values.
And step S207, determining that the resume text is matched with the position text.
And S208, determining that the resume text is not matched with the position text.
In the embodiment of the invention, keywords in the resume text and the position text are extracted based on the knowledge graph; according to how the numbers of keywords in the resume text and the position text compare with the preset threshold, the vectors of the resume text and the position text are generated based on either the trained first neural network model or the trained second neural network model; then, the similarity between the vector of the resume text and the vector of the position text is determined, and whether the resume text and the position text match is judged according to the similarity.
Fig. 3 is a partial flow diagram of a method for text matching according to a second embodiment of the present invention. Fig. 3 mainly illustrates a training flow of the first neural network model and the second neural network model. As shown in fig. 3, the method of the embodiment of the present invention further includes:
step S301, obtaining a sample pair of the resume text and the position text with the category labels, and performing data augmentation on the sample pair.
Wherein the category label is used for indicating whether the resume text and the position text in the sample pair are matched. In this step, the data augmentation may take various forms, such as randomly transposing the order of sentences, randomly transposing the order of words in sentences, randomly deleting words, randomly deleting sentences, and so on. For example, after dividing the resume text into a plurality of paragraph sections, the order of sentences may be randomly transposed within the respective sections, the order of words in the sentences may be randomly transposed, and so forth.
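A minimal sketch of these augmentation operations (the perturbation probabilities and function name are arbitrary illustrative choices, not specified by the method):

```python
import random

def augment(sentences, seed=None):
    """Produce one augmented copy of a text, given as a list of sentences:
    randomly transpose sentence order, randomly transpose word order within
    a sentence, and randomly delete a word."""
    rng = random.Random(seed)
    sents = [s.split() for s in sentences]
    rng.shuffle(sents)                              # transpose sentence order
    for words in sents:
        if rng.random() < 0.5:
            rng.shuffle(words)                      # transpose word order
        if len(words) > 1 and rng.random() < 0.3:
            words.pop(rng.randrange(len(words)))    # delete a random word
    return [" ".join(w) for w in sents]

aug = augment(["built web apps with vue", "led a team of five"], seed=42)
```

Running the function several times with different seeds over each labeled pair multiplies the number of training samples while preserving the pair's category label.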
In the embodiment of the invention, performing data augmentation on the labeled sample pairs increases their number, alleviating the need for a large quantity of labeled samples in supervised learning and helping to improve the subsequent model training effect.
Step S302, a first training sample set and a second training sample set are constructed according to the sample pairs after data augmentation.
Step S303, performing supervised training on the first twin network according to a first training sample set to obtain a trained first neural network model.
The first twin network (Siamese network) is mainly constructed based on the first neural network model (such as a TextCNN model) and has two sub-networks with the same structure and shared weights. The two sub-networks each receive one element of a sample pair (i.e., one receives the resume text in the sample pair and the other receives the position text) and convert it into a vector. During training, a loss function can be calculated from the two output vectors to continuously back-propagate and optimize the first twin network, so that matched position texts and resume texts are close to each other in the vector space while unmatched ones are far apart.
And S304, performing supervised training on the second twin network according to a second training sample set to obtain a trained second neural network model.
The second twin network (Siamese network) is mainly constructed based on the second neural network model and comprises two sub-networks with the same structure and shared weights. The two sub-networks each receive one element of a sample pair (i.e., one receives the resume text in the sample pair and the other receives the position text) and convert it into a vector. During training, a loss function can be calculated from the two output vectors to continuously back-propagate and optimize the second twin network, so that matched position texts and resume texts are close to each other in the vector space while unmatched ones are far apart.
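The text only says "a loss function" is calculated from the two output vectors; a common choice for Siamese networks with exactly this pull-together/push-apart behavior is the contrastive loss, sketched here as an assumption rather than the patent's stated method:

```python
import numpy as np

def contrastive_loss(vec_a, vec_b, label, margin=1.0):
    """Contrastive loss over one sample pair: matched pairs (label=1) are
    penalized by their squared distance, unmatched pairs (label=0) are
    penalized only while closer than the margin."""
    d = np.linalg.norm(vec_a - vec_b)
    if label == 1:                              # matched resume/position pair
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2      # unmatched pair

loss_same = contrastive_loss(np.ones(3), np.ones(3), label=1)   # identical -> 0
```

Minimizing this loss over the training set drives matched texts together and unmatched texts beyond the margin, which is the geometry both twin networks aim for.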
The second neural network model is a pre-trained model, so the process of performing supervised training on the second twin network can be understood as fine-tuning it. Illustratively, the second neural network model may be composed of a BERT model and a fully connected layer.
In the embodiment of the invention, the training of the first neural network model and the second neural network model can be realized through the steps, so that the distance between the matched position text and the resume text in the vector space is short, and the distance between the unmatched position text and the resume text in the vector space is long, thereby optimizing the quality of the text vectors generated by the first neural network model and the second neural network model and being beneficial to improving the matching accuracy of the subsequent resume text and the position text.
Fig. 4 is a main flowchart of a resume recommendation method according to a third embodiment of the present invention. As shown in fig. 4, the resume recommendation method according to the embodiment of the present invention includes:
and S401, extracting key words in the resume text and the position text based on the knowledge graph.
Exemplarily, step S401 may specifically include: 1. identifying keywords in the resume text or the position text based on the knowledge graph, where the keywords are mainly skill words; for example, the word "vue" in the resume text appears in the knowledge graph and is a skill word, so it can be extracted; 2. merging synonyms based on the knowledge graph; for example, "vue", "vue.js", "vuejs", "vue framework" and "vue front end framework" all have the same meaning, and these synonyms can be merged based on the knowledge graph, e.g., uniformly denoted as "vue".
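The two sub-steps can be sketched as follows; the skill set and synonym map are tiny hypothetical stand-ins for what would be read from the knowledge-graph documents:

```python
# Hypothetical stand-ins for knowledge-graph content.
SKILL_WORDS = {"vue", "vue.js", "vuejs", "vue framework", "python"}
SYNONYMS = {"vue.js": "vue", "vuejs": "vue", "vue framework": "vue"}

def extract_keywords(text):
    """Step 1: keep only words that appear in the knowledge graph as skills.
    Step 2: map each synonym to its canonical form and deduplicate."""
    found = [w for w in text.lower().split() if w in SKILL_WORDS]
    return sorted({SYNONYMS.get(w, w) for w in found})

kws = extract_keywords("Developed dashboards in vuejs and Python")
```

A production version would match multi-word skills and query the real graph, but the identify-then-merge structure is the same.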
In the embodiment of the invention, considering that resumes span many different industry fields, word segmentation based on a common dictionary can hardly maintain adequate accuracy in every field. Since the knowledge graph carries explicit industry-field information, introducing it for keyword extraction solves this problem. Moreover, the knowledge graph records relationships among vocabulary items, such as synonym relationships; by introducing the knowledge graph, words with the same meaning can be merged, which improves subsequent matching accuracy, and the synonym relationships also make it easier to extract more keywords.
And S402, under the condition that the number of the keywords in the resume text and the number of the keywords in the position text are both larger than a preset threshold value, generating a vector of the resume text and a vector of the position text based on the trained first neural network model.
In practice, the knowledge graph is usually difficult to make complete, so when keywords are extracted based on it, the number of extracted keywords may be insufficient to support processing by the first neural network model. In view of this, in the embodiment of the present invention, when the number of extracted keywords is determined to be sufficient, the vector of the resume text and the vector of the position text may be generated based on the trained first neural network model.
In an alternative example, the first neural network model is a TextCNN model. The TextCNN model is a convolutional neural network model. In this alternative example, a vector of resume text, and a vector of position text, may be generated based on the trained TextCNN model. In addition, when implemented, the first neural network model may also employ an LSTM (long short term memory network) model, or other models that may be used to generate a vector representation of text.
Further, the method of the embodiment of the present invention may further include the steps of: and under the condition that the number of the keywords in the resume text or the number of the keywords in the position text is less than or equal to a preset threshold value, generating a vector of the resume text and a vector of the position text based on the trained second neural network model.
In one optional example, the second neural network model may include: a BERT model, and a fully connected layer. In this example, generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
Step S403, determining the similarity of the vectors of the resume text and the vectors of the position text, and judging whether the resume text is matched with the position text according to the similarity.
In step S403, judging whether the resume text matches the position text according to the similarity may include: comparing the similarity between the vector of the resume text and the vector of the position text with a preset threshold; determining that the resume text matches the position text when the similarity is greater than the preset threshold; otherwise, determining that the resume text does not match the position text. The preset threshold can be set flexibly; for example, it may be set to 0.9, 0.8, or another value.
And S404, generating a resume recommendation list according to the resume text matched with the position text, and sending the resume recommendation list to a recruitment user terminal corresponding to the position text.
For example, assuming that there are 100 resume texts in total, if it is determined through steps S401 to S403 that there are 20 resume texts matching the job text 1, a resume recommendation list may be generated from the 20 matching resume texts and sent to the recruitment user terminal corresponding to the job text 1.
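The recommendation step reduces to filtering resumes by similarity against the threshold; a sketch with hypothetical names and a toy similarity function standing in for the vector similarity of step S403:

```python
def recommend_resumes(position_vec, resume_vecs, similarity, threshold=0.9):
    """Return the ids of resumes whose similarity to the position
    exceeds the threshold; these form the resume recommendation list."""
    return [rid for rid, rvec in resume_vecs.items()
            if similarity(position_vec, rvec) > threshold]

# Toy 1-D "vectors" just to show the flow; real inputs would be the
# 32-dimensional text vectors and cosine similarity.
matches = recommend_resumes(
    1.0, {"r1": 0.99, "r2": 0.2},
    similarity=lambda a, b: 1.0 - abs(a - b))
```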
According to the embodiment of the invention, the matching accuracy of the resume text and the job text can be improved through the steps, and the accuracy of the resume recommendation service is further improved.
Fig. 5 is a main flowchart of a job recommendation method according to a fourth embodiment of the present invention. As shown in fig. 5, the job recommendation method according to the embodiment of the present invention includes:
and S501, extracting key words in the resume text and the position text based on the knowledge graph.
In step S501, with respect to how to extract keywords in the resume text and the position text based on the knowledge graph, reference may be made to related contents in the embodiment shown in fig. 4.
And step S502, under the condition that the number of the keywords in the resume text and the number of the keywords in the position text are both larger than a preset threshold value, generating a vector of the resume text and a vector of the position text based on the trained first neural network model.
In practice, the knowledge graph is usually difficult to make complete, so when keywords are extracted based on it, the number of extracted keywords may be insufficient to support processing by the first neural network model. In view of this, in the embodiment of the present invention, when the number of extracted keywords is determined to be sufficient, the vector of the resume text and the vector of the position text may be generated based on the trained first neural network model.
In an alternative example, the first neural network model is a TextCNN model. The TextCNN model is a convolutional neural network model. In this alternative example, a vector of resume text, and a vector of position text, may be generated based on the trained TextCNN model. In addition, when implemented, the first neural network model may also employ an LSTM (long short term memory network) model, or other models that may be used to generate a vector representation of text.
Further, the method of the embodiment of the present invention may further include the steps of: and under the condition that the number of the keywords in the resume text or the number of the keywords in the position text is less than or equal to a preset threshold value, generating a vector of the resume text and a vector of the position text based on the trained second neural network model.
In one optional example, the second neural network model may include: a BERT model, and a fully connected layer. In this example, generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
Step S503, determining the similarity of the vectors of the resume text and the vectors of the position text, and judging whether the resume text is matched with the position text according to the similarity.
In this step, the cosine similarity between the vector of the resume text and the vector of the position text may be calculated and used as the similarity result. Alternatively, the similarity between the two vectors may be calculated in other ways; for example, the Euclidean distance between the vector of the resume text and the vector of the position text may be calculated and used as the similarity measure.
For example, in step S503, the determining whether the resume text and the position text match according to the similarity includes: comparing the similarity of the vectors of the resume text and the job text with a preset threshold; determining that the resume text is matched with the position text under the condition that the similarity is greater than a preset threshold value; otherwise, determining that the resume text does not match the job text. And the preset threshold value can be flexibly set. For example, the preset threshold may be set to 0.9, 0.8, or other values.
And step S504, generating a job recommendation list according to the job text matched with the resume text, and sending the job recommendation list to the job hunting user terminal corresponding to the resume text.
For example, assuming that there are 200 job texts in total, if it is determined through steps S501 to S503 that 10 job texts match the resume text 1, a job recommendation list may be generated from the 10 matched job texts and sent to the job-hunting user terminal corresponding to the resume text 1.
According to the embodiment of the invention, the matching accuracy of the resume text and the job text can be improved through the steps, and the accuracy of job recommendation service is further improved.
Fig. 6 is a schematic diagram of main blocks of an apparatus for text matching according to a fifth embodiment of the present invention. As shown in fig. 6, an apparatus 600 for text matching according to an embodiment of the present invention includes: an extraction module 601, a generation module 602, and a judgment module 603.
And the extraction module 601 is used for extracting keywords in the resume text and the position text based on the knowledge graph.
The knowledge graph is a graph-based data structure composed of nodes (vertices) and edges. In the knowledge graph, each node represents an "entity" existing in the real world, and each edge represents a "relationship" between entities. In general, a knowledge graph is a relational network obtained by connecting all kinds of heterogeneous information.
For example, the extracting module 601 may specifically include, based on the knowledge graph, extracting the keywords in the resume text and the position text: 1. the extraction module 601 identifies keywords in the resume text or the position text based on the knowledge graph, wherein the keywords are mainly skill vocabularies, for example, if the word "vue" in the resume text appears in the knowledge graph and is a skill vocabulary, the word can be extracted; 2. extraction module 601 performs synonym combination based on the knowledge graph, such as "vue", "vue. js", "vuejs", "vue framework" and "vue front end framework" all having the same meaning, and may combine these synonyms based on the knowledge graph, such as collectively denoted as "vue".
In specific implementation, the knowledge graph can be stored in a graph database (such as Neo4j), and keywords in the resume text and the position text can then be extracted by accessing the knowledge graph in the graph database. Alternatively, to support fast online access, the knowledge graph can be stored in memory in document form, and keywords in the resume text and the position text can be extracted by accessing the documents corresponding to the knowledge graph. The documents corresponding to the knowledge graph may include: a vocabulary index table, a relationship index table and a relationship table. The vocabulary index table may include indices of individual words in the knowledge graph, such as "Vue" with an index of 1 and "vue.js" with an index of 2; the relationship index table may include indices of the various relationships in the knowledge graph, such as the "synonym" relationship with an index of 1, the "same range" relationship with an index of 2, the "child skill" relationship with an index of 3, and the "parent skill" relationship with an index of 4; the relationship table may include relationships between the words in "subject-relationship-object" form, such as "112" representing that "Vue" (index 1) and "vue.js" (index 2) are synonyms (relationship 1).
In the embodiment of the invention, considering that resumes span many different industry fields, word segmentation based on a common dictionary can hardly maintain adequate accuracy in every field. Since the knowledge graph carries explicit industry-field information, introducing it for keyword extraction solves this problem. Moreover, the knowledge graph records relationships among vocabulary items, such as synonym relationships; by introducing the knowledge graph, words with the same meaning can be merged, which improves subsequent matching accuracy, and the synonym relationships also make it easier to extract more keywords.
A generating module 602, configured to generate a vector of the resume text and a vector of the position text based on the trained first neural network model under the condition that both the number of the keywords in the resume text and the number of the keywords in the position text are greater than a preset threshold.
For example, in a specific application scenario, the preset threshold may be set to 5. When the number of the keywords extracted from the resume text is more than 5 and the number of the keywords extracted from the position text is more than 5, the number of the keywords extracted through the knowledge graph is considered to be enough, and then the vectors of the resume text and the position text can be generated based on the trained first neural network model. The preset threshold value related to the number of keywords can be flexibly set without affecting the implementation of the present invention.
Illustratively, the trained first neural network model may be obtained by: constructing a first twin network based on a first neural network model, and carrying out supervised training on the first twin network according to a first training sample set to obtain the trained first neural network model; the first set of training samples comprises: sample pairs of resume text and job text with category labels; the class label is used to indicate whether the sample pair matches.
The first twin network (Siamese network) has two sub-networks with the same structure and shared weights. The two sub-networks each receive one element of a sample pair (i.e., one receives the resume text in the sample pair and the other receives the position text) and convert it into a vector. During training, a loss function can be calculated from the two output vectors to continuously back-propagate and optimize the twin network, so that matched position texts and resume texts are close to each other in the vector space while unmatched ones are far apart.
In an alternative example, the first neural network model is a TextCNN model. The TextCNN model is a convolutional neural network model. In this alternative example, a vector of resume text, and a vector of position text, may be generated based on the trained TextCNN model. In addition, when implemented, the first neural network model may also employ an LSTM (long short term memory network) model, or other models that may be used to generate a vector representation of text.
Further, the generating module 602 may be further configured to generate the vector of the resume text and the vector of the position text based on the trained second neural network model under the condition that the number of keywords in the resume text or the number of keywords in the position text is less than or equal to the preset threshold.
In one optional example, the second neural network model may include: a BERT model, and a fully connected layer. In this example, the generating module 602 generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
The determining module 603 is configured to determine similarity between the vector of the resume text and the vector of the position text, and then determine whether the resume text and the position text are matched according to the similarity.
For example, the determining module 603 may calculate the cosine similarity between the vector of the resume text and the vector of the position text and use it as the similarity result. In specific implementation, the determining module 603 may also calculate the similarity in other ways; for example, it may calculate the Euclidean distance between the vector of the resume text and the vector of the position text and use it as the similarity measure.
For example, the determining module 603 determines whether the resume text and the position text match according to the similarity includes: the judging module 603 compares the similarity between the vector of the resume text and the vector of the position text with a preset threshold; when the similarity is greater than a preset threshold, the judging module 603 determines that the resume text is matched with the job text; otherwise, the determining module 603 determines that the resume text does not match the job text. And the preset threshold value can be flexibly set. For example, the preset threshold may be set to 0.9, 0.8, or other values.
In the device provided by the embodiment of the invention, keywords in the resume text and the position text are extracted based on the knowledge graph; under the condition that the numbers of keywords in the resume text and the position text are both greater than a preset threshold, the vector of the resume text and the vector of the position text are generated based on the trained first neural network model; the similarity between the two vectors is then determined, and whether the resume text and the position text match is judged according to the similarity. This improves the matching accuracy of resume texts and position texts and thus the accuracy of position or resume recommendation services.
Fig. 7 is a schematic diagram of main blocks of a resume recommendation apparatus according to a sixth embodiment of the present invention. As shown in fig. 7, the resume recommendation apparatus 700 according to the embodiment of the present invention includes: an extraction module 701, a generation module 702, a judgment module 703 and a resume recommendation module 704.
The extraction module 701 is configured to extract keywords in the resume text and the position text based on the knowledge graph. For how the extraction module 701 performs this extraction, reference may be made to the related description of the embodiment shown in fig. 6.
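A minimal sketch of knowledge-graph-based keyword extraction is shown below. It is not the patent's implementation: it assumes the knowledge graph can be reduced to a set of entity names, and it uses whitespace tokenization as a stand-in for proper word segmentation (which the original, operating on Chinese text, would require).

```python
def extract_keywords(text, kg_entities):
    # Treat the knowledge graph as a vocabulary of entity names and keep,
    # in order of first appearance, every token that names a known entity.
    # Whitespace tokenization is an illustrative simplification of real
    # word segmentation.
    seen = set()
    keywords = []
    for token in text.lower().split():
        if token in kg_entities and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords
```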
A generating module 702, configured to generate a vector of the resume text and a vector of the position text based on the trained first neural network model under the condition that both the number of keywords in the resume text and the number of keywords in the position text are greater than a preset threshold. In an alternative example, the first neural network model is a TextCNN model, i.e., a convolutional neural network for text. In this alternative example, the vector of the resume text and the vector of the position text may be generated based on the trained TextCNN model. In a specific implementation, the first neural network model may instead be a long short-term memory (LSTM) network, or any other model that can generate a vector representation of text.
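The core TextCNN operation, convolving filters over the sequence of word vectors and then max-pooling over time, can be illustrated with a toy, stdlib-only sketch. This is a pedagogical stand-in for the trained model: real implementations use a deep-learning framework, learned filter weights, and multiple window sizes.

```python
def textcnn_encode(word_vectors, filters, window=2):
    # Minimal TextCNN sketch: slide each filter over every window of
    # concatenated word vectors, apply ReLU, then max-pool over time so a
    # variable-length keyword sequence yields a fixed-size text vector
    # (one feature per filter).
    vector = []
    for f in filters:
        best = 0.0  # ReLU floor doubles as the pool's initial value
        for i in range(len(word_vectors) - window + 1):
            region = [x for w in word_vectors[i:i + window] for x in w]
            activation = sum(a * b for a, b in zip(region, f))
            best = max(best, activation)  # max-over-time pooling
        vector.append(best)
    return vector
```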
The generating module 702 is further configured to generate a vector of the resume text and a vector of the position text based on the trained second neural network model under the condition that the number of keywords in the resume text or in the position text is smaller than the preset threshold. In one optional example, the second neural network model includes a BERT model and a fully connected layer. In this example, the generating module 702 generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, and then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
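The text-preparation step of this second path, dividing the resume into equal-length paragraph texts and converting the position text to the same length, can be sketched as below. The BERT and fully connected stages are omitted; padding with spaces and truncation are my assumptions, since the patent does not specify how equal lengths are enforced.

```python
def split_into_paragraphs(text, length):
    # Split the resume into equal-length paragraph texts, padding the tail
    # chunk so every paragraph has exactly `length` characters.
    chunks = [text[i:i + length] for i in range(0, len(text), length)]
    if chunks:
        chunks[-1] = chunks[-1].ljust(length)
    return chunks

def to_paragraph_length(position_text, length):
    # Convert the (usually shorter) position text to the paragraph length:
    # truncate if too long, pad if too short.
    return position_text[:length].ljust(length)
```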
The judgment module 703 is configured to determine the similarity between the vector of the resume text and the vector of the position text, and then judge whether the resume text and the position text match according to the similarity. For how the similarity is determined and how the match is judged, reference may be made to the related description of the embodiment shown in fig. 6.
The resume recommendation module 704 is configured to generate a resume recommendation list from the resume texts matched with a position text, and to send the resume recommendation list to the recruitment user terminal corresponding to that position text. For example, assuming there are 100 resume texts in total, and steps S401 to S403 determine that 20 of them match position text 1, a resume recommendation list may be generated from those 20 matching resume texts and sent to the recruitment user terminal corresponding to position text 1.
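Building the recommendation list from the matched resumes can be sketched as follows. This is illustrative only; ranking the matches by descending similarity is my assumption, as the patent only requires collecting the matched resume texts into a list.

```python
import math

def _cos(a, b):
    # Cosine similarity helper.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def build_resume_recommendation_list(position_vec, resumes, threshold=0.8):
    # `resumes` maps a resume id to its text vector; every resume whose
    # similarity to the position exceeds the threshold enters the list,
    # ranked from most to least similar (ranking is an assumption).
    scored = [(rid, _cos(vec, position_vec)) for rid, vec in resumes.items()]
    matched = [(rid, s) for rid, s in scored if s > threshold]
    return [rid for rid, _ in sorted(matched, key=lambda p: -p[1])]
```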
The device provided by the embodiment of the invention can improve the matching accuracy of the resume text and the job text, and further improve the accuracy of the resume recommendation service.
Fig. 8 is a schematic diagram of main blocks of a position recommending apparatus according to a seventh embodiment of the present invention. As shown in fig. 8, a job recommendation apparatus 800 according to an embodiment of the present invention includes: an extraction module 801, a generation module 802, a judgment module 803 and a position recommendation module 804.
The extraction module 801 is configured to extract keywords in the resume text and the position text based on the knowledge graph. For how the extraction module 801 performs this extraction, reference may be made to the related description of the embodiment shown in fig. 6.
A generating module 802, configured to generate a vector of the resume text and a vector of the position text based on the trained first neural network model when both the number of keywords in the resume text and the number of keywords in the position text are greater than a preset threshold.
In an alternative example, the first neural network model is a TextCNN model, i.e., a convolutional neural network for text. In this alternative example, the vector of the resume text and the vector of the position text may be generated based on the trained TextCNN model. In a specific implementation, the first neural network model may instead be a long short-term memory (LSTM) network, or any other model that can generate a vector representation of text.
The generating module 802 is further configured to generate a vector of the resume text and a vector of the position text based on the trained second neural network model when the number of the keywords in the resume text or the number of the keywords in the position text is smaller than a preset threshold.
In one optional example, the second neural network model includes a BERT model and a fully connected layer. In this example, the generation module 802 generating the vector of the resume text and the vector of the position text based on the trained second neural network model includes: dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, and then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
The judgment module 803 is configured to determine the similarity between the vector of the resume text and the vector of the position text, and then judge whether the resume text and the position text match according to the similarity. For how the similarity is determined and how the match is judged, reference may be made to the related description of the embodiment shown in fig. 6.
The job recommendation module 804 is configured to generate a job recommendation list from the position texts matched with a resume text, and to send the job recommendation list to the job-hunting user terminal corresponding to that resume text. For example, assuming there are 200 position texts in total, and steps S501 to S503 determine that 10 of them match resume text 1, a job recommendation list may be generated from those 10 matching position texts and sent to the job-hunting user terminal corresponding to resume text 1.
The device provided by the embodiment of the invention can improve the matching accuracy of the resume text and the job text, and further improve the accuracy of job recommendation service.
Fig. 9 shows an exemplary system architecture 900 to which the method for text matching, the resume recommendation method, the job recommendation method, or the corresponding apparatuses of embodiments of the present invention may be applied.
As shown in fig. 9, the system architecture 900 may include terminal devices 901, 902, and 903, a network 904, and a server 905. The network 904 is the medium that provides communication links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user may use the terminal devices 901, 902, 903 to interact with a server 905 over a network 904 to receive or send messages and the like. The terminal devices 901, 902, 903 may have various communication client applications installed thereon, such as a recruitment website, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 905 may be a server providing various services, for example, a background management server providing support for a recruitment website browsed by a user using the terminal devices 901, 902, 903. The background management server can analyze and process data such as resume text and position text, and feed back a processing result (such as a position recommendation list or resume recommendation list) to the terminal device.
It should be noted that the method for text matching or the resume recommendation method or the job recommendation method provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the device for text matching or the resume recommendation device or the job recommendation device is generally disposed in the server 905.
It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 10, shown is a block diagram of a computer system 1000 suitable for implementing an electronic device according to an embodiment of the present invention. The computer system shown in FIG. 10 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU)1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read therefrom is installed into the storage section 1008 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 1001.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises an extraction module, a generation module and a judgment module. The names of these modules do not in some cases constitute a limitation on the module itself, and for example, the extraction module may also be described as a "module that extracts keywords".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: extract keywords in the resume text and the position text based on the knowledge graph; under the condition that both the number of keywords in the resume text and the number of keywords in the position text are greater than a preset threshold, generate a vector of the resume text and a vector of the position text based on a trained first neural network model; and determine the similarity between the vector of the resume text and the vector of the position text, then judge whether the resume text and the position text match according to the similarity.
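The overall flow carried by these programs, keyword counting, model routing, and similarity-based matching, can be sketched end to end. This is a schematic of the claimed routing logic, not the patent's implementation: `encode_fast` and `encode_deep` are hypothetical callables standing in for the trained first (TextCNN-style) and second (BERT-style) models, and the threshold values are placeholders.

```python
import math

def text_matching_pipeline(resume_kws, position_kws, encode_fast, encode_deep,
                           keyword_threshold=5, sim_threshold=0.8):
    # Routing from the claims: when BOTH texts yield enough keywords, the
    # first neural network model encodes them; otherwise the second,
    # pre-trained model is used instead.
    if (len(resume_kws) > keyword_threshold
            and len(position_kws) > keyword_threshold):
        rv, pv = encode_fast(resume_kws), encode_fast(position_kws)
    else:
        rv, pv = encode_deep(resume_kws), encode_deep(position_kws)
    # Cosine similarity compared against the preset matching threshold.
    dot = sum(a * b for a, b in zip(rv, pv))
    norm = (math.sqrt(sum(a * a for a in rv)) *
            math.sqrt(sum(b * b for b in pv)))
    return dot / norm > sim_threshold
```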
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (14)
1. A method for text matching, the method comprising:
extracting keywords in the resume text and the position text based on the knowledge graph;
under the condition that the number of the keywords in the resume text and the number of the keywords in the position text are both larger than a preset threshold value, generating a vector of the resume text and a vector of the position text based on a trained first neural network model;
and determining the similarity of the vector of the resume text and the vector of the position text, and then judging whether the resume text is matched with the position text according to the similarity.
2. The method of claim 1, further comprising:
under the condition that the number of the keywords in the resume text or the number of the keywords in the position text is smaller than the preset threshold, generating a vector of the resume text and a vector of the position text based on a trained second neural network model; wherein the second neural network model is a pre-trained model.
3. The method of claim 2, further comprising:
constructing a first twin (Siamese) network based on the first neural network model, and constructing a second twin network based on the second neural network model; carrying out supervised training on the first twin network according to a first training sample set to obtain the trained first neural network model; carrying out supervised training on the second twin network according to a second training sample set to obtain the trained second neural network model; wherein the first training sample set and the second training sample set comprise: sample pairs of resume text and position text with category labels; the category label is used to indicate whether the sample pair matches.
4. The method of claim 3, further comprising:
acquiring sample pairs of resume text and position text with category labels, performing data augmentation on the sample pairs, and constructing the first training sample set and the second training sample set according to the augmented sample pairs.
5. The method of claim 2, wherein the first neural network model is a TextCNN model; the second neural network model includes: BERT model, and fully connected layer.
6. The method of claim 5, wherein generating the vector of resume text and the vector of position text based on the trained first neural network model comprises:
vectorizing each keyword in the resume text to obtain a word vector corresponding to each keyword; inputting the word vectors corresponding to the keywords in the resume text into the trained TextCNN model to obtain the vector of the resume text; vectorizing each keyword in the position text to obtain a word vector corresponding to each keyword; and inputting the word vectors corresponding to the keywords in the position text into the trained TextCNN model to obtain the vector of the position text.
7. The method of claim 5, wherein generating the vector of resume text and the vector of position text based on the trained second neural network model comprises:
dividing the resume text into a plurality of paragraph texts of equal length, and inputting the paragraph texts into the trained BERT model to obtain a vector corresponding to each paragraph text; inputting the vectors corresponding to the paragraph texts into the trained fully connected layer to obtain the vector of the resume text; and converting the position text into a text of the same length as the paragraph texts, and then sequentially inputting it into the trained BERT model and the trained fully connected layer to obtain the vector of the position text.
8. A resume recommendation method, the method comprising:
extracting keywords in the resume text and the position text based on the knowledge graph;
under the condition that the number of the keywords in the resume text and the number of the keywords in the position text are both larger than a preset threshold value, generating a vector of the resume text and a vector of the position text based on a trained first neural network model;
determining the similarity of the vector of the resume text and the vector of the position text, and judging whether the resume text is matched with the position text according to the similarity;
and generating a resume recommendation list according to the resume text matched with the position text, and sending the resume recommendation list to a recruitment user terminal corresponding to the position text.
9. A method for job recommendation, the method comprising:
extracting keywords in the resume text and the position text based on the knowledge graph;
under the condition that the number of the keywords in the resume text and the number of the keywords in the position text are both larger than a preset threshold value, generating a vector of the resume text and a vector of the position text based on a trained first neural network model;
determining the similarity of the vector of the resume text and the vector of the position text, and judging whether the resume text is matched with the position text according to the similarity;
and generating a job recommendation list according to the job text matched with the resume text, and sending the job recommendation list to the job hunting user terminal corresponding to the resume text.
10. An apparatus for text matching, the apparatus comprising:
the extraction module is used for extracting keywords in the resume text and the position text based on the knowledge graph;
the generation module is used for generating a vector of the resume text and a vector of the position text based on the trained first neural network model under the condition that both the number of the keywords in the resume text and the number of the keywords in the position text are greater than a preset threshold;
and the judging module is used for determining the similarity of the vector of the resume text and the vector of the position text and then judging whether the resume text is matched with the position text or not according to the similarity.
11. A resume recommendation apparatus, the apparatus comprising:
the extraction module is used for extracting keywords in the resume text and the position text based on the knowledge graph;
the generation module is used for generating a vector of the resume text and a vector of the position text based on the trained first neural network model under the condition that both the number of the keywords in the resume text and the number of the keywords in the position text are greater than a preset threshold;
the judging module is used for determining the similarity of the vector of the resume text and the vector of the position text and judging whether the resume text is matched with the position text or not according to the similarity;
and the resume recommendation module is used for generating a resume recommendation list according to the resume text matched with the position text and sending the resume recommendation list to the recruitment user terminal corresponding to the position text.
12. A position recommendation apparatus, the apparatus comprising:
the extraction module is used for extracting keywords in the resume text and the position text based on the knowledge graph;
the generation module is used for generating a vector of the resume text and a vector of the position text based on the trained first neural network model under the condition that both the number of the keywords in the resume text and the number of the keywords in the position text are greater than a preset threshold;
the judging module is used for determining the similarity of the vector of the resume text and the vector of the position text and judging whether the resume text is matched with the position text or not according to the similarity;
and the job recommendation module is used for generating a job recommendation list according to the job text matched with the resume text and sending the job recommendation list to the job hunting user terminal corresponding to the resume text.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
14. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010097271.9A CN113268560A (en) | 2020-02-17 | 2020-02-17 | Method and device for text matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113268560A true CN113268560A (en) | 2021-08-17 |
Family
ID=77227530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010097271.9A Pending CN113268560A (en) | 2020-02-17 | 2020-02-17 | Method and device for text matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113268560A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114942980A (en) * | 2022-07-22 | 2022-08-26 | 北京搜狐新媒体信息技术有限公司 | Method and device for determining text matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||