CN116452169B - Online recruitment generative recommendation system and method


Info

Publication number: CN116452169B
Application number: CN202310700570.0A
Authority: CN (China)
Prior art keywords: module, user, training, text, recommendation
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN116452169A
Inventors: 邱昭鹏, 郑值, 宋洋, 祝恒书, 赵鹏
Current Assignee: Beijing Hanlan Wolf Technology Co ltd
Original Assignee: Beijing Huapin Borui Network Technology Co ltd
Application filed by Beijing Huapin Borui Network Technology Co ltd
Publication of the application as CN116452169A, granted and published as CN116452169B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G06Q10/105: Human resources
    • G06Q10/1053: Employment or hiring
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/151: Transformation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses an online recruitment generative recommendation system, comprising: an input module for converting each user input into text and concatenating the texts into a set of text features; a generation module for generating a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs, wherein the generation module undergoes first-stage training on a subset of a plurality of training sets and undergoes third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training; and a reward module for evaluating, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension, wherein the reward module undergoes second-stage training on a subset of the training sets. The embodiment of the invention also discloses an online recruitment generative recommendation method. The invention allows direct intervention in the recommendation result with quick feedback, is more user-friendly, and offers good interpretability.

Description

Online recruitment generative recommendation system and method
Technical Field
The invention relates to the technical field of recommendation systems, and in particular to an online recruitment generative recommendation system and method.
Background
In recent years, online recruitment platforms based on recommendation systems have developed rapidly, providing a convenient matching channel for job seekers and recruiters. A traditional ranking-based recommendation system can recommend potentially suitable positions to a job seeker according to the user's profile and historical behavior, but throughout the process the user passively receives the recommendation results and cannot actively intervene. When the user has personalized requirements, for example a job seeker considering a change of industry or a recruiter with special skill requirements, it is difficult to intervene directly in the recommendation process and obtain immediate feedback; the whole recommendation process is a black box with poor interpretability.
Disclosure of Invention
To solve these problems, the invention aims to provide an online recruitment generative recommendation system and method that allow direct intervention in the recommendation result with quick feedback, are more user-friendly, and offer good interpretability.
The embodiment of the invention provides an online recruitment generative recommendation system, comprising:
an input module for converting each user input into text and concatenating the texts into a set of text features, wherein the user inputs comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is the user instruction text, the text converted from the user structured features is the user feature text, and the text converted from the user-defined conditions is the user-customized constraint text;
a generation module for generating a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs, wherein the generation module undergoes first-stage training on a subset of a plurality of training sets, and undergoes third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training;
and a reward module for evaluating, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension, wherein the reward module undergoes second-stage training on a subset of the training sets.
As a further improvement of the invention, the generation module uses a multi-layer neural network;
and the generation module generates in a serialized manner, taking as input the concatenation of the text features currently output by the input module and the text generated so far.
As a further improvement of the invention, the reward module comprises a plurality of sub-modules that evaluate the recommendation result from a plurality of evaluation dimensions respectively;
and the input of each sub-module is the text features and the recommendation result, and the output is a scalar value representing the degree to which the recommendation result conforms in one evaluation dimension.
As a further improvement of the invention, each sub-module uses a multi-layer neural network structure, and the parameters of the structure differ between sub-modules.
As a further refinement of the present invention, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance;
the completeness characterizes the degree to which the recommendation result contains necessary information;
the conciseness characterizes the degree to which the recommendation result contains no unnecessary information;
the constraint compliance characterizes the degree to which the recommendation result complies with the user-defined conditions;
the relevance characterizes the degree to which the recommendation result matches the user's structured features.
As a further improvement of the present invention, the subset of training sets used for the first-stage training of the generation module includes:
a first training set comprising a plurality of first complete descriptions obtained by randomly sampling and rewriting some user instructions from a pre-constructed candidate instruction set and/or randomly deleting some user structured features.
As a further improvement of the present invention, the subset of training sets used for the second-stage training of the reward module includes:
a second training set for training the sub-module that evaluates the recommendation result for completeness, comprising a plurality of second complete descriptions obtained by randomly deleting part of the text in the plurality of first complete descriptions;
a third training set for training the sub-module that evaluates the recommendation result for conciseness, comprising a plurality of third complete descriptions obtained by adding noise text to part of the text in the plurality of first complete descriptions;
a fourth training set for training the sub-module that evaluates the recommendation result for constraint compliance, comprising a plurality of fourth complete descriptions obtained by adding condition text to the user-customized constraint texts corresponding to the plurality of first complete descriptions;
and a fifth training set for training the sub-module that evaluates the recommendation result for relevance, comprising a plurality of comparison sample pairs constructed from interaction behaviors between users, each comparison sample pair pairing the current user's text features with a different recommendation result, used to characterize the relevance between the current user and recommendation results arising from different interaction behaviors.
As a further improvement of the invention, the generation module after the first-stage training serves as the Actor, the reward module after the second-stage training serves as the Critic, and the third-stage training is performed on the generation module after the first-stage training via an Actor-Critic algorithm to obtain the generation module after the third-stage training.
The embodiment of the invention also provides an online recruitment generative recommendation method, comprising the following steps:
the input module converts each user input into text and concatenates the texts into a set of text features, wherein the user inputs comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is the user instruction text, the text converted from the user structured features is the user feature text, and the text converted from the user-defined conditions is the user-customized constraint text;
the input module inputs the text features into the generation module, and the generation module outputs a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs;
the reward module evaluates, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension;
the method further comprises training in three stages to obtain a final recommendation model:
performing first-stage training on the generation module with a subset of a plurality of training sets;
performing second-stage training on the reward module with a subset of the plurality of training sets;
and performing third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training, with the generation module after the third-stage training serving as the final recommendation model.
As a further improvement of the present invention, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance, and evaluating the degree to which the recommendation result conforms in the at least one evaluation dimension comprises:
the reward module evaluates the completeness of the recommendation result to determine the degree to which it contains necessary information;
the reward module evaluates the conciseness of the recommendation result to determine the degree to which it contains no unnecessary information;
the reward module evaluates the constraint compliance of the recommendation result to determine the degree to which it complies with the user-defined conditions;
and the reward module evaluates the relevance of the recommendation result to determine the degree to which it matches the user's structured features.
The beneficial effects of the invention are as follows:
The system allows a user to input custom conditions through an interactive interface, thereby intervening directly in the recommendation result and quickly obtaining feedback. Compared with the implicit behavior signals of a traditional recommendation system, it provides richer, interactively modifiable user features, is more user-friendly, and offers good interpretability. Moreover, the system outputs a complete description matching the user's inputs, so it can be applied directly in downstream search and recommendation systems.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Evidently, the figures described below are only some embodiments of the invention, and a person skilled in the art can derive other figures from them without inventive effort.
Fig. 1 is a block diagram of an online recruitment generative recommendation system according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic diagram of the training process of an online recruitment generative recommender according to an exemplary embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, rear) are involved in the embodiments of the present invention, they are merely used to explain the relative positional relationships, movement conditions, etc. between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
In addition, in the description of the present invention, the terminology used is for the purpose of illustration only and is not intended to limit the scope of the present invention. The terms "comprises" and/or "comprising" specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first", "second", and the like may be used to describe various elements; they do not denote an order and are not intended to limit those elements, being used only to distinguish one element from another. Furthermore, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more. These and other aspects will become apparent to those skilled in the art from the following description of embodiments, taken in conjunction with the accompanying drawings. The drawings depict embodiments of the invention for purposes of illustration only; those skilled in the art will readily recognize from the following description that alternative embodiments of the illustrated structures and methods may be employed without departing from the principles of the invention.
In the related art of recommendation systems for the recruitment field, a traditional ranking-based recommendation system recommends potentially suitable positions to job seekers according to user profiles and historical behaviors; throughout the process, users passively accept the recommendation results and cannot actively intervene. When users have personalized requirements, for example a job seeker considering a change of industry or a recruiter with special skill requirements, it is difficult to intervene directly in the recommendation process and obtain immediate feedback; the whole recommendation process is a black box with poor interpretability. In addition, traditional recommendation suffers from information overload, excessive background-knowledge requirements, and poor user experience. A general-purpose large language model lacks recruitment-domain corpora and domain knowledge, such as the skill requirements of different positions, and cannot be applied directly to the recommendation task.
The embodiment of the invention discloses an online recruitment generative recommendation system, as shown in fig. 1, comprising:
an input module for converting each user input into text and concatenating the texts into a set of text features, wherein the user inputs comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is the user instruction text, the text converted from the user structured features is the user feature text, and the text converted from the user-defined conditions is the user-customized constraint text;
a generation module for generating a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs, wherein the generation module undergoes first-stage training on a subset of a plurality of training sets, and undergoes third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training;
and a reward module for evaluating, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension, wherein the reward module undergoes second-stage training on a subset of the training sets.
The system is a generation-based recommendation system. It allows a user to input custom conditions through an interactive interface, thereby intervening directly in the recommendation result and quickly obtaining feedback. Compared with the implicit behavior signals (black box) of a traditional recommendation system, it provides richer, interactively modifiable user features, is more user-friendly, gives a better experience, and has better interpretability. As shown in fig. 1, the system supports users modifying conditions as needed, i.e., users can customize conditions to their requirements and intervene directly on the generated recommendation results to obtain more satisfactory ones. The recommendation results can be shown to the user directly as an interpretable basis, and can also be fed into an existing traditional recommendation system in the form of features, improving its recommendation quality. Moreover, the system outputs a complete description matching the user's inputs, such as a complete description of a position for a job seeker, so it can be applied directly in downstream search and recommendation systems.
It can be understood that, at the feature level, the input module adopts features described in plain text (i.e., text features) in place of the existing ID features (such as user profiles), so that user-defined conditions can be fed directly into the whole recommendation system to intervene in the final recommendation result, meeting users' personalized requirements with quick feedback on the intervention. At the output level, the generation module directly generates a recommendation result from the plain-text features (i.e., the text features) instead of ranking and outputting a list from a candidate set; for example, it can generate for a job seeker the job description (complete description) best suited to the job seeker's background and work experience, helping the job seeker clarify a job-search direction, which improves intuitiveness and interpretability; and since positions are composed for a specific job seeker, the system has good extensibility. At the data-training level, the system uses three-stage reinforcement learning; after the three training stages, the generation module after the third-stage training can serve as the final recommendation model. As shown in fig. 1, the system is interactive, and generation can be an iterative process: the user can keep adding new custom conditions to obtain new recommendation results.
The input to the system consists essentially of three parts: the user instruction, the user structured features, and the user-defined conditions, so the user input is <user instruction, user structured features, user-defined conditions>. The three parts are formatted (i.e., converted) into text and concatenated to form a set of text features, i.e., <user instruction text, user feature text, user-customized constraint text> (this set of text features is illustrated in fig. 1 as the instruction text, user feature text, and user-customized constraints, respectively).
The system provided by the application is a bidirectional, reciprocal recommendation system based on a large language model; it supports mutual matching between job seekers and recruiters and can serve both at the same time. Taking recommending a suitable position to a job seeker as an example, the user instruction is a text such as "please recommend a suitable position according to the candidate's information". The user's structured features, such as the basic profile, educational background, and work experience, need to be converted into text; structured features such as the user's historical behavior sequence, commonly formatted as [<job ID1, initiate chat>, <job ID2, deliver resume>, ...], are likewise converted into text, such as "initiated a chat for job ID1 at company A, delivered a resume for job ID2". Custom conditions may be entered through a visual input interface, for example buttons, input boxes, or drop-down menus. Taking a job seeker as an example, the job seeker can input custom conditions such as the work location, salary range, and working hours; after input, these are converted into the user-customized constraint text. For example, when the user inputs Beijing as the work location, the converted constraint text is "[work location preference] Beijing".
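As an illustration of how such an input module might assemble these texts, the following is a minimal Python sketch; the field names, formatting templates, and function name are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the input module: format structured features and
# custom conditions into plain text and concatenate them with the instruction.
def build_text_features(instruction: str,
                        profile: dict,
                        behaviors: list,
                        constraints: dict) -> str:
    # User feature text: profile fields plus a verbalized behavior sequence.
    profile_text = "; ".join(f"{k}: {v}" for k, v in profile.items())
    behavior_text = ", ".join(f"{action} for {job_id}" for job_id, action in behaviors)
    feature_text = f"{profile_text}. History: {behavior_text}."
    # User-customized constraint text, one "[key] value" clause per condition.
    constraint_text = " ".join(f"[{k}] {v}." for k, v in constraints.items())
    # Concatenate the three parts into one set of text features.
    return "\n".join([instruction, feature_text, constraint_text])

features = build_text_features(
    "Please recommend a suitable position according to the candidate's information.",
    {"education": "M.Sc. Computer Science", "experience": "3 years backend"},
    [("job ID1", "initiated chat"), ("job ID2", "delivered resume")],
    {"work location preference": "Beijing"},
)
```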
In one embodiment, the generation module uses a multi-layer neural network;
and the generation module generates in a serialized manner, taking as input the concatenation of the text features currently output by the input module and the text generated so far.
As shown in fig. 1, the generation module is LLM-based. A general-purpose large language model (Large Language Model, LLM) currently refers to a model with billions or even hundreds of billions of parameters, based on the Transformer architecture and pre-trained on a large-scale corpus, which exhibits good generalization on common natural language processing problems. Large language models have excellent language modeling and generation capability, and their natural-language output has better interpretability. The present application does not specifically limit the LLM; it may be the GPT series, ChatGLM, LLaMA, etc.
The generation module is a multi-layer neural network, for example a stack of Transformer decoder layers. Its input is the text features constructed by the preceding module, from which it generates a complete description matching the user's inputs as the recommendation result, for example a job description conforming to the job seeker's features.
Serialized generation means that the neural network generates in multiple steps, producing one token per step. For example, if the 3 tokens [w0, w1, w2] have been generated, then in the next step the input to the generation module's neural network is the concatenation of <user instruction text, user feature text, user-customized constraint text> (i.e., the text features currently output by the input module) and the 3 tokens <w0, w1, w2> (i.e., the generated text), from which the next token w3 is generated; repeating this process yields a complete text. A stop condition may be set for generation, for example a maximum generation length, or stopping once a certain keyword is generated.
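A minimal sketch of this serialized generation loop follows, assuming a Hugging Face-style causal LM; the checkpoint name, greedy decoding, and the stop keyword are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-causal-lm")   # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("some-causal-lm")

def generate(text_features: str, max_new_tokens: int = 256, stop_keyword: str = "[END]"):
    ids = tokenizer(text_features, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):                    # maximum-length stop condition
        logits = model(ids).logits[:, -1, :]           # distribution over the next token
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy choice, for simplicity
        ids = torch.cat([ids, next_id], dim=-1)        # generated text fed back as input
        if stop_keyword in tokenizer.decode(next_id[0]):  # keyword stop condition
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```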
In one embodiment, the reward module comprises a plurality of sub-modules that evaluate the recommendation result from a plurality of evaluation dimensions respectively;
and the input of each sub-module is the text features and the recommendation result, and the output is a scalar value representing the degree to which the recommendation result conforms in one evaluation dimension.
In one embodiment, each sub-module uses a multi-layer neural network structure, and the parameters of the structure differ between sub-modules.
The reward module is composed of a plurality of sub-modules, each a multi-layer neural network, for example a multi-layer Transformer; the model parameters of the Transformer network differ between sub-modules. The input of each sub-module is the input of the whole recommendation system plus the generated recommendation result, and the output is a scalar value representing the degree to which the recommendation result conforms in a certain dimension; the larger the value, the better the conformity.
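The following is a minimal sketch of one such reward sub-module, assuming a small Transformer encoder with a scalar head; the sizes and the mean pooling are illustrative assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

class RewardSubModule(nn.Module):
    """Scores <text features + recommendation result> with one scalar."""
    def __init__(self, vocab_size=50000, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)   # scalar conformity score

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: the system input concatenated with the recommendation result
        h = self.encoder(self.embed(token_ids))
        return self.head(h.mean(dim=1)).squeeze(-1)  # one score per sample
```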
It can be understood that the reward module in the system of the present application may comprise one, two, three, four, or more sub-modules, each corresponding to one evaluation dimension. The system therefore has good extensibility: supporting a new evaluation dimension only requires adding the corresponding sub-module.
In one embodiment, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance;
the completeness characterizes the degree to which the recommendation result contains necessary information;
the conciseness characterizes the degree to which the recommendation result contains no unnecessary information;
the constraint compliance characterizes the degree to which the recommendation result complies with the user-defined conditions;
the relevance characterizes the degree to which the recommendation result matches the user's structured features.
For example, in fig. 1, the reward module of the present application contains four sub-modules: a completeness reward module, a conciseness reward module, a constraint-compliance reward module, and a relevance reward module, which evaluate the conformity of the recommendation result in the four evaluation dimensions of completeness, conciseness, constraint compliance, and relevance, respectively.
For completeness, it is evaluated whether the recommendation result (such as a generated job description) contains necessary information, such as the job title, skill requirements, and job responsibilities; the necessary information can be predefined for different user roles.
For conciseness, it is evaluated whether the recommendation result (such as a generated job description) is repetitive or redundant; it should contain no unnecessary information, and what counts as unnecessary can be predefined for different user roles, for example the way to travel to the interview or the interview location, which are irrelevant to the work itself.
For constraint compliance, it is evaluated whether the recommendation result (such as a generated job description) complies with the user-defined conditions, such as the user's customized requirement on the work location.
For relevance, it is evaluated whether the background requirements (e.g., education, years of experience) and skill requirements in the recommendation result (such as a generated job description) match the job seeker's background, work experience, major, and acquired skills.
The four evaluation dimensions above are an optional embodiment; more evaluation dimensions can be added to evaluate the recommendation result effectively, and the application places no specific limit on the number of sub-modules and their corresponding evaluation dimensions.
As described above, the system of the present application uses a three-stage training framework: the whole system undergoes three training stages, and reinforcement learning with any Actor-Critic algorithm, for example the PPO algorithm, is applied to the models obtained in the first two stages to further improve recommendation performance, so that the recommendation results finally generated by the recommendation system align with matched samples.
In one embodiment, the subset of training sets used for the first-stage training of the generation module includes:
a first training set comprising a plurality of first complete descriptions obtained by randomly sampling and rewriting some user instructions from a pre-constructed candidate instruction set and/or randomly deleting some user structured features.
As shown in fig. 1, the system of the present application supports the user to modify the conditions according to the requirements, i.e. to customize the conditions according to the requirements. As shown in fig. 2, in order to improve the robustness of the final generating module (i.e. the recommendation model), during the first stage training, the LLM is used to automatically reform user instructions with the same semantics but different expressions, and the reform process may be, for example, randomly sampling part of the user instructions from the candidate instruction set and reformulating the user instructions into user instructions with the same semantics but different expressions, where the reformulated user instructions are converted into new user instruction texts and used as input of the input module, and further, new text features can be generated based on the input, so that the generating module can output recommendation results based on the new text features, and the recommendation results are new complete descriptions (for example, recommendation job descriptions of job seekers) matched with the new text features. As shown in fig. 2, during the first stage training, the user structure, such as a certain historical behavior, a certain image feature, etc., may be deleted randomly, so as to improve the robustness of the final generating module (i.e., the recommendation model), for example, a part of the image features may be deleted randomly from the user image, and the deleted user image features are formatted (i.e., converted) into new user feature text, and are not input by the input module, so that a new text feature can be generated based on the input, and the generating module can output a recommendation result based on the new text feature, where the recommendation result is a new complete description (e.g., a recommendation job description of a job seeker) matched with the new text feature.
It should be noted that, during the first stage training, the above process of randomly sampling and rewriting a part of the user instruction from the candidate instruction set and randomly deleting a part of the user structural feature may be performed simultaneously, so as to obtain a new user instruction text and a new user feature text, and use the new user instruction text as an input of the input module, so that a new text feature can be generated based on the input, so that the generation module can output a recommendation result based on the new text feature, where the recommendation result is a new complete description (such as a recommendation job description of a job seeker) matched with the new text feature.
It will be appreciated that, during the first stage of training, the first training set used includes at least the new complete description (i.e., the first complete description) obtained in the above process as a sample, i.e., the first training set includes a plurality of first complete descriptions obtained by randomly sampling and overwriting part of the user instructions and/or randomly deleting part of the user structured features from the pre-constructed candidate instruction set.
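A minimal sketch of this augmentation follows; the sample fields, the rewrite prompt, and the deletion probability are illustrative assumptions.

```python
import random

def augment(sample: dict, rewrite_with_llm, drop_prob: float = 0.2) -> dict:
    """Randomly rewrite the instruction and/or delete structured features."""
    out = dict(sample)
    if random.random() < 0.5:
        # Rewrite into an instruction with the same semantics, different wording.
        out["instruction"] = rewrite_with_llm(
            "Rewrite with the same meaning but different wording: "
            + sample["instruction"])
    # Randomly delete part of the user's structured (profile) features.
    out["profile"] = {k: v for k, v in sample["profile"].items()
                      if random.random() > drop_prob}
    return out
```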
The first-stage training described in the present application can serve as an instruction fine-tuning stage (as shown in fig. 2). Job seeker-position sample pairs that achieved a match are constructed from the recommendation platform's logs, and inputs are constructed as described for the input module. For the user instruction, an instruction can be randomly selected from the available instruction set (i.e., the pre-constructed candidate instruction set) and rewritten with a pre-trained large model. The job description that achieved a match (meaning the job seeker and recruiter communicated and reached a preliminary intent to cooperate) is used as the label text. The generation module is trained with the standard LLM language-modeling loss, i.e., the language model is required to maximize the probability of generating the label text.
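A minimal sketch of this first-stage objective follows, assuming a Hugging Face-style causal LM whose built-in cross-entropy is computed over the label text only (prompt tokens masked with -100).

```python
import torch

def sft_loss(model, tokenizer, text_features: str, label_text: str) -> torch.Tensor:
    prompt_ids = tokenizer(text_features, return_tensors="pt").input_ids
    label_ids = tokenizer(label_text, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=-1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100   # no loss on the prompt tokens
    # Minimizing this cross-entropy maximizes the probability of the label text.
    return model(input_ids, labels=labels).loss
```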
In one embodiment, the subset of training sets used for the second-stage training of the reward module includes:
a second training set for training the sub-module that evaluates the recommendation result for completeness, comprising a plurality of second complete descriptions obtained by randomly deleting part of the text in the plurality of first complete descriptions;
a third training set for training the sub-module that evaluates the recommendation result for conciseness, comprising a plurality of third complete descriptions obtained by adding noise text to part of the text in the plurality of first complete descriptions;
a fourth training set for training the sub-module that evaluates the recommendation result for constraint compliance, comprising a plurality of fourth complete descriptions obtained by adding condition text to the user-customized constraint texts corresponding to the plurality of first complete descriptions;
and a fifth training set for training the sub-module that evaluates the recommendation result for relevance, comprising a plurality of comparison sample pairs constructed from interaction behaviors between users, each comparison sample pair pairing the current user's text features with a different recommendation result, used to characterize the relevance between the current user and recommendation results arising from different interaction behaviors.
As described above, the system of the present application may implement the reward module with, for example, four sub-modules to judge the quality of the recommendation results produced by the generation module. Each sub-module requires second-stage training on training data; this stage can serve as the reward-model training stage (as shown in fig. 2).
For the completeness reward module, some content is randomly deleted from the first complete descriptions in the first training set after the first-stage training to construct the second training set; for example, negative samples (i.e., second complete descriptions) are generated automatically by randomly deleting some content from a recommended job description matched to the job seeker, while the job description before deletion (i.e., the first complete description) serves as the positive sample. As shown in fig. 2, for the completeness reward module, the reward score of the first complete description (job description 1 and reward score 1 in fig. 2) must be larger than the reward score of the second complete description (job description 2 and reward score 2 in fig. 2), and the completeness reward module is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
For the conciseness reward module, noise text is added to some of the first complete descriptions in the first training set after the first-stage training to construct the third training set; for example, negative samples (i.e., third complete descriptions) are generated automatically by adding noise text to some recommended job descriptions matched to the job seeker, while the job descriptions before the noise was added (i.e., the first complete descriptions) serve as positive samples. As shown in fig. 2, for the conciseness reward module, the reward score of the first complete description (job description 1 and reward score 1 in fig. 2) must be larger than the reward score of the third complete description (job description 2 and reward score 2 in fig. 2), and the conciseness reward module is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
For the constraint-compliance reward module, new condition text is added to the user-customized constraint text, for example the job seeker manually adds a work location; the new complete description generated from the new customized constraint text (i.e., the fourth complete description) serves as the positive sample, while the recommended description from before the new condition text was added (i.e., the first complete description) serves as the negative sample. As shown in fig. 2, for the constraint-compliance reward module, the reward score of the positive sample (the fourth complete description) must be larger than the reward score of the negative sample (the first complete description), and the constraint-compliance reward module is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
For the relevance reward module: users (job seekers and recruiters) can reach matches on the recruitment platform, and various interaction behaviors exist, such as the job seeker initiating a chat, the recruiter initiating a chat, the recruiter declining a chat, or the recruiter initiating an interview invitation. Comparison samples can be constructed from these interaction behaviors. For example, if a job seeker matched with position A, initiated a chat about position B without reaching a match, and initiated a chat about position C without receiving a reply, the job seeker's relevance to these three positions decreases in that order. Accordingly, the application pairs the job seeker's text features with different job descriptions to construct comparison sample pairs (i.e., pairs of the current user's text features with different recommendation results), for example <job seeker's text features, position A>, <job seeker's text features, position B>, and <job seeker's text features, position C>. Feeding different comparison samples into the relevance reward module yields different reward scores; the score for position A must exceed that for position B, and the score for position B must exceed that for position C. The relevance reward module is trained by maximizing the difference between the scores of positions A and B and between the scores of positions B and C. It will be appreciated that the above takes a user whose role is a job seeker as an illustration; similar comparison sample pairs can be constructed for recruiters to train the relevance reward module.
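Across all four sub-modules the training signal is the same pairwise pattern: the better sample must score higher. A minimal sketch follows; the logistic form of the "difference loss" is an assumption (a margin loss would also fit the description), and reward_model can be a sub-module like the sketch above.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, better_ids: torch.Tensor,
                         worse_ids: torch.Tensor) -> torch.Tensor:
    r_better = reward_model(better_ids)   # e.g., matched job description
    r_worse = reward_model(worse_ids)     # e.g., corrupted or less relevant one
    # Maximizing the score difference == minimizing -log(sigmoid(difference)).
    return -F.logsigmoid(r_better - r_worse).mean()
```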
It can be appreciated that the above procedure for recommending positions to job seekers applies equally to recommending suitable candidates to recruiters, and is not repeated here.
In one embodiment, the generation module after the first-stage training serves as the Actor, the reward module after the second-stage training serves as the Critic, and the third-stage training is performed on the generation module after the first-stage training via an Actor-Critic algorithm, yielding the generation module after the third-stage training.
After the models of the first two stages are obtained, the generation module serves as the Actor and the four reward modules serve as the Critic, and any Actor-Critic reinforcement learning algorithm can be used for the third-stage training to further train the generation module; the generation module after the third-stage training can serve as the final recommendation model. The training set for the third stage is similar to that of the first stage. The Actor-Critic algorithm is a reinforcement learning algorithm that combines the advantages of value functions and policy functions and achieves good results in practice. It converges quickly, so the policy and value function can be updated in real time in an online recruitment system; and because the policy function in an Actor-Critic algorithm is optimized directly, rather than through estimation as with a value function, updates to the recommendation policy for users of the recruitment system are more accurate.
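For concreteness, the following is a minimal sketch of one third-stage update with a PPO-style clipped objective; summing the four reward scores into one return, the clipping constant, and the sequence-level ratio are all simplifying assumptions, not details from the patent.

```python
import torch

def ppo_step(actor, reward_models, optimizer, prompt_ids, gen_ids, old_logprobs,
             clip_eps: float = 0.2):
    full_ids = torch.cat([prompt_ids, gen_ids], dim=-1)
    # Critic signal: sum of the scalar scores from the reward sub-modules.
    with torch.no_grad():
        reward = sum(rm(full_ids) for rm in reward_models)   # shape: (batch,)
    # New log-probability of the previously generated tokens under the actor.
    logits = actor(full_ids).logits[:, prompt_ids.size(1) - 1 : -1, :]
    new_logprobs = torch.log_softmax(logits, dim=-1).gather(
        -1, gen_ids.unsqueeze(-1)).squeeze(-1).sum(dim=-1)   # sequence-level
    ratio = torch.exp(new_logprobs - old_logprobs)
    # Clipped surrogate objective, using the reward as a baseline-free advantage.
    loss = -torch.min(ratio * reward,
                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```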
The embodiment of the invention also discloses an online recruitment generative recommendation method, comprising the following steps:
the input module converts each user input into text and concatenates the texts into a set of text features, wherein the user inputs comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is the user instruction text, the text converted from the user structured features is the user feature text, and the text converted from the user-defined conditions is the user-customized constraint text;
the input module inputs the text features into the generation module, and the generation module outputs a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs;
the reward module evaluates, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension;
the method further comprises training in three stages to obtain a final recommendation model:
performing first-stage training on the generation module;
performing second-stage training on the reward module;
and performing third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training, with the generation module after the third-stage training serving as the final recommendation model.
The method of the application first converts the user features (such as the profile and behaviors) into natural language through the input module, combines them with a specific instruction text, then inputs these natural-language features into the large-language-model generation module, which finally generates a job description meeting the user's conditions. The details of the three modules (input module, generation module, and reward module) are described above and not repeated here.
It should be noted that the reward module may also adopt the multi-sub-module structure of the foregoing embodiments; the configuration of each sub-module, the corresponding evaluation dimensions, the evaluation method, and the training sets and training process can be found in the foregoing embodiments and are not repeated here.
In one embodiment, performing the third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training comprises:
taking the generation module after the first-stage training as the Actor;
taking the reward module after the second-stage training as the Critic;
and performing the third-stage training on the generation module after the first-stage training via an Actor-Critic algorithm.
In one embodiment, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance, and evaluating the degree to which the recommendation result conforms in the at least one evaluation dimension comprises:
the reward module evaluates the completeness of the recommendation result to determine the degree to which it contains necessary information;
the reward module evaluates the conciseness of the recommendation result to determine the degree to which it contains no unnecessary information;
the reward module evaluates the constraint compliance of the recommendation result to determine the degree to which it complies with the user-defined conditions;
and the reward module evaluates the relevance of the recommendation result to determine the degree to which it matches the user's structured features.
The specific meanings of the four evaluation dimensions, namely completeness, conciseness, constraint compliance, and relevance, and the training processes of the corresponding sub-modules, are as in the foregoing embodiments and are not repeated here.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (9)

1. An online recruitment generative recommendation system, the system comprising:
an input module for converting each user input into text and concatenating the texts into a set of text features, wherein the user inputs comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is the user instruction text, the text converted from the user structured features is the user feature text, and the text converted from the user-defined conditions is the user-customized constraint text;
a generation module for generating a recommendation result from the text features, the recommendation result representing a complete description matching the user's inputs, wherein the generation module undergoes first-stage training on a subset of a plurality of training sets and undergoes third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training, wherein the generation module after the first-stage training serves as the Actor, the reward module after the second-stage training serves as the Critic, and the third-stage training is performed on the generation module after the first-stage training via an Actor-Critic algorithm to obtain the generation module after the third-stage training;
and a reward module for evaluating, from the text features and the recommendation result, the degree to which the recommendation result conforms in at least one evaluation dimension, wherein the reward module undergoes second-stage training on a subset of the training sets; the reward module comprises a plurality of sub-modules that evaluate the recommendation result from a plurality of evaluation dimensions respectively.
2. The system of claim 1, wherein the generation module uses a multi-layer neural network;
and the generation module generates in a serialized manner, taking as input the concatenation of the text features currently output by the input module and the text generated so far.
3. The system of claim 1, wherein the input of each sub-module is the text features and the recommendation result, and the output is a scalar value representing the degree to which the recommendation result conforms in one of the evaluation dimensions.
4. The system of claim 3, wherein each of said sub-modules uses a multi-layer neural network structure, and the parameters of the multi-layer neural network structure differ between sub-modules.
5. The system of claim 1, wherein the at least one evaluation dimension comprises completeness, conciseness, constraint compliance, and relevance;
the completeness characterizes the degree to which the recommendation result contains necessary information;
the conciseness characterizes the degree to which the recommendation result contains no unnecessary information;
the constraint compliance characterizes the degree to which the recommendation result complies with the user-defined conditions;
the relevance characterizes the degree to which the recommendation result matches the user's structured features.
6. The system of claim 5, wherein the subset of training sets used for the first-stage training of the generation module includes:
a first training set comprising a plurality of first complete descriptions obtained by randomly sampling and rewriting some user instructions from a pre-constructed candidate instruction set and/or randomly deleting some user structured features.
7. The system of claim 6, wherein the subset of training sets used for the second-stage training of the reward module includes:
a second training set for training the sub-module that evaluates the recommendation result for completeness, comprising a plurality of second complete descriptions obtained by randomly deleting part of the text in the plurality of first complete descriptions;
a third training set for training the sub-module that evaluates the recommendation result for conciseness, comprising a plurality of third complete descriptions obtained by adding noise text to part of the text in the plurality of first complete descriptions;
a fourth training set for training the sub-module that evaluates the recommendation result for constraint compliance, comprising a plurality of fourth complete descriptions obtained by adding condition text to the user-customized constraint texts corresponding to the plurality of first complete descriptions;
and a fifth training set for training the sub-module that evaluates the recommendation result for relevance, comprising a plurality of comparison sample pairs constructed from interaction behaviors between users, each comparison sample pair pairing the current user's text features with a different recommendation result, used to characterize the relevance between the current user and recommendation results arising from different interaction behaviors.
8. An online recruitment generation type recommendation method, comprising:
an input module converting each input of a user into a text and splicing the texts into a set of text features, wherein the inputs of the user comprise a user instruction, user structured features, and user-defined conditions; the text converted from the user instruction is a user instruction text, the text converted from the user structured features is a user feature text, and the text converted from the user-defined conditions is a user-customized constraint condition text;
the input module feeding the text features into an LLM-based generation module, which outputs a recommendation result according to the text features, the recommendation result representing a complete description matching each input of the user;
a reward module evaluating, according to the text features and the recommendation result, the degree of conformity of the recommendation result in at least one evaluation dimension, the reward module employing a plurality of sub-modules that evaluate the recommendation result from a plurality of evaluation dimensions respectively;
the method further comprising training in three stages to obtain a final recommendation model:
performing first-stage training on the generation module;
performing second-stage training on the reward module;
and performing third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training, the generation module after the third-stage training serving as the final recommendation model; specifically, the generation module after the first-stage training serves as the Actor and the reward module after the second-stage training serves as the Critic, and the third-stage training is performed via an Actor-Critic algorithm to obtain the generation module after the third-stage training.
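One third-stage update under the Actor-Critic scheme of claim 8 might look like the sketch below; generate_with_logprob and score are assumed interfaces, and the plain policy-gradient update stands in for whatever concrete RL algorithm an implementation would actually use:

    import torch

    def third_stage_step(actor, critic, optimizer, text_features):
        # The Actor (generation module after first-stage training) samples
        # a recommendation and returns the log-probability of that sample.
        recommendation, log_prob = actor.generate_with_logprob(text_features)
        # The Critic (reward module after second-stage training) scores the
        # pair; it stays frozen, so no gradient flows into it.
        with torch.no_grad():
            reward = critic.score(text_features, recommendation)
        # Policy gradient: raise the probability of high-reward outputs.
        loss = -(reward * log_prob).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()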
9. The method of claim 8, wherein the at least one evaluation dimension comprises integrity, conciseness, constraint compliance, and relevance, and evaluating the degree of conformity of the recommendation result in the at least one evaluation dimension comprises:
the reward module evaluating the integrity of the recommendation result to determine the degree to which it contains all necessary information;
the reward module evaluating the conciseness of the recommendation result to determine the degree to which it contains no unnecessary information;
the reward module evaluating the constraint compliance of the recommendation result to determine the degree to which it satisfies the user-defined conditions;
and the reward module evaluating the relevance of the recommendation result to determine the degree to which it matches the user structured features.
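A reward module with one sub-module per dimension, as in claim 9, could be realized as scoring heads over a shared text encoder; sharing the encoder and every size below are assumptions, since the patent only requires separate sub-modules:

    import torch
    import torch.nn as nn

    class RewardModule(nn.Module):
        def __init__(self, encoder, hidden_size=768):
            super().__init__()
            self.encoder = encoder   # assumed: maps input -> [batch, hidden_size]
            self.heads = nn.ModuleDict({
                dim: nn.Linear(hidden_size, 1)
                for dim in ("integrity", "conciseness",
                            "constraint_compliance", "relevance")})

        def forward(self, encoded_input):
            h = self.encoder(encoded_input)
            # One sigmoid score in [0, 1] per evaluation dimension.
            return {dim: torch.sigmoid(head(h)).squeeze(-1)
                    for dim, head in self.heads.items()}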
CN202310700570.0A 2023-06-14 2023-06-14 Online recruitment generation type recommendation system and method Active CN116452169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310700570.0A CN116452169B (en) 2023-06-14 2023-06-14 Online recruitment generation type recommendation system and method

Publications (2)

Publication Number Publication Date
CN116452169A (en) 2023-07-18
CN116452169B (en) 2023-11-24

Family

ID=87134073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310700570.0A Active CN116452169B (en) 2023-06-14 2023-06-14 Online recruitment generation type recommendation system and method

Country Status (1)

Country Link
CN (1) CN116452169B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628346B (en) * 2023-07-19 2024-01-05 深圳须弥云图空间科技有限公司 Training method and device for search word recommendation model
CN116757652B (en) * 2023-08-17 2023-10-20 北京华品博睿网络技术有限公司 Online recruitment recommendation system and method based on large language model
CN117252260A (en) * 2023-09-06 2023-12-19 山东心法科技有限公司 Interview skill training method, equipment and medium based on large language model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05313875A (en) * 1992-05-13 1993-11-26 Hitachi Ltd Text collection/evaluation system
WO2019137493A1 (en) * 2018-01-12 2019-07-18 刘伟 Machine learning system for matching resume of job applicant with job requirements
CN111104595A (en) * 2019-12-16 2020-05-05 华中科技大学 Deep reinforcement learning interactive recommendation method and system based on text information
CN111783422A (en) * 2020-06-24 2020-10-16 北京字节跳动网络技术有限公司 Text sequence generation method, device, equipment and medium
CN113553510A (en) * 2021-07-30 2021-10-26 华侨大学 Text information recommendation method and device and readable medium
CN114969517A (en) * 2022-05-11 2022-08-30 深圳市欢太科技有限公司 Training method and recommendation method and device of object recommendation model and electronic equipment
CN115564393A (en) * 2022-10-24 2023-01-03 深圳今日人才信息科技有限公司 Recruitment requirement similarity-based job recommendation method
CN116127186A (en) * 2022-12-09 2023-05-16 之江实验室 Knowledge-graph-based individual matching recommendation method and system for person sentry
CN116128461A (en) * 2023-04-04 2023-05-16 北京华品博睿网络技术有限公司 Bidirectional recommendation system and method for online recruitment

Also Published As

Publication number Publication date
CN116452169A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN116452169B (en) Online recruitment generation type recommendation system and method
Chen et al. Automatic concept classification of text from electronic meetings
CN110209774A (en) Handle the method, apparatus and terminal device of session information
US11545042B2 (en) Personalized learning system
Wang et al. Attention-based CNN for personalized course recommendations for MOOC learners
CN113254604B (en) Reference specification-based professional text generation method and device
CN115438176B (en) Method and equipment for generating downstream task model and executing task
CN113987167A (en) Dependency perception graph convolutional network-based aspect-level emotion classification method and system
CN114168707A (en) Recommendation-oriented emotion type conversation method
CN112069781A (en) Comment generation method and device, terminal device and storage medium
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
CN112182439B (en) Search result diversification method based on self-attention network
CN116821457B (en) Intelligent consultation and public opinion processing system based on multi-mode large model
JP7373091B1 (en) Information processing system, information processing method and program
Wang et al. A survey of the evolution of language model-based dialogue systems
CN117271745A (en) Information processing method and device, computing equipment and storage medium
CN116882450A (en) Question-answering model editing method and device, electronic equipment and storage medium
CN116561251A (en) Natural language processing method
CN116701566A (en) Multi-round dialogue model and dialogue method based on emotion
CN113641789B (en) Viewpoint retrieval method and system based on hierarchical fusion multi-head attention network and convolution network
CN111222533B (en) Deep learning visual question-answering method and system based on dependency tree
CN113392640B (en) Title determination method, device, equipment and storage medium
CN109815323B (en) Human-computer interaction training question-answer generation algorithm
CN117633196B (en) Question-answering model construction method and project question-answering method
CN116431779B (en) FAQ question-answering matching method and device in legal field, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240316

Address after: Room 13, 1801, 18th Floor, Building 1, No.16 Taiyanggong Middle Road, Chaoyang District, Beijing, 100028

Patentee after: Beijing Hanlan Wolf Technology Co.,Ltd.

Country or region after: China

Address before: 09/F, 1801, 18th Floor, Building 1, No. 16 Taiyanggong Middle Road, Chaoyang District, Beijing, 100028

Patentee before: BEIJING HUAPIN BORUI NETWORK TECHNOLOGY CO.,LTD.

Country or region before: China