Disclosure of Invention
To solve the above problems, the invention aims to provide an online recruitment generative recommendation system and method that allow a user to directly intervene in the recommendation result and quickly obtain feedback, and that are more user-friendly and have good interpretability.
An embodiment of the invention provides an online recruitment generative recommendation system, which comprises:
an input module, used for respectively converting each input of a user into a text and splicing the texts into a group of text features, wherein the inputs of the user comprise a user instruction, user structured features, and a user-defined condition; the text converted from the user instruction is a user instruction text, the text converted from the user structured features is a user feature text, and the text converted from the user-defined condition is a user-customized constraint condition text;
a generation module, used for generating a recommendation result according to the text features, wherein the recommendation result represents a complete description matching each input of the user; the generation module performs first-stage training through part of a plurality of training sets, and performs third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training;
and a reward module, used for evaluating the degree of conformity of the recommendation result in at least one evaluation dimension according to the text features and the recommendation result, wherein the reward module performs second-stage training through part of the plurality of training sets.
As a further improvement of the invention, the generation module uses a multi-layer neural network;
and the generation module adopts a serialized generation mode, taking as input the text features currently output by the input module spliced with the information of the already-generated text.
As a further improvement of the invention, the reward module adopts a plurality of submodules, and the submodules are used for evaluating the recommendation results from a plurality of evaluation dimensions respectively;
and the input of each sub-module is the text features and the recommendation result, and the output is a scalar value representing the degree of conformity of the recommendation result in one evaluation dimension.
As a further improvement of the invention, each sub-module uses a multi-layer neural network structure, and parameters of the multi-layer neural network structure used by each sub-module are different.
As a further refinement of the present invention, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance;
the completeness characterizes the degree to which the recommendation result contains necessary information;
the conciseness characterizes the degree to which the recommendation result does not contain unnecessary information;
the constraint compliance characterizes the degree to which the recommendation result complies with the user-defined condition;
the relevance characterizes the degree to which the recommendation result matches the user's structured features.
As a further improvement of the present invention, the partial training set for performing the first stage training on the generating module includes:
and the first training set comprises a plurality of first complete descriptions obtained after randomly sampling and rewriting part of the user instructions from a pre-constructed candidate instruction set and/or randomly deleting part of the user structured features.
As a further improvement of the present invention, the partial training set for performing the second stage training on the reward module includes:
a second training set for training the sub-module that evaluates the recommendation result from the completeness, the second training set comprising a plurality of second complete descriptions obtained by randomly deleting a portion of text in the plurality of first complete descriptions;
a third training set for training the sub-module that evaluates the recommendation result from the conciseness, the third training set comprising a plurality of third complete descriptions obtained by adding noise text to a portion of text in the plurality of first complete descriptions;
a fourth training set for training the sub-module that evaluates the recommendation result from the constraint compliance, the fourth training set comprising a plurality of fourth complete descriptions obtained by adding condition text to the user-customized constraint condition texts corresponding to the plurality of first complete descriptions;
and a fifth training set for training the sub-module that evaluates the recommendation result from the relevance, wherein the fifth training set comprises a plurality of contrastive sample pairs constructed based on interaction behaviors between users, each contrastive sample pair consisting of the current user's text features paired with different recommendation results, and being used for characterizing the relevance between the current user and the recommendation results generated by different interaction behaviors.
As a further improvement of the invention, the generation module after the first-stage training serves as an Actor, the reward module after the second-stage training serves as a Critic, and the generation module after the first-stage training undergoes the third-stage training through an Actor-Critic algorithm to obtain the generation module after the third-stage training.
An embodiment of the invention also provides an online recruitment generative recommendation method, which comprises the following steps:
an input module respectively converts each input of a user into a text and splices the texts into a group of text features, wherein the inputs of the user comprise a user instruction, user structured features, and a user-defined condition; the text converted from the user instruction is a user instruction text, the text converted from the user structured features is a user feature text, and the text converted from the user-defined condition is a user-customized constraint condition text;
the input module inputs the text features into a generation module, and the generation module outputs a recommendation result according to the text features, wherein the recommendation result represents a complete description matching each input of the user;
a reward module evaluates the degree of conformity of the recommendation result in at least one evaluation dimension according to the text features and the recommendation result.
The method further comprises training in three stages to obtain a final recommendation model:
performing first-stage training on the generation module by utilizing part of a plurality of training sets;
performing second-stage training on the reward module by utilizing part of the plurality of training sets;
and performing third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training, wherein the generation module after the third-stage training serves as the final recommendation model.
As a further improvement of the present invention, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance, and evaluating the degree of conformity of the recommendation result in the at least one evaluation dimension includes:
the reward module evaluates the completeness of the recommendation result to determine the degree to which the recommendation result contains necessary information;
the reward module evaluates the conciseness of the recommendation result to determine the degree to which the recommendation result does not contain unnecessary information;
the reward module evaluates the constraint compliance of the recommendation result to determine the degree to which the recommendation result complies with the user-defined condition;
and the reward module evaluates the relevance of the recommendation result to determine the degree to which the recommendation result matches the user's structured features.
The beneficial effects of the invention are as follows:
the system enables a user to input custom conditions through an interactive interface, so as to directly intervene in the recommendation result and quickly obtain feedback. Compared with the implicit behaviors relied on by a traditional recommendation system, the system provides richer, interactively changeable user features, is more user-friendly, and has good interpretability. Meanwhile, the result output by the system is a complete description matching the user's inputs, so it can be directly applied to downstream search and recommendation systems.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, in the description of the present invention, the terminology used is for the purpose of illustration only and is not intended to limit the scope of the present invention. The terms "comprises" and/or "comprising" are used to specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used for describing various elements, do not represent a sequence, and are not intended to limit the elements. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more. These terms are only used to distinguish one element from another element. These and/or other aspects will become apparent to those skilled in the art from the following description, when taken in conjunction with the accompanying drawings, wherein the present invention is described in connection with embodiments thereof. The drawings are intended to depict embodiments of the invention for purposes of illustration only. Those skilled in the art will readily recognize from the following description that alternative embodiments of the illustrated structures and methods of the present invention may be employed without departing from the principles of the present invention.
In the related art of recruitment-field recommendation systems, a traditional ranking-based recommendation system recommends potentially suitable positions to a job seeker according to the user's profile and historical behaviors. In such a system, the user passively accepts the recommendation results and cannot actively intervene; when the user has personalized requirements, for example when a job seeker considers switching industries or a recruiter has special skill requirements, it is difficult to directly intervene in the recommendation process and immediately obtain feedback, and the whole recommendation process is a black box with poor interpretability. In addition, traditional recommendation suffers from information overload, excessive background-knowledge requirements, and poor user experience. A general-purpose large language model lacks corpora from the recruitment field and lacks domain knowledge, such as the skill requirements of different positions, and therefore cannot be directly applied to the recommendation task.
The embodiment of the invention discloses an online recruitment generative recommendation system, as shown in fig. 1, comprising:
an input module, used for respectively converting each input of a user into a text and splicing the texts into a group of text features, wherein the inputs of the user comprise a user instruction, user structured features, and a user-defined condition; the text converted from the user instruction is a user instruction text, the text converted from the user structured features is a user feature text, and the text converted from the user-defined condition is a user-customized constraint condition text;
a generation module, used for generating a recommendation result according to the text features, wherein the recommendation result represents a complete description matching each input of the user; the generation module performs first-stage training through part of a plurality of training sets, and performs third-stage training based on the generation module after the first-stage training and the reward module after the second-stage training;
and a reward module, used for evaluating the degree of conformity of the recommendation result in at least one evaluation dimension according to the text features and the recommendation result, wherein the reward module performs second-stage training through part of the plurality of training sets.
The system is a generation-based recommendation system. It enables a user to input user-defined conditions through an interactive interface so as to directly intervene in the recommendation result and quickly obtain feedback, and, compared with the implicit behaviors (black box) of a traditional recommendation system, it provides richer, interactively changeable user features, is more user-friendly, offers a better experience, and has better interpretability. As shown in fig. 1, the system of the present application supports the user in modifying conditions as needed, i.e., the user can customize conditions according to requirements and directly intervene in the generated recommendation results to produce more satisfactory ones. The recommendation results can be directly displayed to the user as an interpretable basis, and can also be fed into existing traditional recommendation systems in the form of features, thereby improving their recommendation quality. Meanwhile, the result output by the system is a complete description matching the user's inputs, such as a complete description of a position for a job seeker, so it can be directly applied to downstream search and recommendation systems.
It can be understood that, at the feature level, the input module adopts features described in plain text (i.e., text features) in place of the existing ID features (such as user profiles), so that the whole recommendation system can directly take user-defined conditions as input to intervene in the final recommendation result, meeting the user's personalized requirements with quick feedback. At the output level, the generation module directly generates a recommendation result from the plain-text features (i.e., text features) instead of ranking and outputting a list from a candidate set; for example, it can generate for a job seeker the job description (complete description) best matching the job seeker's background and work experience, helping the job seeker clarify a job-hunting direction, which improves intuitiveness and interpretability, and positions can even be tailored to a specific job seeker, giving the system good extensibility. At the data-training level, the system uses three-stage reinforcement learning; after the three stages, the generation module after the third-stage training can serve as the final recommendation model. As shown in fig. 1, the system is interactive, and generation can be an iterative process, i.e., the user can continuously add new custom conditions to obtain new recommendation results.
The input to the system consists essentially of three parts: the user instruction, the user structured features, and the user-defined conditions, so the user input is <user instruction, user structured features, user-defined conditions>. The three parts are formatted (i.e., converted) into text and spliced together to form a group of text features, i.e., <user instruction text, user feature text, user-customized constraint condition text> (this group of text features is illustrated in fig. 1 as instruction text, user feature text, and user-customized constraint condition, respectively).
The system provided by the application is a bidirectional reciprocal recommendation system based on a large language model, supporting mutual matching between job seekers and recruiters and serving both at the same time. Taking the recommendation of a suitable position to a job seeker as an illustration, the user instruction is a text, such as "Please recommend a suitable position according to the candidate's information". The user structured features, such as the basic profile, educational background, and work experience, need to be converted into text; for the structured feature of the user's historical behavior sequence, a common format is, for example, [<job ID1, initiate chat request>, <job ID2, deliver resume>, …], which is likewise converted into text such as "initiated a chat request for job ID1 of company A, delivered a resume for job ID2". Custom conditions may be presented through, for example, a visual input interface such as buttons, and may also take the form of, for example, input boxes or drop-down menus. Taking a job seeker as an example, the job seeker can input custom conditions such as the work place, the salary range, and the working hours; after input, the custom conditions are converted into a user-customized constraint condition text. For example, when the user inputs that the work place is Beijing, the converted user-customized constraint condition text is "[work place preference] Beijing".
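As an illustration only, the formatting and splicing performed by the input module can be sketched in Python as follows; the function name `build_text_features`, the field layout, and the formatting templates are hypothetical stand-ins, not part of the claimed system.

```python
# Hypothetical sketch of the input module: each user input is formatted
# into text and the pieces are spliced into one group of text features.
# All names and templates here are illustrative.

def build_text_features(instruction: str,
                        structured: dict,
                        custom_conditions: dict) -> str:
    # User instruction text: used as-is.
    instruction_text = instruction

    # User feature text: flatten structured features (profile, history)
    # into natural-language fragments.
    profile_parts = [f"{k}: {v}" for k, v in structured.get("profile", {}).items()]
    history_parts = [f"{action} to {job_id}"
                     for job_id, action in structured.get("history", [])]
    feature_text = "; ".join(profile_parts + history_parts)

    # User-customized constraint condition text, e.g. "[work place preference] Beijing".
    constraint_text = " ".join(f"[{k}] {v}" for k, v in custom_conditions.items())

    # Splice the three texts into a single group of text features.
    return " ".join([instruction_text, feature_text, constraint_text])

features = build_text_features(
    "Please recommend a suitable position according to the candidate's information",
    {"profile": {"education": "bachelor", "experience": "3 years"},
     "history": [("job ID1", "initiated a chat request"),
                 ("job ID2", "delivered a resume")]},
    {"work place preference": "Beijing"},
)
```

In a real system the three sub-texts would follow whatever prompt template the chosen LLM was fine-tuned on; only the idea of "format each part as text, then concatenate" is taken from the description above.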
In one embodiment, the generation module uses a multi-layer neural network;
and the generation module adopts a serialized generation mode, taking as input the text features currently output by the input module spliced with the information of the already-generated text.
As shown in fig. 1, the generation module is an LLM-based generation module. A general-purpose large language model (Large Language Model, LLM) currently refers to a large model with billions or even hundreds of billions of parameters, based on the Transformer architecture and pre-trained on a large-scale corpus, which exhibits good generalization on common natural language processing problems. Large language models have excellent language modeling and generation capabilities, and natural language itself has better interpretability. The LLM is not particularly limited in the present application and may be the GPT series, ChatGLM, LLaMA, etc.
The generation module is a multi-layer neural network, for example a neural network composed of multiple layers of Transformer decoders. Its input is the text features constructed by the preceding module, from which it generates a complete description matching the user's inputs as the recommendation result, for example a job description conforming to the job seeker's features.
The above-mentioned serialization means that the neural network's generation process is a multi-step process in which each step generates only one word. For example, if 3 words [w0, w1, w2] have been generated, then in the next step the input of the generation module's neural network contains both <user instruction text, user feature text, user-customized constraint condition text> (i.e., the text features currently output by the input module) and the 3 words <w0, w1, w2> (i.e., the information of the generated text), spliced together to generate the next word w3; repeating this process generates a complete text. A stop condition may be set during generation, for example a maximum generation length, or stopping once a certain keyword is generated.
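The serialized generation loop described above can be sketched as follows; `next_word` is a toy placeholder for the neural network's forward pass, and all names are illustrative.

```python
# Illustrative sketch of serialized (step-by-step) generation: at each step
# the model consumes the text features spliced with the words generated so
# far and emits one more word. next_word() stands in for the generation
# module's neural network.

def generate(text_features, next_word, max_len=50, stop_token="<eos>"):
    generated = []                          # e.g. [w0, w1, w2] after three steps
    while len(generated) < max_len:         # stop condition 1: maximum length
        model_input = text_features + generated   # splice features + history
        w = next_word(model_input)
        if w == stop_token:                 # stop condition 2: stop keyword
            break
        generated.append(w)
    return generated

# Toy next_word: emits a fixed word sequence, then the stop token.
vocab = ["software", "engineer", "in", "Beijing", "<eos>"]
out = generate(["features"], lambda inp: vocab[len(inp) - 1])
```

The toy driver shows both stop conditions wired in: the loop ends either at `max_len` words or as soon as the stop keyword appears.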
In one embodiment, the reward module employs a plurality of sub-modules for evaluating the recommendation from a plurality of evaluation dimensions, respectively;
and the input of each sub-module is the text features and the recommendation result, and the output is a scalar value representing the degree of conformity of the recommendation result in one evaluation dimension.
In one embodiment, each sub-module uses a multi-layer neural network structure, and parameters of the multi-layer neural network structure used by each sub-module are different.
The reward module is composed of a plurality of sub-modules, each of which is a multi-layer neural network, for example a multi-layer Transformer structure, but with different model parameters in each sub-module's network. The input of each sub-module is the input of the whole recommendation system plus the generated recommendation result, and the output is a scalar value representing the degree of conformity of the recommendation result in a certain dimension; the larger the scalar value, the better the conformity.
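The sub-module interface (text features plus recommendation result in, one scalar out) can be illustrated as follows; the crude keyword-counting scorer is only a stand-in for the multi-layer Transformer each sub-module actually uses, and all names are hypothetical.

```python
# Toy sketch of a reward sub-module's interface: input is the system's text
# features plus the generated recommendation result, output is one scalar
# whose larger values mean better conformity on that dimension.

class RewardSubmodule:
    def __init__(self, score_fn):
        self.score_fn = score_fn   # each sub-module has its own parameters

    def __call__(self, text_features: str, recommendation: str) -> float:
        return self.score_fn(text_features, recommendation)

# Example: a crude "completeness" scorer counting required fields present.
REQUIRED = ["position name", "skill requirements", "responsibilities"]
completeness = RewardSubmodule(
    lambda feats, rec: sum(k in rec for k in REQUIRED) / len(REQUIRED))

score = completeness("features",
                     "position name: engineer; skill requirements: Python")
```

Because every sub-module shares this same call signature, adding a new evaluation dimension only means adding one more `RewardSubmodule`, which mirrors the extensibility point made below.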
It can be understood that the reward module in the system of the present application may comprise one, two, three, four, or more sub-modules, each corresponding to one evaluation dimension, so the system has good extensibility: supporting a new evaluation dimension only requires adding the corresponding sub-module.
In one embodiment, the at least one evaluation dimension includes completeness, conciseness, constraint compliance, and relevance;
the completeness characterizes the degree to which the recommendation result contains necessary information;
the conciseness characterizes the degree to which the recommendation result does not contain unnecessary information;
the constraint compliance characterizes the degree to which the recommendation result complies with the user-defined condition;
the relevance characterizes the degree to which the recommendation result matches the user's structured features.
For example, in fig. 1, the reward module of the present application comprises four sub-modules, namely a completeness reward module, a conciseness reward module, a constraint compliance reward module, and a relevance reward module, which respectively evaluate the degree of conformity of the recommendation result in the four evaluation dimensions of completeness, conciseness, constraint compliance, and relevance.
For completeness, it is evaluated whether the recommendation result (such as a generated recommended job description) contains necessary information such as the position name, skill requirements, and job responsibilities; the necessary information can be predefined for different user roles.
For conciseness, it is evaluated whether the recommendation result (such as a generated recommended job description) is repetitive or redundant, i.e., it should not contain unnecessary information; different unnecessary information can be predefined for different user roles, for example the transportation to the interview or the interview location, which are irrelevant to the job itself.
For constraint compliance, the recommendation result (such as a generated recommended job description) is evaluated for compliance with the user-defined conditions, such as the user's customized requirement on the work place.
For relevance, it is evaluated whether the background requirements (e.g., education, years of work) and skill requirements in the recommendation result (such as a generated recommended job description) match the job seeker's background, work experience, major, acquired skills, etc.
The above four evaluation dimensions are an optional embodiment; more evaluation dimensions can be added to effectively evaluate the recommendation result, and the present application does not specifically limit the number of sub-modules and corresponding evaluation dimensions.
As described above, the system of the present application uses a three-stage training framework: the whole system undergoes three training stages, and reinforcement learning is performed on the models obtained in the first two stages using any Actor-Critic algorithm, for example the PPO algorithm, to further improve recommendation performance, so that the recommendation results finally generated by the recommendation system align with matched samples.
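For reference, a minimal sketch of PPO's clipped per-token objective, one common Actor-Critic choice of the kind mentioned above; this is the standard PPO formula, not a formula claimed by the patent, and the advantage would in practice come from the reward module's scores.

```python
# Standard PPO clipped surrogate objective for one token (to be maximized).
# ratio = pi_new(token) / pi_old(token); advantage is derived from the
# Critic's (reward module's) evaluation. Illustrative only.

def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # Clip the probability ratio into [1 - eps, 1 + eps] ...
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    # ... and take the pessimistic (smaller) of the two candidate objectives.
    return min(ratio * advantage, clipped * advantage)
```

The clipping keeps the fine-tuned Actor from drifting too far from the first-stage generation module in a single update, which is why PPO is a common pick for this third stage.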
In one embodiment, the partial training set for performing the first stage training on the generating module includes:
and the first training set comprises a plurality of first complete descriptions obtained after randomly sampling and rewriting part of the user instructions from a pre-constructed candidate instruction set and/or randomly deleting part of the user structured features.
As shown in fig. 1, the system of the present application supports the user in modifying conditions as needed, i.e., customizing conditions according to requirements. As shown in fig. 2, to improve the robustness of the final generation module (i.e., the recommendation model), during the first-stage training an LLM is used to automatically rewrite user instructions into instructions with the same semantics but different expressions. The rewriting process may be, for example, randomly sampling part of the user instructions from the candidate instruction set and rewriting them into instructions with the same semantics but different expressions; the rewritten user instructions are converted into new user instruction texts and used as input of the input module, from which new text features can be generated, so that the generation module can output a recommendation result based on the new text features, the recommendation result being a new complete description (for example, a recommended job description for a job seeker) matching the new text features. As shown in fig. 2, during the first-stage training, user structured features, such as a certain historical behavior or a certain profile feature, may also be randomly deleted to improve the robustness of the final generation module (i.e., the recommendation model). For example, part of the profile features may be randomly deleted from the user profile; the remaining user profile features are then formatted (i.e., converted) into a new user feature text and used as input of the input module, from which new text features can be generated, so that the generation module can output a recommendation result based on the new text features, the recommendation result being a new complete description (e.g., a recommended job description for a job seeker) matching the new text features.
It should be noted that, during the first-stage training, the above processes of randomly sampling and rewriting part of the user instructions from the candidate instruction set and randomly deleting part of the user structured features may be performed simultaneously, so as to obtain a new user instruction text and a new user feature text that are used as input of the input module; new text features can then be generated from this input, so that the generation module can output a recommendation result based on the new text features, the recommendation result being a new complete description (such as a recommended job description for a job seeker) matching the new text features.
It will be appreciated that the first training set used during the first-stage training includes at least the new complete descriptions (i.e., the first complete descriptions) obtained in the above process as samples, i.e., the first training set includes a plurality of first complete descriptions obtained by randomly sampling and rewriting part of the user instructions from the pre-constructed candidate instruction set and/or randomly deleting part of the user structured features.
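A minimal sketch of the random-deletion half of this augmentation (the LLM-based instruction rewriting is not sketched); the function name, keep probability, and seed are hypothetical choices for illustration.

```python
# Illustrative augmentation for the first training set: randomly drop part
# of the user's structured features before formatting them into text, so
# the fine-tuned generation module stays robust to missing profile fields.

import random

def drop_features(structured: dict, keep_prob: float = 0.8, seed: int = 0) -> dict:
    # A seeded RNG makes the augmentation reproducible for this sketch.
    rng = random.Random(seed)
    return {k: v for k, v in structured.items() if rng.random() < keep_prob}

kept = drop_features({"education": "bachelor",
                      "experience": "3 years",
                      "city": "Beijing"})
```

The surviving features would then be converted into the new user feature text exactly as the unaugmented features are; a real pipeline would vary the seed per sample.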
The first-stage training described in the present application can serve as an instruction fine-tuning stage (as shown in fig. 2). Job seeker-position sample pairs that achieved a match are constructed from the platform logs of the recommendation system, and inputs are constructed as described in the "input module" section. For the user instruction, an instruction can be randomly selected from the available instruction set (i.e., the pre-constructed candidate instruction set) and its sentence rewritten by the pre-trained large model; the matched job description (that is, one where the job seeker and the recruiter communicated and reached a preliminary intention to cooperate) serves as the label text. The generation module is trained with the LLM language-model loss function, i.e., the language model is required to maximize the probability of generating the label text.
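Maximizing the probability of generating the label text is equivalent to minimizing the negative log-likelihood of its tokens; a toy illustration with hand-given per-token probabilities in place of real model outputs:

```python
# Toy sketch of the first-stage (instruction fine-tuning) objective:
# the mean negative log-likelihood of the label text's tokens, where
# token_probs[i] is the model's probability of the i-th label token given
# the text features and the preceding label tokens.

import math

def lm_loss(token_probs):
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

loss = lm_loss([0.5, 0.25, 0.5])   # three label tokens with toy probabilities
```

Driving this loss toward zero pushes every per-token probability toward 1, i.e., it maximizes the probability of the label text as required above.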
In one embodiment, the partial training set for performing the second stage training on the reward module includes:
a second training set for training the sub-module that evaluates the recommendation result from the completeness, the second training set comprising a plurality of second complete descriptions obtained by randomly deleting a portion of text in the plurality of first complete descriptions;
a third training set for training the sub-module that evaluates the recommendation result from the conciseness, the third training set comprising a plurality of third complete descriptions obtained by adding noise text to a portion of text in the plurality of first complete descriptions;
a fourth training set for training the sub-module that evaluates the recommendation result from the constraint compliance, the fourth training set comprising a plurality of fourth complete descriptions obtained by adding condition text to the user-customized constraint condition texts corresponding to the plurality of first complete descriptions;
and a fifth training set for training the sub-module that evaluates the recommendation result from the relevance, wherein the fifth training set comprises a plurality of contrastive sample pairs constructed based on interaction behaviors between users, each contrastive sample pair consisting of the current user's text features paired with different recommendation results, and being used for characterizing the relevance between the current user and the recommendation results generated by different interaction behaviors.
As described above, the system of the present application may implement the reward module with, for example, four sub-modules to judge the quality of the recommendation result generated by the generation module. Each sub-module needs to undergo the second-stage training with its training data, which can serve as the reward-model training stage (as shown in fig. 2).
For the completeness reward sub-module, the second training set is constructed by randomly deleting certain content from the first complete descriptions of the first training set after the first-stage training. For example, a negative sample (i.e. a second complete description) is generated automatically by randomly deleting content from a recommended position description matched to the job seeker, while the description before deletion (i.e. the first complete description) serves as the positive sample. As shown in fig. 2, the completeness reward sub-module must assign a larger reward score (reward 1 in fig. 2) to the first complete description (position description 1 in fig. 2) than the reward score (reward 2 in fig. 2) it assigns to the second complete description (position description 2 in fig. 2), and it is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
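The pairwise objective described above (maximizing the difference between the reward scores of the positive and negative samples) is commonly implemented as a Bradley-Terry style loss; the following sketch assumes that formulation, which the application does not mandate.

```python
import math

def pairwise_reward_loss(score_pos, score_neg):
    """Bradley-Terry pairwise loss: -log(sigmoid(r_pos - r_neg)).
    Minimizing this loss maximizes the margin by which the positive
    (intact) description outscores the corrupted one."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))
```

When the positive sample already outscores the negative one, the loss is small; when the ordering is inverted, the loss grows, pushing the sub-module toward the correct ranking.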
For the conciseness reward sub-module, the third training set is constructed by adding noise text to some of the first complete descriptions in the first training set after the first-stage training. For example, a negative sample (i.e. a third complete description) is generated automatically by appending noise text to a recommended position description matched to the job seeker, while the description before the noise is added (i.e. the first complete description) serves as the positive sample. As shown in fig. 2, the conciseness reward sub-module must assign a larger reward score (reward 1 in fig. 2) to the first complete description (position description 1 in fig. 2) than the reward score (reward 2 in fig. 2) it assigns to the third complete description (position description 2 in fig. 2), and it is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
For the constraint-compliance reward sub-module, a new complete description (i.e. a fourth complete description) is generated as the positive sample from a new customized constraint condition text obtained by adding new condition text, such as a work site manually added by the job seeker, to the user-customized constraint condition text; the recommended position description before the new condition text is added (i.e. the first complete description) serves as the negative sample. As shown in fig. 2, the constraint-compliance reward sub-module must assign a larger reward score to the fourth complete description (the positive sample) than to the first complete description (the negative sample), and it is trained by maximizing the difference between the two reward scores (the difference loss in fig. 2).
For the relevance reward sub-module, users (job seekers and recruiters) exhibit various interaction behaviors before a match is reached on the recruitment platform, such as a job seeker actively initiating a chat, a recruiter actively initiating a chat, a recruiter declining a chat, a recruiter actively sending an interview invitation, and the like, and comparison samples can be constructed from these interaction behaviors. For example, a job seeker achieves a match with position A, initiates a chat about position B but does not achieve a match, and initiates a chat about position C but receives no reply; the relevance of the job seeker to these three positions decreases in that order. The present application therefore constructs comparison sample pairs from the text feature of the job seeker and the different position descriptions (i.e. "text feature of the current user - recommendation result" pairs), for example "text feature of the job seeker - position A", "text feature of the job seeker - position B" and "text feature of the job seeker - position C". Feeding different comparison samples into the relevance reward sub-module yields different reward scores; the reward score of position A should be larger than that of position B, and the reward score of position B larger than that of position C, so the relevance reward sub-module is trained by maximizing the difference between the reward scores of positions A and B and the difference between the reward scores of positions B and C. It will be appreciated that the foregoing is illustrated for a user whose role is a job seeker, and similar comparison sample pairs can be constructed for recruiters to train the relevance reward sub-module.
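The three-way ordering described above (position A over position B over position C) can be trained by accumulating a pairwise loss over each adjacent pair of reward scores; this sketch assumes a Bradley-Terry style pairwise loss, which is one common choice rather than the only one.

```python
import math

def pair_loss(score_hi, score_lo):
    """Pairwise loss penalizing the higher-ranked sample for not
    outscoring the lower-ranked one."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_hi - score_lo))))

def relevance_ranking_loss(scores):
    """scores are ordered from most to least relevant, e.g.
    [r_A, r_B, r_C]; summing the adjacent pairwise losses enforces
    r_A > r_B > r_C."""
    return sum(pair_loss(a, b) for a, b in zip(scores, scores[1:]))
```

A correctly ordered score list (e.g. [3.0, 2.0, 1.0]) yields a smaller loss than the reversed ordering, so gradient descent drives the sub-module toward the behavioral ranking.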
It can be appreciated that the above procedure of recommending job positions to job seekers is also applicable to the procedure of recommending suitable candidates to recruiters, and will not be repeated here.
In one embodiment, the generation module after the first-stage training serves as the Actor and the reward module after the second-stage training serves as the Critic, and the generation module after the first-stage training is subjected to the third-stage training via an Actor-Critic algorithm to obtain the generation module after the third-stage training.
After the first two stages of training are completed and the models of the first two stages are obtained, the generation module serves as the Actor and the four reward sub-modules serve as the Critic; any Actor-Critic reinforcement learning algorithm can then be used for the third-stage training to further train the generation module, and the generation module after the third-stage training can serve as the final recommendation model. The training set of the third-stage training is similar to that of the first-stage training. The Actor-Critic algorithm is a reinforcement learning algorithm that combines the advantages of value functions and policy functions and achieves good results in practical applications. It converges quickly and can update the policy and the value function in real time in an online recruitment system; moreover, because the policy function in the Actor-Critic algorithm is optimized directly rather than indirectly through estimation as the value function is, the updates to the recommendation policy for users of the recruitment system are more accurate.
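A single step of a generic Actor-Critic update can be sketched as follows; the equal weighting of the four sub-module scores, the one-step advantage formulation and the squared-error value loss are illustrative assumptions, since the application allows any Actor-Critic reinforcement learning algorithm.

```python
def combined_reward(sub_scores, weights=None):
    """Aggregate the scores of the four reward sub-modules (completeness,
    conciseness, constraint compliance, relevance) into one scalar reward
    for the Critic; equal weighting is an assumption."""
    weights = weights or [1.0] * len(sub_scores)
    return sum(w * s for w, s in zip(weights, sub_scores))

def actor_critic_losses(log_prob, reward, value):
    """One-step Actor-Critic sketch: the advantage (reward minus the
    value estimate) weights the Actor's log-probability of the generated
    description, and a squared error trains the value estimate."""
    advantage = reward - value
    actor_loss = -advantage * log_prob
    critic_loss = advantage ** 2
    return actor_loss, critic_loss
```

In use, the generation module produces a description, the four sub-modules score it, `combined_reward` collapses the scores, and the two losses update the Actor and the value estimate respectively.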
The embodiment of the invention discloses an online recruitment generation type recommendation method, which comprises the following steps:
the input module is used for respectively converting each input of a user into a text and splicing the text into a group of text features, wherein each input of the user comprises a user instruction, a user structural feature and a user-defined condition, the text converted by the user instruction is a user instruction text, the text converted by the user structural feature is a user feature text, and the text converted by the user-defined condition is a user-customized constraint condition text;
the input module inputs the text features into the generation module, and the generation module outputs recommendation results according to the text features, wherein the recommendation results represent complete descriptions matched with various inputs of a user;
the reward module evaluates the degree of conformity of the recommendation result in at least one evaluation dimension according to the text features and the recommendation result;
the method further comprises the steps of training in three stages to obtain a final recommendation model:
training the generating module in a first stage;
training the reward module in a second stage;
And based on the generation module after the first-stage training and the rewarding module after the second-stage training, performing a third-stage training on the generation module, wherein the generation module after the third-stage training is used as a final recommendation model.
The method of the present application first converts the user features (such as profile, behavior, etc.) into natural language via the input module and combines them with a specific instruction text; the resulting natural-language features are then input into the large-language-model generation module, which finally generates a position description meeting the user's conditions. For the specific description of the three modules (input module, generation module and reward module), reference may be made to the foregoing, which is not repeated here.
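The splicing performed by the input module can be sketched as follows; the field names, separator token and condition format are illustrative assumptions, not a format specified by the present application.

```python
def build_text_features(instruction, profile, custom_conditions):
    """Input module sketch: verbalize the user's structural features into
    natural-language text and splice it with the user instruction text
    and the user-customized constraint condition text."""
    feature_text = "; ".join(f"{k}: {v}" for k, v in profile.items())
    constraint_text = "; ".join(custom_conditions)
    return " [SEP] ".join([instruction, feature_text, constraint_text])
```

The resulting string is the "group of text features" consumed by the generation module.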
It should be noted that the reward module may also be implemented as a plurality of sub-modules as in the foregoing embodiment; the configuration of each sub-module, the corresponding evaluation dimension, the evaluation method, the training sets used and the training process may refer to the foregoing embodiment and are not repeated here.
In one embodiment, performing the third-stage training on the generation module based on the generation module after the first-stage training and the reward module after the second-stage training includes:
taking the generation module after the first-stage training as the Actor;
taking the reward module after the second-stage training as the Critic;
and performing the third-stage training on the generation module after the first-stage training via an Actor-Critic algorithm.
In one embodiment, the at least one evaluation dimension includes completeness, conciseness, constraint compliance and relevance, and evaluating the degree of conformity of the recommendation result in the at least one evaluation dimension includes:
the reward module evaluates the completeness of the recommendation result to determine the degree to which the recommendation result contains the necessary information;
the reward module evaluates the conciseness of the recommendation result to determine the degree to which the recommendation result contains no unnecessary information;
the reward module evaluates the constraint compliance of the recommendation result to determine the degree to which the recommendation result complies with the user-defined condition;
and the reward module evaluates the relevance of the recommendation result to determine the degree to which the recommendation result matches the user structural feature.
The specific meanings of the four evaluation dimensions (completeness, conciseness, constraint compliance and relevance), the corresponding sub-modules and the training processes of those sub-modules are as in the previous embodiment and are not repeated here.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, one of ordinary skill in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.