CN117370540B - Recommendation model generation method based on large language model and corresponding product - Google Patents


Publication number
CN117370540B
CN117370540B (granted from application CN202311675494.9A)
Authority
CN
China
Prior art keywords
data
target
text
behavior sequence
sequence data
Prior art date
Legal status
Active
Application number
CN202311675494.9A
Other languages
Chinese (zh)
Other versions
CN117370540A
Inventor
朱洪银
张闯
王敏
Current Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd
Priority claimed from CN202311675494.9A
Publication of CN117370540A
Application granted
Publication of CN117370540B
Legal status: Active


Classifications

    • G06F16/335: Information retrieval; querying; filtering based on additional data, e.g. user or group profiles
    • G06F16/3344: Query execution using natural language analysis
    • G06F40/126: Handling natural language data; use of codes for handling textual entities; character encoding
    • G06F40/151: Handling natural language data; use of codes for handling textual entities; transformation
    • G06F40/279: Natural language analysis; recognition of textual entities
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiment of the invention provides a recommendation model generation method based on a large language model, and a corresponding product, applied to the technical field of data recommendation. The method comprises the following steps: acquiring training data comprising a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data; converting the training data into structured first data, and generating a training input sequence according to the first data; masking part of the fields in the training input sequence to obtain a first text, the masked fields forming a second text; and then adjusting a universal large language model according to the first text and the second text to obtain a target recommendation model. According to the embodiment of the invention, a large language model can be applied to a recommendation system and a recommendation model with stronger predictive capability can be obtained, thereby providing users with more accurate recommendation services.

Description

Recommendation model generation method based on large language model and corresponding product
Technical Field
The present invention relates to the technical field of data recommendation, and in particular, to a method for generating a recommendation model based on a large language model, a device for generating a recommendation model based on a large language model, an electronic device, and a non-volatile computer readable storage medium.
Background
In recent years, recommendation systems have become one of the largest sources of business income for many companies; in the internet industry in particular, search, advertising, and recommendation are regarded as the three most profitable businesses. Recommendation technology has received much attention for its ability to rapidly drive online traffic growth and improve user engagement.
To improve the accuracy of a recommendation system, the predictive capability of the recommendation model within it needs to be improved; how to improve that predictive capability is one of the problems to be solved today.
Disclosure of Invention
In view of the above, a method for generating a recommendation model based on a large language model, and a corresponding product, are proposed to overcome or at least partially solve the above problems, including:
a method of generating a recommendation model based on a large language model, the method comprising:
acquiring training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data;
converting the training data into structured first data, and generating a training input sequence according to the first data;
masking part of the fields in the training input sequence to obtain a first text; the masked fields form a second text;
and adjusting the universal large language model according to the first text and the second text to obtain a target recommendation model.
Optionally, the converting the training data into the structured first data includes:
determining the duration of the behavior corresponding to each user behavior sequence data, and determining target behavior sequence data from a plurality of user behavior sequence data according to the duration of the behavior corresponding to each user behavior sequence data;
determining target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data;
converting the target behavior sequence data into structured target behavior sequence data, and converting the target user feature data into structured target user feature data;
and obtaining first data according to the structured target behavior sequence data and the structured target user characteristic data.
Optionally, the converting the target behavior sequence data into the structured target behavior sequence data and the converting the target user feature data into the structured target user feature data include:
performing textual conversion on the target behavior sequence data and the target user characteristic data to obtain the target behavior sequence data and the target user characteristic data in text form;
and carrying out structural conversion on the text-form target behavior sequence data and the target user characteristic data to obtain first data.
Optionally, the non-textual target behavior sequence data and the target user feature data are data in a form of a table, and the text conversion is performed on the target behavior sequence data and the target user feature data to obtain the textual target behavior sequence data and the target user feature data, which includes:
and converting the target behavior sequence data and the target user characteristic data in the form of a table into the target behavior sequence data and the target user characteristic data in the form of texts.
Optionally, the obtaining the first data according to the structured target behavior sequence data and the structured target user feature data includes:
and preprocessing the structured target behavior sequence data and the structured target user characteristic data to obtain first data.
Optionally, the determining the target behavior sequence data from the plurality of user behavior sequence data according to the duration of the behavior corresponding to each user behavior sequence data includes:
determining a duration interval to which each user behavior sequence data belongs according to the duration of the behavior corresponding to that user behavior sequence data;
and acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to a preset proportion.
Optionally, the obtaining, according to a preset ratio, target behavior sequence data from user behavior sequence data corresponding to each duration interval includes:
determining the quantity of data to be acquired corresponding to each duration interval according to the preset proportion;
and randomly acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to the number of the data to be acquired.
Optionally, the adjusting the universal large language model according to the first text and the second text to obtain a target recommendation model includes:
processing the first text using a bi-directional encoder of the generic large language model and processing the second text using a uni-directional decoder of the generic large language model;
and according to the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, adjusting model parameters of the universal large language model to obtain the target recommendation model.
Optionally, the masking a part of fields in the training input sequence to obtain a first text includes:
and masking part of the entities in the training input sequence to obtain the first text.
Optionally, the masking a part of fields in the training input sequence to obtain a first text includes:
and masking part of the sentences in the training input sequence to obtain the first text.
Optionally, the masking a part of fields in the training input sequence to obtain a first text includes:
and masking part of the text segments in the training input sequence to obtain the first text.
Optionally, the masked spans of the training input sequence are arranged in chronological order.
Optionally, the masked fields in the first text are provided with a start flag, and the second text is provided with an end flag.
Optionally, each text unit of the first text is provided with an in-position flag and an inter-position flag, the in-position flag and the inter-position flag being mapped in the embedded vector of the corresponding text unit.
Optionally, the method further comprises:
receiving a recommended task;
And inputting the recommendation task into the target recommendation model, and receiving target recommendation data output by the target recommendation model.
Optionally, the target recommendation model is configured to determine, when generating a target text unit of target recommendation data, whether the target text unit and a last generated text unit belong to the same entity; and when the target text unit and the last generated text unit belong to the same entity, updating the in-position mark of the same entity.
Optionally, the target recommendation model is provided with an entity pool, and the entity pool comprises a plurality of entities; the target recommendation model is used for deleting entities which are not matched with the target text unit in the entity pool when the target text unit is generated.
The embodiment of the invention also provides a device for generating the recommendation model based on the large language model, which comprises the following steps:
an acquisition module, used for acquiring training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data;
the generating module is used for converting the training data into structured first data and generating a training input sequence according to the first data;
a masking module, used for masking part of the fields in the training input sequence to obtain a first text; the masked fields form a second text;
and the adjustment module is used for adjusting the universal large language model according to the first text and the second text to obtain a target recommendation model.
Optionally, the generating module is configured to determine the duration of the behavior corresponding to each user behavior sequence data, and determine target behavior sequence data from the plurality of user behavior sequence data according to the duration of the behavior corresponding to each user behavior sequence data; determine target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data; convert the target behavior sequence data into structured target behavior sequence data, and convert the target user feature data into structured target user feature data; and obtain first data according to the structured target behavior sequence data and the structured target user characteristic data.
Optionally, the generating module is configured to perform textual conversion on the target behavior sequence data and the target user feature data, so as to obtain the target behavior sequence data and the target user feature data in a text form; and carrying out structural conversion on the text-form target behavior sequence data and the target user characteristic data to obtain first data.
Optionally, the non-text target behavior sequence data and the target user characteristic data are data in a form of a table, and the generating module is used for converting the target behavior sequence data and the target user characteristic data in the form of the table into the target behavior sequence data and the target user characteristic data in the form of text.
Optionally, the generating module is configured to pre-process the structured target behavior sequence data and the structured target user feature data to obtain first data.
Optionally, the generating module is configured to determine, according to a duration of the behavior corresponding to the user behavior sequence data, a duration interval to which each user behavior sequence data belongs; and acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to a preset proportion.
Optionally, the generating module is configured to determine, according to the preset proportion, the number of data to be acquired corresponding to each duration interval; and randomly acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to the number of the data to be acquired.
Optionally, the adjustment module is configured to process the first text using a bi-directional encoder of the generic large language model, and process the second text using a uni-directional decoder of the generic large language model; and according to the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, adjusting model parameters of the universal large language model to obtain the target recommendation model.
Optionally, the masking module is configured to mask a part of entities in the training input sequence to obtain the first text.
Optionally, the masking module is configured to mask a part of sentences in the training input sequence to obtain the first text.
Optionally, the masking module is configured to mask a part of text segments in the training input sequence to obtain the first text.
Optionally, the masked spans of the training input sequence are arranged in chronological order.
Optionally, the masked fields in the first text are provided with a start flag, and the second text is provided with an end flag.
Optionally, each text unit of the first text is provided with an in-position flag and an inter-position flag, the in-position flag and the inter-position flag being mapped in the embedded vector of the corresponding text unit.
Optionally, the apparatus further comprises:
the recommending module is used for receiving recommending tasks; and inputting the recommendation task into the target recommendation model, and receiving target recommendation data output by the target recommendation model.
Optionally, the target recommendation model is configured to determine, when generating a target text unit of target recommendation data, whether the target text unit and a last generated text unit belong to the same entity; and when the target text unit and the last generated text unit belong to the same entity, updating the in-position mark of the same entity.
Optionally, the target recommendation model is provided with an entity pool, and the entity pool comprises a plurality of entities; the target recommendation model is used for deleting entities which are not matched with the target text unit in the entity pool when the target text unit is generated.
The embodiment of the invention also provides an electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the above method for generating a recommendation model based on a large language model.
The embodiment of the invention also provides a nonvolatile computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the above method for generating a recommendation model based on a large language model.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, training data comprising a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data can first be acquired; the training data is then converted into structured first data, and a training input sequence is generated from the first data; part of the fields in the training input sequence are masked to obtain a first text, and the masked fields form a second text; the universal large language model is then adjusted according to the first text and the second text to obtain the target recommendation model. According to the embodiment of the invention, a large language model can be applied to a recommendation system, and a recommendation model with stronger predictive capability can be obtained, thereby providing users with more accurate recommendation services.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings needed in the description of the present invention are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; a person skilled in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of the steps of a method for generating a recommendation model based on a large language model according to an embodiment of the present invention;
FIG. 2 is a flow chart of steps of another method for generating a recommendation model based on a large language model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an application scenario according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a device for generating a recommendation model based on a large language model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a schematic structural view of a nonvolatile computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features, and advantages of the present invention may become more readily apparent, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Apparently, the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
To improve the accuracy of recommendation system prediction, the advantages of a large language model can be brought to the recommendation system by integrating the large language model into it. In particular, by exploiting the semantic information in natural language data, the large language model can help the recommender system understand and infer the relationships between user features and behavior sequences, and between entities within a behavior sequence, enabling the recommender system to understand the needs and preferences of the user more fully.
Moreover, a large language model is trained on a large amount of text data, which helps it understand the relationships between different concepts and views. By incorporating a large language model into the recommendation system, this implicit knowledge can be used to generate more distinctive and logically coherent recommendations, which may lead to more creative and personalized recommendations for the user.
By utilizing the natural language processing capability of a large language model, various individual recommendation tasks can be integrated into a unified framework; with the knowledge from large language model pre-training and its few-shot learning capability, the recommendation model can quickly adapt to a new domain with limited data.
However, applying a large language model to a recommendation system, so as to construct a robust, integrated recommendation system that can fully utilize the massive knowledge and reasoning capability of the large language model, currently faces several challenges. For example: there is no precedent for applying large language models to recommendation systems; training a pre-training-based recommendation model directly from scratch is not only time-consuming and laborious, but also lacks the general knowledge and reasoning ability that support large language models; at the same time, recommendation data has unique features, such as fixed entities and sequential user behavior, that differ from the original text corpora used to train language models, and direct fine-tuning may eliminate many functions specific to the recommendation task. In order to apply the large language model to the recommendation system, the embodiment of the invention provides a method for generating a recommendation model based on a large language model. Referring to fig. 1, fig. 1 shows a flowchart of the steps of the method for generating a recommendation model based on a large language model in an embodiment of the invention; as shown in fig. 1, the method may include the following steps:
Step 101, acquiring training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data.
In order to retain the advantages of a large language model while supplementing it with new knowledge from the recommendation data, the embodiment of the invention can acquire pre-collected data about historical users; for example, different user characteristic data and the user behavior sequence data corresponding to each of them can be collected. User feature data may refer to features of the user, for example: name, gender, age, etc.; user behavior sequence data may refer to behaviors performed by the user, for example: clicking on product A, clicking on video B, etc.
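As a concrete illustration, such training data might be organized as follows. This is a minimal sketch; all field names and values below are hypothetical assumptions, not taken from the patent:

```python
# Hypothetical training-data layout: each entry pairs one user's feature
# data with the behavior sequence data recorded for that user.
# All field names and values are illustrative assumptions.
training_data = [
    {
        "user_features": {"name": "user_001", "gender": "F", "age": 28},
        "behavior_sequences": [
            {"action": "click", "item": "product_A", "timestamp": 1700000000},
            {"action": "click", "item": "video_B", "timestamp": 1700000600},
        ],
    },
]
```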
Step 102, converting the training data into structured first data, and generating a training input sequence according to the first data.
After training data comprising a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data is obtained, the unstructured training data can be converted into structured first data; a training input sequence can then be generated based on the first data and used to train the model.
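One possible way to flatten a user's data into a training input sequence is sketched below. The bracketed section markers and the textual format are illustrative assumptions; the patent does not fix a concrete serialization:

```python
def to_training_sequence(record):
    """Flatten one user's features and behaviors into a structured text
    sequence (the [User]/[Behavior] format is an illustrative assumption,
    not the patent's exact scheme)."""
    feats = record["user_features"]
    # Sort keys so the feature text is deterministic.
    feat_text = "; ".join(f"{k}: {v}" for k, v in sorted(feats.items()))
    # Order behaviors chronologically before serializing.
    behav_text = " -> ".join(
        f"{b['action']} {b['item']}"
        for b in sorted(record["behavior_sequences"], key=lambda b: b["timestamp"])
    )
    return f"[User] {feat_text} [Behavior] {behav_text}"
```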
Step 103, masking part of the fields in the training input sequence to obtain a first text; the masked fields form a second text.
In order to give a large language model recommendation capability, after the training input sequence is generated, the embodiment of the invention can first perform masking processing on part of the fields in the training input sequence. In the training input sequence after masking, some fields are masked and some are unmasked; the masked and unmasked fields together constitute a first text. In addition, the masked fields in the training input sequence may serve as a second text; the second text refers to the actual fields behind the masks. For example, if the masked field is "watermelon", the "watermelon" field may be used as the second text. The embodiments of the present invention are not limited in this respect.
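A generic sketch of this masking step is shown below. The mask token, the ratio, and random selection are illustrative assumptions; the patent also describes entity-, sentence-, and segment-level masking variants not covered here:

```python
import random

MASK = "[MASK]"

def mask_fields(tokens, mask_ratio=0.15, rng=None):
    """Mask a random subset of fields and return (first_text, second_text):
    first_text keeps unmasked tokens plus [MASK] placeholders; second_text
    lists the masked-out original fields. A generic sketch, not the
    patent's exact masking scheme."""
    rng = rng or random.Random(0)
    n = max(1, int(len(tokens) * mask_ratio))
    idx = set(rng.sample(range(len(tokens)), n))
    first_text = [MASK if i in idx else t for i, t in enumerate(tokens)]
    second_text = [tokens[i] for i in sorted(idx)]
    return first_text, second_text
```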
Step 104, adjusting the universal large language model according to the first text and the second text to obtain the target recommendation model.
After the first text and the second text are obtained, they can be used to adjust a preset general large language model, and the adjusted model can serve as the target recommendation model. Illustratively, because of the masked fields in the first text, the general large language model can acquire recommendation capability by learning to predict those masked fields.
The generic large language model may refer to a GLM (General Language Model), which may be a pre-trained model; the embodiments of the invention are not limited in this respect.
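A drastically simplified sketch of how a first/second text pair could be arranged as a GLM-style blank-infilling example follows. The [START] token and the shifted decoder input are standard teacher-forcing assumptions; an actual GLM uses span-level mask/start/end tokens and 2D position encodings, which are omitted here:

```python
def build_glm_example(first_text, second_text):
    """Arrange one training example: the bidirectional encoder sees the
    masked input, and the unidirectional decoder predicts the masked
    fields left to right (token layout is an illustrative assumption)."""
    encoder_input = first_text
    # Teacher forcing: the decoder input is the target shifted right by one,
    # starting from a [START] token.
    decoder_input = ["[START]"] + second_text[:-1] if second_text else ["[START]"]
    decoder_target = second_text
    return encoder_input, decoder_input, decoder_target
```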
In the embodiment of the invention, training data comprising a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data can first be acquired; the training data is then converted into structured first data, and a training input sequence is generated from the first data; part of the fields in the training input sequence are masked to obtain a first text, and the masked fields form a second text; the universal large language model is then adjusted according to the first text and the second text to obtain the target recommendation model. According to the embodiment of the invention, a large language model can be applied to a recommendation system, and a recommendation model with stronger predictive capability can be obtained, thereby providing users with more accurate recommendation services.
Referring to fig. 2, a flowchart of the steps of another method for generating a recommendation model based on a large language model according to an embodiment of the present invention is shown; the method may include the following steps:
Step 201, acquiring training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data.
In order to maintain the advantages of a large language model and supplement new knowledge in recommended data, the embodiment of the invention can firstly acquire a plurality of user characteristic data acquired in advance and a plurality of user behavior sequence data corresponding to each user characteristic data.
Step 202, determining duration of behavior corresponding to each user behavior sequence data, and determining target behavior sequence data from a plurality of user behavior sequence data according to the duration of behavior corresponding to each user behavior sequence data.
Compared with plain language text, training data in a recommender system should take into account users' interests and preferences at different times. Long-term preferences are generally stable and reflect the user's overall tastes; they do not change frequently over time, but they lack timeliness and may not reflect the user's current interests.
Short-term preferences tend to change frequently over time and more closely reflect the current level of interest of the user. Therefore, the embodiment of the invention can determine one or more target behavior sequence data from a plurality of user behavior sequence data based on the duration of the behavior corresponding to different behavior sequence data.
Specifically, the duration of the behavior corresponding to each user behavior sequence data can be determined first, that is, whether it is a long-term behavior or a short-term behavior, etc.
After the duration of the behavior corresponding to each user behavior sequence data is determined, user behavior sequence data with different durations can be selected from the plurality of user behavior sequence data, according to those durations, as the target behavior sequence data.
In one embodiment of the present invention, the target behavior sequence data may be determined by the following sub-steps, including:
and step 11, determining a duration interval to which each user behavior sequence data belongs according to the duration of the behavior corresponding to the user behavior sequence data.
In some embodiments, the user behavior sequence data may be classified according to the duration of the corresponding behavior; specifically, the duration interval to which each user behavior sequence data belongs is determined according to the duration of the corresponding behavior. A plurality of duration intervals may be provided, and different duration intervals may correspond to long-term preference, medium-term preference, short-term preference, and the like.
And a sub-step 12 of acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to a preset proportion.
After the duration intervals to which each user behavior sequence data belongs are determined, target behavior sequence data can be obtained from the user behavior sequence data corresponding to each duration interval according to a preset proportion. The preset proportion can be set according to practical situations, and the embodiment of the invention is not limited to the above.
As an example, the target behavior sequence data may be obtained from the user behavior sequence data corresponding to each duration interval in the following manner:
determining the quantity of data to be acquired corresponding to each duration interval according to a preset proportion; and randomly acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to the number of the data to be acquired.
Specifically, the number of data items to be acquired for each duration interval can be determined according to the preset proportion. For example, suppose the duration intervals comprise a long-term interval, a medium-term interval and a short-term interval, and the preset proportion is 2:3:5; if 100 pieces of behavior sequence data are needed, then 20 pieces are acquired from the long-term interval, 30 pieces from the medium-term interval, and 50 pieces from the short-term interval.
After the number of data items to be acquired for each duration interval is determined, behavior sequence data can be acquired from the user behavior sequence data corresponding to each duration interval according to that number, and the acquired behavior sequence data serve as target behavior sequence data. The manner of acquisition is not limited in this embodiment: the behavior sequence data may be acquired randomly from the user behavior sequence data corresponding to each duration interval, or acquired according to a preset rule.
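As a minimal illustrative sketch of this sub-step (the function name, interval labels and random acquisition are assumptions, not prescribed by the method), the per-interval acquisition by preset proportion might look like:

```python
import random

def sample_target_sequences(sequences_by_interval, ratio, total):
    """Draw target behavior sequence data from each duration interval
    according to a preset proportion (e.g. long:medium:short = 2:3:5)."""
    weight_sum = sum(ratio.values())
    target = []
    for interval, sequences in sequences_by_interval.items():
        # Number of items to acquire from this interval per the preset ratio.
        count = total * ratio[interval] // weight_sum
        # Random acquisition; a preset rule could be used instead.
        target.extend(random.sample(sequences, count))
    return target
```

With a ratio of 2:3:5 and 100 required items, this draws 20, 30 and 50 items from the long-term, medium-term and short-term intervals respectively.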
Step 203, determining target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data.
After determining the target behavior sequence data, user feature data corresponding to the target behavior sequence data may be determined from the plurality of user feature data, and the user feature data corresponding to the target behavior sequence data may be used as target user feature data.
Step 204, converting the target behavior sequence data into structured target behavior sequence data, and converting the target user characteristic data into structured target user characteristic data.
In some embodiments, the obtained target behavior sequence data and target user characteristic data may be unstructured data. To allow more flexible application in various downstream tasks, the unstructured target behavior sequence data may be converted into structured target behavior sequence data, and the unstructured target user characteristic data may be converted into structured target user characteristic data.
In one embodiment of the present invention, the structured target behavior sequence data and the structured target user feature data may be obtained by:
performing textual conversion on the target behavior sequence data and the target user characteristic data to obtain target behavior sequence data and target user characteristic data in a text form; and carrying out structural conversion on the target behavior sequence data and the target user characteristic data in the text form.
In some embodiments, the target behavior sequence data and the target user characteristic data may be non-textual data, such as tables; the non-textual target behavior sequence data and target user characteristic data may therefore be converted in a textual manner to obtain target behavior sequence data and target user characteristic data in text form.
Then, the text-form target behavior sequence data and the text-form target user feature data can be subjected to structural conversion, so that the structured target behavior sequence data and the structured target user feature data are obtained.
For example, the non-textual target behavior sequence data and the target user feature data may be tabular data, and performing textual conversion on the non-textual target behavior sequence data and the target user feature data includes:
and converting the target behavior sequence data and the target user characteristic data in the form of a table into the target behavior sequence data and the target user characteristic data in the form of texts.
In some embodiments, data in traditional tabular form may be textualized, thereby helping the model better capture the relationships between features and behavior sequences; specifically, the target behavior sequence data and target user characteristic data in tabular form may be converted into target behavior sequence data and target user characteristic data in text form.
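A minimal sketch of such tabular-to-text conversion (the field names and the "field is value" sentence template are illustrative assumptions):

```python
def row_to_text(row, field_names):
    """Render one tabular record as a text sentence by joining
    "field is value" clauses; any similar template would do."""
    return ", ".join(f"{name} is {row[name]}" for name in field_names)

# Illustrative user-feature record in tabular (dict) form.
record = {"age": 28, "gender": "female", "recent_item": "wireless earphones"}
text = row_to_text(record, ["age", "gender", "recent_item"])
```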
Step 205, obtaining first data according to the structured target behavior sequence data and the structured target user characteristic data.
After obtaining the target behavior sequence data and the target user characteristic data in the structured text form, the first data may be generated from the target behavior sequence data and the target user characteristic data in the structured text form.
Illustratively, step 205 may be implemented by:
and preprocessing the structured target behavior sequence data and the structured target user characteristic data to obtain first data.
In some embodiments, structured target behavior sequence data and structured target user feature data may be pre-processed, for example: data cleaning, data supplementing, data conversion and the like; after the preprocessing, first data can be obtained.
Step 206, masking partial fields in the training input sequence to obtain a first text; the masked partial fields are the second text.
In order to endow the large language model with recommendation capability, after the training input sequence is generated, the embodiment of the invention can first perform masking on partial fields in the training input sequence. In the masked training input sequence, some fields are masked and others are not; together, the masked and unmasked fields constitute the first text. In addition, the masked partial fields in the training input sequence can serve as the second text; that is, the second text refers to the actual fields corresponding to the masked positions.
In an embodiment of the invention, the masked fields in the first text are provided with a start flag, and the second text is provided with an end flag.
In some embodiments, the masked portions may be referred to as "slots". For each slot, in order to identify its start and end positions in the input sequence, start and end flags may be set for subsequent autoregressive generation. Illustratively, the start flag may be a "[START]" token and the end flag an "[END]" token, where the start flag precedes the first character of the input slot and the end flag follows the last character of the output slot.
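A sketch of the masking step under these conventions (the [MASK] placeholder and the half-open span format are assumptions; [START]/[END] follow the illustration above):

```python
def mask_fields(tokens, spans):
    """Mask the given token spans of a training input sequence. Returns the
    first text (the sequence with each masked span replaced by a [MASK]
    placeholder) and the second text (the masked spans themselves, each
    wrapped with [START]/[END] flags for autoregressive generation)."""
    first, second, cursor = [], [], 0
    for begin, end in spans:  # spans are half-open (begin, end) index pairs
        first.extend(tokens[cursor:begin])
        first.append("[MASK]")
        second.append(["[START]"] + tokens[begin:end] + ["[END]"])
        cursor = end
    first.extend(tokens[cursor:])
    return first, second
```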
In one embodiment of the present invention, each text unit of the first text is provided with an intra-position flag and an inter-position flag, and the intra-position flag and the inter-position flag are mapped into the embedding vector of the corresponding text unit.
In some embodiments, a 2D position encoding mechanism is adopted, in which each text unit is encoded with two position ids: an inter-position id and an intra-position id. The inter-position id represents a specific position in the text; for a text unit belonging to a slot, it is the position of the corresponding slot in the text. The intra-position id represents the position of a text unit within its slot; for text units of the autoregressive blank portion it ranges from 1 to the length of the slot containing them, while text units of the first text take an intra-position id of 0. For entities, the intra-position id and inter-position id can thus represent the internal relationships between entities. Both position ids may be mapped into vector space.
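A sketch of how the two position ids might be computed for a first text followed by its autoregressively generated slots (the exact layout, with first-text units taking intra-position id 0, is an assumption consistent with common 2D position encodings):

```python
def two_d_position_ids(first_len, slot_positions, slot_lengths):
    """Compute the two position ids for first-text units followed by the
    units of each generated slot. First-text units: inter-position id is
    their index in the text, intra-position id is 0. Slot units: the
    inter-position id is fixed to the slot's position in the text, and
    the intra-position id runs from 1 to the slot length."""
    inter = list(range(first_len))
    intra = [0] * first_len
    for position, length in zip(slot_positions, slot_lengths):
        inter.extend([position] * length)
        intra.extend(range(1, length + 1))
    return inter, intra
```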
In an embodiment of the present invention, masking a part of fields in a training input sequence to obtain a first text includes:
and shielding part of entities in the training input sequence to obtain a first text.
In one embodiment, entity-level masking may be performed when masking partial fields in the training input sequence. Specifically, the entity tags in the training input sequence may be used as the input sequence and then mapped to an output sequence using an autoencoder. During the mapping, if a certain entity tag is randomly deleted, the autoencoder attempts to recover the deleted tag from the combination of the other entity tags. In this way, the mutual dependency relationships between entities can be captured, and the entity information in the text can be extracted.
In another embodiment of the present invention, masking a part of fields in a training input sequence to obtain a first text includes:
and masking part of the sentences in the training input sequence to obtain a first text.
In another embodiment, sentence-level masking may be performed when masking partial fields in the training input sequence. Specifically, under the constraint of sentence-level labeling, each masked span must be a complete sentence. This means that in text processing, the text must be divided along sentence boundaries, and each sentence must be kept independent and complete. If a sentence is split into multiple parts, or one sentence crosses into another, these cases are considered to violate the rules of sentence-level labeling.
In yet another embodiment of the present invention, masking a portion of a field in a training input sequence to obtain a first text includes:
and shielding part of text segments in the training input sequence to obtain a first text.
In yet another embodiment, document-level masking may be performed when masking partial fields in the training input sequence. Specifically, a text segment (i.e., a continuous subsequence of a certain length) is selected from the training input sequence for labeling and processing. The segment length is sampled from a uniform distribution over 50%-100% of the original length. This sampling approach reduces the size and complexity of the labeled dataset while retaining enough information and context to train an effective model. Randomly sampling segments of a certain length from the training input sequence for labeling increases the diversity and coverage of the labeled data while reducing the labeling workload and cost. It should be noted that the sampled data may suffer from information loss and bias in document-level labeling, so an appropriate sampling strategy and length range need to be chosen to ensure the representativeness and validity of the samples.
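The 50%-100% uniform sampling of a document-level segment can be sketched as follows (the function name and the half-open return convention are assumptions):

```python
import random

def sample_document_span(tokens, low=0.5, high=1.0):
    """Pick one contiguous segment whose length is drawn uniformly
    from 50%-100% of the original sequence length."""
    length = max(1, int(len(tokens) * random.uniform(low, high)))
    begin = random.randrange(len(tokens) - length + 1)
    return begin, begin + length  # half-open [begin, end) index pair
```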
It should be noted that entity-level, sentence-level and document-level masking may be used as alternatives to one another; when multitasking is required, multiple levels of masking may also be performed simultaneously. The embodiment of the invention is not limited in this regard.
As an example, the slots of the masked portions of the training input sequence are arranged in chronological order.
In some embodiments, the slots of the masked portions of the training input sequence are arranged in chronological order so as to maintain the interrelationships between different entities. Based on this, the pretraining objective over the index sequence [1, 2, ..., m] of length m can be expressed as follows:
$$\max_{\theta}\ \sum_{i=1}^{m}\log p_{\theta}\left(s_{i}\mid x_{\text{corrupt}},\ s_{<i}\right)\qquad(1)$$
wherein $p_{\theta}$ denotes the probability under model parameters $\theta$, $s_{i}$ is the $i$-th slot, $s_{<i}$ denotes the slots generated before the $i$-th slot, and $x_{\text{corrupt}}$ is the first text.
Step 207, processing the first text using a bi-directional encoder of the generic large language model and processing the second text using a uni-directional decoder of the generic large language model.
In some embodiments, after the first text and the second text are obtained, the first text may be processed using a bi-directional encoder of a generic large language model and the second text may be processed using a uni-directional decoder of the generic large language model; based on this processing, the processing result of the bi-directional encoder and the processing result of the unidirectional decoder can be obtained.
And step 208, according to the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, adjusting model parameters of the universal large language model to obtain a target recommendation model.
Based on the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, the universal large language model can be adjusted; specifically, the model parameters of the general large language model can be adjusted according to the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, and the adjusted general large language model is the target recommendation model.
Illustratively, the general large language model may be GLM-10B (a bidirectional dense model), which is adapted to the specific recommendation task of the present invention using an efficient low-rank adaptation (LoRA) parameter tuning method.
Specifically, trainable low-rank decomposition matrices are injected into the Transformer architecture of GLM-10B, and AdamW optimization is performed using GPU (Graphics Processing Unit) device resources.
To achieve memory-efficient and distributed training, a DeepSpeed module may be used, with a batch size of 32 per GPU, a peak learning rate of 1×10⁻⁵, and a maximum input length of 1024 text units.
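The essence of low-rank adaptation can be sketched in a few lines: the frozen weight matrix W is supplemented by a trainable product of two small matrices, and only those two matrices are tuned (a toy pure-Python illustration, not the actual GLM-10B adaptation code):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, scale=1.0):
    """Effective weight of a low-rank-adapted layer: W + scale * (B @ A).
    W is d_out x d_in and frozen; A (r x d_in) and B (d_out x r) are the
    trainable low-rank decomposition matrices, with rank r << d_in."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Initializing B to zero makes the adapted layer start out identical to the frozen one, which is the usual low-rank-adaptation initialization.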
After the target recommendation model is obtained, the following data recommendation may be performed based on the target recommendation model:
receiving a recommended task; and inputting the recommendation task into the target recommendation model, and receiving target recommendation data output by the target recommendation model.
In some embodiments, a recommendation task may be generated when data recommendation needs to be made to the user actively, or when a recommendation service needs to be provided in response to an operation by the user; the recommendation task may include the user's feature data, which are used for prediction.
After receiving the recommendation task, the recommendation task can be input into a target recommendation model; the target recommendation model predicts the recommendation task and can output target recommendation data, wherein the target recommendation data can be an entity, a sentence or a section of speech.
In an embodiment of the present invention, the target recommendation model is configured to determine, when generating a target text unit of the target recommendation data, whether the target text unit and the last generated text unit belong to the same entity; and, when the target text unit and the last generated text unit belong to the same entity, to update the intra-position mark of the same entity.
In some embodiments, the target recommendation data may be composed of one or more target text units; when generating the target recommendation data, the target recommendation model can firstly judge whether the currently generated target text unit and the last generated text unit belong to the same entity.
If the target text unit and the last generated text unit do not belong to the same entity, the generation of the current entity can be ended and the generation of the next entity started; otherwise, if they belong to the same entity, the intra-position mark of the same entity can be updated. By way of example, a Trie (prefix tree) algorithm may be used to check whether the target text unit and the last generated text unit belong to the same entity; the embodiment of the invention is not limited in this regard.
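A minimal sketch of such a prefix-tree check (representing entities as tuples of text units is an assumption):

```python
class EntityTrie:
    """Prefix tree over known entities; during decoding, a candidate text
    unit continues the current entity only if it extends a valid prefix."""
    def __init__(self, entities):
        self.root = {}
        for entity in entities:  # each entity is a sequence of text units
            node = self.root
            for unit in entity:
                node = node.setdefault(unit, {})

    def extends(self, prefix, unit):
        """Return True if prefix + unit is still a prefix of some entity,
        i.e. the new text unit belongs to the same entity as the prefix."""
        node = self.root
        for u in prefix:
            if u not in node:
                return False
            node = node[u]
        return unit in node
```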
In an embodiment of the present invention, the target recommendation model is provided with an entity pool, and the entity pool includes a plurality of entities; the target recommendation model is used for deleting entities in the entity pool, which are not matched with the target text unit, when the target text unit is generated.
In some embodiments, one or more entity pools may be pre-established, each of which may include all entities present in the corresponding recommendation scenario. After the target recommendation model generates a target text unit, the entities in the entity pool that do not match the target text unit can be deleted, filtering out part of the entities and thereby speeding up subsequent entity matching.
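The pruning of the entity pool can be sketched as follows (prefix matching is an assumption about what "not matched with the target text unit" means here):

```python
def prune_entity_pool(pool, generated_units):
    """Keep only the entities consistent with the text units generated so
    far, shrinking the candidate set for subsequent matching steps."""
    n = len(generated_units)
    return [entity for entity in pool
            if tuple(entity[:n]) == tuple(generated_units)]
```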
For example, as shown in fig. 3, a target recommendation model may be generated in the recommendation system in advance based on the above method; then, upon receiving a recommendation task, the recommendation system can input the recommendation task into the target recommendation model and push the recommendation data output by the target recommendation model to the user.
In the embodiment of the invention, training data can be acquired first, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data; the duration of the behavior corresponding to each user behavior sequence data is then determined, and target behavior sequence data are determined from the plurality of user behavior sequence data according to those durations; next, target user characteristic data corresponding to the target behavior sequence data are determined from the plurality of user characteristic data, and a training input sequence is generated according to the target behavior sequence data and the target user characteristic data; partial fields in the training input sequence are masked to obtain a first text, the masked partial fields being the second text; the first text is processed using the bidirectional encoder of the general large language model, and the second text using its unidirectional decoder; and the model parameters of the general large language model are adjusted according to the processing results of the bidirectional encoder and the unidirectional decoder to obtain the target recommendation model. Because the model is trained on user preferences of different durations, its considerations are more comprehensive; and because masking is provided at different levels, different recommendation tasks can be accommodated.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 4, a schematic structural diagram of a recommendation model generating device based on a large language model according to an embodiment of the present invention may include the following modules:
an acquisition module 401, configured to acquire training data, where the training data includes a plurality of user feature data and a plurality of user behavior sequence data corresponding to each user feature data;
a generating module 402, configured to generate a training input sequence according to training data;
a masking module 403, configured to mask a part of fields in the training input sequence to obtain a first text; the masked partial field is a second text;
and the adjustment module 404 is configured to adjust the universal large language model according to the first text and the second text to obtain the target recommendation model.
In an alternative embodiment of the present invention, the generating module 402 is configured to determine the duration of the behavior corresponding to each user behavior sequence data, and determine target behavior sequence data from the plurality of user behavior sequence data according to the duration of the behavior corresponding to each user behavior sequence data; determine target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data; and generate a training input sequence according to the target behavior sequence data and the target user characteristic data.
In an alternative embodiment of the present invention, the generating module 402 is configured to pre-process the target behavior sequence data and the target user feature data to obtain the training input sequence.
In an alternative embodiment of the present invention, the generating module 402 is configured to perform a textual conversion on the non-textual target behavior sequence data and the target user feature data to obtain the training input sequence when the target behavior sequence data and the target user feature data are non-textual data.
In an alternative embodiment of the present invention, the non-text target behavior sequence data and the target user feature data are data in a table form, and the generating module 402 is configured to convert the target behavior sequence data and the target user feature data in the table form into the target behavior sequence data and the target user feature data in a text form.
In an alternative embodiment of the present invention, the generating module 402 is configured to convert the unstructured target behavior sequence data into structured target behavior sequence data and convert the unstructured target user feature data into structured target user feature data when the target behavior sequence data and the target user feature data are unstructured data; and generating a training input sequence according to the structured target behavior sequence data and the structured target user characteristic data.
In an optional embodiment of the present invention, the generating module 402 is configured to determine, according to a duration of time of the behavior corresponding to the user behavior sequence data, a duration interval to which each user behavior sequence data belongs; and acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to a preset proportion.
In an alternative embodiment of the present invention, the generating module 402 is configured to determine, according to a preset ratio, a number of data to be acquired corresponding to each duration interval; and randomly acquiring target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to the number of the data to be acquired.
In an alternative embodiment of the present invention, the adaptation module 404 is configured to process the first text using a bi-directional encoder of a generic large language model and process the second text using a uni-directional decoder of the generic large language model; and according to the processing result of the bidirectional encoder and the processing result of the unidirectional decoder, adjusting model parameters of the universal large language model to obtain a target recommendation model.
In an alternative embodiment of the present invention, the masking module 403 is configured to mask a part of the entities in the training input sequence to obtain the first text.
In an alternative embodiment of the present invention, the masking module 403 is configured to mask a portion of sentences in the training input sequence to obtain the first text.
In an alternative embodiment of the present invention, the masking module 403 is configured to mask a portion of the text segment in the training input sequence to obtain the first text.
In an alternative embodiment of the invention, the slots of the masked portions of the training input sequence are arranged in chronological order.
In an alternative embodiment of the invention, the masked fields of the first text are provided with a start flag, and the second text is provided with an end flag.
In an alternative embodiment of the invention, each text unit of the first text is provided with an intra-position marker and an inter-position marker, the intra-position marker and the inter-position marker being mapped into the embedding vector of the corresponding text unit.
In an alternative embodiment of the present invention, the apparatus further comprises:
the recommending module is used for receiving recommending tasks; and inputting the recommendation task into the target recommendation model, and receiving target recommendation data output by the target recommendation model.
In an optional embodiment of the present invention, the target recommendation model is configured to determine, when generating a target text unit of target recommendation data, whether the target text unit and the last generated text unit belong to the same entity; and, when the target text unit and the last generated text unit belong to the same entity, to update the intra-position mark of the same entity.
In an optional embodiment of the present invention, the target recommendation model is provided with an entity pool, and the entity pool includes a plurality of entities; the target recommendation model is used for deleting entities in the entity pool, which are not matched with the target text unit, when the target text unit is generated.
In the embodiment of the invention, training data comprising a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data can be acquired first; then generating an input sequence for training according to the data for training; shielding partial fields in the training input sequence to obtain a first text; the masked partial field is a second text; and then, according to the first text and the second text, adjusting the universal large language model to obtain the target recommendation model. According to the embodiment of the invention, the large language model can be applied to the recommendation system, and the recommendation model with stronger prediction capability can be obtained, so that more accurate recommendation service can be provided for users.
The embodiment of the invention also provides an electronic device. As shown in fig. 5, the electronic device 5 includes a processor 501, a memory 502, and a computer program stored on the memory 502 and executable on the processor; when executed by the processor, the computer program implements the recommendation model generation method based on a large language model as described above.
The embodiment of the invention also provides a non-volatile computer-readable storage medium. As shown in fig. 6, a computer program 601 is stored on the non-volatile computer-readable storage medium 6; when executed by a processor, the computer program 601 implements the recommendation model generation method based on a large language model as described above.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.
The principles and implementations of the present invention have been described above with specific examples, which are provided only to facilitate an understanding of the method and its core ideas. Meanwhile, since those skilled in the art may vary the specific embodiments and the application scope in accordance with the ideas of the present invention, the contents of this description should not be construed as limiting the present invention.

Claims (17)

1. A method for generating a recommendation model based on a large language model, the method comprising:
acquiring training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data;
converting the training data into structured first data, and generating a training input sequence according to the first data;
masking part of the fields in the training input sequence to obtain a first text, the masked fields being a second text;
according to the first text and the second text, adjusting a general large language model to obtain a target recommendation model;
wherein the converting the training data into structured first data comprises:
determining the behavior duration corresponding to each piece of user behavior sequence data, and determining the duration interval of each piece of user behavior sequence data according to that behavior duration;
determining the quantity of data to be acquired corresponding to each duration interval according to a preset proportion;
according to the quantity of the data to be acquired, randomly acquiring target behavior sequence data from user behavior sequence data corresponding to each duration interval;
determining target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data;
converting the target behavior sequence data into structured target behavior sequence data, and converting the target user feature data into structured target user feature data;
and obtaining first data according to the structured target behavior sequence data and the structured target user characteristic data.
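The sampling step in claim 1 — bucketing behavior sequences by duration interval, then drawing from each bucket according to a preset proportion — could be sketched as follows. The field name `duration`, the interval boundaries, and the proportions are illustrative assumptions, not values taken from the patent:

```python
import random
from collections import defaultdict

def sample_by_duration(behaviors, boundaries, proportions, total, seed=0):
    """Group behavior sequences into duration intervals, then randomly
    draw from each interval according to a preset proportion."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for b in behaviors:
        # index of the interval the duration falls into
        idx = sum(b["duration"] > bound for bound in boundaries)
        buckets[idx].append(b)
    sampled = []
    for idx, frac in enumerate(proportions):
        pool = buckets.get(idx, [])
        k = min(len(pool), round(total * frac))
        sampled.extend(rng.sample(pool, k))
    return sampled
```

With boundaries `[60, 300]` and proportions `[0.5, 0.3, 0.2]`, a request for 10 samples yields 5 short, 3 medium, and 2 long sequences, which is one way to keep short- and long-session behavior balanced in the training data.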
2. The method of claim 1, wherein the converting the target behavior sequence data into structured target behavior sequence data and converting the target user characteristic data into structured target user characteristic data comprises:
performing textual conversion on the target behavior sequence data and the target user characteristic data to obtain target behavior sequence data and target user characteristic data in text form;
and performing structural conversion on the target behavior sequence data and the target user characteristic data in text form.
3. The method according to claim 2, wherein the target behavior sequence data and the target user characteristic data before textual conversion are in the form of a table, and the performing textual conversion on the target behavior sequence data and the target user characteristic data comprises:
converting the table-form target behavior sequence data and target user characteristic data into target behavior sequence data and target user characteristic data in text form.
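Claim 3's table-to-text step admits a very simple reading: each table row of features or behaviors is flattened into a "field: value" string. A minimal sketch of that reading, with hypothetical column names:

```python
def table_row_to_text(headers, row):
    """Flatten one table row into a plain-text 'field: value' description,
    skipping empty cells so the resulting text stays compact."""
    parts = [f"{h}: {v}" for h, v in zip(headers, row) if v not in (None, "")]
    return "; ".join(parts)
```

For example, a row `[28, "Suzhou"]` under headers `["age", "city"]` becomes the string `"age: 28; city: Suzhou"`, which can then be fed into the structural conversion of claim 2.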
4. A method according to any one of claims 1-3, wherein the obtaining the first data from the structured target behavior sequence data and the structured target user profile data comprises:
and preprocessing the structured target behavior sequence data and the structured target user characteristic data to obtain first data.
5. The method of claim 1, wherein the adjusting the general large language model according to the first text and the second text to obtain the target recommendation model comprises:
processing the first text using a bidirectional encoder of the general large language model, and processing the second text using a unidirectional decoder of the general large language model;
and adjusting model parameters of the general large language model according to the processing results of the bidirectional encoder and the unidirectional decoder to obtain the target recommendation model.
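The patent does not spell out how the bidirectional encoder and unidirectional decoder of claim 5 share attention, but a common arrangement in GLM-style models is a single attention mask in which the first text attends bidirectionally to itself while the second text attends causally to everything before it. A sketch of that mask, offered only as one plausible reading:

```python
def joint_attention_mask(len_a, len_b):
    """Boolean mask: position i may attend to position j iff mask[i][j].
    Part A (the first text) is fully visible to every position; part B
    (the second text) is visible only causally, i.e. a B position sees
    itself and earlier B positions but never later ones."""
    n = len_a + len_b
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # everyone sees all of part A; within part B, only j <= i
            mask[i][j] = j < len_a or j <= i
    return mask
```

Under this mask a single transformer stack behaves as a bidirectional encoder over the first text and as a unidirectional decoder over the second text, which is one way the two processing results of claim 5 could be produced jointly.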
6. The method of claim 1, wherein the masking part of the fields in the training input sequence to obtain the first text comprises:
masking part of the entities in the training input sequence to obtain the first text.
7. The method of claim 1, wherein the masking part of the fields in the training input sequence to obtain the first text comprises:
masking part of the sentences in the training input sequence to obtain the first text.
8. The method of claim 1, wherein the masking part of the fields in the training input sequence to obtain the first text comprises:
masking part of the text segments in the training input sequence to obtain the first text.
9. The method according to any one of claims 6 to 8, wherein the slots of the masked portions of the training input sequence are arranged in chronological order.
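A minimal sketch of the masking in claims 6 to 9: the chosen spans — whether entities, sentences, or text segments; the span boundaries below are hypothetical — are each replaced by a mask token to form the first text, while the masked-out content, kept in chronological (left-to-right) order, forms the second text:

```python
def mask_spans(tokens, spans, mask_token="[MASK]"):
    """Replace each (start, end) span in a token list with a single mask
    token; return the masked sequence (first text) and the masked-out
    spans, sorted into chronological order (second text)."""
    spans = sorted(spans)                 # chronological order, per claim 9
    first, second, cursor = [], [], 0
    for start, end in spans:
        first.extend(tokens[cursor:start])
        first.append(mask_token)
        second.append(tokens[start:end])
        cursor = end
    first.extend(tokens[cursor:])
    return first, second
```

The same routine covers entity, sentence, and segment masking; only the way the spans are selected differs between claims 6, 7, and 8.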
10. The method of claim 1, wherein the masked fields in the first text are provided with a start flag, and the second text is provided with an end flag.
11. The method of claim 1, wherein each text unit of the first text is provided with an in-position flag and an inter-position flag, the in-position flag and the inter-position flag being mapped into the embedding vector of the corresponding text unit.
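Claim 11's two flags can be read as the two-level positional indexing used in GLM-style models: an inter-position index identifying which segment a text unit belongs to, and an in-position index giving its offset inside that segment, each typically looked up in its own embedding table and added to the token embedding. A sketch under that assumption:

```python
def two_level_positions(segment_lengths):
    """Assign each text unit an inter-position (segment index) and an
    in-position (offset within the segment). Both index lists have one
    entry per text unit and would each feed a separate embedding table."""
    inter, intra = [], []
    for seg_idx, seg_len in enumerate(segment_lengths):
        for offset in range(seg_len):
            inter.append(seg_idx)
            intra.append(offset)
    return inter, intra
```

For segments of lengths 2 and 3, the inter-positions are `[0, 0, 1, 1, 1]` and the in-positions `[0, 1, 0, 1, 2]`, letting the model tell both which masked slot a unit fills and where it sits inside that slot.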
12. The method according to claim 1, wherein the method further comprises:
receiving a recommended task;
and inputting the recommendation task into the target recommendation model, and receiving target recommendation data output by the target recommendation model.
13. The method of claim 12, wherein
the target recommendation model is used, when generating a target text unit of the target recommendation data, for judging whether the target text unit and the previously generated text unit belong to the same entity; and when the target text unit and the previously generated text unit belong to the same entity, updating the in-position flag of the same entity.
14. The method of claim 13, wherein
the target recommendation model is provided with an entity pool, and the entity pool comprises a plurality of entities; the target recommendation model is used for deleting, when generating the target text unit, the entities in the entity pool that do not match the target text unit.
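The entity-pool deletion of claim 14 resembles prefix-constrained decoding: after each generated text unit, entities whose token sequences no longer match the partially generated entity are discarded, so generation can only complete to a known entity. A hypothetical sketch — token-level prefix matching is an assumption, not a detail given by the patent:

```python
def prune_entity_pool(pool, generated_units):
    """Keep only entities whose token sequence still starts with the
    units generated so far for the current entity. An empty result would
    signal that the partial output matches no known entity."""
    k = len(generated_units)
    return [e for e in pool if e[:k] == generated_units]
```

Pruning after every step keeps the matching cost proportional to the surviving pool, and the surviving entities' next tokens can be used to restrict the decoder's vocabulary at the following step.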
15. A large language model-based recommendation model generation apparatus, the apparatus comprising:
an acquisition module, configured to acquire training data, wherein the training data comprises a plurality of user characteristic data and a plurality of user behavior sequence data corresponding to each user characteristic data;
a generation module, configured to convert the training data into structured first data and generate a training input sequence according to the first data;
a masking module, configured to mask part of the fields in the training input sequence to obtain a first text, the masked fields being a second text;
an adjustment module, configured to adjust the general large language model according to the first text and the second text to obtain a target recommendation model;
wherein the generation module is configured to: determine the behavior duration corresponding to each piece of user behavior sequence data, and determine the duration interval of each piece of user behavior sequence data according to that behavior duration; determine the quantity of data to be acquired corresponding to each duration interval according to a preset proportion; randomly acquire target behavior sequence data from the user behavior sequence data corresponding to each duration interval according to the quantity of data to be acquired; determine target user characteristic data corresponding to the target behavior sequence data from the plurality of user characteristic data; convert the target behavior sequence data into structured target behavior sequence data, and convert the target user characteristic data into structured target user characteristic data; and obtain the first data according to the structured target behavior sequence data and the structured target user characteristic data.
16. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing a method of generating a large language model based recommendation model as claimed in any one of claims 1 to 14.
17. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the large language model-based recommendation model generation method according to any one of claims 1 to 14.
CN202311675494.9A 2023-12-07 2023-12-07 Recommendation model generation method based on large language model and corresponding product Active CN117370540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311675494.9A CN117370540B (en) 2023-12-07 2023-12-07 Recommendation model generation method based on large language model and corresponding product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311675494.9A CN117370540B (en) 2023-12-07 2023-12-07 Recommendation model generation method based on large language model and corresponding product

Publications (2)

Publication Number Publication Date
CN117370540A CN117370540A (en) 2024-01-09
CN117370540B (en) 2024-03-01

Family

ID=89406336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311675494.9A Active CN117370540B (en) 2023-12-07 2023-12-07 Recommendation model generation method based on large language model and corresponding product

Country Status (1)

Country Link
CN (1) CN117370540B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210027018A1 (en) * 2019-07-22 2021-01-28 Advanced New Technologies Co., Ltd. Generating recommendation information
CN114565104A (en) * 2022-03-01 2022-05-31 腾讯科技(深圳)有限公司 Language model pre-training method, result recommendation method and related device



Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
US11151177B2 (en) Search method and apparatus based on artificial intelligence
CN112188312B (en) Method and device for determining video material of news
CN112188311B (en) Method and apparatus for determining video material of news
CN111221936B (en) Information matching method and device, electronic equipment and storage medium
CN111611355A (en) Dialog reply method, device, server and storage medium
CN114416995A (en) Information recommendation method, device and equipment
Waszczuk et al. Morphosyntactic disambiguation and segmentation for historical Polish with graph-based conditional random fields
CN115115984A (en) Video data processing method, apparatus, program product, computer device, and medium
CN117370540B (en) Recommendation model generation method based on large language model and corresponding product
CN116702765A (en) Event extraction method and device and electronic equipment
CN116909435A (en) Data processing method and device, electronic equipment and storage medium
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN110852103A (en) Named entity identification method and device
CN116306663A (en) Semantic role labeling method, device, equipment and medium
AU2019290658B2 (en) Systems and methods for identifying and linking events in structured proceedings
CN114139610A (en) Traditional Chinese medicine clinical literature data structuring method and device based on deep learning
Windiatmoko et al. Mi-Botway: A deep learning-based intelligent university enquiries chatbot
Lu et al. A novel method for Chinese named entity recognition based on character vector
CN114398492B (en) Knowledge graph construction method, terminal and medium in digital field
Rene et al. Creation of Intellectual Property Based on Natural Language Generation Model
Chen et al. Text generation from triple via generative adversarial nets
CN113254635B (en) Data processing method, device and storage medium
CN114330295A (en) Time efficiency identification, model training and pushing method, device and medium of information
CN117745388A (en) Financial product recommendation method and device based on text labeling and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant