CN116522996A - Training method of recommendation model, recommendation method and related device

Training method of recommendation model, recommendation method and related device

Info

Publication number
CN116522996A
CN116522996A (application CN202310385125.XA)
Authority
CN
China
Prior art keywords
behavior
value vector
recommendation model
sequence
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310385125.XA
Other languages
Chinese (zh)
Inventor
王伟
许涛
刘畅
杜涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310385125.XA
Publication of CN116522996A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a training method for a recommendation model, a recommendation method and a related device. A user behavior sequence sample marked with a real result is input into the recommendation model to obtain an overall value vector. The overall value vector is then calibrated according to the pricing value vector to obtain a calibrated value vector. The calibrated value vector and the user's behavior features are then fed into the recommendation model again to obtain a prediction result; the calibrated value vector helps the recommendation model produce a more accurate prediction. A loss function is determined based on the behavior features, the prediction result and the real result, and the parameters of the recommendation model are updated based on the loss function until the loss function converges.

Description

Training method of recommendation model, recommendation method and related device
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a training method for a recommendation model, a recommendation method, and a related device.
Background
In the prior art, the recommendation model is generally a neural network model such as an MLP. Training such a model relies simply on vector samples: various feature data are converted into vectors, spliced together, and fed into the model for learning and training. Deviations in the samples cannot be corrected and calibrated in time, so the recommended content is inaccurate.
Disclosure of Invention
The embodiment of the application provides a training method, a recommending method and a related device of a recommending model, wherein the technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a training method of a recommendation model, where the training method of the recommendation model includes: acquiring behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time; inputting the behavior sequence sample into a recommendation model to obtain an overall value vector; obtaining a calibration value vector according to the pricing value vector and the overall value vector; inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result; determining a loss function based on the behavioral characteristics, the predicted outcome, and the actual outcome; and updating parameters of the recommendation model based on the loss function until the loss function converges.
In a second aspect, embodiments of the present application provide a recommendation method, including: acquiring a behavior feature sequence to be recommended; and inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, wherein the recommendation model is the recommendation model described above.
In a third aspect, an embodiment of the present application provides a training apparatus for a recommendation model, where the training apparatus includes: an acquisition module for acquiring behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time; an encoding module for inputting the behavior feature sequence into the recommendation model to obtain an overall value vector; a calibration module for obtaining a calibration value vector according to the pricing value vector and the overall value vector; a decoding module for inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result; a loss module for determining a loss function based on the behavior features, the prediction result and the real result; and an updating module for updating parameters of the recommendation model based on the loss function until the loss function converges.
In a fourth aspect, embodiments of the present application provide a recommendation device, including: an acquisition module for acquiring a behavior feature sequence to be recommended; and an input module for inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, wherein the recommendation model is the recommendation model described above.
In a fifth aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a sixth aspect, embodiments of the present application provide an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by some embodiments of the present application has at least the following beneficial effects:
in one or more embodiments of the present application, the overall value vector is obtained by inputting user behavior sequence samples marked with real results into the recommendation model. The overall value vector is then calibrated according to the pricing value vector to obtain a calibration value vector, so that deviations in the samples are corrected and calibrated in time. The calibration value vector and the user's behavior features are then fed into the recommendation model again to obtain a prediction result; the calibration value vector helps the recommendation model produce a more accurate prediction. A loss function is determined based on the behavior features, the prediction result and the real result, and the parameters of the recommendation model are updated based on the loss function until the loss function converges. After the recommendation model is trained in this way, the recommended content obtained with it is more accurate than that obtained with a model trained in the traditional way, which solves the problem that recommended content is inaccurate because sample deviations cannot be corrected and calibrated in time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a training system of a recommendation model provided in the present specification.
Fig. 2 is a flow chart of a training method of a recommendation model provided in the present specification.
Fig. 3 is a schematic diagram of a recommendation model network according to the corresponding embodiment of fig. 2.
Fig. 4 is a flowchart of a specific implementation of step S100 in the training method of the recommendation model according to the corresponding embodiment of fig. 2.
FIG. 5 is a flowchart showing a specific implementation of step S130 in the training method of the recommendation model according to the corresponding embodiment of FIG. 4.
FIG. 6 is a flowchart of a specific implementation of step S200 in the training method of the recommendation model according to the corresponding embodiment of FIG. 2.
FIG. 7 is a flowchart of a specific implementation of step S210 in the training method of the recommendation model according to the corresponding embodiment of FIG. 6.
Fig. 8 is a schematic flow chart of a recommendation method provided in the present specification.
Fig. 9 is a flowchart of a specific implementation of step S900 in the recommendation method according to the corresponding embodiment of fig. 8.
Fig. 10 is a flowchart of a specific implementation of step S930 in the recommendation method according to the corresponding embodiment of fig. 9.
Fig. 11 is a schematic structural diagram of a training device of a recommendation model provided in the present specification.
Fig. 12 is a schematic structural diagram of a training device of a recommendation model provided in the present specification.
Fig. 13 is a schematic structural view of an electronic device provided in the present specification.
Fig. 14 is a schematic diagram of the structure of the operating system and user space provided in this specification.
Fig. 15 is an architecture diagram of the android operating system of fig. 14.
FIG. 16 is an architecture diagram of the IOS operating system of FIG. 14.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the embodiments of the present application, are within the scope of the embodiments of the present application.
In the description of the embodiments of the present application, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless explicitly stated and limited otherwise, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The specific meaning of the above terms in the embodiments of the present application will be understood by those of ordinary skill in the art in the specific context. Furthermore, in the description of the embodiments of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The embodiments of the present application are described in detail below in connection with specific embodiments.
Referring to fig. 1, a schematic view of a scenario of a training system of a recommendation model according to an embodiment of the present application is provided. As shown in fig. 1, the training system of the recommendation model may include at least a client cluster and a service platform 100.
The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, …, and a client n corresponding to a user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device including, but not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem, etc. Electronic devices in different networks may be called different names, for example: a user equipment, an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent or user equipment, a cellular telephone, a cordless telephone, a personal digital assistant (personal digital assistant, PDA), an electronic device in a 5G network or future evolution network, and the like.
The service platform 100 may be an independent server device, such as a rack-mounted, blade, tower or cabinet server, or hardware with strong computing capability such as a workstation or mainframe computer; it may also be a server cluster formed by a plurality of servers. The servers in the cluster may be organized symmetrically, with each server functionally equivalent and playing an equivalent role in a transaction link, and each server may independently provide services to the outside, where independent provision of services may be understood as requiring no assistance from another server.
In one or more embodiments of the present application, the service platform 100 may establish a communication connection with at least one client in the client cluster, and data interaction during the training of the recommendation model, such as online transaction data interaction, is completed based on this connection. The transaction data includes, but is not limited to, various types of behavior feature data, and the specific transaction service type is determined by the actual application.
The recommendation model obtained by the service platform 100 based on the training method of the recommendation model of the embodiment of the application can realize content recommendation to the client; as another example, the service platform 100 may obtain training data and user behavior data from clients.
It should be noted that, the service platform 100 establishes a communication connection with at least one client in the client cluster through a network for interactive communication, where the network may be a wireless network, or may be a wired network, where the wireless network includes, but is not limited to, a cellular network, a wireless local area network, an infrared network, or a bluetooth network, and the wired network includes, but is not limited to, an ethernet network, a universal serial bus (universal serial bus, USB), or a controller area network. In one or more embodiments of the specification, techniques and/or formats including HyperText Mark-up Language (HTML), extensible markup Language (Extensible Markup Language, XML), and the like are used to represent data exchanged over a network (e.g., target compression packages). All or some of the links may also be encrypted using conventional encryption techniques such as secure socket layer (Secure Socket Layer, SSL), transport layer security (Transport Layer Security, TLS), virtual private network (Virtual Private Network, VPN), internet protocol security (Internet Protocol Security, IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The embodiment of the training system for the recommendation model provided by the embodiment of the present application belongs to the same concept as the training method for the recommendation model in one or more embodiments, and an execution subject corresponding to the training method for the recommendation model related to one or more embodiments in the description may be the service platform 100 described above; the execution subject corresponding to the training method of the recommendation model in one or more embodiments of the specification may also be an electronic device corresponding to the client, which is specifically determined based on an actual application environment. The implementation process of the training system embodiment of the recommendation model may be described in detail in the following method embodiment, which is not described herein.
Based on the schematic view of the scenario shown in fig. 1, a detailed description is provided below of a training method of a recommendation model provided in one or more embodiments of the present application.
Referring to fig. 2, a flow diagram of a method of training a recommendation model, which may be implemented in a computer program and may be executed on a von neumann system-based training device, is provided for one or more embodiments of the present application. The computer program may be integrated in the application or may run as a stand-alone tool class application. The training device of the recommendation model can be a service platform.
Specifically, the training method of the recommendation model comprises the following steps:
step S000, obtaining behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time.
Step S100, inputting the behavior sequence sample into a recommendation model to obtain an overall value vector.
Step S200, obtaining a calibration value vector according to the pricing value vector and the overall value vector.
Step S300, inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result.
Step S400, determining a loss function based on the behavior features, the prediction result and the real result.
Step S500, updating parameters of the recommendation model based on the loss function until the loss function converges.
In an embodiment of the present application, the overall value vector is obtained by inputting user behavior sequence samples marked with real results into the recommendation model. The overall value vector is then calibrated according to the pricing value vector to obtain a calibration value vector, so that deviations in the samples are corrected and calibrated in time. The calibration value vector and the user's behavior features are then fed into the recommendation model again to obtain a prediction result; the calibration value vector helps the recommendation model produce a more accurate prediction. A loss function is determined based on the behavior features, the prediction result and the real result, and the parameters of the recommendation model are updated based on the loss function until the loss function converges. After the recommendation model is trained in this way, the recommended content obtained with it is more accurate than that obtained with a model trained in the traditional way, which solves the problem that recommended content is inaccurate because sample deviations cannot be corrected and calibrated in time.
In one embodiment of the present application, the recommendation model includes an encoder and a decoder; the step S100 specifically includes: inputting the behavior sequence samples into an encoder to obtain an overall value vector; the step S300 specifically includes: and inputting the calibration value vector and the behavior sequence sample into the decoder to obtain a prediction result.
In the actual implementation process, the recommendation model can be trained by using only one neural network model, and can also be jointly trained by adopting a plurality of neural network models.
When joint training is required, the recommendation model may include an encoder and a decoder, which may be built with the same type of neural network model, as shown in fig. 3.
The specific steps are as follows: acquiring behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result; inputting the behavior sequence samples into the encoder to obtain an overall value vector; obtaining a calibration value vector according to the pricing value vector and the overall value vector; inputting the calibration value vector and the behavior sequence sample into the decoder to obtain a prediction result; determining a loss function based on the behavior features, the prediction result and the real result; and updating parameters of the encoder and the decoder based on the loss function until the loss function converges. After the model is trained, only the decoder needs to be taken out and deployed as the recommendation model for actual use.
When training is performed with a single neural network model, it can simply be understood that the encoder and the decoder are both that neural network model. When the encoder and the decoder are used for joint training, the parameters updated for the encoder and for the decoder can differ, which improves the pertinence and accuracy of the parameter updates, makes the trained model more accurate, and makes training more efficient.
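As a rough illustration of the loop in steps S000 to S500, the following is a minimal PyTorch-style sketch under several assumptions: the Encoder, Decoder and calibrate helpers, the tensor shapes and the toy data are all illustrative and do not come from the patent, and the calibration shown is a simplified clamp toward the pricing value vector rather than the variance-analysis or order-preserving-regression calibration described later.

```python
# Minimal sketch of the joint encoder/decoder training loop (steps S000 to S500).
# All identifiers and shapes below are illustrative assumptions, not names from the patent.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, feat_dim, hidden_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))

    def forward(self, behavior_seq):                   # behavior_seq: (seq_len, feat_dim)
        return self.net(behavior_seq).squeeze(-1)      # overall value vector: one target value per behavior

class Decoder(nn.Module):
    def __init__(self, feat_dim, hidden_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + 1, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))

    def forward(self, behavior_seq, calib_vec):        # prediction: one probability per behavior
        x = torch.cat([behavior_seq, calib_vec.unsqueeze(-1)], dim=-1)
        return torch.sigmoid(self.net(x)).squeeze(-1)

def calibrate(overall, pricing, threshold=0.1):
    # Simplified stand-in for step S200: keep each target value within `threshold` of its pricing value.
    return torch.maximum(torch.minimum(overall, pricing + threshold), pricing - threshold)

encoder, decoder = Encoder(feat_dim=16), Decoder(feat_dim=16)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
bce = nn.BCELoss()

# Toy data: one behavior sequence sample of 5 behaviors (16-dim features), real labels, pricing values.
dataloader = [(torch.randn(5, 16), torch.randint(0, 2, (5,)).float(), torch.rand(5))]

for behavior_seq, real_labels, pricing_vec in dataloader:
    overall_value = encoder(behavior_seq)                  # step S100
    calib_value = calibrate(overall_value, pricing_vec)    # step S200
    prediction = decoder(behavior_seq, calib_value)        # step S300
    loss = bce(prediction, real_labels)                    # step S400
    optimizer.zero_grad()
    loss.backward()                                        # step S500: update encoder and decoder parameters
    optimizer.step()
```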
In step S000, the behavioral characteristics may include user characteristics, recommended item characteristics, and pre-training characteristics. The user characteristics comprise user basic information, user behavior and other basic information related to the user. The recommended item features include basic features and aggregated features of objects that interact with the user. The pre-training features are obtained after the user behavior time sequence is processed, wherein the user behavior time sequence is a time sequence formed by arranging the user behaviors according to the time sequence, and the user behavior time sequence only comprises the user behaviors and does not comprise the features of the behavior objects.
Specifically, the way to obtain the pre-training features may be: and inputting the user behavior time sequence into a pre-training model, and outputting pre-training features by the pre-training model.
The training method of the pre-training model specifically comprises the following steps: acquiring a user behavior time sequence sample set, wherein each user behavior time sequence sample is calibrated in advance with corresponding pre-training features; inputting the data of each user behavior time sequence sample into the pre-training model to obtain the pre-training features output by the pre-training model; if, after the data of a user behavior time sequence sample is input into the pre-training model, the obtained pre-training features are inconsistent with the pre-training features calibrated in advance for that sample, the coefficients of the pre-training model are adjusted until the output is consistent with the pre-calibrated pre-training features; when, after the data of all the user behavior time sequence samples have been input into the pre-training model, the obtained pre-training features are consistent with the pre-training features calibrated in advance for the samples, training ends.
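As a sketch of how pre-training features might be produced from a user behavior time sequence (behaviors only, without behavior-object features), the following assumes a small embedding-plus-GRU network; the patent does not specify the pre-training model's structure, so the architecture, dimensions and identifiers here are illustrative only.

```python
# Sketch of producing pre-training features from a user behavior time sequence.
# The embedding-plus-GRU structure, dimensions and names are assumptions.
import torch
import torch.nn as nn

class PretrainModel(nn.Module):
    def __init__(self, num_behavior_types, emb_dim=16, feat_dim=32):
        super().__init__()
        self.emb = nn.Embedding(num_behavior_types, emb_dim)
        self.rnn = nn.GRU(emb_dim, feat_dim, batch_first=True)

    def forward(self, behavior_ids):                # (batch, seq_len) behavior ids in time order
        _, h = self.rnn(self.emb(behavior_ids))
        return h.squeeze(0)                         # (batch, feat_dim) pre-training features

pretrain = PretrainModel(num_behavior_types=10)
features = pretrain(torch.tensor([[0, 2, 2, 1, 3]]))   # one user behavior time sequence
```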
It will be appreciated that the behavior features input into the recommendation model may include only user features and recommended item features, or may include user features, recommended item features and pre-training features.
In step S100, reference may be made to the following embodiments for the specific processing flow of the recommendation model.
Specifically, in some embodiments, the specific implementation of step S100 may refer to fig. 4. Fig. 4 is a detailed description of step S100 in the training method of the recommendation model according to the corresponding embodiment of fig. 2, where the recommendation model includes a plurality of behavior towers, each of the behavior towers corresponds to a different type of behavior feature, and step S100 may include the following steps:
step S110, the behavior features are sequentially and one by one input into the recommendation model.
Step S120, each time a behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature.
And step S130, inputting each behavior characteristic into the corresponding behavior tower to obtain a corresponding target value.
Step S140, determining an overall value vector according to each target value.
In an embodiment of the present application, the recommendation model contains a number of behavioral towers, the number of which is fixed. Each behavior tower corresponds to a type of behavior feature, and the types corresponding to each behavior tower are different.
When the behavior features are input into the recommendation model one by one in sequence, each time a behavior feature is input, the corresponding behavior tower is called according to the type of that behavior feature and the behavior feature is input into it, so that a corresponding target value is obtained; after the target value corresponding to each behavior feature has been obtained, the overall value vector can be obtained.
The behavior tower described above is essentially a simple neural network, such as a DNN. Its advantage is that it integrates the user behavior features input by the feature layer, including the user features, the recommended item features and the pre-training features, into a single vector that fuses all of these features. Compared with the feature-layer representation, in which multiple features are simply spliced together, this vector has a smaller dimension, makes the target value easier to compute, and is more convenient to feed into a downstream time-series network.
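The following is a minimal sketch of one behavior tower under the description above and in steps S132 to S134 below: a small DNN that fuses the user features, recommended item features and pre-training features of a single behavior (plus the preset probability handed over from the previous tower) and emits a behavior value and a behavior probability. The patent describes the value as a weight matrix and the probability as a vector; the scalar outputs and layer sizes here are simplifications for illustration.

```python
# Sketch of a single behavior tower: fuse one behavior's features into a
# low-dimensional vector, then emit a behavior value and a behavior probability.
# Layer sizes, scalar outputs and the preset-probability input are illustrative.
import torch
import torch.nn as nn

class BehaviorTower(nn.Module):
    def __init__(self, user_dim, item_dim, pretrain_dim, hidden_dim=32):
        super().__init__()
        # +1 for the preset probability handed over from the previous tower
        self.fuse = nn.Sequential(
            nn.Linear(user_dim + item_dim + pretrain_dim + 1, hidden_dim), nn.ReLU())
        self.value_head = nn.Linear(hidden_dim, 1)    # behavior value
        self.prob_head = nn.Linear(hidden_dim, 1)     # probability of the next behavior

    def forward(self, user_feat, item_feat, pretrain_feat, preset_prob):
        h = self.fuse(torch.cat([user_feat, item_feat, pretrain_feat, preset_prob], dim=-1))
        return self.value_head(h), torch.sigmoid(self.prob_head(h))
```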
By adopting behavior towers, the embodiments of this specification can effectively solve the various problems caused by different encoding lengths of different user behavior feature data. The behavior feature data are arranged into behavior feature sequences, and each time a behavior feature is input, the corresponding behavior tower is called once to obtain the behavior data; the number of behavior features in a behavior feature sequence corresponds to the number of times the behavior towers are called. The recommendation model can therefore adapt to behavior feature sequences of various lengths, which solves the problems previously caused by user feature data of different lengths: extra-long user feature data no longer needs to be truncated, so no information is lost, and extra-short user feature data no longer needs to be padded, so no storage is wasted.
In step S110, the specific manner of arranging each behavior feature according to the time sequence and inputting each behavior feature into the recommendation model one by one may be that, after arranging each behavior feature according to the time sequence, one behavior feature with the earliest time is firstly input into the recommendation model, and the following behavior features are sequentially input into the recommendation model according to the arranged sequence.
In another embodiment, the behavior feature with the earliest time can be repeatedly extracted from the set of all behavior features and input into the recommendation model until all behavior features have been input into the recommendation model.
In step S120, when the behavior features are sequentially input into the recommendation model one by one, each behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature, and the behavior feature is input into the behavior tower.
For example, in one embodiment there are three behavioral towers, behavioral tower A, behavioral tower S, and behavioral tower D.
The behavior tower A corresponds to behavior features of type a, the behavior tower S to behavior features of type s, and the behavior tower D to behavior features of type d. For a behavior feature sequence whose behavior features are, in order, of types a, s, a, d, d, when the sequence is input into the recommendation model the behavior towers are called in the following order: behavior tower A is called first, then behavior tower S, then behavior tower A again, then behavior tower D, and then behavior tower D again.
In step S130, the behavior tower performs an integration operation on the plurality of user features, recommended item features, and pre-training features that constitute one behavior feature, to finally obtain a target value.
Specifically, in some embodiments, the specific implementation of step S130 may refer to fig. 5. Fig. 5 is a detailed description of step S130 in the training method of the recommended model according to the corresponding embodiment of fig. 4, where step S130 may include the following steps:
step S132, inputting each behavior feature into the corresponding behavior tower, to obtain a corresponding behavior value and a behavior probability.
Step S134, corresponding target values are determined according to the behavior values and the behavior probabilities of each behavior feature.
In the embodiment of the application, the behavior tower performs integrated operation on the behavior characteristics to obtain the behavior value and the behavior probability corresponding to the behavior characteristics, and then obtains the target value based on the behavior value and the behavior probability.
In step S132, the behavioral value is a weight matrix representing the value generated in the recommended scenario. The behavior probability is a vector that represents the probability of proceeding to the next behavior.
Specifically, in some embodiments, the following embodiments may be referred to for a specific implementation of step S132. In this embodiment, the details of step S132 in the training method of the recommendation model according to the corresponding embodiment of fig. 5 are described, where the behavioral towers are arranged in sequence, and step S132 may include the following steps:
and inputting the behavior characteristics and the preset probability into a corresponding behavior tower to obtain corresponding behavior value and behavior probability.
And taking the behavior probability as a preset probability input of the next behavior tower until all the behavior towers obtain the behavior value and the behavior probability.
In this embodiment, the input to a behavior tower includes a preset probability in addition to the behavior feature. The preset probability of the first behavior tower defaults to 1, and the preset probability of every other behavior tower is the behavior probability output by the previous behavior tower. That is, after one behavior tower obtains its behavior value and behavior probability through the integration operation, the behavior probability it outputs is fed into the immediately following behavior tower. The behavior probability output by a behavior tower represents the probability that the next behavior occurs after the behavior represented by the input behavior feature; feeding it into the next behavior tower is equivalent to adding a probability weight to that tower's computation, which helps ensure the accuracy of the result.
In step S134, the behavior probability represents the probability that the next behavior occurs, as calculated from the previous behavior by its behavior tower. Since the calculation of the next behavior's value is related to that behavior, this probability needs to be introduced and a weighted calculation performed to obtain the weighted behavior value.
In one embodiment, the target value is calculated as the behavior value point-multiplied (element-wise multiplied) by the behavior probability.
In step S140, the target values corresponding to all the behavior features in the behavior sequence sample are aggregated together, so as to obtain an overall value vector, where each dimension (or each component) of the overall value vector is a target value corresponding to a behavior feature.
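Putting steps S110 to S140 together with steps S132 and S134, the sketch below routes each behavior feature to the tower matching its type, chains the behavior probability from one tower into the next as the preset probability, and aggregates the per-behavior target values into the overall value vector. It reuses the hypothetical BehaviorTower above; the dict-based routing and the element-wise value-times-probability target are illustrative choices.

```python
# Sketch of steps S110-S140: dispatch by type, chain the preset probability,
# and aggregate the per-behavior target values into the overall value vector.
towers = {"a": BehaviorTower(8, 8, 8), "s": BehaviorTower(8, 8, 8), "d": BehaviorTower(8, 8, 8)}

def overall_value_vector(behavior_sequence):
    # behavior_sequence: list of (type, user_feat, item_feat, pretrain_feat), in time order
    targets = []
    preset_prob = torch.ones(1)                        # preset probability of the first tower is 1
    for btype, user_feat, item_feat, pretrain_feat in behavior_sequence:
        tower = towers[btype]                          # call the tower matching this feature's type
        value, prob = tower(user_feat, item_feat, pretrain_feat, preset_prob)
        targets.append(value * prob)                   # target value = behavior value (.) behavior probability
        preset_prob = prob                             # becomes the preset probability of the next tower
    return torch.cat(targets)                          # one component per behavior feature
```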
In step S200, the overall value vector is calibrated using the pricing value vector, so that deviations that appear in the samples are corrected and calibrated in time.
Specifically, in some embodiments, the specific implementation of step S200 may refer to fig. 6. Fig. 6 is a detailed description of step S200 in the training method of the recommended model according to the corresponding embodiment of fig. 2, where step S200 may include the following steps:
step S210, a pricing calibration algorithm is used to determine the pricing value vector.
And step S220, calibrating the overall value vector according to the pricing value vector to obtain a calibrated value vector.
In this embodiment, a pricing calibration algorithm is first used to obtain a pricing value vector, and then the pricing value vector is used to calibrate the overall value vector. The pricing calibration algorithm may be executed by a pricing calibration module that determines a uniform pricing for each behavior and calibrates a corresponding target value based on the uniform pricing.
The unified pricing is obtained by analyzing and back-testing a large amount of the user's past behavior data, which yields a specific value for each behavior; for example, a click behavior is often worth more than a browse behavior. Pricing each behavior makes it possible to derive from big data how much users care about different behaviors, and weighting the recommendation across behaviors of different importance allows the products that the user currently cares about and is most interested in to be recommended more accurately. Unified pricing also addresses the problem of inaccurate judgments caused by a lack of data for inactive users. The richness of real user behavior data is very uneven: compared with active users, the behavior data of inactive users is very sparse or missing, and some of it is of too low quality to fully reflect the behavior value. A model is hard to train effectively on such data, and its effect can fluctuate noticeably. Correcting the behavior value of inactive users through the unified pricing derived from big data brings it as close as possible to the real behavior value, making it more suitable for model training. In use, a model trained in this way outputs more accurate predictions for inactive users than a traditional model.
In step S210, the pricing value vector may be determined directly from the user pricing calibration algorithm, and the following embodiments may be referred to for specific steps of the pricing algorithm:
specifically, in some embodiments, the specific implementation of step S210 may refer to fig. 7. Fig. 7 is a detailed description of step S210 in the training method of the recommended model according to the corresponding embodiment of fig. 6, where step S210 may include the following steps:
and step S212, determining the pricing value corresponding to each behavior feature.
And step S214, determining the pricing value vector according to the pricing value corresponding to each behavior feature.
In this embodiment, the pricing value corresponding to each behavior feature in the overall value vector is determined first, and then the pricing values corresponding to the behavior features are aggregated together to obtain the pricing value vector.
In step S212, the pricing value corresponding to each behavior feature may be determined by querying a correspondence table of behaviors and values. This correspondence table is obtained by analyzing and back-testing a large amount of the user's past behavior data; it is a multi-row, two-column table (one column is the behavior, the other is the value) and can be converted into a standard calibration vector.
In other embodiments, the pricing value corresponding to each action may also be determined by a correlation calculation formula.
In other embodiments, the corresponding pricing value may also be determined by a pricing model.
In particular, in some embodiments, the behavior sequence samples may be input into a pricing model that outputs pricing values for each of the behavior features in the behavior sequence samples.
The training method of the pricing model specifically comprises the following steps: acquiring a behavior sequence sample set, wherein the behavior sequence sample set comprises a plurality of behavior sequence samples, and each behavior sequence sample is calibrated with the pricing value corresponding to each behavior feature in that sample; inputting the behavior sequence samples in the behavior sequence sample set into the pricing model to obtain the pricing value output by the pricing model for each behavior feature in the samples; if, for no more than a preset proportion of the input behavior sequence samples, the obtained pricing values are consistent with the calibrated pricing values of the corresponding behavior features, the coefficients of the pricing model are adjusted; and if, for more than the preset proportion of the behavior sequence samples in the sample set, the obtained pricing values are consistent with the calibrated pricing values of the corresponding behavior features, training ends.
It should be noted that the pricing value calibrated for each behavior feature in each behavior sequence sample is obtained by analyzing and back-testing a large amount of the user's past behavior data, and is the result of big data calculation.
In step S214, the pricing values corresponding to the behavior features are aggregated together to form a multidimensional vector, so as to obtain a pricing value vector, where each dimension (or component) corresponds to a pricing value corresponding to the behavior feature.
In step S220, the overall value vector is calibrated according to the pricing value vector, and there are various ways to obtain the calibrated value vector; for example, the calibration may be performed on the overall value vector using a basic calibration algorithm such as analysis of variance or order-preserving regression, as in the following embodiments.
In this embodiment, the overall value vector is calibrated so that the difference between the overall value vector and the pricing value vector is reduced to within a certain threshold. At this point, the model is effectively under the unified guidance of the user behavior pricing algorithm (the command center), and no large value deviation occurs.
Specifically, in some embodiments, the following embodiments may be referred to for a specific implementation of step S220. In this embodiment, according to the detailed description of step S220 in the training method of the recommendation model shown in the corresponding embodiment of fig. 6, in the training method of the recommendation model, each dimension of the overall value vector corresponds to a target value, each dimension of the pricing value vector corresponds to a pricing value corresponding to a behavior feature, step S220 may include the following steps:
Determining a degree of difference between the pricing value vector and the overall value vector;
and if the difference degree is greater than a preset difference degree threshold, adjusting the overall value vector to reduce the difference between the target value corresponding to at least one behavior feature and the pricing value corresponding to the behavior feature, and obtaining a calibration value vector.
In an embodiment of the present application, a degree of difference between the pricing value vector and the overall value vector is determined, and then the overall value vector is adjusted according to the degree of difference.
Specifically, the difference degree may be determined by calibrating with a basic calibration algorithm such as analysis of variance and order preserving regression, and the corresponding difference degree is the variance in the analysis of variance and the loss function in the order preserving regression.
For example, when calibration is performed by analysis of variance, the degree of difference is the mean difference between the pricing value vector and the overall value vector, and the preset difference threshold is the least significant difference, calculated as:

LSD = t_{\alpha/2} \sqrt{S_c^2 \left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}, \qquad S_c^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}

where LSD is the least significant difference, S_c^2 is the joint (pooled) variance of the pricing value vector and the overall value vector, n_1 is the number of dimensions (i.e., how many components there are) of the pricing value vector, S_1^2 is the variance of the pricing value vector, n_2 is the number of dimensions of the overall value vector, S_2^2 is the variance of the overall value vector, and t_{\alpha/2} is the t-test coefficient. When the mean difference between the pricing value vector and the overall value vector is greater than the preset difference threshold, the overall value vector needs to be adjusted.
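A sketch of this check is given below: it computes the least significant difference from the pooled variance and compares it with the mean difference between the two vectors. The formula follows the standard LSD definition implied by the variable list above, and the t-test coefficient value is a placeholder.

```python
# Sketch of the variance-analysis check: compute the least significant difference (LSD)
# and compare it with the mean difference between the two vectors.
import torch

def needs_adjustment(pricing_vec, overall_vec, t_coeff=2.0):
    n1, n2 = pricing_vec.numel(), overall_vec.numel()
    s1, s2 = pricing_vec.var(unbiased=True), overall_vec.var(unbiased=True)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)     # joint variance S_c^2
    lsd = t_coeff * torch.sqrt(pooled * (1.0 / n1 + 1.0 / n2))   # least significant difference
    mean_diff = (pricing_vec.mean() - overall_vec.mean()).abs()  # degree of difference
    return mean_diff > lsd                                        # adjust when it exceeds the LSD
```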
For example, when calibration is performed by order-preserving regression, the degree of difference is the loss function of the order-preserving regression, calculated as:

loss = \sum_{i=1}^{N} w_i (x_i - y_i)^2

where loss is the loss function, w_i is the sample weight, x_i is the pricing value, y_i is the target value, and N is the number of behavior features.
In the order-preserving regression calibration, the difference threshold, i.e. the loss function threshold, can be set according to actual requirements, for example 0.1. For the adjustment of the overall value vector, reference may be made to the following embodiments. In one embodiment, when the degree of difference is greater than the preset difference threshold, each target value of the overall value vector is adjusted toward the corresponding pricing value by a preset adjustment magnitude until the degree of difference is smaller than the preset difference threshold; the pricing value corresponding to a target value is the pricing value of the behavior feature to which that target value corresponds. In another embodiment, when the degree of difference is greater than the preset difference threshold, the target value of the overall value vector that differs most from its corresponding pricing value is adjusted by a preset adjustment magnitude until the degree of difference is smaller than the preset difference threshold. In yet another embodiment, when the degree of difference is greater than the preset difference threshold, the target values that differ most from their corresponding pricing values are adjusted, in turn, to the corresponding pricing values until the degree of difference is smaller than the preset difference threshold. At this point, the model is effectively under the unified guidance of the user behavior pricing algorithm (the command center), and no large value deviation occurs.
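The following sketch illustrates the first adjustment strategy above: every target value is nudged toward its pricing value by a preset step until the degree of difference falls below the threshold. The step size, threshold and the mean-absolute-difference criterion are illustrative assumptions.

```python
# Sketch of the first adjustment strategy: nudge every target value toward its pricing
# value by a preset step until the degree of difference drops below the threshold.
import torch

def calibrate_value_vector(overall_vec, pricing_vec, step=0.05, threshold=0.1, max_iters=1000):
    calibrated = overall_vec.clone()
    for _ in range(max_iters):
        degree = (calibrated - pricing_vec).abs().mean()       # degree of difference
        if degree <= threshold:
            break
        delta = (pricing_vec - calibrated).clamp(-step, step)  # move toward the pricing value, one step
        calibrated = calibrated + delta
    return calibrated                                          # the calibration value vector
```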
In step S300, the behavior sequence sample is input into the recommendation model again, together with the calibration value vector; the calibration value vector is used as the input of the first behavior tower to correct the output of each behavior tower so that it is closer to the true value. The output of step S300 is the prediction result, i.e. the predicted probability of occurrence of each behavior feature in the behavior feature sequence. Therefore, in step S400, the loss function is computed from the behavior features, the prediction result and the real result. The loss function may be implemented as a cross-entropy, where the real label at each step indicates whether that behavior actually occurred.
The specific calculation formula of the loss function is as follows:
L(x_i) = -\left[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \right], \qquad loss = \frac{1}{N} \sum_{i=1}^{N} L(x_i)

where loss is the loss function, x_i is a behavior feature, y_i is the real label corresponding to the behavior feature x_i, L(x_i) is the loss of the behavior feature x_i in the behavior sequence sample, p(x_i) is the probability that the behavior feature x_i is predicted as the positive class (the p(x_i) corresponding to all behavior features in the behavior sequence sample together form the prediction result), and N is the number of input behavior sequence samples.
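As a small illustration, the standard binary cross-entropy below computes the same quantity from a toy prediction and toy real labels; the values are placeholders.

```python
# Toy illustration of the cross-entropy loss in step S400; the numbers are placeholders.
import torch
import torch.nn.functional as F

prediction = torch.tensor([0.9, 0.2, 0.7])    # p(x_i): predicted probability for each behavior feature
real_labels = torch.tensor([1.0, 0.0, 1.0])   # y_i: whether each behavior actually occurred
loss = F.binary_cross_entropy(prediction, real_labels)   # mean of L(x_i) over the sample
```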
In step S500, the parameters are updated based on the loss function, specifically through back-propagation and gradient descent, and training stops when the loss function converges or becomes smaller than a preset loss threshold.
Referring to fig. 8, a flow diagram of a recommendation method, which may be implemented in dependence on a computer program, that may be run on a von neumann system-based recommendation device is provided for one or more embodiments of the present application. The computer program may be integrated in the application or may run as a stand-alone tool class application. The recommending means may be a service platform.
Specifically, the recommendation method comprises the following steps:
step S800, a behavior feature sequence to be recommended is obtained.
Step S900, inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, wherein the recommendation model is the recommendation model described above.
In the embodiment of the application, the behavior feature sequence to be recommended is obtained first, and then the behavior feature sequence to be recommended is input into the recommendation model trained through the embodiment, so that a prediction result is obtained. Compared with the recommended content obtained by using the model in the traditional mode, the recommended content obtained by using the recommended model is more accurate, and the problem that the recommended content is inaccurate due to the fact that deviation of a sample cannot be corrected and calibrated in time is solved.
In step S800, the behavior feature sequence to be recommended, that is, the behavior feature sequence of the user to be recommended, is a feature sequence in which all behavior features of the user to be recommended in a period of time are arranged in time sequence, and may reflect the preference of the user to some extent.
In step S900, the recommendation model is the recommendation model trained by the above embodiment, and the output of the recommendation model is the prediction result.
Specifically, in some embodiments, the specific implementation of step S900 may refer to fig. 9. Fig. 9 is a detailed description of step S900 in the recommendation method according to the corresponding embodiment of fig. 8, where the model includes a plurality of behavior towers, each of the behavior towers corresponds to a different type of behavior feature, and the recommendation method may include the following steps:
step S910, inputting the behavior features into the recommendation model one by one in sequence.
Step S920, calling a corresponding behavior tower according to the type of the behavior feature when one behavior feature is input.
Step S930, inputting each behavior feature into the corresponding behavior tower, to obtain a prediction result.
In the embodiment of the present application, the processing of the behavior features by the recommendation model is still completed through the behavior tower, that is, in the training step, the parameters of the behavior tower are mainly updated when the parameters are updated. And inputting each behavior characteristic into the corresponding behavior tower to obtain a prediction result.
In step S930, the following examples may be referred to for a specific method for obtaining the prediction result by the behavioral tower.
Specifically, in some embodiments, the specific implementation of step S930 may refer to fig. 10. Fig. 10 is a detailed description of step S930 in the recommendation method according to the corresponding embodiment of fig. 9, where step S930 may include the following steps:
step S932, inputting each behavior feature into the corresponding behavior tower, to obtain a corresponding behavior probability.
And step S934, normalizing each behavior probability to obtain a prediction result.
In this embodiment, each behavior feature is input into the corresponding behavior tower to obtain a corresponding behavior probability, and then each behavior probability is normalized to obtain a prediction result.
In step S932, as in the training step, the corresponding behavior probabilities can be obtained by simply inputting the behavior features into the corresponding behavior towers in order.
In step S934, after the behavior probabilities are obtained, a softmax normalization process is performed on the behavior probabilities, so that the sum of all the behavior probabilities is 1, and the probability of each recommended item is obtained, where the probability of the recommended item can be understood as the recommendation degree of the recommended item.
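As a small illustration of this normalization, the sketch below applies softmax to a set of behavior probabilities; the numbers are placeholders.

```python
# Sketch of step S934: softmax-normalize the behavior probabilities so that they
# sum to 1, giving the recommended-item probabilities.
import torch

behavior_probs = torch.tensor([0.8, 0.3, 0.6])       # behavior probabilities from the towers (illustrative)
item_probs = torch.softmax(behavior_probs, dim=0)    # recommended-item probabilities, summing to 1
```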
It should be noted that, as described in the training method above, a behavior feature includes a user feature and a recommended item feature, where the user feature includes basic information related to the user, such as user basic information and user behavior. The recommended item features include the basic features and aggregated features of the objects that interact with the user, i.e. the recommended items, which may also be generally referred to as materials and may include one or more of products, documents, cards, buttons, etc. in specific applications. Therefore, the behavior probability output by a behavior tower is not only the probability that the behavior occurs, but also the probability that the user interacts with the recommended item; the higher the probability of interaction between the user and the recommended item, the more interested the user is shown to be in that item, and the more suitable the item is to be recommended to the user.
Specifically, in some embodiments, the specific implementation of step S934 may refer to the following embodiments. The present embodiment is a detailed description of step S934 in the recommendation method according to the corresponding embodiment of fig. 10, where step S934 may include the following steps:
and normalizing each behavior probability to obtain a corresponding recommended item probability.
And ranking each recommended item according to its recommended item probability to obtain a prediction result.
In this embodiment, after the probability of the recommended item is obtained, each recommended item is ranked according to the probability of the recommended item, so as to form a prediction result, so that recommended items of interest to the user can be displayed more intuitively.
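Continuing the sketch above, the recommended items can then be ranked by their normalized probabilities; the item names are placeholders.

```python
# Sketch of ranking recommended items by their recommended-item probabilities.
items = ["item_a", "item_b", "item_c"]                              # placeholder recommended items
ranked = sorted(zip(items, item_probs.tolist()), key=lambda p: p[1], reverse=True)
prediction_result = [name for name, _ in ranked]                    # items ordered by recommendation degree
```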
The training device of the recommendation model provided in the embodiment of the present application will be described in detail with reference to fig. 11. It should be noted that, the training device of the recommendation model shown in fig. 11 is used to execute the method of the embodiment shown in fig. 1 to 7, and for convenience of explanation, only the relevant part of the embodiment of the present application is shown, and specific technical details are not disclosed, please refer to the embodiment shown in fig. 1 to 7 of the embodiment of the present application.
Referring to fig. 11, a schematic structural diagram of a training device of a recommendation model according to an embodiment of the present application is shown. The training means 11 of the recommendation model may be implemented as all or part of the user terminal by software, hardware or a combination of both. According to some embodiments, the training device 11 of the recommendation model includes an acquisition module 111, an encoding module 112, a calibration module 113, a decoding module 114, a loss module 115, and an updating module 116, specifically configured to: the obtaining module 111 is configured to obtain behavior sequence samples, where each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time. The encoding module 112 is configured to input the behavior sequence samples into a recommendation model to obtain an overall value vector. And the calibration module 113 is configured to obtain a calibration value vector according to the pricing value vector and the overall value vector. The decoding module 114 is configured to input the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result. A penalty module 115 for determining a penalty function based on the behavioral characteristics, the predicted outcomes, and the actual outcomes. And an updating module 116, configured to update parameters of the recommendation model based on the loss function until the loss function converges.
In one embodiment, the recommendation model includes an encoder and a decoder. The encoding module 112 specifically includes an encoder sub-module, configured to input the behavior sequence samples into the encoder to obtain an overall value vector; the decoding module 114 specifically includes a decoder sub-module, configured to input the calibration value vector and the behavior sequence sample into the decoder to obtain a prediction result.
In one embodiment, the recommendation model includes a plurality of behavior towers, each behavior tower corresponding to a different type of behavior feature, and the encoding module 112 specifically includes: a first input sub-module, configured to input the behavior features into the recommendation model one by one in sequence order; a first calling sub-module, configured to call, each time a behavior feature is input, the corresponding behavior tower according to the type of that behavior feature; a value determining sub-module, configured to input each behavior feature into its corresponding behavior tower to obtain a corresponding target value; and a vector determining sub-module, configured to determine the overall value vector from the target values.
In one embodiment, the value determination submodule specifically includes: the value probability unit is used for inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior value and behavior probability; and the target value unit is used for determining the corresponding target value according to the behavior value and the behavior probability of each behavior feature.
In one embodiment, the behavior towers are arranged in order, and the value probability unit is specifically configured to perform: inputting the behavior characteristics and the preset probability into a corresponding behavior tower to obtain corresponding behavior value and behavior probability; and taking the behavior probability as a preset probability input of the next behavior tower until all the behavior towers obtain the behavior value and the behavior probability.
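The chained use of the preset probability in this embodiment can be illustrated with the following sketch, which assumes that each behavior tower is a callable mapping (behavior feature, preset probability) to (behavior value, behavior probability); the toy tower and the multiplication used to combine value and probability into a target value are hypothetical stand-ins, not the networks of the embodiments.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A behavior tower is modelled here as any callable mapping
# (behavior feature, preset probability) -> (behavior value, behavior probability).
Tower = Callable[[Sequence[float], float], Tuple[float, float]]

def run_behavior_towers(towers: List[Tower],
                        features: List[Sequence[float]],
                        initial_prob: float = 1.0) -> Tuple[List[float], List[float]]:
    """Feed each behavior feature into its tower in order, chaining the behavior
    probability of one tower into the next tower as its preset probability."""
    target_values, probs = [], []
    preset_prob = initial_prob
    for tower, feature in zip(towers, features):
        behavior_value, behavior_prob = tower(feature, preset_prob)
        # Assumed combination: target value = behavior value x behavior probability.
        target_values.append(behavior_value * behavior_prob)
        probs.append(behavior_prob)
        preset_prob = behavior_prob  # becomes the next tower's preset probability
    return target_values, probs     # target_values plays the role of the overall value vector

# Hypothetical stand-in tower: a linear score squashed into (0, 1) and scaled by the preset probability.
def toy_tower(feature: Sequence[float], preset_prob: float) -> Tuple[float, float]:
    score = sum(feature)
    return score, preset_prob / (1.0 + math.exp(-score))

overall_value_vector, _ = run_behavior_towers([toy_tower, toy_tower], [[0.2, 0.3], [0.1, -0.4]])
```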
In one embodiment, the calibration module 113 specifically includes: a pricing sub-module for determining the pricing value vector using a pricing calibration algorithm; and the calibration sub-module is used for calibrating the overall value vector according to the pricing value vector to obtain a calibrated value vector.
In one embodiment, the pricing submodule specifically includes: a pricing value unit, configured to determine a pricing value corresponding to each of the behavior features; and the user pricing unit is used for determining the pricing value vector according to the pricing value corresponding to each behavior feature.
In one embodiment, each dimension of the overall value vector corresponds to a target value, each dimension of the pricing value vector corresponds to a pricing value corresponding to a behavioral characteristic, and the calibration submodule specifically includes: a degree of difference unit for determining a degree of difference between the pricing value vector and the overall value vector; and the difference adjustment unit is used for adjusting the overall value vector if the difference degree is larger than a preset difference degree threshold value, so that the difference value between the target value corresponding to at least one behavior feature and the pricing value corresponding to the behavior feature is reduced, and a calibration value vector is obtained.
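To make this calibration concrete, the sketch below compares the overall value vector with the pricing value vector dimension by dimension; the Euclidean distance used as the difference degree, the threshold value, and the linear pull toward the pricing values are all assumptions, since the embodiments only require that the gap be reduced when a threshold is exceeded.

```python
import numpy as np

def calibrate_value_vector(overall, pricing, diff_threshold=0.5, pull=0.5):
    """Calibrate the overall value vector against the pricing value vector.

    Each dimension of `overall` is a target value and each dimension of `pricing`
    is the pricing value of the corresponding behavior feature.
    """
    overall = np.asarray(overall, dtype=np.float64)
    pricing = np.asarray(pricing, dtype=np.float64)
    # Assumed difference degree: Euclidean distance between the two vectors.
    difference = float(np.linalg.norm(overall - pricing))
    if difference <= diff_threshold:
        return overall  # close enough; no adjustment needed
    # Move each target value part of the way toward its pricing value,
    # which reduces the per-dimension gap as required.
    return overall + pull * (pricing - overall)

calibration_value_vector = calibrate_value_vector([1.2, 0.1, 0.8], [0.9, 0.4, 0.7])
```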
It should be noted that, when the training device for a recommendation model provided in the foregoing embodiment performs the training method of the recommendation model, the division into the above functional modules is only used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the training device provided in the above embodiment and the embodiments of the training method of the recommendation model belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not described herein again.
The recommendation device provided in the embodiment of the present application will be described in detail with reference to fig. 12. It should be noted that the recommendation device shown in fig. 12 is used to execute the methods of the embodiments shown in fig. 8 to 10. For convenience of explanation, only the parts relevant to the embodiments of the present application are shown; for technical details that are not disclosed here, please refer to the embodiments shown in fig. 8 to 10 of the present application.
Fig. 12 is a schematic structural diagram of a recommendation device according to an embodiment of the present application. The recommendation device 12 may be implemented as all or part of a user terminal by software, hardware, or a combination of both. According to some embodiments, the recommendation device 12 includes an acquisition module 121 and an input module 122, which are specifically configured as follows. The acquisition module 121 is configured to acquire a behavior feature sequence to be recommended. The input module 122 is configured to input the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, where the recommendation model is the recommendation model described above.
In one embodiment, the recommendation model includes a plurality of behavior towers, each behavior tower corresponding to a different type of behavior feature, and the input module 122 specifically includes: a second input sub-module, configured to input the behavior features into the recommendation model one by one in sequence order; a second calling sub-module, configured to call, each time a behavior feature is input, the corresponding behavior tower according to the type of that behavior feature; and a prediction result sub-module, configured to input each behavior feature into its corresponding behavior tower to obtain a prediction result.
In one embodiment, the prediction result submodule specifically includes: the behavior probability unit is used for inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior probability; and the prediction result unit is used for carrying out normalization processing on each behavior probability to obtain a prediction result.
It should be noted that, when the recommendation device provided in the foregoing embodiment performs the recommendation method, the division into the above functional modules is only used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the recommendation device and the recommendation method embodiments provided above belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not described herein again.
The foregoing embodiment numbers of the present application are merely for description, and do not represent advantages or disadvantages of the embodiments.
In the embodiment of the application, various behavior features of a user are arranged in sequence and then input into a recommendation model to obtain an overall value vector. The overall value vector is then calibrated according to the pricing value vector to obtain a calibration value vector, so that deviations in the sample can be corrected and calibrated in time. The calibration value vector and the aforementioned behavior features of the user are then fed into the recommendation model again to obtain a prediction result; the calibration value vector helps the recommendation model obtain a more accurate prediction result. A loss function is determined based on the behavior features, the prediction result, and the real result, and parameters of the recommendation model are updated based on the loss function until the loss function converges. After the recommendation model is trained in this way, the recommendation content obtained by using it is more accurate than that obtained with a model trained in the traditional manner, which solves the problem of inaccurate recommendation content caused by sample deviations that cannot be corrected and calibrated in time.
The embodiment of the present application further provides a computer storage medium. The computer storage medium may store a plurality of instructions adapted to be loaded by a processor to execute the training method and the recommendation method of the recommendation model in the embodiments shown in fig. 1 to 10; for the specific execution process, refer to the specific description of the embodiments shown in fig. 1 to 10, which is not repeated herein.
The embodiment of the present application further provides a computer program product. The computer program product stores at least one instruction, which is loaded and executed by a processor to implement the training method and the recommendation method of the recommendation model in the embodiments shown in fig. 1 to 10; for the specific implementation process, refer to the specific description of the embodiments shown in fig. 1 to 10, which is not repeated herein.
Referring to fig. 13, a block diagram of an electronic device according to an exemplary embodiment of the present application is shown. The electronic device in embodiments of the present application may include one or more of the following components: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the overall electronic device, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. In one embodiment, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include random access memory (RAM) or read-only memory (ROM). In one embodiment, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system (which may be the Android system, including systems deeply developed on the basis of the Android system, the iOS system developed by Apple Inc., including systems deeply developed on the basis of the iOS system, or other systems), instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described below, and the like. The data storage area may also store data created by the electronic device in use, such as phone books, audio and video data, and chat log data.
Referring to FIG. 14, the memory 120 may be divided into an operating system space, in which the operating system runs, and a user space, in which native and third-party applications run. To ensure that different third-party applications can run well, the operating system allocates corresponding system resources to them. However, different application scenarios within the same third-party application place different requirements on system resources: for example, in a local-resource loading scenario, the third-party application has a higher requirement on disk read speed, while in an animation rendering scenario it has a higher requirement on GPU performance. Because the operating system and third-party applications are independent of each other, the operating system often cannot perceive the current application scenario of a third-party application in time and therefore cannot adapt system resources to the specific application scenario of that application.
To enable the operating system to distinguish the specific application scenario of a third-party application, a data communication channel between the third-party application and the operating system needs to be established, so that the operating system can obtain the current scenario information of the third-party application at any time and perform targeted system resource adaptation based on the current scenario.
Taking the Android system as an example of the operating system, as shown in fig. 15, the programs and data stored in the memory 120 may be divided into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360, and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340, and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for the various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, and power management. The system runtime library layer 340 provides the main feature support for the Android system through a number of C/C++ libraries. For example, the SQLite library provides database support, the OpenGL/ES library provides 3D graphics support, and the Webkit library provides browser kernel support. The system runtime library layer 340 also provides the Android runtime library (Android Runtime), which mainly provides core libraries that allow developers to write Android applications in the Java language. The application framework layer 360 provides various APIs that may be used when building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and location management; developers can also build their own applications by using these APIs. At least one application program runs in the application layer 380. These applications may be native applications of the operating system, such as a contacts program, a short message program, a clock program, or a camera application, or third-party applications developed by third-party developers, such as games, instant messaging programs, or photo beautification programs.
Taking the iOS system as an example of the operating system, the programs and data stored in the memory 120 are shown in fig. 16. The iOS system includes: a core operating system layer 420 (Core OS layer), a core services layer 440 (Core Services layer), a media layer 460 (Media layer), and a Cocoa Touch layer 480 (Cocoa Touch Layer). The core operating system layer 420 includes the operating system kernel, drivers, and underlying program frameworks that provide hardware-related functionality for use by the program frameworks in the core services layer 440. The core services layer 440 provides the system services and/or program frameworks required by applications, such as the Foundation framework, account framework, advertising framework, data storage framework, network connection framework, geographic location framework, and motion framework. The media layer 460 provides audio-visual interfaces for applications, such as graphics and image interfaces, audio technology interfaces, video technology interfaces, and the AirPlay interface for wireless audio and video transmission. The Cocoa Touch layer 480 provides various commonly used interface-related frameworks for application development and is responsible for user touch interaction on the electronic device, such as the local notification service, the remote push service, the advertising framework, the game tool framework, the message user interface (UI) framework, the UIKit framework, and the map framework.
Among the frameworks illustrated in fig. 16, those related to most applications include, but are not limited to, the Foundation framework in the core services layer 440 and the UIKit framework in the Cocoa Touch layer 480. The Foundation framework provides many basic object classes and data types and provides the most basic system services for all applications, independent of the UI. The classes provided by the UIKit framework form a basic UI class library for creating touch-based user interfaces; iOS applications can build their UIs on the UIKit framework, which therefore provides the infrastructure for applications to construct user interfaces, draw, handle user interaction events, respond to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and the embodiments of the present application are not described herein again.
The input device 130 is configured to receive input instructions or data; it includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is configured to output instructions or data; it includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined into a touch display screen for receiving touch operations on or near it performed by the user with a finger, a stylus, or any other suitable object, and for displaying the user interface of each application. The touch display screen is typically provided on the front panel of the electronic device and may be designed as a full screen, a curved screen, or a contoured screen. It may also be designed as a combination of a full screen and a curved screen, or a combination of a contoured screen and a curved screen, which is not limited in the embodiments of the present application.
In addition, those skilled in the art will appreciate that the structure of the electronic device shown in the above figures does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components. For example, the electronic device further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described herein again.
In the embodiment of the present application, the execution subject of each step may be the electronic device described above. In one embodiment, the execution subject of each step is an operating system of the electronic device. The operating system may be an android system, an IOS system, or other operating systems, which embodiments of the present application do not limit.
The electronic device of the embodiment of the present application may further be equipped with a display device, which may be any device capable of providing a display function, for example: a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), and the like. A user may use the display device on the electronic device 101 to view displayed text, images, video, and so on. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
In the electronic device shown in fig. 13, where the electronic device may be a terminal, the processor 110 may be configured to invoke the application stored in the memory 120 and specifically perform the following operations (a sketch of how these operations fit together is given after the list):
acquiring behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time;
inputting the behavior sequence sample into a recommendation model to obtain an overall value vector;
obtaining a calibration value vector according to the pricing value vector and the overall value vector;
inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result;
determining a loss function based on the behavioral characteristics, the predicted outcome, and the actual outcome;
and updating parameters of the recommendation model based on the loss function until the loss function converges.
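The operations listed above can be pieced together as the following PyTorch-style sketch; the encoder and decoder modules, the pricing-calibration routine, the Adam optimizer, and the mean-squared-error loss are all placeholder assumptions, since the embodiments do not fix these choices (in particular, the role of the behavior features within the loss is left abstract here).

```python
import torch
from torch import nn

def train_recommendation_model(encoder: nn.Module,
                               decoder: nn.Module,
                               samples,              # iterable of (behavior_sequence, real_result) tensors
                               pricing_calibration,  # callable: behavior_sequence -> pricing value vector
                               calibrate,            # callable: (overall, pricing) -> calibration value vector
                               epochs: int = 10,
                               lr: float = 1e-3):
    """Hypothetical training loop mirroring the listed operations."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)  # assumed optimizer
    loss_fn = nn.MSELoss()                       # assumed loss form
    for _ in range(epochs):
        for behavior_sequence, real_result in samples:
            overall_value = encoder(behavior_sequence)               # overall value vector
            pricing_value = pricing_calibration(behavior_sequence)   # pricing value vector
            calibrated = calibrate(overall_value, pricing_value)     # calibration value vector
            prediction = decoder(calibrated, behavior_sequence)      # prediction result
            loss = loss_fn(prediction, real_result)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return encoder, decoder
```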
In one embodiment, the recommendation model includes an encoder and a decoder; the processor 110, when executing the input of the behavior sequence samples into the recommendation model to obtain the overall value vector, specifically performs the following operations: inputting the behavior sequence samples into an encoder to obtain an overall value vector; the processor 110 specifically performs the following operations when performing the input of the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result: and inputting the calibration value vector and the behavior sequence sample into the decoder to obtain a prediction result.
In one embodiment, the recommendation model includes a plurality of behavior towers, each behavior tower corresponding to a different type of behavior feature, and the processor 110, when executing the input of the behavior sequence samples into the recommendation model, specifically performs the following operations to obtain an overall value vector: inputting the behavior features into the recommendation model one by one according to the sequence; each time a behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature; inputting each behavior characteristic into the corresponding behavior tower to obtain a corresponding target value; based on each target value, an overall value vector is determined.
In one embodiment, the processor 110, when executing the entering of each of the behavior features into the corresponding behavior tower, obtains the corresponding target value, specifically performs the following operations: inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior value and behavior probability; and determining a corresponding target value according to the behavior value and the behavior probability of each behavior feature.
In one embodiment, the behavior towers are arranged in sequence, and the processor 110 specifically performs the following operations when performing inputting each behavior feature into the corresponding behavior tower to obtain the corresponding behavior value and behavior probability: inputting the behavior characteristics and the preset probability into a corresponding behavior tower to obtain corresponding behavior value and behavior probability; and taking the behavior probability as a preset probability input of the next behavior tower until all the behavior towers obtain the behavior value and the behavior probability.
In one embodiment, the processor 110, when executing the calibration value vector from the pricing value vector and the overall value vector, performs the following operations: determining the pricing value vector by adopting a pricing calibration algorithm; and calibrating the overall value vector according to the pricing value vector to obtain a calibrated value vector.
In one embodiment, the processor 110, when executing the determination of the pricing value vector using the pricing calibration algorithm, specifically performs the following operations: determining a pricing value corresponding to each behavior feature; and determining the pricing value vector according to the pricing value corresponding to each behavior feature.
In one embodiment, each dimension of the overall value vector corresponds to a target value, each dimension of the pricing value vector corresponds to a pricing value corresponding to a behavioral characteristic, and the processor 110, when executing the calibration of the overall value vector according to the pricing value vector to obtain a calibrated value vector, specifically performs the following operations: determining a degree of difference between the pricing value vector and the overall value vector; and if the difference degree is greater than a preset difference degree threshold, adjusting the overall value vector to reduce the difference between the target value corresponding to at least one behavior feature and the pricing value corresponding to the behavior feature, and obtaining a calibration value vector.
In one embodiment, the processor 110 may also execute the steps of the model application, i.e., execute the recommendation method, and when executing the recommendation method specifically performs the following operations: acquiring a behavior feature sequence to be recommended; and inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, where the recommendation model is the recommendation model described above.
In one embodiment, the model includes a plurality of behavior towers, each behavior tower corresponding to a different type of behavior feature, and the processor 110 specifically performs the following operations when executing the input of the behavior feature sequence into the recommendation model to obtain the prediction result: inputting the behavior features into the recommendation model one by one according to the sequence; each time a behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature; and inputting each behavior characteristic into the corresponding behavior tower to obtain a prediction result.
In one embodiment, the processor 110, when executing the input of each of the behavior characteristics into the corresponding behavior tower to obtain the prediction result, specifically performs the following operations: inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior probability; and normalizing each behavior probability to obtain a prediction result.
In the embodiment of the application, various behavior features of a user are arranged in sequence and then input into a recommendation model to obtain an overall value vector. The overall value vector is then calibrated according to the pricing value vector to obtain a calibration value vector, so that deviations in the sample can be corrected and calibrated in time. The calibration value vector and the aforementioned behavior features of the user are then fed into the recommendation model again to obtain a prediction result; the calibration value vector helps the recommendation model obtain a more accurate prediction result. A loss function is determined based on the behavior features, the prediction result, and the real result, and parameters of the recommendation model are updated based on the loss function until the loss function converges. After the recommendation model is trained in this way, the recommendation content obtained by using it is more accurate than that obtained with a model trained in the traditional manner, which solves the problem of inaccurate recommendation content caused by sample deviations that cannot be corrected and calibrated in time.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the embodiments of the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, object features, interaction behavior features, user information, and the like, which are referred to in the embodiments of the present application, are acquired with sufficient authorization.
The foregoing disclosure is merely illustrative of preferred embodiments of the present application and is of course not intended to limit the scope of the embodiments of the present application; equivalent changes made in accordance with the claims of the embodiments of the present application therefore remain within the scope of the embodiments of the present application.

Claims (16)

1. A training method of a recommendation model, the training method of the recommendation model comprising:
acquiring behavior sequence samples, wherein each behavior sequence sample is a behavior feature sequence marked with a real result, and the behavior feature sequence is a sequence formed by a plurality of behavior features arranged according to time;
Inputting the behavior sequence sample into a recommendation model to obtain an overall value vector;
obtaining a calibration value vector according to the pricing value vector and the overall value vector;
inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result;
determining a loss function based on the behavioral characteristics, the predicted outcome, and the actual outcome;
and updating parameters of the recommendation model based on the loss function until the loss function converges.
2. The method of training a recommendation model of claim 1, the recommendation model comprising an encoder and a decoder;
inputting the behavior sequence sample into a recommendation model to obtain an overall value vector, wherein the method specifically comprises the following steps of:
inputting the behavior sequence samples into an encoder to obtain an overall value vector;
inputting the calibration value vector and the behavior sequence sample into the recommendation model to obtain a prediction result, wherein the method specifically comprises the following steps:
and inputting the calibration value vector and the behavior sequence sample into the decoder to obtain a prediction result.
3. The method for training a recommendation model according to claim 1 or 2, wherein the recommendation model comprises a plurality of behavior towers, each behavior tower corresponds to a different type of behavior feature, and the step of inputting the behavior sequence sample into the recommendation model to obtain an overall value vector specifically comprises the following steps:
Inputting the behavior features into the recommendation model one by one according to the sequence;
each time a behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature;
inputting each behavior characteristic into the corresponding behavior tower to obtain a corresponding target value;
based on each target value, an overall value vector is determined.
4. The method for training a recommendation model according to claim 3, wherein the step of inputting each behavior feature into the corresponding behavior tower to obtain a corresponding target value specifically comprises:
inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior value and behavior probability;
and determining a corresponding target value according to the behavior value and the behavior probability of each behavior feature.
5. The method for training a recommendation model according to claim 4, wherein the behavioral towers are arranged in sequence, and the step of inputting each behavioral characteristic into a corresponding behavioral tower to obtain a corresponding behavioral value and behavioral probability specifically comprises:
inputting the behavior characteristics and the preset probability into a corresponding behavior tower to obtain corresponding behavior value and behavior probability;
And taking the behavior probability as a preset probability input of the next behavior tower until all the behavior towers obtain the behavior value and the behavior probability.
6. The method for training a recommendation model according to claim 1, wherein the obtaining a calibration value vector according to the pricing value vector and the overall value vector specifically comprises:
determining the pricing value vector by adopting a pricing calibration algorithm;
and calibrating the overall value vector according to the pricing value vector to obtain a calibrated value vector.
7. The recommendation model training method of claim 6, wherein determining a pricing value vector using a pricing calibration algorithm comprises:
determining a pricing value corresponding to each behavior feature;
and determining the pricing value vector according to the pricing value corresponding to each behavior feature.
8. The method for training a recommendation model according to claim 6, wherein each dimension of the overall value vector corresponds to a target value, each dimension of the pricing value vector corresponds to a pricing value corresponding to a behavioral characteristic, and the calibrating the overall value vector according to the pricing value vector to obtain a calibrated value vector specifically comprises:
Determining a degree of difference between the pricing value vector and the overall value vector;
and if the difference degree is greater than a preset difference degree threshold, adjusting the overall value vector to reduce the difference between the target value corresponding to at least one behavior feature and the pricing value corresponding to the behavior feature, and obtaining a calibration value vector.
9. A recommendation method, the recommendation method comprising:
acquiring a behavior feature sequence to be recommended;
inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, wherein the recommendation model is the recommendation model of any one of claims 1 to 8.
10. The recommendation method as claimed in claim 9, wherein the model comprises a plurality of behavior towers, each behavior tower corresponding to a different type of behavior feature, and the step of inputting a behavior feature sequence into the recommendation model to obtain a prediction result specifically comprises:
inputting the behavior features into the recommendation model one by one according to the sequence;
each time a behavior feature is input, a corresponding behavior tower is called according to the type of the behavior feature;
and inputting each behavior characteristic into the corresponding behavior tower to obtain a prediction result.
11. The recommendation method of claim 10, wherein the inputting each behavior feature into the corresponding behavior tower to obtain a prediction result specifically comprises:
inputting each behavior characteristic into the corresponding behavior tower to obtain corresponding behavior probability;
and normalizing each behavior probability to obtain a prediction result.
12. A training apparatus of a recommendation model, the training apparatus of a recommendation model comprising:
the system comprises an acquisition module, a judgment module and a judgment module, wherein the acquisition module is used for acquiring behavior sequence samples, each behavior sequence sample is a behavior characteristic sequence marked with a real result, and the behavior characteristic sequence is a sequence formed by a plurality of behavior characteristics arranged according to time;
the coding module is used for inputting the behavior sequence samples into a recommendation model to obtain an overall value vector;
the calibration module is used for obtaining a calibration value vector according to the pricing value vector and the overall value vector;
the decoding module is used for inputting the calibration value vector and the behavior feature sequence into the recommendation model to obtain a prediction result;
a loss module for determining a loss function based on the behavioral characteristics, the predictive result, and the real result;
And the updating module is used for updating parameters of the recommendation model based on the loss function until the loss function converges.
13. A recommendation device, the recommendation device comprising:
the acquisition module is used for acquiring a behavior feature sequence to be recommended;
the input module is used for inputting the behavior feature sequence to be recommended into a recommendation model to obtain a prediction result, wherein the recommendation model is the recommendation model of any one of claims 1 to 9.
14. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 11.
15. A computer program product storing at least one instruction for loading by a processor and performing the method steps of any one of claims 1 to 11.
16. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-11.
CN202310385125.XA 2023-04-03 2023-04-03 Training method of recommendation model, recommendation method and related device Pending CN116522996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310385125.XA CN116522996A (en) 2023-04-03 2023-04-03 Training method of recommendation model, recommendation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310385125.XA CN116522996A (en) 2023-04-03 2023-04-03 Training method of recommendation model, recommendation method and related device

Publications (1)

Publication Number Publication Date
CN116522996A true CN116522996A (en) 2023-08-01

Family

ID=87393192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310385125.XA Pending CN116522996A (en) 2023-04-03 2023-04-03 Training method of recommendation model, recommendation method and related device

Country Status (1)

Country Link
CN (1) CN116522996A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination