CN114625954A - Information recommendation method, model training method, information characterization method, device and equipment
- Publication number
- CN114625954A (application CN202011460168.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- commodity
- node
- user
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide an information recommendation method, a model training method, an information characterization method, an apparatus and a device. The method comprises the following steps: determining first information currently triggered by a user to be recommended; acquiring second information associated with the first information through user behavior, wherein the second information is determined from historical information behavior data of a plurality of users; obtaining the information feature of the first information through a pre-constructed feature extraction model according to the first information and the second information; and recommending information to the user to be recommended according to the information feature of the first information. With this technical scheme, multi-way recall links do not need to be enumerated manually; a good recommendation effect is obtained by enhancing the information representation, which both strengthens generality and ensures the stability of the recommendation effect.
Description
Technical Field
The application relates to the technical field of information, in particular to a method, a device and equipment for information recommendation, model training and information characterization.
Background
With the development of the internet, users can obtain various kinds of information through the internet, such as videos, music, articles and commodities. On an internet platform, a user can look up the information he or she needs through a search function. At the same time, to make it easier for users to obtain information, the internet platform can also actively recommend information to them.
Taking the e-commerce field as an example, with its rapid development the number of online commodities has grown explosively. In an e-commerce system, how to screen out, display and recommend the commodities a user is interested in from a huge number of candidate commodities is one of the most important problems.
In the conventional recommendation scheme, a multi-way recall strategy is adopted in the recall stage of the recommendation process: multiple recall links have to be enumerated manually, and different weighting strategies have to be designed for different recalls. As a result, each scenario has to be designed separately, so the generality is poor; manually enumerating the multi-way recalls requires a great deal of domain knowledge; and once an important recall link is missing, the effect drops significantly.
Disclosure of Invention
In view of the above, the present application is proposed to provide an information recommendation method, a model training method, an information characterization method, an apparatus and a device that solve, or at least partially solve, the above problems.
Thus, in one embodiment of the present application, an information recommendation method is provided. The method comprises the following steps:
determining first information currently triggered by a user to be recommended;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
according to the first information and the second information, obtaining the information characteristics of the first information through a pre-constructed characteristic extraction model;
and recommending information to the user to be recommended according to the information characteristics of the first information.
In another embodiment of the present application, a model training method is provided. The method comprises the following steps:
acquiring second sample information and third sample information which are associated with the first sample information through user behavior; wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users;
obtaining information characteristics of the first sample information through a characteristic extraction model according to the first sample information and the second sample information;
obtaining reference information characteristics through the characteristic extraction model according to the third sample information;
and optimizing network parameters of the feature extraction model according to a first difference between the information features of the first sample information and the reference information features.
In one embodiment of the present application, an information characterization method is provided. The method comprises the following steps:
determining first information;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
and obtaining the information characteristics of the first information through a pre-constructed characteristic extraction model according to the first information and the second information.
In one embodiment of the present application, a method of merchandise recommendation is provided. The method comprises the following steps:
determining a first commodity currently triggered by a user to be recommended;
acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
according to the first commodity and the second commodity, commodity features of the first commodity are obtained through a pre-constructed feature extraction model;
and recommending the commodity to the user to be recommended according to the commodity characteristic of the first commodity.
In one embodiment of the present application, an article recommendation device is provided. The device comprises:
the first determining module is used for determining a first commodity currently triggered by a user to be recommended;
the first acquisition module is used for acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
the second acquisition module is used for acquiring the commodity characteristics of the first commodity through a pre-constructed characteristic extraction model according to the first commodity and the second commodity;
and the first recommending module is used for recommending the commodity to the user to be recommended according to the commodity characteristic of the first commodity.
In another embodiment of the present application, an electronic device is provided. The electronic device includes:
a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
determining first information currently triggered by a user to be recommended;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
according to the first information and the second information, obtaining the information characteristics of the first information through a pre-constructed characteristic extraction model;
and recommending information to the user to be recommended according to the information characteristics of the first information.
In another embodiment of the present application, an electronic device is provided. The electronic device includes:
a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
acquiring second sample information and third sample information which are associated with the first sample information through user behavior; wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users;
obtaining information characteristics of the first sample information through a characteristic extraction model according to the first sample information and the second sample information;
obtaining reference information characteristics through the characteristic extraction model according to the third sample information;
and optimizing network parameters of the feature extraction model according to a first difference between the information features of the first sample information and the reference information features.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
determining first information;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
and obtaining the information characteristics of the first information through a pre-constructed characteristic extraction model according to the first information and the second information.
In another embodiment of the present application, an electronic device is provided. The electronic device includes: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, to execute the program stored in the memory to:
determining a first commodity currently triggered by a user to be recommended;
acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
according to the first commodity and the second commodity, commodity features of the first commodity are obtained through a pre-constructed feature extraction model;
and recommending the commodity to the user to be recommended according to the commodity characteristics of the first commodity.
According to the technical scheme provided by the embodiments of the present application, the first information is semantically enhanced by the pre-constructed feature extraction model in combination with the second information that is associated with the first information through user behavior, and the resulting information feature of the first information can better represent the information the user to be recommended wants to obtain next. Therefore, recommending information to the user based on this information feature yields a better recommendation effect. Compared with the prior art, the technical scheme provided by the embodiments of the present application does not need to enumerate multiple recall links manually; it obtains a better recommendation effect by enhancing the information representation, which both strengthens generality and ensures the stability of the recommendation effect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flowchart of an information recommendation method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a model training method according to another embodiment of the present disclosure;
- FIG. 3 is a schematic diagram of a graph structure provided in an embodiment of the present application;
FIG. 4a is a diagram of an example of model training provided in an embodiment of the present application;
FIG. 4b is a diagram of an example of a merchandise recommendation provided in an embodiment of the present application;
fig. 5 is a block diagram illustrating an information recommendation apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of a model training apparatus according to another embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device according to another embodiment of the present application.
Detailed Description
At present, the existing recommendation scheme adopts a multi-way recall strategy in the recall stage of the recommendation process: multiple recall links have to be enumerated manually and different weighting strategies have to be designed for different recalls, so each scenario has to be designed separately and the generality is poor; manually enumerating the multi-way recalls requires a great deal of domain knowledge; and once an important recall link is missing, the effect drops significantly.
In order to solve the above technical problem, an embodiment of the present application provides a new information recommendation method. In this method, multiple recall links do not need to be enumerated manually; a better recommendation effect is obtained by enhancing the information representation, which both strengthens generality and ensures the stability of the recommendation effect.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below according to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Further, some of the flows described in the specification, the claims and the above figures of the present application include a number of operations that occur in a particular order, but these operations may be performed out of the order in which they appear herein or in parallel. The sequence numbers of the operations, e.g. 101, 102, etc., are used merely to distinguish the different operations and do not themselves represent any order of execution. In addition, the flows may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the terms "first", "second", etc. in this document are used to distinguish different messages, devices, modules, etc.; they neither indicate a sequential order nor require that the "first" and the "second" be of different types.
Fig. 1 shows a flowchart of an information recommendation method according to an embodiment of the present application. The method may be executed by a client or a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the operating system of the terminal, which is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a computer and the like. The server may be an ordinary server, a cloud server, a virtual server or the like, which is not specifically limited in the embodiments of the present application. As shown in fig. 1, the method includes:
101. Determining first information currently triggered by a user to be recommended.
102. Acquiring second information associated with the first information through user behavior.
103. Obtaining the information feature of the first information through a pre-constructed feature extraction model according to the first information and the second information.
104. Recommending information to the user to be recommended according to the information feature of the first information.
The information can be commodities, audios and videos, friend accounts, financial products, games, news and the like. For example: in the e-commerce field, information refers to commodities; for another example: in the video field, information refers to video; another example is: in the field of news information, information refers to news.
In the foregoing 101, the first information is the information on which the user to be recommended currently triggers an information behavior. The information behavior may include a click behavior, a browse behavior, a collection behavior, and so on. In the e-commerce domain, the information behavior may also include a purchase behavior.
In the foregoing 102, the second information is information associated with the first information through user behavior, and it is determined from historical information behavior data of a plurality of users. The information behavior may include one or more of a click behavior, a browse behavior, a collection behavior and a purchase behavior. The historical information behavior data may be the recorded data generated by a plurality of users triggering information behaviors on a plurality of pieces of information. In actual application, the plurality of users may be all or some of the users on the information platform, and the plurality of pieces of information may be all or some of the information on the information platform; the plurality of pieces of information include the first information and the second information. The historical information behavior data may be generated by the plurality of users for the plurality of pieces of information within a first preset time period, which can be set according to actual needs, for example to the last year, the last month or the last week.
It should be noted that the historical information behavior data is recorded with user authorization.
In one implementation, the second information associated with the first information through user behavior may include other information on which a user, having triggered an information behavior on the first information, also historically triggered an information behavior within a second preset time period; the other information is information different from the first information. That is, the user also triggered an information behavior on the other information within the second preset time period. The second preset time period may be set according to actual needs, for example to 1 hour or 1 minute, and the trigger time point of the behavior on the first information may be taken as the starting point of the second preset time period.
For example: the first information is a commodity A; historically, after a user clicks the commodity A, the commodity B and the commodity C are clicked within the next 1 minute; then, the second information includes the article B and the article C.
In yet another implementation, the second information associated with the first information through user behavior may include other information that a user has historically associated with the first information through an association operation. That is, the user has historically performed an association operation on the first information and the other information, thereby associating the two.
Taking the e-commerce field as an example, the association operation may specifically be the operation of collecting the first information and the other information into the same wish list. For example, the first information is wedding dress a; historically, a user has collected wedding dress a and ceremonial dress b into a wish list with the theme "Wedding March"; then the second information includes ceremonial dress b.
In another implementation, the second information associated with the first information through user behavior may include the other information on which an information behavior was triggered next after a user historically triggered an information behavior on the first information. The information behavior triggered on the other information can be understood as the first information behavior of the user after triggering the information behavior on the first information.
For example, the first information is commodity A, and the other information on which a user next triggered an information behavior after historically triggering an information behavior on commodity A is commodity B; therefore, the second information includes commodity B.
In 103, the semantic enhancement is performed on the first information through a pre-constructed feature extraction model in combination with the second information, so as to obtain the information feature of the first information. In one example, the parameters in the feature extraction model may be set empirically. In another example, the parameters in the feature extraction model may be obtained by training learning, and specifically, the feature extraction model may be a machine learning model, such as: linear models, neural network models, and the like. The neural network model has the characteristics of large-scale parallel, distributed storage and processing, self-organization, self-adaption, self-learning capability and the like. The specific training process of the feature extraction model will be described in detail in the following embodiments.
In the above 104, the information to be recommended may be retrieved from the information base by using the information characteristic of the first information as a retrieval basis.
In one implementation, the information feature may take the form of a vector, so that the information to be recommended can be obtained by vector retrieval according to the information feature of the first information, and then recommended to the user to be recommended. Vector retrieval means: given the feature vector of a retrieval object, searching a fixed vector library for the one or more vectors closest to that feature vector, and taking the data objects corresponding to those closest vectors as the retrieval result of the retrieval object. In this embodiment of the application, the retrieval object is the first information, and the retrieval result is the information to be recommended. The specific implementation of vector retrieval can be found in the prior art and is not described in detail here.
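Purely as an illustration, the following is a minimal numpy sketch of the nearest-neighbour retrieval step described above. The names `recommend`, `feature_library` and `info_ids` are hypothetical and not taken from the application, and a production system would typically use an approximate-nearest-neighbour index rather than the brute-force scan shown here.

```python
import numpy as np

def recommend(first_info_feature: np.ndarray,
              feature_library: np.ndarray,
              info_ids: list,
              top_k: int = 10) -> list:
    """Return the ids of the top_k candidates whose vectors are closest
    to the information feature of the first information."""
    # Normalizing both sides makes the dot product equal to cosine similarity,
    # so "closest" here means highest cosine similarity.
    lib = feature_library / np.linalg.norm(feature_library, axis=1, keepdims=True)
    query = first_info_feature / np.linalg.norm(first_info_feature)
    scores = lib @ query                    # similarity to every candidate
    order = np.argsort(-scores)[:top_k]     # indices of the closest vectors
    return [info_ids[i] for i in order]
```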
According to the above technical scheme, the first information is semantically enhanced by the pre-constructed feature extraction model in combination with the second information that is associated with the first information through user behavior, and the resulting information feature of the first information can better represent the information the user to be recommended wants to obtain next. Recommending information to the user based on this information feature therefore yields a better recommendation effect. Compared with the prior art, the technical scheme provided by the embodiments of the present application does not need to enumerate multiple recall links manually; it obtains a better recommendation effect by enhancing the information representation, which both strengthens generality and ensures the stability of the recommendation effect.
For convenience, a relationship network may be constructed in advance according to the historical information behavior data of the plurality of users. The relationship network comprises a plurality of nodes and a plurality of connecting edges; different nodes correspond to different pieces of information, and a connecting edge between two nodes represents a behavior-associated event, i.e. the pieces of information corresponding to the two end nodes of the connecting edge are associated through the behavior-associated event represented by that edge. The behavior-associated event represented by a connecting edge may be an event in which the same user historically triggered information behaviors on the pieces of information corresponding to the two end nodes within the second preset time period; or it may be an association operation event historically performed by a user on the pieces of information corresponding to the two end nodes; or it may be an event in which, after a user historically triggered an information behavior on the information corresponding to one end node, the next information on which the user triggered an information behavior was the information corresponding to the other end node.
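For illustration only, the sketch below builds such a directed relationship network from per-user, time-ordered behavior logs, using the third kind of behavior-associated event described above (the next piece of information triggered by the same user). The log format and the function name `build_relation_network` are assumptions rather than part of the application.

```python
from collections import defaultdict

def build_relation_network(behavior_logs):
    """behavior_logs: per-user, time-ordered lists of triggered info ids,
    e.g. {"user_1": ["A", "B", "C"], ...}  (hypothetical format).

    Returns a directed graph: edges[src][dst] = number of times a user
    triggered dst immediately after src (the behavior-associated event)."""
    edges = defaultdict(lambda: defaultdict(int))
    for _, sequence in behavior_logs.items():
        for earlier, later in zip(sequence, sequence[1:]):
            if earlier != later:
                edges[earlier][later] += 1   # directed edge, weighted by occurrences
    return edges

# Example: edges["phone"]["case"] == 1 after one user bought a case right after a phone.
edges = build_relation_network({"user_1": ["phone", "case", "charger"]})
```

Keeping an occurrence count on every edge also records the historical occurrence frequency that is used later when sampling neighbor nodes.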
In this way, when the other information related to a certain information is acquired later, the other information related to the certain information can be conveniently acquired from the relation network.
Therefore, the method may further include:
104. Obtaining a relationship network.
Wherein the relationship network is constructed according to the historical information behavior data; the relationship network comprises a first node corresponding to the first information and neighbor nodes of the first node; the relationship network also comprises connecting edges between nodes, which represent behavior-associated events.
105. Determining the second information from the information corresponding to the neighbor nodes of the first node in the relationship network.
In the aforementioned 104, in an example, in the relational network, the neighboring nodes of the first node may include a 1 st order neighboring node of the first node, and the 1 st order neighboring node of the first node is directly connected to the first node through a connection edge, that is, information corresponding to the 1 st order neighboring node of the first node is directly associated with the first information.
Further, in the relational network, the neighbor nodes of the first node may further include 2 nd order neighbor nodes of the first node. The 2-order neighbor node of the first node is directly connected with the 1-order neighbor node of the first node through the connecting edge, and the 1-order neighbor node of the first node is directly connected with the first node through the connecting edge, that is, the information corresponding to the 2-order neighbor node of the first node is indirectly associated with the first information.
By analogy, in the relational network, the neighbor nodes of the first node may further include N-order neighbor nodes of the first node. The information corresponding to the N-order neighbor node of the first node is indirectly associated with the first information. Wherein N is an integer greater than 2.
In practical application, the relationship network may be a graph structure.
The inventor has found through research that, of two information behaviors that are associated with each other, the user's later information behavior is likely to be a supplement to the earlier one. For example, after buying a mobile phone, a user triggers the purchase behavior again to buy accessories for the phone; the accessories are a supplement to the phone. Recommending information to the user to be recommended is essentially predicting the supplement to the user's current information behavior; for instance, after the user buys a mobile phone, it is reasonable to recommend related phone accessories. Therefore, in order to improve the recommendation effect, in a specific example the neighbor nodes of the first node include a 1st-order neighbor node, and the first connecting edge that connects the first node and the 1st-order neighbor node in the relationship network is a directed edge pointing from the first node to the 1st-order neighbor node. The direction indicates that, in the first behavior-associated event represented by the first connecting edge, the time at which a user triggered an information behavior on the first information is earlier than the time at which the user triggered an information behavior on the information corresponding to the 1st-order neighbor node; that is, in the first behavior-associated event the user first triggered the first information and then triggered the information corresponding to the 1st-order neighbor node. The inventor has further found that the information a user triggers this time is more strongly correlated with the information the same user triggered last time, so, with respect to the last triggered information, the information triggered this time has higher reference value for recommendation. Therefore, to further improve the recommendation effect, the first behavior-associated event refers to the event in which, after a user historically triggered an information behavior on the first information, the next information on which the user triggered an information behavior was the information corresponding to the 1st-order neighbor node.
Furthermore, the neighbor nodes of the first node include (n-1)th-order and nth-order neighbor nodes of the first node, where n is an integer greater than 1; the second connecting edge that connects the (n-1)th-order neighbor node and the nth-order neighbor node in the relationship network is a directed edge pointing from the (n-1)th-order neighbor node to the nth-order neighbor node, indicating that, in the second behavior-associated event represented by the second connecting edge, the time at which a user triggered an information behavior on the information corresponding to the (n-1)th-order neighbor node is earlier than the time at which the user triggered an information behavior on the information corresponding to the nth-order neighbor node. Specifically, the second behavior-associated event refers to the event in which, after a user historically triggered an information behavior on the information corresponding to the (n-1)th-order neighbor node, the next information on which the user triggered an information behavior was the information corresponding to the nth-order neighbor node.
In the above 105, in one example, all or some of the information corresponding to the neighbor nodes of the first node in the relationship network may be used as the second information. Usually the first node has a large number of neighbor nodes, so in view of the computing power available to the feature extraction model, generally only part of the information corresponding to the neighbor nodes of the first node is used as the second information. For example, the neighbor nodes of the first node may be sorted by order from small to large, and the information corresponding to the top M neighbor nodes is used as the second information; the value of M may be set according to actual needs, for example to 512. Note: the order of an x-order neighbor node of the first node, relative to the first node in the relationship network, is x, where x is a positive integer.
In order to improve the representation capability of the model, representative information can be extracted from the information corresponding to the neighbor nodes of the first node in the relationship network and used as the second information. In practical application, the strength of the relationship between a neighbor node and the first node can additionally be taken into account: the corresponding neighbor nodes are selected according to the relationship strength, the information corresponding to the selected neighbor nodes is used as the second information, and the first node is semantically enhanced according to that information. The relationship strength can be calculated according to actual needs; for example, the more times the behavior-associated event represented by the connecting edge between the first node and a neighbor node has historically occurred, the stronger the relationship between that neighbor node and the first node.
In an example, in the foregoing 105, "determining the second information from the information corresponding to the neighbor node of the first node in the relationship network" may specifically be implemented by:
1051. Determining a 1st-order neighbor node from the neighbor nodes of the first node.
1052. Acquiring the first historical occurrence frequency of the first behavior-associated event represented by the first connecting edge that connects the first node and the 1st-order neighbor node in the relationship network.
1053. When the first historical occurrence frequency is greater than or equal to a first occurrence threshold, sampling the information corresponding to the 1st-order neighbor node as the second information.
In the above 1051, the first node may have a plurality of 1st-order neighbor nodes in the relationship network. In practical application, a 1st-order neighbor node can be determined from the neighbor nodes of the first node by traversal.
In the above 1052, the first historical occurrence frequency is determined according to the historical information behavior data.
In the above 1053, the size of the first occurrence threshold may be set according to actual needs, which is not specifically limited in this application. When the first historical occurrence frequency is greater than or equal to the first occurrence threshold, the information corresponding to the 1st-order neighbor node is determined as second information. The first occurrence threshold is a hyperparameter and can be adapted to different scenarios.
Further, in 105, "determining the second information from the information corresponding to the neighbor node of the first node in the relationship network" may further include:
1054. and determining n-order neighbor nodes from the neighbor nodes of the first node.
1055. And acquiring a second historical occurrence frequency of a second behavior correlation event represented by a second connecting edge of the n-1 order neighbor node connected with the first node and the n order neighbor node in the relational network.
1056. And when the second historical occurrence frequency is greater than or equal to a second occurrence frequency threshold value, sampling information corresponding to the n-order neighbor node to serve as the second information.
In 1054 above, n is an integer greater than 1. The number of n-order neighbor nodes in the neighbor nodes of the first node may be multiple, so in practical application, one n-order neighbor node can be determined from the neighbor nodes of the first node in a traversal manner.
In 1055 above, the second history occurrence frequency is determined according to the history information behavior data.
In 1056 above, the size of the second occurrence threshold may be set according to actual needs, which is not specifically limited in this application. The second occurrence threshold is a super parameter, and adaptation adjustment can be performed in different scenes. And when the second historical occurrence frequency is greater than or equal to a second occurrence frequency threshold value, determining the information corresponding to the n-order neighbor node as second information. In one example, the first occurrence threshold and the second occurrence threshold are equal in size.
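For illustration only, the sketch below combines steps 1051-1056 into a single breadth-first sampling routine over the relationship network built earlier. The single shared threshold, the order limit and the cap of 512 neighbors are simplifications (the application allows separate first and second occurrence thresholds), and the function name `sample_neighbors` is hypothetical.

```python
def sample_neighbors(edges, first_node, occurrence_threshold=3,
                     max_order=2, max_neighbors=512):
    """Keep a neighbor only if the historical occurrence count of the
    connecting edge reaches the threshold, expanding order by order."""
    second_info, frontier, visited = [], [first_node], {first_node}
    for _ in range(max_order):               # order 1, 2, ... relative to first_node
        next_frontier = []
        for node in frontier:
            for neighbor, count in edges.get(node, {}).items():
                if neighbor in visited or count < occurrence_threshold:
                    continue                  # skip weak or already-sampled neighbors
                visited.add(neighbor)
                second_info.append(neighbor)
                next_frontier.append(neighbor)
                if len(second_info) >= max_neighbors:
                    return second_info
        frontier = next_frontier
    return second_info

# Usage with the edges built above: second_info = sample_neighbors(edges, "phone")
```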
In an example, the "obtaining a relationship network" in 104 can be implemented by the following steps:
1041. Presenting the types of relationship networks to the user to be recommended so that the user can select one.
1042. Acquiring the relationship network corresponding to the type selected by the user to be recommended.
The behavior-associated events represented by the connecting edges in different types of relationship networks are of different types, while the behavior-associated events represented by the connecting edges in one relationship network are all of the same type.
In actual application, different types can be defined in advance for different behavior-associated events.
For example: when the behavior-associated event represented by a connecting edge is the event that the same user historically triggered information behaviors on the pieces of information corresponding to the two end nodes of the edge within the second preset time period, the type of the behavior-associated event is defined as a first type; when the behavior-associated event represented by a connecting edge is an association operation event historically performed by a user on the pieces of information corresponding to the two end nodes of the edge, the type is defined as a second type; and when the behavior-associated event represented by a connecting edge is the event that, after a user triggered an information behavior on the information corresponding to one end node of the edge, the next information on which the user triggered an information behavior was the information corresponding to the other end node, the type is defined as a third type.
For user selection convenience, relevant explanations of the types of the relationship network can also be presented so that the user can know the types.
Through user selection, a relationship network related to the current requirements of the user can be obtained, and the recommendation effect is improved.
In one example, the feature extraction model may be a neural-network-based sequence-to-sequence model. In this case, "obtaining the information feature of the first information through a pre-constructed feature extraction model according to the first information and the second information" in 103 may specifically be implemented by:
1031. Combining the initial information feature of the first information and the initial information features of the second information into a first initial feature sequence.
1032. Obtaining a first updated feature sequence from the first initial feature sequence through the neural-network-based sequence-to-sequence model.
1033. Determining the information feature of the first information according to the first updated feature sequence.
In practical applications, the information features and the initial information features in the embodiments of the present application may be specifically in the form of vectors. Of course, the matrix may also be used, and this is not specifically limited in this embodiment of the present application.
The neural-network-based sequence-to-sequence model may be a GRU (Gated Recurrent Unit) model, an LSTM (Long Short-Term Memory) model, a self-attention model, or the like. The self-attention model may be a multi-layer self-attention model.
In one example, when the neural-network-based sequence-to-sequence model is a GRU model or an LSTM model, there are a plurality of pieces of second information in the above 1031; the smaller the order of the node corresponding to a piece of second information relative to the first node in the relationship network, the closer the position of its initial information feature in the first initial feature sequence is to the initial information feature of the first information. Specifically, the initial information feature of the first information may occupy the first position in the initial feature sequence. In this way, when the initial feature sequence is input into the neural-network-based sequence-to-sequence model, the model is able to identify the precedence order among the pieces of information.
In another example, when the sequence-to-sequence model based on the neural network is a self-attention model, the method may further include:
106. Acquiring an attribute embedding vector and a position embedding vector for each piece of information among the first information and the second information.
107. Determining the initial information feature of each piece of information according to its attribute embedding vector and position embedding vector.
In the above 106, the attribute embedding vector of each piece of information corresponds to the attribute feature of the piece of information; the position embedding vector of each information corresponds to the position characteristic of the node corresponding to the information relative to the first node in the relational network.
In 106, the attribute embedded vector of each piece of information may be determined based on the description data of the piece of information. For enhancing information representation, the description data is multi-modal data; the multimodal data includes at least two of: text, image, video, audio. Taking the article as an example, the text may include an article identification number, an article title, and the like; the image may comprise an image of the item within an item detail page; the video may include a video of the item within an item detail page.
In one example, the description data includes an information identification number. The "determining an attribute embedded vector of each piece of information according to description data of the piece of information" specifically includes: determining an identification number embedding vector of each piece of information according to the information identity identification number of the information; the attribute embedded vector of each piece of information includes an identification number embedded vector of the piece of information.
In another example, the description data includes an information title. The "determining an attribute embedded vector of each piece of information according to description data of the piece of information" specifically includes: performing word segmentation processing on the information title of each piece of information to obtain a plurality of words corresponding to the information; determining respective word embedding vectors of a plurality of words corresponding to each piece of information; the attribute embedded vector of each piece of information includes a word embedded vector of each of a plurality of words corresponding to the piece of information.
In an implementation, the position embedding vector of each information may be determined according to the order of the node corresponding to the information in the relation network relative to the first node. The same order, the corresponding position embedding vectors are the same.
In the above 107, in an example, the attribute embedding vector and the position embedding vector of each information may be added element by element to obtain the initial information characteristic of the information.
In this way, the position embedding vectors help the self-attention model recognize the order, namely the sequential order of the pieces of information.
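As a rough illustration of 106-107, the sketch below composes an attribute embedding from an identification-number embedding and title-word embeddings and then adds the position embedding element-wise. The embedding tables, the summation of id and word vectors, and all names are assumptions for illustration; the application only states that the attribute embedding may include the id embedding and/or the title-word embeddings and that attribute and position embeddings are added element by element.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical embedding tables (in practice these would be learned parameters).
id_embedding = {"item_A": rng.normal(size=dim)}
word_embedding = {"red": rng.normal(size=dim), "dress": rng.normal(size=dim)}
position_embedding = {order: rng.normal(size=dim) for order in range(4)}  # order 0 = first info

def attribute_vector(item_id, title_words):
    """Identification-number embedding plus the word embeddings of the title words."""
    vectors = [id_embedding[item_id]] + [word_embedding[w] for w in title_words]
    return np.sum(vectors, axis=0)

def initial_feature(item_id, title_words, order):
    """Initial information feature = attribute embedding + position embedding, element-wise."""
    return attribute_vector(item_id, title_words) + position_embedding[order]

feat = initial_feature("item_A", ["red", "dress"], order=1)   # a 1st-order neighbor
```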
In the above 1032, the first updated feature sequence is obtained from the first initial feature sequence through the neural-network-based sequence-to-sequence model; that is, the model performs correlation processing on the first initial feature sequence to obtain the first updated feature sequence. The specific processing can be found in the prior art and is not described in detail here.
In the above 1033, specifically, the feature at a specified position in the updated feature sequence may be used as the information feature of the first information, and the specified position can be designated according to actual needs. In one example, the feature at a first position in the first updated feature sequence is determined to be the information feature of the first information, where the first position in the first updated feature sequence corresponds to the position of the initial information feature of the first information in the first initial feature sequence. For example, if the initial information feature of the first information occupies the first position in the initial feature sequence, then the feature at the first position in the updated feature sequence is determined as the information feature of the first information.
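To make 1032-1033 concrete, here is a minimal single-head, single-layer self-attention pass over the initial feature sequence, with the updated feature at the first position taken as the information feature of the first information. The application describes a (possibly multi-layer) self-attention, GRU or LSTM model; the random weight matrices and single unmasked head below are simplifying assumptions for illustration only.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One single-head self-attention layer over the initial feature
    sequence X of shape (sequence length, dim); returns the updated sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over positions
    return weights @ V

rng = np.random.default_rng(0)
dim, seq_len = 8, 5
X = rng.normal(size=(seq_len, dim))      # first info feature at position 0, neighbors after it
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
updated = self_attention(X, Wq, Wk, Wv)
first_info_feature = updated[0]          # feature at the position of the first information
```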
In practical applications, some information may not be associated with any other information through user behavior, for example information with no or very little user behavior (called long-tail information). For such long-tail information, the second information associated through user behavior is likely not obtainable. In order to improve the recommendation effect for such information and to avoid a vicious circle in which only popular information is recommended, the method may further include the following steps:
108. If no second information associated with the first information through user behavior is acquired, acquiring the description data of the first information.
109. Obtaining the information feature of the first information through the feature extraction model according to the description data of the first information.
In 108 above, the description data is multi-modal data; the multimodal data includes at least two of: text, image, video, audio. Taking the article as an example, the text may include an article identification number, an article title, and the like; the image may comprise an image of the item within an item detail page; the video may include a video of the item within an item detail page.
In the embodiment, the multi-modal fusion input enables the model to better learn the information features of the information without user behaviors or with fewer behaviors, and is helpful for improving the recommendation effect of the information.
It should be added that, in the above embodiments, the order 1 neighbor node, the order n-1 neighbor node, and the order n neighbor node are all relative to the first node.
In one embodiment, the information is a commodity. That is, the information recommendation method is specifically a commodity recommendation method, and the commodity recommendation method includes:
Z1. Determining a first commodity currently triggered by a user to be recommended.
Z2. Acquiring a second commodity associated with the first commodity through user behavior.
Wherein the second commodity is determined from historical commodity behavior data of a plurality of users.
Z3. Obtaining the commodity feature of the first commodity through a pre-constructed feature extraction model according to the first commodity and the second commodity.
Z4. Recommending commodities to the user to be recommended according to the commodity feature of the first commodity.
The specific implementation manners of the steps Z1 to Z4 may refer to the corresponding contents in the above embodiments, and are not described herein again.
Here, it should be noted that: the method provided in the embodiment of the present application may include all or part of the steps in the embodiments in addition to the steps described above, and specific reference may be made to corresponding contents in the embodiments above, which are not described herein again.
In another aspect of the present application, an information characterizing method is provided. The method comprises the following steps:
H1. Determining first information.
H2. Acquiring second information associated with the first information through user behavior.
Wherein the second information is determined according to historical information behavior data of a plurality of users.
H3. Obtaining the information feature of the first information through a pre-constructed feature extraction model according to the first information and the second information.
In the above H1, the first information may be determined from a plurality of information by way of traversal.
The specific implementation manners of the steps H2 and H3 may refer to the corresponding contents in the above embodiments, and are not described herein again.
In practical application, the finally obtained information characteristics of the first information can be used in scenes such as information recommendation and information search.
Taking a recommended scene as an example, the information characterization method can be adopted to obtain the information characteristics of all information, and the information characteristics are stored in an information characteristic library; subsequently, according to the information characteristics of the information currently triggered by the user to be recommended, one or more pieces of information which are most similar to the information characteristics of the information currently triggered by the user to be recommended are searched and obtained in an information characteristic library; and recommending the one or more pieces of information to the user to be recommended as the information to be recommended.
Taking a search scene as an example, the information characterization method can be adopted to obtain the information characteristics of all information, and the information characteristics are stored in an information characteristic library; subsequently, according to information description data (such as information titles, images, information identity identifiers and the like) input by a user, determining information characteristics serving as search bases, and searching for one or more pieces of information which are most similar to the information characteristics serving as the search bases in an information characteristic library; and returning the one or more information as a search result to the user. For example: the information is a commodity; the user can input the commodity key words or commodity images in the search engine, and then search results can be obtained.
Here, it should be noted that: the method provided in the embodiment of the present application may include all or part of the steps in the embodiments in addition to the steps described above, and for details, reference may be made to corresponding contents in the embodiments described above, and details are not repeated here.
Fig. 2 is a schematic flowchart of a model training method according to another embodiment of the present application. The method may be executed by a client or a server. The client may be hardware with an embedded program integrated on a terminal, application software installed in the terminal, or tool software embedded in the operating system of the terminal, which is not limited in the embodiments of the present application. The terminal can be any terminal device, including a mobile phone, a computer and the like. The server may be an ordinary server, a cloud server, a virtual server or the like, which is not specifically limited in the embodiments of the present application. As shown in fig. 2, the method includes:
201. Acquiring second sample information and third sample information associated with first sample information through user behavior.
Wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users.
202. Obtaining the information feature of the first sample information through a feature extraction model according to the first sample information and the second sample information.
203. Obtaining a reference information feature through the feature extraction model according to the third sample information.
204. Optimizing the network parameters of the feature extraction model according to a first difference between the information feature of the first sample information and the reference information feature.
In the above 201, the second sample information is associated with the first sample information through user behavior, and so is the third sample information. The second sample information and the third sample information are different pieces of information.
In one implementation, the second sample information or the third sample information associated with the first sample information through user behavior may include other information on which a user, having triggered an information behavior on the first sample information, also historically triggered an information behavior within a second preset time period; that is, the user also triggered an information behavior on the other information within that period. The second preset time period may be set according to actual needs, for example to 1 hour or 1 minute. The inventor has found that the information a user triggers after triggering an information behavior on the first sample information has higher reference value for recommendation; therefore the trigger time point may be taken as the starting point of the second preset time period.
For example: the first sample information is commodity A; historically, after a user clicked commodity A, the user clicked commodity B and commodity C within the following 1 minute; then, the second sample information may include commodity B, and the third sample information may include commodity C.
In yet another implementation, the second sample information or the third sample information associated with the first sample information may include other information that a user has historically associated with the first sample information through an association operation; that is, the user has historically performed an association operation on the first sample information and the other information, so that the two are associated.
In another implementation, the second sample information or the third sample information associated with the first sample information through user behavior may include other information for which a user historically triggered an information behavior again after triggering an information behavior with respect to the first sample information. The information behavior triggered for the other information can be understood as the first information behavior of the user after the information behavior triggered for the first sample information.
In 202, combined with the second sample information, semantic enhancement is performed on the first sample information through the pre-constructed feature extraction model to obtain the information features of the first sample information.
In 203, the number of the third sample information may be one or more. The third sample information is associated with the first sample information, and therefore, the third sample information can be understood as a positive sample, that is, the third sample information is used as a positive sample to train the feature extraction model.
In practical application, in addition to the positive sample, the feature extraction model can also be trained with negative samples. Specifically, fourth sample information that is associated, through user behavior, with other sample information different from the first sample information may be used as a negative sample to train the feature extraction model.
In 204, the initial value of each network parameter in the feature extraction model may be a random value. Subsequently, the network parameters of the feature extraction model can be updated by back-propagating the gradient computed from the first difference between the information features of the first sample information and the reference information features. The implementation of the specific gradient computation and update steps can be found in the prior art and is not described in detail here.
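For illustration only, the following Python sketch shows one possible instantiation of the first difference as a contrastive loss over the positive (third sample) and negative (fourth sample) reference features, with parameters updated by back-propagation; the loss form and names are assumptions, not a prescribed implementation.

```python
import torch
import torch.nn.functional as F

def first_difference_loss(anchor_feat, positive_feat, negative_feats):
    """One possible instantiation of the 'first difference': pull the information
    feature of the first sample toward the reference feature of the positive
    (third sample) and push it away from the reference features of negatives
    (fourth sample information)."""
    pos_score = (anchor_feat * positive_feat).sum(dim=-1, keepdim=True)  # [B, 1]
    neg_score = anchor_feat @ negative_feats.t()                         # [B, N_neg]
    logits = torch.cat([pos_score, neg_score], dim=-1)
    target = torch.zeros(anchor_feat.size(0), dtype=torch.long)          # positive sits at index 0
    return F.cross_entropy(logits, target)

# One optimization step (model, optimizer and the three feature tensors are assumed to exist):
# loss = first_difference_loss(anchor_feat, positive_feat, negative_feats)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```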
Compared with the prior art, the technical scheme provided by the embodiment of the application does not need to manually enumerate multiple recall links, obtains a better recommendation effect by enhancing the information representation, enhances the universality and also ensures the stability of the recommendation effect.
For convenience, the relationship network may be constructed in advance according to the historical information behavior data of the plurality of users. The relationship network comprises a plurality of nodes and a plurality of connecting edges; different nodes correspond to different information, and a connecting edge between nodes is used for representing a behavior correlation event, that is, the information corresponding to the nodes at the two ends of the connecting edge is associated through the behavior correlation event represented by the connecting edge. The behavior correlation event represented by a connecting edge may be an event in which the same user historically triggered information behaviors for the information corresponding to the nodes at the two ends of the connecting edge within a second preset time period; or it may be an association operation event historically performed by a user on the information corresponding to the nodes at the two ends of the connecting edge; or it may be an event in which, historically, after a user triggered an information behavior for the information corresponding to one of the two end nodes of the connecting edge, the other information for which the information behavior was triggered again is the information corresponding to the other end node. In this way, when other information associated with a certain piece of sample information is to be acquired subsequently, the relationship network can be queried to obtain it. Therefore, the method may further include:
205. A relationship network is obtained.
Wherein the relationship network is constructed according to the historical information behavior data; the relationship network comprises a first node corresponding to the first sample information and neighbor nodes of the first node; the relational network also comprises connecting edges between the nodes, which are used for representing behavior correlation events.
206. Determining the second sample information and the third sample information from the information corresponding to the neighbor nodes of the first node in the relational network.
In the above 205, in an example, in the relational network, the neighboring nodes of the first node may include a 1 st order neighboring node of the first node, and the 1 st order neighboring node of the first node is directly connected to the first node through a connection edge, that is, information corresponding to the 1 st order neighboring node of the first node is directly associated with the first sample information.
Further, in the relational network, the neighbor nodes of the first node may further include 2 nd order neighbor nodes of the first node. The 2-order neighbor node of the first node is directly connected with the 1-order neighbor node of the first node through the connecting edge, and the 1-order neighbor node of the first node is directly connected with the first node through the connecting edge, that is, the information corresponding to the 2-order neighbor node of the first node is indirectly associated with the first sample information.
By analogy, in the relational network, the neighbor nodes of the first node may further include N-order neighbor nodes. The information corresponding to the N-th order neighbor node of the first node is indirectly associated with the first sample information. Wherein N is an integer greater than 2.
The inventor has found through research that, of two correlated information behaviors, the later information behavior of a user is likely to be a supplement to the earlier one. For example, after a user purchases a mobile phone, the user triggers the purchase behavior again to buy a related accessory of the mobile phone; the accessory is a supplement to the mobile phone. Recommending information to a user to be recommended is essentially predicting a supplement to the current information behavior of the user to be recommended. Therefore, in order to improve the recommendation effect, in a specific example, the neighbor nodes of the first node include a 1st-order neighbor node of the first node; and a first connecting edge connecting the first node and the 1st-order neighbor node in the relational network is a directed edge pointing from the first node to the 1st-order neighbor node, to indicate that, in a first behavior correlation event represented by the first connecting edge, the time at which a user triggered an information behavior for the first sample information is earlier than the time at which the user triggered an information behavior for the information corresponding to the 1st-order neighbor node. That is, in the first behavior correlation event, the user first triggered an information behavior for the first sample information and then for the information corresponding to the 1st-order neighbor node. Specifically, the first behavior correlation event refers to an event in which, historically, after a user triggered a related information behavior for the first sample information, the other information for which the related information behavior was triggered again is the information corresponding to the 1st-order neighbor node.
Furthermore, the neighbor nodes of the first node include an n-1-order neighbor node and an n-order neighbor node, where n is an integer greater than 1; and a second connecting edge connecting the n-1-order neighbor node and the n-order neighbor node in the relational network is a directed edge pointing from the n-1-order neighbor node to the n-order neighbor node, to indicate that, in a second behavior correlation event represented by the second connecting edge, the time at which a user triggered an information behavior for the information corresponding to the n-1-order neighbor node is earlier than the time at which the user triggered an information behavior for the information corresponding to the n-order neighbor node. Specifically, the second behavior correlation event refers to an event in which, historically, after a user triggered a related information behavior for the information corresponding to the n-1-order neighbor node, the other information for which the related information behavior was triggered again is the information corresponding to the n-order neighbor node.
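For illustration only, the following Python sketch builds such a directed relationship network from per-user, time-ordered behavior records, where an edge a -> b counts how often a user triggered information b within the second preset time period after triggering information a; the data layout and window length are assumptions.

```python
from collections import defaultdict

def build_relationship_network(user_events, window_seconds=60):
    """Build a directed relationship network. An edge a -> b counts how often a user
    triggered information b within `window_seconds` after triggering information a.
    `user_events` maps user_id -> list of (timestamp, info_id), sorted by timestamp."""
    edge_count = defaultdict(int)
    for events in user_events.values():
        for i, (t_a, a) in enumerate(events):
            for t_b, b in events[i + 1:]:
                if t_b - t_a > window_seconds:      # outside the second preset time period
                    break
                if b != a:
                    edge_count[(a, b)] += 1         # directed: earlier behavior -> later behavior
    graph = defaultdict(dict)
    for (a, b), count in edge_count.items():
        graph[a][b] = count                         # historical occurrence frequency of the edge
    return graph
```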
In 206, a first part of the information corresponding to the neighbor node of the first node in the relationship network may be used as the second sample information, and a second part may be used as the third sample information. The first portion does not intersect the second portion.
In an implementation scheme, in the foregoing 206, "determining the second sample information and the third sample information from information corresponding to a neighboring node of the first node in the relationship network" may specifically be implemented by adopting the following steps:
2061. an order 1 neighbor node is determined from a plurality of order 1 neighbor nodes of the first node in the relational network.
In an example, an order 1 neighbor node may be selected from a plurality of order 1 neighbor nodes of the first node in the relational network; acquiring a first historical occurrence frequency of a first behavior correlation event represented by a first connecting edge connecting the first node and the selected 1-order neighbor node in the relational network; and when the first historical occurrence frequency is greater than or equal to a first occurrence frequency threshold value, taking the selected 1-order neighbor node as a finally determined node.
2062. Determining nodes found in the relational network from the determined 1 st order neighbor nodes along a first path in the relational network.
Wherein the first path is distinct from each path through the first node.
2063. And determining the third sample information from the determined information corresponding to the 1-order neighbor node and the found information corresponding to the node.
2064. And determining the second sample information from the information corresponding to other nodes in the relational network.
Wherein the other nodes refer to nodes in the relational network except the determined 1 st order neighbor nodes and the found nodes.
In the above 2062, the path is constituted by a plurality of nodes and connecting edges between the nodes. The starting point of the first path is the determined 1-order neighbor node, and the first path does not pass through the first node. The first paths may be multiple paths, where the multiple first paths include all paths that start from the determined 1 st-order neighbor node in the relational network and do not pass through the first node.
In an example, the step 2063 of "determining the third sample information from the information corresponding to the determined 1 st order neighbor node and the information corresponding to the found node" may include the following steps:
and S20, using the information corresponding to the determined 1-order neighbor node as third sample information.
And S21, determining 2 nd order neighbor nodes of the first node from the found nodes.
There may be a plurality of 2nd-order neighbor nodes of the first node among the found nodes, so one of them can be determined in a traversal manner.
S22, obtaining the third history occurrence frequency of a third behavior correlation event represented by a third connecting edge connecting the determined 1-order neighbor node and the determined 2-order neighbor node in the relational network.
And S23, when the occurrence frequency of the third history is greater than or equal to a second occurrence frequency threshold, sampling the information corresponding to the determined 2-order neighbor node to serve as third sample information.
Further, the step 2063 of "determining the third sample information from the information corresponding to the determined 1 st order neighbor node and the information corresponding to the found node" may further include the steps of:
and S24, determining the i-order neighbor nodes of the first node from the found nodes.
Wherein i is an integer greater than 2. There may be a plurality of i-order neighbor nodes of the first node among the found nodes, so that one of the i-order neighbor nodes can be determined by adopting a traversal method.
S25, obtaining a fourth historical occurrence frequency of a fourth behavior correlation event represented by a fourth connecting edge connecting the i-1-order neighbor node of the first node and the determined i-order neighbor node in the relational network.
And S26, when the fourth historical occurrence frequency is greater than or equal to a second occurrence frequency threshold value, sampling information corresponding to the determined i-order neighbor node to serve as the third sample information.
In an example, the aforementioned 2064 of "determining the second sample information from the information corresponding to the other nodes in the relationship network" may be implemented by the following steps:
and S31, determining 1 st order neighbor nodes of the first node from the other nodes.
There may be more than one 1 st order neighbor node of the first node among the other nodes, and in one example, one of them may be found in a traversal manner.
S32, acquiring the first historical occurrence frequency of the first behavior correlation event represented by the first connecting edge connecting the first node and the 1 st order neighbor node in the relational network.
S33, when the first historical occurrence frequency is larger than or equal to a first occurrence frequency threshold value, sampling information corresponding to the 1-order neighbor node to serve as the second sample information.
Further, the "determining the second sample information from the information corresponding to other nodes in the relationship network" in 2064 may further include the following steps:
and S34, determining the n-order neighbor nodes of the first node from the other nodes.
S35, acquiring a second history occurrence frequency of a second behavior correlation event represented by a second connecting edge of the n-1 order neighbor node connected with the first node and the n order neighbor node in the relational network.
S36, when the second historical occurrence frequency is larger than or equal to a second occurrence frequency threshold value, sampling the information corresponding to the n-order neighbor node to be used as the second sample information.
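For illustration only, the following Python sketch is one possible reading of steps 2061 to 2064 and S20 to S36 above: a 1st-order neighbor whose edge frequency passes the first occurrence frequency threshold is picked as the root of the positive branch (third sample information), that branch is walked away from the first node, and the remaining qualifying neighbors are sampled as context (second sample information). The thresholds, traversal depth, and names are assumptions.

```python
import random

def split_positive_and_context(graph, first_node, freq_th1=2, freq_th2=2, max_order=2):
    """Pick one 1st-order neighbor whose edge frequency passes the first threshold as
    the root of the positive branch (third sample information); walk that branch away
    from the first node; treat the remaining qualifying neighbors as context
    (second sample information)."""
    neighbors = graph.get(first_node, {})
    qualified = [n for n, cnt in neighbors.items() if cnt >= freq_th1]
    if not qualified:
        return [], []
    pos_root = random.choice(qualified)

    # Positive branch: expand from pos_root along paths that do not pass the first node.
    positives, frontier = [pos_root], [pos_root]
    for _ in range(max_order - 1):
        nxt = []
        for node in frontier:
            for nb, cnt in graph.get(node, {}).items():
                if nb != first_node and cnt >= freq_th2 and nb not in positives:
                    positives.append(nb)
                    nxt.append(nb)
        frontier = nxt

    # Context: the other qualifying 1st-order neighbors and their qualifying neighbors.
    context = [n for n in qualified if n != pos_root]
    for node in list(context):
        for nb, cnt in graph.get(node, {}).items():
            if nb != first_node and cnt >= freq_th2 and nb not in positives and nb not in context:
                context.append(nb)
    return context, positives
```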
In one implementation, the feature extraction model is a neural network-based sequence-to-sequence model. In 202, "obtaining the information feature of the first sample information according to the first sample information and the second sample information through a pre-constructed feature extraction model" may specifically be implemented by:
2021. and combining the initial information features of the first sample information and the initial information features of the second sample information into a first initial sample feature sequence.
2022. And obtaining a first updated sample feature sequence through the sequence-to-sequence model based on the neural network according to the first initial sample feature sequence.
2023. And determining the information characteristic of the first sample information according to the first updated sample characteristic sequence.
In an example, in the above 2021, the number of the second sample information is plural. The smaller the order of the node corresponding to the second sample information in the relationship network relative to the first node is, the closer the position of the initial information feature in the first initial sample feature sequence is to the initial information feature of the first sample information. Specifically, the initial information feature of the first sample information may be at a first-ordered position in the first initial sample feature sequence. In this way, the first initial sample feature sequence is input into the neural network-based sequence-to-sequence model, which is capable of identifying the precedence order between the pieces of information.
In another example, when the sequence-to-sequence model based on the neural network is a self-attention model, the method may further include:
207. and acquiring an attribute embedded vector and a position embedded vector of each sample information in the first sample information and the second sample information.
The attribute embedded vector of each sample information corresponds to the attribute characteristic of the sample information; the position embedding vector of each sample information corresponds to the position characteristics of the node corresponding to the sample information in the relation network relative to the first node.
208. And determining the initial information characteristics of each sample information according to the attribute embedded vector and the position embedded vector of each sample information.
In the step 207, the attribute embedded vector of each sample information may be determined according to the description data of the sample information. For enhancing information representation, the description data is multi-modal data; the multimodal data includes at least two of: text, image, video, audio.
In one example, the description data includes an information identification number. The "determining the attribute embedded vector of each sample information according to the description data of the sample information" specifically includes: determining an identification number embedding vector of each sample information according to the information identity identification number of each sample information; the attribute embedding vector of each piece of information includes an identification number embedding vector of the piece of sample information.
In another example, the description data includes an information title. The "determining the attribute embedded vector of each sample information according to the description data of the sample information" specifically includes: performing word segmentation processing on the information title of each sample information to obtain a plurality of words corresponding to the sample information; obtaining respective word embedding vectors of a plurality of words corresponding to each sample information; the attribute embedded vector of each sample information includes a word embedded vector of each of a plurality of words corresponding to the sample information.
In an implementation scheme, the position embedding vector of each sample information may be determined according to the order of the node corresponding to the sample information in the relationship network relative to the first node. The same order, the corresponding position embedding vectors are the same.
In 208, in an example, the attribute embedding vector and the position embedding vector of each sample information may be added element by element to obtain the initial information characteristic of the sample information.
In this way, the position embedding vector helps the self-attention model recognize the order, namely the sequential order of each piece of sample information.
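For illustration only, the following Python sketch composes an initial information feature from an identification-number embedding and averaged title-word embeddings (together forming the attribute embedding), added element-wise to an order (position) embedding; the dimensions and the averaging of word embeddings are assumptions.

```python
import torch
import torch.nn as nn

class InitialInfoFeature(nn.Module):
    """Attribute embedding (item-id embedding plus averaged title-word embeddings)
    added element-wise to an order embedding indexed by the node's order relative
    to the first node (order 0 = the first node itself)."""
    def __init__(self, num_items, vocab_size, max_order, dim):
        super().__init__()
        self.id_emb = nn.Embedding(num_items, dim)
        self.word_emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.order_emb = nn.Embedding(max_order + 1, dim)

    def forward(self, item_ids, title_word_ids, orders):
        # item_ids: [B, L]; title_word_ids: [B, L, T]; orders: [B, L]
        attr = self.id_emb(item_ids) + self.word_emb(title_word_ids).mean(dim=2)
        # (padding words contribute zero vectors; a masked mean could be used instead)
        return attr + self.order_emb(orders)        # element-wise addition
```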
In the above 2022, the first updated sample feature sequence is obtained by processing the first initial sample feature sequence through the neural-network-based sequence-to-sequence model. The specific processing procedure can be found in the prior art and is not described in detail here.
In the above 2023, specifically, the feature at a specified position in the first updated sample feature sequence may be used as the information feature of the first sample information, where the specified position can be designated according to actual needs. In an example, the feature at a first position in the first updated sample feature sequence is determined as the information feature of the first sample information, where the first position in the first updated sample feature sequence corresponds to the position of the initial information feature of the first sample information in the first initial sample feature sequence. For example: if the initial information feature of the first sample information is at the first-ranked position in the first initial sample feature sequence, then the feature at the first-ranked position in the first updated sample feature sequence is determined as the information feature of the first sample information.
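For illustration only, the following Python sketch uses a standard Transformer encoder as the neural-network-based sequence-to-sequence (self-attention) model and takes the updated feature at the first-ranked position as the information feature of the first sample information; the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """A stack of self-attention layers updates the initial feature sequence; the
    updated feature at the first-ranked position (where the first sample's initial
    feature was placed) is taken as the information feature of the first sample."""
    def __init__(self, dim=128, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, initial_feature_sequence):
        # initial_feature_sequence: [B, L, dim], anchor feature at position 0
        updated = self.encoder(initial_feature_sequence)
        return updated[:, 0, :], updated            # (anchor feature, full updated sequence)
```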
In order to further improve the model training effect and accelerate convergence, the method may further include:
209. and determining information to be masked from the second sample information.
In 209, the information to be masked may be randomly determined from the second sample information.
In 208 above, "determining the initial information feature of the information to be masked according to the attribute embedded vector and the position embedded vector of the information to be masked" includes:
2081. masking the attribute embedded vector of the information to be masked to obtain a masked attribute embedded vector;
in an example, the attribute embedding vector of the information to be masked is multiplied by 0 to obtain a zero vector, and the zero vector is used as a masked attribute embedding vector.
In another example, the element values in the attribute embedded vector of the information to be masked may be randomly modified to obtain a masked attribute embedded vector.
Certainly, in practical application, other masking manners may also be used for masking, which is not specifically limited in this embodiment of the present application.
2082. Adding the masked attribute embedded vector and the position embedded vector of the information to be masked element-wise to obtain the initial information feature of the information to be masked;
the method may further include:
209. and performing parameter optimization on the feature extraction model according to the feature at the second position in the first updated sample feature sequence and the second difference of the attribute embedding vector of the information to be masked.
Wherein the second position in the first updated sample feature sequence corresponds to a position of an initial information feature of the information to be masked in the first initial sample feature sequence.
In this embodiment, that is, the attribute embedding vector of the information to be masked is masked, and the position embedding vector of the information to be masked is, in effect, used as the initial information feature of the information to be masked and input into the feature extraction model. The feature at the second position in the first updated sample feature sequence is then the attribute embedding vector of the masked information as predicted by the feature extraction model. Parameter optimization is performed on the feature extraction model by back-propagating the gradient computed from the difference between the feature at the second position in the first updated sample feature sequence and the attribute embedding vector of the information to be masked. Only the attribute embedding vector of the information to be masked is masked while its position embedding vector is retained, so that the model knows the order of the information to be restored, which allows the model to learn better.
It should be noted that, in practical application, the number of pieces of information to be masked may be one or more. When there are a plurality of pieces of information to be masked, the second differences corresponding to the pieces of information to be masked can be combined, and parameter optimization is performed on the feature extraction model by back-propagating the computed gradient. Specifically, the feature extraction model may be optimized according to the sum of the second differences corresponding to the plurality of pieces of information to be masked.
In addition, in practical applications, step 204 and step 209 may be executed together; that is, the first difference and the second difference are combined and the feature extraction model is optimized by back-propagating the computed gradient. Specifically, parameter optimization may be performed on the feature extraction model according to the sum of the first difference and the second difference.
In an implementation scheme, in 209, "determine information to be masked from the second sample information" specifically includes: and determining information to be masked from the second sample information through random masking.
For example: in the training process, a mask of 0 or 1 is randomly generated for each piece of second sample information. The above-mentioned information to be masked refers to information masked by 0; information masked by 1 may be understood as unmasked. Specifically, the probability of generating 0 can be set in advance as a hyperparameter a, so the probability of generating 1 is 1-a. The value of the hyperparameter a may be set according to actual needs, which is not specifically limited in this application, for example: a = 0.1.
For example: with a = 0.1 and 5 pieces of second sample information, one 0 and four 1s are generated; then one of the 5 pieces of second sample information is randomly selected to be masked by 0, and the other four pieces are masked by 1.
In addition, for each piece of second sample information other than the information to be masked, its attribute embedding vector is multiplied by 1 and then added element-wise to its position embedding vector to obtain the initial information feature of that sample information.
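For illustration only, the following Python sketch realizes the random mask and the second difference: attribute embeddings at masked positions are multiplied by 0 while the position (order) embeddings are kept, and the reconstruction error is computed only at the masked positions. The squared-error form of the second difference and the encoder interface are assumptions.

```python
import torch

def mask_and_reconstruction_loss(attr_emb, order_emb, encoder, mask_prob=0.1):
    """attr_emb, order_emb: [B, L, D]. Attribute embeddings at masked positions are
    multiplied by 0 while the order embeddings are kept; the 'second difference' is
    the squared error between the encoder output and the original attribute embedding
    at the masked positions only."""
    keep = (torch.rand(attr_emb.shape[:2]) >= mask_prob).float()
    keep[:, 0] = 1.0                                 # the first sample (anchor) is never masked
    keep = keep.unsqueeze(-1)
    inputs = attr_emb * keep + order_emb             # masked attributes become zero vectors
    updated = encoder(inputs)                        # assumed to map [B, L, D] -> [B, L, D]
    masked = 1.0 - keep
    if masked.sum() == 0:
        return torch.tensor(0.0)
    diff = (updated - attr_emb) ** 2 * masked        # only masked positions contribute
    return diff.sum() / (masked.sum() * attr_emb.size(-1))
```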
Further, the method may further include:
210. Updating the position embedding vectors according to the first difference between the information features of the first sample information and the reference information features, to serve as the basis for the next embedding.
The position embedding vectors may be updated through network gradient back-propagation based on the first difference. In this way, position embedding can be performed based on the updated position embedding vectors at the next training.
It should be noted that, in practical application, the first difference and the second difference may be combined, and the position embedding vectors may be updated through network gradient back-propagation.
Further, the method may further include:
211. Updating the attribute embedding vectors according to the first difference between the information features of the first sample information and the reference information features, to serve as the basis for the next embedding.
The attribute embedding vectors may be updated through network gradient back-propagation based on the first difference. In this way, attribute embedding can be performed based on the updated attribute embedding vectors at the next training.
It should be noted that, in practical application, the first difference and the second difference may be combined, and the attribute embedding vectors may be updated through network gradient back-propagation.
In one example, the feature extraction model is a neural network-based sequence-to-sequence model, and there are a plurality of pieces of third sample information; in 203, the "obtaining the reference information feature through the feature extraction model according to the third sample information" may specifically be implemented by:
2031. and combining initial information features of the third sample information into a second initial sample feature sequence.
2032. And obtaining a second updated sample feature sequence through the sequence model based on the sequence of the neural network according to the second initial sample feature sequence.
2033. And determining the reference information characteristic according to the second updated sample characteristic sequence.
In an example, in the above 2031, the smaller the order of the node corresponding to the third sample information in the relationship network relative to the first node is, the earlier the position of the initial information feature in the second initial sample feature sequence is.
In another example, when the sequence-to-sequence model based on the neural network is a self-attention model, the method may further include:
212. and acquiring an attribute embedded vector and a position embedded vector of each sample information in the third sample information.
213. And determining the initial information characteristics of each sample information according to the attribute embedded vector and the position embedded vector of each sample information.
The specific process for determining the attribute embedded vector and the position embedded vector of each sample information in 212 may refer to the corresponding contents of "determining the attribute embedded vector and the position embedded vector of each information" in the embodiments, and will not be described herein again.
In the above 213, in an example, the attribute embedding vector and the position embedding vector of each sample information may be added element by element to obtain the initial information characteristic of the sample information.
In this way, the position embedding vector helps the self-attention model recognize the order, namely the sequential order of each piece of sample information.
In 2032, the second initial sample feature sequence may be input to the sequence-to-sequence model based on the neural network, so as to obtain a second updated sample feature sequence output by the model. The processing flow inside the model can be referred to in the prior art, and is not described in detail here.
In the above 2033, the feature at a specified position in the second updated sample feature sequence may be used as the reference information feature. Specifically, the feature at a third position in the second updated sample feature sequence may be used as the reference information feature, where the third position in the second updated sample feature sequence corresponds to the position, in the second initial sample feature sequence, of the initial information feature of the information corresponding to the determined 1st-order neighbor node.
It should be noted that, in the above embodiments, the order-1 neighboring node, the order-n-1 neighboring node, and the order-n neighboring node are all relative to the first node.
In practical applications, the information features and the initial information features in the embodiments of the present application may be specifically in the form of vectors. Of course, the matrix may also be used, and this is not specifically limited in this embodiment of the present application.
It should be noted that, for the content of each step in each embodiment corresponding to the method in the training phase, which is not described in detail, reference may be made to the corresponding content in the embodiment corresponding to the method in the application phase, and details are not described here.
The technical scheme provided by the embodiments of the application can be applied to scenarios such as video recommendation, news recommendation, music recommendation, commodity recommendation, friend recommendation, and financial product recommendation. The training method provided by the embodiments of the present application will be described in detail below by taking the field of commodity recommendation as an example:
First, learning samples are constructed for the model, so that the model can learn a robust commodity representation in the process of fitting toward the minimum training error. Fig. 3 is a graph structure constructed for the main commodity a. The circle nodes, arrows, and letters in Fig. 3 represent the following:
Circle node: the circle nodes 301 in Fig. 3 represent commodities, and each letter represents a different commodity. Each commodity may consist of a commodity id and a commodity title.
Arrow: the arrows in Fig. 3 represent directed edges in the graph structure; for example, an arrow from a to h indicates that a user clicked commodity a and then clicked commodity h.
Sample construction: in Fig. 3, a is taken as the main commodity, and h is a positive sample given the main commodity a. Besides h, the main commodity a has two first-order neighbors, namely b and c, indicating that a user clicked commodity a and then clicked the first-order neighbor b or c. b and c each have their own first-order neighbors: the first-order neighbor of b is d, and the first-order neighbors of c are e and f. The first-order neighbors of b and c are also referred to as the second-order neighbors of the main commodity a. In the same way, h also has its own first-order and second-order neighbors.
Positive sample: in the schematic diagram, h is the positive sample corresponding to the main commodity a and is used for model learning. Here, h is one of the first-order neighbors of a. When h serves as the positive sample, the neighbors of each order starting from h belong only to h and do not appear among the neighbors of a; this avoids information leakage, i.e., the label must not appear in the training input, otherwise the model could classify correctly without learning anything. Similarly, the other first-order neighbors of the main commodity a can also serve as positive samples; on this basis, all first-order neighbors of the main commodity a can be traversed to form a plurality of positive samples for the model.
Negative sample: after the positive samples are determined, negative samples are randomly sampled for the main commodity a; that is, in one training pass, positive samples of other main commodities are randomly sampled as negative samples of the main commodity a. Similarly, a positive sample of the main commodity a can, with a certain probability, become a negative sample of other main commodities, and the parameters in the network are updated by learning from the errors on the positive and negative samples.
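For illustration only, the following Python sketch constructs (main commodity, positive sample) pairs by traversing the first-order neighbors of each main commodity; within a training batch, the positive samples of the other main commodities serve as negative samples, as described above. The data layout is an assumption.

```python
import random

def build_training_pairs(graph, main_items):
    """Each first-order neighbor of a main commodity in turn becomes its positive
    sample; within a training batch, the positive samples of the other main
    commodities serve as that main commodity's negative samples."""
    pairs = []
    for main in main_items:
        for positive in graph.get(main, {}):
            pairs.append((main, positive))
    random.shuffle(pairs)                            # random batching = random negative sampling
    return pairs
```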
After the user behavior and the commodity are constructed into the graph structure, 7 steps are required, and finally, the feature vector (that is, the final embedded characterization vector) of the commodity can be obtained, which will be described in sequence with reference to fig. 4 a.
S1: Multi-modal commodity representation input.
The multi-modal commodity representation input vector (i.e., the attribute embedding vector in the above embodiments) includes an identification-number embedding vector mapped from the commodity id and a word embedding vector corresponding to each word in the commodity title.
These vectors are learned and can be updated through network gradient back-propagation. The multi-modal data can be flexibly extended with other data such as additional text, images, video, and audio.
S2: the order in the graph structure is obtained.
As shown in Fig. 3, in the case that the main commodity a corresponds to the positive sample h, each node has a corresponding order relative to the main commodity node (i.e., the node corresponding to a), and each order corresponds to one order embedding vector (i.e., the position embedding vector in the above embodiments), which can also be updated through network gradient back-propagation.
In actual operation, the commodities of each order input into the model are filtered and screened to a certain extent; the richer the user behavior associated with a commodity, the higher its sampling probability. See in particular the steps relating to "sampling" in the above embodiments.
S3: Random masking.
In the training process, masks of 0 and 1 are randomly generated, where the probability of generating 0 is a hyperparameter a and the probability of generating 1 is 1-a. For example, with a = 0.1, if commodity f corresponds to a mask of 0, the multi-modal commodity representation input vector of commodity f needs to be multiplied by 0 when input to the model, and the multi-modal commodity representation input vector of commodity f enters the error calculation of S5 described below.
Specifically, the vector ultimately input to the multi-layer self-attention model is: the mask generated in S3 multiplied by the multi-modal commodity representation input vector of S1, plus the order embedding vector of S2.
It should be noted that, after commodity f is masked, the order of the f to be restored still needs to be known in the calculation of S5, so that the model can learn better.
S4: Inputting the vector obtained in the previous step into the multi-layer self-attention model to obtain an output corresponding to each position.
S5: Obtaining the multi-modal commodity representation output at each position from the multi-layer self-attention model.
S6: Calculating an error (namely the second difference) at each position where 0 was generated in the random mask, back-propagating the computed gradient, and updating the network parameters.
For details, reference may be made to the related contents of the "second difference" in the above embodiments, and details are not described herein.
S7: Using the multi-modal commodity representation of the main commodity to predict the multi-modal commodity representation corresponding to the positive sample, calculating an error (namely the first difference), back-propagating the gradient, and updating the network parameters.
For details, reference may be made to the relevant content of the "first difference" in the above embodiments, and details are not described herein.
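For illustration only, the following Python sketch pulls the errors of S6 and S7 together into a single training loss, corresponding to combining the first difference and the second difference before back-propagation; the tensor shapes and the concrete loss forms are assumptions.

```python
import torch
import torch.nn.functional as F

def combined_training_loss(anchor_feat, positive_feat, negative_feats,
                           updated_seq, attr_emb, mask_positions):
    """Sum of the S7 error (positive prediction, the 'first difference') and the
    S6 error (masked reconstruction, the 'second difference')."""
    pos_score = (anchor_feat * positive_feat).sum(dim=-1, keepdim=True)
    neg_score = anchor_feat @ negative_feats.t()
    logits = torch.cat([pos_score, neg_score], dim=-1)
    first_diff = F.cross_entropy(logits, torch.zeros(anchor_feat.size(0), dtype=torch.long))

    mask = mask_positions.unsqueeze(-1).float()      # 1 at positions masked by 0 in S3
    second_diff = ((updated_seq - attr_emb) ** 2 * mask).sum() / mask.sum().clamp(min=1)

    return first_diff + second_diff                  # back-propagated together
```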
When the trained multi-layer self-attention model is subsequently put into application, the multi-modal commodity representation output of the model can be used to represent commodities, and a vector retrieval engine is used to obtain and recommend the top-K similar commodities. The scheme can be applied to detail-page item recommendation in a mobile shopping application, that is, recommending commodities similar to a first commodity on the detail page of the first commodity.
Specifically, as shown in fig. 4 b:
step 400: the client M1 sends the first commodity currently triggered by the user to the server M2.
Step 401: the server M2 acquires the second commodity associated with the first commodity through user behavior.
Step 402: the server M2 obtains the feature vector of the first commodity through the trained neural network model according to the first commodity and the second commodity.
Step 403: and the server M2 obtains the commodity to be recommended through vector retrieval according to the feature vector of the first commodity.
The top-K similar commodities are obtained as the commodities to be recommended through vector retrieval. That is, through vector retrieval, the similarities between the first commodity and a plurality of commodities are calculated; the commodities are sorted by similarity in descending order, and the top K commodities are selected as the commodities to be recommended.
Step 404: the server M2 sends the commodity to be recommended to the client M1.
Step 405: the client M1 shows the commodities to be recommended on the detail page of the first commodity.
Specifically, after the user clicks to enter the detail page of the first commodity, commodities similar to the first commodity may be displayed below the detail page of the first commodity.
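For illustration only, the following Python sketch outlines steps 401 to 403 on the server side: gathering the commodities associated with the first commodity through user behavior, obtaining the first commodity's feature vector with the trained model (`encode_fn` is an assumed stand-in), and retrieving the top-K similar commodities by vector similarity over precomputed feature vectors.

```python
import numpy as np

def serve_detail_page_recommendation(first_item, graph, encode_fn, item_ids, item_feats, k=10):
    """Steps 401-403 on the server side: gather the commodities associated with the
    first commodity through user behavior, encode the first commodity with the trained
    model (`encode_fn` is an assumed stand-in), then retrieve the top-K similar commodities."""
    associated = list(graph.get(first_item, {}))                                   # step 401
    query_feat = np.asarray(encode_fn(first_item, associated), dtype=np.float32)   # step 402
    feats = item_feats / (np.linalg.norm(item_feats, axis=1, keepdims=True) + 1e-12)
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    scores = feats @ q                                                              # step 403
    top = np.argsort(-scores)[:k + 1]
    return [item_ids[i] for i in top if item_ids[i] != first_item][:k]
```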
According to the scheme, commodities similar to the currently viewed item can be recalled quickly, which improves the user experience. The scheme does not require multiple recall links to be set up by manual enumeration; by using the neighbor-enhanced embedding representation fused with multi-modal information, the effect (also called the recall rate) of multi-channel recall can be matched or even exceeded. Note: the top-K recall rate is the ratio of the number of hits among the top K similar commodities to the total number of hits.
In order to solve the problem of insufficient commodity learning with less behavior data, text information such as a commodity title is introduced, similar commodities of the commodities can be quickly obtained through vector retrieval after the commodities are characterized, and the method effectively improves the recall rate.
According to the scheme, a stable graph structure is introduced when the model is built, the representation of the commodities is enhanced through the neighbors, the model is friendly to new commodities through multi-mode fusion input, and the recall effect of the long-tail commodities is further improved.
Fig. 5 is a block diagram illustrating a structure of an information recommendation apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
a first determining module 501, configured to determine first information currently triggered by a user to be recommended;
a first obtaining module 502, configured to obtain second information associated with a user behavior existing in the first information; wherein the second information is determined from historical information behavior data of a plurality of users;
a second obtaining module 503, configured to obtain, according to the first information and the second information, an information feature of the first information through a pre-constructed feature extraction model;
the first recommending module 504 is configured to recommend information to the user to be recommended according to the information characteristic of the first information.
Optionally, the first obtaining module is further configured to:
acquiring a relationship network; wherein the relationship network is constructed according to the historical information behavior data; the relationship network comprises a first node corresponding to the first information and a neighbor node of the first node; the relational network also comprises a connecting edge between the nodes, which is used for representing a behavior correlation event;
and determining the second information from the information corresponding to the neighbor node of the first node in the relational network.
Optionally, the neighbor nodes of the first node include a 1-order neighbor node;
and a first connecting edge connecting the first node and the 1st-order neighbor node in the relational network is a directed edge pointing from the first node to the 1st-order neighbor node, to indicate that, in a first behavior correlation event represented by the first connecting edge, the time at which a user triggered an information behavior for the first information is earlier than the time at which the user triggered an information behavior for the information corresponding to the 1st-order neighbor node.
Optionally, the first behavior correlation event refers to an event in which, historically, after a user triggered a related information behavior for the first information, the other information for which the related information behavior was triggered again is the information corresponding to the 1st-order neighbor node.
Optionally, the neighbor nodes of the first node include n-1 order neighbor nodes and n order neighbor nodes; wherein n is an integer greater than 1;
and a second connecting edge connecting the n-1-order neighbor node and the n-order neighbor node in the relational network is a directed edge pointing from the n-1-order neighbor node to the n-order neighbor node, to indicate that, in a second behavior correlation event represented by the second connecting edge, the time at which a user triggered an information behavior for the information corresponding to the n-1-order neighbor node is earlier than the time at which the user triggered an information behavior for the information corresponding to the n-order neighbor node.
Optionally, the second behavior correlation event refers to an event in which, historically, after a user triggered a related information behavior for the information corresponding to the n-1-order neighbor node, the other information for which the related information behavior was triggered again is the information corresponding to the n-order neighbor node.
Optionally, the feature extraction model is a sequence-to-sequence model based on a neural network;
the second obtaining module is specifically configured to:
combining the initial information features of the first information and the initial information features of the second information into a first initial feature sequence;
obtaining a first updated feature sequence through the sequence-to-sequence model based on the neural network according to the first initial feature sequence;
and determining the information characteristic of the first information according to the first updated characteristic sequence.
Optionally, the number of the second information is multiple; the smaller the order of the node corresponding to the second information in the relationship network relative to the first node is, the closer the position of the initial information feature in the first initial feature sequence is to the initial information feature of the first information.
Optionally, the sequence-to-sequence model based on the neural network is a self-attention model;
the second obtaining module is further configured to:
acquiring attribute embedded vectors and position embedded vectors of each information in the first information and the second information; wherein, the attribute embedded vector of each information corresponds to the attribute characteristic of the information; the position embedding vector of each piece of information corresponds to the position characteristics of the node corresponding to the information in the relation network relative to the first node;
and determining the initial information characteristics of each piece of information according to the attribute embedded vector and the position embedded vector of each piece of information.
Optionally, the second obtaining module is further configured to:
and determining the position embedding vector of the information according to the order of the node corresponding to the information in the relation network relative to the first node.
Optionally, the second obtaining module is further configured to:
determining attribute embedded vectors of the information according to the description data of the information;
the description data is multi-modal data; the multimodal data includes at least two of: text, image, video, audio.
Optionally, the description data includes an information identity identification number;
the second obtaining module is specifically configured to:
determining an identification number embedding vector of each piece of information according to the information identity identification number of each piece of information;
the attribute embedding vector of each piece of information includes an identification number embedding vector of the piece of information.
Optionally, the description data includes an information title;
the second obtaining module is specifically configured to:
performing word segmentation processing on the information title of each piece of information to obtain a plurality of words corresponding to the information;
determining respective word embedding vectors of a plurality of words corresponding to each piece of information;
the attribute embedded vector of each piece of information includes a word embedded vector of each of a plurality of words corresponding to the piece of information.
Optionally, the second obtaining module is specifically configured to:
determining a feature at a first position in the first updated sequence of features as an information feature of the first information;
wherein the first position in the first updated feature sequence corresponds to a position of an initial information feature of the first information in the first initial feature sequence.
Optionally, the first obtaining module is specifically configured to:
determining a 1-order neighbor node from the neighbor nodes of the first node;
acquiring a first historical occurrence frequency of a first behavior correlation event represented by a first connecting edge connecting the first node and the 1-order neighbor node in the relational network; the first historical occurrence frequency is determined according to the historical information behavior data;
and when the first historical occurrence frequency is greater than or equal to a first occurrence frequency threshold value, sampling information corresponding to the 1-order neighbor node to serve as the second information.
Optionally, the first obtaining module is specifically further configured to:
determining n-order neighbor nodes from the neighbor nodes of the first node; wherein n is an integer greater than 1;
acquiring a second historical occurrence frequency of a second behavior correlation event represented by a second connecting edge of the n-1 order neighbor node connected with the first node and the n order neighbor node in the relational network; the second historical occurrence frequency is determined according to the historical information behavior data;
and when the second historical occurrence frequency is greater than or equal to a second occurrence frequency threshold value, sampling information corresponding to the n-order neighbor node to serve as the second information.
Optionally, the first recommending module is specifically configured to:
obtaining information to be recommended through vector retrieval according to the information characteristics of the first information;
and recommending the information to be recommended to the user to be recommended.
Optionally, the first obtaining module is further configured to obtain description data of the first information if second information associated with a user behavior existing in the first information is not obtained;
the description data is multi-modal data; the multimodal data includes at least two of: text, image, video, audio;
the second obtaining module is further configured to obtain the information feature of the first information through the feature extraction model according to the description data of the first information.
Here, it should be noted that: the information recommendation device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle and technical effect of each module may refer to the corresponding content in the above method embodiments, which is not described herein again.
Fig. 5 shows a block diagram of an information characterization apparatus according to another embodiment of the present application. As shown in fig. 5, the apparatus includes:
a first determining module 501, configured to determine first information;
a first obtaining module 502, configured to obtain second information associated with a user behavior existing in the first information; wherein the second information is determined from historical information behavior data of a plurality of users;
a second obtaining module 503, configured to obtain, according to the first information and the second information, an information feature of the first information through a feature extraction model that is constructed in advance.
Here, it should be noted that: the information characterization device provided in the above embodiment can implement the technical solutions described in the above method embodiments, and the specific implementation principle and technical effect of each module can refer to the corresponding content in the above method embodiments, and are not described herein again.
Fig. 5 is a block diagram illustrating a structure of a product recommendation device according to another embodiment of the present application. As shown in fig. 5, the apparatus includes:
the first determining module 501 is configured to determine a first commodity currently triggered by a user to be recommended;
a first obtaining module 502, configured to obtain a second product related to a user behavior of the first product; wherein the second item is determined from historical item behavior data of a plurality of users;
a second obtaining module 503, configured to obtain, according to the first commodity and the second commodity, a commodity feature of the first commodity through a pre-constructed feature extraction model;
the first recommending module 504 is configured to recommend a commodity to the user to be recommended according to the commodity feature of the first commodity.
Here, it should be noted that: the commodity recommendation device provided in the above embodiments may implement the technical solutions described in the above method embodiments, and the specific implementation principle and technical effect of each module may refer to the corresponding content in the above method embodiments, and are not described herein again.
Fig. 6 shows a block diagram of a model training apparatus according to another embodiment of the present application. As shown in fig. 6, the apparatus includes:
a third obtaining module 601, configured to obtain second sample information and third sample information that are associated with the first sample information by the user behavior; wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users;
a fourth obtaining module 602, configured to obtain an information feature of the first sample information through a feature extraction model according to the first sample information and the second sample information;
the fourth obtaining module 602 is further configured to obtain, according to the third sample information, a reference information feature through the feature extraction model;
a first optimizing module 603, configured to perform network parameter optimization on the feature extraction model according to a first difference between the information feature of the first sample information and the reference information feature.
Optionally, the third obtaining module is further configured to:
acquiring a relation network corresponding to the first sample information; wherein the relationship network is constructed according to the historical information behavior data; the relationship network comprises a first node corresponding to the first sample information and a neighbor node of the first node; the relational network also comprises a connecting edge between the nodes, which is used for representing a behavior correlation event;
and determining the second sample information and the third sample information from the information corresponding to the neighbor node of the first node in the relational network.
Optionally, the third obtaining module is specifically configured to:
determine a 1st-order neighbor node from a plurality of 1st-order neighbor nodes of the first node in the relationship network;
determine nodes found in the relationship network by starting from the determined 1st-order neighbor node and following a first path in the relationship network; wherein the first path is distinct from every path passing through the first node;
determine the third sample information from the information corresponding to the determined 1st-order neighbor node and the information corresponding to the found nodes; and
determine the second sample information from the information corresponding to other nodes in the relationship network; wherein the other nodes are nodes in the relationship network other than the determined 1st-order neighbor node and the found nodes.
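A simplified sampling sketch of this split follows: pick one 1st-order neighbor, walk a short path that avoids the first node to collect the third-sample nodes, and take the remaining 1st-order neighbors as second-sample candidates. Restricting the "other nodes" to the remaining 1st-order neighbors, the walk length, and the uniform random choices are all simplifying assumptions.

```python
import random

def split_samples(neighbors, first_node, walk_len=2):
    first_hop = list(neighbors[first_node])
    start = random.choice(first_hop)                  # the determined 1st-order neighbor node
    third_nodes, current = [start], start
    for _ in range(walk_len):                         # first path: never passes back through first_node
        candidates = [n for n in neighbors[current] if n != first_node and n not in third_nodes]
        if not candidates:
            break
        current = random.choice(candidates)
        third_nodes.append(current)                   # nodes found along the first path
    second_nodes = [n for n in first_hop if n not in third_nodes]
    return second_nodes, third_nodes
```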
Optionally, the feature extraction model is a sequence-to-sequence model based on a neural network;
the fourth obtaining module is specifically configured to:
combine the initial information features of the first sample information and the initial information features of the second sample information into a first initial sample feature sequence;
obtain a first updated sample feature sequence through the neural-network-based sequence-to-sequence model according to the first initial sample feature sequence; and
determine the information feature of the first sample information according to the first updated sample feature sequence.
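As a concrete stand-in for this sequence-to-sequence step, the sketch below uses a small Transformer encoder and reads the updated feature at position 0, where the first sample's initial feature was placed. The dimensions and the choice of a Transformer encoder are assumptions consistent with, but not mandated by, the self-attention variant described below.

```python
import torch.nn as nn

class SeqFeatureExtractor(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, initial_feature_sequence):
        # initial_feature_sequence: (batch, seq_len, dim); the first sample's initial feature sits at position 0
        updated = self.encoder(initial_feature_sequence)   # first updated sample feature sequence
        return updated[:, 0, :]                            # information feature of the first sample information
```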
Optionally, the sequence-to-sequence model based on the neural network is a self-attention model;
the fourth obtaining module is specifically further configured to:
acquire an attribute embedding vector and a position embedding vector for each piece of sample information among the first sample information and the second sample information; wherein the attribute embedding vector of each piece of sample information corresponds to the attribute features of that sample information, and the position embedding vector corresponds to the position, relative to the first node, of the node corresponding to that sample information in the relationship network; and
determine the initial information feature of each piece of sample information according to its attribute embedding vector and position embedding vector.
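A minimal sketch of forming the initial information features from the two embeddings follows, keying the position embedding by the node's hop order relative to the first node. The vocabulary sizes, the single-attribute-id simplification, and the element-wise sum are assumptions.

```python
import torch.nn as nn

class InitialFeature(nn.Module):
    def __init__(self, num_attribute_ids=10000, max_hop_order=8, dim=64):
        super().__init__()
        self.attr_embed = nn.Embedding(num_attribute_ids, dim)   # attribute embedding vectors
        self.pos_embed = nn.Embedding(max_hop_order, dim)        # position embedding vectors (hop order)

    def forward(self, attribute_ids, hop_orders):
        # element-wise sum of the two embeddings gives each item's initial information feature
        return self.attr_embed(attribute_ids) + self.pos_embed(hop_orders)
```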
Optionally, the apparatus further includes a mask module, configured to:
determining information to be masked from the second sample information;
the fourth obtaining module is specifically configured to:
multiply the attribute embedding vector of the information to be masked by 0, and add the result to the position embedding vector of the information to be masked element by element, to obtain the initial information feature of the information to be masked;
the first optimization module is further configured to:
perform parameter optimization on the feature extraction model according to a second difference between the feature at a second position in the first updated sample feature sequence and the attribute embedding vector of the information to be masked;
wherein the second position in the first updated sample feature sequence corresponds to the position of the initial information feature of the information to be masked in the first initial sample feature sequence.
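Put together, the masking branch can be sketched as below: the masked item keeps only its position embedding, and the model is trained to reconstruct the masked attribute embedding at that sequence position (the "second difference"). Using MSE as the distance and a single mask position are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(seq_model, attr_embeds, pos_embeds, mask_index):
    """seq_model: sequence-to-sequence module returning the full updated sequence.
    attr_embeds, pos_embeds: (seq_len, dim) embeddings for the first + second sample information."""
    keep = torch.ones(attr_embeds.shape[0], 1)
    keep[mask_index] = 0.0                                    # attribute embedding of the masked item times 0
    initial = attr_embeds * keep + pos_embeds                 # masked item keeps only its position embedding
    updated = seq_model(initial.unsqueeze(0)).squeeze(0)      # first updated sample feature sequence
    return F.mse_loss(updated[mask_index], attr_embeds[mask_index])   # "second difference"
```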
Optionally, the mask module is specifically configured to:
determine the information to be masked from the first sample information and the second sample information through random masking.
Optionally, the first optimization module is further configured to:
update the position embedding vectors according to the first difference between the information feature of the first sample information and the reference information feature, to serve as a basis for the next embedding.
Optionally, the first optimization module is further configured to:
update the attribute embedding vectors according to the first difference between the information feature of the first sample information and the reference information feature, to serve as a basis for the next embedding.
Optionally, the feature extraction model is a neural-network-based sequence-to-sequence model, and the third sample information is plural in number;
the fourth obtaining module is further configured to:
combine the initial information features of the pieces of third sample information into a second initial sample feature sequence;
obtain a second updated sample feature sequence through the neural-network-based sequence-to-sequence model according to the second initial sample feature sequence; and
determine the reference information feature according to the second updated sample feature sequence.
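The reference branch can be sketched as follows: stack the third samples' initial features into the second sequence, run the same encoder, and pool the second updated sequence into a single reference information feature. Mean pooling is an assumption; the application only states that the reference feature is determined from the updated sequence.

```python
def reference_feature(seq_model, third_initial_features):
    """third_initial_features: (num_third_samples, dim) second initial sample feature sequence."""
    second_updated = seq_model(third_initial_features.unsqueeze(0)).squeeze(0)  # second updated sample feature sequence
    return second_updated.mean(dim=0)                                           # assumed pooling into the reference feature
```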
It should be noted here that the model training device provided in the above embodiment can implement the technical solutions described in the foregoing method embodiments; the specific implementation principles and technical effects of the above modules and units can be found in the corresponding content of those method embodiments and are not repeated here.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 7, the electronic device includes a memory 1101 and a processor 1102. The memory 1101 may be configured to store various other data to support operation of the electronic device, for example instructions for any application or method running on the electronic device. The memory 1101 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The memory 1101 is used for storing programs;
the processor 1102 is coupled to the memory 1101, and is configured to execute the program stored in the memory 1101, so as to implement the information recommendation method, the model training method, the information characterization method, and the commodity recommendation method provided in the foregoing method embodiments.
Further, as shown in Fig. 7, the electronic device also includes: a communication component 1103, a display 1104, a power component 1105, an audio component 1106, and the like. Only some components are schematically shown in Fig. 7; this does not mean that the electronic device includes only the components shown in Fig. 7.
Accordingly, embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program, when executed by a computer, can implement the steps or functions of the information recommendation method, the model training method, the information characterization method, and the commodity recommendation method provided in the foregoing method embodiments.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (19)
1. An information recommendation method, comprising:
determining first information currently triggered by a user to be recommended;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
according to the first information and the second information, obtaining the information characteristics of the first information through a pre-constructed feature extraction model;
and recommending information to the user to be recommended according to the information characteristics of the first information.
2. The method of claim 1, further comprising:
acquiring a relationship network; wherein the relationship network is constructed from the historical information behavior data; the relationship network comprises a first node corresponding to the first information and neighbor nodes of the first node; the relationship network further comprises connecting edges between nodes, each representing a behavior correlation event;
and determining the second information from the information corresponding to the neighbor nodes of the first node in the relationship network.
3. The method of claim 2, wherein the neighbor nodes of the first node comprise 1st-order neighbor nodes;
and a first connecting edge connecting the first node and a 1st-order neighbor node in the relationship network is a directed edge pointing from the first node to the 1st-order neighbor node, indicating that, in the first behavior correlation event represented by the first connecting edge, the time at which a user triggered an information behavior on the first information is earlier than the time at which the user triggered the information behavior corresponding to the 1st-order neighbor node.
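To make the direction convention of claim 3 concrete, the sketch below adds directed edges within one user's time-ordered behavior sequence, always pointing from the earlier-triggered item to the later-triggered one; the adjacency-list representation is an assumption.

```python
from collections import defaultdict

def add_directed_edges(graph, user_behavior_sequence):
    """graph: defaultdict(list); user_behavior_sequence: one user's items sorted by trigger time."""
    for earlier, later in zip(user_behavior_sequence, user_behavior_sequence[1:]):
        graph[earlier].append(later)   # edge direction: earlier-triggered item -> later-triggered item
    return graph
```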
4. The method of claim 2 or 3, wherein the feature extraction model is a neural network-based sequence-to-sequence model;
wherein obtaining the information characteristics of the first information through the pre-constructed feature extraction model according to the first information and the second information comprises:
combining the initial information features of the first information and the initial information features of the second information into a first initial feature sequence;
obtaining a first updated feature sequence through the sequence-to-sequence model according to the first initial feature sequence;
and determining the information characteristic of the first information according to the first updated characteristic sequence.
5. The method according to claim 4, wherein the second information is plural in number; and the lower the order, relative to the first node, of the node corresponding to a piece of second information in the relationship network, the closer the position of that piece's initial information feature in the first initial feature sequence is to the initial information feature of the first information.
6. The method of claim 4, wherein the sequence-to-sequence model is a self-attention model;
the method further comprises the following steps:
acquiring an attribute embedding vector and a position embedding vector for each piece of information among the first information and the second information; wherein the attribute embedding vector of each piece of information corresponds to the attribute features of that information, and the position embedding vector corresponds to the position, relative to the first node, of the node corresponding to that information in the relationship network;
and determining the initial information feature of each piece of information according to its attribute embedding vector and position embedding vector.
7. The method of claim 4, wherein determining the information characteristic of the first information according to the first updated characteristic sequence comprises:
determining a feature at a first position in the first updated feature sequence as an information feature of the first information;
wherein the first position in the first updated feature sequence corresponds to a position of an initial information feature of the first information in the first initial feature sequence.
8. The method according to claim 2 or 3, wherein determining the second information from the information corresponding to the neighbor nodes of the first node in the relationship network comprises:
determining a 1st-order neighbor node from the neighbor nodes of the first node;
acquiring a first historical occurrence frequency of a first behavior correlation event represented by a first connecting edge connecting the first node and the 1st-order neighbor node in the relationship network; wherein the first historical occurrence frequency is determined according to the historical information behavior data;
and when the first historical occurrence frequency is greater than or equal to a first occurrence frequency threshold, sampling the information corresponding to the 1st-order neighbor node as the second information.
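A hedged sketch of this frequency-gated sampling follows, reusing the edge_counts mapping from the earlier relationship-network sketch; the threshold value is a placeholder, not one specified by the application.

```python
def sample_second_information(edge_counts, neighbors, first_node, threshold=5):
    second_info = []
    for neighbor in neighbors[first_node]:
        frequency = edge_counts.get((first_node, neighbor), 0)   # first historical occurrence frequency
        if frequency >= threshold:                               # first occurrence frequency threshold
            second_info.append(neighbor)                         # sampled as second information
    return second_info
```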
9. The method according to any one of claims 1 to 3, wherein recommending information to the user to be recommended according to the information characteristic of the first information comprises:
obtaining information to be recommended through vector retrieval according to the information characteristics of the first information;
and recommending the information to be recommended to the user to be recommended.
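Claim 9's vector retrieval can be sketched as a brute-force cosine-similarity search over a candidate feature matrix; a production system would more likely use an approximate nearest-neighbor index, but the application does not name one, so plain NumPy is used here as an assumption.

```python
import numpy as np

def retrieve_by_vector(query_feature, candidate_features, candidate_ids, top_k=10):
    q = query_feature / (np.linalg.norm(query_feature) + 1e-8)
    c = candidate_features / (np.linalg.norm(candidate_features, axis=1, keepdims=True) + 1e-8)
    scores = c @ q                           # cosine similarity between the first information and each candidate
    best = np.argsort(-scores)[:top_k]       # indices of the most similar candidates
    return [candidate_ids[i] for i in best]  # information to be recommended
```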
10. The method of any of claims 1 to 3, further comprising:
if second information associated with the first information through user behavior is not acquired, acquiring description data of the first information; wherein the description data is multi-modal data, and the multi-modal data includes at least two of: text, image, video and audio;
and obtaining the information characteristics of the first information through the feature extraction model according to the description data of the first information.
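The cold-start fallback of claim 10 can be sketched as below; the per-modality encoder interfaces and the from_description entry point are hypothetical names introduced only for illustration.

```python
def extract_feature_with_fallback(feature_model, first_info, second_info, description_encoders):
    if second_info:                                   # normal path: behavior-associated second information exists
        return feature_model(first_info, second_info)
    # fallback: encode whichever modalities are present in the multi-modal description data
    encoded = [description_encoders[modality](data)
               for modality, data in first_info.description.items()
               if modality in description_encoders]
    return feature_model.from_description(encoded)
```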
11. The method of claim 2 or 3, wherein obtaining a relationship network comprises:
displaying relationship network types to the user to be recommended for selection; wherein different types of relationship networks differ in the types of behavior correlation events represented by their connecting edges;
and acquiring the relationship network corresponding to the type selected by the user to be recommended.
12. A method of model training, comprising:
acquiring second sample information and third sample information associated with first sample information through user behavior; wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users;
obtaining information characteristics of the first sample information through a characteristic extraction model according to the first sample information and the second sample information;
obtaining reference information characteristics through the characteristic extraction model according to the third sample information;
and optimizing network parameters of the feature extraction model according to a first difference between the information features of the first sample information and the reference information features.
13. An information characterizing method, comprising:
determining first information;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
and obtaining the information characteristics of the first information through a pre-constructed feature extraction model according to the first information and the second information.
14. A commodity recommendation method, comprising:
determining a first commodity currently triggered by a user to be recommended;
acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
according to the first commodity and the second commodity, commodity features of the first commodity are obtained through a pre-constructed feature extraction model;
and recommending the commodity to the user to be recommended according to the commodity characteristic of the first commodity.
15. A commodity recommendation device, comprising:
the first determining module is used for determining a first commodity currently triggered by a user to be recommended;
the first acquisition module is used for acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
the second acquisition module is used for acquiring the commodity characteristics of the first commodity through a pre-constructed characteristic extraction model according to the first commodity and the second commodity;
and the first recommending module is used for recommending the commodity to the user to be recommended according to the commodity characteristic of the first commodity.
16. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, is configured to execute the program stored in the memory, so as to:
determining first information currently triggered by a user to be recommended;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
according to the first information and the second information, obtaining the information characteristics of the first information through a pre-constructed feature extraction model;
and recommending information to the user to be recommended according to the information characteristics of the first information.
17. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, is configured to execute the program stored in the memory, so as to:
acquiring second sample information and third sample information associated with first sample information through user behavior; wherein the second sample information and the third sample information are determined according to historical information behavior data of a plurality of users;
obtaining information characteristics of the first sample information through a characteristic extraction model according to the first sample information and the second sample information;
obtaining reference information characteristics through the characteristic extraction model according to the third sample information;
and optimizing network parameters of the feature extraction model according to a first difference between the information features of the first sample information and the reference information features.
18. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, is configured to execute the program stored in the memory, so as to:
determining first information;
acquiring second information associated with the first information through user behavior; wherein the second information is determined from historical information behavior data of a plurality of users;
and obtaining the information characteristics of the first information through a pre-constructed feature extraction model according to the first information and the second information.
19. An electronic device, comprising: a memory and a processor, wherein,
the memory is used for storing programs;
the processor, coupled with the memory, is configured to execute the program stored in the memory, so as to:
determining a first commodity currently triggered by a user to be recommended;
acquiring a second commodity associated with the first commodity through user behavior; wherein the second commodity is determined from historical commodity behavior data of a plurality of users;
according to the first commodity and the second commodity, commodity features of the first commodity are obtained through a pre-constructed feature extraction model;
and recommending the commodity to the user to be recommended according to the commodity characteristic of the first commodity.