CN112464087A - Recommendation probability output method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112464087A
Authority
CN
China
Prior art keywords
data
recommendation
dimension
feature
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011320852.0A
Other languages
Chinese (zh)
Other versions
CN112464087B (en)
Inventor
楼马晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202011320852.0A priority Critical patent/CN112464087B/en
Publication of CN112464087A publication Critical patent/CN112464087A/en
Application granted granted Critical
Publication of CN112464087B publication Critical patent/CN112464087B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation probability output method and device, a storage medium, and electronic equipment, and belongs to the field of artificial intelligence. The method comprises: acquiring material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format; performing mapping coding on the first data to obtain first feature data of a first dimension; performing full-connection transformation on the second data to obtain second feature data of the first dimension; and outputting, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to a target user. The invention solves the technical problem that recommendation models in the related art recommend inaccurately, broadens the range of services the recommendation model can support, and improves its recommendation effect.

Description

Recommendation probability output method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of artificial intelligence, in particular to a recommendation probability output method and device, a storage medium and electronic equipment.
Background
In the related art, a recommendation system can train a model using user information, commodity information, and the user's feedback on commodities, and then make recommendations according to the ranking of the model's prediction results.
A Factorization Machine (FM) is a general prediction method in the related art that can estimate reliable parameters even when the data is very sparse. Unlike a traditional simple linear model, a factorization machine takes the crossing between features into account and models all nested interactions between variables. A recommendation system has many categorical attributes, and each categorical attribute produces a large number of zero-valued features after one-hot encoding, so the samples are sparse. However, FM is essentially a simple linear model, and its fitting ability is weak compared with a deep learning model. In addition, FM is mainly suited to ID-type features: continuous features must be binned before they can be input to the model, and FM has no way of handling multi-modal input.
DeepFM is a deep model in the related art that combines FM with a Deep Neural Network (DNN). Like FM, it can learn low-order feature crossings, and like a DNN, it can learn high-order feature crossings. Apart from the initial input features, no additional manual feature engineering is required. DeepFM combines the advantages of FM and DNN, but it still cannot process multi-modal input such as text, images, video, and audio.
The applicant has found that, in more and more scenarios, the semantic features contained in text, images, video, and audio can help optimize the effect of a recommendation model. Examples include the pictures of materials displayed to a user on a page and the text of material titles: this information forms the user's first sensory impression of the material and plays a very important role in whether the user clicks on it. The recommendation models in the related art cannot take multi-modal unstructured data as input, so their recommendation results are inaccurate.
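As an illustration of the background above, the following is a minimal sketch of a second-order factorization machine score over one-hot encoded categorical attributes. The field layout (three cities, two device types), the helper names `one_hot` and `fm_predict`, and all parameter values are hypothetical and chosen only to make the sparsity visible; they are not taken from the patent.

```python
from itertools import combinations

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def fm_predict(x, w0, w, v):
    """Second-order FM score: w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == 0.0 or x[j] == 0.0:
            continue  # with one-hot input, most pairs contribute nothing
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise

# Hypothetical input: city index 1 of 3 and device index 0 of 2, concatenated.
x = one_hot(1, 3) + one_hot(0, 2)  # [0.0, 1.0, 0.0, 1.0, 0.0]
w0 = 0.1
w = [0.0, 0.2, 0.0, 0.3, 0.0]
v = [[0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
score = fm_predict(x, w0, w, v)
```

Only two of the five input positions are non-zero, so a single pairwise term survives; this is the sparsity that the factorized interaction weights are meant to cope with.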
In view of the above problems in the related art, no effective solution has been found at present.
Disclosure of Invention
The embodiment of the invention provides a recommendation probability output method and device, a storage medium and electronic equipment.
According to an aspect of an embodiment of the present application, there is provided a method for outputting a recommendation probability, including: acquiring material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format; performing mapping coding on the first data to obtain first feature data of a first dimension; performing full-connection transformation on the second data to obtain second feature data of the first dimension; and outputting, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to a target user.
Further, performing full-connection transformation on the second data to obtain second feature data of the first dimension includes: dividing the second data into a plurality of sub-data according to data type, wherein each data type corresponds to an unstructured modal format, and the modal formats comprise: tables, text, images, video, and audio; for each of the plurality of sub-data, extracting features of the sub-data using a corresponding pre-trained model, and aggregating the extracted features to obtain multi-modal feature data of a second dimension; determining a mapping size of the first feature data, and setting a transformation size of a fully connected layer of the recommendation model to the mapping size, wherein the mapping size corresponds to the first dimension; and inputting the multi-modal feature data into the fully connected layer, and outputting the second feature data of the first dimension.
Further, outputting the recommendation probability of the candidate material based on a recommendation model according to the first feature data and the second feature data comprises: concatenating the first feature data and the second feature data to obtain third feature data; performing feature crossing and feature dimension reduction on the third feature data to obtain fourth feature data; and inputting the fourth feature data and the second feature data into a fully connected layer of the recommendation model, and outputting the recommendation probability of the candidate material.
Further, performing mapping coding on the first data to obtain first feature data of a first dimension includes: extracting a plurality of ID data from the first data, wherein each of the plurality of ID data corresponds to a material attribute of the candidate material; and performing mapping coding on the plurality of ID data to obtain the first feature data of the first dimension.
Further, after acquiring the material information of the candidate material, the method further comprises: extracting an associated material sequence of the candidate material; performing column clustering on the associated material sequence according to the user ID of the candidate material to obtain material ID data; performing random walks over first-type network nodes and second-type network nodes using a random walk algorithm, and computing a mapping vector sequence of the associated material sequence, wherein a first-type network node corresponds to the user ID of the target user, and a second-type network node corresponds to one associated material in the associated material sequence; obtaining an interest value of the target user based on the mapping vector sequence using a deep interest model; and inputting the interest value into a fully connected layer of the recommendation model, wherein the input data of the fully connected layer further comprises the second feature data and fourth feature data generated based on the first feature data and the second feature data.
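The random-walk step can be pictured with the following minimal sketch, which builds a graph with two node types (users and materials) from interaction pairs and samples one walk. The interaction data, the node naming, and the helpers `build_bipartite_graph` and `random_walk` are assumptions made for illustration. The source does not specify how the sampled walks are turned into mapping vectors, so that part is deliberately left out.

```python
import random
from collections import defaultdict

def build_bipartite_graph(interactions):
    """Link each user node to the material nodes that user interacted with."""
    graph = defaultdict(list)
    for user_id, item_id in interactions:
        user_node, item_node = ("user", user_id), ("item", item_id)
        graph[user_node].append(item_node)
        graph[item_node].append(user_node)
    return graph

def random_walk(graph, start, length, rng):
    """Walk `length` nodes from `start`, picking a uniform random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # isolated node: stop early
        walk.append(rng.choice(neighbors))
    return walk

# Hypothetical (user ID, material ID) interaction pairs.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c")]
graph = build_bipartite_graph(interactions)
walk = random_walk(graph, ("user", "u1"), 5, random.Random(0))
```

Because every edge joins a user node to a material node, each walk alternates between the two node types, which is what lets co-visited materials end up near each other in the sampled sequences.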
Further, obtaining an interest value of the target user based on the mapping vector sequence using a deep interest model comprises: calculating, through an activation function, an interest weight between each associated material in the mapping vector sequence and the candidate material; and aggregating the interest weights of the associated materials through a pooling layer to obtain the interest value of the target user.
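A minimal sketch of this weighting and pooling is given below. It assumes a sigmoid of the dot product as the activation function and a weighted sum as the pooling operation; both are simplifications chosen for illustration, and the source does not state which activation or pooling is actually used. The interest value is represented here as a vector, which is also an assumption.

```python
import math

def activation_weight(behavior_vec, candidate_vec):
    """Interest weight of one associated material with respect to the candidate.

    Assumption: sigmoid of the dot product stands in for the activation function.
    """
    dot = sum(b * c for b, c in zip(behavior_vec, candidate_vec))
    return 1.0 / (1.0 + math.exp(-dot))

def interest_value(behavior_vecs, candidate_vec):
    """Weighted sum pooling of the associated-material vectors."""
    pooled = [0.0] * len(candidate_vec)
    for vec in behavior_vecs:
        weight = activation_weight(vec, candidate_vec)
        for k, value in enumerate(vec):
            pooled[k] += weight * value
    return pooled

# Hypothetical mapping vectors for two associated materials and one candidate.
behaviors = [[1.0, 0.0], [0.0, 1.0]]
candidate = [0.0, 0.0]
interest = interest_value(behaviors, candidate)  # every weight is 0.5 here
```

With an all-zero candidate vector every dot product is zero, so each weight is exactly 0.5 and the pooled result is `[0.5, 0.5]`; a candidate closer to one associated material would shift the weights toward it.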
Further, before outputting, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to a target user, the method further includes: acquiring training sample data, wherein the training sample data comprises structured data and multi-modal unstructured data; and training on the training sample data to obtain the recommendation model.
According to another aspect of the embodiments of the present application, there is also provided an apparatus for outputting a recommendation probability, including: a first obtaining module, configured to acquire material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format; a mapping module, configured to perform mapping coding on the first data to obtain first feature data of a first dimension; a transformation module, configured to perform full-connection transformation on the second data to obtain second feature data of the first dimension; and an output module, configured to output, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to a target user.
Further, the transformation module comprises: a classification unit, configured to divide the second data into a plurality of sub-data according to data type, wherein each data type corresponds to an unstructured modal format, and the modal formats include: tables, text, images, video, and audio; an extraction unit, configured to extract, for each of the plurality of sub-data, features of the sub-data using a corresponding pre-trained model, and to aggregate the extracted features to obtain multi-modal feature data of a second dimension; a setting unit, configured to determine a mapping size of the first feature data, and to set a transformation size of a fully connected layer of the recommendation model to the mapping size, wherein the mapping size corresponds to the first dimension; and an output unit, configured to input the multi-modal feature data into the fully connected layer and output the second feature data of the first dimension.
Further, the output module includes: a concatenation unit, configured to concatenate the first feature data and the second feature data to obtain third feature data; a processing unit, configured to perform feature crossing and feature dimension reduction on the third feature data to obtain fourth feature data; and an output unit, configured to input the fourth feature data and the second feature data into a fully connected layer of the recommendation model and output the recommendation probability of the candidate material.
Further, the mapping module includes: an extracting unit, configured to extract a plurality of ID data from the first data, wherein each of the plurality of ID data corresponds to a material attribute of the candidate material; and a mapping unit, configured to perform mapping coding on the plurality of ID data to obtain the first feature data of the first dimension.
Further, the apparatus further comprises: an extracting module, configured to extract an associated material sequence of the candidate material after the first obtaining module acquires the material information of the candidate material; a clustering module, configured to perform column clustering on the associated material sequence according to the user ID of the candidate material to obtain material ID data; a computing module, configured to perform random walks over first-type network nodes and second-type network nodes using a random walk algorithm, and to compute a mapping vector sequence of the associated material sequence, wherein a first-type network node corresponds to the user ID of the target user, and a second-type network node corresponds to one associated material in the associated material sequence; a second obtaining module, configured to obtain an interest value of the target user based on the mapping vector sequence using a deep interest model; and an input module, configured to input the interest value into a fully connected layer of the recommendation model, wherein the input data of the fully connected layer further includes the second feature data and fourth feature data generated based on the first feature data and the second feature data.
Further, the second obtaining module includes: a calculating unit, configured to calculate, through an activation function, an interest weight between each associated material in the mapping vector sequence and the candidate material; and an aggregating unit, configured to aggregate the interest weights of the associated materials through a pooling layer to obtain the interest value of the target user.
Further, the apparatus further comprises: a third obtaining module, configured to acquire training sample data before the output module outputs, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to a target user, wherein the training sample data includes structured data and multi-modal unstructured data; and a training module, configured to train on the training sample data to obtain the recommendation model.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program, wherein the program, when run, executes the above steps.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein: a memory for storing a computer program; a processor for executing the steps of the method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.
In the embodiments of the present application, material information of a candidate material, including data in a multi-modal format, is acquired; the first data is mapped and coded to obtain first feature data of a first dimension; the second data undergoes full-connection transformation to obtain second feature data of the first dimension; and a recommendation probability of recommending the candidate material to a target user is output based on a recommendation model according to the first feature data and the second feature data. By fusing the structured data and the unstructured multi-modal data of the candidate material, material features obtained from the multi-modal format can be used to constrain the recommendation model. This solves the technical problem that recommendation models in the related art recommend inaccurately, broadens the range of services the recommendation model can support, and improves its recommendation effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware configuration of a server according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of outputting a recommendation probability according to an embodiment of the present invention;
FIG. 3 is a network architecture diagram of a recommendation model according to an embodiment of the present invention;
FIG. 4 is a network architecture diagram of a random walk algorithm in an embodiment of the present invention;
FIG. 5 is a diagram of a DIN network architecture in an embodiment of the invention;
FIG. 6 is a block diagram of an apparatus for outputting recommendation probabilities according to an embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device implementing an embodiment of the invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The method provided in Example 1 of the present application may be executed on a server, a computer, or a similar computing device. Taking execution on a server as an example, FIG. 1 is a block diagram of the hardware structure of a server according to an embodiment of the present invention. As shown in FIG. 1, the server 10 may include one or more (only one is shown in FIG. 1) processors 102 (a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the server. For example, the server 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store a server program, for example, a software program and a module of application software, such as a server program corresponding to an output method of recommendation probability in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the server program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.
In this embodiment, a method for outputting a recommendation probability is provided. FIG. 2 is a flowchart of a method for outputting a recommendation probability according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
Step S202, acquiring material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format;
The candidate material in this embodiment may be a video, a web page link, a commodity, or the like, and the method can be applied to various recommendation systems, for example recommending videos to users on a video sharing platform, recommending and displaying commodities to users on an online website, or recommending promotional material pages on the splash screen of an application.
Step S204, mapping and coding the first data to obtain first feature data of a first dimension;
The mapping coding of this embodiment is a process of feature extraction and vectorization (embedding): discrete variables are converted into continuous vectors, so that the first data takes a numerical form that the recommendation model can process.
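The mapping coding just described can be sketched as a per-field lookup table from raw IDs to fixed-size vectors. The class `EmbeddingTable`, the field names, the sample ID values, and the embedding size of 32 are assumptions for illustration; in a trained model the vector values would be learned parameters rather than random numbers.

```python
import random

class EmbeddingTable:
    """Maps each distinct ID to a dense vector of a fixed size (the first dimension)."""

    def __init__(self, embedding_size, seed=0):
        self.embedding_size = embedding_size
        self._rng = random.Random(seed)
        self._vectors = {}

    def lookup(self, raw_id):
        # Create a small random vector the first time an ID is seen;
        # a real model would learn these values during training.
        if raw_id not in self._vectors:
            self._vectors[raw_id] = [
                self._rng.uniform(-0.05, 0.05) for _ in range(self.embedding_size)
            ]
        return self._vectors[raw_id]

EMBEDDING_SIZE = 32  # assumed value for the first dimension
fields = ["uid", "item_id", "city"]
tables = {field: EmbeddingTable(EMBEDDING_SIZE, seed=i) for i, field in enumerate(fields)}

# Hypothetical structured first data for one candidate material.
first_data = {"uid": "u_1001", "item_id": "i_42", "city": "Beijing"}
first_feature_data = [tables[field].lookup(value) for field, value in first_data.items()]
```

Each ID field ends up as one vector of the same length, which is what later allows the ID embeddings and the projected multi-modal features to be concatenated field by field.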
Step S206, carrying out full-connection transformation on the second data to obtain second feature data of the first dimension;
Because all input data is ultimately fed into the fully connected layer after conversion, the transformation performed by a fully connected layer can unify the mapping size (embedding size) of the feature data of the candidate material, regardless of whether the data is in a structured format or an unstructured multi-modal format. In this way, the unstructured multi-modal data constrains the recommendation model and improves its recommendation accuracy.
Step S208, outputting, based on a recommendation model and according to the first feature data and the second feature data, a recommendation probability of recommending the candidate material to the target user.
In this embodiment, the recommendation model may be applied to various scenarios of recommendation systems, and may rank a plurality of candidate materials based on the recommendation probability, and preferentially push a candidate material with a high recommendation probability to a target user, or push a candidate material with a high recommendation probability to a target user at a preferred position and a preferred time.
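The ranking described above amounts to sorting candidates by their output probability. A minimal sketch follows; the function name `rank_candidates`, the candidate names, and the probability values are hypothetical.

```python
def rank_candidates(probabilities, top_k=None):
    """Sort (candidate, probability) pairs from highest to lowest probability."""
    ranked = sorted(probabilities.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical recommendation probabilities output by the model.
probabilities = {"video_a": 0.12, "video_b": 0.87, "video_c": 0.45}
top_two = rank_candidates(probabilities, top_k=2)
```

Here `top_two` holds `video_b` followed by `video_c`, the two candidates that would be pushed first or given the preferred positions.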
Through the above steps, material information of a candidate material, including data in a multi-modal format, is acquired; the first data is mapped and coded to obtain first feature data of a first dimension; the second data undergoes full-connection transformation to obtain second feature data of the first dimension; and a recommendation probability of recommending the candidate material to the target user is output based on the recommendation model according to the first feature data and the second feature data. By fusing the structured data and the unstructured multi-modal data of the candidate material, material features obtained from the multi-modal format can be used to constrain the recommendation model. This solves the technical problem that recommendation models in the related art recommend inaccurately, broadens the range of services the recommendation model can support, and improves its recommendation effect.
Optionally, before outputting, based on the recommendation model, a recommendation probability of recommending the candidate material to the target user according to the first feature data and the second feature data, the method further includes: acquiring training sample data, wherein the training sample data comprises structured data and multi-modal unstructured data; and training according to the training sample data to obtain a recommendation model.
Optionally, the unstructured data may be in, but is not limited to, the following modal formats: table, text, image, video, audio.
In an implementation of this embodiment, performing full-connection transformation on the second data to obtain the second feature data of the first dimension includes:
S11, dividing the second data into a plurality of sub-data according to data type, wherein each data type corresponds to an unstructured modal format, and the modal formats include: tables, text, images, video, and audio;
The second data in this embodiment is, for example, video information and audio information of the candidate material.
S12, for each of the plurality of sub-data, extracting features of the sub-data using a corresponding pre-trained model, and aggregating the extracted features to obtain multi-modal feature data of a second dimension;
In one example, features are extracted from the video and audio data of the candidate material by the relevant pre-trained models, and the intermediate hidden-layer outputs are taken as multi-dimensional multi-modal feature data; for example, the multi-modal feature data of the second dimension is a 128-dimensional intermediate result;
S13, determining a mapping size of the first feature data, and setting a transformation size of a fully connected (FC) layer of the recommendation model to the mapping size, wherein the mapping size corresponds to the first dimension;
S14, inputting the multi-modal feature data into the fully connected layer, and outputting the second feature data of the first dimension.
Based on the above example, the 128-dimensional features are first transformed by the FC layer into a dimension consistent with the embedding size of the other IDs (the first feature data), for example 32 dimensions, and are then concatenated with the embeddings of the other IDs and input into the recommendation model.
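The 128-to-32 transformation in this example can be sketched as a single linear layer followed by stacking the result alongside the ID embeddings. The random weights, the random stand-in for the pre-trained model output, and the helper names `make_fc` and `fc_forward` are assumptions for illustration only; the dimensions 128 and 32 are the example values from the text.

```python
import random

def make_fc(in_dim, out_dim, seed=0):
    """Create an untrained fully connected layer as (weights, bias)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def fc_forward(x, weights, bias):
    """Compute y = W x + b for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

MULTIMODAL_DIM = 128  # second dimension in the example
EMBEDDING_SIZE = 32   # first dimension in the example

rng = random.Random(1)
# Stand-in for the hidden-layer features produced by the pre-trained models.
multimodal_features = [rng.random() for _ in range(MULTIMODAL_DIM)]
# Stand-in for the embeddings of three ID fields.
id_embeddings = [
    [rng.uniform(-0.05, 0.05) for _ in range(EMBEDDING_SIZE)] for _ in range(3)
]

weights, bias = make_fc(MULTIMODAL_DIM, EMBEDDING_SIZE)
second_feature_data = fc_forward(multimodal_features, weights, bias)
all_fields = id_embeddings + [second_feature_data]  # every field now has size 32
```

Setting the output size of the layer to the embedding size is the only coupling between the two branches, so the multi-modal extractor can change its output dimension without affecting the rest of the model.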
The network structure of the recommendation model of this embodiment includes an input layer, a mapping layer (coding layer), and a fully connected layer (output layer), as shown in Fig. 3, which is a network structure diagram of a recommendation model according to an embodiment of the present invention. The system may rank a plurality of candidate materials based on the recommendation probabilities output by the fully connected layer, and preferentially push candidate materials with high recommendation probabilities to the target user, or push them to the target user at a preferred position and at a preferred time.
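As a small illustration of the ranking step, the sketch below sorts candidate materials by their recommendation probability. The function name `rank_candidates` and the sample probabilities are hypothetical and are not taken from the embodiment.

```python
def rank_candidates(probabilities):
    """Return candidate IDs ordered from highest to lowest recommendation probability."""
    return sorted(probabilities, key=lambda item_id: probabilities[item_id], reverse=True)


# Hypothetical recommendation probabilities output by the fully connected layer.
scores = {"item_a": 0.12, "item_b": 0.87, "item_c": 0.45}
ranking = rank_candidates(scores)
```

The first entries of `ranking` are the materials that would be pushed preferentially.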
In the input layer, data in a structured format is input, including a user ID (uid), a material ID (item_id), and other IDs such as the user's city; this embodiment takes ID data as an example for explanation. In addition, the input data also includes audio and video feature data, that is, data in a multi-modal format, which is unstructured data.
In the network structure, the scheme of this embodiment includes: an embedding part for the IDs (embedding is a way of converting discrete variables into continuous vector representations, i.e., mapping coding); an xDeepFM (eXtreme Deep Factorization Machine) part based on a CIN (Compressed Interaction Network) module; and an interest value part based on DIN (Deep Interest Network). These are described below with reference to specific implementations:
In an implementation manner of this embodiment, outputting the recommendation probability of the candidate material based on the recommendation model according to the first feature data and the second feature data includes: splicing the first feature data and the second feature data to obtain third feature data; performing feature crossing (feature intersection) and feature dimension reduction processing on the third feature data to obtain fourth feature data; and inputting the fourth feature data and the second feature data into a fully connected layer of the recommendation model, and outputting the recommendation probability of the candidate material.
In this embodiment, when extracting features from the ID fields, the model adopts a CIN-based xDeepFM network structure: feature crossing is performed on the third feature data, and the deep part is used to perform dimension reduction processing on the third feature data, so as to improve feature depth.
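The feature crossing performed by a CIN module can be sketched as follows. This is a minimal single-layer illustration of the Compressed Interaction Network as it is commonly described for xDeepFM, not the embodiment's actual implementation; the function names and the toy inputs are assumptions.

```python
def cin_layer(x_prev, x0, weights):
    """One CIN layer.

    x0      : m field embeddings, each of dimension D (the spliced feature fields)
    x_prev  : feature maps of the previous layer (equal to x0 for the first layer)
    weights : weights[h][i][j] combines row i of x_prev with row j of x0
              to build output feature map h
    Each output map is a weighted sum of element-wise products of the two inputs.
    """
    dim = len(x0[0])
    feature_maps = []
    for w_h in weights:
        row = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                w = w_h[i][j]
                for d in range(dim):
                    row[d] += w * xi[d] * xj[d]
        feature_maps.append(row)
    return feature_maps


def sum_pool(feature_maps):
    """Compress each feature map to one scalar by summing over the embedding dimension."""
    return [sum(row) for row in feature_maps]


# Toy input: two fields with 2-dim embeddings, one output feature map with unit weights.
x0 = [[1.0, 2.0], [3.0, 4.0]]
unit_weights = [[[1.0, 1.0], [1.0, 1.0]]]
crossed = cin_layer(x0, x0, unit_weights)
pooled = sum_pool(crossed)
```

With unit weights, each output coordinate is the square of the sum of the corresponding input coordinates, which makes the explicit second-order crossing easy to check by hand. The pooled scalars are much smaller than the crossed tensor, which is one way the dimension is reduced before the fully connected layer.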
In an implementation manner of this embodiment, performing mapping coding on the first data to obtain the first feature data of the first dimension includes: extracting a plurality of ID data in the first data, wherein each ID data in the plurality of ID data corresponds to a material attribute of the candidate material; and mapping and coding the plurality of ID data to obtain first characteristic data of a first dimension.
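The mapping coding of ID data can be pictured as a lookup table that assigns each discrete ID a fixed-size vector. The sketch below is an assumption-laden illustration: the `EmbeddingTable` class, the field names, and the random initialization are hypothetical, and in a real model the vectors would be trainable parameters.

```python
import random


class EmbeddingTable:
    """Maps each discrete ID to a fixed-size vector, creating the vector on first use."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._rows = {}

    def lookup(self, key):
        if key not in self._rows:
            self._rows[key] = [self._rng.uniform(-0.05, 0.05) for _ in range(self.dim)]
        return self._rows[key]


EMBED_DIM = 32  # first dimension, following the 32-dim example in this embodiment

# One table per ID field; the field names here are illustrative only.
tables = {field: EmbeddingTable(EMBED_DIM, seed=n)
          for n, field in enumerate(("uid", "item_id", "city"))}

# Structured first data of one sample (hypothetical values).
first_data = {"uid": "u_1001", "item_id": "m_42", "city": "Beijing"}

# First feature data: one first-dimension vector per ID field.
first_feature = {field: tables[field].lookup(value) for field, value in first_data.items()}
```

Looking up the same ID twice returns the same vector, so every sample that shares an ID shares its representation.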
In an implementation manner of this embodiment, after obtaining the material information of the candidate material, the method further includes:
S21, extracting the associated material sequence of the candidate material;
S22, performing column clustering on the associated material sequences according to the user ID of the candidate material to obtain material ID data;
S23, randomly walking on first-type network nodes and second-type network nodes by using a random walk algorithm (DeepWalk), and calculating a mapping vector sequence of the associated material sequence, wherein a first-type network node corresponds to the user ID of the target user, and a second-type network node corresponds to one associated material in the associated material sequence;
In this embodiment, the embedding of an item is computed with the DeepWalk method: the embedding vector of the item is calculated by random walks on the network formed by uid nodes and item nodes, so that the resulting embedding vectors have the following properties: items that can reach each other through fewer nodes have closer embedding vectors; and items recommended to more of the same uids have closer embedding vectors. After the DeepWalk embedding vector of the item is obtained, the item embedding in the subsequent DIN or xDeepFM structure is fixed to the embedding obtained by DeepWalk and no longer participates in training. The embeddings of the other IDs continue to be trained together with the rest of the model. Fig. 4 is a network structure diagram of the random walk algorithm in the embodiment of the present invention, where the two types of network nodes correspond to an associated material and a user ID, respectively.
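The random walk stage of DeepWalk on the user and item network can be sketched as below. This covers only walk generation; in DeepWalk the walks are then fed to a skip-gram model to learn the embedding vectors, which is omitted here. The function names and the interaction data are hypothetical.

```python
import random
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected bipartite graph from (uid, item) interaction pairs."""
    graph = defaultdict(list)
    for uid, item in interactions:
        user_node, item_node = ("user", uid), ("item", item)
        graph[user_node].append(item_node)
        graph[item_node].append(user_node)
    return graph


def random_walk(graph, start, length, rng):
    """Walk `length` nodes from `start`, choosing a uniformly random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


# Hypothetical historical interactions between users and materials.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c")]
graph = build_graph(interactions)
walks = generate_walks(graph, walks_per_node=2, length=5)
```

Because the graph is bipartite, every walk alternates between user nodes and item nodes, so two items appear close together in a walk when they share users, which is what gives co-recommended items nearby embeddings after skip-gram training.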
S24, obtaining an interest value of the target user based on the mapping vector sequence by adopting a deep interest model;
In one example, obtaining the interest value of the target user based on the mapping vector sequence by using the deep interest model comprises: calculating, through an activation function, the interest weight of each associated material in the mapping vector sequence with respect to the candidate material; and aggregating the interest weights of the associated materials through sum pooling to obtain the interest value of the target user.
In this example, based on the historical feedback data of users and items, the sequence of items that the user has interacted with, namely the associated material sequence item_1, item_2, ..., item_N, can be obtained from the user's historical behavior, and interest information about the user can be extracted from this sequence. Fig. 5 is a network structure diagram of DIN in an embodiment of the present invention, in which the user behavior features include the features of the associated material sequence.
DIN performs an embedding operation on the item sequence and regards the result as a representation of the user's historical interests. An activation unit takes the user's interests and the item of the current sample and produces a weight for each item in the item sequence; finally, the user's interest representation, namely the interest value, is obtained through sum pooling.
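The weighting and sum pooling described above can be sketched as follows. As a simplifying assumption, the activation unit here is a sigmoid of the dot product between a behavior embedding and the candidate embedding; DIN itself learns a small feed-forward network for this step. The function names and the toy vectors are hypothetical.

```python
import math


def activation_unit(behavior_vec, candidate_vec):
    """Interest weight of one historical item with respect to the candidate item.

    Simplifying assumption: sigmoid of the dot product stands in for the
    learned activation network.
    """
    score = sum(b * c for b, c in zip(behavior_vec, candidate_vec))
    return 1.0 / (1.0 + math.exp(-score))


def interest_value(behavior_seq, candidate_vec):
    """Weighted sum pooling of the behavior embeddings (item_1 ... item_N)."""
    pooled = [0.0] * len(candidate_vec)
    for vec in behavior_seq:
        weight = activation_unit(vec, candidate_vec)
        for d, value in enumerate(vec):
            pooled[d] += weight * value
    return pooled


# Toy embeddings of two historical items and two different candidates.
behavior_seq = [[1.0, 0.0], [0.0, 1.0]]
neutral = interest_value(behavior_seq, [0.0, 0.0])
focused = interest_value(behavior_seq, [10.0, 0.0])
```

With the neutral candidate both historical items get the same weight, while the second candidate aligns with the first historical item and therefore pulls the pooled interest vector toward it.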
S25, inputting the interest value into a fully connected layer of the recommendation model, wherein the input data of the fully connected layer further includes the second feature data and fourth feature data generated based on the first feature data and the second feature data.
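Step S25 can be illustrated by concatenating the three inputs and applying a single output unit. The sketch assumes a one-layer output with a sigmoid so that the result lies between 0 and 1; the embodiment does not state the output activation, and the function name, weights, and input values are hypothetical.

```python
import math


def output_probability(interest_value, fourth_feature, second_feature, weights, bias=0.0):
    """Concatenate the inputs of the fully connected layer and map them to a probability.

    Assumption: a single linear unit followed by a sigmoid stands in for the
    fully connected output layer of the recommendation model.
    """
    fc_input = list(interest_value) + list(fourth_feature) + list(second_feature)
    if len(fc_input) != len(weights):
        raise ValueError("weights must match the concatenated input size")
    logit = sum(w * x for w, x in zip(weights, fc_input)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical small inputs: interest value, crossed features, projected multi-modal features.
probability = output_probability(
    interest_value=[0.5, 0.5],
    fourth_feature=[52.0],
    second_feature=[0.1, -0.2],
    weights=[0.2, -0.1, 0.01, 0.3, 0.3],
)
```

Keeping the three inputs as separate arguments mirrors the text: the interest value, the fourth feature data, and the second feature data all enter the same fully connected layer.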
With the scheme of this embodiment, unstructured information such as pictures and text of the candidate materials is used as input to the recommendation model for prediction or training, which improves the effect of the model; at the same time, feature crossing and the interest value are computed on the structured data, which improves the accuracy of the recommendation algorithm.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, an output device of recommendation probability is further provided, which is used to implement the foregoing embodiments and preferred embodiments, and the description that has been already made is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 6 is a block diagram of an apparatus for outputting a recommendation probability according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes: a first obtaining module 60, a mapping module 62, a transformation module 64, and an output module 66, wherein,
a first obtaining module 60, configured to obtain material information of a candidate material, where the material information includes first data in a structured format and second data in a multi-modal format;
a mapping module 62, configured to perform mapping coding on the first data to obtain first feature data of a first dimension;
a transformation module 64, configured to perform full connection transformation on the second data to obtain second feature data of the first dimension;
and the output module 66 is configured to output a recommendation probability for recommending the candidate material to the target user based on a recommendation model according to the first characteristic data and the second characteristic data.
Optionally, the transformation module includes: a classification unit, configured to divide the second data into multiple sub-data according to data types, where each data type corresponds to an unstructured modality format, and the modality format includes: tables, text, images, video, audio; the extraction unit is used for extracting the characteristics of the subdata by adopting a corresponding pre-training model aiming at each part of subdata of the multiple parts of subdata and summarizing to obtain multi-modal characteristic data of a second dimension; a setting unit, configured to determine a mapping size of the first feature data, and set a transformation size of a full connection layer of the recommendation model as the mapping size, where the mapping size corresponds to the first dimension; and the output unit is used for inputting the multi-modal feature data into the full connection layer and outputting the second feature data of the first dimension.
Optionally, the output module includes: the splicing unit is used for splicing the first characteristic data and the second characteristic data to obtain third characteristic data; the processing unit is used for performing feature intersection and feature dimension reduction processing on the third feature data to obtain fourth feature data; and the output unit is used for inputting the fourth characteristic data and the second characteristic data into a full connection layer of a recommendation model and outputting the recommendation probability of the candidate material.
Optionally, the mapping module includes: an extracting unit, configured to extract a plurality of ID data in the first data, where each ID data in the plurality of ID data corresponds to a material attribute of the candidate material; and the mapping unit is used for mapping and coding the plurality of ID data to obtain first characteristic data of a first dimension.
Optionally, the apparatus further comprises: an extracting module, configured to extract the associated material sequence of the candidate material after the first obtaining module obtains the material information of the candidate material; a clustering module, configured to perform column clustering on the associated material sequences according to the user ID of the candidate material to obtain material ID data; a computing module, configured to randomly walk on first-type network nodes and second-type network nodes by using a random walk algorithm and compute a mapping vector sequence of the associated material sequence, wherein a first-type network node corresponds to the user ID of the target user, and a second-type network node corresponds to one associated material in the associated material sequence; a second obtaining module, configured to obtain the interest value of the target user based on the mapping vector sequence by using a deep interest model; and an input module, configured to input the interest value into a fully connected layer of the recommendation model, wherein the input data of the fully connected layer further includes the second feature data and fourth feature data generated based on the first feature data and the second feature data.
Optionally, the second obtaining module includes: a calculating unit, configured to calculate, through an activation function, the interest weight of each associated material in the mapping vector sequence with respect to the candidate material; and an aggregating unit, configured to aggregate the interest weights of the associated materials through sum pooling to obtain the interest value of the target user.
Optionally, the apparatus further comprises: a third obtaining module, configured to obtain training sample data before the output module outputs, based on a recommendation model, a recommendation probability of recommending the candidate material to a target user according to the first feature data and the second feature data, where the training sample data includes structured data and multi-modal unstructured data; and the training module is used for training according to the training sample data to obtain the recommendation model.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtaining material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format;
S2, performing mapping coding on the first data to obtain first feature data of a first dimension;
S3, performing full-connection transformation on the second data to obtain second feature data of the first dimension;
S4, outputting, based on a recommendation model, a recommendation probability of recommending the candidate material to the target user according to the first feature data and the second feature data.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, obtaining material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format;
S2, performing mapping coding on the first data to obtain first feature data of a first dimension;
S3, performing full-connection transformation on the second data to obtain second feature data of the first dimension;
S4, outputting, based on a recommendation model, a recommendation probability of recommending the candidate material to the target user according to the first feature data and the second feature data.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 7, including a processor 71, a communication interface 72, a memory 73 and a communication bus 74, where the processor 71, the communication interface 72, and the memory 73 complete communication with each other through the communication bus 74, and the memory 73 is used for storing computer programs; and a processor 71 for executing the program stored in the memory 73.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A method for outputting a recommendation probability, comprising:
acquiring material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format;
mapping and coding the first data to obtain first characteristic data of a first dimension;
performing full-connection transformation on the second data to obtain second feature data of the first dimension;
and outputting a recommendation probability of recommending the candidate materials to a target user based on a recommendation model according to the first characteristic data and the second characteristic data.
2. The method of claim 1, wherein performing full-connection transformation on the second data to obtain second feature data of the first dimension comprises:
dividing the second data into a plurality of sub-data according to data types, wherein each data type corresponds to an unstructured modal format, and the modal format comprises: tables, text, images, video, audio;
aiming at each sub data of the multiple sub data, extracting the characteristics of the sub data by adopting a corresponding pre-training model, and summarizing to obtain multi-modal characteristic data of a second dimension;
determining a mapping size of the first feature data, and setting a transformation size of a full connection layer of the recommendation model as the mapping size, wherein the mapping size corresponds to the first dimension;
and inputting the multi-modal feature data into the full connection layer, and outputting second feature data of the first dimension.
3. The method of claim 1, wherein outputting the recommendation probability for the candidate material based on a recommendation model according to the first feature data and the second feature data comprises:
splicing the first characteristic data and the second characteristic data to obtain third characteristic data;
performing feature intersection and feature dimension reduction processing on the third feature data to obtain fourth feature data;
and inputting the fourth characteristic data and the second characteristic data into a full-connection layer of a recommendation model, and outputting the recommendation probability of the candidate material.
4. The method of claim 1, wherein map-coding the first data to obtain first feature data of a first dimension comprises:
extracting a plurality of ID data in the first data, wherein each ID data in the plurality of ID data corresponds to a material attribute of the candidate material;
and mapping and coding the plurality of ID data to obtain first characteristic data of a first dimension.
5. The method of claim 1, wherein after obtaining material information for a candidate material, the method further comprises:
extracting an associated material sequence of the candidate material;
performing column clustering on the associated material sequence according to the user ID of the candidate material to obtain material ID data;
adopting a random walk algorithm to move randomly on a first type network node and a second type network node, and calculating a mapping vector sequence of the associated material sequence, wherein the first type network node corresponds to the user ID of the target user, and the second type network node corresponds to one associated material in the associated material sequence;
obtaining an interest value of the target user based on the mapping vector sequence by adopting a deep interest model;
inputting the interest value into a fully connected layer of the recommendation model, wherein input data of the fully connected layer further comprises fourth feature data generated based on the first feature data and the second feature data, and the second feature data.
6. The method of claim 5, wherein obtaining interest values of a target user based on the sequence of mapping vectors using a deep interest model comprises:
calculating interest weights of each associated material and the candidate material in the mapping vector sequence through an activation function;
and aggregating the interest weights of the associated materials through sum pooling to obtain the interest value of the target user.
7. The method of claim 1, wherein prior to outputting a recommendation probability for recommending the candidate material to a target user based on a recommendation model according to the first feature data and the second feature data, the method further comprises:
acquiring training sample data, wherein the training sample data comprises structured data and multi-modal unstructured data;
and training according to the training sample data to obtain the recommendation model.
8. An apparatus for outputting a recommendation probability, comprising:
a first acquisition module, configured to acquire material information of a candidate material, wherein the material information comprises first data in a structured format and second data in a multi-modal format;
the mapping module is used for mapping and coding the first data to obtain first feature data of a first dimension;
the transformation module is used for carrying out full-connection transformation on the second data to obtain second feature data of the first dimension;
and the output module is used for outputting the recommendation probability of recommending the candidate materials to the target user based on a recommendation model according to the first characteristic data and the second characteristic data.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 7.
10. An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; wherein:
a memory for storing a computer program;
a processor for performing the method steps of any of claims 1 to 7 by executing a program stored on a memory.
CN202011320852.0A 2020-11-23 2020-11-23 Recommendation probability output method and device, storage medium and electronic equipment Active CN112464087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011320852.0A CN112464087B (en) 2020-11-23 2020-11-23 Recommendation probability output method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011320852.0A CN112464087B (en) 2020-11-23 2020-11-23 Recommendation probability output method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112464087A true CN112464087A (en) 2021-03-09
CN112464087B CN112464087B (en) 2024-03-01

Family

ID=74799183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320852.0A Active CN112464087B (en) 2020-11-23 2020-11-23 Recommendation probability output method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112464087B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system
CN109783655A (en) * 2018-12-07 2019-05-21 西安电子科技大学 A kind of cross-module state search method, device, computer equipment and storage medium
WO2019205795A1 (en) * 2018-04-26 2019-10-31 腾讯科技(深圳)有限公司 Interest recommendation method, computer device, and storage medium
CN110717098A (en) * 2019-09-20 2020-01-21 中国科学院自动化研究所 Meta-path-based context-aware user modeling method and sequence recommendation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
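苏玉龙 (Su Yulong): "基于深度学习的文本信息分析" (Text Information Analysis Based on Deep Learning), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology), pages 1-56 *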

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910372A (en) * 2023-09-11 2023-10-20 腾讯科技(深圳)有限公司 Information push model processing method and device, information push method and device
CN116910372B (en) * 2023-09-11 2024-01-26 腾讯科技(深圳)有限公司 Information push model processing method and device, information push method and device

Also Published As

Publication number Publication date
CN112464087B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN108984555B (en) User state mining and information recommendation method, device and equipment
CN111061946A (en) Scenario content recommendation method and device, electronic equipment and storage medium
CN110781407A (en) User label generation method and device and computer readable storage medium
CN108319888B (en) Video type identification method and device and computer terminal
CN112580328A (en) Event information extraction method and device, storage medium and electronic equipment
CN113254711B (en) Interactive image display method and device, computer equipment and storage medium
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN113283238A (en) Text data processing method and device, electronic equipment and storage medium
CN115131698A (en) Video attribute determination method, device, equipment and storage medium
CN114398973B (en) Media content tag identification method, device, equipment and storage medium
CN115438169A (en) Text and video mutual inspection method, device, equipment and storage medium
CN114329204A (en) Information pushing method, device, equipment, medium and computer product
CN112464087B (en) Recommendation probability output method and device, storage medium and electronic equipment
CN116881462A (en) Text data processing, text representation and text clustering method and equipment
CN116958738A (en) Training method and device of picture recognition model, storage medium and electronic equipment
CN116662495A (en) Question-answering processing method, and method and device for training question-answering processing model
CN115860783A (en) E-commerce platform user feedback analysis method and system based on artificial intelligence
CN116957128A (en) Service index prediction method, device, equipment and storage medium
CN113254788B (en) Big data based recommendation method and system and readable storage medium
CN112287239B (en) Course recommendation method and device, electronic equipment and storage medium
CN113919338B (en) Method and device for processing text data
CN115168609A (en) Text matching method and device, computer equipment and storage medium
CN110866195B (en) Text description generation method and device, electronic equipment and storage medium
CN112685516A (en) Multi-channel recall recommendation method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant