CN114461893A

CN114461893A - Information recommendation method, related device, equipment and storage medium

Info

Publication number: CN114461893A
Application number: CN202011239317.2A
Authority: CN
Inventors: 林岳
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2022-05-10

Abstract

The application discloses an information recommendation method based on artificial intelligence technology. The method comprises: acquiring N pieces of information to be recommended; acquiring target recommendation information from the N pieces of information to be recommended; acquiring a first probability value and a second probability value; determining a first recommendation result from the N pieces of information to be recommended according to the first probability value and the second probability value; and sending the first recommendation result to a terminal device so that the terminal device displays it. Embodiments of the application also provide a related apparatus, device and storage medium. The conversion rate of the information to be recommended is exploited to maximize profit, while all pieces of information to be recommended are also used to explore the user's interests by being pushed to the user with equal probability. In this way the user's interests are discovered, the information that genuinely interests the user can be found over several rounds of recommendation, and the recommendation effect is improved.

Description

Information recommendation method, related device, equipment and storage medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a method, a related apparatus, a device, and a storage medium for information recommendation.

Background

With the rapid development of information technology and the internet, a service platform can use a recommendation algorithm to recommend diverse information to a user. For example, a travel application (APP) can recommend air tickets, hotels, restaurants and shopping, and a video APP can recommend movies, TV series or variety shows.

Traditional recommendation algorithms include collaborative filtering, content-based recommendation and hybrid recommendation, none of which handles the cold-start problem of new users well. Several algorithms have therefore been proposed to address cold start, such as random recommendation algorithms and preference-based recommendation algorithms.

However, a random recommendation algorithm ignores the interaction between the user and the information, so its recommendations often fail to satisfy the user. A preference-based recommendation algorithm needs the user's preference information and therefore cannot be used for a new user about whom nothing is known; this severely limits it and makes a good recommendation effect difficult to achieve.

Disclosure of Invention

Embodiments of the application provide an information recommendation method, a related apparatus, a device and a storage medium. Using the principle of the multi-armed bandit (MAB) problem, the conversion rate of the information to be recommended is exploited to maximize benefit, while all information to be recommended is also used to explore the user's interests by being pushed to the user with equal probability. The user's interests are thereby discovered, the information that genuinely interests the user can be found over several rounds of recommendation, and the recommendation effect is improved.

In view of the above, an aspect of the present application provides an information recommendation method, including:

acquiring N pieces of information to be recommended, wherein N is an integer greater than or equal to 2;

acquiring target recommendation information from the N pieces of information to be recommended, wherein the conversion rate corresponding to the target recommendation information is the maximum value of the conversion rates corresponding to the N pieces of information to be recommended;

acquiring a first probability value and a second probability value, wherein the sum of the first probability value and the second probability value is 1;

determining a first recommendation result from the N pieces of information to be recommended according to a first probability value and a second probability value, wherein the first probability value represents the probability of determining the first recommendation result from the N pieces of information to be recommended with equal probability, and the second probability value represents the probability of taking the target recommendation information as the first recommendation result;

and sending the first recommendation result to a terminal device so that the terminal device displays the first recommendation result.
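Because the two probability values sum to 1, these steps fix how often each piece of information is shown. As a worked illustration, write p1 for the first probability value, p2 = 1 - p1 for the second, and i* for the index of the target recommendation information (this notation is introduced here, not taken from the original). Since the equal-probability branch can also land on the target, the probability that piece i becomes the first recommendation result is:

```latex
P(\text{piece } i \text{ is the first recommendation result})
  = \frac{p_1}{N} + p_2\,\mathbb{1}[i = i^*],
\qquad
\sum_{i=1}^{N}\Big(\frac{p_1}{N} + p_2\,\mathbb{1}[i = i^*]\Big) = p_1 + p_2 = 1 .
```

The target is therefore shown with probability p1/N + p2 and every other piece with probability p1/N. This is the same trade-off as the epsilon-greedy bandit strategy, with p1 in the role of epsilon.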

Another aspect of the present application provides an information recommendation apparatus, including:

an acquisition module, used for acquiring N pieces of information to be recommended, wherein N is an integer greater than or equal to 2;

the acquisition module is further used for acquiring target recommendation information from the N pieces of information to be recommended, wherein the conversion rate corresponding to the target recommendation information is the maximum value of the conversion rates corresponding to the N pieces of information to be recommended;

the acquisition module is further used for acquiring a first probability value and a second probability value, wherein the sum of the first probability value and the second probability value is 1;

the determining module is used for determining a first recommendation result from the N pieces of information to be recommended according to a first probability value and a second probability value, wherein the first probability value represents the probability of determining the first recommendation result from the N pieces of information to be recommended with equal probability, and the second probability value represents the probability of taking the target recommendation information as the first recommendation result;

and the sending module is used for sending the first recommendation result to the terminal device so that the terminal device displays the first recommendation result.

In one possible design, in another implementation of another aspect of an embodiment of the present application,

the determining module is specifically used for acquiring a first probability range corresponding to the first probability value and a second probability range corresponding to the second probability value;

randomly drawing a target probability value from the combined first and second probability ranges;

if the target probability value falls within the first probability range, randomly selecting the first recommendation result from the N pieces of information to be recommended with equal probability;

and if the target probability value belongs to the second probability range, determining the target recommendation information as a first recommendation result.
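The range-based draw above can be sketched in Python as follows. This is a minimal illustration, assuming the first probability range is [0, p1) and the second is [p1, 1), so that a single uniform draw on [0, 1) lands in each range with exactly the first and second probability values; the function name `pick_first_result` and its parameters are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def pick_first_result(items: Sequence[T], target: T, p_first: float,
                      rng: random.Random) -> T:
    """Choose the first recommendation result using two probability ranges.

    p_first is the first probability value; the second is 1 - p_first.
    First range  [0, p_first): explore, pick uniformly from all N items.
    Second range [p_first, 1): exploit, return the target item.
    """
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("p_first must lie in [0, 1]")
    u = rng.random()  # target probability value, uniform on [0, 1)
    if u < p_first:  # u falls in the first probability range
        return rng.choice(items)
    return target  # u falls in the second probability range
```

For example, `pick_first_result(items, target, 0.1, random.Random(0))` explores about 10% of the time; passing the random generator in explicitly keeps runs reproducible.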

In one possible design, the acquisition module is specifically used for acquiring, for each of the N pieces of information to be recommended, the historical recommendation number and the historical converted number corresponding to that piece of information;

determining N conversion rates according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, wherein the N conversion rates include the conversion rate corresponding to each piece of information to be recommended;

and determining the information to be recommended corresponding to the maximum of the N conversion rates as the target recommendation information.
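A minimal sketch of this conversion-rate selection, assuming the conversion rate is simply the historical converted number divided by the historical recommendation number, and that a piece never recommended gets a rate of 0 (the text does not say how such pieces are handled). The function names are hypothetical.

```python
from typing import List, Sequence


def conversion_rates(recommended: Sequence[int], converted: Sequence[int]) -> List[float]:
    """Per-item conversion rate = historical converted number / historical recommendation number."""
    if len(recommended) != len(converted):
        raise ValueError("count sequences must have the same length")
    # Assumption: an item that was never recommended gets a rate of 0.0.
    return [c / n if n > 0 else 0.0 for n, c in zip(recommended, converted)]


def target_index(recommended: Sequence[int], converted: Sequence[int]) -> int:
    """Index of the item with the largest conversion rate (first one wins ties)."""
    rates = conversion_rates(recommended, converted)
    return max(range(len(rates)), key=rates.__getitem__)
```

With hypothetical histories of 10 conversions in 200 recommendations, 8 in 100 and 1 in 50, the rates are 0.05, 0.08 and 0.02, so `target_index` returns 1 (the second piece).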

In one possible design, the acquisition module is specifically used for acquiring N confidence intervals corresponding to the N pieces of information to be recommended, wherein each confidence interval corresponds to one piece of information to be recommended;

acquiring a first confidence interval from the N confidence intervals, wherein each confidence interval has an upper bound value and the upper bound value of the first confidence interval is the largest of the N upper bound values;

and if the first confidence interval meets the information recommendation condition, determining the information to be recommended corresponding to the first confidence interval as the target recommendation information.
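The text does not say how the N confidence intervals are computed. FIG. 6 refers to an upper confidence bound algorithm, so the sketch below assumes the classic UCB1 form: each interval is centred on the piece's conversion rate with half-width sqrt(2 ln T / n), where n is that piece's historical recommendation number and T the total over all pieces. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def confidence_intervals(recommended: Sequence[int],
                         converted: Sequence[int]) -> List[Interval]:
    """One interval per item, centred on its conversion rate.

    Assumed UCB1-style half-width: sqrt(2 * ln(T) / n_i), where T is the
    total number of recommendations and n_i the item's own count.
    """
    total = sum(recommended)
    intervals: List[Interval] = []
    for n, c in zip(recommended, converted):
        if n == 0:
            # Never recommended: maximally uncertain, so it wins the max-upper-bound pick.
            intervals.append((-math.inf, math.inf))
            continue
        half_width = math.sqrt(2.0 * math.log(total) / n)
        rate = c / n
        intervals.append((rate - half_width, rate + half_width))
    return intervals


def max_upper_bound_index(intervals: Sequence[Interval]) -> int:
    """Index of the first confidence interval (the one with the largest upper bound)."""
    return max(range(len(intervals)), key=lambda i: intervals[i][1])
```

Pieces recommended only a few times get wide intervals and therefore high upper bounds, so they are tried again; this optimism under uncertainty is what makes the upper-bound rule explore.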

In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the information recommendation apparatus further includes a processing module;

the determining module is further used for, after the acquisition module acquires the first confidence interval from the N confidence intervals, determining the information to be recommended corresponding to the first confidence interval as information to be adjusted if the first confidence interval does not meet the information recommendation condition;

the acquisition module is also used for acquiring the historical recommendation number and the historical converted number corresponding to the information to be adjusted;

the determining module is further used for determining the average reward corresponding to the information to be adjusted according to the historical recommendation number and the historical converted number corresponding to the information to be adjusted;

the processing module is used for adjusting the first confidence interval according to the average reward corresponding to the information to be adjusted to obtain a second confidence interval;

the acquisition module is further used for acquiring a third confidence interval according to the second confidence interval and (N-1) confidence intervals, wherein the upper bound value of the third confidence interval is the largest of the N upper bound values (one per confidence interval), and the (N-1) confidence intervals are the N confidence intervals other than the first confidence interval, which the second confidence interval replaces;

the determining module is further configured to determine information to be recommended corresponding to the third confidence interval as target recommendation information if the third confidence interval meets the information recommendation condition.

In one possible design, the acquisition module is further used for acquiring, after the first confidence interval is acquired from the N confidence intervals, the total number of adjustments corresponding to the N confidence intervals;

the determining module is further used for determining that the first confidence interval meets the information recommendation condition if the total number of adjustments is greater than or equal to the adjustment number threshold;

the determining module is further used for determining that the first confidence interval does not meet the information recommendation condition if the total number of adjustments is smaller than the adjustment number threshold.
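The adjust-and-reselect procedure, combined with the adjustment-count condition, can be sketched as a loop. The text leaves the adjustment rule itself unspecified, so the sketch takes it as a parameter. It is a minimal illustration, assuming the average reward is the historical converted number divided by the historical recommendation number (one unit of reward per conversion). `select_with_adjustment` and the example rule `shrink_halfway` are hypothetical names, and `shrink_halfway` is only one possible rule, not the method of the application.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def select_with_adjustment(intervals: Sequence[Interval],
                           recommended: Sequence[int],
                           converted: Sequence[int],
                           adjust: Callable[[Interval, float], Interval],
                           adjustment_threshold: int) -> int:
    """Return the index of the target recommendation information.

    Condition checked: total adjustments >= adjustment_threshold.
    `adjust` is a hypothetical caller-supplied rule mapping
    (interval, average reward) to a new interval.
    """
    current: List[Interval] = list(intervals)  # copy; the caller's list is untouched
    adjustments = 0
    while True:
        # Interval with the largest upper bound (the first, then the third, ... interval).
        best = max(range(len(current)), key=lambda i: current[i][1])
        if adjustments >= adjustment_threshold:  # recommendation condition met
            return best
        n, c = recommended[best], converted[best]
        average_reward = c / n if n > 0 else 0.0  # assumed reward: 1 per conversion
        current[best] = adjust(current[best], average_reward)
        adjustments += 1


def shrink_halfway(interval: Interval, average_reward: float) -> Interval:
    """Hypothetical adjustment rule: move both bounds halfway toward the average reward."""
    lower, upper = interval
    return ((lower + average_reward) / 2, (upper + average_reward) / 2)
```

Because every pass either returns or increments the adjustment count, the loop always terminates once the threshold is reached.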

In one possible design, the acquisition module is further used for acquiring the lower bound value of the first confidence interval;

the determining module is further used for determining that the first confidence interval meets the information recommendation condition if the lower bound value of the first confidence interval is greater than or equal to the upper bound value of each of the (N-1) confidence intervals, wherein the (N-1) confidence intervals are the N confidence intervals other than the first confidence interval;

the determining module is further used for determining that the first confidence interval does not meet the information recommendation condition if the lower bound value of the first confidence interval is smaller than the upper bound value of any one of the (N-1) confidence intervals.
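The interval-separation version of the information recommendation condition is one comparison per remaining interval. A minimal sketch, with intervals represented as (lower, upper) pairs; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def meets_separation_condition(intervals: Sequence[Tuple[float, float]], first: int) -> bool:
    """True if the first interval's lower bound is >= the upper bound of each of the other N-1 intervals."""
    lower_first = intervals[first][0]
    return all(lower_first >= upper
               for i, (_, upper) in enumerate(intervals) if i != first)
```

When the condition holds, every value in the first interval is at least as large as every value in the other intervals, so further exploration cannot reverse the ranking under these interval estimates.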

In one possible design, the acquisition module is specifically used for determining N probability distributions according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, wherein each probability distribution corresponds to one piece of information to be recommended;

drawing one random probability value from each of the N probability distributions to obtain N random probability values;

determining a first random probability value from the N random probability values, wherein the first random probability value is the maximum value of the N random probability values;

and determining the information to be recommended corresponding to the first random probability value as target recommendation information.
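Selecting by the maximum of one random draw per distribution is Thompson sampling (FIG. 9 refers to the Thompson sampling algorithm). The text does not name the distribution, so the sketch below assumes the common choice for conversion data: a Beta(1 + converted, 1 + recommended - converted) distribution per piece of information, i.e. a uniform prior updated with that piece's history. `thompson_pick` is a hypothetical name.

```python
import random
from typing import Sequence


def thompson_pick(recommended: Sequence[int], converted: Sequence[int],
                  rng: random.Random) -> int:
    """Return the index whose sampled probability value is the largest.

    Assumed distribution per item: Beta(1 + converted, 1 + (recommended - converted)),
    i.e. a uniform Beta(1, 1) prior updated with that item's history.
    """
    samples = []
    for n, c in zip(recommended, converted):
        if not 0 <= c <= n:
            raise ValueError("need 0 <= converted <= recommended")
        samples.append(rng.betavariate(1 + c, 1 + n - c))  # one random probability value
    # The first random probability value is the maximum of the N samples.
    return max(range(len(samples)), key=samples.__getitem__)
```

Pieces with little history have wide distributions and are occasionally drawn high, which provides exploration; pieces with a long, strong history are picked most of the time.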

In one possible design, the processing module is further used for, after the sending module sends the first recommendation result to the terminal device and the terminal device displays it, updating the historical recommendation number and the historical converted number corresponding to the first recommendation result if response information sent by the terminal device is received;

and the processing module is further used for updating only the historical recommendation number corresponding to the first recommendation result if no response information sent by the terminal device is received.

In one possible design, the acquisition module is further used for acquiring, for each of the N pieces of information to be recommended, its updated historical recommendation number and updated historical converted number;

the determining module is further used for determining N updated conversion rates according to the updated historical recommendation number and the updated historical converted number of each piece of information to be recommended, wherein the N updated conversion rates include the updated conversion rate of each piece of information to be recommended;

the determining module is further used for determining the information to be recommended corresponding to the maximum of the N updated conversion rates as a second recommendation result;

and the sending module is further used for sending the second recommendation result to the terminal device so that the terminal device can display the second recommendation result.
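The feedback update and the greedy second recommendation can be sketched as below, assuming each display adds one to the historical recommendation number and each received response adds one to the historical converted number (the text says the counts are updated but not by how much). The function names are hypothetical, and a piece never recommended gets a rate of 0 by assumption.

```python
from typing import List, Sequence


def record_feedback(recommended: List[int], converted: List[int],
                    shown: int, responded: bool) -> None:
    """Update the history of the item that was just displayed (in place)."""
    recommended[shown] += 1  # displayed once more, whatever the outcome
    if responded:  # response information came back from the terminal device
        converted[shown] += 1


def second_result(recommended: Sequence[int], converted: Sequence[int]) -> int:
    """Index of the largest updated conversion rate (first one wins ties)."""
    rates = [c / n if n > 0 else 0.0 for n, c in zip(recommended, converted)]
    return max(range(len(rates)), key=rates.__getitem__)
```

Because each round feeds its outcome back into the counts, the next recommendation is always based on the latest history.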

In one possible design, the determining module is further used for determining N updated probability distributions according to the updated historical recommendation number and the updated historical converted number corresponding to each piece of information to be recommended, wherein each updated probability distribution corresponds to one piece of information to be recommended;

the acquisition module is further used for drawing one random probability value from each of the N updated probability distributions to obtain N updated random probability values;

the determining module is further used for determining a second random probability value from the N updated random probability values, wherein the second random probability value is the maximum of the N updated random probability values;

the determining module is further used for determining the information to be recommended corresponding to the second random probability value as the second recommendation result.

In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the first recommendation result includes at least one of advertisement information, electronic ticket information, friend information, video and audio information, text information, and web page information;

the acquisition module is further used for acquiring the user identifier of the user before the sending module sends the first recommendation result to the terminal device so that the terminal device displays it, wherein the user identifier does not match any stored user identifier;

and the determining module is further used for determining the terminal device according to the user identifier of the user.

Another aspect of the present application provides a server, including: a memory, a processor, and a bus system;

wherein, the memory is used for storing programs;

a processor for executing the program in the memory, the processor for performing the above-described aspects of the method according to instructions in the program code;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.

In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.

According to the technical scheme, the embodiment of the application has the following advantages:

The embodiments of the application provide an information recommendation method. First, N pieces of information to be recommended are acquired; then target recommendation information, whose conversion rate is the maximum of the N conversion rates, is acquired from them; a first probability value and a second probability value that sum to 1 are acquired, and a first recommendation result is determined from the N pieces of information to be recommended according to these two values; finally, the first recommendation result is sent to a terminal device, which displays it. In this way, using the principle of MAB, the conversion rate of the information to be recommended is exploited to maximize benefit, while all information to be recommended is also used to explore the user's interests by being pushed to the user with equal probability. The user's interests are thereby discovered, the information that genuinely interests the user can be found over several rounds of recommendation, and the recommendation effect is improved.

Drawings

FIG. 1 is a schematic diagram of an environment of an information recommendation system in an embodiment of the present application;

FIG. 2 is a schematic diagram of an architecture of an information recommendation system according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating an information recommendation method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of determining a first recommendation based on different probability ranges in an embodiment of the present application;

FIG. 5 is another illustration of determining a first recommendation based on different probability ranges in an embodiment of the present application;

FIG. 6 is a diagram illustrating an implementation of an upper confidence bound algorithm in an embodiment of the present application;

FIG. 7 is a schematic diagram of updating confidence regions based on an upper confidence bound algorithm in an embodiment of the present application;

FIG. 8 is another schematic diagram of determining target recommendation information based on an upper confidence bound algorithm in an embodiment of the present application;

FIG. 9 is a schematic diagram illustrating the determination of target recommendation information based on the Thompson sampling algorithm in the embodiment of the present application;

FIG. 10 is a schematic diagram of updating probability distributions in an embodiment of the present application;

FIG. 11 is a schematic diagram of an interface for pushing advertisement information according to an embodiment of the present application;

FIG. 12 is a schematic diagram of an interface for pushing electronic ticket information in the embodiment of the present application;

FIG. 13 is a schematic diagram of an interface for pushing video information according to an embodiment of the present application;

FIG. 14 is a schematic diagram of an interface for pushing friend information in the embodiment of the present application;

FIG. 15 is a schematic diagram of an embodiment of an information recommendation device in an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a server in the embodiment of the present application.

Detailed Description

The embodiments of the application provide an information recommendation method, a related apparatus, a device and a storage medium. Using the principle of MAB, the conversion rate of the information to be recommended is exploited to maximize profit, while all information to be recommended is also used to explore the user's interests by being pushed to the user with equal probability. The user's interests are thereby discovered, the information that genuinely interests the user can be found over several rounds of recommendation, and the recommendation effect is improved.

The terms "first," "second," "third," "fourth," and the like in the description and claims of the present application and in the drawings described above, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that data so used may be interchanged under appropriate circumstances, so that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

With the rapid growth of the internet, the number of new users of applications (APPs) is also growing rapidly, which makes the user cold-start problem of recommendation algorithms more serious. The cold-start problem is how to quickly mine a new user's general interests through fast exploration and a small number of trials. Because a new user has little history and few features, the present application provides a method for recommending information that the new user is likely to be interested in, thereby improving the effect of the recommendation system. Several types of APP to which the present application applies are described below.

First, shopping applications.

When a new user is in cold start, a recommendation result, such as electronic products, cosmetics, food or clothing, may be sent to the terminal device used by the new user. In this case, items that depend little on the user's gender, such as food or electronic products, may be recommended preferentially, while cosmetics, men's clothing, women's clothing and the like are not.

Second, instant messaging applications.

Generally, when a new user logs in to an instant messaging application for the first time, the recommendation system has no information about the user and therefore cannot determine the user's age and similar attributes. In this case, advertisements that depend little on age, such as food or electronic product advertisements, may be recommended preferentially, while automobile advertisements, luxury goods advertisements and the like are not.

Third, video applications.

When a new user is in cold start, a recommendation result, such as a movie, a TV series or a variety show, may be sent to the terminal device used by the new user. Because the recommendation system has no information about the new user and therefore cannot determine the user's gender, age, preferences and the like, videos that depend little on gender, age and preference, such as variety shows or comedy movies, may be recommended preferentially, while horror and suspense movies are not.

To address the new-user cold-start problem in recommendation systems, the present application provides an information recommendation method implemented based on artificial intelligence (AI). The method is applied to the information recommendation system shown in FIG. 1, which is a schematic diagram of the environment of the information recommendation system in the embodiment of the present application. As shown in the figure, the information recommendation system includes a server and a terminal device; a client, specifically an application client or a web client, is deployed on the terminal device. After determining a recommendation result, the server sends it to the terminal device, which displays it accordingly. The server involved in the present application may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data and artificial intelligence platforms. The terminal device may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a palmtop computer, a personal computer, a smart television, a smart watch, and the like. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein. The number of servers and terminal devices is not limited either.

Artificial intelligence is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning and the like.

The present application can recommend information automatically by using an artificial-intelligence-based machine learning algorithm. Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.

Based on this, the architecture of the information recommendation system is described below. For ease of understanding, refer to FIG. 2, which is a schematic diagram of the architecture of the information recommendation system in the embodiment of the present application. As shown in the figure, the information recommendation system may build a separate recommender for each type of information to be recommended; for example, recommender A recommends information A, recommender B recommends information B, and recommender C recommends information C. Each recommender has a recall stage, a filtering stage, a feature calculation stage and a ranking stage. In the recall stage, candidates are usually recalled based on the user profile, user preferences, region information and the like; for a new user in cold start there is not enough data to recall from, so a cold-start service is used for recall instead. In the filtering stage, manual rules can be applied to the content, for example to filter out duplicate information or to periodically delete outdated information. In the feature calculation stage, features of the recalled candidate set are computed by combining the user's real-time behavior, user profile, knowledge graph, feature services and the like. In the ranking stage, an algorithm model scores the recalled candidates, and the candidates are re-ranked by their scores according to a given strategy.
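The four stages of one recommender can be sketched as a simple Python function. This only illustrates the data flow, assuming duplicates are detected by item ID and outdated items carry an `expired` flag; the `Item` class, `run_recommender`, and every callable passed to it are hypothetical stand-ins for the recall services, feature service and ranking model, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Item:
    item_id: str
    expired: bool = False
    features: Dict[str, float] = field(default_factory=dict)


def run_recommender(recall: Callable[[], List[Item]],
                    cold_start_recall: Callable[[], List[Item]],
                    is_new_user: bool,
                    compute_features: Callable[[Item], Dict[str, float]],
                    score: Callable[[Item], float]) -> List[Item]:
    # Recall stage: fall back to the cold-start service for a new user.
    candidates = cold_start_recall() if is_new_user else recall()
    # Filtering stage: rule-based removal of duplicates and outdated items.
    seen, filtered = set(), []
    for item in candidates:
        if item.expired or item.item_id in seen:
            continue
        seen.add(item.item_id)
        filtered.append(item)
    # Feature calculation stage.
    for item in filtered:
        item.features = compute_features(item)
    # Ranking stage: order candidates by model score, highest first.
    return sorted(filtered, key=score, reverse=True)
```

Passing `is_new_user=True` routes recall through the cold-start service, which is the path the method below is concerned with.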

With reference to the above description, the solution provided in the embodiments of the present application involves artificial intelligence technologies such as machine learning. The information recommendation method of the present application is described below. Referring to FIG. 3, an embodiment of the information recommendation method in the embodiments of the present application includes the following steps:

101. The server obtains N pieces of information to be recommended, where N is an integer greater than or equal to 2;

In this embodiment, the server obtains at least two pieces of information to be recommended. The information to be recommended is usually information that has already been online for a period of time; for example, the information to be recommended is an advertisement for automobile A that has been recommended to 200 users in the past month.

It should be noted that, in the present application, the information recommendation device is described as being disposed in a server, and in an actual application, the information recommendation device may also be disposed in a terminal device, so that this is merely an illustration and should not be construed as a limitation to the present application.

102. The server acquires target recommendation information from the N pieces of information to be recommended, wherein the conversion rate corresponding to the target recommendation information is the maximum value of the conversion rates corresponding to the N pieces of information to be recommended;

In this embodiment, the server first extracts one piece of target recommendation information from the N pieces of information to be recommended, and the conversion rate of the target recommendation information is the maximum of the conversion rates of the N pieces. Illustratively, the conversion rate of each of the N pieces of information to be recommended may be calculated, and the piece with the maximum conversion rate is then selected as the target recommendation information. Alternatively, the conversion rate of every piece need not be calculated explicitly, as long as the one piece with the largest conversion rate can be determined.

It can be understood that the conversion rate represents the ratio of the number of times a conversion behavior is completed to the total number of times the information is promoted within a statistical period. Specifically, conversion rate = (converted number / recommended number) × 100%. The converted number is the number of recommendations that led to a conversion behavior. Conversion behaviors include, but are not limited to, staying on a website for a certain amount of time (e.g., 3 minutes), browsing a particular page of the website (e.g., a registration page or a "contact us" page), registering or submitting an order on the website, making an inquiry via a website message or the website's online instant messenger, making an inquiry by telephone, visiting in person for consultation or negotiation, and actually making a payment or completing a deal.

103. The server acquires a first probability value and a second probability value, wherein the sum of the first probability value and the second probability value is 1;

In this embodiment, the server further acquires a first probability value and a second probability value. The two values may be preset fixed values or dynamically adjusted values, which is not limited herein; in either case, their sum is 1.

104. The server determines a first recommendation result from the N information to be recommended according to a first probability value and a second probability value, wherein the first probability value represents the probability of determining the first recommendation result from the N information to be recommended with equal probability, and the second probability value represents the probability of taking the target recommendation information as the first recommendation result;

In this embodiment, the server obtains a first recommendation result from the N pieces of information to be recommended according to the first probability value and the second probability value, where the first recommendation result is one of the N pieces of information to be recommended. The first probability value is the probability of exploring (exploration) unknown information, and the second probability value is the probability of exploiting (exploitation) known information.

Specifically, the first recommendation result may be determined based on an epsilon-greedy (ε-greedy) algorithm. A first probability value ε with 0 < ε < 1 is set; with probability ε, one piece of information is randomly selected from the N pieces of information to be recommended, which realizes exploration. With the second probability value (1 - ε), the information with the maximum expected return (i.e., the target recommendation information) is selected, which realizes exploitation. In practice, the balance between exploration and exploitation is controlled by ε: the smaller ε is, the more conservative the exploration and the more stable the recommendation results.
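
A minimal Python sketch of the ε-greedy choice described above, assuming the conversion rates are already known and held in a dictionary; epsilon_greedy_choice and its parameters are illustrative names, not part of the application.

```python
import random

def epsilon_greedy_choice(items, conversion_rates, epsilon, rng=random):
    """Pick one item: explore with probability epsilon, otherwise exploit.

    items: list of candidate ids (N >= 2)
    conversion_rates: dict id -> estimated conversion rate
    epsilon: first probability value, 0 < epsilon < 1 (second value is 1 - epsilon)
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if rng.random() < epsilon:
        # Exploration: every candidate has the same chance 1/N.
        return rng.choice(items)
    # Exploitation: the target recommendation information (maximum conversion rate).
    return max(items, key=lambda it: conversion_rates[it])
```

With ε = 0.1 and three candidates, the top item is returned with probability 0.9 + 0.1/3, roughly 0.933, because the exploration branch can also land on it.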

105. And the server sends the first recommendation result to the terminal equipment so that the terminal equipment displays the first recommendation result.

In this embodiment, after selecting the first recommendation result, the server sends it to the terminal device, and the terminal device displays it. The terminal device may be the one used by the new user, so that the new user can browse, on the terminal device, the information pushed by the server as the first recommendation result.

Optionally, on the basis of the embodiments corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the server determines the first recommendation result from the N pieces of information to be recommended according to the first probability value and the second probability value, and specifically includes the following steps:

the server acquires a first probability range corresponding to the first probability value and a second probability range corresponding to the second probability value;

the server randomly acquires a target probability value from the first probability range and the second probability range;

if the target probability value belongs to the first probability range, the server randomly obtains a first recommendation result from the N pieces of information to be recommended according to equal probability;

and if the target probability value belongs to the second probability range, the server determines the target recommendation information as a first recommendation result.

In this embodiment, a manner of determining the first recommendation result is described. As can be seen in the foregoing embodiments, it is also necessary to determine a first probability value (ε) and a second probability value (1- ε) before determining the first recommendation, and in the ε greedy algorithm, the first probability value (ε) and the second probability value (1- ε) are typically fixed, e.g., the first probability value (ε) is 0.1 and the second probability value (1- ε) is 0.9. In the softmax algorithm, the first probability value (epsilon) is determined based on Boltzmann distribution, and the second probability value (1-epsilon) is obtained after the first probability value (epsilon) is obtained. It can be understood that, whether the epsilon greedy algorithm or the softmax algorithm is adopted, the first recommendation result is selected from the N pieces of information to be recommended based on the idea of exploration and utilization, which will be described below with reference to fig. 4 and 5.

Exemplarily, refer to fig. 4, which is a schematic diagram of determining the first recommendation result based on different probability ranges in the embodiment of the present application. As shown in the figure, take a first probability value (ε) of 0.1 and a second probability value (1 - ε) of 0.9 as an example, and assume that the N pieces of information to be recommended are information A, information B, and information C to be recommended. Based on this, the first probability range, determined from the first probability value (ε), is greater than or equal to 0 and less than or equal to 10%, and the second probability range, determined from the second probability value (1 - ε), is greater than 10% and less than or equal to 100%. The two ranges therefore have a size ratio of 1 to 9: a random draw falls in the first range with a chance of one in ten and in the second range with a chance of nine in ten. A value is then randomly drawn from 0 to 1 as the target probability value. If the target probability value falls within the first probability range, one of information A, B, and C to be recommended is selected with equal probability as the first recommendation result. If it falls within the second probability range, the target recommendation information is directly used as the first recommendation result, where the target recommendation information is the one of the three pieces selected by maximum conversion rate; for example, information A to be recommended is the target recommendation information.
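
Because the equal-probability branch can also land on the target, the target's overall chance of being recommended under the fig. 4 mapping is 1 - ε + ε/N rather than 1 - ε. The Python sketch below, with hypothetical function names, maps a draw onto the two ranges and computes the exact per-item probabilities using fractions.

```python
import random
from fractions import Fraction

def pick_by_range(target_value, epsilon, items, target, rng=None):
    """Map a draw in [0, 1] onto the two probability ranges of fig. 4.

    [0, epsilon]  -> first range: equal-probability choice among all N items
    (epsilon, 1]  -> second range: the target recommendation information
    """
    rng = rng or random.Random()
    if target_value <= epsilon:
        return rng.choice(items)
    return target

def selection_probabilities(items, target, epsilon):
    """Exact probability that each item becomes the first recommendation result."""
    n = len(items)
    probs = {it: epsilon / n for it in items}  # share contributed by the first range
    probs[target] += 1 - epsilon               # the whole second range goes to the target
    return probs
```

For example, selection_probabilities(["A", "B", "C"], "A", Fraction(1, 10)) gives 14/15 for A and 1/30 each for B and C.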

Exemplarily, refer to fig. 5, which is another schematic diagram of determining the first recommendation result based on different probability ranges in the embodiment of the present application. As shown in the figure, again take a first probability value (ε) of 0.1 and a second probability value (1 - ε) of 0.9 as an example, and assume that the N pieces of information to be recommended are information A, information B, and information C to be recommended. The first probability range, determined from the first probability value (ε), is greater than or equal to 0 and less than or equal to 10%, and the second probability range, determined from the second probability value (1 - ε), is greater than 10% and less than or equal to 100%, so a random draw falls in the first range with a chance of one in ten and in the second range with a chance of nine in ten. A value is then randomly drawn from 0 to 1 as the target probability value. In this example, if the target probability value falls within the first probability range, the target recommendation information is directly used as the first recommendation result, where the target recommendation information is the one of the three pieces selected by maximum conversion rate; for example, information A to be recommended is the target recommendation information. If the target probability value falls within the second probability range, one of information A, B, and C to be recommended is selected with equal probability as the first recommendation result.

Secondly, the embodiment of the present application provides a manner of determining the first recommendation result. In this manner, the first probability value and the second probability value are used to balance exploration and exploitation so as to maximize the overall benefit as far as possible. The exploration process may sacrifice some immediate experience, but it can discover information to be recommended with a high potential return; that is, it pursues a non-greedy, long-term return. The exploitation process follows the best known strategy and makes use of target recommendation information already known to have a high return; that is, it pursues a greedy, short-term return. Combining exploration with exploitation yields better recommendation results.

Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the obtaining, by the server, the target recommendation information from the N pieces of information to be recommended specifically includes the following steps:

the server acquires a historical recommendation number and a historical converted number corresponding to each piece of information to be recommended for each piece of information to be recommended in the N pieces of information to be recommended;

the server determines N converted rates according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, wherein the N converted rates comprise the converted rate corresponding to each piece of information to be recommended;

and the server determines the information to be recommended corresponding to the maximum value of the N converted rates as target recommendation information.

In this embodiment, a manner of determining the target recommendation information based on historical data is described. As can be seen from the foregoing embodiment, the conversion behavior may include multiple behaviors. For ease of description, assume that "clicking the information and jumping to a specific page" is the conversion behavior, and that the server pushes information A to be recommended to the terminal devices used by 100 users, 20 of whom click it; the conversion rate of information A to be recommended is then (20/100) × 100% = 20%.

Specifically, take information A, information B, and information C to be recommended as the N pieces of information to be recommended. Refer to table 1, which illustrates the historical recommendation number and the historical converted number of each piece of information to be recommended.

TABLE 1

Information to be recommended	Historical recommendation number	Historical converted number
Information A to be recommended	200	20
Information B to be recommended	500	100
Information C to be recommended	800	50

Based on table 1, the historical recommendation number and the historical converted number of each piece of information to be recommended are calculated by the following formula to obtain the converted rate of the information to be recommended:

Conversion rate = (historical converted number / historical recommendation number) × 100%;

The conversion rate of information A to be recommended is thus (20/200) × 100% = 10%, that of information B to be recommended is (100/500) × 100% = 20%, and that of information C to be recommended is (50/800) × 100% = 6.25%. Based on this, information B to be recommended may be determined as the target recommendation information.
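
The calculation above can be reproduced with a short Python sketch; exact fractions avoid rounding, and the dictionary layout and function names are assumptions made for illustration.

```python
from fractions import Fraction

# Table 1: item -> (historical recommendation number, historical converted number)
TABLE_1 = {
    "A": (200, 20),
    "B": (500, 100),
    "C": (800, 50),
}

def conversion_rates(history):
    """Conversion rate = historical converted number / historical recommendation number.

    Items never recommended (recommendation number 0) are skipped because
    their rate is undefined.
    """
    return {item: Fraction(conv, rec) for item, (rec, conv) in history.items() if rec > 0}

def target_recommendation(history):
    """Return the item whose conversion rate is the largest."""
    rates = conversion_rates(history)
    return max(rates, key=rates.get)

rates = conversion_rates(TABLE_1)
for item, rate in rates.items():
    print(item, f"{float(rate) * 100:.2f}%")
print("target:", target_recommendation(TABLE_1))
```

The loop prints A 10.00%, B 20.00%, and C 6.25%, and B is reported as the target, matching the figures in the text.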

With reference to the foregoing embodiment, after information B to be recommended is determined as the target recommendation information, the first recommendation result for this round may be determined by using the first probability value and the second probability value. If the first recommendation result is information B to be recommended, information B is pushed directly to the terminal device. In addition, if multiple pieces of information need to be recommended, then after information B to be recommended is determined to be pushed, new target recommendation information can be selected from information A and information C to be recommended, namely information A, which has the higher conversion rate; one more piece of information to be recommended is then determined as another recommendation result by using the first probability value and the second probability value.
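
The multi-slot procedure described above can be sketched in Python as follows, assuming each slot is filled in turn and a pushed item is removed from the pool before the next target is chosen; recommend_k is a hypothetical helper.

```python
import random

def recommend_k(rates, k, epsilon, rng=None):
    """Fill up to k recommendation slots from the candidates in `rates`.

    For every slot, the target recommendation information is re-selected as the
    remaining candidate with the highest conversion rate; with probability
    epsilon a remaining candidate is drawn uniformly instead (exploration).
    Each chosen item is removed so it is not pushed twice.
    """
    rng = rng or random.Random()
    remaining = list(rates)
    results = []
    for _ in range(min(k, len(remaining))):
        target = max(remaining, key=rates.get)
        choice = rng.choice(remaining) if rng.random() < epsilon else target
        results.append(choice)
        remaining.remove(choice)
    return results
```

Calling recommend_k({"A": 0.10, "B": 0.20, "C": 0.0625}, 3, 0.0) returns ["B", "A", "C"]; ε = 0 is only a degenerate setting that shows the pure-exploitation order, whereas the text requires 0 < ε < 1.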

It should be noted that this embodiment takes three pieces of information to be recommended as an example; in practical applications, N may take other values, and the number of recommendation results is not limited to one.

Secondly, the embodiment of the present application provides a manner of determining the target recommendation information based on historical data. In this manner, even when no information about a new user can be obtained, the historical data of other users can still serve as a basis for inferring the new user's preferences. In addition, the conversion rates can be sorted so that multiple pieces of information are recommended, achieving a better recommendation effect.

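Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the obtaining, by the server, the target recommendation information from the N pieces of information to be recommended specifically includes the following steps:

the server acquires N confidence intervals corresponding to the N pieces of information to be recommended, where each confidence interval corresponds to one piece of information to be recommended;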

the server acquires a first confidence interval from the N confidence intervals, wherein the upper bound value of the first confidence interval is the maximum value of the N upper bound values, and each upper bound value corresponds to one confidence interval;

and if the first confidence interval meets the information recommendation condition, the server determines the information to be recommended corresponding to the first confidence interval as target recommendation information.

In this embodiment, a manner of determining the target recommendation information based on the Upper Confidence Bound (UCB) algorithm is described. In statistics, a confidence interval can be used to measure the uncertainty of an estimate, and the idea behind UCB is optimism in the face of uncertainty. The more times a piece of information to be recommended has been tried, the narrower its confidence interval and the lower the uncertainty of the estimate, so information with a larger average reward tends to be selected repeatedly; this is the exploitation process. The fewer times a piece of information to be recommended has been tried, the wider its confidence interval and the higher the uncertainty of the estimate, so information with a wide confidence interval also tends to be selected; this is the exploration process.

Specifically, in selecting the target recommendation information, the server obtains the confidence interval corresponding to each piece of information to be recommended, each with its own upper bound value, selects the largest upper bound value, and determines the corresponding confidence interval as the first confidence interval. If the first confidence interval satisfies the information recommendation condition, the information to be recommended corresponding to it is determined as the target recommendation information. Otherwise, the server selects from the N confidence intervals the one with the next-largest upper bound value, judges whether it satisfies the information recommendation condition, and so on, until a confidence interval satisfying the condition is found; the information to be recommended corresponding to that interval is determined as the target recommendation information.

It should be noted that, if two or more confidence intervals have equal upper bound values and are the maximum upper bound values, one confidence interval may be randomly selected from the two or more confidence intervals for judgment.
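
The walk from the largest upper bound downwards, including the random tie-break, can be sketched in Python as follows. Intervals are assumed to be stored as (lower, upper) tuples, and the information recommendation condition is passed in as a hypothetical callable because its exact form is defined in the embodiments that follow.

```python
import random

def select_target(intervals, meets_condition, rng=None):
    """Walk the confidence intervals from the largest upper bound downwards.

    intervals: dict item -> (lower, upper)
    meets_condition: callable(item, intervals) -> bool, the information
        recommendation condition
    Ties on the upper bound are broken at random.
    Returns the first item whose interval satisfies the condition, or None.
    """
    rng = rng or random.Random()
    items = list(intervals)
    rng.shuffle(items)  # random relative order among equal upper bounds
    # Python's sort is stable, so tied items keep their shuffled order.
    items.sort(key=lambda it: intervals[it][1], reverse=True)
    for item in items:
        if meets_condition(item, intervals):
            return item
    return None
```

Shuffling before a stable sort is a simple way to make every tied interval equally likely to be examined first.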

For ease of understanding, refer to fig. 6, which is a schematic diagram of implementing the upper confidence bound algorithm in the embodiment of the present application. As shown in the figure, the confidence interval with the maximum upper bound value (for example, the i-th confidence interval) is selected from the N confidence intervals with 100% probability, the average value of that interval is then updated according to the selection result, and the expected value is calculated by using the following formula:

$$E = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$$

where $E$ represents the expected value, $\bar{x}_j$ represents the average reward of the j-th information to be recommended, $n$ represents the total number of selections up to the current time, and $n_j$ represents the number of times the j-th information to be recommended has been selected up to the current time.

As can be seen from the above formula, the larger the average reward of a piece of information to be recommended, the larger its expected value and the greater its probability of being selected, which achieves exploitation. Meanwhile, information to be recommended that has been selected fewer times still obtains opportunities to be tried, which achieves exploration.
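
A minimal Python sketch of scoring with this index, assuming the standard UCB1 form E = mean + sqrt(2 ln n / n_j); the function names are hypothetical, and giving never-selected items an infinite score is a common convention rather than something stated in the text.

```python
import math

def ucb1_index(mean_reward, n_total, n_j):
    """UCB1-style score: mean reward plus an exploration bonus.

    Assumes E = mean + sqrt(2 ln n / n_j). An item that has never been
    selected (n_j == 0) gets an infinite score so it is tried first.
    """
    if n_j == 0:
        return math.inf
    return mean_reward + math.sqrt(2 * math.log(n_total) / n_j)

def choose_arm(stats, n_total):
    """stats: dict item -> (mean_reward, n_j). Returns the item with the largest index."""
    return max(stats, key=lambda it: ucb1_index(stats[it][0], n_total, stats[it][1]))
```

With n = 100, an item with mean 0.15 selected only 10 times scores about 1.11, above an item with mean 0.20 selected 90 times (about 0.52), so the rarely tried item is explored.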

Secondly, the embodiment of the present application provides a manner of determining the target recommendation information based on the UCB algorithm. Because the UCB algorithm directs exploration toward information whose estimate is still uncertain, it overcomes the limitation of blind exploration, and the gap between the optimal and suboptimal information to be recommended can be estimated. The UCB algorithm can quickly find the optimal information to be recommended and obtains a more accurate result given a sufficient number of trials, thereby improving the reliability of information recommendation.

Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, after the server acquires the first confidence interval from the N confidence intervals, the method further includes the following steps:

if the first confidence interval does not meet the information recommendation condition, the server determines information to be recommended corresponding to the first confidence interval as information to be adjusted;

the server acquires a historical recommendation number and a historical converted number corresponding to the information to be adjusted;

the server determines the average reward corresponding to the information to be adjusted according to the historical recommendation number and the historical converted number corresponding to the information to be adjusted;

the server adjusts the first confidence interval according to the average reward corresponding to the information to be adjusted to obtain a second confidence interval;

the server acquires a third confidence interval from the second confidence interval and (N-1) confidence intervals, where the (N-1) confidence intervals are the N confidence intervals excluding the first confidence interval, and the upper bound value of the third confidence interval is the maximum of the N upper bound values of the second confidence interval and the (N-1) confidence intervals, each upper bound value corresponding to one confidence interval;

and if the third confidence interval meets the information recommendation condition, the server determines the information to be recommended corresponding to the third confidence interval as the target recommendation information.

In this embodiment, a manner of handling the case in which the information recommendation condition is not satisfied is provided. Based on the UCB algorithm, a recommendation result is shown to the new user in each round. In the n-th round, the reward of the i-th information to be recommended is r_i(n) ∈ {0, 1}: if the new user clicks the i-th information to be recommended, r_i(n) = 1; otherwise, r_i(n) = 0. The goal of the UCB algorithm is to maximize the total reward over all rounds.

Specifically, in the n-th round, a first confidence interval is selected from the N confidence intervals, and the first confidence interval corresponds to the i-th information to be recommended. If the first confidence interval does not satisfy the information recommendation condition, the corresponding information to be recommended is determined as the information to be adjusted; that is, the information to be adjusted is the i-th information to be recommended. On this basis, N_i(n) and R_i(n) can be calculated, where N_i(n) represents the total number of times the i-th information to be recommended has been pushed before the n-th round, i.e., the historical recommendation number of the information to be adjusted, and R_i(n) represents the total reward of the i-th information to be recommended before the n-th round (i.e., the number of conversion behaviors), i.e., the historical converted number of the information to be adjusted. According to N_i(n) and R_i(n), the average reward corresponding to the information to be adjusted is calculated as follows:

$$\bar{r}_i(n) = \frac{R_i(n)}{N_i(n)}$$

where $\bar{r}_i(n)$ represents the average reward of the i-th information to be recommended before the n-th round, i.e., the average reward corresponding to the information to be adjusted.

The second confidence interval is

$$\left[\bar{r}_i(n) - \Delta_i(n),\ \bar{r}_i(n) + \Delta_i(n)\right]$$

where

$$\Delta_i(n) = \sqrt{\frac{3}{2} \cdot \frac{\ln n}{N_i(n)}}$$

Based on the above, the server adjusts the first confidence interval corresponding to the information to be adjusted into the second confidence interval, whose width is smaller than that of the first confidence interval. The server then selects, from the (N-1) confidence intervals of the previous round and the adjusted second confidence interval, the confidence interval with the maximum upper bound value, i.e., the third confidence interval. Correspondingly, if the third confidence interval satisfies the information recommendation condition, the server determines the information to be recommended corresponding to it as the target recommendation information; otherwise, the width of the third confidence interval is adjusted (i.e., compressed) and selection continues in the same way, which is not repeated here.
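
The round-by-round update can be illustrated with a small simulation. This Python sketch assumes the half-width Δ_i(n) = sqrt(3/2 · ln n / N_i(n)) used in this common formulation of UCB, draws 0/1 click rewards from made-up true click rates, and simply pushes the item with the largest upper bound each round; it does not model the separate information recommendation condition, and the function names are hypothetical.

```python
import math
import random

def confidence_interval(total_reward, times_pushed, n):
    """Return (lower, upper) for one item in round n.

    Average reward r_bar = R_i(n) / N_i(n); assumed half-width
    delta = sqrt(3/2 * ln(n) / N_i(n)).
    """
    r_bar = total_reward / times_pushed
    delta = math.sqrt(1.5 * math.log(n) / times_pushed)
    return r_bar - delta, r_bar + delta

def run_ucb(true_rates, rounds, seed=0):
    """Simulate UCB selection with 0/1 rewards (1 = the user clicked)."""
    rng = random.Random(seed)
    items = list(true_rates)
    pushed = {it: 0 for it in items}   # N_i(n)
    reward = {it: 0 for it in items}   # R_i(n)
    for n in range(1, rounds + 1):
        untried = [it for it in items if pushed[it] == 0]
        if untried:
            chosen = untried[0]        # push every item once before using the bound
        else:
            chosen = max(items, key=lambda it: confidence_interval(reward[it], pushed[it], n)[1])
        r = 1 if rng.random() < true_rates[chosen] else 0   # r_i(n) in {0, 1}
        pushed[chosen] += 1            # narrows this item's interval next round
        reward[chosen] += r
    return pushed, reward
```

Over many rounds, the item with the highest true click rate ends up pushed most often, while the others keep receiving occasional exploratory pushes.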

In the embodiment of the present application, a processing method for not meeting the information recommendation condition is provided, and in the above manner, if it is determined that the first confidence interval extracted this time does not meet the information recommendation condition, the width of the first confidence interval needs to be adjusted, and then an appropriate confidence interval is continuously searched until an appropriate confidence interval is obtained. Therefore, the whole process can achieve the effect of optimal selection, the highest converted rate can be estimated under the condition of cold start of a new user, and the purpose of determining target recommendation information based on a UCB algorithm is achieved, so that the feasibility and operability of the scheme are improved.

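Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, after the server acquires the first confidence interval from the N confidence intervals, the method further includes the following steps:

the server acquires the total adjustment times corresponding to the N confidence intervals;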

if the total adjustment times are larger than or equal to the adjustment time threshold, the server determines that the first confidence interval meets the information recommendation condition;

and if the total adjustment times are smaller than the adjustment times threshold, the server determines that the first confidence interval does not meet the information recommendation condition.

In this embodiment, a method for determining target recommendation information based on the number of iterations is described. In order to improve the information processing efficiency, the recommendation system may further use a preset adjustment time threshold, where the adjustment time threshold represents a maximum number of rounds that can allow selection of one confidence interval from the N confidence intervals. For the server, each time a confidence interval is selected, 1 may be added to the total number of adjustments until the threshold number of adjustments is reached.

Specifically, for ease of understanding, take an adjustment times threshold of 100 as an example and denote the total adjustment times by K, updated as K = K + 1; that is, K is incremented by 1 after each adjustment. When K reaches the threshold of 100, the width of the confidence interval is no longer adjusted; instead, the confidence interval with the maximum upper bound value is selected from the N confidence intervals obtained after the 100 adjustments, and the corresponding information to be recommended is determined as the target recommendation information. Therefore, if the first confidence interval is selected in the 100th round, it satisfies the information recommendation condition; conversely, if it is selected before the 100th round, it does not.
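
The counter-based stopping rule can be sketched in Python as below. The sketch assumes the same half-width form sqrt(3/2 · ln n / N_i(n)) for the upper bound, assumes every item has already been pushed at least once so each average is defined, and uses a hypothetical observe callback as the source of 0/1 feedback; none of these names come from the application.

```python
import math

def upper_bound(pushes, total_reward, n):
    """Upper bound of one interval, assuming delta = sqrt(3/2 * ln(n) / pushes)."""
    return total_reward / pushes + math.sqrt(1.5 * math.log(n) / pushes)

def select_with_cap(stats, observe, threshold=100):
    """stats: dict item -> [pushes, total_reward], every item pushed at least once.
    observe: hypothetical callable(item) -> 0 or 1 giving user feedback.

    While K < threshold, the interval with the largest upper bound is treated as
    not meeting the condition: it is pushed, its statistics are updated (which
    narrows it), and K = K + 1. Once K reaches the threshold, the current best
    item is returned without further adjustment.
    """
    k = 0  # total adjustment times K
    while True:
        n = sum(p for p, _ in stats.values())  # total pushes so far
        best = max(stats, key=lambda it: upper_bound(*stats[it], n))
        if k >= threshold:
            return best, k
        stats[best][0] += 1
        stats[best][1] += observe(best)
        k += 1
```

The threshold bounds the work done per estimate, which is the processing-resource saving the embodiment describes.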

In the embodiment of the present application, a manner of determining the target recommendation information based on the number of iterations is provided. In this manner, the adjustment times threshold can be preset before the target recommendation information is estimated; once the total adjustment times reach it, the server no longer judges whether other confidence intervals satisfy the information recommendation condition. This saves processing resources to a certain extent, improves the efficiency of determining the target recommendation information, and improves the effect of online information recommendation.

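Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, after the server acquires the first confidence interval from the N confidence intervals, the method further includes the following steps:

the server acquires a lower bound value of the first confidence interval;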

if the lower bound value of the first confidence interval is greater than or equal to the upper bound value of each of the (N-1) confidence intervals, the server determines that the first confidence interval meets the information recommendation condition, where the (N-1) confidence intervals are the N confidence intervals excluding the first confidence interval;

and if the lower bound value of the first confidence interval is smaller than the upper bound value of any of the (N-1) confidence intervals, the server determines that the first confidence interval does not meet the information recommendation condition.

In this embodiment, a method for determining target recommendation information based on a lower bound value of a confidence interval is introduced. Based on the UCB algorithm, it is assumed that the confidence intervals corresponding to each piece of information to be recommended have the same probability distribution in the initial stage, that is, the average value and the expectation are the same, while the actual expectation of each confidence interval is unknown, and it is necessary to make continuous attempts to estimate the average expectation of each confidence interval.

Specifically, for ease of introduction, take the N pieces of information to be recommended as information A, information B, and information C to be recommended. Referring to fig. 7, fig. 7 is a schematic diagram of updating confidence intervals based on the upper confidence bound algorithm in the embodiment of the present application. As shown in (a) of fig. 7, the confidence interval corresponding to information B has the largest upper bound, so it is selected. After the selection, as shown in (b) of fig. 7, the confidence interval corresponding to information B narrows and its lower bound rises; at this point the confidence interval corresponding to information C has the largest upper bound, so it is selected next. After multiple selections, a converged result is obtained.

Referring to fig. 8, fig. 8 is another schematic diagram of determining target recommendation information based on the upper confidence bound algorithm in the embodiment of the present application. As shown in the figure, assume that after multiple selections the confidence interval corresponding to information C has the largest upper bound, and its lower bound is higher than the upper bounds of the other confidence intervals. In this case the confidence interval corresponding to information C will keep being selected, so convergence has been reached. Taking the confidence interval of information C as the first confidence interval, and the confidence intervals of information A and information B as the (N-1) other confidence intervals, the first confidence interval is determined to satisfy the information recommendation condition. Conversely, if the lower bound of the first confidence interval is smaller than the upper bound of any of the (N-1) confidence intervals, convergence has not yet been reached, and the first confidence interval does not satisfy the information recommendation condition.

In the embodiment of the application, a method for determining target recommendation information based on the lower bound of a confidence interval is provided. With this method, once the lower bound of one confidence interval is greater than or equal to the upper bounds of all other confidence intervals, that interval is considered to have converged: subsequent selections will also tend to choose it, so the information recommendation condition is reached. This improves the feasibility and operability of the scheme.

the server determines N probability distributions according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, wherein each probability distribution corresponds to one piece of information to be recommended;

the server acquires a random probability value from each probability distribution in the N probability distributions to obtain N random probability values;

the server determines a first random probability value from the N random probability values, wherein the first random probability value is the maximum value of the N random probability values;

and the server determines the information to be recommended corresponding to the first random probability value as target recommendation information.
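The four steps above can be sketched in Python with the standard library's `random.Random.betavariate`. As an assumption, item i's distribution is taken to be Beta(1 + conversions, 1 + recommendations - conversions), which follows from the prior α = β = 1 and the update rule described in this embodiment; `thompson_select` is a hypothetical name.

```python
import random
from typing import List, Optional


def thompson_select(
    recs: List[int], convs: List[int], rng: Optional[random.Random] = None
) -> int:
    """Return the index of the item whose Beta sample is largest.

    Assumes a Beta(1, 1) prior, so item i's distribution is
    Beta(1 + convs[i], 1 + recs[i] - convs[i]).
    """
    if rng is None:
        rng = random.Random()
    # One random probability value per probability distribution.
    samples = [rng.betavariate(1 + c, 1 + (n - c)) for n, c in zip(recs, convs)]
    # The first random probability value is the maximum of the N samples.
    return max(range(len(samples)), key=samples.__getitem__)
```

With the counts 200/20, 500/100 and 800/50 for information A, B and C, almost every call returns 1 (information B), while A and C still have a small chance of being drawn. That residual chance is the exploration the method relies on.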

In this embodiment, a way of determining target recommendation information based on the Thompson Sampling algorithm is provided. The Thompson Sampling algorithm builds a Beta distribution for each candidate option and sets the initial values of the parameters α and β from prior experience, for example, both initialized to 1. Each Beta distribution then generates a random value, the option with the largest value is selected as the current choice, α and β are adjusted according to the actual outcome, and the whole process is repeated.

Specifically, for ease of introduction, refer to fig. 9, which is a schematic diagram of determining target recommendation information based on the Thompson Sampling algorithm in the embodiment of the present application. As shown in the figure, take the N pieces of information to be recommended as information A, information B, and information C to be recommended. Whether each piece of information generates a benefit is governed by an underlying probability distribution: the probability of generating a benefit is p, and through repeated trials a high-confidence estimate of the distribution of p can be obtained, which approximately solves the problem. Assuming that p follows a Beta(α, β) distribution, each piece of information to be recommended maintains its own distribution: information A follows probability distribution A, information B follows probability distribution B, and information C follows probability distribution C.

Based on this, when selecting information to be recommended, one random probability value is drawn from each probability distribution: for example, random probability value a from probability distribution A, random probability value b from probability distribution B, and random probability value c from probability distribution C. If c is the largest, it is determined as the first random probability value, and the server determines the information to be recommended corresponding to it, namely information C, as the target recommendation information.

In the actual pushing process, one piece of information to be recommended is selected each time. If the user performs a conversion behavior on it, the parameter α is increased by 1; otherwise, the parameter β is increased by 1. After multiple selections, the probability distribution of each piece of information to be recommended becomes narrower and narrower. For ease of understanding, refer to fig. 10, which is a schematic diagram of updating the probability distributions in the embodiment of the present application. As shown in the figure, all N probability distributions have narrowed, probability distribution C most noticeably, which basically matches expectations.
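The α/β update rule and the narrowing it produces can be illustrated with a small Python sketch. The `BetaArm` class is hypothetical; it uses the initial values α = β = 1 from the text and measures "narrowness" by the standard Beta variance αβ / ((α+β)²(α+β+1)).

```python
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta(alpha, beta) belief about one item's conversion probability."""

    alpha: float = 1.0  # initial value from the text
    beta: float = 1.0

    def update(self, converted: bool) -> None:
        # Conversion: alpha += 1; otherwise beta += 1.
        if converted:
            self.alpha += 1
        else:
            self.beta += 1

    def variance(self) -> float:
        # Variance of Beta(a, b) = a*b / ((a+b)^2 (a+b+1)); smaller means narrower.
        s = self.alpha + self.beta
        return self.alpha * self.beta / (s * s * (s + 1))


arm = BetaArm()
print(round(arm.variance(), 4))  # 0.0833 for the Beta(1, 1) prior
for converted in [True, False, False, False, True, False, False, False, False, False]:
    arm.update(converted)
print(arm.alpha, arm.beta, round(arm.variance(), 4))  # 3.0 9.0 0.0144
```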

It should be noted that the target recommendation information may be selected by both the UCB algorithm and the Thompson Sampling algorithm, with the final choice decided by priority. For example, if the UCB algorithm has higher priority than the Thompson Sampling algorithm, and the UCB algorithm selects information A while the Thompson Sampling algorithm selects information B, information A is finally used as the target recommendation information. Conversely, if the Thompson Sampling algorithm has higher priority, then in the same situation information B is finally used as the target recommendation information.

Secondly, in the embodiment of the application, a method for determining target recommendation information based on the Thompson Sampling algorithm is provided. For each piece of information to be recommended, a conversion probability is sampled from the probability distribution (probability density function) maintained for its conversion behavior. This realizes both exploration and exploitation when selecting the target recommendation information, thereby improving the feasibility and operability of the scheme.

Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, after the server sends the first recommendation result to the terminal device, so that the terminal device displays the first recommendation result, the method further includes the following steps:

if the server receives response information sent by the terminal equipment, the server updates the historical recommendation number and the historical converted number corresponding to the first recommendation result;

and if the server does not receive the response information sent by the terminal equipment, the server updates the historical recommendation number corresponding to the first recommendation result.

In this embodiment, a method for feeding back a conversion behavior based on a first recommendation result is described. After the server determines the first recommendation result, the first recommendation result may be sent to the terminal device used by the new user, and the terminal device displays the first recommendation result. The new user may interact with the first recommendation result, for example, click on the first recommendation result, view the first recommendation result, or comment on the first recommendation result.

Specifically, if the user browses the advertisement of a commodity, adds the commodity to a shopping cart, follows the shop selling the commodity, purchases the commodity, or reviews it, the terminal device feeds back response information to the server. The response information carries the user identifier of the new user (e.g., a device identifier or a temporarily assigned identifier) and the commodity identifier, and the server updates the historical recommendation number and the historical converted number corresponding to the first recommendation result according to the conversion behavior. Conversely, if the user gives no feedback on the first recommendation result or directly selects the "not interested" option, the server either receives no response information or receives an information rejection request from the terminal device, and in that case it updates only the historical recommendation number corresponding to the first recommendation result.

For ease of understanding, take information A as the first recommendation result, with a historical recommendation number of 200 and a historical converted number of 20. If the server receives response information from the terminal device, the historical recommendation number of information A is updated to 201 and its historical converted number to 21. Conversely, if the server receives no response information, the historical recommendation number of information A is updated to 201 and its historical converted number remains 20.
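The counter update in this example can be written as a short Python sketch. `ItemStats` and `record_feedback` are hypothetical names; the rule itself (every push increments the recommendation number, only a response increments the converted number) is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    recommendations: int  # historical recommendation number
    conversions: int      # historical converted number


def record_feedback(stats: ItemStats, response_received: bool) -> None:
    """Every push counts as a recommendation; only a response counts as a conversion."""
    stats.recommendations += 1
    if response_received:
        stats.conversions += 1


a = ItemStats(200, 20)
record_feedback(a, response_received=True)
print(a)  # ItemStats(recommendations=201, conversions=21)
```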

It can be understood that, after receiving the response information, the server may further extract the following information from it as a user profile of the new user, so as to optimize subsequent information recommendation for the new user.

1. Device information: the response information may also carry device information of the terminal device, such as the brand and model of the mobile phone. Users of different brands may differ statistically; for example, most users of one brand may be students, while most users of another brand may be business people. Even different models of the same brand may differ; for example, one model may mainly target female users, and another may mainly target children.

2. Location information: the response information may also carry location information of the terminal device, including but not limited to province, city, street, or business district. Based on valid location information, local information can be recommended to achieve personalized recommendation. The location information may be obtained by, but not limited to, Global Positioning System (GPS) positioning, base-station positioning, Assisted Global Positioning System (AGPS) positioning, and Wireless Fidelity (WiFi) positioning.

3. Network information: the response information may also carry network information of the terminal device, for example, whether it is connected to a fourth-generation (4G) mobile network, a fifth-generation (5G) mobile network, or WiFi. The WiFi information can also be used to determine location and to identify users on the same WiFi network. If only a few users share the same WiFi network, they may be closely related to one another, and this relationship can be mined further for personalized recommendation.

Further, in the embodiment of the application, a method for feeding back conversion behavior based on the first recommendation result is provided. After the first recommendation result is pushed to a new user, the new user's feedback on it can be captured to determine whether a conversion behavior occurred. This feedback helps the recommendation system learn the new user's needs, making it easier to push content that better matches the new user's preferences later and thereby improving the accuracy of information pushing.

Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the method further includes the following steps:

for each of the N pieces of information to be recommended, the server acquires its updated historical recommendation number and updated historical converted number;

the server determines N updated conversion rates according to the updated historical recommendation number and updated historical converted number of each piece of information to be recommended, where the N updated conversion rates include the updated conversion rate of each piece of information to be recommended;

the server determines the information to be recommended corresponding to the maximum of the N updated conversion rates as a second recommendation result;

and the server sends the second recommendation result to the terminal equipment so that the terminal equipment displays the second recommendation result.

In this embodiment, a way of dynamically adjusting historical data in a non-cold-start scenario is introduced. As can be seen from the foregoing embodiments, the conversion behavior may include a variety of behaviors, and for convenience of description, the following description will use "click information and jump to a specific page" as a conversion behavior.

Specifically, take the N pieces of information to be recommended as information A, information B, and information C to be recommended. Table 2 illustrates the historical recommendation number and historical converted number of each piece of information to be recommended.

TABLE 2

Information to be recommended	Number of historical recommendations	Number of historical conversions
Information A to be recommended	200	20
Information B to be recommended	500	100
Information C to be recommended	800	50

Based on Table 2, the conversion rate of each piece of information to be recommended is calculated from its historical recommendation number and historical converted number by the following formula:

conversion rate = (historical converted number / historical recommendation number) × 100%;

The conversion rate of information A is thus (20/200) × 100% = 10%, that of information B is (100/500) × 100% = 20%, and that of information C is (50/800) × 100% = 6.25%. Based on this, information B may be determined as the target recommendation information. After information B is determined as the target recommendation information, the first recommendation result is determined using the first probability value (ε) and the second probability value (1 - ε). The manner of determining the first recommendation result is described in the foregoing embodiments and is not repeated here.
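The calculation above, followed by the ε step, can be sketched in Python. As an assumption, ε is read as the probability of exploring, in which case all N items are drawn with equal probability (consistent with the equal-probability exploration described in the abstract), and 1 - ε as the probability of pushing the target recommendation information. The function names are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

# Table 2: item -> (historical recommendation number, historical converted number)
TABLE_2: Dict[str, Tuple[int, int]] = {"A": (200, 20), "B": (500, 100), "C": (800, 50)}


def conversion_rates(table: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    # conversion rate = converted number / recommendation number
    return {item: conv / rec for item, (rec, conv) in table.items()}


def first_recommendation(
    table: Dict[str, Tuple[int, int]], epsilon: float, rng: Optional[random.Random] = None
) -> str:
    if rng is None:
        rng = random.Random()
    rates = conversion_rates(table)
    target = max(rates, key=rates.get)  # target recommendation information
    if rng.random() < epsilon:
        # Explore: every one of the N items is equally likely.
        return rng.choice(sorted(table))
    return target  # Exploit with probability 1 - epsilon.


print({k: round(v * 100, 2) for k, v in conversion_rates(TABLE_2).items()})
# {'A': 10.0, 'B': 20.0, 'C': 6.25}
```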

After the server sends the first recommendation result to the terminal device, the new user can choose, via the terminal device, whether to click the information and jump to the specific page. Clicking the first recommendation result counts as a conversion behavior; not clicking it does not. After the first recommendation result is pushed, the new user's preference for it can be captured, so that this preference can be given weight, in a certain proportion, in subsequent pushes. Two ways of pushing the second recommendation result are described below.

In a first mode, the second recommendation result is determined based on the overall data.

For convenience of introduction, on the basis of the contents shown in table 2, assuming that the first recommendation result is the information B to be recommended, after the information B to be recommended is recommended to the new user, the new user clicks and views the information B to be recommended, and jumps to a specific page, so as to obtain the updated historical recommendation number and the updated historical conversion number shown in table 3.

TABLE 3

Information to be recommended	Updated historical recommendation number	Updated historical converted number
Information A to be recommended	200	20
Information B to be recommended	501	101
Information C to be recommended	800	50

Based on Table 3, the updated conversion rate of each piece of information to be recommended is calculated from its updated historical recommendation number and updated historical converted number by the following formula:

updated conversion rate = (updated historical converted number / updated historical recommendation number) × 100%;

The updated conversion rate of information A is thus (20/200) × 100% = 10%, that of information B is (101/501) × 100% ≈ 20.2%, and that of information C is (50/800) × 100% = 6.25%. Based on this, information B may again be determined as the target recommendation information, and the second recommendation result is determined using the first probability value (ε) and the second probability value (1 - ε). The server then sends the second recommendation result to the terminal device, which displays it. The second recommendation result is determined in a manner similar to the first recommendation result, so the details are not repeated here.

In a second mode, a second recommendation result is determined based on the personalized data.

For ease of introduction, on the basis of Table 2, assume that the first recommendation result is information B, and that after it is recommended the new user clicks to view it and jumps to the specific page. This feedback is then weighted by a certain proportion, for example by a factor of 10, so that the recommendation number and the converted number of information B each increase by 10 instead of 1. This gives the updated historical recommendation number and updated historical converted number shown in Table 4.

TABLE 4

Information to be recommended	Updated historical recommendation number	Updated historical converted number
Information A to be recommended	200	20
Information B to be recommended	510	110
Information C to be recommended	800	50

The updated conversion rate of information A is thus (20/200) × 100% = 10%, that of information B is (110/510) × 100% ≈ 21.6%, and that of information C is (50/800) × 100% = 6.25%. Based on this, information B may again be determined as the target recommendation information, and the second recommendation result is determined using the first probability value (ε) and the second probability value (1 - ε). The server then sends the second recommendation result to the terminal device, which displays it. The second recommendation result is determined in a manner similar to the first recommendation result, so the details are not repeated here.

It is understood that this embodiment weights the increments of the converted number and the recommendation number by a factor of 10; in practical applications, other factors may be used, which is not limited herein.
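Both update modes can be reproduced with one Python function whose `weight` parameter is 1 in the first mode and 10 in the second. The names are hypothetical and the figures are those of Tables 2 to 4.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[int, int]]  # item -> (recommendation number, converted number)

TABLE_2: Table = {"A": (200, 20), "B": (500, 100), "C": (800, 50)}


def apply_feedback(table: Table, item: str, converted: bool, weight: int = 1) -> Table:
    """Return an updated copy; weight=1 is the first mode, weight=10 the second."""
    rec, conv = table[item]
    updated = dict(table)
    updated[item] = (rec + weight, conv + (weight if converted else 0))
    return updated


def best_item(table: Table) -> Tuple[str, float]:
    # Item with the highest conversion rate, and that rate in percent.
    rates = {item: conv / rec for item, (rec, conv) in table.items()}
    item = max(rates, key=rates.get)
    return item, round(rates[item] * 100, 1)


table_3 = apply_feedback(TABLE_2, "B", converted=True)             # B -> (501, 101)
table_4 = apply_feedback(TABLE_2, "B", converted=True, weight=10)  # B -> (510, 110)
print(best_item(table_3))  # ('B', 20.2)
print(best_item(table_4))  # ('B', 21.6)
```

Returning a copy keeps the Table 2 baseline intact, which makes it easy to compare the two weighting choices side by side.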

Furthermore, in the embodiment of the application, historical data is dynamically adjusted in a non-cold-start scenario. After the new user's feedback on the first recommendation result is obtained, it can be combined with the historical data of other users as the basis for inferring the new user's preferences, and the content of the recommendation result is updated accordingly, achieving a better recommendation effect. In addition, the conversion rates can be ranked so that multiple pieces of information are recommended.

Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the method further includes the following steps:

the server determines N updated probability distributions according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, wherein each updated probability distribution corresponds to one piece of information to be recommended;

for each of the N updated probability distributions, the server draws a random probability value from it, obtaining N updated random probability values;

the server determines a second random probability value from the N updated random probability values, wherein the second random probability value is the maximum value of the N updated random probability values;

the server determines the information to be recommended corresponding to the second random probability value as the second recommendation result.

In this embodiment, a method for dynamically adjusting probability distribution in a non-cold-start scenario is introduced. As can be seen from the foregoing embodiments, the conversion behavior may include a plurality of behaviors, and for convenience of description, the "click information and jump to a specific page" will be used as one conversion behavior.

Specifically, take the N pieces of information to be recommended as information A, information B, and information C to be recommended, and refer again to Table 2, Table 3, and Table 4. The updated historical recommendation numbers and updated historical converted numbers are obtained, and an updated Beta distribution is determined from them for each piece of information, giving N updated probability distributions. Based on these, the target recommendation information continues to be determined, and the second recommendation result is determined using the first probability value (ε) and the second probability value (1 - ε). The server then sends the second recommendation result to the terminal device, which displays it. The second recommendation result is determined in a manner similar to the first recommendation result, so the details are not repeated here.
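The text does not spell out how counts map to Beta parameters. The Python sketch below assumes the mapping implied by the update rule (prior α = β = 1, α += 1 per conversion, β += 1 per non-conversion): α = 1 + converted number and β = 1 + (recommendation number - converted number). It shows how the Table 4 update shifts information B's distribution. The function names are hypothetical.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[int, int]]  # item -> (recommendation number, converted number)


def beta_params(table: Table, prior: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """alpha = prior + conversions, beta = prior + (recommendations - conversions)."""
    return {item: (prior + conv, prior + rec - conv) for item, (rec, conv) in table.items()}


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


table_2 = {"A": (200, 20), "B": (500, 100), "C": (800, 50)}
table_4 = {"A": (200, 20), "B": (510, 110), "C": (800, 50)}
for name, table in (("Table 2", table_2), ("Table 4", table_4)):
    a, b = beta_params(table)["B"]
    print(name, a, b, round(beta_mean(a, b), 4))
# Table 2 101.0 401.0 0.2012
# Table 4 111.0 401.0 0.2168
```

The updated parameters would then feed the same sampling step used for the first recommendation result.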

Furthermore, in the embodiment of the application, probability distributions are dynamically adjusted in a non-cold-start scenario. After the new user's feedback on the first recommendation result is obtained, it can be combined with the historical data of other users as the basis for inferring the new user's preferences, and the content of the recommendation result is updated accordingly. In addition, the conversion rates can be ranked so that multiple pieces of information are recommended, achieving a better recommendation effect.

Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment of the information recommendation method provided in the embodiment of the present application, the first recommendation result includes at least one of advertisement information, electronic ticket information, friend information, video and audio information, text information, and web page information;

before the server sends the first recommendation result to the terminal device so that the terminal device displays the first recommendation result, the method further comprises the following steps:

the server obtains the user identifier of a user, where the user identifier does not match any stored user identifier;

and the server determines the terminal equipment according to the user identification of the user.

In this embodiment, a method for pushing a recommendation result to a new user in the new-user cold-start case is introduced. Before sending the first recommendation result to the terminal device, the server can determine whether the user of the terminal device is a new user. Normally, a new user's identifier fails to match any stored user identifier; in that case, the first recommendation result is selected by the information recommendation method provided by the application and then sent to the terminal device used by the new user. Conversely, an existing user's identifier matches a stored user identifier, so the existing user's related information can be extracted and used for personalized recommendation; that is, the new-user cold-start problem does not arise.
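The new-user check can be sketched as a small routing function in Python. `route_recommendation`, `cold_start` and `personalized` are hypothetical names standing in for the method of this application and for an existing personalized recommender, respectively.

```python
from typing import Callable, Set, TypeVar

R = TypeVar("R")


def route_recommendation(
    user_id: str,
    stored_ids: Set[str],
    cold_start: Callable[[], R],
    personalized: Callable[[str], R],
) -> R:
    """New users (identifier not stored) go through the cold-start strategy;
    existing users get personalized recommendation."""
    if user_id not in stored_ids:
        return cold_start()
    return personalized(user_id)


known = {"u-001", "u-002"}
print(route_recommendation("guest-42", known, lambda: "bandit", lambda u: f"profile:{u}"))  # bandit
print(route_recommendation("u-001", known, lambda: "bandit", lambda u: f"profile:{u}"))     # profile:u-001
```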

Specifically, the first recommendation result includes, but is not limited to, advertisement information, electronic ticket information, friend information, video and audio information, text information, and web page information, and the first recommendation result will be described below with reference to four specific examples.

Firstly, the first recommendation result is advertisement information;

For ease of understanding, refer to fig. 11, which is a schematic diagram of an interface for pushing advertisement information in the embodiment of the present application. As shown in the figure, taking a news application as an example, a new user logs in to the news application as a visitor and can browse news through it, while the application may also push advertisement information to the new user, for example an advertisement for an "X brand digital camera" with related text and pictures. If the new user is interested in the advertisement information, the user can click it to browse more content; if not, the user can simply ignore it or click the "not interested" button.

Secondly, the first recommendation result is electronic ticket information;

For ease of understanding, refer to fig. 12, which is a schematic diagram of an interface for pushing electronic ticket information in the embodiment of the present application. As shown in the figure, taking a shopping application as an example, a new user logs in to the shopping application as a visitor and can receive electronic ticket information through it. For example, the pushed electronic ticket information includes an electronic ticket for the restaurant "happy grilled fish", together with its coupon content, serial number, and expiration date. If the new user is interested in the electronic ticket information, the user can click it to view more details or directly click the "confirm coupon" button. If not, the user can simply ignore it.

Thirdly, the first recommendation result is video information;

For ease of understanding, refer to fig. 13, which is a schematic diagram of an interface for pushing video information in the embodiment of the present application. As shown in the figure, taking a video application as an example, a new user logs in to the video application as a visitor and can watch movies, TV series, variety shows, animations, and the like through it, while the application may also push video information to the new user, for example content related to the variety show "happy home" and its current episode. If the new user is interested in this video information, the user can click the video to watch it. If not, the user can simply ignore it or click the "not interested" button.

Fourthly, the first recommendation result is friend information;

For ease of understanding, refer to fig. 14, which is a schematic diagram of an interface for pushing friend information in an embodiment of the present application. As shown in the figure, taking a social application as an example, a new user logs in to the social application as a guest and can add friends or follow official accounts through it. If the new user is interested in the pushed friend information, the new user can click the "click to follow" button; if not, the new user can simply ignore it or click the "not interested" button.

Secondly, this embodiment of the application provides a method for pushing recommendation results to a new user under new-user cold start. With this method, different types of information can be pushed to the new user, which increases the flexibility of the scheme, and information the new user is likely to be interested in is recommended as far as possible, which improves the information recommendation effect.

Referring to fig. 15, fig. 15 is a schematic view of an embodiment of an information recommendation apparatus in an embodiment of the present application, and the information recommendation apparatus 20 includes:

an obtaining module 201, configured to obtain N pieces of information to be recommended, where N is an integer greater than or equal to 2;

the obtaining module 201 is further configured to obtain target recommendation information from the N pieces of information to be recommended, where the conversion rate corresponding to the target recommendation information is the maximum of the conversion rates corresponding to the N pieces of information to be recommended;

the obtaining module 201 is further configured to obtain a first probability value and a second probability value, where a sum of the first probability value and the second probability value is 1;

the determining module 202 is configured to determine a first recommendation result from the N pieces of information to be recommended according to a first probability value and a second probability value, where the first probability value represents a probability of determining the first recommendation result from the N pieces of information to be recommended with equal probability, and the second probability value represents a probability of using the target recommendation information as the first recommendation result;

the sending module 203 is configured to send the first recommendation result to the terminal device, so that the terminal device displays the first recommendation result.

In this embodiment of the application, an information recommendation device is provided. The device first acquires N pieces of information to be recommended, then acquires from them the target recommendation information, whose conversion rate is the maximum of the N conversion rates, and acquires a first probability value and a second probability value whose sum is 1. It then determines a first recommendation result from the N pieces of information to be recommended according to the first and second probability values, and finally sends the first recommendation result to a terminal device, which displays it. Using the multi-armed bandit (MAB) principle, the device exploits the conversion rates of the information to be recommended as known information in order to maximize benefit, while also treating all of the information to be recommended as candidates for exploring the user's interests and pushing them to the user with equal probability. This broadens the user's points of interest, so that information the user is actually interested in can be found over multiple rounds of recommendation, which improves the information recommendation effect.

Alternatively, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the information recommendation device 20 provided in the embodiment of the present application,

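the determining module 202 is specifically configured to obtain a first probability range corresponding to the first probability value and a second probability range corresponding to the second probability value; if the target probability value belongs to the first probability range, randomly acquire the first recommendation result from the N pieces of information to be recommended with equal probability; and if the target probability value belongs to the second probability range, determine the target recommendation information as the first recommendation result;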

In this embodiment of the application, an information recommendation device is provided that uses the first probability value and the second probability value to balance exploration and utilization, so as to maximize the return as far as possible. The exploration process does not rely only on past experience: it tries information to be recommended that may have a high potential return, that is, it is non-greedy and aims at long-term return. The utilization process follows the best strategy known so far and makes use of the target recommendation information already known to have a high return, that is, it is greedy and aims at short-term return. Combining the exploration process with the utilization process yields better recommendation results.
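The probability-range selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the target probability value is assumed to be a uniform draw from [0, 1), the first probability range is assumed to be [0, p1) and the second [p1, 1), and the helper name `select_first_result` is hypothetical.

```python
import random
from typing import Sequence


def select_first_result(conversion_rates: Sequence[float],
                        first_probability: float,
                        rng: random.Random) -> int:
    """Return the index of the first recommendation result (hypothetical helper).

    first_probability: probability of choosing uniformly among all N items.
    The second probability value is implicitly 1 - first_probability.
    """
    n = len(conversion_rates)
    if n < 2:
        raise ValueError("N must be at least 2")
    if not 0.0 <= first_probability <= 1.0:
        raise ValueError("first_probability must lie in [0, 1]")

    # Target recommendation information: the item with the largest conversion rate.
    target = max(range(n), key=lambda i: conversion_rates[i])

    # Assumption: the target probability value is a uniform draw from [0, 1).
    u = rng.random()
    if u < first_probability:
        # First probability range [0, first_probability): explore with equal probability.
        return rng.randrange(n)
    # Second probability range [first_probability, 1): exploit the target.
    return target


rng = random.Random(0)
picks = [select_first_result([0.05, 0.12, 0.08], 0.1, rng) for _ in range(10_000)]
share = picks.count(1) / len(picks)  # expected to be close to 0.9 + 0.1 / 3
```

With a first probability value of 0.1, the highest-converting item is shown roughly 93% of the time (0.9 from utilization plus 0.1/3 from exploration), while each of the other items still appears about 3% of the time.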

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the information recommendation device 20 provided in the embodiment of the present application,

the obtaining module 201 is specifically configured to obtain, for each of the N pieces of information to be recommended, the historical recommendation number and the historical converted number corresponding to that piece of information;

In this embodiment of the application, an information recommendation device is provided. With this device, even when no information about a new user can be obtained, the historical data of other users can still serve as a basis for inferring the new user's preferences. In addition, the conversion rates can be ranked so that several pieces of information are recommended, achieving a better recommendation effect.
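The conversion-rate step can be illustrated with a short sketch, assuming the conversion rate of each item is its historical converted number divided by its historical recommendation number. The function names are hypothetical, and treating a never-recommended item as having rate 0.0 is an assumption the document does not state.

```python
from typing import List, Sequence


def conversion_rates(recommended: Sequence[int], converted: Sequence[int]) -> List[float]:
    """Conversion rate per item = historical converted number / historical recommendation number."""
    if len(recommended) != len(converted):
        raise ValueError("recommended and converted must have the same length")
    rates = []
    for n_rec, n_conv in zip(recommended, converted):
        # Assumption: an item that has never been recommended gets a rate of 0.0.
        rates.append(n_conv / n_rec if n_rec > 0 else 0.0)
    return rates


def top_k_by_conversion(recommended: Sequence[int], converted: Sequence[int],
                        k: int = 1) -> List[int]:
    """Indices of the k items with the highest conversion rates (ties keep input order)."""
    rates = conversion_rates(recommended, converted)
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return order[:k]


# Index 0 of the result is the target recommendation information.
print(top_k_by_conversion([100, 50, 80], [5, 6, 4], k=2))  # [1, 0]
```

Taking k = 1 gives the single target recommendation information; a larger k corresponds to ranking the conversion rates to recommend several pieces of information.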

the obtaining module 201 is specifically configured to obtain N confidence intervals corresponding to N pieces of information to be recommended, where each confidence interval corresponds to one piece of information to be recommended;

This embodiment of the application provides an information recommendation device. With this device, the UCB (upper confidence bound) algorithm overcomes the limitations of the exploration process, and the gap between the optimal and the suboptimal information to be recommended can be estimated. The UCB algorithm can quickly find the optimal information to be recommended and, given a sufficient number of trials, obtains a more accurate result, which improves the reliability of information recommendation.
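The document does not give the width of the confidence intervals. One common choice, assumed in this sketch, is the UCB1-style half-width sqrt(c · ln T / n_i), where T is the total number of recommendations, n_i is the historical recommendation number of item i, and c = 2 gives textbook UCB1. The sketch builds one interval per item and picks the item with the largest upper bound; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def ucb_intervals(recommended: Sequence[int], converted: Sequence[int],
                  c: float = 2.0) -> List[Interval]:
    """One (lower, upper) confidence interval per item; the width formula is an assumption."""
    total = sum(recommended)
    intervals: List[Interval] = []
    for n_rec, n_conv in zip(recommended, converted):
        if n_rec == 0:
            # Never recommended: unbounded upper value, so it is tried first.
            intervals.append((0.0, math.inf))
            continue
        mean = n_conv / n_rec  # average reward (observed conversion rate)
        half_width = math.sqrt(c * math.log(total) / n_rec)
        intervals.append((mean - half_width, mean + half_width))
    return intervals


def pick_max_upper(intervals: Sequence[Interval]) -> int:
    """Index of the interval with the largest upper bound (the 'first confidence interval')."""
    return max(range(len(intervals)), key=lambda i: intervals[i][1])


iv = ucb_intervals([100, 10], [10, 2])
print(pick_max_upper(iv))  # 1: fewer trials give item 1 a wider, higher interval
```

Items with few historical recommendations get wide intervals and are therefore favored, which is how UCB folds exploration into a deterministic choice.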

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the information recommendation device 20 provided in the embodiment of the present application, the information recommendation device 20 further includes a processing module 204;

the determining module 202 is further configured to, after the obtaining module obtains the first confidence interval from the N confidence intervals, determine, as to-be-adjusted information, to-be-recommended information corresponding to the first confidence interval if the first confidence interval does not satisfy the information recommendation condition;

the obtaining module 201 is further configured to obtain a historical recommendation number and a historical converted number corresponding to the information to be adjusted;

the determining module 202 is further configured to determine an average reward corresponding to the information to be adjusted according to the historical recommendation number and the historical converted number corresponding to the information to be adjusted;

the processing module 204 is configured to adjust the first confidence interval according to the average reward corresponding to the information to be adjusted, so as to obtain a second confidence interval;

the obtaining module 201 is further configured to obtain a third confidence interval according to the second confidence interval and (N-1) confidence intervals, where an upper bound value of the third confidence interval is a maximum value of N upper bound values, each upper bound value of the N upper bound values corresponds to one confidence interval, and the (N-1) confidence intervals are confidence intervals excluding the second confidence interval from the N confidence intervals;

the determining module 202 is further configured to determine, if the third confidence interval meets the information recommendation condition, to-be-recommended information corresponding to the third confidence interval as target recommendation information.

In this embodiment of the application, an information recommendation device is provided. With this device, if the first confidence interval taken out in this round does not meet the information recommendation condition, its width is adjusted and the search for a suitable confidence interval continues until one is found. The whole process therefore achieves an optimal selection: the highest conversion rate can be estimated even under new-user cold start, and the target recommendation information is determined based on the UCB algorithm, which improves the feasibility and operability of the scheme.

the obtaining module 201 is further configured to obtain the total number of adjustments corresponding to the N confidence intervals after obtaining the first confidence interval from the N confidence intervals;

the determining module 202 is further configured to determine that the first confidence interval meets the information recommendation condition if the total number of adjustments is greater than or equal to the adjustment number threshold;

the determining module 202 is further configured to determine that the first confidence interval does not meet the information recommendation condition if the total number of adjustments is smaller than the adjustment number threshold.

In this embodiment of the application, an information recommendation device is provided. With this device, a total number of adjustments can be preset before the target recommendation information is estimated; once that total is reached, no further check is made as to whether other confidence intervals meet the information recommendation condition. This saves processing resources to a certain extent, improves the efficiency of determining the target recommendation information, and improves the online recommendation effect.

the obtaining module 201 is further configured to obtain a lower limit value of the first confidence interval;

the determining module 202 is further configured to determine that the first confidence interval meets the information recommendation condition if a lower limit of the first confidence interval is greater than or equal to an upper limit of each confidence interval in the (N-1) confidence intervals, where the (N-1) confidence intervals are confidence intervals excluding the first confidence interval from the N confidence intervals;

the determining module 202 is further configured to determine that the first confidence interval does not satisfy the information recommendation condition if the lower limit of the first confidence interval is smaller than the upper limit of any one confidence interval in the (N-1) confidence intervals.

In this embodiment of the application, an information recommendation device is provided. With this device, when the lower bound of a confidence interval is greater than or equal to the upper bounds of all other confidence intervals, the confidence interval can be considered to have converged, and subsequent selections would also tend to choose it; that is, the information recommendation condition is met. This improves the feasibility and operability of the scheme.
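The adjustment loop of the preceding embodiments can be sketched as follows. The document does not say how the average reward changes an interval, so the adjustment rule here is purely hypothetical: the interval is re-centred on the item's average reward (converted / recommended), its half-width is multiplied by a factor `shrink`, and the result is clipped to [0, 1]. The stop test combines the two conditions described above: the selected interval's lower bound is at least every other interval's upper bound, or the total number of adjustments has reached a threshold. All names and default values are illustrative.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def meets_condition(idx: int, intervals: Sequence[Interval]) -> bool:
    """Lower bound of interval idx >= the upper bound of every other interval."""
    lower = intervals[idx][0]
    return all(lower >= upper for j, (_, upper) in enumerate(intervals) if j != idx)


def find_target(intervals: Sequence[Interval], recommended: Sequence[int],
                converted: Sequence[int], max_adjustments: int = 10,
                shrink: float = 0.5) -> int:
    """Return the index of the target recommendation information (hypothetical helper)."""
    current: List[Interval] = list(intervals)
    adjustments = 0
    while True:
        # First (and later, third) confidence interval: the largest upper bound.
        idx = max(range(len(current)), key=lambda i: current[i][1])
        if adjustments >= max_adjustments or meets_condition(idx, current):
            return idx
        # Average reward of the information to be adjusted.
        n_rec = recommended[idx]
        avg = converted[idx] / n_rec if n_rec > 0 else 0.0
        # Hypothetical adjustment: re-centre on avg and shrink the half-width.
        lo, hi = max(current[idx][0], 0.0), min(current[idx][1], 1.0)
        half = (hi - lo) / 2 * shrink
        current[idx] = (max(avg - half, 0.0), min(avg + half, 1.0))
        adjustments += 1


# Item 1 starts with the highest upper bound but a low average reward (2/16), so it is
# adjusted downwards; item 0 then takes over and, once narrowed, dominates item 1.
print(find_target([(0.25, 0.5), (0.0, 0.75)], [8, 16], [3, 2]))  # 0
```

The adjustment budget guarantees termination even when the intervals never separate, which mirrors the resource-saving motivation given for the adjustment number threshold.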

In this embodiment of the application, an information recommendation device is provided. With this device, for the conversion behavior of each piece of information to be recommended, a probability value is sampled from the probability density function of that conversion behavior. Exploration and utilization are thereby realized and the target recommendation information is selected, which improves the feasibility and operability of the scheme.
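The sampling approach above corresponds to Thompson sampling. The document does not name the distribution; this sketch assumes the common choice for click/no-click conversions, a Beta distribution with a uniform prior, Beta(1 + converted, 1 + recommended - converted), per item. Samples are drawn with the standard-library `random.Random.betavariate`, and the item with the largest sampled value becomes the target recommendation information. The function name is hypothetical.

```python
import random
from typing import Sequence


def thompson_select(recommended: Sequence[int], converted: Sequence[int],
                    rng: random.Random) -> int:
    """Sample one conversion probability per item and return the index of the largest."""
    samples = []
    for n_rec, n_conv in zip(recommended, converted):
        if not 0 <= n_conv <= n_rec:
            raise ValueError("converted must lie between 0 and recommended")
        # Assumed Beta posterior: successes = conversions, failures = non-conversions.
        samples.append(rng.betavariate(1 + n_conv, 1 + n_rec - n_conv))
    # The first random probability value is the largest of the N samples.
    return max(range(len(samples)), key=lambda i: samples[i])


rng = random.Random(42)
choices = [thompson_select([200, 5], [20, 1], rng) for _ in range(1000)]
```

In the usage line, item 1 has only five recommendations, so its distribution is wide and it is still chosen in a noticeable share of rounds even though item 0 has far more evidence; this is how sampling balances exploration and utilization without an explicit probability value.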

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the information recommendation device 20 provided in the embodiment of the present application, the information recommendation device further includes a processing module 204;

the processing module 204 is configured to, after the sending module 203 sends the first recommendation result to the terminal device and the terminal device displays it, update the historical recommendation number and the historical converted number corresponding to the first recommendation result if response information sent by the terminal device is received;

the processing module 204 is further configured to, after the sending module 203 sends the first recommendation result to the terminal device and the terminal device displays it, update the historical recommendation number corresponding to the first recommendation result if no response information sent by the terminal device is received.

In this embodiment of the application, an information recommendation device is provided. With this device, after the first recommendation result is pushed to a new user, the new user's feedback on it can be captured to determine whether a conversion occurred. This feedback helps the recommendation system understand the new user's needs, so that content that better matches the new user's preferences can be pushed later, which improves the accuracy of information pushing.
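The update rule above can be sketched as follows, assuming that one display counts as one recommendation and that receiving response information counts as one conversion. `ItemHistory` and `record_feedback` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ItemHistory:
    """Historical counters kept for one piece of information to be recommended."""
    recommended: int = 0  # historical recommendation number
    converted: int = 0    # historical converted number


def record_feedback(history: ItemHistory, response_received: bool) -> None:
    """Update the counters after the first recommendation result has been displayed."""
    # The item was shown either way, so the recommendation number always grows.
    history.recommended += 1
    # Only response information from the terminal device counts as a conversion.
    if response_received:
        history.converted += 1


h = ItemHistory(recommended=10, converted=2)
record_feedback(h, response_received=True)   # user responded: 11 recommended, 3 converted
record_feedback(h, response_received=False)  # no response: 12 recommended, 3 converted
```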

the obtaining module 201 is further configured to obtain, for each to-be-recommended information of the N to-be-recommended information, an updated historical recommendation number and an updated historical converted number of each to-be-recommended information;

the determining module 202 is further configured to determine N updated converted rates according to the updated historical recommendation number and the updated historical converted number of each piece of information to be recommended, where the N updated converted rates include the updated converted rate of each piece of information to be recommended;

the determining module 202 is further configured to determine information to be recommended corresponding to a maximum value of the N updated converted rates as a second recommendation result;

the sending module 203 is further configured to send the second recommendation result to the terminal device, so that the terminal device displays the second recommendation result.

In this embodiment of the application, an information recommendation device is provided. With this device, after the new user's feedback on the first recommendation result is acquired, the feedback and the historical data of other users can jointly serve as a basis for inferring the new user's preferences, and the recommendation result is updated accordingly. In addition, the conversion rates can be ranked so that several pieces of information are recommended, achieving a better recommendation effect.
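A minimal sketch of deriving the second recommendation result from the updated counts, assuming the history is kept as a mapping from a hypothetical item id to (historical recommendation number, historical converted number). The function folds the feedback on the shown item into a copy of the counts, recomputes the updated conversion rates, and returns the top k items.

```python
from typing import Dict, List, Tuple


def second_results(history: Dict[str, Tuple[int, int]], shown: str,
                   responded: bool, k: int = 1) -> List[str]:
    """Apply feedback for `shown`, then rank items by updated conversion rate (hypothetical helper).

    history maps an item id to (historical recommendation number, historical converted number).
    """
    updated = dict(history)  # leave the caller's data untouched
    n_rec, n_conv = updated[shown]
    updated[shown] = (n_rec + 1, n_conv + (1 if responded else 0))

    def rate(item: str) -> float:
        r, c = updated[item]
        return c / r if r > 0 else 0.0

    # sorted() is stable, so ties keep the dictionary's insertion order.
    return sorted(updated, key=rate, reverse=True)[:k]


history = {"A": (100, 10), "B": (50, 6), "C": (20, 2)}
print(second_results(history, shown="C", responded=True, k=2))  # ['C', 'B']
```

Here the user's response lifts item C from 2/20 to 3/21, enough to overtake item B (6/50), so C becomes the second recommendation result.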

the determining module 202 is further configured to determine N updated probability distributions according to the historical recommendation number and the historical converted number corresponding to each piece of information to be recommended, where each updated probability distribution corresponds to one piece of information to be recommended;

the obtaining module 201 is further configured to obtain, for each updated probability distribution of the N updated probability distributions, a random probability value from each updated probability distribution to obtain N updated random probability values;

the determining module 202 is further configured to determine a second random probability value from the N updated random probability values, where the second random probability value is a maximum value of the N updated random probability values;

the determining module 202 is further configured to determine information to be recommended corresponding to the second random probability value as a second recommendation result;
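The sampling-based second round can be sketched in the same way, again assuming a Beta(1 + converted, 1 + non-converted) distribution per item (the document only says the distributions are determined by the historical recommendation number and the historical converted number). The hypothetical function folds the feedback into copies of the counts, draws one sample per updated distribution, and returns the index of the largest sample together with the updated counts.

```python
import random
from typing import List, Sequence, Tuple


def thompson_second_result(recommended: Sequence[int], converted: Sequence[int],
                           shown: int, responded: bool,
                           rng: random.Random) -> Tuple[int, List[int], List[int]]:
    """Fold feedback into the counts, resample every item, return (best index, counts)."""
    rec = list(recommended)
    conv = list(converted)
    rec[shown] += 1
    if responded:
        conv[shown] += 1
    # Assumed Beta(1 + converted, 1 + non-converted) as each updated probability distribution.
    samples = [rng.betavariate(1 + c, 1 + r - c) for r, c in zip(rec, conv)]
    # Second random probability value: the largest of the N updated samples.
    best = max(range(len(samples)), key=lambda i: samples[i])
    return best, rec, conv


best, rec, conv = thompson_second_result([1000, 1000], [10, 500], shown=0,
                                         responded=True, rng=random.Random(3))
# rec == [1001, 1000], conv == [11, 500]
```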

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the information recommendation device 20 provided in this embodiment of the application, the first recommendation result includes at least one of advertisement information, electronic ticket information, friend information, video and audio information, text information, and web page information;

the obtaining module 201 is further configured to obtain the user identifier of the user before the sending module 203 sends the first recommendation result to the terminal device for display, where the user identifier does not match any stored user identifier (that is, the user is a new user);

the determining module 202 is further configured to determine the terminal device according to the user identifier of the user.

In the embodiment of the application, the information recommendation device is provided, and by adopting the device, different types of information can be pushed for new users, so that the flexibility of the scheme is increased, and meanwhile, information which is possibly interested in the new users is recommended for the new users as much as possible, so that the information recommendation effect is improved.

The information recommendation device provided in this application can be deployed in a server or in a terminal device; deployment in a server is taken as an example here. Referring to fig. 16, fig. 16 is a schematic structural diagram of a server 30 according to an embodiment of the present application. The server 30 may include an input device 310, an output device 320, a processor 330, and a memory 340. The output device in the embodiments of the present application may be a display device.

Memory 340 may include read-only memory and random access memory, and provides instructions and data to processor 330. A portion of memory 340 may also include non-volatile random access memory (NVRAM).

Memory 340 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:

Operation instructions: various operation instructions for implementing various operations.

Operating system: various system programs for implementing various basic services and handling hardware-based tasks.

In this embodiment, the processor 330 is configured to:

determining a first recommendation result from the N information to be recommended according to a first probability value and a second probability value, wherein the first probability value represents the probability of determining the first recommendation result from the N information to be recommended with equal probability, and the second probability value represents the probability of taking the target recommendation information as the first recommendation result;

Processor 330 controls the operation of server 30 and may also be referred to as a central processing unit (CPU). In a specific application, the components of server 30 are coupled together by a bus system 350, which may include a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as bus system 350 in the figure.

The methods disclosed in the embodiments of the present application can be applied to, or implemented by, the processor 330. The processor 330 may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the above methods may be performed by integrated hardware logic circuits in the processor 330 or by instructions in the form of software. The processor 330 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in storage media well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 340, and the processor 330 reads the information in the memory 340 and performs the steps of the above methods in combination with its hardware.

The related description of fig. 16 can be understood with reference to the related description and effects of the method portion of fig. 3, and will not be described in detail herein.

Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.

Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.

With the research and progress of artificial intelligence technology, it has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for information recommendation, comprising:

determining a first recommendation result from the N information to be recommended according to the first probability value and the second probability value, wherein the first probability value represents a probability that the first recommendation result is determined from the N information to be recommended with equal probability, and the second probability value represents a probability that the target recommendation information is used as the first recommendation result;

and sending the first recommendation result to terminal equipment so that the terminal equipment displays the first recommendation result.

2. The method of claim 1, wherein the determining a first recommendation result from the N information to be recommended according to the first probability value and the second probability value comprises:

acquiring a first probability range corresponding to the first probability value and a second probability range corresponding to the second probability value;

if the target probability value belongs to the first probability range, randomly acquiring the first recommendation result from the N pieces of information to be recommended with equal probability;

and if the target probability value belongs to the second probability range, determining the target recommendation information as the first recommendation result.

3. The method according to claim 1, wherein the obtaining target recommendation information from the N pieces of information to be recommended includes:

aiming at each piece of information to be recommended in the N pieces of information to be recommended, acquiring a historical recommendation number and a historical converted number corresponding to each piece of information to be recommended;

and determining the information to be recommended corresponding to the maximum value of the N converted rates as the target recommendation information.

4. The method according to claim 1, wherein the obtaining target recommendation information from the N pieces of information to be recommended includes:

acquiring N confidence intervals corresponding to the N pieces of information to be recommended, wherein each confidence interval corresponds to one piece of information to be recommended;

and if the first confidence interval meets the information recommendation condition, determining the information to be recommended corresponding to the first confidence interval as the target recommendation information.

5. The method of claim 4, wherein after obtaining a first confidence interval from the N confidence intervals, the method further comprises:

if the first confidence interval does not meet the information recommendation condition, determining information to be recommended corresponding to the first confidence interval as information to be adjusted;

acquiring a historical recommendation number and a historical converted number corresponding to the information to be adjusted;

determining the average reward corresponding to the information to be adjusted according to the historical recommendation number and the historical converted number corresponding to the information to be adjusted;

adjusting the first confidence interval according to the average reward corresponding to the information to be adjusted to obtain a second confidence interval;

obtaining a third confidence interval according to the second confidence interval and (N-1) confidence intervals, wherein an upper bound value of the third confidence interval is the maximum value of N upper bound values, each upper bound value of the N upper bound values corresponds to one confidence interval, and the (N-1) confidence intervals are confidence intervals excluding the second confidence interval from the N confidence intervals;

and if the third confidence interval meets the information recommendation condition, determining the information to be recommended corresponding to the third confidence interval as the target recommendation information.

6. The method of claim 4, wherein after obtaining a first confidence interval from the N confidence intervals, the method further comprises:

acquiring the total adjustment times corresponding to the N confidence intervals;

if the total adjustment times are larger than or equal to an adjustment time threshold, determining that the first confidence interval meets the information recommendation condition;

and if the total adjustment times are smaller than the adjustment time threshold, determining that the first confidence interval does not meet the information recommendation condition.

7. The method of claim 4, wherein after obtaining a first confidence interval from the N confidence intervals, the method further comprises:

acquiring a lower limit value of the first confidence interval;

if the lower limit value of the first confidence interval is greater than or equal to the upper limit value of each confidence interval in (N-1) confidence intervals, determining that the first confidence interval meets the information recommendation condition, wherein the (N-1) confidence intervals are confidence intervals excluding the first confidence interval from the N confidence intervals;

and if the lower limit value of the first confidence interval is smaller than the upper limit value of any confidence interval in the (N-1) confidence intervals, determining that the first confidence interval does not meet the information recommendation condition.

8. The method according to claim 1, wherein the obtaining target recommendation information from the N pieces of information to be recommended includes:

for each of the N probability distributions, sampling a random probability value from that probability distribution to obtain N random probability values;

determining a first random probability value from the N random probability values, wherein the first random probability value is a maximum of the N random probability values;

and determining the information to be recommended corresponding to the first random probability value as the target recommendation information.
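Claim 8 reads like Thompson sampling. The sketch below is one possible instance under a stated assumption: the excerpt does not say which probability distributions are used, so each item is hypothetically given a Beta distribution built from its historical recommendation number and historical converted number (a Beta(1, 1) prior plus the observed conversions and non-conversions). The function name `thompson_select` is illustrative.

```python
import random
from typing import Optional, Sequence


def thompson_select(
    recommended: Sequence[int],
    converted: Sequence[int],
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one random probability value per item and return the index of the maximum.

    Assumes (hypothetically) that item i's probability distribution is
    Beta(converted[i] + 1, recommended[i] - converted[i] + 1).
    """
    if len(recommended) != len(converted):
        raise ValueError("recommended and converted must have the same length")
    for r, c in zip(recommended, converted):
        if not 0 <= c <= r:
            raise ValueError("each converted count must lie in [0, recommended count]")
    rng = rng or random.Random()
    # One random probability value per distribution: N random probability values.
    samples = [rng.betavariate(c + 1, r - c + 1) for r, c in zip(recommended, converted)]
    # The first random probability value is the maximum of the N samples.
    return max(range(len(samples)), key=samples.__getitem__)


rng = random.Random(0)
picks = [thompson_select([1000, 1000, 0], [900, 10, 0], rng) for _ in range(1000)]
```

Because an item with no history starts from a uniform Beta(1, 1), it still wins some draws, which is how this scheme keeps exploring while favoring items with a strong conversion history.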

9. The method according to any one of claims 1 to 8, wherein after sending the first recommendation result to a terminal device to enable the terminal device to display the first recommendation result, the method further comprises:

if response information sent by the terminal equipment is received, updating the historical recommendation number and the historical converted number corresponding to the first recommendation result;

and if the response information sent by the terminal equipment is not received, updating the historical recommendation number corresponding to the first recommendation result.
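A minimal sketch of the counter update in claim 9, assuming (hypothetically) that one display adds one to the historical recommendation number and that receiving response information counts as exactly one conversion. `ItemStats` and `record_feedback` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Hypothetical per-item counters for one piece of information to be recommended."""
    recommended: int = 0  # historical recommendation number
    converted: int = 0    # historical converted number


def record_feedback(stats: ItemStats, response_received: bool) -> None:
    """Update the counters after the first recommendation result has been displayed."""
    stats.recommended += 1      # the item was recommended whether or not a response arrives
    if response_received:
        stats.converted += 1    # response information from the terminal counts as one conversion


s = ItemStats()
record_feedback(s, True)
record_feedback(s, False)
print(s)  # ItemStats(recommended=2, converted=1)
```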

10. The method of claim 9, further comprising:

for each piece of information to be recommended in the N pieces of information to be recommended, acquiring the updated historical recommendation number and the updated historical converted number of that piece of information;

determining N updated conversion rates according to the updated historical recommendation number and the updated historical converted number of each piece of information to be recommended, wherein the N updated conversion rates comprise the updated conversion rate of each piece of information to be recommended;

determining the information to be recommended corresponding to the maximum value of the N updated conversion rates as a second recommendation result;

and sending the second recommendation result to the terminal equipment so that the terminal equipment displays the second recommendation result.
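Claim 10 amounts to a greedy choice over the updated counts. The sketch below assumes the conversion rate is the updated historical converted number divided by the updated historical recommendation number; the 0.0 rate for items that have never been recommended and the tie-breaking rule are illustrative choices, not taken from the source.

```python
from typing import Sequence, Tuple


def conversion_rate(recommended: int, converted: int) -> float:
    """Conversion rate = converted / recommended; never-recommended items get 0.0 (an assumption)."""
    return converted / recommended if recommended > 0 else 0.0


def greedy_select(stats: Sequence[Tuple[int, int]]) -> int:
    """Return the index of the item with the largest updated conversion rate.

    `stats[i]` is a hypothetical (updated recommended count, updated converted count) pair.
    Ties go to the lowest index because max() keeps the first maximum it sees.
    """
    rates = [conversion_rate(r, c) for r, c in stats]
    return max(range(len(rates)), key=rates.__getitem__)


print(greedy_select([(10, 2), (20, 8), (5, 1)]))  # 1  (rates 0.2, 0.4, 0.2)
```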

11. The method of claim 9, further comprising:

determining N updated probability distributions according to the updated historical recommendation number and the updated historical converted number corresponding to each piece of information to be recommended, wherein each updated probability distribution corresponds to one piece of information to be recommended;

for each of the N updated probability distributions, sampling a random probability value from that updated probability distribution to obtain N updated random probability values;

determining a second random probability value from the N updated random probability values, wherein the second random probability value is a maximum of the N updated random probability values;

and determining the information to be recommended corresponding to the second random probability value as a second recommendation result.

12. The method of claim 1, wherein the first recommendation result includes at least one of advertisement information, electronic ticket information, friend information, video and audio information, text information, and web page information;

before the sending of the first recommendation result to the terminal device to enable the terminal device to display the first recommendation result, the method further includes:

acquiring a user identifier of a user, wherein the user identifier does not match any stored user identifier;

and determining the terminal equipment according to the user identifier of the user.

13. An information recommendation apparatus, comprising:

an obtaining module, configured to obtain N pieces of information to be recommended;

the obtaining module is further configured to obtain target recommendation information from the N pieces of information to be recommended, wherein the conversion rate corresponding to the target recommendation information is the maximum of the conversion rates corresponding to the N pieces of information to be recommended;

the obtaining module is further configured to obtain a first probability value and a second probability value, wherein the sum of the first probability value and the second probability value is 1;

a determining module, configured to determine a first recommendation result from the N pieces of information to be recommended according to the first probability value and the second probability value, wherein the first probability value represents the probability that the first recommendation result is selected from the N pieces of information to be recommended with equal probability, and the second probability value represents the probability that the target recommendation information is used as the first recommendation result;

and a sending module, configured to send the first recommendation result to the terminal equipment so that the terminal equipment displays the first recommendation result.
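Claim 13 describes what is commonly called an epsilon-greedy policy: the first probability value plays the role of epsilon (uniform exploration over all N items) and the second probability value, 1 minus epsilon, selects the target recommendation information. The following is a minimal sketch under that reading; the function name `select_first_result` and the use of Python's `random` module are assumptions.

```python
import random
from typing import Optional, Sequence


def select_first_result(
    conversion_rates: Sequence[float],
    first_probability: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the index of the first recommendation result.

    With probability `first_probability`, one of the N items is drawn uniformly at
    random; otherwise (probability 1 - first_probability, the second probability
    value) the target recommendation information, i.e. the item with the largest
    conversion rate, is returned.
    """
    if not 0.0 <= first_probability <= 1.0:
        raise ValueError("first_probability must lie in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < first_probability:
        # Explore: every one of the N items is equally likely, including the target.
        return rng.randrange(len(conversion_rates))
    # Exploit: return the target recommendation information (maximum conversion rate).
    return max(range(len(conversion_rates)), key=conversion_rates.__getitem__)


rng = random.Random(42)
picks = [select_first_result([0.1, 0.3, 0.2], 0.1, rng) for _ in range(1000)]
```

Because the target recommendation information can also be drawn during uniform exploration, its overall chance of being the first recommendation result is the second probability value plus the first probability value divided by N.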

14. A server, comprising: a memory, a processor, and a bus system;

wherein the memory is configured to store a program;

the processor is configured to execute the program in the memory and, according to instructions in the program code, to perform the method of any one of claims 1 to 12.

15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.