WO2024012360A1

WO2024012360A1 - Data processing method and related apparatus

Info

Publication number: WO2024012360A1
Application number: PCT/CN2023/106278
Authority: WO
Inventors: Chen Bo; Qin Jiarui; Liu Weiwen; Tang Ruiming; Zhang Weinan; Yu Yong
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2022-07-11
Filing date: 2023-07-07
Publication date: 2024-01-18
Also published as: CN115293359A

Abstract

A data processing method, which can be applied to the field of artificial intelligence. The method comprises: predicting, according to a first training sample, first operation information of a user for an item by means of a first recommendation model, wherein the first operation information and second operation information are used for determining a first loss, the second operation information comprises information obtained according to an operation log of the user, and the first loss is used for updating the first recommendation model; and predicting, according to a second training sample, third operation information and fourth operation information of the user for the item by means of a second recommendation model and the updated first recommendation model, respectively, wherein the third operation information and the fourth operation information are used for determining a second loss, and the first recommendation model and the second recommendation model are ranking models at different stages of a multi-stage cascade recommendation system. In the present application, joint training is used so that the model of each stage focuses on fitting the data of its own stage while the upstream and downstream stages assist with training, thereby improving the prediction effect.

Description

A data processing method and related device

This application claims priority to the Chinese patent application filed with the China Patent Office on July 11, 2022, with the application number 202210810008.9 and the invention title "A data processing method and related devices", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence, and in particular, to a data processing method and related devices.

Background

Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Industrial information retrieval systems (such as recommendation systems, search engines, or advertising platforms) are designed to retrieve, from massive amounts of data (such as items, information, and advertisements), the data that users are most interested in and provide it to them. However, because of the information explosion on the Internet, major platforms generate millions of new pieces of information every day, which poses great challenges to information retrieval systems. In addition, because the response time users will accept is very short (tens of milliseconds), retrieving the data users find most interesting within such a short time has become the primary task of an information retrieval system.

Generally speaking, complex machine learning models can better model the relationship between users and items and therefore achieve better prediction accuracy, but they are often inefficient. Constrained by the latency requirements of online inference, they are harder to deploy and can score only a small number of items. Conversely, simple models have relatively low complexity, so scoring a large number of items is feasible in terms of efficiency; however, because of their low capacity, their prediction quality is often unsatisfactory. Building a multi-stage ranking system is therefore a common way for industrial information retrieval systems to balance prediction efficiency and effectiveness. A multi-stage ranking system divides the original single system into multiple stages: simple models can be deployed in the early stages to quickly filter out a large number of irrelevant candidate items, while complex models are usually placed in the later stages, closer to the user, to rank the remaining candidates more accurately.

However, when multi-stage ranking models are trained in the prior art, the recommendation model of each stage focuses only on training for its own stage and cannot fit the data of the inference space during training, so its prediction ability is poor.

Summary of the invention

This application provides a data processing method that uses joint training so that the model of each stage focuses on fitting the data of its own stage while the upstream and downstream stages assist its training, thereby improving the prediction effect.

In a first aspect, this application provides a data processing method. The method includes: predicting, based on a first training sample and by using a first recommendation model, first operation information of a user on an item, where the first training sample is attribute information of the user and the item, the first operation information and second operation information are used to determine a first loss, the second operation information includes information obtained from the user's operation log, and the first loss is used to update the first recommendation model; and predicting, based on a second training sample, third operation information and fourth operation information of the user on the item by using a second recommendation model and the updated first recommendation model, respectively, where the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages of a multi-stage cascade recommendation system, the third operation information and the fourth operation information are used to determine a second loss, and the second loss is used to update the updated first recommendation model.

Specifically, the updated first recommendation model obtained through its own training flow processes the second training sample to obtain the fourth operation information, while the higher-stage recommendation model (that is, the second recommendation model) processes the same sample to predict the third operation information, which serves as the supervision signal for the fourth operation information (that is, as the ground truth of the second training sample). In this way, guidance from the fine ranking model is added when the lower-stage recommendation model is trained, and the interaction information between different stages is used to obtain better performance without changing the system architecture or sacrificing inference efficiency.
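The two-step update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: both stage models are assumed to be plain logistic-regression scorers over the same numeric feature vector, both losses are assumed to be binary cross-entropy, and the names `LogisticModel` and `joint_training_step` and the learning rate are hypothetical. The second recommendation model is held fixed, and its predicted probabilities (the third operation information) are used as soft labels for the updated first model's predictions (the fourth operation information).

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LogisticModel:
    """Hypothetical stand-in for a ranking model: logistic regression on a feature vector."""
    weights: list[float]
    bias: float = 0.0

    def predict(self, x: list[float]) -> float:
        """Predicted probability that the user operates on (e.g. clicks) the item."""
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, x)) + self.bias)

    def sgd_step(self, batch: list[list[float]], targets: list[float], lr: float = 0.1) -> float:
        """One gradient step on mean binary cross-entropy; targets may be soft labels in [0, 1].

        Returns the loss measured before the update.
        """
        n = len(batch)
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        loss = 0.0
        for x, t in zip(batch, targets):
            p = self.predict(x)
            loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            err = p - t  # derivative of the BCE loss with respect to the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        self.weights = [w - lr * g for w, g in zip(self.weights, grad_w)]
        self.bias -= lr * grad_b
        return loss / n


def joint_training_step(first, second, first_samples, log_labels, second_samples, lr=0.1):
    # Step 1: first loss between the first model's predictions (first operation
    # information) and log-derived labels (second operation information).
    first_loss = first.sgd_step(first_samples, log_labels, lr)
    # Step 2: the higher-stage model predicts the third operation information, which
    # supervises the updated first model's predictions (fourth operation information).
    third_info = [second.predict(x) for x in second_samples]
    second_loss = first.sgd_step(second_samples, third_info, lr)
    return first_loss, second_loss


first = LogisticModel(weights=[0.0, 0.0])
second = LogisticModel(weights=[2.0, -1.0])  # assumed to be already trained
losses = joint_training_step(first, second, [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [[1.0, 1.0]])
print(losses)
```

In a production cascade the second model would usually be a deeper network with richer features and both steps would run over mini-batches; the sketch keeps only the data flow between the two losses.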

In the prior art, the recommendation model of each stage focuses only on training for its own stage and cannot fit the data of the inference space during training, so its prediction ability is poor. By contrast, this invention adopts joint training so that the model of each stage focuses on fitting the data of its own stage while the upstream and downstream stages assist its training, thereby improving the prediction effect. In addition, the multi-stage joint optimization proposed in the embodiments of this application is implemented as data exchange between different models without changing the training process of each model. It is therefore better suited to deployment in industrial systems and achieves better prediction results.

In a possible implementation, a multi-stage recommendation system often adopts an architecture of recall (also called matching), rough ranking, fine ranking, and rearrangement (re-ranking); alternatively, it may include only recall, rough ranking, and fine ranking, or any combination of at least two of these stages, which is not limited in this application. Rough ranking can sit between recall and fine ranking. The main goal of the rough ranking layer is to select, from a recalled candidate set of tens of thousands of items, the best subset of a few hundred candidates to pass to fine ranking, which ranks them further for output.

In a possible implementation, the first recommendation model may be a rough ranking model and the second recommendation model a fine ranking model; or the first recommendation model may be a recall model and the second recommendation model a fine ranking model; or the first recommendation model may be a recall model and the second recommendation model a rough ranking model; or the first recommendation model may be a fine ranking model and the second recommendation model a rearrangement model; or the first recommendation model may be a rough ranking model and the second recommendation model a rearrangement model; or the first recommendation model may be a recall model and the second recommendation model a rearrangement model.

In a possible implementation, during model inference, the operation information output by the converged first recommendation model is used to screen items, and the converged second recommendation model is used to predict the user's operation information for some or all of the screened items.

In a possible implementation, the converged second recommendation model is used to predict the user's operation information for all of the screened items (for example, when the first recommendation model is a rough ranking model and the second recommendation model is a fine ranking model).

In a possible implementation, the converged second recommendation model is used to predict the user's operation information for some of the screened items. For example, when the first recommendation model is a rough ranking model and the second recommendation model is a rearrangement model, items are first screened based on the prediction results of the first recommendation model, the fine ranking model then screens them further, and the second recommendation model makes predictions on the items screened by the fine ranking model.
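The inference-time screening described above can be sketched as a chain of (scorer, keep-k) stages. This is a hedged illustration only: the function name `cascade_rank`, the per-stage cut-off sizes, and the dictionary-based scores are assumptions chosen for brevity, standing in for the converged models' outputs.

```python
import heapq
from typing import Callable, Sequence

# A scorer maps an item ID to predicted operation information (e.g. a click probability).
Scorer = Callable[[str], float]


def cascade_rank(candidates: Sequence[str], stages: Sequence[tuple[Scorer, int]]) -> list[str]:
    """Pass candidates through successive ranking stages.

    Each stage scores only the items that survived the previous stage and keeps
    its top `keep_k`, so an item screened out early is never seen by later models.
    """
    survivors = list(candidates)
    for score, keep_k in stages:
        survivors = heapq.nlargest(keep_k, survivors, key=score)
    return survivors


rough_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}  # converged first model (hypothetical)
fine_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}   # converged second model (hypothetical)
print(cascade_rank(list(rough_scores), [(rough_scores.__getitem__, 3), (fine_scores.__getitem__, 2)]))
# → ['b', 'c']
```

Item "d" would score highest under the second model but is removed by the first stage, which illustrates why the stages benefit from being trained with awareness of one another.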

In a possible implementation, the complexity of the second recommendation model is greater than that of the first recommendation model, where complexity is related to at least one of the following: the number of parameters in the model, the depth of the network layers in the model, the width of the network layers in the model, and the number of feature dimensions of the input data.
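As a worked example of one of these complexity measures, the parameter count of a fully connected network grows with input dimension, depth, and width. The sketch below assumes both stage models are plain multilayer perceptrons; the layer sizes are hypothetical and are not taken from the application.

```python
def mlp_param_count(input_dim: int, hidden: list[int], output_dim: int = 1) -> int:
    """Number of weights plus biases in a fully connected network.

    Each layer mapping d_in -> d_out contributes d_in * d_out weights and d_out biases.
    """
    dims = [input_dim, *hidden, output_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


rough = mlp_param_count(32, [64])             # narrow, shallow, few input features
fine = mlp_param_count(128, [512, 256, 128])  # wider, deeper, more input features
print(rough, fine)  # → 2177 230401
```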

When the first recommendation model is trained, the first training sample is processed by the first recommendation model; that is, the first recommendation model predicts the user's first operation information on the item. The first training sample is attribute information of the user and the item.

When the first recommendation model is a model at an intermediate stage of a multi-stage ranking system, the items in the first training sample may be items screened by the recommendation model of the upstream stage.

The user's attribute information may be attributes related to the user's preference characteristics, including at least one of gender, age, occupation, income, hobbies, and education level. The gender may be male or female; the age may be a number between 0 and 100; the occupation may be teacher, programmer, chef, and so on; the hobbies may be basketball, tennis, running, and so on; and the education level may be elementary school, junior high school, high school, university, and so on. This application does not limit the specific type of the target user's attribute information.

The items may be physical items or virtual items, such as apps, audio and video, web pages, and news information. The attribute information of an item may be at least one of the item name, developer, installation package size, category, and favorable rating. Taking an application as an example, the category of the item may be chat, parkour games, office, and so on, and the favorable rating may be ratings, comments, and so on for the item. This application does not limit the specific type of the item's attribute information.
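One common way to represent such a training sample is to flatten the user and item attributes into "field=value" tokens that an embedding-based ranking model can consume. The sketch below is an assumption for illustration: the dataclasses, field names, and token format are hypothetical and are not specified by the application.

```python
from dataclasses import asdict, dataclass


@dataclass
class UserAttributes:  # hypothetical schema for the user attributes listed above
    gender: str
    age: int
    occupation: str
    income: str
    hobby: str
    education: str


@dataclass
class ItemAttributes:  # hypothetical schema for the item attributes listed above
    name: str
    developer: str
    package_size_mb: float
    category: str
    rating: float


def to_feature_tokens(user: UserAttributes, item: ItemAttributes) -> list[str]:
    """Flatten user and item attributes into 'field=value' tokens for one training sample."""
    fields = {f"user.{k}": v for k, v in asdict(user).items()}
    fields.update({f"item.{k}": v for k, v in asdict(item).items()})
    return [f"{k}={v}" for k, v in fields.items()]


user = UserAttributes("female", 28, "programmer", "medium", "tennis", "university")
item = ItemAttributes("NoteApp", "ExampleDev", 45.5, "office", 4.6)
print(to_feature_tokens(user, item)[:3])
# → ['user.gender=female', 'user.age=28', 'user.occupation=programmer']
```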

The first operation information predicted by the first recommendation model may be the type of behavioral operation the user performs on the item, or whether the user performs a certain type of operation. On an e-commerce platform, for example, the operation types may include browsing, clicking, adding to the shopping cart, and purchasing.

The second operation information can serve as the ground truth when the first recommendation model is trained. The items in the first training sample may include exposed items (that is, items that have been presented to the user) and unexposed items (that is, items that have not yet been presented to the user). For exposed items, the first recommendation model predicts the user's operation information on those items, and the corresponding part of the second operation information is the true value of that operation information. This part can be obtained from the interaction records between the user and the items (for example, the user's operation log), which may include the user's actual operation records on each item.

In a possible implementation, the first training sample is attribute information of the user, exposed items, and unexposed items, and the second operation information includes the user's predicted operation information for the unexposed items and the user's actual operation information for the exposed items, where the actual operation information is obtained from the user's operation log.

For unexposed items, the first recommendation model predicts the user's operation information on those items; correspondingly, the part of the second operation information that serves as the true value for unexposed items must itself be predicted (that is, it is the predicted operation information). Optionally, the predicted operation information indicates that the user has not performed any operation on the unexposed item (that is, unexposed samples are treated as negative samples), or it is obtained through another prediction model.
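The labeling rule above can be sketched as a small function that assembles the second operation information for one user. This is a hypothetical illustration: the function name, the set-based representation of the operation log, and the optional `pseudo_labeler` callable (standing in for "another prediction model") are assumptions.

```python
from typing import Callable, Iterable, Optional


def build_second_operation_info(
    items: Iterable[str],
    exposed: set[str],
    operated: set[str],
    pseudo_labeler: Optional[Callable[[str], float]] = None,
) -> dict[str, float]:
    """Ground-truth targets for the first model's training sample.

    exposed:  items already presented to the user
    operated: exposed items the user actually operated on, per the operation log
    pseudo_labeler: optional other prediction model for unexposed items; when it is
        None, unexposed items are treated as negatives (label 0.0)
    """
    labels = {}
    for item in items:
        if item in exposed:
            labels[item] = 1.0 if item in operated else 0.0  # actual operation information
        elif pseudo_labeler is None:
            labels[item] = 0.0  # predicted: no operation on the unexposed item
        else:
            labels[item] = pseudo_labeler(item)  # predicted by another model
    return labels


print(build_second_operation_info(["a", "b", "c"], exposed={"a", "b"}, operated={"a"}))
# → {'a': 1.0, 'b': 0.0, 'c': 0.0}
```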

In existing implementations, the recommendation model is trained on exposure data, yet during inference it must rank a large amount of unseen data. The data distribution during training therefore differs greatly from that during inference, which leaves the system in a suboptimal state. In the embodiments of this application, unexposed data is labeled (either by prediction or directly) and used to train the recommendation models of the multi-stage ranking system, which can improve model performance.

In a possible implementation, that the first training sample is attribute information of the user and items includes: the first training sample is attribute information of the user and N items, the first operation information is the user's operation information on the N items, and the first operation information is used to screen N1 items from the N items. The method further includes: predicting, based on attribute information of the user and some or all of the N1 items and by using a third recommendation model, fifth operation information of the user on some or all of the N1 items, where the fifth operation information and sixth operation information are used to determine a third loss, the sixth operation information includes information obtained from the user's operation log, and the third loss is used to update the third recommendation model to obtain the second recommendation model.
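The data flow for the third recommendation model can be sketched as follows: the first model's scores screen N1 of the N items, and each surviving item is paired with its log-derived target (the sixth operation information). This is a hedged sketch; the function name and the rule that items absent from the log count as negatives are assumptions. The third model's predictions (the fifth operation information) would then be compared with these targets, for example with a binary cross-entropy loss.

```python
import heapq


def stage_training_set(
    first_info: dict[str, float], n1: int, log_labels: dict[str, float]
) -> list[tuple[str, float]]:
    """Training targets for the third recommendation model.

    first_info: the first model's operation information for each of the N items
    n1: number of items kept by the first stage
    log_labels: sixth operation information taken from the user's operation log
                (items missing from the log are assumed to be negatives)
    """
    kept = heapq.nlargest(n1, first_info, key=first_info.__getitem__)
    return [(item, log_labels.get(item, 0.0)) for item in kept]


scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}
print(stage_training_set(scores, 2, {"d": 1.0}))  # → [('b', 0.0), ('d', 1.0)]
```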

In a second aspect, this application provides a data processing device, which includes:

The first prediction module is configured to predict, based on a first training sample and by using a first recommendation model, first operation information of a user on an item, where the first training sample is attribute information of the user and the item, the first operation information and second operation information are used to determine a first loss, the second operation information includes information obtained from the user's operation log, and the first loss is used to update the first recommendation model;

The second prediction module is used to predict the third operation information and the fourth operation information of the user on the item through the second recommendation model and the updated first recommendation model respectively according to the second training sample; the second training The samples are attribute information of users and items, the first recommendation model and the second recommendation model are ranking models at different stages in a multi-stage cascade recommendation system, and the third operation information and the fourth operation information are used to determine the second Loss; the second loss is used to update the updated first recommendation model.

In a possible implementation, the complexity of the second recommendation model is greater than the complexity of the first recommendation model; the complexity is related to at least one of the following:

The number of parameters included in the model, the depth of the network layers included in the model, the width of the network layers included in the model, and the number of feature dimensions of the input data.

In a possible implementation, the first training sample is attribute information of the user, exposed items, and unexposed items, and the second operation information includes the user's predicted operation information for the unexposed items and the user's actual operation information for the exposed items, where the actual operation information is obtained from the user's operation log; or,

The second training sample is attribute information of users, exposed items, and unexposed items.

In a possible implementation, the predicted operation information indicates that the user has not performed any operation on the unexposed item.

In a possible implementation, that the first training sample is attribute information of the user and items includes: the first training sample is attribute information of the user and N items, the first operation information is the user's operation information on the N items, and the first operation information is used to screen N1 items from the N items;

The device also includes:

The third prediction module is configured to predict, based on attribute information of the user and some or all of the N1 items and by using a third recommendation model, fifth operation information of the user on some or all of the N1 items, where the fifth operation information and sixth operation information are used to determine a third loss, the sixth operation information includes information obtained from the user's operation log, and the third loss is used to update the third recommendation model to obtain the second recommendation model.

In a possible implementation, the first recommendation model is a rough ranking model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a rough ranking model; or,

The first recommendation model is a fine ranking model, and the second recommendation model is a rearrangement model; or,

The first recommendation model is a rough ranking model, and the second recommendation model is a rearrangement model; or,

The first recommendation model is a recall model, and the second recommendation model is a rearrangement model.

In a possible implementation, the attribute information includes user attributes, and the user attributes include at least one of the following:

Gender, age, occupation, income, hobbies, education level.

In a possible implementation, the attribute information includes item attributes, and the item attributes include at least one of the following:

Item name, developer, installation package size, category, and rating.

In a third aspect, embodiments of the present application provide a data processing device, which may include a memory, a processor, and a bus system. The memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the method according to the first aspect or any optional implementation thereof.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method according to the first aspect or any optional implementation thereof.

In a fifth aspect, embodiments of the present application provide a computer program product including code that, when executed, implements the method according to the first aspect or any optional implementation thereof.

In a sixth aspect, the present application provides a chip system. The chip system includes a processor configured to support an execution device or a training device in implementing the functions in the foregoing aspects, for example, sending or processing the data or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the training device. The chip system may consist of chips, or may include chips and other discrete devices.

The embodiments of the present application provide a data processing method. The method includes: predicting, based on a first training sample and by using a first recommendation model, first operation information of a user on an item, where the first training sample is attribute information of the user and the item, the first operation information and second operation information are used to determine a first loss, the second operation information includes information obtained from the user's operation log, and the first loss is used to update the first recommendation model; and predicting, based on a second training sample, third operation information and fourth operation information of the user on the item by using a second recommendation model and the updated first recommendation model, respectively, where the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages of a multi-stage cascade recommendation system, the third operation information and the fourth operation information are used to determine a second loss, and the second loss is used to update the updated first recommendation model. In the prior art, the recommendation model of each stage focuses only on training for its own stage and cannot fit the data of the inference space during training, so its prediction ability is poor. By contrast, the present invention adopts joint training so that the model of each stage focuses on fitting the data of its own stage while the upstream and downstream stages assist its training, thereby improving the prediction effect. In addition, the multi-stage joint optimization proposed in the embodiments of this application is implemented as data exchange between different models without changing the training process of each model. It is therefore better suited to deployment in industrial systems and achieves better prediction results.

Description of drawings

Figure 1 is a structural schematic diagram of the main framework of artificial intelligence;

Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application;

Figure 3 is a schematic diagram of an information recommendation process provided by an embodiment of the present application;

Figure 4 is a schematic flow chart of a data processing method provided by an embodiment of the present application;

Figure 5 is a schematic flow chart of model training provided by an embodiment of the present application;

Figure 6 is a schematic diagram of a data processing device provided by an embodiment of the present application;

Figure 7 is a schematic diagram of an execution device provided by an embodiment of the present application;

Figure 8 is a schematic diagram of a training device provided by an embodiment of the present application;

Figure 9 is a schematic diagram of a chip provided by an embodiment of the present application.

Detailed description

The embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention. The terms used in the embodiments of the present invention are only used to explain specific embodiments of the present invention and are not intended to limit the present invention.

The embodiments of the present application are described below with reference to the accompanying drawings. Persons of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

The terms "first", "second", and so on in the description, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances and are merely a way of distinguishing objects with the same attributes when describing the embodiments of the present application. Furthermore, the terms "include", "have", and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of elements is not necessarily limited to those elements, but may include other elements that are not explicitly listed or that are inherent to such processes, methods, products, or devices.

First, the overall workflow of an artificial intelligence system is described. Refer to Figure 1, which is a structural schematic diagram of the main framework of artificial intelligence. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the condensation process of "data-information-knowledge-wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (provision and processing technology implementation) to the industrial ecosystem of the system.

(1) Infrastructure

Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and provides support through a basic platform. It communicates with the outside through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes related platform guarantees and support such as distributed computing frameworks and networks, which may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to smart chips in the distributed computing system provided by the basic platform for computation.

(2) Data

The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as Internet of Things data from traditional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing

Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.

Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.

Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.

Decision-making refers to the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, ranking, and prediction.

(4) General ability

After the data undergoes the processing described above, some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.

(5) Intelligent products and industry applications

Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and putting it into practical use. Their application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, and so on.

Embodiments of the present application can be applied to the field of information recommendation; specifically, they can be applied to application markets, music playback recommendation, video playback recommendation, reading recommendation, news information recommendation, and information recommendation in web pages. This application can be applied to a recommendation system, which can determine the objects to recommend based on the recommendation model obtained by the data processing method provided by this application. The recommended objects may be, for example but not limited to, applications (APPs), audio and video, web pages, news information, and other items.

In recommendation systems, information recommendation can include processes such as prediction and recommendation. Prediction addresses the problem of estimating the user's degree of preference for each item, which can be reflected by the probability that the user selects the item. Recommendation can involve ranking the recommended objects according to the prediction results, for example in descending order of predicted preference, and recommending information to the user based on the ranking results.

For example, in the application market scenario, the recommendation system can recommend applications to users based on the sorting results. In the music recommendation scenario, it can recommend music to users based on the sorting results. In the video recommendation scenario, it can recommend videos to users based on the sorting results.

Next, the application architecture of the embodiment of this application is introduced.

The system architecture provided by the embodiment of the present application will be introduced in detail below with reference to Figure 2. Figure 2 is a schematic diagram of the system architecture provided by an embodiment of the present application. As shown in Figure 2, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550 and a data collection system 560.

The execution device 510 includes a computing module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The computing module 511 may include the target model/rule 501. The preprocessing modules 513 and 514 are optional.

The data collection device 560 is used to collect training samples. In this embodiment of the present application, a training sample may be the user's historical operation record, such as the user's behavior logs. The historical operation record may include the user's operation information on items. The operation information may include the operation type, the user identifier, and the item identifier.

When the item is an e-commerce product, the operation type may include but is not limited to click, purchase, return, and add to shopping cart. When the item is an application, the operation type may include but is not limited to click and download.

The training samples are the data used to train the initialized recommendation model. After collecting the training samples, the data collection device 560 stores them in the database 530.

The training device 520 can train the initialized recommendation model based on the training samples maintained in the database 530 to obtain the target model/rule 501. In the embodiment of this application, the target model/rule 501 can be a multi-stage ranking model. The multi-stage ranking model can predict the user's operation information for the item based on the user and item information. The operation information can be used for information recommendation.

It should be noted that, in actual applications, the training samples maintained in the database 530 are not necessarily all collected by the data collection device 560. They may also be received from other devices, or obtained by data expansion based on the data collected by the data collection device 560 (for example, the second operation type of the target user on the first item in the embodiment of the present application).

In addition, the training device 520 does not necessarily train the target model/rule 501 entirely on the training samples maintained by the database 530. It may also obtain training samples from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.

The target model/rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 2. The execution device 510 may be a terminal, such as a mobile phone, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal. It may also be a server, a cloud, etc.

In Figure 2, the execution device 510 is configured with an input/output (I/O) interface 512 for data interaction with external devices. The user can input data to the I/O interface 512 through the client device 540.

The preprocessing module 513 and the preprocessing module 514 are used to preprocess the input data received by the I/O interface 512. It should be understood that there may be only one preprocessing module, or none at all. When neither preprocessing module 513 nor preprocessing module 514 exists, the computing module 511 can process the input data directly.

When the execution device 510 preprocesses the input data, or when its computing module 511 performs computation or other related processing, the execution device 510 can call data, code, etc. in the data storage system 550 for the corresponding processing. It can also store the data, instructions, etc. obtained by that processing in the data storage system 550.

Finally, the I/O interface 512 presents the processing results to the client device 540, thereby providing them to the user.

In the embodiment of the present application, the execution device 510 may include hardware circuits, or a combination of such circuits. Examples include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller.

For example, the execution device 510 may be a hardware system with the function of executing instructions, such as a CPU or a DSP. It may instead be a hardware system without that function, such as an ASIC or an FPGA. It may also be a combination of a hardware system without the function of executing instructions and a hardware system with it.

It should be understood that the execution device 510 may be a combination of a hardware system without the function of executing instructions and a hardware system with that function. Some steps of the data processing method provided by the embodiments of the present application may also be implemented by the hardware system in the execution device 510 that does not have the function of executing instructions. This is not limited here.

In the situation shown in Figure 2, the user can manually specify the input data through an interface provided by the I/O interface 512. In another case, the client device 540 can automatically send input data to the I/O interface 512. If this automatic sending requires the user's authorization, the user can set the corresponding permission in the client device 540. The user can view the results output by the execution device 510 on the client device 540, presented as a display, sound, action, etc.

The client device 540 can also act as a data collection terminal. It collects the input data fed into the I/O interface 512 and the output results from the I/O interface 512 as new sample data, and stores them in the database 530. Alternatively, the data may be collected without the client device 540. In that case, the I/O interface 512 directly stores its input data and output results in the database 530 as new sample data, as shown in the figure.

It is worth noting that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application. The positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 550 is an external memory relative to the execution device 510; in other cases, it may be placed inside the execution device 510. It should also be understood that the execution device 510 may be deployed in the client device 540.

Since the embodiments of the present application involve the application of a large number of neural networks, in order to facilitate understanding, the relevant terms involved in the embodiments of the present application and related concepts such as neural networks are first introduced below.

1. Click-through rate (CTR)

The click probability, also known as the click-through rate, is the ratio of the number of clicks to the number of exposures of recommended information (for example, recommended items) on a website or application. The click-through rate is usually an important metric for evaluating a recommendation system.
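The ratio above can be written as a one-line function. The sketch below is illustrative only: the function name is hypothetical, and returning 0.0 when nothing was exposed is an assumed convention, not something the source specifies.

```python
def click_through_rate(clicks: int, exposures: int) -> float:
    """CTR = clicks / exposures (hypothetical helper for illustration)."""
    if clicks < 0 or exposures < 0:
        raise ValueError("counts must be non-negative")
    if exposures == 0:
        # Assumed convention: an item that was never exposed has CTR 0.
        return 0.0
    return clicks / exposures

# Example: 30 clicks out of 1,000 exposures.
print(click_through_rate(30, 1000))  # 0.03
```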

2. Personalized recommendation system

A personalized recommendation system is a system that uses machine learning algorithms to analyze the user's historical data (such as the operation information in the embodiment of this application). On this basis, it predicts new requests and provides personalized recommendation results.

3. Offline training (offline training)

Offline training refers to a module in the personalized recommendation system that iteratively updates the recommendation model parameters according to the machine learning algorithm based on the user's historical data (such as the operation information in the embodiments of this application) until the set requirements are met.

4. Online prediction (online inference)

Online prediction uses the offline-trained model to predict, from the features of the user, the items, and the context, the user's degree of preference for the recommended items in the current context. That is, it predicts the probability that the user selects each recommended item.

For example, Figure 3 is a schematic diagram of a recommendation system provided by an embodiment of the present application. As shown in Figure 3, when a user enters the system, a recommendation request is triggered. The recommendation system inputs the request and its related information (such as the operation information in the embodiment of this application) into the recommendation model. The model predicts the user's selection rate for the items in the system.

The items are then sorted in descending order of the predicted selection rate, or of a function of the selection rate. The recommendation system displays them in that order at different positions as the recommendation result. The user browses the items at these positions and performs actions such as browsing, selecting, and downloading. Meanwhile, the user's actual behavior is stored in the log as training data. The offline training module continuously uses this data to update the parameters of the recommendation model and improve its prediction effect.

For example, when a user opens the application market on a smart terminal (for example, a mobile phone), the recommendation system of the application market can be triggered. The recommendation system predicts the probability that the user downloads each candidate APP. It bases this prediction on the user's historical behavior logs (such as historical download records and selection records), the features of the application market itself, and environmental feature information such as time and location. The recommendation system can then display the candidate APPs in descending order of predicted probability, thereby increasing their download probability.

For example, APPs with a higher predicted selection rate may be displayed in earlier recommendation positions, and APPs with a lower predicted selection rate in later positions.
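The final ordering step described above can be sketched as a descending sort. This sketch assumes the predicted download probabilities are already available as a dictionary. The APP IDs and scores are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, List

def rank_by_predicted_probability(predictions: Dict[str, float]) -> List[str]:
    """Return item IDs ordered from highest to lowest predicted selection probability."""
    # Iterating a dict yields its keys; each key is sorted by its predicted probability.
    return sorted(predictions, key=predictions.get, reverse=True)

# Hypothetical predicted download probabilities for four candidate APPs.
scores = {"app_a": 0.12, "app_b": 0.47, "app_c": 0.05, "app_d": 0.30}
print(rank_by_predicted_probability(scores))  # ['app_b', 'app_d', 'app_a', 'app_c']
```

Position 0 of the returned list corresponds to the front recommendation position.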

5. Multi-stage cascade sorting system

In the embodiments of this application, the multi-stage cascade sorting system may also be called a multi-stage sorting system. A commercial system contains a very large number of items, and the response time to a user request must be kept strictly within tens of milliseconds. For this reason, current commercial sorting systems are generally divided into multiple cascaded independent sorting systems. The output of each upstream system is used as the input of the downstream system, so items are filtered layer by layer. This reduces the number of items scored at each stage and balances the final prediction effect against the response delay.
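The layer-by-layer filtering can be sketched as follows. Each stage scores only the survivors of the previous stage and keeps its top candidates. The scoring lambdas are toy stand-ins for the recall, rough ranking, and fine ranking models, and the keep sizes are arbitrary; neither comes from the source.

```python
from typing import Callable, List, Sequence, Tuple

# Each stage is (scoring function, number of items to keep).
Stage = Tuple[Callable[[int], float], int]

def cascade_rank(items: Sequence[int], stages: Sequence[Stage]) -> List[int]:
    """Filter candidates layer by layer; each stage sees only the previous stage's output."""
    candidates = list(items)
    for score, keep in stages:
        # sorted(..., reverse=True) is stable, so ties keep their previous relative order.
        candidates = sorted(candidates, key=score, reverse=True)[:keep]
    return candidates

# Toy scorers standing in for increasingly complex models.
stages = [
    (lambda item: item, 6),      # early stage: keep 6 of 10
    (lambda item: item % 5, 3),  # middle stage: keep 3 of 6
    (lambda item: -item, 1),     # final stage: keep 1 of 3
]
print(cascade_rank(range(10), stages))  # [4]
```

Only the first stage scores all ten items; later, more expensive stages score progressively fewer. This is the cost saving the cascade is designed for.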

The above recommendation model may be a neural network model. The relevant terms and concepts of neural networks that may be involved in the embodiments of this application are introduced below.

(1) Neural network

A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ (that is, input data) and an intercept of 1 as inputs. The output of the operation unit may be:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

Here, $s = 1, 2, \ldots, n$, where n is a natural number greater than 1. $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network by converting the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.

A neural network is formed by connecting many such neural units, so the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that field. A local receptive field can be a region composed of several neural units.
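A single neural unit can be computed directly from the formula above. This minimal sketch assumes the sigmoid activation that the text mentions as one option. The function names are illustrative.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output h = f(sum_s W_s * x_s + b), with f assumed to be the sigmoid."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b  # weighted sum plus bias
    return sigmoid(z)

# 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0, and sigmoid(0) = 0.5.
print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```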

(2) Deep neural network

A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers. There is no particular threshold for "many" here. By position, the layers of a DNN fall into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected; that is, every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.

Although a DNN looks complicated, the work of each layer is simple. Each layer computes the linear relationship $\vec{y} = \alpha(W\vec{x} + \vec{b})$. Here $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to its input vector $\vec{x}$ to obtain the output vector $\vec{y}$.

Because a DNN has many layers, it also has a large number of coefficients W and offset vectors $\vec{b}$. These parameters are defined as follows, taking the coefficient W as an example. In a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W_{24}^{3}$. The superscript 3 indicates the layer in which the coefficient W is located. The subscripts give the output index 2 in the third layer and the input index 4 in the second layer. In general, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$. Note that the input layer has no W parameter. In a deep neural network, more hidden layers make the network better able to model complex real-world situations.
Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
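The per-layer computation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be sketched with NumPy. This sketch assumes a ReLU activation for concreteness; the text leaves $\alpha$ unspecified. The layer sizes and all-ones weights are toy values chosen so the output is easy to check by hand. Indexing follows the $W_{jk}^{L}$ convention with 0-based indices.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def dnn_forward(x, weights, biases, activation=relu):
    """Apply y = activation(W @ x + b) layer by layer.

    weights[l][j, k] is the coefficient from neuron k of one layer to neuron j
    of the next layer, matching W^L_{jk} with 0-based indices.
    """
    h = x
    for W, b in zip(weights, biases):
        h = activation(W @ h + b)
    return h

# Three-layer DNN: 3 input neurons, 4 hidden neurons, 2 output neurons.
weights = [np.ones((4, 3)), 0.5 * np.ones((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(dnn_forward(np.ones(3), weights, biases))  # [6. 6.]

# W^3_{24}: from the 4th neuron of layer 2 to the 2nd neuron of layer 3.
w_3_24 = weights[1][1, 3]
```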

(3) Loss function

When training a deep neural network, we want its output to be as close as possible to the value we actually want to predict. We therefore compare the current network's predicted value with the desired target value, and update the weight vectors of each layer according to the difference between the two. There is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the deep neural network.

For example, if the network's predicted value is too high, the weight vectors are adjusted to lower the prediction. Adjustment continues until the network predicts the desired target value, or a value very close to it. It is therefore necessary to define in advance how to measure the difference between the predicted value and the target value. This is the role of the loss function (or objective function), an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a greater difference. Training the deep neural network thus becomes a process of reducing this loss as much as possible.

(4) Back propagation algorithm

The error back propagation (BP) algorithm can be used during training to correct the parameters of the initial model, so that the model's error loss becomes smaller and smaller. Specifically, the input signal is propagated forward to the output, which produces an error loss. The error loss information is then propagated backward to update the parameters of the initial model, so that the error loss converges. Back propagation is a backward pass driven by the error loss, and its aim is to obtain optimal model parameters, such as the weight matrices.
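The forward pass, loss, and backward update described in (3) and (4) can be shown on the smallest possible model. This minimal sketch assumes a single sigmoid unit trained with a cross-entropy loss by plain gradient descent. The learning rate, epoch count, and toy data are arbitrary choices. For this loss, the gradient with respect to the pre-activation is simply $(p - y)/n$.

```python
import numpy as np

def train_logistic_unit(X: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 200):
    """Gradient descent on one sigmoid unit; returns weights, bias, and loss history."""
    w = np.zeros(X.shape[1])  # initialization before the first update
    b = 0.0
    losses = []
    for _ in range(epochs):
        # Forward propagation: input -> prediction -> loss.
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)  # avoid log(0)
        losses.append(float(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))))
        # Backward propagation: for sigmoid + cross-entropy, dLoss/dz = (p - y) / n.
        grad_z = (p - y) / len(y)
        w -= lr * (X.T @ grad_z)
        b -= lr * float(grad_z.sum())
    return w, b, losses

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, losses = train_logistic_unit(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The first recorded loss is ln 2 ≈ 0.693, because all predictions start at 0.5. The loss then shrinks as the updates proceed.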

A common multi-stage cascade sorting system in the industry includes subsystems for four stages: recall, rough sorting, fine sorting, and rearrangement. The recall system in the earliest stage must score tens of thousands of items for each user request. The rough sorting and fine sorting stages only score thousands or hundreds of items, and the rearrangement stage closest to the user only scores dozens of items. Model complexity therefore increases from earlier to later stages: early-stage models are generally relatively simple, while late-stage models are very complex. Such a cascade effectively balances prediction effect against prediction delay, giving users a good experience.

Independently training each subsystem of the multi-stage cascade sorting system is currently the mainstream method in the industry. A machine learning model is trained independently for each of the recall, rough sorting, fine sorting, and rearrangement stages, and each trained model is deployed to its own stage for serving. The advantage of multi-stage independent training is that models at different stages are trained and deployed independently, so operation is simple. It is also easy to deploy, at each stage, a model with suitable complexity and prediction capability.

Next, the data processing method provided by the embodiment of the present application will be described by taking the model training stage as an example.

Referring to Figure 4, Figure 4 is a schematic diagram of an embodiment of a data processing method provided by an embodiment of the present application. As shown in Figure 4, a data processing method provided by an embodiment of the present application includes:

401. According to the first training sample, predict the user's first operation information on the item through the first recommendation model. The first training sample is attribute information of the user and the item. The first operation information and the second operation information are used to determine the first loss. The second operation information includes information obtained according to the user's operation log. The first loss is used to update the first recommendation model.

In a possible implementation, the execution subject of step 401 may be a terminal device. The terminal device may be, for example but not limited to, any of the following:

- a mobile or portable computing device (such as a smartphone), or a mobile phone;
- a personal computer, a server computer, a network PC, a minicomputer, or a mainframe computer;
- a handheld device (such as a tablet) or a laptop device;
- a multiprocessor system or a microprocessor-based system;
- a game console or controller, a set-top box, or programmable consumer electronics;
- a device with a wearable or accessory form factor (such as a watch, glasses, a headset, or earbuds);
- a distributed computing environment that includes any of the above systems or devices.

In a possible implementation, the execution subject of step 401 may be a server on the cloud side.

In a possible implementation, the first recommendation model and the second recommendation model may be two ranking models in a multi-stage ranking system. The multi-stage ranking system is divided into multiple cascaded independent recommendation models, and the output of each upstream recommendation model is used as the input of the downstream model. Each recommendation model can predict the user's operation on each item based on the attribute information of the user and the items, and the prediction results can be used to filter items. The downstream recommendation model then predicts the user's operation on each filtered item, based on the information of the user and the filtered items. Items are thus filtered layer by layer, which reduces the number of items scored at each stage and balances the final prediction effect against the response delay.

In a possible implementation, the first recommendation model and the second recommendation model may be any of the following pairs:

- the first is a rough ranking model and the second is a fine ranking model;
- the first is a recall model and the second is a fine ranking model;
- the first is a recall model and the second is a rough ranking model;
- the first is a fine ranking model and the second is a rearrangement model;
- the first is a rough ranking model and the second is a rearrangement model;
- the first is a recall model and the second is a rearrangement model.
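The six pairings listed above are exactly the (upstream, downstream) pairs that can be drawn from the four stages. The sketch below checks this, assuming the stage order recall, rough ranking, fine ranking, rearrangement. `itertools.combinations` preserves input order, so the first element of each pair is always the upstream model.

```python
from itertools import combinations

# Stage order in the cascade, from upstream (simplest) to downstream (most complex).
STAGES = ["recall", "rough ranking", "fine ranking", "rearrangement"]

# Each pair is (first recommendation model, second recommendation model).
pairs = list(combinations(STAGES, 2))
for first_model, second_model in pairs:
    print(f"first: {first_model:<14} second: {second_model}")
print(len(pairs))  # 6
```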

In a possible implementation, during model inference, the operation information output by the converged first recommendation model is used to screen items. The converged second recommendation model is then used to predict the user's operation information for some or all of the screened items.

In a possible implementation, the converged second recommendation model is used to predict the user's operation information for all of the screened items. This applies, for example, when the first recommendation model is a rough ranking model and the second recommendation model is a fine ranking model.

In a possible implementation, the converged second recommendation model is used to predict the user's operation information for only some of the screened items. For example, the first recommendation model may be a rough ranking model and the second a rearrangement model. One round of item screening is performed based on the first recommendation model's predictions, and the fine ranking model then screens further. The second recommendation model makes predictions only on the items screened by the fine ranking model.

In a possible implementation, the complexity of the second recommendation model is greater than that of the first recommendation model. Complexity is related to at least one of the following: the number of parameters in the model, the depth of the network layers in the model, the width of the network layers in the model, and the number of feature dimensions of the input data.
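For fully connected models, three of these complexity factors (input feature dimensions, depth, and width) jointly determine the fourth, the parameter count. This sketch assumes both models are plain multilayer perceptrons. The two configurations are hypothetical, not taken from the source.

```python
from typing import Sequence

def mlp_param_count(input_dim: int, hidden_widths: Sequence[int], output_dim: int = 1) -> int:
    """Number of weights plus biases in a fully connected network."""
    dims = [input_dim, *hidden_widths, output_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))

# Hypothetical early-stage model: 32 input features, one hidden layer of width 64.
rough = mlp_param_count(32, [64])
# Hypothetical later-stage model: 128 input features, two wider hidden layers.
fine = mlp_param_count(128, [256, 128])
print(rough, fine)  # 2177 66049
```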

When training the first recommendation model, the first training sample, which is attribute information of the user and the item, is processed by the first recommendation model. That is, the first recommendation model predicts the user's first operation information on the item.

The user's attribute information may be attributes related to the user's preference characteristics, including at least one of gender, age, occupation, income, hobbies, and education level. For example:

- gender may be male or female;
- age may be a number between 0 and 100;
- occupation may be teacher, programmer, chef, etc.;
- hobbies may be basketball, tennis, running, etc.;
- education level may be elementary school, junior high school, high school, university, etc.

This application does not limit the specific types of the target user's attribute information.

The second operation information can serve as the ground truth when training the first recommendation model. The items in the first training sample can include exposed items (items that have been presented to the user) and unexposed items (items not yet presented to the user). For exposed items, the first recommendation model predicts the user's operation information on those items. The part of the second operation information that serves as the true value for exposed items can be obtained from the interaction records between the user and the items, such as the user's operation log. The behavior log can include the user's real operation records on each item.

In a possible implementation, the first training sample includes attribute information of the user, the exposed items, and the unexposed items. The second operation information includes the user's predicted operation information on the unexposed items and the user's actual operation information on the exposed items. The actual operation information is obtained according to the user's operation log.

For unexposed items, the first recommendation model predicts the user's operation information on those items. The part of the second operation information that serves as the true value for unexposed items is obtained by prediction; that is, it is the predicted operation information. Optionally, the predicted operation information indicates that the user has not performed any operation on the unexposed items, which means unexposed samples are treated as negative samples. Alternatively, it is obtained through another prediction model.

The first operation information and the second operation information are used to determine the first loss, and the first loss can be used to update the first recommendation model.

The above training based on real operation logs can be called the self-learning flow. In the training data of the self-learning flow, the label Y of an exposed sample is provided by real user behavior, and an unexposed sample can be treated as a negative sample. The training loss function can therefore remain the same as in the independent training stage, namely the cross-entropy loss function. The self-learning flow lets each stage learn from and fit the data generated by the previous stage on its own, improving its prediction ability on the data it scores at the current stage. The loss function of the self-learning flow can be:

$$\mathcal{L}_{i}^{\text{self}} = -\sum_{j}\left[y_j \log R_i(x_j) + \left(1 - y_j\right)\log\left(1 - R_i(x_j)\right)\right]$$

The above formula is the cross-entropy loss function of the i-th stage model, a common binary classification loss in click-through rate prediction. Here $R_i(x_j)$ is the prediction score of the i-th stage model for the j-th sample, and $y_j$ is the true label of that sample.
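The self-learning flow's label rule and cross-entropy loss can be sketched as follows. The sketch assumes scores are probabilities in (0, 1), and it averages the cross-entropy over samples, which rescales the sum by a constant. The helper names and the three-sample example are hypothetical.

```python
import numpy as np

def self_learning_labels(exposed: np.ndarray, logged: np.ndarray) -> np.ndarray:
    """Exposed items keep their logged label; unexposed items become negatives (0)."""
    return np.where(exposed, logged, 0).astype(float)

def cross_entropy(pred: np.ndarray, labels: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy between R_i(x_j) and y_j, averaged over samples."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

# Three candidates scored by the stage-i model; the third was never shown to the user.
pred = np.array([0.9, 0.2, 0.1])         # R_i(x_j)
exposed = np.array([True, True, False])
logged = np.array([1, 0, 0])             # from the operation log; ignored when unexposed
labels = self_learning_labels(exposed, logged)
print(labels)                            # [1. 0. 0.]
print(round(cross_entropy(pred, labels), 4))  # 0.1446
```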

The first recommendation model can be trained iteratively for multiple times in the above manner to obtain the trained first recommendation model.

Similarly, the second recommendation model can also be trained in the self-learning flow. Specifically, the first training sample is attribute information of the user and N items. The first operation information is the user's operation information on the N items, and it is used to filter N1 items from the N items. A third recommendation model then predicts the user's fifth operation information on some or all of the N1 items, based on the attribute information of the user and of those items. The fifth operation information and sixth operation information are used to determine a third loss, where the sixth operation information includes information obtained according to the user's operation log. The third loss is used to update the third recommendation model to obtain the second recommendation model.

402. According to the second training sample, predict the user's third operation information and fourth operation information on the item through the second recommendation model and the updated first recommendation model, respectively. The second training sample is attribute information of the user and the item. The first recommendation model and the second recommendation model are ranking models of different stages in a multi-stage cascade recommendation system. The third operation information and the fourth operation information are used to determine a second loss, and the second loss is used to update the updated first recommendation model.

In a possible implementation, the second recommendation model predicts the user's third operation information on the item based on the second training sample. The updated first recommendation model predicts the user's fourth operation information on the item based on the same second training sample.

In a possible implementation, the tutor-coaching flow is trained after the self-learning flow is completed. Specifically, the label Y of the training data in the tutor-coaching flow is provided by the subsequent-stage model. The subsequent-stage model (a relatively complex model) plays the role of a teacher and passes interaction information to the current-stage model (a relatively simple model).

Specifically, the updated first recommendation model obtained through the self-learning flow processes the second training sample to obtain the fourth operation information. The supervision signal for the fourth operation information, that is, the true value for the second training sample, is obtained by prediction from a higher-order recommendation model. It is the third operation information, which the second recommendation model predicts from the second training sample.

Training the lower-order recommendation model with guidance from the later-stage model (such as the fine ranking model) exploits the interaction information between stages. As a result, better performance can be achieved without modifying the system architecture or sacrificing inference efficiency.

Since the post-order model provides soft labels, the training loss function can consist of two parts. As shown in the following formula, the MSE loss performs point-wise learning of the post-order model's predicted values, and the ranking loss learns the post-order model's preference list (composed of the top-K ranked candidate items).

The above formula consists of two parts, $L_{\mathrm{ranking}}$ and $L_{\mathrm{mse}}$. $L_{\mathrm{mse}}$ is a loss function commonly used for regression tasks; it drives the score $R_i(x_j)$ that the $i$-th stage model assigns to sample $x_j$ toward the score $R_{i+1}(x_j)$ assigned by the $(i+1)$-th stage model. $L_{\mathrm{ranking}}$ is a list-wise loss for learning the preferences of the post-order model: for each request $q$, it maximizes the distance between the average score of the $K_i$ items that win at the current stage and the average score of the $K_{i-1}-K_i$ eliminated items.
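To make the two loss terms concrete, the following Python sketch computes them for a single request $q$. It illustrates the description under stated assumptions and is not the formula from the original filing: it assumes that the $K_i$ winning items are the top-$K_i$ candidates under the next-stage scores $R_{i+1}(x_j)$, that $L_{\mathrm{mse}}$ is averaged over the candidates, and that the distance in $L_{\mathrm{ranking}}$ is the plain difference of the two average scores. The function name tutor_loss is hypothetical.

```python
from statistics import fmean


def tutor_loss(student_scores, teacher_scores, k):
    """Sketch of L = L_ranking + L_mse for one request q.

    student_scores: R_i(x_j) for the K_{i-1} candidates that reach stage i.
    teacher_scores: R_{i+1}(x_j) for the same candidates (the soft labels).
    k: K_i, the number of candidates stage i passes on.
    Returns (total, l_ranking, l_mse).
    """
    n = len(student_scores)
    if len(teacher_scores) != n or not 0 < k < n:
        raise ValueError("scores must align and 0 < k < number of candidates")

    # L_mse: point-wise regression of the stage-i score onto the stage-(i+1) score.
    l_mse = fmean((s - t) ** 2 for s, t in zip(student_scores, teacher_scores))

    # Assumption: the K_i "winners" are the top-k candidates by teacher score;
    # the remaining K_{i-1} - K_i candidates are treated as eliminated.
    order = sorted(range(n), key=lambda j: teacher_scores[j], reverse=True)
    winners, eliminated = order[:k], order[k:]
    win_mean = fmean(student_scores[j] for j in winners)
    out_mean = fmean(student_scores[j] for j in eliminated)

    # L_ranking: minimising the negative gap maximises the distance between
    # the average stage-i scores of winners and of eliminated items.
    l_ranking = -(win_mean - out_mean)
    return l_ranking + l_mse, l_ranking, l_mse


total, l_rank, l_mse = tutor_loss([0.9, 0.2, 0.4, 0.1], [0.8, 0.3, 0.5, 0.0], k=2)
print(round(total, 4), round(l_rank, 4), round(l_mse, 4))
```

Because the raw difference in $L_{\mathrm{ranking}}$ is unbounded, a practical implementation would likely squash or weight it; the sketch keeps the plain form so that each term maps directly onto the description above.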

The following describes the flow of the data processing method in the embodiments of this application, taking as an example a multi-stage ranking system with four stages: recall, coarse ranking, fine ranking, and re-ranking (rearrangement).

First, the model of each of the four stages is trained independently on the original data set using a loss function (for example, the cross-entropy loss function).

Then, the following joint training steps are repeated until the performance of the model in the re-ranking stage (the last stage) converges:

a) Generate training data X for each stage.

b) Train the model of each stage (stages 1 to 4) through the self-learning flow.

c) Regenerate training data X for each stage's model, with soft labels Y generated by the next-stage model.

d) Train the model of each stage (stages 1 to 3) through the tutor-coaching flow.
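The schedule of independent warm-up followed by steps a) to d) can be sketched in Python as a control loop. This is a minimal sketch, assuming the per-stage work is supplied as callbacks: joint_training, warm_up, generate_data, self_learn, tutor_coach, and last_stage_metric are hypothetical names, and the convergence test (change in the last-stage metric below tol) is an assumption, since the text does not specify how convergence is measured.

```python
from typing import Callable


def joint_training(
    num_stages: int,
    warm_up: Callable[[int], None],
    generate_data: Callable[[int], object],
    self_learn: Callable[[int, object], None],
    tutor_coach: Callable[[int, object], None],
    last_stage_metric: Callable[[], float],
    tol: float = 1e-4,
    max_rounds: int = 50,
) -> int:
    """Run Phase I, then repeat steps a) to d) until the last stage converges.

    Returns the number of joint-training rounds executed.
    """
    stages = range(1, num_stages + 1)
    # Phase I: independent warm-up of every stage on the original data set.
    for s in stages:
        warm_up(s)
    prev = last_stage_metric()
    for round_idx in range(1, max_rounds + 1):
        # a) generate data X for each stage (the callback uses stage s-1's output).
        data = {s: generate_data(s) for s in stages}
        # b) self-learning flow for every stage: labels come from operation logs.
        for s in stages:
            self_learn(s, data[s])
        # c) regenerate X with the freshly updated models.
        data = {s: generate_data(s) for s in stages}
        # d) tutor-coaching flow for all but the last stage; the callback is
        #    expected to obtain soft labels from stage s + 1.
        for s in range(1, num_stages):
            tutor_coach(s, data[s])
        metric = last_stage_metric()
        if abs(metric - prev) < tol:
            return round_idx
        prev = metric
    return max_rounds
```

Keeping the loop agnostic to the model type mirrors the point made later in this description that joint optimization is realized as data exchange between models rather than by changing each model's own training procedure.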

Referring to Figure 5, Figure 5 is a schematic diagram of a training process of the multi-stage ranking model in the embodiment of the present application:

The whole process can be divided into two stages: independent training and joint training.

In the independent training phase (Phase I), the model of each stage is trained on the original exposure data set using a loss function (for example, the cross-entropy loss function). Independent training is essentially a model warm-up step that gives both upstream and downstream models basic ranking capability. This process is the same as the traditional independent training of multi-stage systems, as shown in the leftmost subfigure of Figure 5.

In the joint training phase (Phase II), the first step is to generate, for each stage, data X (without labels Y) suited to that stage. Following the characteristics of the cascade system, the data X of each stage is generated by the model of the preceding stage; for the first stage, which has no preceding stage, the data X is the same as in the independent training phase. Then, depending on the source of the labels Y, two different flows are designed for iterative joint training: the self-learning flow and the tutor-coaching flow.

Self-learning flow: the label Y corresponding to the training data X is derived from the user's operation log, as shown by the light-gray data flow in Figure 5.

Tutor-coaching flow: the label Y corresponding to the training data X is a soft label provided by the model of the next stage.
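How the data X of each stage is derived from the preceding stage can be sketched in Python. This sketch assumes a single user, so each stage's model reduces to a function from an item to a score, and it assumes fixed per-stage cut-offs; cascade_inputs, scorers, and keep are hypothetical names.

```python
from typing import Callable, Hashable, List, Sequence


def cascade_inputs(
    candidates: Sequence[Hashable],
    scorers: Sequence[Callable[[Hashable], float]],
    keep: Sequence[int],
) -> List[List[Hashable]]:
    """Return the input set X of every stage in a cascade.

    Stage 1 sees all candidates (as in independent training). Stage s > 1 sees
    the top keep[s-2] items of stage s-1's input, ranked by stage s-1's scorer.
    """
    if len(keep) != len(scorers) - 1:
        raise ValueError("need exactly one cut-off between consecutive stages")
    inputs = [list(candidates)]
    for scorer, k in zip(scorers[:-1], keep):
        # The preceding stage ranks its own input and passes on its top-k items.
        ranked = sorted(inputs[-1], key=scorer, reverse=True)
        inputs.append(ranked[:k])
    return inputs
```

For example, with three stages and cut-offs [3, 2], stage 2 receives the three items that stage 1 scores highest, and stage 3 receives the two of those that stage 2 scores highest.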

Next, the beneficial effects of the embodiments of this application are introduced through experiments:

Offline experiments were conducted on three public datasets: ML-1M, TianGong-ST, and Tmall.

The following are experimental results on recommendation and search tasks:

Table 1: Performance on ML-1M.

Table 2: Performance on TianGong-ST.

Table 3: Performance on Tmall.

Here are the results on the advertising task:

Table 4: Ads Performance on TianGong-ST(w/Bid).

Table 5: Ads Performance on Tmall(w/Bid).

The results on these different tasks show that, compared with independent training (Independent) and the joint training method used in industry (ICC), the present invention (RankFlow) significantly improves the evaluated metrics, and it can be combined with the models of different stages, showing good compatibility.

An embodiment of this application provides a data processing method. The method includes: predicting, according to a first training sample, a user's first operation information on an item through a first recommendation model, where the first training sample is attribute information of the user and the item, the first operation information and second operation information are used to determine a first loss, the second operation information includes information obtained from the user's operation log, and the first loss is used to update the first recommendation model; and predicting, according to a second training sample, the user's third operation information and fourth operation information on the item through a second recommendation model and the updated first recommendation model, respectively, where the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages of a multi-stage cascade recommendation system, the third operation information and the fourth operation information are used to determine a second loss, and the second loss is used to update the updated first recommendation model. In the prior art, the recommendation model of each stage focuses only on training for its own stage and cannot fit the data of the inference space during training, so its prediction ability is poor. The present invention adopts joint training, so that each stage's model focuses on fitting the data of its own stage while the upstream and downstream stages assist its training, thereby improving prediction performance. In addition, the multi-stage joint optimization proposed in the embodiments of this application is implemented as data exchange between different models without changing the training process of any individual model; it is therefore better suited to deployment in industrial systems and achieves better prediction results.

Referring to Figure 6, Figure 6 shows a data processing device 600 provided by an embodiment of the present application. The device includes:

The first prediction module 601 is configured to predict, according to the first training sample, the user's first operation information on the item through the first recommendation model; the first training sample is attribute information of the user and the item, and the first operation information and the second operation information are used to determine the first loss; the second operation information includes information obtained according to the user's operation log; the first loss is used to update the first recommendation model.

For a specific description of the first prediction module 601, reference may be made to the description of step 401 in the above embodiment, which will not be described again here.

The second prediction module 602 is configured to predict, according to the second training sample, the user's third operation information and fourth operation information on the item through the second recommendation model and the updated first recommendation model, respectively; the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages in a multi-stage cascade recommendation system, and the third operation information and the fourth operation information are used to determine a second loss; the second loss is used to update the updated first recommendation model.

For a specific description of the second prediction module 602, reference may be made to the description of step 402 in the above embodiment, which will not be described again here.

In a possible implementation, the first training sample includes attribute information of the user, exposed items, and unexposed items, and the second operation information includes the user's predicted operation information on the unexposed items and the user's actual operation information on the exposed items, where the actual operation information is obtained according to the user's operation log; or,

the second training sample is attribute information of the user, exposed items, and unexposed items.

In a possible implementation, the predicted operation information indicates that the user has not performed any operation on the unexposed items.
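The label construction for the self-learning flow described in the two implementations above can be sketched in Python. The sketch assumes binary operation labels (1 for an item the user operated on, such as a click, and 0 otherwise); the function name self_learning_labels is hypothetical.

```python
def self_learning_labels(exposed_ops, unexposed_items):
    """Build (item, label) pairs for one user in the self-learning flow.

    exposed_ops: dict mapping each exposed item to its logged operation
        (assumed binary: 1 = operated on, e.g. clicked; 0 = shown but ignored).
    unexposed_items: candidates the user never saw; they are labelled 0,
        i.e. the predicted operation information is "no operation".
    """
    overlap = set(exposed_ops) & set(unexposed_items)
    if overlap:
        raise ValueError("items cannot be both exposed and unexposed: %s" % sorted(overlap))
    # Actual operation information for exposed items comes from the log.
    samples = [(item, int(op)) for item, op in exposed_ops.items()]
    # Predicted operation information for unexposed items: no operation.
    samples += [(item, 0) for item in unexposed_items]
    return samples
```

Labelling unexposed items as negatives lets the early stages see candidates that never reached the user, which is part of the inference space the prior-art training ignores.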

In a possible implementation, that the first training sample is attribute information of the user and items includes: the first training sample is attribute information of the user and N items, the first operation information is the user's operation information on the N items, and the first operation information is used to filter N1 items from the N items.

The device also includes:

A third prediction module, configured to predict, based on the attribute information of the user and some or all of the N1 items, the user's fifth operation information on some or all of the N1 items through a third recommendation model; the fifth operation information and sixth operation information are used to determine a third loss, where the sixth operation information includes information obtained according to the user's operation log; the third loss is used to update the third recommendation model to obtain the second recommendation model.
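The update of the third recommendation model compares its predictions on the N1 items (the fifth operation information) with labels from the operation log (the sixth operation information). The text does not fix the form of the third loss; since cross-entropy is given as the example loss for independent training, this Python sketch assumes binary labels, a logistic linear scorer, and binary cross-entropy minimized by plain SGD. The names bce_sgd_pass and _sigmoid are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_sgd_pass(weights, samples, lr=0.1):
    """One SGD pass of binary cross-entropy for a logistic (linear) scorer.

    weights: list of floats, one per feature.
    samples: list of (features, label) pairs with label in {0, 1}; the label
        stands in for the log-derived sixth operation information.
    Returns (new_weights, mean loss accumulated during the pass).
    """
    w = list(weights)
    total = 0.0
    for x, y in samples:
        z = sum(wi * xi for wi, xi in zip(w, x))
        # Stable form of BCE on the logit: log(1 + exp(z)) - y * z.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
        grad = _sigmoid(z) - y  # derivative of the loss with respect to z
        for i, xi in enumerate(x):
            w[i] -= lr * grad * xi
    return w, total / len(samples)
```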

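In a possible implementation, the first recommendation model is a coarse ranking model, and the second recommendation model is a fine ranking model; or,

the first recommendation model is a recall model, and the second recommendation model is a fine ranking model; or,

the first recommendation model is a recall model, and the second recommendation model is a coarse ranking model; or,

the first recommendation model is a fine ranking model, and the second recommendation model is a rearrangement model; or,

the first recommendation model is a coarse ranking model, and the second recommendation model is a rearrangement model; or,

the first recommendation model is a recall model, and the second recommendation model is a rearrangement model.

In a possible implementation, the attribute information includes user attributes, and the user attributes include at least one of the following: gender, age, occupation, income, hobbies, and education level.

In a possible implementation, the attribute information includes item attributes, and the item attributes include at least one of the following: item name, developer, installation package size, category, and rating.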

Next, an execution device provided by an embodiment of this application is introduced. Refer to Figure 7, which is a schematic structural diagram of the execution device provided by an embodiment of this application. The execution device 700 may be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, a server, or the like, which is not limited here. The data processing apparatus described in the embodiment corresponding to Figure 6 may be deployed on the execution device 700 to implement the data processing function of the embodiment corresponding to Figure 4. Specifically, the execution device 700 includes a receiver 701, a transmitter 702, a processor 703, and a memory 704 (there may be one or more processors 703 in the execution device 700), where the processor 703 may include an application processor 7031 and a communication processor 7032. In some embodiments of this application, the receiver 701, the transmitter 702, the processor 703, and the memory 704 may be connected through a bus or in another manner.

The memory 704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 703. A portion of the memory 704 may further include a non-volatile random access memory (NVRAM). The memory 704 stores processor-executable operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.

The processor 703 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity, however, all buses are referred to as the bus system in the figure.

The methods disclosed in the above embodiments of this application may be applied to, or implemented by, the processor 703. The processor 703 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 703 or by instructions in the form of software. The processor 703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or another processor suitable for AI computing, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 703 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory 704, and the processor 703 reads the information in the memory 704 and completes steps 401 to 402 of the above embodiment in combination with its hardware.

The receiver 701 may be configured to receive input numeric or character information and to generate signal inputs related to the settings and function control of the execution device. The transmitter 702 may be configured to output numeric or character information through a first interface; the transmitter 702 may also be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 702 may also include a display device such as a display screen.

An embodiment of this application further provides a training device. Refer to Figure 8, which is a schematic structural diagram of the training device provided by an embodiment of this application. Specifically, the training device 800 is implemented by one or more servers. The training device 800 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 88 (for example, one or more processors), a memory 832, and one or more storage media 830 (for example, one or more mass storage devices) storing application programs 842 or data 844. The memory 832 and the storage medium 830 may provide transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 88 may be configured to communicate with the storage medium 830 and execute, on the training device 800, the series of instruction operations in the storage medium 830.

The training device 800 may further include one or more power supplies 826, one or more wired or wireless network interfaces 850, and one or more input/output interfaces 858; or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.

Specifically, the training device can perform steps 401 to 402 in the above embodiment.

An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device, or causes the computer to perform the steps performed by the foregoing training device.

An embodiment of this application further provides a computer-readable storage medium storing a program for signal processing. When the program runs on a computer, the computer is caused to perform the steps performed by the foregoing execution device, or to perform the steps performed by the foregoing training device.

The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the data processing method described in the above embodiments, or the chip in the training device performs the data processing method described in the above embodiments. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may alternatively be a storage unit in the wireless access device located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).

Specifically, refer to Figure 9, which is a schematic structural diagram of a chip provided by an embodiment of this application. The chip may be embodied as a neural network processing unit (NPU) 900. The NPU 900 is mounted as a coprocessor on the host CPU (Host CPU), which allocates tasks to it. The core part of the NPU is the arithmetic circuit 903, which is controlled by the controller 904 to fetch matrix data from memory and perform multiplication operations.

NPU 900 can implement the data processing method provided in the embodiment described in Figure 4 through the cooperation between various internal devices.

More specifically, in some implementations, the arithmetic circuit 903 in the NPU 900 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 903 is a two-dimensional systolic array. The arithmetic circuit 903 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 903 is a general-purpose matrix processor.

For example, assume there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 902 and caches it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 901 and performs a matrix operation with matrix B, and the partial or final results of the resulting matrix are stored in the accumulator 908.
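The dataflow above (B held on the processing engines, A streamed in, partial results summed in the accumulator 908) can be illustrated with a plain Python sketch. This is a software analogy only, assuming a simple tiling over the inner dimension; it does not model the actual systolic timing of the arithmetic circuit 903, and tiled_matmul and tile are hypothetical names.

```python
def tiled_matmul(a, b, tile=2):
    """Compute C = A @ B by summing partial products tile by tile.

    Each slice of B along the inner dimension is "loaded" once and reused for
    every row of A; the running sums in acc stand in for the accumulator.
    """
    n, k = len(a), len(b)
    m = len(b[0]) if b else 0
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k, tile):
        b_tile = b[k0:k0 + tile]  # slice of B cached for this pass
        for i in range(n):
            a_row = a[i][k0:k0 + tile]  # matching slice of A streamed in
            for j in range(m):
                # Add this tile's partial product to the accumulated result.
                acc[i][j] += sum(av * brow[j] for av, brow in zip(a_row, b_tile))
    return acc
```

Whatever tile size is chosen, the final accumulator contents equal the full product; only the order in which partial results are added changes.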

The unified memory 906 is used to store input data and output data. Weight data is transferred directly to the weight memory 902 through the direct memory access controller (DMAC) 905. Input data is also transferred to the unified memory 906 through the DMAC.

The bus interface unit (BIU) 910 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 909.

The BIU 910 is used by the instruction fetch buffer 909 to obtain instructions from the external memory, and by the DMAC 905 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

The DMAC is mainly used to transfer input data in the external memory (DDR) to the unified memory 906, to transfer weight data to the weight memory 902, or to transfer input data to the input memory 901.

The vector calculation unit 907 includes multiple arithmetic processing units and, when necessary, further processes the output of the arithmetic circuit 903, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for non-convolutional and non-fully-connected layer computations in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.

In some implementations, the vector calculation unit 907 can store the processed output vectors in the unified memory 906. For example, the vector calculation unit 907 can apply a linear or nonlinear function to the output of the arithmetic circuit 903, such as performing linear interpolation on the feature planes extracted by a convolutional layer, or applying it to a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 907 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vectors can be used as activation inputs to the arithmetic circuit 903, for example for use in a subsequent layer of the neural network.
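As an illustration of the post-processing described above, the following Python sketch normalizes each column of an accumulator output (a simple form of batch normalization, assuming no learned scale or shift) and then applies a ReLU to produce activation values. vector_postprocess and eps are hypothetical names; the text does not specify which normalization or activation the vector calculation unit 907 applies.

```python
import math


def vector_postprocess(rows, eps=1e-5):
    """Normalise each column of a matrix to zero mean and unit variance,
    then apply ReLU to obtain activation values."""
    n = len(rows)
    if n == 0:
        return []
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((x - mu) ** 2 for x in c) / n for c, mu in zip(cols, means)]
    return [
        [max(0.0, (x - mu) / math.sqrt(v + eps)) for x, mu, v in zip(row, means, variances)]
        for row in rows
    ]
```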

The instruction fetch buffer 909 connected to the controller 904 is used to store instructions used by the controller 904.

The unified memory 906, the input memory 901, the weight memory 902, and the instruction fetch buffer 909 are all on-chip memories. The external memory is private to the NPU hardware architecture.

The processor mentioned in any of the above places can be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above programs.

It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.

From the description of the above embodiments, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software program implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.

Claims

A data processing method, characterized in that the method includes:

According to a first training sample, predicting the user's first operation information on an item through a first recommendation model; the first training sample is attribute information of the user and the item, and the first operation information and second operation information are used to determine a first loss; the second operation information includes information obtained according to the user's operation log; the first loss is used to update the first recommendation model;

According to a second training sample, predicting the user's third operation information and fourth operation information on the item through a second recommendation model and the updated first recommendation model, respectively; the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages in a multi-stage cascade recommendation system, and the third operation information and the fourth operation information are used to determine a second loss; the second loss is used to update the updated first recommendation model.
The method according to claim 1, characterized in that, during model inference, the operation information output by the converged first recommendation model is used to screen items, and the converged second recommendation model is used to predict the user's operation information on some or all of the screened items.
The method according to claim 1 or 2, characterized in that the complexity of the second recommendation model is greater than the complexity of the first recommendation model; the complexity is related to at least one of the following:

The number of parameters included in the model, the depth of the network layers included in the model, the width of the network layers included in the model, and the number of feature dimensions of the input data.
The method according to any one of claims 1 to 3, characterized in that,

The first training sample includes attribute information of the user, exposed items, and unexposed items, and the second operation information includes the user's predicted operation information on the unexposed items, and the user's actual operation information on the exposed items, The actual operation information is obtained based on the user's operation log; or,

The second training sample is attribute information of users, exposed items, and unexposed items.
The method of claim 4, wherein the predicted operation information indicates that the user has not performed any operation on the unexposed items.
The method according to any one of claims 1 to 5, characterized in that the first training sample being attribute information of a user and items includes: the first training sample is attribute information of a user and N items, the first operation information is the user's operation information on the N items, and the first operation information is used to filter N1 items from the N items;

The method also includes:

According to the attribute information of the user and some or all of the N1 items, predicting the user's fifth operation information on some or all of the N1 items through a third recommendation model; the fifth operation information and sixth operation information are used to determine a third loss, and the sixth operation information includes information obtained according to the user's operation log; the third loss is used to update the third recommendation model to obtain the second recommendation model.
The method according to any one of claims 1 to 6, characterized in that,

The first recommendation model is a rough ranking model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a coarse ranking model; or,

The first recommendation model is a fine ranking model, and the second recommendation model is a rearrangement model; or,

The first recommendation model is a rough ranking model, and the second recommendation model is a rearrangement model; or,

The first recommendation model is a recall model, and the second recommendation model is a rearrangement model.
The method according to any one of claims 1 to 7, characterized in that the attribute information includes user attributes, and the user attributes include at least one of the following:

Gender, age, occupation, income, hobbies, education level.
The method according to any one of claims 1 to 8, characterized in that the attribute information includes item attributes, and the item attributes include at least one of the following:

Item name, developer, installation package size, category, and rating.
A data processing device, characterized in that the device includes:

The first prediction module is configured to predict, according to a first training sample, the user's first operation information on an item through a first recommendation model; the first training sample is attribute information of the user and the item, and the first operation information and second operation information are used to determine a first loss; the second operation information includes information obtained according to the user's operation log; the first loss is used to update the first recommendation model;

The second prediction module is configured to predict, according to a second training sample, the user's third operation information and fourth operation information on the item through a second recommendation model and the updated first recommendation model, respectively; the second training sample is attribute information of the user and the item, the first recommendation model and the second recommendation model are ranking models at different stages in a multi-stage cascade recommendation system, and the third operation information and the fourth operation information are used to determine a second loss; the second loss is used to update the updated first recommendation model.
The device according to claim 10, wherein, during model inference, the operation information output by the converged first recommendation model is used to screen items, and the converged second recommendation model is used to predict the user's operation information on some or all of the screened items.
The device according to claim 10 or 11, characterized in that the complexity of the second recommendation model is greater than the complexity of the first recommendation model; the complexity is related to at least one of the following:

The number of parameters included in the model, the depth of the network layers included in the model, the width of the network layers included in the model, and the number of feature dimensions of the input data.
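The complexity factors listed above can be made concrete for fully connected networks. The following sketch assumes, for illustration only, that each stage is a plain multilayer perceptron with one output unit; the layer sizes are hypothetical and not taken from the application.

```python
from typing import Dict, List


def dense_param_count(input_dim: int, hidden: List[int], output_dim: int = 1) -> int:
    # Each dense layer has d_in * d_out weights plus d_out biases.
    dims = [input_dim] + hidden + [output_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


def complexity(input_dim: int, hidden: List[int]) -> Dict[str, int]:
    return {
        "params": dense_param_count(input_dim, hidden),  # number of parameters
        "depth": len(hidden) + 1,                        # number of dense layers
        "max_width": max(hidden, default=0),             # widest hidden layer
        "input_dim": input_dim,                          # feature dimensions of the input
    }


first_stage = complexity(16, [32])         # hypothetical upstream (e.g. coarse ranking) model
second_stage = complexity(64, [256, 128])  # hypothetical downstream (e.g. fine ranking) model
print(first_stage["params"], second_stage["params"])  # 577 49665
```

Under these assumed sizes the downstream model exceeds the upstream one on every listed factor, although the claim only requires the complexity to be related to at least one of them.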
The device according to any one of claims 10 to 12, characterized in that:

The first training sample includes attribute information of the user, exposed items, and unexposed items; the second operation information includes the user's predicted operation information on the unexposed items and the user's actual operation information on the exposed items, where the actual operation information is obtained based on the user's operation log; or,

The second training sample is attribute information of users, exposed items, and unexposed items.
The device according to claim 13, wherein the predicted operation information indicates that the user has not performed any operation on the unexposed items.
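Building the second operation information for such a sample can be sketched as follows: exposed items take their labels from the operation log, and unexposed items are labeled as "no operation". The sketch assumes a hypothetical 0/1 label encoding, represents the log as a set of clicked item IDs, and uses a hypothetical function name (`build_labels`); none of these details are specified in the claims.

```python
from typing import Iterable, List, Set, Tuple


def build_labels(
    user_id: str,
    exposed: Iterable[str],
    unexposed: Iterable[str],
    clicked: Set[str],
) -> List[Tuple[str, str, int]]:
    samples = []
    # Exposed items: actual operation information taken from the operation log.
    for item in exposed:
        samples.append((user_id, item, 1 if item in clicked else 0))
    # Unexposed items: predicted operation information = the user performed no operation.
    for item in unexposed:
        samples.append((user_id, item, 0))
    return samples


print(build_labels("u1", ["a", "b"], ["c"], {"b"}))
# [('u1', 'a', 0), ('u1', 'b', 1), ('u1', 'c', 0)]
```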
The device according to any one of claims 10 to 14, wherein the first training sample being attribute information of a user and items includes: the first training sample is attribute information of the user and N items, the first operation information is the user's operation information on the N items, and the first operation information is used to select N1 items from the N items;

The device also includes:

A third prediction module, configured to predict, through a third recommendation model and based on attribute information of the user and some or all of the N1 items, fifth operation information of the user on some or all of the N1 items; the fifth operation information and sixth operation information are used to determine a third loss; the sixth operation information includes information obtained according to the user's operation log; and the third loss is used to update the third recommendation model to obtain the second recommendation model.
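The flow in this claim, where the first model selects N1 of the N items and a third model is trained on those items against log-derived labels to obtain the second model, can be sketched as below. It assumes, as illustration only, that selection keeps the N1 highest-scoring items, that the third model is a hypothetical logistic regression trained by per-sample gradient descent, and that the third loss is binary cross-entropy; the function names `top_n1` and `train_third_model` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def top_n1(first_scores: Dict[str, float], n1: int) -> List[str]:
    # Select N1 of the N items using the first model's operation information.
    return sorted(first_scores, key=first_scores.get, reverse=True)[:n1]


def train_third_model(
    features: Dict[str, List[float]],
    labels: Dict[str, int],
    items: List[str],
    epochs: int = 100,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    dim = len(features[items[0]])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for item in items:
            x = features[item]
            # Fifth operation information: the third model's predicted probability.
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            # Gradient of the third loss (BCE vs. the log label) w.r.t. the logit.
            g = p - labels[item]
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b  # the updated third model serves as the second model


selected = top_n1({"a": 0.1, "b": 0.9, "c": 0.5}, 2)
print(selected)  # ['b', 'c']
```

Training only on the N1 items the upstream model actually passes through is one way to keep the downstream model focused on its own stage's data distribution.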
The device according to any one of claims 10 to 15, characterized in that:

The first recommendation model is a coarse ranking model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a fine ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a coarse ranking model; or,

The first recommendation model is a fine ranking model, and the second recommendation model is a re-ranking model; or,

The first recommendation model is a coarse ranking model, and the second recommendation model is a re-ranking model; or,

The first recommendation model is a recall model, and the second recommendation model is a re-ranking model.
The device according to any one of claims 10 to 16, characterized in that the attribute information includes user attributes, and the user attributes include at least one of the following:

Gender, age, occupation, income, hobbies, education level.
The device according to any one of claims 10 to 17, wherein the attribute information includes item attributes, and the item attributes include at least one of the following:

Item name, developer, installation package size, category, and rating.
A computing device, characterized in that the computing device includes a memory and a processor; the memory stores code, and the processor is configured to obtain the code and execute the method according to any one of claims 1 to 9.
A computer storage medium, characterized in that the computer storage medium stores one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method according to any one of claims 1 to 9.
A computer program product comprising code, characterized in that when the code is executed, it is used to implement the method according to any one of claims 1 to 9.