CN117217284A - Data processing method and device - Google Patents


Info

Publication number
CN117217284A
Authority
CN
China
Prior art keywords
sequence
noise
noise prediction
module
prediction result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310960411.4A
Other languages
Chinese (zh)
Inventor
颜番
杜昭呈
刘启东
赵翔宇
郭慧丰
唐睿明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202310960411.4A
Publication of CN117217284A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data processing method applied to the field of artificial intelligence comprises the following steps: acquiring a first sequence and a second sequence of items with which a user has interacted; adding noise to the feature representation of the first sequence through a noise-adding module in a diffusion model to obtain a noise-added feature representation; and predicting the noise of the current time step through a noise prediction module in the diffusion model according to the fusion result of the current time step and the second sequence, together with the noise-added feature representation, to obtain a first noise prediction result, where the first noise prediction result is used for denoising the noise-added feature representation. The application can apply the trained diffusion model to an original sample: starting from a specific noise distribution (such as Gaussian noise) and taking the original sample sequence as guidance, the model supplements the original sample sequence. The lengthened sequences generated in this way can be used to directly train a new sequence recommendation model and enhance the effect of the recommendation model.

Description

Data processing method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to a data processing method and a device thereof.
Background
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Stable Diffusion (a stable diffusion model, which may be referred to simply as a diffusion model) is a generative model for generating multimedia data such as high-fidelity images, speech and video. The diffusion model generates an image through a diffusion process. The model is trained and performs inference by carrying out multiple diffusion and inverse-diffusion operations on noise. This makes the generation process of the diffusion model more stable and less prone to mode collapse.
A recommendation system is a technique that uses machine learning and data mining techniques to help users find content of interest. The recommendation system learns the behavior preferences of a user by collecting the user's historical behavior, thereby helping the user find goods, services or content that they may like; it can increase user satisfaction, promote sales, and increase the activity of websites or applications. The recommendation system is a core module of various internet applications, and currently popular methods include collaborative filtering, deep models and the like.
A sequential recommendation system refers to a recommendation system that makes recommendations based on a sequence of the user's historical actions (e.g., purchase history, browsing history, etc.). Compared with a traditional recommendation system, a sequential recommendation system can better reflect the change and evolution of user interests by utilizing the time information of the user behavior sequence, and can predict the future behavior of the user. The core idea of sequential recommendation is to use a sequence model to model the sequence of user behaviors, thereby learning the evolution trend of user interests and predicting the user's future behavior.
The recommendation system is an application that judges the goods/services a user currently needs or is interested in according to information such as the user's historical behavior, points of interest and context. A sequential recommender system (SRS) is a recommendation system that makes recommendations mainly based on the user's historical behavior sequences (such as purchase history, browsing history, etc.). Compared with a traditional recommendation system, an SRS can reflect the evolution of user interests by utilizing the time information of the user behavior sequence, and can predict the user's future behavior. The core idea of sequential recommendation is to model the user behavior sequence with a sequence model, thereby learning the interest evolution trend of the user and predicting future behaviors.
The quality of the training data set determines the upper limit of the effect of the SRS. Experimental verification shows that the more historical behavior information a user has, the more accurately the recommendation model can predict that user. However, there is a serious long-tail problem in recommendation systems, that is, the behavior data of a large number of users is extremely sparse, which leads to poor recommendation accuracy of the recommendation system.
Thus, there is a need for a method of data enhancement of an original sequence recommendation dataset.
Disclosure of Invention
In a first aspect, the present application provides a data processing method, the method comprising: obtaining a target sequence, where the target sequence is a sequence of items with which a user has interacted, and the sequence includes a first sequence and a second sequence; adding noise to the feature representation of the first sequence through a noise-adding module in a diffusion model, to obtain a noise-added feature representation; and predicting the noise of the current time step through a noise prediction module in the diffusion model according to the fusion result of the current time step and the second sequence, together with the noise-added feature representation, to obtain a first noise prediction result, where the first noise prediction result is used for denoising the noise-added feature representation.
In this embodiment, an original sequence (e.g., the target sequence in this embodiment) is used to train the diffusion model: the original sequence is divided into two parts, the front part (e.g., the first sequence in this embodiment) is noised until it follows a specific noise distribution (e.g., Gaussian noise), and the diffusion model attempts to restore it under the guidance of the rear part of the sequence (e.g., the second sequence in this embodiment). The diffusion model trained in this way is then applied to the whole original sample sequence: starting from the specific noise distribution (e.g., Gaussian noise) and taking the original sample sequence as guidance, it supplements the original sample sequence. The lengthened sequences generated in this way can be used to directly train a new sequence recommendation model and enhance the effect of the recommendation model.
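As a purely illustrative sketch of the inference-time augmentation described above (all names here, such as noise_predictor, betas, and the nearest-item lookup, are assumptions introduced for illustration and are not taken from the application), the reverse diffusion could be run from Gaussian noise under the guidance of the raw sequence, and the generated embeddings could then be mapped back to concrete items:

```python
import torch

@torch.no_grad()
def augment_sequence(noise_predictor, item_embeddings, cond, betas, shape):
    """Illustrative sketch: generate a pseudo-prefix that lengthens a user sequence.

    noise_predictor(x_t, t, cond) -> predicted noise (assumed signature)
    item_embeddings: (num_items, dim) table used to map embeddings back to item ids
    cond:            guidance vector built from the original (raw) sequence
    betas:           (T,) noise schedule
    shape:           (seq_len, dim) of the representation to generate
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x_t = torch.randn(shape)                               # start from pure Gaussian noise x_T
    for t in reversed(range(len(betas))):
        eps = noise_predictor(x_t, t, cond)                # noise prediction guided by the raw sequence
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x_t - coef * eps) / torch.sqrt(alphas[t])  # DDPM posterior mean (one standard choice)
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise          # one reverse-diffusion step
    dists = torch.cdist(x_t, item_embeddings)              # distance of each generated embedding to every item
    return dists.argmin(dim=-1)                            # nearest items form the generated (pseudo) prefix
```

The generated pseudo-items could then be prepended to the original sequence to form the lengthened training sequence.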
In one possible implementation, the method further comprises: performing a pooling operation on the second sequence; and fusing the current time step with the result of the pooling operation to obtain the fusion result.
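As one illustrative possibility (module names such as step_emb and fuse are assumptions, not taken from the application), the fusion result could be built by mean-pooling the second-sequence embeddings and combining the pooled vector with an embedding of the current time step:

```python
import torch
import torch.nn as nn

class GuidanceCondition(nn.Module):
    """Sketch: fuse the current time step with a pooled representation of the second sequence."""

    def __init__(self, dim: int, num_steps: int):
        super().__init__()
        self.step_emb = nn.Embedding(num_steps, dim)       # embedding of the diffusion step t
        self.fuse = nn.Linear(2 * dim, dim)                # one simple fusion choice

    def forward(self, second_seq_emb: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        pooled = second_seq_emb.mean(dim=1)                # pooling operation over the second sequence: (B, dim)
        step = self.step_emb(t)                            # (B, dim)
        return self.fuse(torch.cat([pooled, step], dim=-1))  # fusion result that guides noise prediction
```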
In one possible implementation, the method further comprises: acquiring gradient information of a pre-trained model obtained when the pre-trained model predicts a target item in the first sequence according to the feature representation of the first sequence; and fusing the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result, where the adjusted first noise prediction result is used for denoising the noise-added feature representation.
In one possible implementation, the target item is a first item in the first sequence.
In one possible implementation, the method further comprises: performing noise prediction through the noise prediction module in the diffusion model according to the fusion result of the current time step and a padding vector, to obtain a second noise prediction result; and fusing the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result, where the adjusted first noise prediction result is used for denoising the noise-added feature representation.
In one possible implementation, the padding vector is a randomly initialized vector or a preset vector, and the padding vector may be updated.
Both of the above schemes enable controlled noise generation, and with either of the above guidance methods, augmented samples conforming to the data distribution can be generated from the existing sequence.
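For illustration, the adjustment of the first noise prediction result under the two guidance schemes could look roughly as follows (the simple additive/interpolating weighting and the name guidance_scale are assumptions; the application only specifies that the results are fused):

```python
import torch

def adjust_noise_prediction(eps_cond, eps_uncond=None, grad=None, guidance_scale=1.0):
    """Sketch of obtaining the adjusted first noise prediction result.

    eps_cond:   noise predicted with the fusion of the time step and the second sequence
    eps_uncond: noise predicted with the fusion of the time step and a padding vector (classifier-free case)
    grad:       gradient information from a pre-trained model (classifier-guidance case)
    """
    eps = eps_cond
    if grad is not None:                                    # classifier guidance: fuse with gradient information
        eps = eps - guidance_scale * grad
    if eps_uncond is not None:                              # classifier-free: fuse the two noise predictions
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return eps                                              # used to denoise the noise-added representation
```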
In one possible implementation, the method further comprises: and updating the noise prediction module according to the first noise prediction result and the corresponding label.
In one possible implementation, the noise prediction module may be a U-Net. U-Net is a noise prediction network commonly used for diffusion models in the field of computer vision. The network takes a three-dimensional image tensor (channel, height, width) and the time-step information t as input, and outputs a tensor of the same size as the predicted noise, which can be written as ε_θ(x_t, t). Because images differ greatly from the item representations in a recommendation system, the embodiment of the application modifies the input and output of the U-Net to adapt it to the discretized input representation.
The embodiment of the application does not change the model structure of the U-Net, and only converts the data into a form suitable for U-Net processing. In theory, any image noise prediction model can be transformed into a model for processing sequence characteristics by using the method, so that the method has good universality.
In a second aspect, the present application provides a data processing apparatus, the apparatus comprising:
an acquisition module, configured to obtain a target sequence, where the target sequence is a sequence of items with which a user has interacted, and the sequence includes a first sequence and a second sequence;
a processing module, configured to add noise to the feature representation of the first sequence through a noise-adding module in a diffusion model, to obtain a noise-added feature representation;
and configured to predict the noise of the current time step through a noise prediction module in the diffusion model according to the fusion result of the current time step and the second sequence, together with the noise-added feature representation, to obtain a first noise prediction result, where the first noise prediction result is used for denoising the noise-added feature representation.
In one possible implementation, the processing module is further configured to:
pool the second sequence;
and fuse the current time step with the result of the pooling operation to obtain the fusion result.
In one possible implementation, the acquisition module is further configured to:
acquire gradient information of a pre-trained model obtained when the pre-trained model predicts a target item in the first sequence according to the feature representation of the first sequence;
and the processing module is further configured to:
fuse the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result, where the adjusted first noise prediction result is used for denoising the noise-added feature representation.
In one possible implementation, the target item is a first item in the first sequence.
In one possible implementation, the processing module is further configured to:
perform noise prediction through the noise prediction module in the diffusion model according to the fusion result of the current time step and a padding vector, to obtain a second noise prediction result;
and fuse the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result, where the adjusted first noise prediction result is used for denoising the noise-added feature representation.
In one possible implementation, the padding vector is a randomly initialized vector or a preset vector, and the padding vector may be updated.
In one possible implementation, the processing module is further configured to:
and updating the noise prediction module according to the first noise prediction result and the corresponding label.
In a third aspect, an embodiment of the present application provides a data processing apparatus, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory, so as to perform the method according to the first aspect and any optional method thereof.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein which, when run on a computer, causes the computer to perform the method according to the first aspect and any optional method thereof.
In a fifth aspect, embodiments of the present application provide a computer program which, when run on a computer, causes the computer to perform the above first aspect and any of its alternative methods.
In a sixth aspect, the present application provides a chip system, comprising a processor configured to support a data processing apparatus in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may be composed of chips, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic diagram of a structure of an artificial intelligence main body frame;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a recommendation scenario provided in an embodiment of the present application;
FIG. 5 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 6 is a processing schematic of a data processing method according to an embodiment of the present application;
FIG. 7 is a processing schematic of a data processing method according to an embodiment of the present application;
FIG. 8 is an effect illustration of an embodiment of the present application;
FIG. 9 is an illustration of an effect of an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a training apparatus according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application herein is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which embodiments of the application have been described in connection with the description of the objects having the same attributes. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 shows a schematic structural diagram of an artificial intelligence main framework, which is described below from the two dimensions of the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.
(1) Infrastructure of
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform. Communicating with the outside through the sensor; the computing power is provided by a smart chip (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform comprises a distributed computing framework, a network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection and interworking networks and the like. For example, the sensor and external communication obtains data that is provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, turning intelligent information decision-making into products and realizing practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent medical care, autonomous driving, smart cities, etc.
The embodiment of the application can be applied to the field of information recommendation. The scenarios include, but are not limited to, scenarios related to e-commerce product recommendation, search engine result recommendation, application market recommendation, music recommendation, video recommendation, etc. The recommended items in the various application scenarios may also be referred to as "objects" to facilitate the subsequent description; that is, in different recommendation scenarios, the recommended object may be an APP, a video, a piece of music, or a certain commodity (for example, the presentation interface of an online shopping platform may display different commodities for different users), and these results may essentially be presented through the recommendation results of a recommendation model. These recommendation scenarios usually involve user behavior log collection, log data preprocessing (e.g., quantization, sampling, etc.), and training on a sample set to obtain a recommendation model, which is then used to analyze and process the objects (e.g., APPs, music, etc.) involved in the scenario that the training samples correspond to. For example, if the samples selected in the recommendation model training phase come from the operation behaviors of users of a mobile phone APP application market on recommended APPs, the recommendation model trained in this way is applicable to the above mobile phone APP application market, or can be used in the APP application markets of other types of terminals to recommend terminal APPs. The recommendation model finally calculates the recommendation probability or score of each object to be recommended; the recommendation system sorts the recommendation results selected according to a certain selection rule, for example, sorts them by recommendation probability or score, and presents them to the user through the corresponding application or terminal device; the user operates on the objects in the recommendation results, generating user behavior logs and other links.
Referring to fig. 4, in the recommendation process, when a user interacts with the recommendation system, a recommendation request is triggered, the recommendation system inputs the request and related feature information into the deployed recommendation model, and then the click rate of the user on all candidate objects is predicted. And then, the candidate objects are arranged in a descending order according to the predicted click rate, and the candidate objects are displayed at different positions in order to serve as recommendation results for users. The user browses the presented items and user behavior such as browsing, clicking, downloading, etc. occurs. The user behaviors can be stored in a log to be used as training data, and the parameters of the recommendation model are updated irregularly through the offline training module, so that the recommendation effect of the model is improved.
For example, a user opens a mobile phone application market to trigger a recommendation module of the application market, and the recommendation module of the application market predicts the possibility of downloading given candidate applications by the user according to the historical downloading records of the user, the clicking records of the user, the self-characteristics of the applications, the time, the place and other environmental characteristic information. According to the predicted result, the application market is displayed according to the descending order of the possibility, and the effect of improving the application downloading probability is achieved. Specifically, applications that are more likely to be downloaded are ranked in a front position, and applications that are less likely to be downloaded are ranked in a rear position. The behavior of the user is also logged and the parameters of the prediction model are trained and updated through the offline training module.
For example, in the application related to life mate, the cognitive brain can be built by simulating the brain mechanism through various models and algorithms based on the historical data of the user in the fields of video, music, news and the like, and the life learning system framework of the user is built. The life mate can record events occurring in the past of the user according to system data, application data and the like, understand the current intention of the user, predict future actions or behaviors of the user and finally realize intelligent service. In the current first stage, behavior data (including information such as terminal side short messages, photos and mail events) of a user are obtained according to a music APP, a video APP, a browser APP and the like, a user portrait system is built, and learning and memory modules based on user information filtering, association analysis, cross-domain recommendation, causal reasoning and the like are realized to build a user personal knowledge map.
Next, an application architecture of an embodiment of the present application is described.
Referring to fig. 2, an embodiment of the present application provides a recommendation system architecture 200. The data collection device 260 is configured to collect samples. A training sample may be composed of a plurality of pieces of feature information (which may also be described as attribute information, such as user attributes and item attributes). The feature information may include user feature information, object feature information and label features. The user feature information is used to characterize features of a user, such as gender, age, occupation and hobbies; the object feature information is used to characterize features of an object pushed to the user. Different recommendation systems correspond to different objects, and the types of features that need to be extracted for different objects are also different; for example, the object features extracted in the training samples of an APP market may be the name (identifier), type and size of an APP, while the object features mentioned in the training samples of an e-commerce APP may be the name of a commodity, the category to which it belongs, its price range, and the like. The label feature is used to indicate whether a sample is a positive example or a negative example. In general, the label feature of a sample may be obtained through the user's operation information on the recommended object: a sample in which the user has operated on the recommended object is a positive example, and a sample in which the user has not operated on the recommended object, or has only browsed it, is a negative example. For example, when the user clicks, downloads or purchases the recommended object, the label feature is 1, indicating that the sample is a positive example, and if the user has not performed any operation on the recommended object, the label feature is 0, indicating that the sample is a negative example.
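A toy illustration of this labeling rule (the dictionary fields are assumptions used only for illustration):

```python
def build_sample(user_features: dict, object_features: dict, operated: bool) -> dict:
    """Sketch: a sample is a positive example (label 1) if the user operated on the recommended object."""
    return {
        "user": user_features,          # e.g., gender, age, occupation, hobbies
        "object": object_features,      # e.g., APP name (identifier), type, size
        "label": 1 if operated else 0,  # clicked/downloaded/purchased -> 1, otherwise 0
    }
```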
In the embodiment of the application, the training sample may be a sequence of user history behaviors (such as purchase history, browsing history, etc.), and specifically may be a sequence of enhanced user history behaviors.
The samples may be stored in the database 230 after collection, and some or all of the feature information of the samples in the database 230 may also be obtained directly from the client device 240, such as user feature information, the user's operation information on the object (used for determining a type identifier), object feature information (such as an object identifier), and so on. The training device 220 trains on the samples in the database 230 to obtain a model parameter matrix used for generating the recommendation model 201. The recommendation model 201 can be used to evaluate a large number of objects to obtain the score of each object to be recommended; further, a specified or preset number of objects can be recommended from the evaluation results of the large number of objects, and the calculation module 211 obtains the recommendation result based on the evaluation result of the recommendation model 201 and recommends it to the client device through the I/O interface 212.
In an embodiment of the present application, the training device 220 may also train a model (e.g., a diffusion model in an embodiment of the present application) of a sequence for enhancing the user's historical behavior based on training samples. Based on the model, the collected object sequences which are operated by the user in the history can be enriched, so that the recommendation precision of the recommendation model is improved.
The training device 220 is used for constructing the recommendation model 201 after obtaining the model parameter matrix based on sample training, and then sending the recommendation model 201 to the execution device 210, or directly sending the model parameter matrix to the execution device 210, and constructing a recommendation model in the execution device 210 for recommending a corresponding system, for example, the recommendation model obtained based on sample training related to video can be used for recommending video to a user in a video website or an APP, and the recommendation model obtained based on sample training related to APP can be used for recommending APP to the user in an application market.
The execution device 210 is configured with an I/O interface 212 for data interaction with external devices. The execution device 210 may obtain user feature information, such as a user identifier, user identity, gender, occupation and preferences, from the client device 240 through the I/O interface 212; this part of the information may also be obtained from a system database. The recommendation model 201 recommends a target recommended object to the user based on the user feature information and the feature information of the objects to be recommended. The execution device 210 may be disposed in a cloud server or in a user client.
The execution device 210 may invoke data, code, etc. in the data storage system 250 and may store the output data in the data storage system 250. The data storage system 250 may be disposed in the execution device 210, may be disposed independently, or may be disposed in other network entities, and the number may be one or multiple.
The calculation module 211 processes the user feature information by using the recommendation model 201, and the object feature information to be recommended, for example, the calculation module 211 uses the recommendation model 201 to analyze and process the user feature information and the feature information of the object to be recommended, so as to obtain the score of the object to be recommended, and the object to be recommended is ranked according to the score, wherein the object ranked in front is to be the object recommended to the client device 240.
Finally, the I/O interface 212 returns the recommendation to the client device 240 for presentation to the user.
Further, the training device 220 may generate respective recommendation models 201 for different targets based on different sample characteristic information to provide better results to the user.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 2, the data storage system 250 is an external memory with respect to the execution device 210, and in other cases, the data storage system 250 may be disposed in the execution device 210.
In the embodiment of the present application, the training device 220, the executing device 210, and the client device 240 may be three different physical devices, or the training device 220 and the executing device 210 may be on the same physical device or a cluster, or the executing device 210 and the client device 240 may be on the same physical device or a cluster.
Referring to fig. 3, a system architecture 300 is provided in accordance with an embodiment of the present invention. In this architecture the execution device 210 is implemented by one or more servers, optionally in cooperation with other computing devices, such as: data storage, routers, load balancers and other devices; the execution device 210 may be disposed on one physical site or distributed across multiple physical sites. The executing device 210 may use data in the data storage system 250 or call program codes in the data storage system 250 to implement an object recommendation function, specifically, input information of objects to be recommended into a recommendation model, generate a pre-estimated score for each object to be recommended by the recommendation model, sort the objects according to the pre-estimated score from high to low, and recommend the objects to be recommended to the user according to the sorting result. For example, the first 10 objects in the ranking result are recommended to the user.
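A trivial sketch of this scoring-and-ranking step (the model.predict call and field names are assumptions):

```python
def recommend_top_k(model, user_features, candidates, k=10):
    """Sketch: score each candidate object, sort in descending order, and return the top-k."""
    scored = [(obj, model.predict(user_features, obj)) for obj in candidates]  # estimated score per object
    scored.sort(key=lambda pair: pair[1], reverse=True)                        # descending by estimated score
    return [obj for obj, _ in scored[:k]]                                      # e.g., the first 10 objects
```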
The data storage system 250 is configured to receive and store parameters of the recommendation model sent by the training device, and data for storing recommendation results obtained by the recommendation model, and may also include program code (or instructions) required for normal operation of the storage system 250. The data storage system 250 may be a distributed storage cluster formed by one device or a plurality of devices disposed outside the execution device 210, and when the execution device 210 needs to use the data on the storage system 250, the storage system 250 may send the data required by the execution device to the execution device 210, and accordingly, the execution device 210 receives and stores (or caches) the data. Of course, the data storage system 250 may also be deployed within the execution device 210, and when deployed within the execution device 210, the distributed storage system may include one or more memories, and optionally, when there are multiple memories, different memories may be used to store different types of data, such as model parameters of a recommendation model generated by the training device and data of recommendation results obtained by the recommendation model, may be stored on two different memories, respectively.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.
The local device of each user may interact with the performing device 210 through a communication network of any communication mechanism/communication standard, which may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
In another implementation, the execution device 210 may be implemented by a local device, for example, the local device 301 may obtain user characteristic information and feed back recommendation results to the user based on a recommendation model implementing a recommendation function of the execution device 210, or provide services to the user of the local device 302.
Because the embodiments of the present application relate to a large number of applications of neural networks, for convenience of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.
1. Click-through probability (CTR)
The click probability, which may also be referred to as the click-through rate, refers to the ratio of the number of clicks to the number of exposures of recommended information (e.g., a recommended item) on a website or application, and is typically an important indicator for measuring a recommendation system.
2. Personalized recommendation system
The personalized recommendation system is a system for analyzing according to historical data (such as operation information in the embodiment of the application) of a user by utilizing a machine learning algorithm, predicting a new request and giving a personalized recommendation result.
3. Offline training
Offline training refers to a module in a personalized recommendation system that iteratively updates the parameters of the recommendation model according to a machine learning algorithm based on the user's historical data (such as the operation information in the embodiment of the application), until the parameters meet the set requirements.
4. Online inference
Online inference refers to predicting, based on an offline-trained model and according to the features of the user, the item and the context, the user's preference for a recommended item in the current context, i.e., predicting the probability that the user selects the recommended item.
For example, fig. 3 is a schematic diagram of a recommendation system provided in an embodiment of the present application. As shown in FIG. 3, when a user enters the system, a request for a recommendation is triggered, the recommendation system inputs the request and its associated information (e.g., operational information in the embodiment of the present application) into the recommendation model, and then predicts the user's selectivity for items within the system. Further, the items are arranged in a descending order according to the predicted selectivity or some function based on the selectivity, i.e. the recommendation system may display the items in different positions in order as a recommendation to the user. The user browses the different items in place and user actions occur such as browsing, selecting, downloading, etc. Meanwhile, the actual behaviors of the user can be stored in a log to be used as training data, and parameters of the recommendation model are continuously updated through the offline training module, so that the prediction effect of the model is improved.
For example, a user opening an application marketplace in a smart terminal (e.g., a cell phone) may trigger a recommendation system in the application marketplace. The recommendation system of the application market predicts the probability of downloading recommended candidate APP by the user according to the historical behavior log of the user, for example, the historical downloading record of the user, the user selection record and the self-characteristics of the application market, such as time, place and other environmental characteristic information. According to the calculated result, the recommendation system of the application market can display the candidate APP in descending order according to the predicted probability value, so that the downloading probability of the candidate APP is improved.
For example, APP with higher predicted user selectivity may be shown at a forward recommended position and APP with lower predicted user selectivity may be shown at a rearward recommended position.
The recommendation model may be a neural network model, and related terms and concepts of the neural network may be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an arithmetic unit that takes x_s (i.e., input data) and an intercept of 1 as inputs, and the output of the arithmetic unit may be:
h_{W,b}(x) = f(W^T x) = f( ∑_{s=1}^{n} W_s · x_s + b )
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of the above single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
(2) Deep neural network
Deep neural networks (Deep Neural Network, DNN), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers; the "many" here has no particular metric. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the hidden layers and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Since a DNN has a large number of layers, the number of coefficient matrices W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
(3) Loss function
In training the deep neural network, since the output of the deep neural network is expected to be as close to the value actually expected, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually expected target value according to the difference between the predicted value of the current network and the actually expected target value (of course, there is usually an initialization process before the first update, that is, the pre-configuration parameters of each layer in the deep neural network), for example, if the predicted value of the network is higher, the weight vector is adjusted to be predicted to be lower, and the adjustment is continued until the deep neural network can predict the actually expected target value or the value very close to the actually expected target value. Thus, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which is a loss function (loss function) or an objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function is, the larger the difference is, and then the training of the deep neural network becomes a process of reducing the loss as much as possible.
(4) Back propagation algorithm
An error Back Propagation (BP) algorithm may be used to correct the magnitude of the parameters in the initial model during the training process, so that the error loss of the model is smaller and smaller. Specifically, the input signal is forward-transferred until output, and error loss occurs, and parameters in the initial model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation motion that dominates the error loss, aiming at deriving optimal model parameters, such as a weight matrix.
(5) Machine learning system
Based on input data and labels, the parameters of a machine learning model are trained through optimization methods such as gradient descent, and the trained model is finally used to complete the prediction of unknown data.
(6) Personalized recommendation system
A personalized recommendation system analyzes and models the user's historical data by means of machine learning algorithms, predicts new user requests based on the analysis and modeling, and gives personalized recommendation results.
(7) Diffusion model (diffusion model)
A diffusion model is a generative model for generating data such as images and text. The core idea of the diffusion model is to diffuse noise into the data and then gradually remove the noise to recover the original data. The diffusion model includes two phases: a forward phase (noise diffusion) and a reverse phase (denoising recovery).
(8) Sequence U network (Sequential U-Net, SU-Net)
The sequential U network may be the noise prediction network (also referred to as the noise prediction module) used in embodiments of the present application. The U network (U-Net) is a model originally proposed in the field of medical image segmentation to solve the semantic segmentation of cell images. Based on a convolutional neural network, it first applies convolution + downsampling to the input image, then upsampling + convolution, and finally outputs a new image of exactly the same size as the original image for image semantic segmentation. In diffusion models, the U-Net architecture is also often used as the noise prediction network. SU-Net is a modified version of U-Net for processing the id-type data of a sequence. A historical behavior sequence contains N items, and each item is represented as a one-dimensional embedding vector of fixed length. The vector is reshaped into a two-dimensional matrix, which is regarded as one channel of an image. The stacked matrices of multiple items are regarded as a complete multi-channel image, so that U-Net can process it. This model is called SU-Net.
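A minimal sketch of this reshaping, under the assumption that the embedding length d factors as h × w (the function names are illustrative):

```python
import torch

def sequence_to_image(item_embs: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Reshape a sequence of item embeddings into a multi-channel 'image' that U-Net can process.

    item_embs: (batch, N, d) with d == h * w; each item becomes one h x w channel.
    returns:   (batch, N, h, w), i.e., an N-channel image-like tensor.
    """
    batch, n_items, d = item_embs.shape
    assert d == h * w, "the embedding length must factor into h * w"
    return item_embs.view(batch, n_items, h, w)

def image_to_sequence(x: torch.Tensor) -> torch.Tensor:
    """Inverse reshape: (batch, N, h, w) -> (batch, N, h * w)."""
    batch, n_items, h, w = x.shape
    return x.view(batch, n_items, h * w)
```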
(9) Classifier-guided/no-Classifier-guided (Classifier Guidance/Classifier Free, CG/CF)
Classifier guidance and classifier-free guidance are two strategies for guiding generation in a diffusion model. In the classifier guidance (Classifier Guidance, CG) method, the gradient of a classifier is added to the original model during reverse noise prediction, and this gradient guides the noise predictor to learn toward the direction of the label. Classifier-free guidance (Classifier Free, CF) removes the classifier gradient and instead uses the difference between the conditional prediction and the unconditional prediction as a replacement for the classifier gradient, which solves the problem that the gradient direction is inaccurate when the noise is too large.
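For reference, the forms of these two strategies that are commonly used in the diffusion-model literature (quoted here as standard formulations and as an assumption, not as the exact formulas of this application) can be written as:

```latex
% Classifier guidance: shift the predicted noise with the classifier gradient
\tilde{\epsilon}_\theta(x_t, t)
  = \epsilon_\theta(x_t, t)
  - s \, \sqrt{1-\bar{\alpha}_t} \, \nabla_{x_t} \log p_\phi(y \mid x_t)

% Classifier-free guidance: use the gap between conditional and unconditional predictions
\tilde{\epsilon}_\theta(x_t, t, c)
  = \epsilon_\theta(x_t, t, \varnothing)
  + w \, \bigl( \epsilon_\theta(x_t, t, c) - \epsilon_\theta(x_t, t, \varnothing) \bigr)
```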
A recommendation system is a technique that uses machine learning and data mining techniques to help users find content of interest. The recommendation system learns the behavior preferences of a user by collecting the user's historical behavior, thereby helping the user find goods, services or content that they may like; it can increase user satisfaction, promote sales, and increase the activity of websites or applications. The recommendation system is a core module of various internet applications, and currently popular methods include collaborative filtering, deep models and the like.
A sequential recommendation system refers to a recommendation system that makes recommendations based on a sequence of the user's historical actions (e.g., purchase history, browsing history, etc.). Compared with a traditional recommendation system, a sequential recommendation system can better reflect the change and evolution of user interests by utilizing the time information of the user behavior sequence, and can predict the future behavior of the user. The core idea of sequential recommendation is to use a sequence model to model the sequence of user behaviors, thereby learning the evolution trend of user interests and predicting the user's future behavior.
The recommendation system is an application that judges the goods/services a user currently needs or is interested in according to information such as the user's historical behavior, points of interest and context. A sequential recommender system (SRS) is a recommendation system that makes recommendations mainly based on the user's historical behavior sequences (such as purchase history, browsing history, etc.). Compared with a traditional recommendation system, an SRS can reflect the evolution of user interests by utilizing the time information of the user behavior sequence, and can predict the user's future behavior. The core idea of sequential recommendation is to model the user behavior sequence with a sequence model, thereby learning the interest evolution trend of the user and predicting future behaviors.
The quality of the training data set determines the upper limit of the effect of the SRS. Experimental verification shows that the more historical behavior information a user has, the more accurately the recommendation model can predict that user. However, there is a serious long-tail problem in recommendation systems, that is, the behavior data of a large number of users is extremely sparse, which leads to poor recommendation accuracy of the recommendation system.
Thus, there is a need for a method of data enhancement of an original sequence recommendation dataset.
In order to solve the above problems, an embodiment of the present application provides a data processing method. The following describes a data processing method according to an embodiment of the present application in detail with reference to the accompanying drawings.
Referring to fig. 5, fig. 5 is a flowchart of a data processing method according to an embodiment of the present application, and as shown in fig. 5, the data processing method according to an embodiment of the present application may include steps 501 to 503, which are respectively described in detail below.
501. Obtain a target sequence, where the target sequence is a sequence of items with which a user has interacted, and the sequence includes a first sequence and a second sequence;
the corresponding embodiment of fig. 5 may be an inference process of a diffusion model, or a feed-forward process of diffusion model training, such as a model pre-training or model fine-tuning process.
Sequence recommendations are a common paradigm of recommendation systems. The sequence recommendation recommends the next most likely clicked item for the user according to the historical behavior information of the user. However, current-stage sequence recommendation faces two challenges that prevent further development of the sequence recommendation model:
1. The data sparsity problem: sparsity is an inherent property of real-world data, i.e., only a small number of interactions occur between a large number of users and items, so the user-item interaction matrix is very sparse. This leads to the problem that user and item representations are insufficiently learned;
2. long tail user problem: the length of the historical behavior sequences of a large number of users is very short, namely the problem of insufficient data volume exists. But the sequence model is very dependent on the historical data of the user, and the richer the data is, the better the recommendation effect is. These users with a small amount of data are called long tail users. Because of the lack of historical behavioral data, the existing sequence recommendation model is difficult to produce good recommendation results for long-tail users.
In this embodiment, an original sequence (e.g., the target sequence in this embodiment) is used to train the diffusion model: the original sequence is divided into two parts, the front part (e.g., the first sequence in this embodiment) is noised until it follows a specific noise distribution (e.g., Gaussian noise), and the diffusion model attempts to restore it under the guidance of the rear part of the sequence (e.g., the second sequence in this embodiment). The diffusion model trained in this way is then applied to the whole original sample sequence: starting from the specific noise distribution (e.g., Gaussian noise) and taking the original sample sequence as guidance, it supplements the original sample sequence. The lengthened sequences generated in this way can be used to directly train a new sequence recommendation model and enhance the effect of the recommendation model.
In the embodiment of the application, the original object sequence (i.e. the target sequence) with the interactive behavior of the user can be obtained, wherein the interactive behavior can be clicking, downloading, adding shopping carts, browsing, purchasing and the like.
For example, the ordering between items in the target sequence may be based on the time at which the user interacted with each item.
The former part of the target sequence may be selected as the first sequence and the latter part as the second sequence.
As shown in fig. 6, during the training phase of the diffusion model, the user's historical sequence can be split into two parts: the former part is S_aug and the latter part is S_raw.
Wherein the target sequence may include attribute information for each item.
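This splitting can be illustrated with a minimal Python sketch; the split ratio, the helper name split_target_sequence and the toy item IDs are illustrative assumptions rather than details taken from the application:

```python
# Minimal sketch: split a user's interaction sequence (ordered by interaction
# time) into a front part S_aug (to be noised and regenerated) and a rear
# part S_raw (used as guidance). The split ratio is an assumption.

def split_target_sequence(item_ids, aug_ratio=0.5):
    """Return (first_sequence, second_sequence), i.e. (S_aug, S_raw)."""
    split_point = max(1, int(len(item_ids) * aug_ratio))
    s_aug = item_ids[:split_point]   # former part: the first sequence
    s_raw = item_ids[split_point:]   # latter part: the second sequence
    return s_aug, s_raw

# Example: item IDs the user clicked, ordered by interaction time.
target_sequence = [12, 57, 3, 88, 41, 19]
s_aug, s_raw = split_target_sequence(target_sequence)
print(s_aug, s_raw)  # [12, 57, 3] [88, 41, 19]
```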
502. And according to the characteristic representation of the first sequence, the characteristic representation is subjected to noise adding through a noise adding module in a diffusion model, and the characteristic representation after noise adding is obtained.
In one possible implementation, feature extraction (e.g., embedding) may be performed on the attribute information of each item in the first sequence to obtain a feature representation of the first sequence.
For example, the S_aug part (i.e., the first sequence) may go through the embedding layer to obtain a feature representation x_0. The feature representation x_0 is then fed into the diffusion model and turned into Gaussian noise x_T (i.e., the noise-added feature representation) through a series of noise-adding operations.
Diffusion models have been widely and deeply applied in recent years in fields such as image generation and text generation, with good results. A diffusion model mainly comprises two stages: noise adding and noise reduction. Let the original data be x_0; the diffusion model gradually samples noise from a Gaussian distribution and adds it to the original data, so that the original data gradually approaches pure Gaussian noise. This process can be formally expressed as:
q(x_{1:T} | x_0) := ∏_{t=1}^{T} q(x_t | x_{t−1});
q(x_t | x_{t−1}) := N(x_t; √(1−β_t)·x_{t−1}, β_t·I).
The first equation above reveals how x_T is generated from the original data x_0 by adding noise step by step. The second equation reveals how a Gaussian distribution is constructed from x_{t−1} and how x_t is sampled from it. The noise-adding process of the diffusion model is a pure Markov process and contains no parameters that need to be learned; that is, β_t is a hyper-parameter and I is an identity matrix. By accumulating the single steps from step (t−1) to step t, the first equation can be simplified into a closed form:
q(x_t | x_0) := N(x_t; √(ᾱ_t)·x_0, (1−ᾱ_t)·I), where α_t := 1−β_t and ᾱ_t := ∏_{s=1}^{t} α_s.
According to the above formula, after the hyper-parameter β and the number of noise-adding steps t are selected, the final x_t can be directly calculated and sampled.
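As a hedged illustration of this closed-form noise-adding step, the following Python (PyTorch) sketch samples x_t directly from x_0; the linear β schedule, number of steps and tensor sizes are assumptions, not values taken from the application:

```python
import torch

T = 1000                                   # number of noise-adding steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # hyper-parameter schedule beta_t (assumed)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t

def q_sample(x0, t, noise=None):
    """Sample x_t directly from x_0 at step t using the closed form above."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bar[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.randn(3, 64)      # toy feature representation of the first sequence
x_t = q_sample(x0, t=500)    # noise-added feature representation
```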
The noise reduction process of the diffusion model aims at, given a noise-added sample x_t, attempting to restore the original sample x_0. This process can be formally expressed as:
p_θ(x_{0:T}) = p(x_T) ∏_{t=1}^{T} p_θ(x_{t−1} | x_t);
p_θ(x_{t−1} | x_t) := N(x_{t−1}; μ_θ(x_t, t), Σ_θ(x_t, t)).
503. and according to the current step length, the fusion result of the second sequence and the characteristic representation after noise addition, predicting the noise of the current step length through a noise prediction module in the diffusion model to obtain a first noise prediction result, wherein the first noise prediction result is used for denoising the characteristic representation after noise addition.
In one possible implementation, the noise reduction process may be regarded as sampling from a Gaussian distribution; the variance of the Gaussian distribution is kept constant, so the key to noise reduction is the mean of the Gaussian distribution. That is, starting from the data of step t, a noise prediction network ε_θ is used to generate a prediction of the mean. The design of noise prediction networks is varied, and U-Net networks are used in many cases in the field of computer vision. The training objective of the noise prediction network may be formed as:
L = E_{t, x_0, ε} [ ||ε − ε_θ(x_t, t)||² ].
in the training phase of the diffusion model, the training target may be to make the predicted noise as close as possible to the real noise. In the scene of sequence recommendation data enhancement, the input x of a diffusion model 0 For the original sample sequence S aug Part of the ebedding is changed into gaussian noise by adding noise, and then noise is reduced to the original data sample.
During the use phase of the diffusion model, x_T can be sampled from the Gaussian distribution, the complete original sequence can be treated as S_raw, and S_aug can be generated under the guidance strategy as a supplement to the original sequence.
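The use phase can be sketched as follows; this reuses betas, alphas, alpha_bar and T from the earlier sketch and applies a simplified DDPM update rule, so the exact sampler and the mapping from denoised embeddings back to items are assumptions:

```python
import torch

@torch.no_grad()
def generate_augmented_prefix(noise_pred_net, cond_z, shape):
    """Start from Gaussian noise x_T and denoise step by step under the
    guidance of the condition vector built from S_raw."""
    x_t = torch.randn(shape)                              # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = noise_pred_net(x_t, torch.tensor([t]), cond_z)
        a_t, a_bar = alphas[t], alpha_bar[t]
        mean = (x_t - (1 - a_t) / (1 - a_bar).sqrt() * eps) / a_t.sqrt()
        if t > 0:
            x_t = mean + betas[t].sqrt() * torch.randn_like(x_t)  # add sigma_t noise
        else:
            x_t = mean
    # Denoised embeddings of the generated prefix; mapping them back to item
    # IDs (e.g. by nearest-neighbour lookup) is an assumed post-processing step.
    return x_t
```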
Taking the prediction performed by the noise prediction module at a certain step length (for convenience of description referred to as the current step length in the embodiment of the present application) as an example, the fusion result of the current step length and the second sequence can be obtained.
The input of the noise prediction module may comprise time step information t. To better guide the generation, the embodiment of the present application fuses the original sequence S_raw (i.e., the second sequence) into the time step information. For example, the embedding of the second sequence may be pooled and fused with the time step information t to obtain the condition vector z (i.e., the fusion result in the embodiment of the present application): z = t + Pooling(S_raw).
In addition, the input of the noise prediction module may further include the noise-added information that needs to be input to the noise prediction module at the current step length (such as the noise-added feature representation in the embodiment of the present application). Thus, the noise prediction module may be expressed as ε_θ(x_t, z).
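A minimal sketch of how the fusion result may be built is given below, assuming mean pooling over the second sequence's embeddings and a time-embedding module time_embed whose existence and shape are illustrative assumptions:

```python
import torch

def build_condition_vector(t, s_raw_embeddings, time_embed):
    """Hedged sketch of the fusion result z = t + Pooling(S_raw).

    s_raw_embeddings: [m, d] item embeddings of the second sequence.
    time_embed: maps the step index t to a d-dimensional vector (assumed).
    """
    pooled = s_raw_embeddings.mean(dim=0)   # Pooling(S_raw); mean pooling assumed
    return time_embed(t) + pooled           # condition vector z
```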
In one possible implementation, the noise prediction module in the diffusion model may be used to predict the noise of the current step length according to the fusion result and the noise-added feature representation, so as to obtain a first noise prediction result.
In one possible implementation, the noise prediction module may be a U-Net. U-Net is a noise prediction network commonly used for diffusion models in the field of computer vision. The network receives a three-dimensional image (channel, height, width) and time step information t as input, and outputs a tensor of the same size as the predicted noise; it can be written as ε_θ(x_t, t). Because images differ greatly from the item representations in a recommendation system, the embodiment of the present application modifies the input and output of the U-Net to adapt to the discretized input representation. The overall structure of the noise prediction module can be seen in fig. 7.
Assume that the sequence S_aug input to the diffusion model during the noise adding and noise reducing process contains n items. The embodiment of the present application converts the n items (such as the n items included in the first sequence) into n one-dimensional embedding vectors by using an embedding layer, where each vector has dimension d. The embeddings of the multiple items are treated as multiple channels in the noise prediction model. A single embedding vector is reshaped into a two-dimensional form, so that the multiple embeddings can be spliced into a tensor that can be directly processed by the U-Net.
Another input of the noise prediction network is the time step information t. To better guide the generation, the present application pools the representation of the original sequence S_raw and combines it with the time step information t to form the condition vector z: z = t + Pooling(S_raw). Thus, the noise prediction network of the embodiment of the present application can be expressed as ε_θ(x_t, z).
The embodiment of the application does not change the model structure of the U-Net, and only converts the data into a form suitable for U-Net processing. In theory, any image noise prediction model can be transformed into a model for processing sequence characteristics by using the method, so that the method has good universality.
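One way to perform this conversion is sketched below; reshaping each d-dimensional embedding into a square map (so that the n items become n channels) is an illustrative assumption, since the exact layout is not fixed here:

```python
import math
import torch

def items_to_unet_input(item_embeddings):
    """Turn n item embeddings of dimension d into an image-like tensor
    [1, n, side, side] that an unmodified U-Net can process; assumes d is a
    perfect square purely for this sketch."""
    n, d = item_embeddings.shape
    side = int(math.isqrt(d))
    assert side * side == d, "this sketch assumes d is a perfect square"
    return item_embeddings.view(1, n, side, side)   # items as channels

x = torch.randn(8, 64)               # 8 items, embedding dimension 64
unet_in = items_to_unet_input(x)     # shape [1, 8, 8, 8]
```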
In one possible implementation, after the output of the noise prediction model is obtained, the information of the second sequence may further be used to guide the diffusion model to generate a new sequence that is closer to the user's interests.
Specifically, in one implementation, gradient information corresponding to a pre-training model when predicting a target item in the first sequence according to a feature representation of the first sequence may be obtained; fusing the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
In one possible implementation, the target item is a first item in the first sequence.
The classifier-guided (CG) method introduces the gradient of a classifier with respect to the target generation class when generating the predicted noise. The objective is to steer the direction of data generation towards the target class in a controllable manner, thereby achieving controllable generation. The original noise prediction network input is (x_t, z); on this basis, the CG method predicts the noise as:
ε̂_θ(x_t, z) = ε_θ(x_t, z) − γ·∇_{x_t} log P_φ(y | x_t);
where P_φ(y | x_t) represents the classifier network and γ is a hyper-parameter that controls the weight of the gradient. In the CG approach, the most important part is the design and training of the classifier. In order for the gradient of the classifier to represent the direction of data generation, a sequence recommendation model may be pre-trained on the raw data; this model is then used during the training of the diffusion model, with the first item of the original sequence as the target for gradient computation. This is because the generated sequence is spliced in front of the original sequence, so the generated sequence is expected to be distributed as close as possible to the original data, and the first item of the original sequence should be predictable by the pre-trained sequence recommendation model.
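A hedged sketch of the CG adjustment is shown below; classifier_logprob stands for the pre-trained sequence recommendation model returning log P_φ(y | x_t) for the target item (the first item of the original sequence), and its interface is an assumption:

```python
import torch

def classifier_guided_noise(noise_pred_net, classifier_logprob, x_t, t, cond_z,
                            target_item, gamma=1.0):
    """Adjust the first noise prediction result with the classifier gradient."""
    x_t = x_t.detach().requires_grad_(True)
    log_p = classifier_logprob(x_t, target_item)     # log P_phi(y | x_t)
    grad = torch.autograd.grad(log_p.sum(), x_t)[0]  # gradient information
    with torch.no_grad():
        eps = noise_pred_net(x_t, t, cond_z)         # first noise prediction result
    return eps - gamma * grad                        # adjusted first noise prediction result
```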
Specifically, in one implementation, according to the current step length, the fusion result of the second sequence, and the padding vector, a noise prediction module in the diffusion model performs noise prediction to obtain a second noise prediction result; fusing the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
In one possible implementation, the padding vector is a randomly initialized vector or a preset vector, and the padding vector may be updated.
In one possible implementation, since the CG approach requires pre-training a recommendation model, which may introduce additional computational overhead, a classifier-free (CF) guidance method has been introduced. The CF method removes the gradient of the classifier and instead uses the difference between the noise predicted with the condition vector and the noise predicted without the condition vector as a substitute for the gradient above. The CF method may be formalized as:
ε̂_θ(x_t, z) = (1 + γ)·ε_θ(x_t, z) − γ·ε_θ(x_t, z_pad);
where z_pad denotes the condition obtained by replacing the condition vector with a padding vector. To calculate the noise prediction in the unconditional case, the embodiment of the present application can use a global padding vector of the same size as a replacement for the condition vector when computing the noise.
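The CF fusion can be sketched as follows; pad_z stands for the condition built from the global padding vector, gamma plays the role of the fusion weight, and the function signature is an assumption:

```python
import torch

def classifier_free_guided_noise(noise_pred_net, x_t, t, cond_z, pad_z, gamma=1.0):
    """Fuse the conditional (first) and padding-vector (second) noise
    predictions to obtain the adjusted first noise prediction result."""
    eps_cond = noise_pred_net(x_t, t, cond_z)    # with the fusion result
    eps_uncond = noise_pred_net(x_t, t, pad_z)   # with the padding vector
    return (1.0 + gamma) * eps_cond - gamma * eps_uncond
```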
Both of the above-described schemes enable the controlled generation of noise, and by both of the above-described guiding methods, enhanced samples conforming to the data distribution can be generated from the existing sequence.
The beneficial effects of the embodiment of the application are described below in connection with experiments:
The embodiment of the present application is verified on three public recommendation data sets, and three sequence recommendation model baselines are selected: Bert4Rec, SASRec and S3Rec; the effect is verified on each of them.
In fig. 8, the effect of the present application and other data enhancement methods on three sequential recommendation models is compared, and it can be seen that the method of the present application is superior to the control model on all data sets.
In order to verify the effect of the embodiment of the present application under long-tail conditions, the original Yelp test set is divided into three parts (short, medium and long), and the improvement on each part is verified. Fig. 9 shows the effect on each part; it can be seen that the embodiment of the present application achieves a remarkable improvement for long-tail users.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 10, a data processing apparatus 1000 according to an embodiment of the present application includes:
an obtaining module 1001, configured to obtain a target sequence, where the target sequence is a sequence of items in which a user has interactive behavior, and the sequence includes a first sequence and a second sequence;
For a specific description of the obtaining module 1001, reference may be made to the description of step 501 in the above embodiment, which is not repeated here.
The processing module 1002 is configured to perform, according to the feature representation of the first sequence, denoising the feature representation by using a denoising module in a diffusion model, so as to obtain a denoised feature representation;
And according to the current step length, the fusion result of the second sequence and the characteristic representation after noise addition, predicting the noise of the current step length through a noise prediction module in the diffusion model to obtain a first noise prediction result, wherein the first noise prediction result is used for denoising the characteristic representation after noise addition.
For a specific description of the processing module 1002, reference may be made to the descriptions of steps 502 and 503 in the foregoing embodiments, which are not repeated here.
In one possible implementation, the processing module 1002 is further configured to:
pooling the second sequence;
and fusing the current step length and the pooling operation result to obtain the fusion result.
In one possible implementation, the obtaining module 1001 is further configured to
Acquiring gradient information corresponding to a pre-training model when predicting a target object in the first sequence according to the characteristic representation of the first sequence;
the processing module 1002 is further configured to:
fusing the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
In one possible implementation, the target item is a first item in the first sequence.
In one possible implementation, the processing module 1002 is further configured to:
according to the current step length, the fusion result of the second sequence and the padding vector, carrying out noise prediction through a noise prediction module in the diffusion model to obtain a second noise prediction result;
fusing the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
In one possible implementation, the padding vector is a randomly initialized vector or a preset vector, and the padding vector may be updated.
In one possible implementation, the processing module 1002 is further configured to:
and updating the noise prediction module according to the first noise prediction result and the corresponding label.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a terminal device provided by an embodiment of the present application, and a terminal device 1100 may be specifically represented by a virtual reality VR device, a mobile phone, a tablet, a notebook, an intelligent wearable device, etc., which is not limited herein. Specifically, the terminal apparatus 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103 and a memory 1104 (where the number of processors 1103 in the terminal device 1100 may be one or more, one processor is exemplified in fig. 11), wherein the processor 1103 may include an application processor 11031 and a communication processor 11032. In some embodiments of the application, the receiver 1101, transmitter 1102, processor 1103 and memory 1104 may be connected by a bus or other means.
The memory 1104 may include read-only memory and random access memory and provides instructions and data to the processor 1103. A portion of the memory 1104 may also include non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1104 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiment of the present application may be applied to the processor 1103 or implemented by the processor 1103. The processor 1103 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method described above may be performed by integrated logic circuitry in hardware or instructions in software in the processor 1103. The processor 1103 may be a general purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, and may further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The processor 1103 can implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104, and in combination with its hardware, performs the steps of the above method that involve model training or model reasoning.
The receiver 1101 is operable to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1102 may be used to output numeric or character information through a first interface; the transmitter 1102 may also be configured to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 1102 may also include a display device such as a display screen.
Referring to fig. 12, fig. 12 is a schematic diagram of a structure of a server according to an embodiment of the present application, where the server 1200 may have a relatively large difference due to different configurations or performances, and may include one or more central processing units (central processing units, CPU) 1212 (e.g., one or more processors) and a memory 1232, one or more storage media 1230 (e.g., one or more mass storage devices) storing application programs 1242 or data 1244. Wherein memory 1232 and storage medium 1230 can be transitory or persistent. The program stored on the storage medium 1230 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Still further, a central processor 1212 may be provided in communication with the storage medium 1230, executing a series of instruction operations on the server 1200 in the storage medium 1230.
The server 1200 may also include one or more power supplies 1226, one or more wired or wireless network interfaces 1250, one or more input/output interfaces 1258; or one or more operating systems 1241, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In an embodiment of the present application, the central processor 1212 is configured to perform the actions related to model training or model reasoning in the above embodiment.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps as performed by the aforementioned performing device, or causes the computer to perform the steps as performed by the aforementioned training device.
The embodiment of the present application also provides a computer-readable storage medium having stored therein a program for performing signal processing, which when run on a computer, causes the computer to perform the steps performed by the aforementioned performing device or causes the computer to perform the steps performed by the aforementioned training device.
The execution device, training device or terminal device provided in the embodiment of the present application may be a chip, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the execution device to perform the data processing method described in the above embodiment, or to cause the chip in the training device to perform the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), etc.
Specifically, referring to fig. 13, fig. 13 is a schematic structural diagram of a chip provided in an embodiment of the present application, where the chip may be represented as a neural network processor NPU 1300, and the NPU 1300 is mounted as a coprocessor on a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an arithmetic circuit 1303, and the controller 1304 controls the arithmetic circuit 1303 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1303 includes a plurality of processing units (PEs) inside. In some implementations, the operation circuit 1303 is a two-dimensional systolic array. The arithmetic circuit 1303 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1303 is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1302 and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit takes matrix a data from the input memory 1301 and performs matrix operation with matrix B, and the partial result or the final result of the matrix obtained is stored in an accumulator (accumulator) 1308.
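Conceptually, this fetch-and-accumulate flow can be mirrored by a small NumPy sketch; the tiling scheme and tile size are illustrative and do not reflect the NPU's actual microarchitecture:

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Accumulate partial products over tiles of the inner dimension,
    mirroring the role of the accumulator 1308 described above."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))                           # accumulator
    for start in range(0, k, tile):
        end = min(start + tile, k)
        C += A[:, start:end] @ B[start:end, :]     # partial result accumulated
    return C

A = np.random.rand(4, 6)
B = np.random.rand(6, 3)
assert np.allclose(tiled_matmul(A, B), A @ B)
```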
Unified memory 1306 is used to store input data and output data. The weight data is directly transferred to the weight memory 1302 through the memory cell access controller (Direct Memory Access Controller, DMAC) 1305. The input data is also carried into the unified memory 1306 through the DMAC.
The bus interface unit 1310 (Bus Interface Unit, BIU) is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 1309.
The bus interface unit 1310 is configured for the instruction fetch memory 1309 to obtain instructions from the external memory, and is further configured for the memory unit access controller 1305 to obtain the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1306 or to transfer weight data to the weight memory 1302 or to transfer input data to the input memory 1301.
The vector calculation unit 1307 includes a plurality of operation processing units that, if necessary, perform further processing on the output of the operation circuit 1303, such as vector multiplication, vector addition, exponential operation, logarithmic operation and size comparison. It is mainly used for non-convolution/fully connected layer network calculation in the neural network, such as batch normalization, pixel-level summation and up-sampling of a feature plane.
In some implementations, the vector computation unit 1307 can store the vector of processed outputs to the unified memory 1306. For example, the vector calculation unit 1307 may perform a linear function; alternatively, a nonlinear function is applied to the output of the arithmetic circuit 1303, for example, to linearly interpolate the feature plane extracted by the convolution layer, and then, for example, to accumulate a vector of values to generate an activation value. In some implementations, vector computation unit 1307 generates a normalized value, a pixel-level summed value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 1303, for example for use in subsequent layers in a neural network.
An instruction fetch memory (instruction fetch buffer) 1309 connected to the controller 1304 for storing instructions used by the controller 1304;
The unified memory 1306, the input memory 1301, the weight memory 1302 and the instruction fetch memory 1309 are all on-chip memories. The external memory is a memory external to the NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course by means of special purpose hardware including application specific integrated circuits, special purpose CPUs, special purpose memories, special purpose components, etc. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures for implementing the same function can vary, such as analog circuits, digital circuits or dedicated circuits. However, in many cases a software program implementation is the better embodiment for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially, or the part contributing to the prior art, in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., comprising several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the method according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device, a data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Claims (18)

1. A method of data processing, the method comprising:
obtaining a target sequence, wherein the target sequence is an article sequence with interactive behaviors of a user, and the sequence comprises a first sequence and a second sequence;
according to the feature representation of the first sequence, the feature representation is subjected to noise adding through a noise adding module in a diffusion model, and the feature representation after noise adding is obtained;
and according to the current step length, the fusion result of the second sequence and the characteristic representation after noise addition, predicting the noise of the current step length through a noise prediction module in the diffusion model to obtain a first noise prediction result, wherein the first noise prediction result is used for denoising the characteristic representation after noise addition.
2. The method according to claim 1, wherein the method further comprises:
pooling the second sequence;
and fusing the current step length and the pooling operation result to obtain the fusion result.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring gradient information corresponding to a pre-training model when predicting a target object in the first sequence according to the characteristic representation of the first sequence;
Fusing the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
4. A method according to claim 3, wherein the target item is the first item in the first sequence.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
according to the current step length, the fusion result of the second sequence and the padding vector, carrying out noise prediction through a noise prediction module in the diffusion model to obtain a second noise prediction result;
fusing the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
6. The method of claim 5, wherein the padding vector is a randomly initialized vector or a preset vector, and the padding vector can be updated.
7. The method according to any one of claims 1 to 6, further comprising:
and updating the noise prediction module according to the first noise prediction result and the corresponding label.
8. A data processing apparatus, the apparatus comprising:
the system comprises an acquisition module, a storage module and a storage module, wherein the acquisition module is used for acquiring a target sequence, the target sequence is an article sequence with interactive behaviors of a user, and the sequence comprises a first sequence and a second sequence;
the processing module is used for carrying out noise adding on the characteristic representation through a noise adding module in the diffusion model according to the characteristic representation of the first sequence to obtain a noise-added characteristic representation;
and according to the current step length, the fusion result of the second sequence and the characteristic representation after noise addition, predicting the noise of the current step length through a noise prediction module in the diffusion model to obtain a first noise prediction result, wherein the first noise prediction result is used for denoising the characteristic representation after noise addition.
9. The apparatus of claim 8, wherein the processing module is further configured to:
pooling the second sequence;
And fusing the current step length and the pooling operation result to obtain the fusion result.
10. The apparatus of claim 8 or 9, wherein the acquisition module is further configured to
Acquiring gradient information corresponding to a pre-training model when predicting a target object in the first sequence according to the characteristic representation of the first sequence;
the processing module is further configured to:
fusing the first noise prediction result obtained by the noise prediction module with the gradient information to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
11. The apparatus of claim 10, wherein the target item is a first item in the first sequence.
12. The apparatus of claim 8 or 9, wherein the processing module is further configured to:
according to the current step length, the fusion result of the second sequence and the padding vector, carrying out noise prediction through a noise prediction module in the diffusion model to obtain a second noise prediction result;
fusing the first noise prediction result and the second noise prediction result obtained by the noise prediction module to obtain an adjusted first noise prediction result; and the adjusted first noise prediction result is used for denoising the denoised characteristic representation.
13. The apparatus of claim 12, wherein the padding vector is a randomly initialized vector or a preset vector, and the padding vector can be updated.
14. The apparatus of any one of claims 8 to 13, wherein the processing module is further configured to:
and updating the noise prediction module according to the first noise prediction result and the corresponding label.
15. A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to perform the operations of the method of any one of claims 1 to 7.
16. A computer program product comprising computer readable instructions which, when run on a computer device, cause the computer device to perform the method of any of claims 1 to 7.
17. A system comprising at least one processor, at least one memory; the processor and the memory are connected through a communication bus and complete communication with each other;
the at least one memory is used for storing codes;
The at least one processor is configured to execute the code to perform the method of any of claims 1 to 7.
18. A chip comprising a processor for supporting a data processing apparatus to implement a method as claimed in any one of claims 1 to 7.
CN202310960411.4A 2023-07-31 2023-07-31 Data processing method and device Pending CN117217284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310960411.4A CN117217284A (en) 2023-07-31 2023-07-31 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310960411.4A CN117217284A (en) 2023-07-31 2023-07-31 Data processing method and device

Publications (1)

Publication Number Publication Date
CN117217284A true CN117217284A (en) 2023-12-12

Family

ID=89041428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310960411.4A Pending CN117217284A (en) 2023-07-31 2023-07-31 Data processing method and device

Country Status (1)

Country Link
CN (1) CN117217284A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789744A (en) * 2024-02-26 2024-03-29 青岛海尔科技有限公司 Voice noise reduction method and device based on model fusion and storage medium
CN117789744B (en) * 2024-02-26 2024-05-24 青岛海尔科技有限公司 Voice noise reduction method and device based on model fusion and storage medium
CN117972381A (en) * 2024-04-02 2024-05-03 华侨大学 Internet insurance user feature screening method and device based on diffusion model

Similar Documents

Publication Publication Date Title
US20220198289A1 (en) Recommendation model training method, selection probability prediction method, and apparatus
US20230088171A1 (en) Method and apparatus for training search recommendation model, and method and apparatus for sorting search results
WO2022016556A1 (en) Neural network distillation method and apparatus
CN111859149A (en) Information recommendation method and device, electronic equipment and storage medium
CN117217284A (en) Data processing method and device
WO2024002167A1 (en) Operation prediction method and related apparatus
WO2023185925A1 (en) Data processing method and related apparatus
WO2024041483A1 (en) Recommendation method and related device
WO2023050143A1 (en) Recommendation model training method and apparatus
CN115879508A (en) Data processing method and related device
CN117009650A (en) Recommendation method and device
WO2024067779A1 (en) Data processing method and related apparatus
WO2024012360A1 (en) Data processing method and related apparatus
CN117251619A (en) Data processing method and related device
CN116910357A (en) Data processing method and related device
CN117057855A (en) Data processing method and related device
CN116843022A (en) Data processing method and related device
CN116204709A (en) Data processing method and related device
CN116308640A (en) Recommendation method and related device
CN114707070A (en) User behavior prediction method and related equipment thereof
CN115545738A (en) Recommendation method and related device
WO2023051678A1 (en) Recommendation method and related device
CN116340616A (en) Data processing method and related device
EP4398128A1 (en) Recommendation method and related device
CN117009649A (en) Data processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination