CN117172330A - Data processing method, system and electronic equipment - Google Patents

Data processing method, system and electronic equipment

Info

Publication number
CN117172330A
Authority
CN
China
Prior art keywords
sub
calculation
flow
computing
computation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311112814.XA
Other languages
Chinese (zh)
Inventor
张峰
郑祯
潘再峰
邱侠斐
李永
林伟
杜小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Alibaba Feitian Information Technology Co., Ltd.
Original Assignee
Hangzhou Alibaba Feitian Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Alibaba Feitian Information Technology Co., Ltd.
Priority to CN202311112814.XA
Publication of CN117172330A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a data processing method, a data processing system, and an electronic device. The method comprises the following steps: acquiring a computation flow describing an original model, wherein computing resources are required to complete the computation flow when the original model is run; identifying at least one computation sub-flow from the computation flow; simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification; mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units; and fusing the computing units contained in the set, and generating a target model corresponding to the original model based on the fusion result, wherein the different computation sub-flows of the target model are to be processed with matched computing resources. The application thereby solves the technical problem of low data processing efficiency.

Description

Data processing method, system and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular, to a data processing method, system, and electronic device.
Background
At present, a model can be tuned automatically by a machine learning compiler. However, this does not solve the problems caused by the large number of computation flows in the model: running a large number of computation flows inevitably introduces redundant computation, which degrades model performance, so the technical problem of low data processing efficiency remains.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present application provide a data processing method, a data processing system, and an electronic device, so as to at least solve the technical problem of low data processing efficiency.
According to one aspect of the embodiments of the present application, a data processing method is provided. The method may include: acquiring a computation flow describing an original model, wherein computing resources are required to complete the computation flow when the original model is run; identifying at least one computation sub-flow from the computation flow; simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification; mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units; and fusing the computing units contained in the set, and generating a target model corresponding to the original model based on the fusion result, wherein the different computation sub-flows of the target model are to be processed with matched computing resources.
According to another aspect of the embodiments of the present application, another data processing method is provided. The method may include: acquiring a computation flow describing an original model by calling a first interface, wherein the first interface includes a first parameter whose value is the computation flow, and computing resources are required to complete the computation flow when the original model is run; identifying at least one computation sub-flow from the computation flow; simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification; mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units; fusing the computing units contained in the set, and generating a target model corresponding to the original model based on the fusion result, wherein the different computation sub-flows of the target model are to be processed with matched computing resources; and outputting the target model by calling a second interface, wherein the second interface includes a second parameter whose value is the target model.
According to another aspect of the embodiments of the present application, an information recommendation method is provided. The method may include: acquiring a computation flow describing an original recommendation model, wherein the original recommendation model is used to determine service information to be recommended to a target object, and computing resources are required to complete the computation flow when the original recommendation model is run; simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification; mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units; and fusing the computing units contained in the set, and generating a target recommendation model corresponding to the original recommendation model based on the fusion result, wherein the different computation sub-flows of the target recommendation model are to be processed with matched computing resources so as to generate the service information.
According to another aspect of the embodiments of the present application, a data processing system is provided. The system may include: a client, configured to detect a model processing request on an interactive interface, wherein the model processing request asks a cloud server to process an original model; the cloud server, configured to respond to the model processing request by acquiring a computation flow describing the original model, wherein computing resources are required to complete the computation flow when the original model is run, identifying at least one computation sub-flow from the computation flow, simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification, mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units, fusing the computing units contained in the set, and generating a target model corresponding to the original model based on the fusion result; and a computing end, configured to invoke the computing resources matched to the different computation sub-flows of the target model and process the corresponding computation sub-flows to obtain a computation result.
According to another aspect of the embodiments of the present application, there is also provided an electronic device that may include a memory and a processor, wherein the memory is configured to store computer-executable instructions and the processor is configured to execute them; the instructions, when executed by the processor, implement any one of the above data processing methods.
According to another aspect of the embodiments of the present application, there is also provided a processor configured to run a program, wherein the program, when run, executes any one of the above data processing methods.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium including a stored program, wherein the program, when run, controls a device on which the storage medium resides to execute any one of the above data processing methods.
In the embodiments of the present application, the original model can be analyzed to determine the computation flow that describes how computing resources are used when the original model is run. Computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources they require, yielding simplified computation sub-flows that require fewer computing resources. Each computation sub-flow can then be mapped to a corresponding computing unit, and the computing units can be fused to produce a fusion result. A target model corresponding to the original model can be generated from the fusion result, achieving the purpose of updating the original model. Because the original model suffers from the many problems caused by its large number of computation flows, handling the computation sub-flows together with the computing resources they require avoids the performance degradation and other problems that the redundant computation introduced by a large number of computation flows would otherwise cause, thereby achieving the technical effect of improving data processing efficiency and solving the technical problem of low data processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application, as claimed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application;
FIG. 2 is a block diagram of a computing environment for a data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 5 is a flowchart of an information recommendation method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 7 is a flowchart of compilation tuning for a plurality of embedding columns in a deep recommendation model according to an embodiment of the present application;
FIG. 8 is a flowchart of identifying embedding columns according to an embodiment of the present application;
FIG. 9 is a flowchart of a CPU cooperating with a GPU according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic diagram of another data processing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an information recommendation apparatus according to an embodiment of the present application;
FIG. 13 is a block diagram of a computer terminal according to an embodiment of the present application;
FIG. 14 is a block diagram of an electronic device for a data processing method according to an embodiment of the present application;
FIG. 15 is a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing a data processing method according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms and terminology appearing in the description of the embodiments of the present application are explained as follows:
Embedding column: a computational subgraph that converts an input feature into an embedding vector by processing the feature and querying an embedding table (a minimal sketch is given after these definitions);
Deep recommendation model: a model for a recommendation system built with deep learning techniques, where the recommendation system aims to recommend relevant products or services to a user according to the user's historical behavior and preferences;
Machine learning compiler: a compiler that automatically converts a machine learning model into an efficient computational graph, code, or instructions according to the model's characteristics, thereby accelerating the training or inference of the model;
Computational graph: a graph that describes the computation flow of a machine learning model, where each node represents a computational operation and the edges between nodes represent data flows and dependencies;
Dynamic shape: a tensor shape in the model's computational graph that can be obtained only at runtime and is unknown during compilation.
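As a concrete illustration of the first definition, the following is a minimal NumPy sketch of what an embedding column computes. The table contents, the modulo bucketing, and all names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

# Toy embedding table: a vocabulary of 1000 rows, embedding width 8.
embedding_table = np.random.rand(1000, 8).astype(np.float32)

def embedding_column(feature_ids):
    """Convert a batch of raw feature ids into embedding vectors by
    bucketing the ids into table rows and gathering those rows."""
    rows = np.asarray(feature_ids) % embedding_table.shape[0]
    return embedding_table[rows]

vectors = embedding_column([3, 1041, 7])  # shape (3, 8)
```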
Example 1
According to an embodiment of the present application, a data processing method is provided. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
According to one aspect of the embodiments of the present application, a data processing method is provided. As an alternative embodiment, the data processing method may be applied, but is not limited, to the application scenario shown in FIG. 1. FIG. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application. As shown in FIG. 1, a terminal device 102 may, but is not limited to, communicate with a server 106 through a network 104, and the server 106 may, but is not limited to, operate on a database 108, for example with a data write or data read operation. The terminal device 102 may include, but is not limited to, a human-computer interaction screen, a processor, and a memory. The human-computer interaction screen may be, but is not limited to, software or an application on the terminal device 102 for displaying service information to be recommended, historical behavior information, recommendation information, and the like. The processor may be, but is not limited to, configured to respond to human-computer interactions, perform corresponding operations or generate corresponding instructions, and send the generated instructions to the server 106. For example, when the target object opens software or an application on the terminal device 102 for which service information needs to be recommended, the target object's operations on the software or application can be detected, historical behavior information of the target object can be generated, and the historical behavior information can be sent to the server 106, where the server 106 may be a cloud server. The memory is used to store relevant data, such as the software or application for which service information needs to be recommended, the historical behavior information of the target object, and the service information finally obtained from the server 106. After the server 106 receives the historical behavior information, it may be stored in a database 108, where the database 108 may be a database on the cloud server.
As an alternative, the following steps of the data processing method may be performed on the server 106: step S102, acquiring a computation flow describing an original model, wherein computing resources are required to complete the computation flow when the original model is run; step S104, identifying at least one computation sub-flow from the computation flow; step S106, simplifying the different computation sub-flows, wherein a simplified computation sub-flow requires fewer computing resources than it did before simplification; step S108, mapping the simplified computation sub-flows to corresponding computing units to obtain a set of computing units; and step S110, fusing the computing units contained in the set and generating, based on the fusion result, a target model corresponding to the original model, wherein the different computation sub-flows of the target model are to be processed with matched computing resources.
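A structural sketch of steps S102 to S110 follows. All function names and the toy operator strings are hypothetical stand-ins chosen for illustration; the patent does not prescribe this decomposition:

```python
from typing import List

def extract_computation_flow(model_ops: List[str]) -> List[str]:        # S102
    return list(model_ops)

def identify_sub_flows(flow: List[str]) -> List[List[str]]:             # S104
    # Toy rule: every embedding-related op starts its own sub-flow.
    return [[op] for op in flow if op.startswith("embedding")]

def simplify(sub_flow: List[str]) -> List[str]:                         # S106
    # Toy rule: drop ops marked redundant, so fewer resources are needed.
    return [op for op in sub_flow if "redundant" not in op]

def map_to_unit(sub_flow: List[str]) -> str:                            # S108
    # Toy rule: string-handling sub-flows go to the CPU, the rest to the GPU.
    return "cpu" if any("string" in op for op in sub_flow) else "gpu"

def fuse(units: List[str]) -> dict:                                     # S110
    return {"gpu_blocks": units.count("gpu"), "cpu_ops": units.count("cpu")}

model_ops = ["embedding_lookup", "embedding_string_split",
             "embedding_redundant_reshape", "dense"]
sub_flows = identify_sub_flows(extract_computation_flow(model_ops))
print(fuse([map_to_unit(simplify(s)) for s in sub_flows]))
# {'gpu_blocks': 2, 'cpu_ops': 1}
```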
Optionally, after the target model corresponding to the updated original model has been generated through the above steps S102 to S110, when historical behavior information needs to be analyzed to generate service information, the historical behavior information of the target object to which service information is to be recommended can be read from the database 108; that is, the historical behavior information of the target object is also stored in the database on the cloud server. By feeding the historical behavior information from the database into the updated model for inference, service information suited to the target object can be generated. After the service information has been inferred, it may be recommended to the terminal device 102 of the target object via the network 104 and displayed on the terminal device.
In this way, the original model can be analyzed to determine the computation flow that describes how computing resources are used when the original model is run, computation sub-flows can be identified from the computation flow, and the computation flow can be simplified according to the amount of computing resources required, yielding simplified computation sub-flows that require fewer computing resources. Each computation sub-flow can be mapped to a corresponding computing unit for fusion processing, producing a fusion result, and a target model corresponding to the original model can be generated from that result, achieving the purpose of updating the original model. Because the original model suffers from many problems caused by its large number of computation flows, handling the computation sub-flows together with the computing resources they require avoids those problems, thereby achieving the technical effect of improving data processing efficiency and solving the technical problem of low data processing efficiency.
FIG. 2 is a block diagram of a computing environment for a data processing method according to an embodiment of the present application. As shown in FIG. 2, the computing environment 201 includes a plurality of computing nodes (e.g., servers) running on a distributed network (shown as 210-1, 210-2, …). The computing nodes each contain local processing and memory resources, and an end user 202 may run applications or store data remotely in the computing environment 201. An application may be provided as a plurality of services 220-1, 220-2, 220-3, and 220-4 in the computing environment 201, representing services "A", "D", "E", and "H", respectively.
The end user 202 may provide and access the services through a web browser or other software application on a client. In some embodiments, the provisioning and/or requests of the end user 202 may be provided to an ingress gateway 230. The ingress gateway 230 may include a corresponding agent to handle provisioning and/or requests for services (one or more services provided in the computing environment 201).
Services are provided or deployed according to the various virtualization technologies supported by the computing environment 201. In some embodiments, services may be provided according to virtual-machine (VM) based virtualization, container-based virtualization, and/or the like. Under virtual-machine based virtualization, a real computer is emulated by initializing a virtual machine, so that programs and applications execute without directly touching any real hardware resources. Whereas a virtual machine virtualizes the whole machine, under container-based virtualization a container can be started to virtualize at the level of the operating system (OS), so that multiple workloads can run on a single operating-system instance.
In one embodiment based on container virtualization, several containers of a service may be assembled into one Pod (e.g., a Kubernetes Pod). For example, as shown in FIG. 2, the service 220-2 may be equipped with one or more Pods 240-1, 240-2, …, 240-N (collectively referred to as Pods). A Pod may include an agent 245 and one or more containers 242-1, 242-2, …, 242-M (collectively referred to as containers). One or more containers in the Pod handle requests related to one or more corresponding functions of the service, and the agent 245 generally controls network functions related to the service, such as routing and load balancing. Other services may likewise be equipped with Pods.
In operation, executing a user request from the end user 202 may require invoking one or more services in the computing environment 201, and executing one or more functions of one service may require invoking one or more functions of another service. As shown in FIG. 2, service "A" 220-1 receives the user request of the end user 202 from the ingress gateway 230, service "A" 220-1 may invoke service "D" 220-2, and service "D" 220-2 may request service "E" 220-3 to perform one or more functions.
The computing environment may be a cloud computing environment, in which the allocation of resources is managed by a cloud service provider, allowing functions to be developed without considering how servers are implemented, tuned, or scaled. The computing environment allows developers to execute code that responds to events without building or maintaining a complex infrastructure. Instead of scaling up a single hardware device to handle the potential load, a service may be partitioned into a set of functions that can be scaled automatically and independently.
In the above operating environment, the present application provides a data processing method as shown in FIG. 3. It should be noted that the data processing method of this embodiment may be executed by the computer device of the embodiment shown in FIG. 1. FIG. 3 is a flowchart of a data processing method according to an embodiment of the present application; as shown in FIG. 3, the method may include the following steps:
In step S302, a computation flow describing the original model is obtained, where computing resources are required to complete the computation flow when the original model is run.
In the technical solution provided in the above step S302 of the present application, if the original model needs to be processed in order to be updated, the computation flow of the original model can be obtained, where the computation flow requires computing resources to complete when the original model is run. The original model may be a model with a deep neural network that is to be processed, for example a deep recommendation model with a large number of embedding columns to be processed; such a model may recommend services or products that match a target object's preferences and needs. The present application does not limit the model to be processed: any deep-neural-network model suffering from the many problems caused by a large number of embedding columns is within the scope of the present application.
In this embodiment, the original model may include two parts: a deep-neural-network stack and an embedding layer. The embedding layer may include at least one embedding column, each corresponding to a different feature field of the original model; for example, the embedding layer is composed of the at least one embedding column. The embedding columns, which may also be referred to as embedding tables, can be used to represent the computation flow between the deep neural network of the original model and the embedding layer of the original model. The computation flow may be the concatenation process and concatenation pattern between the deep neural network and the embedding layer, and can be represented by the computational graph of the original model.
Alternatively, when there is a model with a deep neural network that needs embedding-layer processing, that model may be determined to be the original model to be processed, and an embedding vector (an embedding column) may be concatenated to the deep neural network of the original model to generate the corresponding computational graph, where an embedding column processes a feature input to the original model and queries an embedding table to convert the feature into an embedding vector. The embedding columns of the original model constitute its embedding layer; a sketch of this concatenation is given below.
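The following NumPy fragment sketches, under assumed shapes, how the outputs of two embedding columns are concatenated into the input of the deep-neural-network stack; the widths 8 and 16 and the batch size 32 are arbitrary illustrative choices:

```python
import numpy as np

batch = 32
emb_a = np.random.rand(batch, 8).astype(np.float32)   # output of embedding column A
emb_b = np.random.rand(batch, 16).astype(np.float32)  # output of embedding column B

# The embedding layer's per-column outputs are concatenated feature-wise
# and fed to the deep-neural-network stack.
dnn_input = np.concatenate([emb_a, emb_b], axis=1)    # shape (32, 24)
```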
In step S304, at least one computation sub-flow is identified from the computation flow.
In the technical solution provided in the above step S304 of the present application, after the computation flow describing the original model has been obtained, computation sub-flows can be identified from it, where a computation sub-flow may be an embedding column of the embedding layer. Embedding columns can be used to convert the features to be processed in the deep neural network into computational subgraphs of the computational graph. A feature to be processed may be an input feature, a feature field, a statistical feature, or the like; these are examples only and are not limiting. The embedding column may be an embedding-column subgraph identified from the computational graph, and the subgraph may contain various nodes, such as tensor-computation nodes and shape-computation nodes, again given only as examples. A computational subgraph may represent a subgraph generated while tuning an embedding column, for example during tensor computation over the embedding-column subgraph, during the reconstruction of shape computation, or during the safe removal of redundancy and simplification. It should be noted that the contents listed for the computational subgraph are only illustrative, and the computational subgraph is not specifically limited.
Optionally, this embodiment may analyze the computation flow of the original model and identify at least one computation sub-flow from it; that is, the embedding columns of the original model's computational graph may be identified and disassembled, at least one embedding column may be extracted from the embedding layer of the computational graph, computation may then be performed for each embedding column, and the features to be processed in each embedding column may be converted into a computational subgraph of the computational graph.
In step S306, the different computation sub-flows are simplified, where a simplified computation sub-flow requires fewer computing resources than it did before simplification.
In the technical solution provided in the above step S306 of the present application, after at least one computation sub-flow has been identified from the computation flow, each computation sub-flow may be simplified; that is, the amount of computing resources it requires may be reduced, yielding a simplified computation sub-flow with lower resource requirements, where the amount of computing resources required by the simplified computation sub-flow is less than the amount required before simplification.
Optionally, for each embedding column, this embodiment derives symbolic expressions for the shapes of all tensors corresponding to the feature to be processed.
Optionally, the embedding-column subgraph can be reconstructed through unified shape computation; that is, the subgraph derived via the symbolic expressions is replaced by a unified operator for simplification, yielding a new, reconstructed embedding-column subgraph. Safe removal of redundancy and simplification of embedding lookups can then be applied to the reconstructed subgraph, yielding the simplified embedding-column subgraph.
In this embodiment, the large number of embedding columns in the original model causes many problems, which can result in the technical problem of low data processing efficiency. In the embodiments of the present application, all embedding columns of the embedding layer can be identified from the computational graph of the original model, and the features to be processed can be converted into computational subgraphs through the embedding columns. During this conversion, graph-level tuning amounts to tuning the embedding-column subgraphs, so simplifying the embedding-column subgraphs into the corresponding computational subgraphs achieves the purpose of simplifying the embedding columns, thereby achieving the technical effect of improving data processing efficiency.
In step S308, the simplified computation sub-flows are mapped to corresponding computing units to obtain a set of computing units.
In the technical solution provided in the above step S308 of the present application, after the simplified computation sub-flows have been obtained, they may be mapped to corresponding computing units to form a set of computing units, where a computing unit may be a processor that runs a computation sub-flow using computing resources, for example a graphics processor (Graphics Processing Unit, GPU) or a central processing unit (Central Processing Unit, CPU). The processor is matched to the computing resources required by the embedding column, the embedding column may be a computing operation carried out by the processor on the basis of those computing resources, and the set of computing units may represent the set formed when each computation sub-flow has been mapped to the computing unit it requires.
Optionally, after the computation sub-flows have been identified, that is, after the embedding columns of the embedding layer have been obtained by analyzing the computational graph, a processor matching the computing resources required by each embedding column can be determined, so that the embedding column can be mapped to the corresponding processor and the processor can perform computing operations on the received embedding column using those computing resources.
In the embodiments of the present application, an embedding column may be analyzed to determine in which processor it should perform its computing operations; for example, if the size of the embedding table contained in the embedding column reaches a certain threshold, the entire embedding column may be mapped to the CPU for processing. Based on this mode of joint CPU-GPU operation, different embedding columns, or the operators corresponding to them, can be placed in a suitable processor according to the actual requirements and circumstances. This reduces GPU memory overhead, makes full use of the computing resources of an otherwise idle CPU, and improves the utilization of computing resources, thereby achieving the technical effect of improving data processing efficiency and solving the technical problem of low data processing efficiency.
After mapping an embedding column to the corresponding processor, this embodiment can have the processor perform computing operations on the embedding column according to the computing resources it requires and determine the computation result corresponding to the embedding column, so that the model to be processed can be updated according to that result.
Optionally, by simplifying each embedding column and performing the corresponding computing operations on the processed embedding columns, the corresponding computation results are obtained, and the initial model to be processed can be updated with them; for example, a kernel function or operator corresponding to an embedding column can be inserted at the corresponding position in the initial model to be processed, thereby updating the model.
In step S310, the computing units contained in the set of computing units are fused, and the original model is updated based on the fusion result to generate a target model corresponding to the original model, where the different computation sub-flows of the target model are to be processed with matched computing resources.
In the technical solution provided in the above step S310 of the present application, after the simplified computation sub-flows have been mapped to the corresponding computing units, the computing units contained in the set may be fused to obtain a fusion result, and the corresponding target model may be generated from it, where the different computation sub-flows of the target model may be processed with the computing resources matched to them.
Optionally, this embodiment may fuse all the simplified embedding-column subgraphs of the embedding layer to obtain a fused operator; for example, the embedding-column subgraphs may be fused by a parallelism-oriented fusion method.
It should be noted that the above process and method for fusing embedding-column subgraphs are only illustrative; any process and method that computes and simplifies all the embedding-column subgraphs of the embedding layer is within the scope of the embodiments of the present application, and no further examples are given here.
Through the above steps S302 to S310 of the present application, the original model can be analyzed to determine the computation flow that describes how computing resources are used when the original model is run; computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources required, yielding simplified computation sub-flows that require fewer computing resources. The computation sub-flows can be mapped to corresponding computing units for fusion processing, producing a fusion result, and a target model corresponding to the original model can be generated from it, achieving the purpose of updating the original model. Because the original model suffers from many problems caused by its large number of computation flows, handling the computation sub-flows together with the computing resources they require avoids the performance degradation and other problems that the redundant computation introduced by a large number of computation flows would otherwise cause, thereby achieving the technical effect of improving data processing efficiency and solving the technical problem of low data processing efficiency.
The above-described method of this embodiment is further described below.
As an optional implementation, step S304, identifying at least one computation sub-flow from the computation flow, includes: disassembling the computation flow and identifying at least one variable of the original model, where a variable represents trainable data in the original model that matches the type of the computation sub-flow; and determining the computation sub-flow based on the variable. The type of the computation sub-flow may be an operator type.
In this embodiment, the computation flow may be disassembled and analyzed to obtain at least one variable of the original model, and a computation sub-flow within the computation flow may be determined based on that variable, where the variable represents trainable data in the original model matching the type of the computation sub-flow; for example, the variable may be a trainable variable such as an embedding-table variable.
Optionally, the computational graph may be analyzed to obtain at least one embedding table of the original model, and an embedding column may be determined based on the embedding table, where the operation information corresponding to the embedding table may be associated with the embedding column.
As an alternative embodiment, determining the computation sub-flow based on the variable includes: determining an initial computation sub-flow corresponding to the variable; and updating the initial computation sub-flow into the computation sub-flow using the predecessor computing nodes of the computing nodes in the initial computation sub-flow.
In this embodiment, the initial computation sub-flow corresponding to the variable may be determined and then grown and updated into the computation sub-flow using the predecessor computing nodes of the computing nodes it contains, where the initial computation sub-flow may be the embedding-column subgraph initialized for the current embedding table. The computing nodes are the nodes of the embedding-column subgraph and comprise predecessor computing nodes, also called predecessor nodes, and successor computing nodes, also called successor nodes.
Optionally, the embedding-column disassembly can be performed on the computational graph of the entire original model. For example, all embedding tables in the computational graph can be located through the operator types applied to trainable variables, and, for every node of the computational graph, the number of embedding tables among its direct or indirect predecessor nodes can be counted. All embedding tables of the original model can then be traversed: for each embedding table, the embedding-column subgraph corresponding to it is initialized, as is a queue for breadth-first traversal. While the queue is not empty, a node is dequeued and all its successor nodes are traversed; for each successor it is determined whether the number of its predecessor embedding tables is at most 1, and if so the successor is inserted into the current embedding-column subgraph, while if it is greater than 1 the remaining successors continue to be traversed until the traversal completes. Once the queue is empty, all nodes of the current embedding-column subgraph are traversed; for each node its predecessor nodes are traversed, and each predecessor is inserted into the current embedding-column subgraph. When all predecessors and all nodes have been traversed, it is determined whether the embedding-column subgraph was updated in the current iteration; if so, the procedure repeats.
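The following self-contained Python sketch illustrates this disassembly on a toy graph. The graph, the rule that a node joins a column only if it depends on at most one embedding table, and all names are assumptions made for illustration:

```python
from collections import deque

# Toy computational graph: node -> successor nodes (two embedding columns
# feeding a shared concat node).
successors = {
    "table_0": ["lookup_0"], "ids_0": ["lookup_0"],
    "lookup_0": ["reshape_0"], "reshape_0": ["concat"],
    "table_1": ["lookup_1"], "ids_1": ["lookup_1"],
    "lookup_1": ["concat"], "concat": [],
}
predecessors = {n: [] for n in successors}
for node, succs in successors.items():
    for s in succs:
        predecessors[s].append(node)

tables = {"table_0", "table_1"}

def ancestor_tables(node):
    """Embedding tables among a node's direct or indirect predecessors."""
    seen, stack, found = set(), [node], set()
    while stack:
        n = stack.pop()
        if n in tables:
            found.add(n)
        for p in predecessors[n]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return found

def disassemble(table):
    """Grow the embedding-column subgraph rooted at one embedding table,
    iterating until no node is added (the fixpoint described above)."""
    sub = {table}
    changed = True
    while changed:
        changed = False
        queue = deque(sub)
        while queue:  # breadth-first pass over successors
            node = queue.popleft()
            for s in successors[node]:
                # A successor joins only if it depends on at most one table.
                if s not in sub and len(ancestor_tables(s)) <= 1:
                    sub.add(s)
                    queue.append(s)
                    changed = True
        for node in list(sub):  # pull in the predecessors of every member
            for p in predecessors[node]:
                if p not in sub and len(ancestor_tables(p)) <= 1:
                    sub.add(p)
                    changed = True
    return sub

print(sorted(disassemble("table_0")))
# ['ids_0', 'lookup_0', 'reshape_0', 'table_0']
```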
As an alternative embodiment, the method may further include: determining a processing type corresponding to a computation sub-flow of the target model, where the processing type represents the type of operation to be performed on that computation sub-flow; and determining, based on the processing type, the matched computing resources that the computation sub-flow of the target model requires.
In this embodiment, the processing type corresponding to a computation sub-flow of the target model may be determined, and from it the computing resources to be matched with that computation sub-flow, and the corresponding processor, may be determined, where the processing type represents the type of operation to be performed on the computation sub-flow of the target model.
Optionally, in this embodiment the embedding columns may be mapped to thread blocks of the processor, where the thread blocks, also referred to as GPU thread blocks, are used to form the kernel functions (GPU functions) run by the processor's kernel, and the kernel may be the kernel of a GPU.
Alternatively, each embedding column obtained through the simplification and related processing may be mapped onto a corresponding GPU thread block (Block), and processing the embedding column on that thread block yields the corresponding kernel function able to run on the processor's kernel.
Optionally, the operators corresponding to at least one embedding column can be fused to obtain a kernel function.
Alternatively, the individual embedding columns resulting from the series of graph-level tuning processes and steps may be fused into one GPU function before the embedding columns are mapped to the corresponding thread blocks of the processor.
Alternatively, symbolic expressions for the shapes of all tensors corresponding to the feature to be processed may be derived for each embedding column. The embedding-column subgraph can be reconstructed through unified shape computation to obtain a new, reconstructed subgraph, to which safe removal of redundancy and simplification of embedding lookups can be applied, yielding the simplified embedding-column subgraph. All simplified embedding-column subgraphs of the embedding layer can then be fused to obtain a fused operator, and the fused operator can be converted into a fused kernel function through the Compute Unified Device Architecture (CUDA); for example, Block 1, …, Block N can be obtained by the fusion. In the embodiments of the present application, the corresponding computational subgraphs are obtained by simplifying the embedding-column subgraphs, and the simplified operators corresponding to all embedding columns are fused, so that the fused operator can be processed into the corresponding fused kernel function, achieving the purpose of simplifying the embedding columns and thereby the technical effect of improving data processing efficiency.
In the embodiments of the present application, a large number of embedding columns can be fused into a single kernel function, with each embedding column processed by one CUDA block.
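The following pure-Python fragment only simulates that mapping so the idea is concrete: one fused launch, with block i handling embedding column i. Table sizes, batch size, and names are illustrative assumptions, and real code would express fused_kernel in CUDA rather than Python:

```python
import numpy as np

# Three embedding columns with independent tables and id batches.
tables = [np.random.rand(100, 8).astype(np.float32) for _ in range(3)]
batches = [np.random.randint(0, 100, size=16) for _ in range(3)]

def fused_kernel(block_id):
    """Stand-in for one CUDA thread block: block i gathers the rows of
    table i, so all columns run in a single kernel launch instead of N."""
    return tables[block_id][batches[block_id]]

# "Launching" the fused kernel: one block per embedding column.
outputs = [fused_kernel(b) for b in range(len(tables))]
print([o.shape for o in outputs])  # [(16, 8), (16, 8), (16, 8)]
```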
As an alternative embodiment, determining, based on the processing type, the matched computing resources that the computation sub-flow of the target model requires includes: in response to the processing type being a string operation to be performed on the computation sub-flow of the target model, determining that computing resources of the central processing unit are to be invoked; and in response to the processing type being a non-string operation to be performed on the computation sub-flow of the target model, determining that computing resources of the graphics processor are to be invoked.
In this embodiment, the processing type may be examined: when it indicates that a string operation is to be performed on the computation sub-flow of the target model, it is determined that computing resources of the central processing unit can be invoked; when it indicates a non-string operation, it is determined that computing resources of the graphics processor can be invoked.
Alternatively, the embedding columns may be mapped to the corresponding processors based on their operation information, where the operation information characterizes whether string operations are performed on the embedding column and may include information such as gather (Gather) operations.
For example, if a string operation is performed on the embedding column, that embedding column may be mapped to the CPU; if no string operation is performed on it, it may be mapped to the GPU. It should be noted that this is merely illustrative; the operation information and the manner and procedure of mapping it to the corresponding processor are not specifically limited.
In the embodiments of the present application, after the kernel function has been obtained, the operation information of an embedding column may be analyzed to determine in which processor the embedding column should perform its computing operations. Based on the mode of joint CPU-GPU operation, different embedding columns, or the operators corresponding to them, can be placed in a suitable processor according to the actual requirements and circumstances, which reduces memory overhead, makes full use of the computing resources of an otherwise idle CPU, and achieves the technical effect of improving data processing efficiency.
As an alternative embodiment, the method may further include: in response to the data volume of the variable corresponding to the computation sub-flow of the target model being greater than a data-volume threshold, determining that computing resources of the central processing unit are to be invoked; and in response to that data volume being not greater than the data-volume threshold, determining that computing resources of the graphics processor are to be invoked.
In this embodiment, the data volume of the variable corresponding to the computation sub-flow of the target model may be determined. If the data volume, which may represent the byte size of the embedding table, for example in megabytes (MB), is greater than the data-volume threshold, computing resources of the central processing unit may be invoked for processing; if it is less than or equal to the threshold, computing resources of the graphics processor may be invoked. The data-volume threshold may be a preset byte value for the embedding table or a value chosen according to the size of the actual embedding column; for example, the threshold may be preset to 256 MB.
It should be noted that the size and choice of the data-volume threshold are merely illustrative and are not limited here. Any method and steps that analyze the operation information of an embedding column to determine which processor it should be mapped to are within the scope of the present application.
Alternatively, in this embodiment not only the operation information but also the data volume of the variable of the embedding table corresponding to the embedding column may be considered. If no string operation is performed on the embedding column and the data volume involved in the computing operations over its embedding table is less than or equal to the data-volume threshold, the embedding column can be mapped to the corresponding graphics processor. If a string operation is performed on the embedding column, or the data volume is greater than the data-volume threshold, the embedding column can be mapped to the corresponding central processing unit.
Optionally, the deep-neural-network (Deep Neural Networks, DNN) part of the original model may be placed on the GPU. All embedding columns of the original model's embedding layer may be traversed, and for each embedding column the data volume of the variable in the embedding table it contains may be determined. If that data volume is greater than the data-volume threshold, the entire embedding column may be mapped to the CPU.
Optionally, if the data volume of the variable is less than or equal to the data-volume threshold, all operators contained in the embedding column may be traversed, and for each operator it may be determined whether it belongs to a string operation. If it does, the operator is mapped to the CPU; after mapping it, the traversal continues over the remaining operators of the embedding column until it completes. If the current operator does not belong to a string operation, it is mapped to the GPU.
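A compact sketch of this placement heuristic follows. The 256 MB threshold comes from the example above; the string-operator naming convention and all function names are assumptions for illustration:

```python
SIZE_THRESHOLD_MB = 256  # example threshold from the description above

def is_string_op(op):
    # Assumed naming convention, for illustration only.
    return op.startswith("string_")

def place_embedding_column(table_size_mb, ops):
    """Large tables go wholly to the CPU; otherwise string operators go
    to the CPU and all remaining operators go to the GPU."""
    if table_size_mb > SIZE_THRESHOLD_MB:
        return {op: "cpu" for op in ops}
    return {op: ("cpu" if is_string_op(op) else "gpu") for op in ops}

print(place_embedding_column(512, ["string_split", "lookup"]))
# {'string_split': 'cpu', 'lookup': 'cpu'}
print(place_embedding_column(64, ["string_split", "lookup"]))
# {'string_split': 'cpu', 'lookup': 'gpu'}
```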
Alternatively, all operators of the embedding columns mapped onto the GPU may be fused into the same kernel function.
In the embodiments of the present application, based on the mode of joint CPU-GPU operation, different embedding columns, or the operators corresponding to them, can be placed in a suitable processor to perform computing operations according to the actual requirements and circumstances, which reduces GPU memory overhead, makes full use of the computing resources of an otherwise idle CPU, and thereby achieves the technical effect of improving data processing efficiency.
As an alternative embodiment, simplifying the different computation sub-flows includes: reconstructing a computation sub-flow based on the description information of the tensors in the computation sub-flow to obtain a target computation sub-flow, where the description information describes the attributes of the tensors and the amount of computing resources required by the target computation sub-flow is less than that required by the computation sub-flow.
Optionally, if a computation sub-flow needs to be simplified, it may be reconstructed based on the description information of its tensors to obtain a target computation sub-flow, where the description information may be the symbolic expressions of the tensors' shapes. The target computation sub-flow requires fewer computing resources than the original computation sub-flow. The reconstruction may simplify the computational graph by replacing operators with a unified operator.
Alternatively, for each embedding column of the embedding layer, symbolic expressions for the shapes of all tensors in its embedding-column subgraph may be derived at the graph level. For example, the following symbolic expressions corresponding to all tensor-computation nodes of the embedding-column subgraph can be determined: <n0>, <n1>, <n0, 8>, and so on. Note that these symbolic expressions for the tensor-computation nodes are merely examples and are not limiting.
Alternatively, the shape computation subgraph where all the shape computation nodes in the embedded column subgraph are located may be determined, and the shape computation subgraph may be replaced by a unified operator, that is, a unified shape construction node, so as to simplify the embedded layer subgraph.
For example, after deriving the symbolic expressions of the shapes of all tensors in the embedded column subgraph, all shape computation nodes carrying redundant shape computation information in the target embedded column subgraph may be replaced with unified shape construction nodes; that is, the redundant computation information in the shape computation nodes is deleted in a first pass. After that, a second simplification pass may delete the tensor computation nodes that carry redundant tensor computation information, so that an embedded column containing no redundant computation information is obtained.
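A minimal sketch of these two passes, assuming a toy node representation (dictionaries with a "kind" field) that is not the application's actual data structure:

```python
# Illustrative sketch of the two simplification passes: replace the shape-
# computation nodes with one unified shape-construction node, then drop
# tensor-computation nodes marked redundant. The node format is assumed.

from typing import Dict, List

def unify_shape_nodes(nodes: List[dict], sym_shapes: Dict[str, str]) -> List[dict]:
    """First pass: delete shape-computation nodes, add one construction node."""
    kept = [n for n in nodes if n["kind"] != "shape_compute"]
    kept.append({"id": "unified_shape", "kind": "build_shape",
                 "shapes": sym_shapes})   # e.g. {"t0": "<n0>", "t2": "<n0, 8>"}
    return kept

def drop_redundant_tensors(nodes: List[dict]) -> List[dict]:
    """Second pass: delete tensor-computation nodes flagged as redundant."""
    return [n for n in nodes if not n.get("redundant", False)]

column = [{"id": "t0", "kind": "tensor", "redundant": False},
          {"id": "s0", "kind": "shape_compute"},
          {"id": "t1", "kind": "tensor", "redundant": True}]
simplified = drop_redundant_tensors(unify_shape_nodes(column, {"t0": "<n0>"}))
print([n["id"] for n in simplified])      # -> ['t0', 'unified_shape']
```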
Optionally, the embedded columns may be adjusted and the redundant computation information in them deleted, so as to obtain embedded columns that do not include redundant computation information; the operators corresponding to these embedded columns may then be fused to obtain the kernel functions corresponding to the embedded columns. Here, adjusting an embedded column means adjusting its subgraph at the graph level, and the redundant computation information may be redundant shape computation information, redundant tensor computation information, and so on.
It should be noted that the redundant computation information above is only illustrative; any process or method that deletes redundant computation information and thereby simplifies the embedded column falls within the scope of the embodiments of the present application and is not enumerated here.
Alternatively, the embedded layer may be identified and analyzed to obtain all the embedded columns it contains. For each embedded column subgraph, the symbolic expressions of the shapes of all tensors in it can be derived. The shape computation portion of the subgraph derived through the symbolic expressions can then be replaced with a unified operator, i.e., the redundant computation information in the embedded column subgraph is preliminarily deleted, yielding a preliminarily simplified embedded column subgraph. Redundant security-guarantee removal and embedding lookup simplification can then be applied, i.e., the redundant computation information in the preliminarily simplified subgraph is deleted in a second pass, yielding a twice-simplified embedded column subgraph, that is, an adjusted embedded column that includes no redundant computation information. All simplified embedded column subgraphs in the embedded layer can be fused to obtain a fused operator, and further a fused kernel function.
As an optional implementation manner, mapping the simplified computation sub-flow to a corresponding computation unit to obtain a computation unit set, including: and mapping the target calculation sub-flow after removing the data redundancy information and/or simplifying the data searching information into the corresponding calculation unit to obtain a calculation unit set.
In this embodiment, the data redundancy information of the target computation sub-flow may be removed, and/or the data lookup information in the target computation sub-flow may be simplified, and the processed target computation sub-flow may be mapped to a corresponding computing unit to obtain the computing unit set, where the processor may provide the thread blocks onto which the target computation sub-flows are mapped. The data security information can be used to represent the data-security guarantees of the embedded column and to identify the redundant security guarantees to be removed from it. The data lookup information may be used to look up the data embedded in the embedded column; that embedded data may be the embedding-lookup portion of the subgraph.
Optionally, the embedded columns adjusted by removing redundant data information and/or simplifying data lookup information may be mapped into thread blocks of the processor.
Alternatively, each embedded column obtained through this series of graph-level adjustments, such as simplification and operator unification, can be fused into a GPU kernel, and each embedded column can be mapped onto a GPU block.
In the embodiment of the application, performing this series of graph-level adjustments on the embedded columns, including shape computation simplification, redundant security-guarantee removal, and embedding lookup simplification, avoids the series of problems caused by a large number of embedded columns. Once the simplified and adjusted embedded columns are mapped into thread blocks of the processor, the model to be processed can be updated with them, realizing the technical effect of improving data processing efficiency.
Alternatively, the embedded columns may be adjusted on the graph level, or the computational subgraphs corresponding to the embedded columns may be adjusted based on the context of the embedded columns.
Alternatively, the computational subgraph may be adjusted, based on an analysis of the context in the embedded column, so as to eliminate redundant computation information.
Alternatively, adjustments may be made at the graph level of the embedded column, i.e., adjustments may be made to the embedded column subgraph, thereby simplifying and adjusting the embedded column subgraph.
Optionally, the embedded columns may be adjusted at the graph level by at least one of the following: the computational subgraph corresponding to the description information of the tensors in the embedded column can be converted into a target operator; the data security information of the embedded column may be removed; the data lookup information of the embedded column can be simplified.
Optionally, the redundant security guarantees (data security information) of the embedded columns may be removed.
Alternatively, the embedding-lookup portion of the embedded column subgraph may be simplified, as in the sketch below.
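For instance, one plausible form of embedding-lookup simplification is to gather each table row at most once by deduplicating the lookup ids. The numpy sketch below is a toy illustration of that idea, not the application's actual pass:

```python
# Toy sketch of one possible embedding-lookup simplification: deduplicate ids
# so each embedded-table row is gathered at most once. Purely illustrative.

import numpy as np

def simplified_lookup(table: np.ndarray, ids: np.ndarray) -> np.ndarray:
    uniq, inverse = np.unique(ids, return_inverse=True)
    rows = table[uniq]            # one gather over the unique ids only
    return rows[inverse]          # restore the original lookup order

table = np.arange(20.0).reshape(10, 2)
ids = np.array([3, 3, 7, 3])
print(simplified_lookup(table, ids))   # identical result to table[ids]
```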
Since a large number of embedded columns exist in the model to be processed, a number of problems arise, which can cause the technical problem of low data processing efficiency. In the embodiment of the present application, however, graph-level adjustment may be performed on all embedded columns in the embedded layer through a series of passes including shape computation simplification, redundant security-guarantee removal, and embedding lookup simplification. This achieves the aim of simplifying the computation of the embedded columns: simplifying each embedded column subgraph yields the corresponding computational subgraph, and the simplified computational subgraphs of all embedded columns are fused, thereby achieving the aim of simplifying the embedded layer and realizing the technical effect of improving data processing efficiency.
In the embodiment of the application, the original model can be identified to determine the computation flow describing the computing resources the original model needs when it runs; computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources, yielding simplified computation sub-flows that require fewer computing resources. The computation sub-flows can be mapped into corresponding computing units for fusion processing, producing a fusion result, and the target model corresponding to the original model can be generated from that result, thereby updating the original model. Since many problems in the original model stem from its large number of computation flows, analyzing the computation sub-flows and the computing resources they require avoids the performance degradation that redundant computation would otherwise introduce, realizes the technical effect of improving data processing efficiency, and solves the technical problem of low data processing efficiency.
The software as a service (Software as a Service, simply referred to as SaaS) side of the embodiment of the application provides another data processing method. FIG. 4 is a flow chart of another data processing method according to an embodiment of the application, as shown in FIG. 4, which may include the steps of:
Step S402, a computing flow for describing the original model is obtained by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the computing flow, and computing resources are needed to be used for completing the computing flow when the original model is operated.
In the technical solution provided in the above step S402 of the present application, the computing flow of the original model may be obtained by calling a first interface, where the first interface may include a first parameter, and the parameter value of the first parameter may be the computing flow. The original model may include a deep neural network and an embedded layer. The computational flow may be represented by a computational graph that can be used to describe at least the computational flow between the deep neural network and the embedded layer. Computing resources are required to complete the computational flow when running the original model.
Optionally, if a certain deep neural network model needs to be processed, the deep neural network model at this time is an original model, and the first interface can be called through a corresponding calling instruction to analyze the received original model, so that a corresponding calculation flow can be determined.
In step S404, at least one computation sub-process is identified from the computation processes.
In the technical scheme provided in the step S404 of the present application, the computing flow may be identified, and at least one computing sub-flow therein may be determined.
Alternatively, the analyzed computational graph may be identified, identifying all embedded columns in the embedded layer in the computational graph, wherein the embedded columns may be used to convert the feature to be processed of the deep neural network into a computational subgraph of the computational graph.
Optionally, this embodiment may identify the computation flow in the original model and identify at least one computation sub-flow from it; that is, the embedded columns of the computational graph of the original model may be identified and disassembled, at least one embedded column being extracted from the embedded layer in the computational graph. A computation may then be performed for each embedded column, converting the feature to be processed in each embedded column into a computational subgraph of the computational graph.
In step S406, simplification processing is performed on the different computation sub-flows, wherein the amount of computation resources required for the simplified computation sub-flow is smaller than the amount of computation resources required for the computation sub-flow before simplification.
In the technical solution provided in step S406 of the present application, simplification processing may be performed on different computation sub-flows, where the amount of computation resources required for the simplified computation sub-flow is smaller than the amount of computation resources required for the computation sub-flow before the simplification.
Alternatively, this embodiment derives, for each embedded column, the symbolic expressions of the shapes of all tensors to which the feature to be processed corresponds.
Optionally, the embedded column subgraph can be reconstructed through unified shape computation; that is, the subgraph whose symbolic expressions have been derived is simplified by replacing it with a unified operator, yielding a new, reconstructed embedded column subgraph. Redundant security-guarantee removal and embedding lookup simplification can then be performed on the reconstructed subgraph to obtain the simplified embedded column subgraph.
Alternatively, all embedded columns in the embedded layer can be identified from the computational graph of the original model, and the feature to be processed can be converted into a computational subgraph through each embedded column. In the conversion process, the embedded column subgraphs are adjusted at the graph level so as to simplify them; simplifying each embedded column subgraph yields the corresponding computational subgraph, and the simplified computational subgraphs of all embedded columns are fused, achieving the purpose of simplifying the embedded layer and thereby realizing the technical effect of improving data processing efficiency.
Step S408, the simplified computation sub-flow is mapped to the corresponding computation unit to obtain a computation unit set.
In the technical solution provided in the above step S408 of the present application, the simplified computation sub-flow may be mapped to a corresponding computation unit, so as to obtain a computation unit set.
Alternatively, the embedded columns may be mapped into corresponding processors, where each processor may be matched to the computing resources required by its embedded column, and the processor may perform computing operations on the embedded column based on those computing resources.
Alternatively, after the computational sub-flow is identified, that is, after the embedded columns in the embedded layer are obtained by identifying the computational graph, a processor that can be matched to the computational resources required for the embedded columns may be determined, so that the embedded columns may be mapped into the corresponding processor, and computational operations may be performed on the received embedded columns by the processor using the computational resources.
In step S410, the computing units included in the computing unit set are fused, and a target model corresponding to the original model is generated based on the fusion result, where different computing sub-flows of the target model need to be processed by adopting matched computing resources.
In the technical solution provided in the above step S410 of the present application, the computing units in the computing unit set may be subjected to fusion processing, and a target model corresponding to the original model may be generated based on the fusion result, where different computing sub-flows of the target model need to be processed by adopting matched computing resources.
Optionally, this embodiment may fuse all the simplified embedded column subgraphs in the embedded layer to obtain a fused operator. For example, the embedded column subgraphs may be fused by a parallelism-oriented fusion method.
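One way to picture the parallelism-oriented fusion is that, instead of launching one small kernel per embedded column, a single fused kernel dispatches on the block index so that block b executes column b's subgraph. The Python stand-in below only models that dispatch (a real implementation would be a single CUDA launch); all names are assumptions:

```python
# Hypothetical sketch: fuse N independent embedded-column subgraphs into one
# kernel launch, one column per thread block, so the GPU is not fragmented
# across N tiny kernels.

from typing import Callable, List

def build_fused_kernel(column_fns: List[Callable[[int], None]]) -> Callable[[int], None]:
    """Return one 'kernel' that dispatches on the block index."""
    def fused_kernel(block_id: int) -> None:
        column_fns[block_id](block_id)   # block b executes column b's subgraph
    return fused_kernel

# Usage: launch with as many blocks as there are embedded columns.
columns = [lambda b, i=i: print(f"column {i} runs in block {b}") for i in range(4)]
kernel = build_fused_kernel(columns)
for block in range(len(columns)):        # stands in for a single GPU launch
    kernel(block)
```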
Optionally, a kernel function corresponding to an operator after the computation subgraphs of all embedded columns are fused can be determined through CUDA, and the kernel function can be mapped into a processor.
Alternatively, after the kernel function is obtained, the embedded columns may be analyzed to determine in which processor each embedded column needs to perform its computing operations; for example, if the size of the embedded table contained in an embedded column reaches a certain threshold, the entire embedded column may be mapped into the CPU for processing. Based on this mode of joint CPU-GPU operation, different embedded columns, or the operators they contain, can be placed into a suitable processor according to the actual requirements of each case, so that the memory overhead can be reduced, the computing resources of the otherwise idle CPU can be fully utilized, and the technical effect of improving data processing efficiency can be achieved.
Step S412, outputting the target model by calling a second interface, wherein the second interface includes a second parameter, and a parameter value of the second parameter is the target model.
In the technical solution provided in the above step S412 of the present application, the target model may be output by calling the second interface, where the second interface may include a second parameter, and a parameter value of the second parameter may be the target model.
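Taken together, the two interfaces can be pictured as a thin request/response wrapper around the compilation pipeline. The function names and payload fields in the sketch below are assumptions made for exposition, not the actual interfaces of the application:

```python
# Illustrative SaaS-side wrapper. first_interface/second_interface and the
# pipeline() call are assumed names for exposition only.

def pipeline(compute_flow):
    """Stand-in for: identify sub-flows, simplify, map, fuse, emit target model."""
    return {"target_model": f"optimized({compute_flow})"}

def first_interface(request: dict) -> dict:
    compute_flow = request["compute_flow"]   # first parameter: the computation flow
    return pipeline(compute_flow)

def second_interface(result: dict) -> dict:
    return {"target_model": result["target_model"]}  # second parameter: target model

response = second_interface(first_interface({"compute_flow": "graph-of-original-model"}))
print(response["target_model"])
```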
Through the steps S402 to S412 of the present application, a computing process for describing an original model is obtained by calling a first interface, where the first interface includes a first parameter, a parameter value of the first parameter is the computing process, and computing resources are required to be used to complete the computing process when the original model is operated; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model need to be processed by adopting matched computing resources; the target model is output by calling the second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the target model, so that the technical effect of improving the processing efficiency of the data is realized, and the technical problem of low processing efficiency of the data is solved.
The embodiment of the application also provides an information recommending method from the application side, and fig. 5 is a flowchart of an information recommending method according to an embodiment of the application, and as shown in fig. 5, the method may include the following steps:
step S502, a calculation flow describing an original recommendation model is obtained, where the original recommendation model is used to determine service information to be recommended to a target object, and a calculation resource is required to be used to complete the calculation flow when the original recommendation model is operated.
In the technical solution provided in the above step S502 of the present application, the calculation flow of the original recommendation model to be processed may be obtained, where the recommendation model may be used to recommend service information to a target object based on the historical behavior information of the target object. The recommendation model may include a deep neural network and an embedding layer, and may be a deep neural network model, also referred to as a deep recommendation model. The computational flow may be represented by a computational graph, which may be used to describe at least the computational flow between the deep neural network and the embedding layer. The target object may be a customer or user who needs recommended service information. The historical behavior information may be information such as the names, categories, and prices of documents the target object has previously read, purchase links browsed, videos watched, or items liked or purchased. It should be noted that the above historical behavior information is merely illustrative and is not limited here; any information from which the target object's preferences can be analyzed so that corresponding service information can be recommended falls within the scope of the embodiments of the present application. The service information may represent information that meets the target object's requirements, such as videos recommended to the target object or items to purchase; this too is merely illustrative and not specifically limited.
As an alternative example, the historical behavior information of the target object may be input into the deep recommendation model, which analyzes it and recommends service information matching the target object's preferences. However, the process by which the deep recommendation model infers service information is very time-consuming, and the large number of embedded columns in the model degrades the performance of recommending service information, so the problem of low processing efficiency remains. To improve the efficiency of processing the deep recommendation model and address the large number of embedded columns, the embodiment of the application performs the corresponding simplification of the embedding layer at the graph level, avoiding the series of problems caused by embedded columns.
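For intuition, a deep recommendation model of this kind reduces to per-feature embedding lookups feeding a dense network. The toy numpy sketch below only illustrates that shape and is not the model of the application; all sizes and weights are made up:

```python
# Toy numpy sketch of an embedding layer feeding a small DNN, illustrating the
# role of per-feature embedded columns. All sizes and weights are invented.

import numpy as np

rng = np.random.default_rng(0)
tables = {"user": rng.normal(size=(1000, 8)),   # one embedded table per feature
          "item": rng.normal(size=(5000, 8))}
w1 = rng.normal(size=(16, 4))
w2 = rng.normal(size=(4,))

def recommend_score(user_id: int, item_id: int) -> float:
    emb = np.concatenate([tables["user"][user_id],   # embedding lookup per column
                          tables["item"][item_id]])
    hidden = np.maximum(emb @ w1, 0.0)               # small DNN on top
    return float(hidden @ w2)                        # higher score = recommend

print(recommend_score(42, 137))
```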
In step S504, simplification processing is performed on the different computation sub-flows, wherein the amount of computation resources required for the simplified computation sub-flow is smaller than the amount of computation resources required for the computation sub-flow before simplification.
In the technical solution provided in the above step S504 of the present application, after the calculation flow of the original recommendation model to be processed is obtained, the calculation flow may be identified, a calculation sub-flow may be determined, and simplification processing may be performed on different calculation sub-flows, where the amount of calculation resources required for the simplified calculation sub-flow may be less than the amount of calculation resources required for the calculation sub-flow before simplification.
Optionally, the computational graph is identified, and all embedded columns in the embedded layer may be determined, wherein the embedded columns may be used to convert the to-be-processed features of the deep neural network into computational subgraphs of the computational graph.
Alternatively, an embedded column may be identified for the computation graph, at least one embedded column may be identified from the embedded layers in the computation graph, and a computation may be performed for each embedded column, where the feature to be processed in each embedded column is converted into a computation subgraph of the computation graph.
Step S506, mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set.
In the technical solution provided in the above step S506 of the present application, the simplified computation sub-flow may be mapped to a corresponding computation unit to obtain a computation unit set.
Optionally, after identifying the computational graph to obtain at least one embedded column of the embedding layer, the embedded column may be mapped into a corresponding processor, where the processor may be matched with the computing resources required by the embedded column, and computing operations may be performed on the embedded column by the processor based on those computing resources.
Optionally, a kernel function corresponding to an operator after the computation subgraphs of all embedded columns are fused can be determined through CUDA, and the kernel function can be mapped into a processor.
In step S508, the computing units included in the computing unit set are subjected to fusion processing, and a target recommendation model corresponding to the original recommendation model is generated based on the fusion result, where different computing sub-flows of the target recommendation model need to be processed by adopting matched computing resources, so as to generate service information.
In the technical scheme provided in the step S508, the computing units included in the computing unit set may be subjected to fusion processing to obtain a fusion result. And processing the fusion result to generate a target recommendation model corresponding to the original recommendation model, wherein different calculation sub-flows of the target recommendation model need to be processed by adopting matched calculation resources for generating service information.
Alternatively, the symbolic expressions of the shapes of all tensors corresponding to the feature to be processed may be derived for each embedded column. The embedded column subgraph can be reconstructed through unified shape computation; that is, the subgraph whose symbolic expressions have been derived is simplified by replacing it with a unified operator, yielding a new, reconstructed embedded column subgraph. Redundant security-guarantee removal and embedding lookup simplification can be performed on the reconstructed embedded column subgraph to obtain the simplified embedded column subgraph. All simplified embedded column subgraphs in the embedding layer can be fused to obtain a fused operator. For example, the embedded column subgraphs may be fused by a parallelism-oriented fusion method.
Optionally, after mapping the embedded column to the corresponding processor, a computing operation may be performed on the embedded column by the processor according to the computing resources required by the embedded column, and a computing result corresponding to the embedded column is determined, so that the recommendation model may be updated according to the computing result.
Optionally, by performing simplification and related operations on each embedded column and then performing the corresponding computing operations on the processed embedded columns, the corresponding calculation results are obtained, and the initial deep recommendation model can be updated with these results; for example, the kernel function or operator corresponding to an embedded column can be added at the corresponding position in the initial deep recommendation model to update it, as sketched below.
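The model update can be pictured as splicing the fused kernel back into the graph in place of the original embedded-column nodes. A toy sketch under that assumption (the node format is invented for illustration):

```python
# Toy sketch: replace an embedded column's original nodes in the model graph
# with a single node standing for the fused kernel. Structures are assumed.

def update_model(graph_nodes: list[dict], column_ids: set[str], kernel_name: str) -> list[dict]:
    kept = [n for n in graph_nodes if n["id"] not in column_ids]
    kept.append({"id": kernel_name, "kind": "fused_kernel",
                 "replaces": sorted(column_ids)})   # record what was fused
    return kept

model = [{"id": "emb_a"}, {"id": "emb_b"}, {"id": "dnn"}]
print(update_model(model, {"emb_a", "emb_b"}, "col_kernel_0"))
```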
Through the steps S502 to S508, a calculation flow for describing an original recommendation model is obtained, wherein the original recommendation model is used for determining service information to be recommended to a target object, and calculation resources are needed to be used to complete the calculation flow when the original recommendation model is operated; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target recommendation model corresponding to the original recommendation model based on a fusion result, wherein different computing sub-flows of the target recommendation model are required to be processed by adopting matched computing resources so as to generate service information, thereby realizing the technical effect of improving the processing efficiency of data and solving the technical problem of low processing efficiency of the data.
Example 2
There is further provided, in accordance with an embodiment of the present application, an embodiment of a data processing system, FIG. 6 is a schematic diagram of a data processing system in accordance with an embodiment of the present application, as shown in FIG. 6, a data processing system 600 may include: client 601, cloud server 602, and computing end 603.
The client 601 is configured to detect a model processing request on the interactive interface, where the model processing request is used to request the cloud server to process the original model.
In this embodiment, the client 601 may detect a model processing request generated by a user performing a corresponding request operation on an interactive interface of the terminal device, where the model processing request may be used to request processing of an original model from a cloud server to obtain a target model. The terminal device may be a target device such as a mobile phone, a personal computer (Personal Computer, abbreviated as PC) or a tablet computer, which is only illustrated herein, and is not particularly limited.
Optionally, if the user needs to process the original model, corresponding input operations such as clicking can be performed on the interactive interface, so as to generate a corresponding model processing instruction. Based on the model processing instructions, a corresponding model processing request may be issued to cloud server 602 to process the original model.
The cloud server 602 is configured to obtain a calculation flow for describing the original model in response to the model processing request, where a calculation resource is required to be used to complete the calculation flow when the original model is run; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result.
In this embodiment, after detecting the model processing request, the cloud server 602 may acquire a calculation flow describing the original model, and identify a calculation sub-flow from the calculation flows, and may perform simplified processing on the calculation sub-flow. After the simplification process, the simplified computation sub-flow may be mapped into a corresponding computation unit to perform a fusion process, thereby generating a target model corresponding to the original model.
Optionally, by identifying the original model, the computation flow describing the computing resources the original model needs when it runs can be determined; computation sub-flows can be identified from the computation flow, the flow can be simplified according to the amount of computing resources, and simplified computation sub-flows requiring fewer computing resources can be obtained. The computation sub-flows can be mapped into corresponding computing units for fusion processing, and a fusion result generated. The target model corresponding to the original model is generated from the fusion result, thereby achieving the purpose of updating the original model based on that result. The target model may then be sent to the computing end.
And the computing end 603 is used for calling computing resources matched with different computing sub-flows of the target model, and processing the corresponding computing sub-flows to obtain a computing result.
In this embodiment, after receiving the target model sent by the cloud server 602, the computing end 603 may call computing resources matched with different computing sub-flows of the target model to process the corresponding computing sub-flows, so as to determine a computing result.
The above-described method of this embodiment is further described below.
As an optional implementation manner, the cloud server identifies the computation flow, obtains at least one variable of the original model, and determines the computation sub-flows based on the variable; and/or the computing end is configured to call the matched computing resources for each computation sub-flow of the target model based on the processing type to be processed in that sub-flow.
In this embodiment, the computing flow may be identified by the cloud server, at least one variable in the original model may be identified, and the computation sub-flows may be determined based on the variable. The computing end can determine, based on the processing type to be processed in a computation sub-flow of the target model, which matched computing resources need to be called for that sub-flow.
Optionally, the computing process may be disassembled and identified to obtain at least one variable in the original model, and a computing sub-process in the computing process may be determined based on the variable, where the variable may be used to represent trainable data in the original model that matches a type of the computing sub-process, and the variable may be an embedded table variable, which may also be referred to as a trainable variable.
Optionally, the computational graph may be identified to obtain at least one embedded table of the original model, and an embedded column may be determined based on the embedded table, where operation information corresponding to the embedded table may be associated with the embedded column. The processing type corresponding to the computation sub-flow of the target model can be determined, and the computing resources to be matched with the computation sub-flow in the target model and the corresponding processor can be determined according to the processing type, wherein the processing type can be used for representing the type of the operation to be executed on the computation sub-flow of the target model.
The processing type can be judged: when the processing type indicates that a string operation is to be executed in the computation sub-flow of the target model, the computing resources in the central processing unit can be called; when the processing type indicates that a non-string operation is to be executed, it is determined that the computing resources in the graphics processor (which may also be referred to as the image processor) can be called.
Alternatively, the embedded columns may be mapped into the corresponding processors based on operation information of the embedded columns, where the operation information may be used to characterize whether string operations are performed on the embedded columns, and may include information of collection (Gather) and the like.
For example, if a string operation is performed on the embedded column, the embedded column may be mapped into the CPU; if no string operation is performed on it, it may be mapped into the GPU. It should be noted that this is merely illustrative, and the operation information and the manner and procedure of mapping it into the corresponding processor are not specifically limited.
As an optional implementation manner, the computing end is configured to send a computing result to the client; and the client is used for responding to the first modification operation acted on the interactive interface and modifying the calculation result.
In this embodiment, the calculation result may be sent to the client by the computing end. When the client receives the calculation result, the result can be modified according to the user's first modification operation, if the user chooses to modify it.
Alternatively, the computing end may send the calculation result to the terminal device of the corresponding user in the client. The user can determine whether the calculation result needs to be modified according to the self requirement and the accuracy of the calculation result, if so, a first modification operation can be executed on the interactive interface of the terminal equipment, and the calculation result is modified correspondingly according to a modification instruction corresponding to the first modification operation.
As an optional implementation manner, the cloud server is configured to send a computing sub-flow to the client; and the client is used for responding to the second modification operation acted on the interactive interface, modifying the calculation sub-flow and sending the modified calculation sub-flow to the cloud server.
In this embodiment, the computation sub-flow may be sent to the client by the cloud server. When the client receives the computation sub-flow, it can be modified according to the second modification operation of the client's user, if the user decides the sub-flow needs modification.
Alternatively, the cloud server may send the computation sub-flow to the terminal device of the corresponding user in the client. The user can determine whether to modify the computation sub-flow according to the self requirement and the accuracy of the computation sub-flow, if so, a second modification operation can be executed on the interactive interface of the terminal device, and the computation sub-flow is modified correspondingly according to a modification instruction corresponding to the second modification operation.
As an optional implementation manner, the computing end is configured to send a computing result to the client; the cloud server is used for sending the calculation sub-flow to the client; and the client is used for responding to the query operation acted on the interactive interface and displaying the calculation result and the calculation sub-flow on the interactive interface.
In this embodiment, the final calculation result may be sent to the client by the computing end, and the final computation sub-flow may be sent to the client by the cloud server. The client may detect whether a query operation exists on the terminal device; if a query operation on the interactive interface is detected, the calculation result and the computation sub-flow are displayed on the interactive interface.
In this embodiment, a data processing system is provided. The client 601 is configured to detect a model processing request on the interactive interface, where the model processing request is used to request the cloud server to process the original model; the cloud server 602 is configured to obtain a calculation flow for describing the original model in response to the model processing request, where a calculation resource is required to be used to complete the calculation flow when the original model is run; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result; the computing end 603 is used for calling computing resources matched with different computing sub-flows of the target model, processing the corresponding computing sub-flows to obtain a computing result, thereby realizing the technical effect of improving the processing efficiency of the data and solving the technical problem of low processing efficiency of the data.
It should be noted that, in the present application, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.), for example, the data for verification are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and are provided with corresponding operation entries for the user to select authorization or rejection.
Example 3
At present, embedded columns are very important for deep recommendation models to achieve high accuracy, but they are very time-consuming during inference. On the one hand, manual adjustments may be made for different recommendation models; however, as models become more complex and more numerous, manually adjusting all models becomes increasingly impractical. On the other hand, a machine learning compiler can automatically adjust a machine learning model, but existing work cannot solve the three performance problems that embedded columns cause in recommendation models. First, it is difficult to generate efficient GPU code for the large number of operators in embedded columns; existing compilers may produce fragmented kernel execution and underutilize sub-graph parallelism. Second, complex shape computation in dynamic-shape scenarios prevents the compiler from making further computational-graph-level adjustments. Third, to account for robustness, machine learning frameworks inevitably introduce redundant computation, resulting in significant performance overhead. The application therefore provides a machine learning compiler that automatically adjusts the large number of embedded columns in a recommendation model and can effectively solve these performance problems; with existing approaches, the technical problem of low data processing efficiency remains.
Optionally, the application provides a method for automatically adjusting the large number of embedded columns in a deep recommendation model. Unlike conventional solutions, which either leave the large number of embedded columns unprocessed or rely on manual adjustment with its attendant inefficiency, this method solves the technical problem of low data processing efficiency.
In the embodiment of the application, the original model can be identified to determine the computation flow describing the computing resources the original model needs when it runs; computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources, yielding simplified computation sub-flows that require fewer resources. The computation sub-flows can be mapped into corresponding computing units for fusion processing, producing a fusion result, from which the target model corresponding to the original model can be generated, thereby updating the original model. Since a large number of problems in the original model are caused by its many computation flows, analyzing the computation sub-flows and the computing resources they require avoids those problems, realizes the technical effect of improving data processing efficiency, and solves the technical problem of low data processing efficiency.
The above-described method of this embodiment is further described below.
In this embodiment, fig. 7 is a flowchart of compiling adjustment for a plurality of embedded columns in a deep recommendation model according to an embodiment of the present application, and as shown in fig. 7, the method may include the following steps:
in step S701, an embedded column is identified from the calculation map.
In the technical solution provided in the above step S701 of the present application, after the calculation map of the model to be processed is obtained, the calculation map may be identified, and the embedded columns in the embedded layer may be determined.
Alternatively, when a model with a deep neural network exists and needs embedded-layer processing, it can be determined as the model to be processed, and the embedding vectors can be concatenated and fed into the deep neural network of the model to be processed.
FIG. 8 is a flow chart of identifying embedded columns according to an embodiment of the present application, as shown in FIG. 8, the method may include the steps of:
in step S801, all the embedded tables are identified from the calculation map.
In the technical scheme provided in the step S801, all embedded-table variables in the computational graph can be identified according to the types of the operators applied after the trainable variables.
Step S802, for all nodes, the embedded tables contained in the predecessor nodes are counted.
In the technical solution provided in the above step S802 of the present application, for every node, the number of embedded tables contained among its direct or indirect predecessor nodes may be counted.
Step S803, judging whether all the embedded tables are traversed.
In the technical solution provided in the above step S803 of the present application, all the embedded tables in the model to be processed may be traversed, and it may be determined whether all the embedded tables are traversed. For each embedded table, step S804 may be performed. If the traversal has completed, the flow may end.
Step S804, initializing an embedded column subgraph.
In the technical solution provided in the above step S804 of the present application, the embedded column subgraph corresponding to the current embedded table may be initialized.
Step S805, initialize the queue.
In the technical solution provided in the above step S805 of the present application, a queue for breadth-first traversal may be initialized.
Step S806, determining whether the queue is not empty.
In the technical solution provided in the above step S806 of the present application, it may be determined whether the queue is not empty. If the queue is not empty, step S807 may be performed. Otherwise, the process may jump to step S812.
Step S807 dequeues the node from the queue.
In the technical solution provided in the above step S807 of the present application, the node may be dequeued from the current queue.
Step S808, judging whether all the subsequent nodes are traversed.
In the technical solution provided in the above step S808 of the present application, all the successor nodes of the node may be traversed, and it may be determined whether all of them have been traversed. For each successor node, step S809 may be performed. If all the successor nodes have been traversed, the process may return to step S806.
And step S809, judging whether the number of the precursor embedded tables of the node is less than or equal to 1.
In the technical scheme provided in the step S809, it may be determined whether the number of the precursor embedded tables of the node is less than or equal to 1; if so, step S810 may be executed. Otherwise, the process may jump back to step S808.
Step S810, inserting the node into the current embedded column subgraph.
In the technical solution provided in the above step S810 of the present application, a node may be inserted into the current embedded column subgraph.
Step S811, enqueuing the node in a queue.
In the technical solution provided in the above step S811 of the present application, the node may be enqueued in the queue, and the process may return to step S808.
Step S812, it is determined whether all nodes are traversed.
In the technical solution provided in the above step S812 of the present application, all nodes may be traversed, and it may be determined whether all nodes in the current embedded column subgraph have been traversed. For each node, step S813 may be performed. If all nodes have been traversed, the process may jump to step S815.
Step S813, determine whether to traverse all the precursor nodes.
In the technical scheme provided in the step S813, the method may traverse all the precursor nodes of the current node and determine whether all the precursor nodes of the current node are traversed. For each precursor node, step S814 may be performed. If all the precursor nodes have been traversed, the process may jump to step S815.
Step S814, inserts the node into the current embedded column subgraph.
In the technical solution provided in the above step S814 of the present application, a node may be inserted into the current embedded column subgraph.
Step S815, judging whether the embedded column subgraph was updated.
In the technical solution provided in the above step S815 of the present application, it may be determined whether the embedded column subgraph was updated in the current round of iteration; if so, the process may jump back to step S812. Otherwise, the process may jump to step S803.
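Read as pseudocode, steps S801 to S815 amount to: seed a breadth-first traversal from each embedded table, claim every downstream node that depends on at most that one table, and then grow each subgraph backwards over predecessors until a fixed point. The Python sketch below is a hedged rendering under an assumed adjacency-map graph representation; pred_tables is taken to hold, per node, the set of embedded tables among its direct or indirect predecessors (step S802):

```python
# Sketch of the embedded-column identification of Fig. 8. The graph is assumed
# to be given as successor/predecessor adjacency maps; all structures are
# illustrative assumptions, not the patent's implementation.

from collections import deque
from typing import Dict, Set

def identify_columns(succ: Dict[str, Set[str]],
                     pred: Dict[str, Set[str]],
                     pred_tables: Dict[str, Set[str]],
                     tables: Set[str]) -> Dict[str, Set[str]]:
    columns: Dict[str, Set[str]] = {}
    for table in tables:                               # S803: per embedded table
        subgraph: Set[str] = {table}                   # S804: init column subgraph
        queue = deque([table])                         # S805: init BFS queue
        while queue:                                   # S806/S807
            node = queue.popleft()
            for nxt in succ.get(node, ()):             # S808: successors
                # S809: claim nodes depending on at most this one table
                if len(pred_tables.get(nxt, set())) <= 1 and nxt not in subgraph:
                    subgraph.add(nxt)                  # S810
                    queue.append(nxt)                  # S811
        changed = True
        while changed:                                 # S815: iterate to fixed point
            changed = False
            for node in list(subgraph):                # S812
                for prev in pred.get(node, ()):        # S813: predecessors
                    if prev not in subgraph and prev not in tables:
                        subgraph.add(prev)             # S814
                        changed = True
        columns[table] = subgraph
    return columns
```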
Step S702, performing graph-level adjustment on the embedded columns.
In the technical solution provided in the above step S702 of the present application, the embedded columns may be adjusted at the graph level.
Since a large number of embedded columns exist in the model to be processed, a number of problems arise, which can cause the technical problem of low data processing efficiency. In the embodiment of the application, all embedded columns in the embedded layer can be identified from the computational graph of the model to be processed, and the features to be processed can be converted into computational subgraphs through the embedded columns. In the conversion process, the embedded column subgraphs are adjusted at the graph level so as to simplify them; simplifying each embedded column subgraph yields the corresponding computational subgraph, and the simplified computational subgraphs of all embedded columns are fused, thereby achieving the purpose of simplifying the embedded layer and realizing the technical effect of improving data processing efficiency.
Alternatively, the computational subgraph may be adjusted, based on an analysis of the context in the embedded column, so as to eliminate redundant computation information.
Alternatively, adjustments may be made at the graph level of the embedded column, i.e., adjustments may be made to the embedded column subgraph, thereby simplifying and adjusting the embedded column subgraph.
Alternatively, for each embedded column in the embedded layer, the symbolic expressions of the shapes of all tensors in its embedded column subgraph may be derived at the graph level. For example, in the embedded column subgraph in fig. 7, the following symbolic expressions corresponding to all tensor computation nodes can be determined: <n0>, <n1>, <n0, 8>, etc.
Alternatively, the shape computation subgraph in which all the shape computation nodes of the embedded column subgraph are located may be determined, and that shape computation subgraph may be replaced by a unified operator, that is, a unified shape construction node, so as to simplify the embedded column subgraph.
Optionally, redundant security (data security information) embedded in the columns may be removed.
Alternatively, the search portion embedded in the embedded column subgraph may be simplified.
In step S703, operators between subgraphs are fused.
In the technical solution provided in the above step S703 of the present application, operator fusion may be performed between subgraphs.
Alternatively, all simplified embedded column subgraphs in the embedded layer may be fused to obtain a fused operator. For example, the embedded column subgraphs may be fused by a parallelism-oriented fusion method.
Optionally, a kernel function corresponding to an operator after the computation subgraphs of all embedded columns are fused can be determined through CUDA, and the kernel function can be mapped into a processor.
In step S704, the CPU operates together with the GPU.
In the technical solution provided in step S704, after the kernel function is obtained, the embedded columns may be analyzed to determine in which processor each embedded column needs to perform its computing operations; for example, if the size of the embedded table contained in an embedded column reaches a certain threshold, the entire embedded column may be mapped into the CPU for processing. Based on this mode of joint CPU-GPU operation, different embedded columns, or the operators they contain, can be placed into a suitable processor according to the actual requirements of each case, so that the memory overhead can be reduced, the computing resources of the otherwise idle CPU can be fully utilized, and the technical effect of improving data processing efficiency can be achieved.
FIG. 9 is a flowchart of a CPU co-operating with a GPU according to an embodiment of the present application, as shown in FIG. 9, the method may include the steps of:
in step S901, the DNN part is placed on the GPU.
In the technical solution provided in the above step S901 of the present application, the DNN portion in the given model to be processed may be placed on the GPU.
Step S902, it is determined whether all embedded columns are traversed.
In the technical solution provided in the above step S902 of the present application, the embedded columns in the model to be processed may be traversed, and whether all the embedded columns are traversed is determined. For each embedded column, step S903 may be performed. If all the embedded columns have been traversed, the process may jump to step S909.
In step S903, it is determined whether the size of the embedded table reaches a threshold.
In the technical solution provided in the above step S903 of the present application, it may be determined whether the size of the embedded table reaches the threshold; if so, the process may jump to step S908, otherwise step S904 may be performed.
Step S904, determining whether the operator embedded in the column is traversed.
In the technical solution provided in the above step S904 of the present application, all operators included in the embedded column may be traversed, and it may be determined whether all operators have been traversed. For each operator, step S905 may be performed. If all operators have been traversed, the process may return to step S902.
In step S905, it is determined whether the operator belongs to a string operation.
In the technical solution provided in the above step S905 of the present application, it may be determined whether the current operator belongs to a string operation, if so, step S906 may be executed. If it does not belong to the string operation, step S907 may be performed.
In step S906, an operator is placed on the CPU.
In the technical solution provided in the above step S906 of the present application, operators belonging to the string operation may be placed on the CPU, and the process then returns to step S904.
In step S907, operators are placed on the GPU.
In the technical solution provided in the above step S907 of the present application, operators not belonging to the string operation may be placed on the GPU. And jumps to step S904.
In step S908, the embedded column is placed on the CPU.
In the technical solution provided in the above step S908 of the present application, the entire embedded column may be placed on the CPU. The process then jumps to step S902.
In step S909, the embedded column on the GPU is fused.
In the technical solution provided in the above step S909 of the present application, all operators embedded in columns placed on the GPU may be fused into one kernel function.
In step S910, the CPU and GPU transmissions are combined.
In the technical solution provided in the above step S910 of the present application, the copy transmission between the CPU and the GPU may be combined.
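For illustration only, the flow of steps S901 to S910 may be sketched in Python as follows; the model, embedded-column, and operator attributes, as well as the two placeholder helpers, are assumed names introduced for this sketch rather than the actual interfaces of the present application.

    def fuse_into_single_kernel(gpu_operators):
        # Placeholder for step S909: a real system would emit one CUDA kernel
        # function covering all embedded-column operators resident on the GPU.
        return tuple(gpu_operators)

    def merge_host_device_copies(placement):
        # Placeholder for step S910: adjacent CPU<->GPU copy transmissions
        # would be merged into batched transfers here.
        return placement

    def place_model(model, table_size_threshold):
        # Sketch of the placement flow of FIG. 9 (steps S901 to S910).
        placement = {model.dnn: "GPU"}                         # S901
        for column in model.embedded_columns:                  # S902
            if column.embedded_table_size >= table_size_threshold:
                placement[column] = "CPU"                      # S903 -> S908
                continue
            for operator in column.operators:                  # S904
                if operator.is_string_operation:               # S905
                    placement[operator] = "CPU"                # S906
                else:
                    placement[operator] = "GPU"                # S907
        gpu_operators = [item for item, device in placement.items()
                         if device == "GPU" and item is not model.dnn]
        kernel = fuse_into_single_kernel(gpu_operators)        # S909
        placement = merge_host_device_copies(placement)        # S910
        return placement, kernel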
In the embodiment of the application, by identifying the computation graph of the model to be processed, the computation flow between the deep neural network and the embedded layer, as well as the embedded columns within the embedded layer, can be identified. The processor matched with the computing resources required by each embedded column can then be determined, and the identified embedded column can be mapped into the corresponding processor, which performs the computing operation on the embedded column with the matched computing resources to obtain the computation result corresponding to the embedded column. The initial model to be processed can thus be updated based on the computation result.
Example 4
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method shown in fig. 3.
Fig. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 10, the data processing apparatus 1000 may include: a first acquisition unit 1002, a first identification unit 1004, a first simplification unit 1006, a first mapping unit 1008, and a first processing unit 1010.
The first obtaining unit 1002 is configured to obtain a calculation flow for describing the original model, where a calculation resource is required to be used to complete the calculation flow when the original model is run.
The first identifying unit 1004 is configured to identify at least one computation sub-process from the computation processes.
A first simplifying unit 1006, configured to perform simplification processing on different computation sub-flows, where the amount of computation resources required for the simplified computation sub-flows is smaller than the amount of computation resources required for the computation sub-flows before the simplification.
And the first mapping unit 1008 is configured to map the simplified computation sub-flow to a corresponding computation unit, so as to obtain a computation unit set.
The first processing unit 1010 is configured to perform fusion processing on the computing units included in the computing unit set, and generate a target model corresponding to the original model based on the fusion result, where different computing sub-flows of the target model need to be processed by adopting matched computing resources.
Here, the above-described first obtaining unit 1002, first identifying unit 1004, first simplifying unit 1006, first mapping unit 1008, and first processing unit 1010 correspond to steps S302 to S310 in embodiment 1; the five units implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 1 above. It should be noted that the above units may be hardware components, or software components stored in a memory or a database of the cloud server and processed by one or more processors, or they may run in the cloud server as part of the apparatus.
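For illustration only, the cooperation of the five units may be sketched in Python as follows; every class, method, and attribute name here is an assumption introduced for the sketch, not the apparatus's actual software interface.

    class DataProcessingApparatus:
        # Sketch of the five-unit apparatus of FIG. 10.

        def first_obtaining_unit(self, original_model):
            # Acquire the computation flow describing the original model.
            return original_model.computation_flow

        def first_identifying_unit(self, computation_flow):
            # Identify at least one computation sub-flow from the flow.
            return list(computation_flow.sub_flows)

        def first_simplifying_unit(self, sub_flows):
            # Simplify each sub-flow so it needs fewer computing resources.
            return [sub_flow.simplified() for sub_flow in sub_flows]

        def first_mapping_unit(self, simplified_sub_flows):
            # Map each simplified sub-flow to a computing unit, yielding
            # the computing unit set.
            return {sub_flow.to_computing_unit() for sub_flow in simplified_sub_flows}

        def first_processing_unit(self, computing_units, original_model):
            # Fuse the computing units (stubbed here as a tuple) and build
            # the target model from the fusion result.
            fusion_result = tuple(computing_units)
            return original_model.rebuild_with(fusion_result)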
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method shown in fig. 4.
Fig. 11 is a schematic diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 11, the data processing apparatus 1100 may include: a first invoking unit 1102, a second identifying unit 1104, a second simplifying unit 1106, a second mapping unit 1108, a second processing unit 1110 and a second invoking unit 1112.
The first calling unit 1102 is configured to obtain a calculation flow for describing the original model by calling a first interface, where the first interface includes a first parameter, a parameter value of the first parameter is the calculation flow, and when the original model is run, a calculation resource is required to be used to complete the calculation flow.
The second identifying unit 1104 is configured to identify at least one computation sub-process from the computation processes.
A second simplifying unit 1106, configured to perform simplification processing on different computation sub-flows, where the amount of computation resources required for the simplified computation sub-flow is smaller than the amount of computation resources required for the computation sub-flow before the simplification.
The second mapping unit 1108 is configured to map the simplified computation sub-flow to a corresponding computation unit, so as to obtain a computation unit set.
And a second processing unit 1110, configured to perform fusion processing on the computing units included in the computing unit set, and generate a target model corresponding to the original model based on the fusion result, where different computing sub-flows of the target model need to be processed by adopting matched computing resources.
A second invoking unit 1112, configured to output the target model by invoking a second interface, where the second interface includes a second parameter, and a parameter value of the second parameter is the target model.
Here, the first calling unit 1102, second identifying unit 1104, second simplifying unit 1106, second mapping unit 1108, second processing unit 1110, and second calling unit 1112 correspond to steps S402 to S412 in embodiment 1; the six units implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 1. It should be noted that the above units may be hardware components, or software components stored in a memory or a database of the cloud server and processed by one or more processors, or they may run in the cloud server as part of the apparatus.
According to an embodiment of the present application, there is also provided an information recommendation apparatus for implementing the information recommendation method shown in fig. 5.
Fig. 12 is a schematic diagram of an information recommendation apparatus according to an embodiment of the present application, and as shown in fig. 12, the information recommendation apparatus 1200 may include: a second acquisition unit 1202, a third simplification unit 1204, a third mapping unit 1206, and a third processing unit 1208.
The second obtaining unit 1202 is configured to obtain a calculation flow for describing an original recommendation model, where the original recommendation model is used for determining service information to be recommended to the target object, and a calculation resource is required to be used to complete the calculation flow when the original recommendation model is run.
A third simplifying unit 1204, configured to perform simplification processing on different computation sub-flows, where the amount of computation resources required for the simplified computation sub-flows is smaller than the amount of computation resources required for the computation sub-flows before the simplification.
The third mapping unit 1206 is configured to map the simplified computation sub-flow to a corresponding computation unit, so as to obtain a computation unit set.
The third processing unit 1208 is configured to perform fusion processing on the computing units included in the computing unit set, and generate a target recommendation model corresponding to the original recommendation model based on the fusion result, where different computing sub-flows of the target recommendation model need to be processed by using matched computing resources to generate service information.
Here, the second acquiring unit 1202, third simplifying unit 1204, third mapping unit 1206, and third processing unit 1208 correspond to steps S502 to S508 in embodiment 1; the four units implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 1. It should be noted that the above units may be hardware components, or software components stored in a memory or a database of the cloud server and processed by one or more processors, or they may run in the cloud server as part of the apparatus.
In the data processing device, the original model can be identified to determine the computation flow, which describes the computing resources the original model needs to use when it is run. Computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources they require, yielding simplified computation sub-flows that need fewer computing resources. Each computation sub-flow can be mapped into a corresponding computing unit for fusion processing, and a fusion result is generated. The target model corresponding to the original model can then be generated from the fusion result, achieving the purpose of updating the original model. Since a large number of computation flows in the original model would otherwise introduce redundant computation and degrade model performance, processing the computation sub-flows with the computing resources they require avoids such problems, achieves the technical effect of improving data processing efficiency, and thereby solves the technical problem of low data processing efficiency.
Example 5
Embodiments of the present application may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the above-mentioned computer terminal may acquire historical behavior information generated by relevant operations of the target object, such as clicking operations in application software on the computer terminal. The cloud server may receive the historical behavior information from the computer terminal and store it in a database in the cloud server. Program code for the following steps of the data processing method may be executed by the cloud server: acquiring a calculation flow for describing an original model, wherein calculation resources are needed to be used for completing the calculation flow when the original model is operated; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model are required to be processed by adopting matched computing resources.
Optionally, after the original model is updated, the historical behavior information of the target object in the database of the cloud server can be retrieved and used by the updated target model to perform inference, so that the final service information matching the target object is obtained. The service information is then returned to the computer terminal of the target object, thereby recommending the service information to the target object.
Alternatively, fig. 13 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 13, the computer terminal A may include: one or more (only one is shown) processors 1302, a memory 1304, and a transmission device 1306. The processor 1302 may be configured to process historical behavior information of the target object in the application software and transmit the processed historical behavior information to the transmission device 1306. The memory 1304 may be used to store the finally received service information matching the target object. The transmission device 1306 may be configured to transmit the historical behavior information to the cloud server for processing, and may also receive the service information recommended by the cloud server to the target object.
The cloud server may call information and applications stored in its own memory to perform the following steps: acquiring a calculation flow for describing an original model, wherein calculation resources are needed to be used for completing the calculation flow when the original model is operated; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model are required to be processed by adopting matched computing resources.
Optionally, the cloud server may further execute program code that includes: identifying at least one computational sub-process from the computational processes, comprising: disassembling the calculation flow and identifying at least one variable of the original model, wherein the variable is used for representing trainable data in the original model and is matched with the type of the calculation sub-flow; and determining the computational sub-flow based on the variable. The type of the computation sub-flow may be an operator type.
Optionally, the cloud server may further execute program code that includes: determining a computational sub-process based on the variables, comprising: determining an initial calculation sub-flow corresponding to the variable; and updating the initial calculation sub-flow into the calculation sub-flow by utilizing a predecessor calculation node of the calculation node in the initial calculation sub-flow.
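For illustration only, the expansion of the initial computation sub-flow by predecessor computation nodes may be sketched in Python as follows; the graph.predecessors interface is an assumption of this sketch.

    def expand_initial_sub_flow(initial_nodes, graph):
        # Grow the initial computation sub-flow into the full computation
        # sub-flow by repeatedly absorbing the predecessor computation
        # nodes of its members.
        sub_flow = set(initial_nodes)
        frontier = list(initial_nodes)
        while frontier:
            node = frontier.pop()
            for predecessor in graph.predecessors(node):
                if predecessor not in sub_flow:
                    sub_flow.add(predecessor)
                    frontier.append(predecessor)
        return sub_flow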
Optionally, the cloud server may further execute program code that includes: determining a processing type corresponding to the calculation sub-flow of the target model, wherein the processing type is used for representing the type of operation to be executed on the calculation sub-flow of the target model; and, based on the processing type, determining that the calculation sub-flow of the target model requires the use of matched calculation resources.
Optionally, the cloud server may further execute program code that includes: based on the processing type, determining that the computational sub-flow of the target model requires the use of matched computational resources, including: determining to call the computing resource in the central processing unit in response to the processing type being a string operation to be executed for the calculation sub-flow of the target model; and determining to call computing resources in the graphics processor in response to the processing type being a non-string operation to be performed for the computing sub-flow of the target model.
Optionally, the cloud server may further execute program code that includes: determining to invoke computing resources in the central processing unit in response to the data volume of the variable corresponding to the computing sub-flow of the target model being greater than a data volume threshold; and determining to invoke the computing resource in the graphics processor in response to the data amount of the variable corresponding to the computing sub-flow of the target model being not greater than the data amount threshold.
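For illustration only, the two dispatch rules above (string operations and large variable data volumes are both handed to the central processing unit) may be combined in a Python sketch; the attribute names are assumptions, not the actual interface of the present application.

    def choose_processor(sub_flow, data_volume_threshold):
        # Dispatch rule for matching a computation sub-flow of the target
        # model to a computing resource.
        if sub_flow.variable_data_volume > data_volume_threshold:
            return "CPU"   # large trainable variables, e.g. big embedded tables
        if sub_flow.is_string_operation:
            return "CPU"   # string operations suit the central processing unit
        return "GPU"       # remaining numeric work suits the graphics processor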
Optionally, the cloud server may further execute program code that includes: simplifying the different calculation sub-flows, including: reconstructing the computation sub-flow based on description information of tensors in the computation sub-flow to obtain a target computation sub-flow, wherein the description information is used for describing attributes of the tensors, and the amount of computation resources required by the target computation sub-flow is smaller than that required by the computation sub-flow.
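For illustration only, such description-driven reconstruction may be sketched in Python as follows, taking tensor shape and data type as the attributes carried by the description information; the operation record and its fields are assumptions of this sketch.

    def reconstruct_sub_flow(operations):
        # Use each tensor's description information to drop operations that
        # the descriptions prove to be redundant, yielding the target
        # computation sub-flow that needs fewer computing resources.
        target_sub_flow = []
        for op in operations:
            if op.kind == "reshape" and op.input_shape == op.output_shape:
                continue  # reshape to the shape the tensor already has
            if op.kind == "cast" and op.input_dtype == op.output_dtype:
                continue  # cast to the dtype the tensor already has
            target_sub_flow.append(op)
        return target_sub_flow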
Optionally, the cloud server may further execute program code that includes: mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set, which comprises: mapping the target calculation sub-flow, after removing the data redundancy information and/or simplifying the data searching information, into the corresponding calculation unit to obtain the calculation unit set.
The cloud server may execute the following steps by calling information and an application program stored in its own memory: acquiring a calculation flow for describing an original model by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the calculation flow, and calculation resources are needed to be used for completing the calculation flow when the original model is operated; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model need to be processed by adopting matched computing resources; and outputting the target model by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the target model.
The cloud server may execute the following steps by calling information and an application program stored in its own memory: acquiring a calculation flow for describing an original recommendation model, wherein the original recommendation model is used for determining service information to be recommended to a target object, and calculation resources are needed to be used for completing the calculation flow when the original recommendation model is operated; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target recommendation model corresponding to the original recommendation model based on a fusion result, wherein different computing sub-flows of the target recommendation model are required to be processed by adopting matched computing resources so as to generate service information.
By adopting the embodiment of the application, a data processing method is provided. In the embodiment of the application, the original model can be identified to determine the computation flow, which describes the computing resources the original model needs to use when it is run. Computation sub-flows can be identified from the computation flow and simplified according to the amount of computing resources required, yielding simplified computation sub-flows that need fewer computing resources. Each computation sub-flow can be mapped into a corresponding computing unit for fusion processing, and a fusion result is generated. The target model corresponding to the original model can then be generated from the fusion result, achieving the purpose of updating the original model. Since a large number of computation flows in the original model would otherwise introduce redundant computation and degrade model performance, processing the computation sub-flows with the computing resources they require avoids such problems, achieves the technical effect of improving data processing efficiency, and thereby solves the technical problem of low data processing efficiency.
It will be understood by those skilled in the art that the structure shown in fig. 13 is only schematic. The computer terminal A may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), or a PAD. Fig. 13 does not limit the structure of the computer terminal A. For example, the computer terminal A may include more or fewer components (such as a network interface or a display device) than shown in fig. 13, or have a configuration different from that shown in fig. 13.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device for displaying information to the user, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be noted that the foregoing numbering of the embodiments of the present application is merely for description and does not represent the relative merits of the embodiments.
In the foregoing embodiments of the present application, each embodiment has its own emphasis; for any portion not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; the division into units is merely a logical functional division, and there may be other manners of division in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the protection scope of the present application.

Claims (17)

1. A method of data processing, comprising:
acquiring a calculation flow for describing an original model, wherein calculation resources are needed to be used for completing the calculation flow when the original model is operated;
identifying at least one computational sub-process from the computational processes;
simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification;
mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set;
and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model are required to be processed by adopting matched computing resources.
2. The method of claim 1, wherein identifying at least one computational sub-process from the computational processes comprises:
disassembling the calculation flow, and identifying at least one variable of the original model, wherein the variable is used for representing trainable data in the original model and is matched with the type of the calculation sub-flow;
the computational sub-process is determined based on the variables.
3. The method of claim 2, wherein determining the computational sub-process based on the variable comprises:
determining an initial calculation sub-flow corresponding to the variable;
and updating the initial computing sub-process into the computing sub-process by utilizing a predecessor computing node of the computing node in the initial computing sub-process.
4. The method according to claim 1, wherein the method further comprises:
determining a processing type corresponding to a calculation sub-flow of the target model, wherein the processing type is used for representing the type of an operation to be executed on the calculation sub-flow of the target model;
based on the processing type, determining that the computational sub-flow of the target model requires the adoption of matched computational resources.
5. The method of claim 4, wherein determining that the computational sub-flow of the object model requires adoption of matched computational resources based on the processing type comprises:
determining to call computing resources in a central processor in response to the processing type being a string operation to be executed for a computing sub-flow of the target model;
and determining to call computing resources in a graphics processor in response to the processing type being a non-string operation to be performed for a computing sub-flow of the target model.
6. The method according to claim 1, wherein the method further comprises:
determining to invoke computing resources in a central processing unit in response to the data volume of the variable corresponding to the computing sub-flow of the target model being greater than a data volume threshold;
and determining to call the computing resource in the graphics processor in response to the data volume of the variable corresponding to the computing sub-flow of the target model not being greater than the data volume threshold.
7. The method according to any one of claims 1 to 6, wherein the simplification of the different computational sub-flows comprises:
reconstructing the computation sub-flow based on description information of tensors in the computation sub-flow to obtain a target computation sub-flow, wherein the description information is used for describing attributes of the tensors, and the amount of computation resources required by the target computation sub-flow is smaller than that required by the computation sub-flow.
8. The method of claim 7, wherein the method further comprises:
removing data redundancy information in the target computing sub-process, wherein the data redundancy information is used for at least representing the data security performance of the target computing sub-process; and/or,
simplifying data searching information in the target computing sub-process, wherein the data searching information is used for searching the data embedded in the target computing sub-process.
9. The method of claim 8, wherein mapping the simplified computation sub-flow into a corresponding computation unit to obtain a computation unit set comprises:
and mapping the target calculation sub-flow after removing the data redundancy information and/or simplifying the data searching information into the corresponding calculation unit to obtain the calculation unit set.
10. A method of data processing, comprising:
acquiring a calculation flow for describing an original model by calling a first interface, wherein the first interface comprises a first parameter, the parameter value of the first parameter is the calculation flow, and calculation resources are needed to be used for completing the calculation flow when the original model is operated;
identifying at least one computational sub-process from the computational processes;
simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification;
mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set;
fusing the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result, wherein different computing sub-flows of the target model need to be processed by adopting matched computing resources;
and outputting the target model by calling a second interface, wherein the second interface comprises a second parameter, and the parameter value of the second parameter is the target model.
11. An information recommendation method, applied to a compiler, comprising:
acquiring a calculation flow for describing an original recommendation model, wherein the original recommendation model is used for determining service information to be recommended to a target object, and calculation resources are needed to be used for completing the calculation flow when the original recommendation model is operated;
simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification;
mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set;
and carrying out fusion processing on the computing units contained in the computing unit set, and generating a target recommendation model corresponding to the original recommendation model based on a fusion result, wherein different computing sub-flows of the target recommendation model need to be processed by adopting matched computing resources so as to generate the service information.
12. A data processing system, comprising:
the client is used for sending a model processing request to the cloud server;
the cloud server is used for responding to the model processing request and acquiring a calculation flow for describing an original model, wherein calculation resources are needed to be used for completing the calculation flow when the original model is operated; identifying at least one computational sub-process from the computational processes; simplifying different computation sub-flows, wherein the amount of computation resources required by the simplified computation sub-flows is less than that required by the computation sub-flows before simplification; mapping the simplified calculation sub-flow to a corresponding calculation unit to obtain a calculation unit set; performing fusion processing on the computing units contained in the computing unit set, and generating a target model corresponding to the original model based on a fusion result;
and the computing end is used for calling computing resources matched with different computing sub-flows of the target model and processing the corresponding computing sub-flows to obtain a computing result.
13. The system of claim 12, wherein:
the cloud server identifies the calculation flow to obtain at least one variable of the original model, and determines the calculation sub-flow based on the variable; and/or,
the computing end is used for calling matched computing resources for the computing sub-flow of the target model based on the processing type to be performed on the computing sub-flow of the target model.
14. The system of claim 12, wherein:
the computing end is used for sending the computing result to the client;
the client is used for responding to a first modification operation acting on the interactive interface and modifying the calculation result.
15. The system of claim 12, wherein:
the cloud server is used for sending the computing sub-flow to the client;
the client is used for responding to a second modification operation acting on the interactive interface, modifying the calculation sub-flow and sending the modified calculation sub-flow to the cloud server.
16. The system of claim 12, wherein:
the computing end is used for sending the computing result to the client;
the cloud server is used for sending the computing sub-flow to the client;
and the client is used for responding to the query operation acting on the interactive interface and displaying the calculation result and the calculation sub-flow on the interactive interface.
17. An electronic device, comprising: a memory and a processor; the memory is configured to store computer executable instructions, the processor being configured to execute the computer executable instructions, which when executed by the processor, implement the steps of the method of any one of claims 1 to 11.
CN202311112814.XA 2023-08-30 2023-08-30 Data processing method, system and electronic equipment Pending CN117172330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311112814.XA CN117172330A (en) 2023-08-30 2023-08-30 Data processing method, system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311112814.XA CN117172330A (en) 2023-08-30 2023-08-30 Data processing method, system and electronic equipment

Publications (1)

Publication Number Publication Date
CN117172330A true CN117172330A (en) 2023-12-05

Family

ID=88946274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311112814.XA Pending CN117172330A (en) 2023-08-30 2023-08-30 Data processing method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN117172330A (en)


Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 553, 5th Floor, Building 3, No. 969 Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311121

Applicant after: Hangzhou Alibaba Cloud Feitian Information Technology Co.,Ltd.

Address before: Room 553, 5th Floor, Building 3, No. 969 Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311121

Applicant before: Hangzhou Alibaba Feitian Information Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination