WO2021197144A1 - A communication method and device - Google Patents

A communication method and device

Info

Publication number: WO2021197144A1
Application number: PCT/CN2021/082483 (CN2021082483W)
Authority: WO (WIPO PCT)
Prior art keywords: side device, cloud, resource, artificial intelligence, computing
Other languages: English (en), French (fr)
Inventors: 林霖, 唐朋成, 梁琪, 刁文波
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021197144A1

Classifications

    • H04L 67/51 — Network services; discovery or management thereof, e.g. service location protocol [SLP] or web services
    • H04L 67/60 — Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • G06F 9/5011 — Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5022 — Mechanisms to release resources
    • G06F 9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06N 3/08 — Neural networks; learning methods

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular, to a communication method and device.
  • the dedicated accelerator may be, for example, a neural-network processing unit (NPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), and the like.
  • dedicated accelerators are installed inside end-side devices to perform the computations required by software and applications.
  • the computing capabilities of the dedicated accelerators installed in some end-side devices may still not meet the requirements of all software and applications.
  • the embodiments of the present application provide a communication method and device, which are used to solve the possible problem that the computing capability of the dedicated accelerator provided in the end-side device cannot meet the requirements of software and applications.
  • in a first aspect, an embodiment of the present application provides a communication method, which is applied to an end-side device; for example, it may be executed by a chip or a chip system of the end-side device.
  • the communication method includes: the end-side device sends a resource application request to the cloud-side device and provides the cloud-side device with a first artificial intelligence model required to realize artificial intelligence processing, where the resource application request is used to request the computing resources required to realize the artificial intelligence function.
  • the end-side device receives a loading completion message sent by the cloud-side device, where the loading completion message indicates that the computing resources allocated by the cloud-side device based on the resource application request have successfully loaded the first artificial intelligence model. The end-side device then provides the first data to be analyzed to the cloud-side device, so that the cloud-side device can run the first artificial intelligence model on the first data to be analyzed to obtain a first inference result and return it; the end-side device receives the first inference result of the first data to be analyzed sent by the cloud-side device, where the first inference result is obtained by running the first artificial intelligence model based on the first data to be analyzed.
  • in this way, the end-side device applies for computing resources from the cloud-side device according to its own needs, and the cloud-side device uses those computing resources to assist the end-side device in loading the model and running it on the data to be analyzed. Even if the computing power of the dedicated accelerator inside the end-side device cannot meet the requirements of software and applications, the end-side device can still use the cloud-side device to realize data inference.
  • moreover, application development does not need to be coupled with the deployment of cloud-side devices: the application side only needs to send the models and data to be offloaded to the cloud-side device, which simplifies development for application developers and reduces their workload. A minimal sketch of this end-side flow is given below.
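  • the sketch below is illustrative only: the message classes, field names, and the `transport` object are hypothetical placeholders, not part of this application.

    from dataclasses import dataclass

    # Hypothetical message types mirroring the flow described above.
    @dataclass
    class ResourceApplicationRequest:
        compute_spec_tflops: float   # requested computing power specification
        model_url: str               # download address of the first AI model

    @dataclass
    class InferenceRequest:
        resource_id: str
        data: bytes                  # the first data to be analyzed

    def end_side_inference(transport, model_url: str, data: bytes):
        """Sketch of the end-side flow: apply, wait for loading, infer."""
        # 1. Apply for cloud-side computing resources and provide the model.
        transport.send(ResourceApplicationRequest(8.0, model_url))
        # 2. Wait for the loading-completion message from the cloud side.
        loaded = transport.recv("loading_complete")
        # 3. Provide the data to be analyzed; receive the first inference result.
        transport.send(InferenceRequest(loaded["resource_id"], data))
        return transport.recv("inference_result")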
  • the above method may also include:
  • the end-side device sends a computing power service registration request to the cloud-side device, and the computing power service registration request is used to request the cloud-side device to provide computing power services for the end-side device;
  • the end-side device receives the computing power service registration response sent by the cloud-side device, and the computing power service registration response is used to indicate that the end-side device has successfully requested the computing power service of the cloud-side device.
  • in this design, the end-side device can register with the cloud side to apply for the computing power service in advance, applying for it as needed; subsequently, the user can use the computing power service provided by the cloud side without perceiving it.
  • the end-side device provides the cloud-side device with the first artificial intelligence model required to implement artificial intelligence processing, including:
  • the end-side device sends the first artificial intelligence model to the cloud-side device; or,
  • the end-side device sends the download address of the first artificial intelligence model to the cloud-side device.
  • the end-side device providing the first to-be-analyzed data to the cloud-side device includes:
  • the end-side device sends the first data to be analyzed to the cloud-side device; or,
  • the end-side device sends the download address of the first data to be analyzed to the cloud-side device.
  • the method further includes:
  • the end-side device provides the second data to be analyzed to the cloud-side device, and receives the second inference result of the second data to be analyzed sent by the cloud-side device; wherein the second inference result is obtained by running the first artificial intelligence model based on the second data to be analyzed.
  • the method further includes:
  • the end-side device provides the second artificial intelligence model to the cloud-side device and provides the third data to be analyzed to the cloud-side device; the end-side device receives the third inference result of the third data to be analyzed sent by the cloud-side device; wherein the third inference result is obtained by running the second artificial intelligence model on the computing resources based on the third data to be analyzed.
  • after the end-side device obtains the inference result of the first data to be analyzed, if another artificial intelligence model is needed to perform inference on new data, the end-side device only needs to send that data and the required artificial intelligence model to the cloud-side device; there is no need to apply for resources again, so the implementation is relatively simple.
  • the method further includes: after the end-side device finishes using the computing resources, it sends a resource release request to the cloud-side device, where the resource release request is used to request the release of the computing resources; the end-side device receives a resource release response sent by the cloud-side device.
  • the resource release response is used to indicate that the computing resources, and the artificial intelligence model running on them, have been successfully released.
  • before the end-side device sends the resource application request to the cloud-side device, the method further includes: the end-side device determines that part or all of the artificial intelligence processing tasks are to be processed by the cloud-side device.
  • the end-side device can determine whether it needs the cloud-side device to assist in performing artificial intelligence processing tasks based on its own situation.
  • the method also includes:
  • when the end-side device determines that part of the artificial intelligence processing tasks are to be processed by the cloud-side device, the end-side device splits the artificial intelligence model to be used into the first artificial intelligence model and a third artificial intelligence model;
  • before providing the first data to be analyzed to the cloud-side device, the end-side device loads the third artificial intelligence model, and when the end-side device receives the loading completion message sent by the cloud-side device, it splits the data to be analyzed into the first data to be analyzed and fourth data to be analyzed;
  • after splitting the data to be analyzed into the first data to be analyzed and the fourth data to be analyzed, the end-side device runs the loaded third artificial intelligence model to perform inference on the fourth data to be analyzed and obtain a fourth inference result;
  • after receiving the first inference result, the end-side device performs fusion processing on the first inference result and the fourth inference result.
  • in this way, the end-side device can offload part of its workload to the cloud-side device based on its own situation, reducing the burden on the end-side device's accelerator; a sketch of this split-and-fuse flow follows.
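  • in the following hedged Python sketch, `split`, `load`, `infer`, and `fuse` are hypothetical helpers, not interfaces defined by this application:

    def split_and_fuse(model, data, cloud, local_runtime, fuse):
        """Offload part of an AI task to the cloud side and fuse the results."""
        first_model, third_model = model.split()        # cloud part / local part
        cloud.load(first_model)                         # returns once loading completes
        first_data, fourth_data = data.split()          # cloud data / local data
        fourth_result = local_runtime.infer(third_model, fourth_data)  # local inference
        first_result = cloud.infer(first_data)          # cloud-side inference
        return fuse(first_result, fourth_result)        # fusion processing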
  • in a second aspect, an embodiment of the present application provides another communication method, which is applied to a cloud-side device.
  • for example, it can be implemented by a chip or a chip system of the cloud-side device.
  • the method may include: the cloud-side device receives a resource application request from the end-side device and obtains the first artificial intelligence model provided by the end-side device for realizing artificial intelligence processing, where the resource application request is used to request the computing resources required to realize the artificial intelligence function.
  • the cloud-side device allocates computing resources to the end-side device according to the resource application request; after the cloud-side device successfully loads the first artificial intelligence model through the computing resources, it sends a loading completion message to the end-side device, where the loading completion message is used to indicate that the computing resources on the cloud-side device have successfully loaded the first artificial intelligence model; the cloud-side device then obtains the first data to be analyzed provided by the end-side device, obtains a first inference result by running the first artificial intelligence model on the first data to be analyzed, and sends the first inference result to the end-side device.
  • in this way, the end-side device applies for computing resources from the cloud-side device according to its own needs, and the cloud-side device uses those computing resources to assist the end-side device in loading the model and running it on the data to be analyzed. Even if the computing power of the dedicated accelerator inside the end-side device cannot meet the requirements of software and applications, the end-side device can still use the cloud-side device to realize data inference.
  • application development does not need to be coupled with the deployment of the cloud-side device: the application side only needs to send the model and data to be offloaded to the cloud-side device, which simplifies development for application developers and reduces their workload. A sketch of the corresponding cloud-side handling follows.
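  • the sketch assumes a hypothetical `AcceleratorPool` with `allocate`, `load_model`, `run`, and `release` methods, not interfaces defined by this application:

    import itertools

    class CloudSideHandler:
        """Sketch: allocate computing resources, load the model, serve inference."""

        def __init__(self, pool):
            self.pool = pool
            self.sessions = {}                 # resource_id -> (cores, model)
            self._ids = itertools.count(1)

        def on_resource_application(self, request):
            cores = self.pool.allocate(request.compute_spec_tflops)
            model = cores.load_model(request.model_url)   # load the first AI model
            resource_id = str(next(self._ids))
            self.sessions[resource_id] = (cores, model)
            return {"type": "loading_complete", "resource_id": resource_id}

        def on_inference_request(self, request):
            cores, model = self.sessions[request.resource_id]
            return {"type": "inference_result", "result": cores.run(model, request.data)}

        def on_resource_release(self, request):
            cores, _ = self.sessions.pop(request.resource_id)
            self.pool.release(cores)           # frees the model and the cores
            return {"type": "release_response", "ok": True}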
  • before the cloud-side device receives the resource application request from the end-side device, the method further includes: the cloud-side device receives a computing power service registration request sent by the end-side device, where the computing power service registration request is used to request that the cloud-side device provide computing power services for the end-side device; the cloud-side device sends a computing power service registration response to the end-side device, where the computing power service registration response is used to indicate that the end-side device has successfully requested the computing power service of the cloud-side device.
  • in this design, the end-side device can register with the cloud side to apply for the computing power service in advance, applying for it as needed; subsequently, the user can use the computing power service provided by the cloud side without perceiving it.
  • the computing power service registration request carries computing resource information, where the computing resource information is used to characterize the computing power specification applied for by the end-side device; the computing power service registration response carries a resource ID allocated by the cloud-side device for the end-side device, where the resource ID is used to identify the computing resource information; the resource application request carries the resource ID, and the cloud-side device allocating computing resources to the end-side device according to the resource application request includes: the cloud-side device allocates computing resources to the end-side device according to the computing resource information corresponding to the resource ID.
  • alternatively, the resource application request carries the computing resource information, where the computing resource information is used to characterize the computing power specification applied for by the end-side device; the cloud-side device allocating computing resources to the end-side device according to the resource application request includes: the cloud-side device allocates computing resources to the end-side device according to the computing resource information.
  • the cloud-side device obtaining the first artificial intelligence model provided by the end-side device for realizing artificial intelligence processing includes: the cloud-side device receives the first artificial intelligence model sent by the end-side device; or, the cloud-side device receives the download address of the first artificial intelligence model sent by the end-side device and downloads the first artificial intelligence model according to that download address.
  • the cloud-side device obtaining the first data to be analyzed provided by the end-side device includes: the cloud-side device receives the first data to be analyzed sent by the end-side device; or, the cloud-side device receives the download address of the first data to be analyzed sent by the end-side device and downloads the first data to be analyzed according to that download address.
  • after the cloud-side device sends the first inference result to the end-side device, the method further includes: the cloud-side device obtains the second data to be analyzed provided by the end-side device, runs the first artificial intelligence model to perform inference on the second data to be analyzed to obtain a second inference result, and sends the second inference result to the end-side device.
  • after the cloud-side device sends the first inference result to the end-side device, the method further includes: the cloud-side device obtains the second artificial intelligence model provided by the end-side device and obtains the third data to be analyzed provided by the end-side device; the cloud-side device performs inference on the third data to be analyzed by running the second artificial intelligence model to obtain a third inference result, and sends the third inference result to the end-side device.
  • the cloud-side device receives a resource release request sent by the end-side device, where the resource release request is used to request the release of the computing resources; the cloud-side device releases the computing resources and the artificial intelligence model running on them; the cloud-side device sends a resource release response to the end-side device, where the resource release response is used to indicate that the computing resources and the artificial intelligence model running on them have been successfully released.
  • in a third aspect, an embodiment of the present application provides a communication method, which is applied to an end-side device; for example, it is executed by a chip or a chip system of the end-side device.
  • the communication method includes: the end-side device sends a resource application request to the cloud-side device, where the resource application request is used to request the computing resources required to realize the artificial intelligence function; the end-side device receives a resource application response sent by the cloud-side device, where the resource application response is used to indicate that the cloud-side device has successfully allocated computing resources to the end-side device; when running the first artificial intelligence model used to realize the artificial intelligence function to perform inference on the first data to be analyzed, the end-side device generates a first calculation instruction and first calculation data and sends them to the cloud-side device; the end-side device receives a first calculation result sent by the cloud-side device, where the first calculation result is obtained by the computing resources executing the first calculation instruction on the first calculation data.
  • in this way, the end-side device applies for computing resources from the cloud-side device according to its own needs, loads the artificial intelligence model itself, and runs the model on the data to be analyzed to generate calculation instructions and calculation data; the cloud-side device then uses the computing resources to execute the calculation instructions on the calculation data and return the calculation results. Even if the computing power of the dedicated accelerator installed in the end-side device cannot meet the requirements of software and applications, the cloud-side device can be used to realize data inference.
  • application development does not need to be coupled with the deployment of the cloud-side device: the application side only needs to send the model and data to be offloaded to the cloud-side device, which simplifies development for application developers and reduces their workload.
  • it further includes: the end-side device sends a computing power service registration request to the cloud-side device, the computing power service registration request is used to request the cloud-side device to provide computing power services for the end-side device; the end-side device receives the cloud The computing power service registration response sent by the side device, and the computing power service registration response is used to indicate that the terminal side device has successfully requested the computing power service of the cloud side device.
  • after the end-side device receives the first calculation result sent by the cloud-side device, the method further includes: the end-side device runs the first artificial intelligence model to perform inference on the second data to be analyzed, generating a second calculation instruction and second calculation data, and sends them to the cloud-side device; the end-side device receives a second calculation result sent by the cloud-side device, where the second calculation result is obtained by the computing resources executing the second calculation instruction on the second calculation data.
  • the method further includes: after the end-side device finishes using the computing resources, it sends a resource release request to the cloud-side device, where the resource release request is used to request the release of the computing resources; the end-side device receives a resource release response sent by the cloud-side device, where the resource release response is used to indicate that the computing resources have been successfully released.
  • before the end-side device sends the resource application request to the cloud-side device, the method further includes: the end-side device determines that part or all of the artificial intelligence processing tasks are to be processed by the cloud-side device.
  • the method further includes: when the end-side device determines that part of the artificial intelligence processing tasks are to be processed by the cloud-side device, the end-side device, while running the first artificial intelligence model to perform inference on the first data to be analyzed, also generates a third calculation instruction and third calculation data; the end-side device executes the third calculation instruction on the third calculation data to obtain a third calculation result; after the end-side device receives the first calculation result sent by the cloud-side device, it performs fusion processing on the first calculation result and the third calculation result to obtain the inference result of the first artificial intelligence model on the first data to be analyzed, as sketched below.
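  • in this illustrative sketch, `plan`, `execute`, and `fuse` are hypothetical helpers:

    def infer_with_offload(model, data, cloud, local_accelerator, fuse):
        """The end side loads the model itself and offloads only some instructions."""
        # Running the model yields calculation instructions plus their input data.
        first_instr, first_data, third_instr, third_data = model.plan(data)
        # Offload the first instruction/data pair to the cloud-side resources...
        first_result = cloud.execute(first_instr, first_data)
        # ...while executing the third pair on the local accelerator.
        third_result = local_accelerator.execute(third_instr, third_data)
        # Fuse both partial results into the final inference result.
        return fuse(first_result, third_result)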
  • in a fourth aspect, an embodiment of the present application provides a communication method, including: the cloud-side device receives a resource application request from the end-side device, where the resource application request is used to request the computing resources required to realize the artificial intelligence function; the cloud-side device allocates computing resources to the end-side device according to the resource application request; the cloud-side device sends a resource application response to the end-side device, where the resource application response is used to indicate that the cloud-side device has successfully allocated computing resources to the end-side device; the cloud-side device receives the first calculation instruction and the first calculation data sent by the end-side device; the cloud-side device executes the first calculation instruction on the first calculation data through the computing resources to obtain a calculation result; and the cloud-side device sends the calculation result to the end-side device.
  • in this way, the end-side device applies for computing resources from the cloud-side device according to its own needs, loads the artificial intelligence model itself, and runs the model on the data to be analyzed to generate calculation instructions and calculation data; the cloud-side device then uses the computing resources to execute the calculation instructions on the calculation data and return the calculation results. Even if the computing power of the dedicated accelerator installed in the end-side device cannot meet the requirements of software and applications, the cloud-side device can be used to realize data inference.
  • application development does not need to be coupled with the deployment of the cloud-side device: the application side only needs to send the model and data to be offloaded to the cloud-side device, which simplifies development for application developers and reduces their workload.
  • before the cloud-side device receives the resource application request from the end-side device, the method further includes: the cloud-side device receives a computing power service registration request sent by the end-side device, where the computing power service registration request is used to request that the cloud-side device provide computing power services for the end-side device; the cloud-side device sends a computing power service registration response to the end-side device, where the computing power service registration response is used to indicate that the end-side device has successfully requested the computing power service of the cloud-side device.
  • the computing power service registration request carries computing resource information, where the computing resource information is used to characterize the computing power specification applied for by the end-side device; the computing power service registration response carries a resource ID allocated by the cloud-side device for the end-side device, where the resource ID is used to identify the computing resource information; the resource application request carries the resource ID, and the cloud-side device allocating computing resources to the end-side device according to the resource application request includes: the cloud-side device allocates computing resources to the end-side device according to the computing resource information corresponding to the resource ID.
  • alternatively, the resource application request carries the computing resource information, where the computing resource information is used to characterize the computing power specification applied for by the end-side device; the cloud-side device allocating computing resources to the end-side device according to the resource application request includes: the cloud-side device allocates computing resources to the end-side device according to the computing resource information.
  • the method further includes:
  • the cloud-side device receives the second calculation instruction and the second calculation data sent by the end-side device, executes the second calculation instruction on the second calculation data through the computing resources to obtain a second calculation result, and sends the second calculation result to the end-side device.
  • it also includes:
  • the cloud-side device receives the resource release request sent by the end-side device, and the resource release request is used to request the release of computing resources;
  • the cloud-side device releases the computing resources and sends a resource release response to the end-side device, where the resource release response is used to indicate that the computing resources have been successfully released.
  • in a fifth aspect, the present application provides a communication device used in an end-side device or a chip of an end-side device, including units or means for executing the method in the foregoing first aspect or any possible implementation of the first aspect, or including units or means for executing the method in the foregoing third aspect or any possible implementation of the third aspect.
  • in a sixth aspect, the present application provides a communication device used in a cloud-side device or a chip of a cloud-side device, including units or means for executing the method in the foregoing second aspect or any possible implementation of the second aspect, or including units or means for executing the method in the foregoing fourth aspect or any possible implementation of the fourth aspect.
  • in a seventh aspect, the present application provides a communication device used in an end-side device or a chip of an end-side device, including at least one processing element and at least one storage element, where the at least one storage element is used to store programs and data, and the at least one processing element is used to execute the method in the foregoing first aspect or any possible implementation of the first aspect, or to execute the method in the foregoing third aspect or any possible implementation of the third aspect.
  • in an eighth aspect, the present application provides a communication device used in a cloud-side device or a chip of a cloud-side device, including at least one processing element and at least one storage element, where the at least one storage element is used to store programs and data, and the at least one processing element is used to execute the method in the foregoing second aspect or any possible implementation of the second aspect, or to execute the method in the foregoing fourth aspect or any possible implementation of the fourth aspect.
  • in a ninth aspect, the present application provides a communication device, including a processor and an interface circuit, where the interface circuit is used to receive signals from communication devices other than this communication device and transmit them to the processor, or to send signals from the processor to communication devices other than this communication device; the processor is used, through logic circuits or by executing code instructions, to implement the method in the foregoing first aspect or any possible implementation of the first aspect, or to implement the method in the foregoing third aspect or any possible implementation of the third aspect.
  • in a tenth aspect, the present application provides a communication device, including a processor and an interface circuit, where the interface circuit is used to receive signals from communication devices other than this communication device and transmit them to the processor, or to send signals from the processor to communication devices other than this communication device; the processor is used, through logic circuits or by executing code instructions, to implement the method in the foregoing second aspect or any possible implementation of the second aspect, or to implement the method in the foregoing fourth aspect or any possible implementation of the fourth aspect.
  • in an eleventh aspect, the present application provides a computer program product. The computer program product includes computer instructions that, when executed, cause the method in the foregoing first aspect or any possible implementation of the first aspect to be executed, or cause the method in the foregoing second aspect or any possible implementation of the second aspect to be executed, or cause the method in the foregoing third aspect or any possible implementation of the third aspect to be executed, or cause the method in the foregoing fourth aspect or any possible implementation of the fourth aspect to be executed.
  • in a twelfth aspect, the present application provides a computer-readable storage medium that stores computer instructions. When the computer instructions are executed, the method in the foregoing first aspect or any possible implementation of the first aspect is executed, or the method in the foregoing second aspect or any possible implementation of the second aspect is executed, or the method in the foregoing third aspect or any possible implementation of the third aspect is executed, or the method in the foregoing fourth aspect or any possible implementation of the fourth aspect is executed.
  • Figure 1 is a schematic diagram of a possible implementation of artificial intelligence processing deployment in an embodiment of the application.
  • FIG. 2 is a schematic diagram of another possible implementation of artificial intelligence processing deployment in an embodiment of the application.
  • FIG. 3 is a schematic diagram of a communication system architecture in an embodiment of the application.
  • FIG. 4 is a schematic flowchart of a first possible communication method in an embodiment of this application.
  • FIG. 5 is a schematic diagram of a first possible communication system deployment architecture in an embodiment of this application.
  • FIG. 6 is a schematic flowchart of a second possible communication method in an embodiment of this application.
  • FIG. 7 is a schematic diagram of a second possible communication system deployment architecture in an embodiment of this application.
  • FIG. 8 is a schematic flowchart of a third possible communication method in an embodiment of this application.
  • FIG. 9 is a schematic flowchart of a fourth possible communication method in an embodiment of this application.
  • FIG. 10 is a schematic diagram of a third possible communication system deployment architecture in an embodiment of this application.
  • FIG. 11 is a schematic flowchart of a fifth possible communication method in an embodiment of this application.
  • FIG. 12 is a schematic diagram of a fourth possible communication system deployment architecture in an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of a communication device 1300 in an embodiment of this application.
  • the runtime (Runtime) module generally consists of three parts: an application programming interface (API), a runtime environment, and a hardware abstraction layer (HAL).
  • API is mainly responsible for providing a unified model management and execution interface to implement the steps of model network definition, compilation, and execution.
  • the runtime environment, as the execution engine of the API, is used to construct the artificial intelligence model, load the model's data, load the input data, and perform inference calculations. In addition, it can optimize code and generate machine code for accelerators.
  • HAL provides a unified interface that shields the implementation differences of different equipment manufacturers, so developers only need to develop one set of code to run on devices with various accelerator chips; a sketch of such an interface follows.
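  • the following Python sketch is illustrative only; the method names are assumptions, not taken from any real runtime:

    from abc import ABC, abstractmethod

    class HardwareAbstractionLayer(ABC):
        """One interface; each accelerator vendor supplies its own implementation."""

        @abstractmethod
        def compile_model(self, model_definition) -> bytes:
            """Compile a model network definition into accelerator machine code."""

        @abstractmethod
        def execute(self, compiled_model: bytes, input_data):
            """Run inference on this vendor's accelerator."""

    class VendorNpuHal(HardwareAbstractionLayer):
        def compile_model(self, model_definition) -> bytes:
            ...   # vendor-specific compilation, hidden behind the unified interface

        def execute(self, compiled_model: bytes, input_data):
            ...   # vendor-specific execution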
  • End-side equipment: the end-side equipment has the capability of artificial intelligence processing.
  • artificial intelligence can, for example, be computer vision, including face detection, beautification, slimming, and deepfakes;
  • smart security, including face recognition, vehicle detection, and detection of pornographic or violent content;
  • AR/VR, including AR/VR games, VR modeling, etc.
  • End-side devices can be mobile phones, tablets, laptops, handheld computers, mobile Internet devices (MIDs), wearable devices, cameras, in-vehicle devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart cities, or wireless terminals in smart homes, etc.
  • the cloud-side device can be a physical server or a server cluster with an accelerator deployed. Cloud-side devices may also be referred to as computing nodes or cloud-side computing clusters.
  • Artificial intelligence model: a model used to implement artificial intelligence inference, such as a neural network model, a deep learning model, or a computer vision model.
  • the accelerator is different from the main processor.
  • the accelerator can also be called a coprocessor.
  • the main processor is generally implemented by a central processing unit (CPU).
  • the accelerator can be, for example, a graphics processing unit (GPU), a field programmable gate array (FPGA), a neural-network processing unit (NPU), an application-specific integrated circuit (ASIC), and so on.
  • after the application on the end-side device preprocesses the collected data, it can call the runtime module interface to load the corresponding inference model and the preprocessed data.
  • the model then executes the inference process based on the preprocessed data to realize artificial intelligence processing.
  • Accelerators can also be deployed in end-side devices.
  • the runtime of the end-side device can speed up the inference process by calling the accelerator to perform the calculations required by the model runtime.
  • alternatively, the CPU can be used to perform the calculations required by the model at runtime.
  • in addition to the end-side device executing artificial intelligence processing itself, artificial intelligence processing capabilities can also be provided to the end-side device based on cloud services.
  • in the cloud service mode, application developers package the inference part that is expensive in computing power into a specific artificial intelligence service running on the cloud-side device according to their business characteristics, such as a facial recognition service or a voice recognition service.
  • after the application of the end-side device preprocesses the collected data, it calls the API of the specific artificial intelligence service on the cloud-side device through the network and sends the data to be analyzed to a specific cloud service (such as a face recognition service); the face recognition service performs inference based on the data to be analyzed to obtain the inference result and sends it to the end-side device.
  • the application development needs to be combined with the deployment of the cloud-side device to determine which processing is offloaded to the cloud-side device for execution. Therefore, in addition to the ability of application development, developers also need to have the ability to develop services on cloud-side devices.
  • the technical difficulty is relatively high, and the workload is relatively heavy.
  • the requirements for service deployment, daily maintenance, upgrades, and capacity expansion of cloud-side equipment are relatively high, and maintenance and management are difficult.
  • the embodiments of the present application provide a communication method, device, and system.
  • refer to FIG. 3 for a schematic diagram of a system architecture provided by an embodiment of this application.
  • the system includes end-side equipment and cloud-side equipment.
  • the flow of the communication method provided by the embodiment of the present application will be described below with reference to FIG. 3.
  • the loading of the artificial intelligence model can be executed on the end side or on the cloud side.
  • the method may include:
  • S401 The end-side device sends a resource application request to the cloud-side device, so that the cloud-side device receives the resource application request from the end-side device.
  • the resource application request is used to request the computing resources required to realize the artificial intelligence function.
  • S402 The cloud-side device allocates computing resources to the end-side device according to the resource application request.
  • the cloud-side device may virtualize the accelerators it deploys, for example, by adopting a virtualized-core method.
  • when the cloud-side device allocates computing resources to the end-side device according to the computing resource information, it can allocate one or more cores to the end-side device according to the computing power specification; different cores correspond to different computing power instances.
  • the cloud-side device may determine the computing power specification according to the computing resource information.
  • the computing resource information is used to characterize the computing power demand of the end-side device, that is, the computing power requested by the user of the end-side device.
  • the computing resource information can be, for example, a computing power specification, which can be a universal unit of computing power, such as one trillion floating point operations per second (TFLOPS), or one trillion integer operations per second.
  • the computing resource information may also include specified hardware specifications, such as a certain hardware model. Different hardware models correspond to different computing power specifications.
  • when the cloud-side device determines the computing power specification based on the computing resource information: in one way, the computing resource information includes the computing power specification, so the cloud-side device determines the specification as soon as it obtains the computing resource information;
  • in another way, the computing resource information includes hardware specifications, and the computing power specification is determined according to the hardware specifications.
  • the cloud-side device can obtain computing resource information through any of the following examples:
  • in one example, the end-side device may carry the computing resource information in the resource application request, so that the cloud-side device allocates computing resources to the end-side device according to the computing resource information in the resource application request.
  • in another example, the end-side device may send the computing resource information to the cloud-side device in advance, so that the cloud-side device returns the resource ID corresponding to the computing resource information. The end-side device can then carry the resource ID in the resource application request, so that the cloud-side device allocates computing resources according to the computing resource information corresponding to the resource ID; a sketch of core allocation against a computing power specification follows.
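  • in this sketch, the per-core TFLOPS figure and the pool API are assumptions for illustration:

    import itertools
    import math

    CORE_TFLOPS = 4.0    # assumed computing power of one virtualized accelerator core

    class AcceleratorPool:
        def __init__(self, total_cores: int):
            self.free_cores = list(range(total_cores))
            self.instances = {}                  # instance_id -> allocated cores
            self._ids = itertools.count(1)

        def allocate(self, requested_tflops: float) -> int:
            n = math.ceil(requested_tflops / CORE_TFLOPS)   # cores for this spec
            if n > len(self.free_cores):
                raise RuntimeError("insufficient computing resources")
            cores, self.free_cores = self.free_cores[:n], self.free_cores[n:]
            instance_id = next(self._ids)        # a computing power instance
            self.instances[instance_id] = cores
            return instance_id

        def release(self, instance_id: int) -> None:
            self.free_cores.extend(self.instances.pop(instance_id))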
  • the specific process of registering the computing power of the end-side device with the cloud-side device will be described in detail later, and will not be repeated here.
  • S403 The end-side device provides the cloud-side device with a first artificial intelligence model required for realizing artificial intelligence processing.
  • when the end-side device provides the cloud-side device with the first artificial intelligence model required to implement artificial intelligence processing, this may be implemented in the following manners:
  • the end-side device may directly send the first artificial intelligence model to the cloud-side device.
  • the end-side device may carry the first artificial intelligence model in the resource application request and send it to the cloud-side device, or separately send it to the cloud-side device.
  • the end-side device may send the download address of the first artificial intelligence model to the cloud-side device, so that the cloud-side device obtains the first artificial intelligence model according to the download address of the first artificial intelligence model.
  • the download address of the first artificial intelligence model may be a uniform resource locator (URL).
  • the end-side device may upload the first artificial intelligence model to a server on the network, and send the URL of the server to the cloud-side device. Therefore, the cloud-side device downloads the first artificial intelligence model according to the URL.
  • when the end-side device uploads the first artificial intelligence model to a server on the network, or the cloud-side device downloads the first artificial intelligence model from a network service, a secure transmission technology can be used, for example verifying integrity with message-digest algorithm 5 (MD5); a sketch of such a download with MD5 verification follows.
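  • this standard-library Python sketch assumes the end-side device supplies both the URL and the expected digest:

    import hashlib
    import urllib.request

    def fetch_model(url: str, expected_md5: str) -> bytes:
        """Download a model from its download address and verify it with MD5."""
        with urllib.request.urlopen(url) as response:
            payload = response.read()
        if hashlib.md5(payload).hexdigest() != expected_md5:
            raise ValueError("model download failed the MD5 integrity check")
        return payload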
  • S404 The cloud-side device successfully loads the first artificial intelligence model through computing resources.
  • S405 After the cloud-side device successfully loads the first artificial intelligence model through the computing resource, it sends a loading completion message to the end-side device, so that the end-side device receives the loading completion message sent by the cloud-side device.
  • the loading complete message is used to indicate that the computing resources on the cloud-side device have successfully loaded the first artificial intelligence model.
  • S406 The end-side device provides the first to-be-analyzed data to the cloud-side device, so that the cloud-side device obtains the first to-be-analyzed data provided by the end-side device.
  • when the end-side device provides the first data to be analyzed to the cloud-side device, this can be implemented in the following manners:
  • the end-side device may directly send the first to-be-analyzed data to the cloud-side device.
  • the end-side device may send the download address of the first to-be-analyzed data to the cloud-side device, so that the cloud-side device obtains the first to-be-analyzed data according to the download address of the first to-be-analyzed data.
  • the download address of the first data to be analyzed may be a URL.
  • the end-side device may upload the first data to be analyzed to a server on the network, and send the URL of the server to the cloud-side device. Therefore, the cloud-side device downloads the first data to be analyzed according to the URL of the first data to be analyzed.
  • when the end-side device uploads the first data to be analyzed to a server on the network, or the cloud-side device downloads the first data to be analyzed from a network service, a secure transmission technology can likewise be used, for example verifying integrity with an MD5 checksum.
  • S407 After the cloud-side device obtains the first data to be analyzed provided by the end-side device, it performs inference on the first data to be analyzed through the first artificial intelligence model loaded on the computing resources to obtain a first inference result.
  • that is, the cloud-side device runs the loaded first artificial intelligence model through the computing resources and inputs the first data to be analyzed into the running model to obtain the first inference result.
  • S408 The cloud-side device sends the first inference result to the end-side device.
  • the end-side device may register or open an account with the cloud-side device, and obtain cloud-side computing power services from the cloud-side device as needed.
  • the cloud-side device can provide users with operation interfaces for registering, opening an account, recharging, or purchasing computing power for using the cloud computing power service. Users can perform the corresponding operations according to the operation interface's guidance, which calls the cloud computing power service management API on the cloud-side device to obtain the cloud computing power service.
  • the end-side device can deploy a plug-in or application (APP) that connects to the cloud-side device (such as the cloud computing power service management and control plane), so that the end-side device can present the operation interface according to the plug-in or APP.
  • the cloud-side device can provide a web-based operation page to support user registration, account opening, recharge or resource purchase, etc.
  • when the user of the end-side device applies for the cloud computing power service for the first time, user registration can be performed first, and the computing power service can be applied for after the user registration is completed.
  • the end-side device sends a user registration request to the cloud-side device in response to an operation for the user, and the user registration request can carry user information of the user.
  • User information is used to indicate the identity of the user.
  • the user information may include the user's user name or user identification (Identity, ID), registration password, and so on.
  • the user information may also include other information used to indicate the user's identity, such as fingerprint information.
  • the user information can be used by the user to perform operations such as recharging, resource application, or purchase to the cloud-side device later.
  • the cloud-side device may record the user ID in the user list used to indicate that the computing power service has been applied for.
  • the end-side device can carry the user ID in the resource application request sent to the cloud-side device, so that after the cloud-side device verifies that the user identified by the user ID has applied for the computing power service, it proceeds to allocate computing resources to the end-side device based on the computing resource information.
  • the end-side device can then perform computing power service registration. For example, in response to the user's operation to obtain the cloud computing power service, the end-side device sends a computing power service registration request to the cloud-side device; after receiving it, the cloud-side device can send a computing power service registration response to the end-side device.
  • the computing power service registration response is used to indicate that the end-side device has successfully requested the computing power service of the cloud-side device.
  • the computing power service registration request may carry computing resource information.
  • the cloud-side device may allocate a resource ID to the user of the end-side device, and the resource ID is used to identify computing resource information.
  • the cloud-side device may associate and save the computing resource information and the resource ID, and send the resource ID to the end-side device; the end-side device may then carry the resource ID in the resource application request, as sketched below.
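  • in this hedged sketch, the storage scheme and ID format are assumptions:

    import uuid

    class ComputingPowerRegistry:
        """Map a resource ID to the computing resource information registered for it."""

        def __init__(self):
            self._by_id = {}

        def register(self, computing_resource_info: dict) -> str:
            resource_id = uuid.uuid4().hex       # returned to the end-side device
            self._by_id[resource_id] = computing_resource_info
            return resource_id

        def lookup(self, resource_id: str) -> dict:
            # Used when a resource application request carries only the resource ID.
            return self._by_id[resource_id]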
  • the end-side device may continue to use the first artificial intelligence model loaded by the cloud-side device to perform inference operations on new data (such as second data to be analyzed) as required. For example, after the end-side device obtains the inference result of the first data to be analyzed, if second data to be analyzed exists on the end-side device, the end-side device provides it to the cloud-side device; after obtaining the second data to be analyzed, the cloud-side device performs inference on it through the loaded first artificial intelligence model to obtain a second inference result. That is, the cloud-side device inputs the second data to be analyzed into the loaded first artificial intelligence model to obtain the second inference result. Then, the cloud-side device sends the second inference result to the end-side device.
  • the end-side device can continue to use the applied computing resources to load the second artificial intelligence model as required, and perform inference operations on the new data (such as the third data to be analyzed).
  • the end-side device provides the second artificial intelligence model to the cloud-side device, so that after the cloud-side device obtains the second artificial intelligence model, it uses computing resources to load the second artificial intelligence model.
  • if the end-side device no longer needs the first artificial intelligence model, it can first apply to the cloud-side device for destruction of the model, for example, by sending a model destruction request to the cloud-side device.
  • the model destruction request can carry the ID of the first artificial intelligence model.
  • after the cloud-side device receives the model destruction request, it releases the first artificial intelligence model loaded on the computing resources.
  • the cloud-side device may send a model destruction response to the end-side device.
  • after the end-side device receives the model destruction response, it determines that the cloud-side device has completed the destruction of the first artificial intelligence model, and it may then provide the second artificial intelligence model to the cloud-side device.
  • the manner in which the end-side device provides the second artificial intelligence model to the cloud-side device is similar to the manner in which the first artificial intelligence model is provided to the cloud-side device, and will not be repeated here.
  • the manner in which the end-side device provides the second data to be analyzed or the third data to be analyzed to the cloud-side device is similar to the manner in which the first data to be analyzed is provided to the cloud-side device, and will not be repeated here.
  • after the end-side device finishes using the computing resources, it may apply to the cloud-side device to release them. For example, after completing its use of the computing resources, the end-side device sends a resource release request to the cloud-side device; after receiving it, the cloud-side device releases the artificial intelligence model loaded on the computing resources and releases the computing resources. The cloud-side device may send a resource release response to the end-side device, where the resource release response is used to indicate that the computing resources have been successfully released.
  • the cloud-side device can record the bill for this use of the computing resources according to their application time and release time, which can be used to charge the user for using the cloud computing power service; a sketch of such metering follows.
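  • this sketch assumes a simple hourly price, which is purely illustrative:

    from datetime import datetime

    def usage_bill(apply_time: datetime, release_time: datetime,
                   price_per_hour: float) -> float:
        """Bill one computing-resource session by its application and release times."""
        hours = (release_time - apply_time).total_seconds() / 3600
        return round(hours * price_per_hour, 2)

    # Example: resources applied for at 09:00 and released at 09:45 at 2.0 per hour
    # yield a bill of 1.5.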
  • the end-side device determines that tasks processed by artificial intelligence are transferred to the cloud-side device for execution according to configuration information.
  • S401-S408 are executed.
  • the end-side device determines, according to the configuration information, that tasks processed by artificial intelligence are executed by the end-side device itself; S401-S408 need not be performed in this scenario.
  • the end-side device can load the artificial intelligence model through the internal accelerator, and then use the loaded artificial intelligence model to infer the data to be analyzed to obtain the inference result.
  • the end-side device determines, according to the configuration information, that part of the tasks processed by artificial intelligence is performed by the cloud-side device and the other part is performed by the end-side device.
  • the two parts of the task split by the end-side device can be executed in serial mode, in parallel mode, or in a mixed serial-parallel mode.
  • in serial mode, the inference result produced by the local accelerator may be sent to the cloud-side device, and the cloud-side device continues the inference operation to obtain the final inference result.
  • alternatively, the data to be analyzed may be sent to the cloud-side device first, and the inference result of the inference operation performed by the cloud-side device is returned to the end-side device, so that the end-side device continues the inference operation based on the received inference result to obtain the final inference result.
  • in parallel mode, the inference result obtained by the local accelerator and the inference result obtained by the cloud-side device may be combined.
  • the serial-parallel hybrid mode can execute serially first and then in parallel, or in parallel first and then serially.
  • the end-side device can divide the tasks processed by artificial intelligence into two parts based on the computing power of the local accelerator and the computing power of the registered second computing resource, transferring one part to the cloud-side device and keeping the other part for execution by the end-side device; the execution modes are sketched below.
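The three execution modes can be made concrete with a short Python sketch; run_local and run_cloud below are stand-ins for the local accelerator and the cloud-side computing resource, and the fusion step is a plain tuple, all assumptions for illustration only.

```python
# Toy illustration of serial and parallel execution of a split AI task.
from concurrent.futures import ThreadPoolExecutor

def run_local(data):
    # Stand-in for inference on the end-side device's local accelerator.
    return f"local({data})"

def run_cloud(data):
    # Stand-in for inference on the cloud-side computing resource.
    return f"cloud({data})"

def serial(data):
    # Serial mode: the local inference result is handed to the cloud side,
    # which continues the inference to produce the final result.
    return run_cloud(run_local(data))

def parallel(data_for_local, data_for_cloud):
    # Parallel mode: both sides infer concurrently and the two partial
    # inference results are combined afterwards.
    with ThreadPoolExecutor() as pool:
        local_future = pool.submit(run_local, data_for_local)
        cloud_future = pool.submit(run_cloud, data_for_cloud)
        return (local_future.result(), cloud_future.result())

print(serial("x"))           # cloud(local(x))
print(parallel("x1", "x2"))  # ('local(x1)', 'cloud(x2)')
```

A serial-parallel hybrid simply chains these two helpers in either order.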
  • the end-side equipment includes an application module and a computing power service driver module.
  • the application module can be an artificial intelligence application (APP).
  • the application module is used to perform the collection and preprocessing of the data to be analyzed, and can also be used to provide an artificial intelligence model.
  • the application module belongs to the application layer. Below the application layer is the runtime layer, below the runtime layer there can be a driver layer, and below the driver layer is the hardware resource layer, which includes acceleration hardware, network cards, and so on. The runtime API, runtime environment, and HAL are located in the runtime layer.
  • the computing power service driver module can be used to provide the driving function of virtual computing power acceleration hardware.
  • the computing power service driver module can be called a remote direct computing access (RDCA) agent or an RDCA driver (driver) function.
  • the computing power service driver module is also used to call the computing power service data plane in the cloud-side device to perform inference calculation processing; it also has a reliable upload function, for example providing the application module's artificial intelligence model and the data to be analyzed to the cloud-side device.
  • the computing power service driver module can be deployed in the runtime layer.
  • the computing power service driver module has a virtual runtime environment, which can be understood as a proxy for the cloud-side runtime module.
  • the computing power service driver module also has a virtual driver function, which is used to connect to the network card so as to communicate with the cloud-side device.
  • the cloud-side equipment includes a computing power service agent module and a computing power service control module.
  • the computing power service proxy module can also be called RDCA Daemon or RDCA proxy function.
  • the computing power service agent module is responsible for receiving and authenticating the resource application request of the computing power service driving module, applying to the computing power service control module for the computing power required by the end-side device after the authentication passes, and allocating computing resources to the end side according to the computing power service control module's response.
  • the computing power service agent module is also responsible for obtaining the artificial intelligence model and the data to be analyzed provided by the computing power service driving module, loading the artificial intelligence model through the computing resources, and performing inference operations on the data to be analyzed to obtain the inference result, which is returned to the end-side device.
  • the computing power service control module can be called RDCA Control function or RDCA Manager function.
  • the computing power service control module is responsible for the management and allocation of cloud-side computing resources.
  • the computing power service control module also supports application for and recovery of computing resources based on computing power or device type, and can record bills for the use of computing resources.
  • the cloud-side device may also include a cloud runtime module, which is used to call computing resources to load the artificial intelligence model and perform inference on the data to be analyzed to obtain the inference result.
  • the cloud runtime module can also be referred to as a cloud runtime (Cloud Runtime) function.
  • the end-side device may also include an end-side runtime API for connecting the application module and the computing power service driver module.
  • the end-side runtime API can also be used to determine whether the artificial intelligence processing is executed by the cloud side or the end side, as sketched below.
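A minimal sketch of the decision the end-side runtime API might make from the configuration information follows; the configuration keys and mode names are assumptions, since the embodiments do not fix a concrete format.

```python
# Hypothetical dispatch decision based on configuration information.
def dispatch(config: dict) -> str:
    mode = config.get("ai_execution", "local")  # assumed key and default
    if mode == "cloud":
        return "offload the whole task to the cloud-side device"
    if mode == "split":
        return "split the task between the local accelerator and the cloud"
    return "run the task on the local accelerator"

print(dispatch({"ai_execution": "cloud"}))
```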
  • the cloud-side device supports the registration of the computing power service of the end-side device.
  • the end-side device can also deploy a registration module.
  • Cloud-side devices can deploy computing power service modules.
  • the computing power service module can also be called RDCA service function.
  • the computing power service module provides the functions of registering for, opening an account for, recharging, and purchasing resources of the cloud computing power service, and can also generate tenant bills based on the recorded bills for the use of computing resources.
  • the registration module can be a plug-in or APP installed on the end-side device.
  • the registration module may be, for example, an RDCA client (client) APP.
  • the registration module and the computing power service module can be connected through the cloud computing power service management and control plane API.
  • the registration module is responsible for providing operation interfaces for users to register, open an account, recharge, and purchase resources for the cloud computing power service, and for calling the cloud computing power service management and control plane API to implement the corresponding functions according to the user's operations.
  • the registration module can also set the working status and working mode of the computing power service driver module according to the user's settings, for example whether to prompt the user when the account is in arrears, or the virtual computing power specification applied for, such as a virtual Nvidia TX2 accelerator card.
  • the cloud-side device may also include a console module, which provides web-based operation pages supporting user registration, account opening, recharge, resource purchase, and so on.
  • the console module can also be called the RDCA console (Console) function.
  • refer to FIG. 6 for a schematic flowchart of a communication method provided by an embodiment of this application.
  • in this flow, the end-side device determines according to the configuration information that the task of performing the inference operation on the data to be analyzed is executed by the cloud-side device.
  • a variety of accelerators with different hardware specifications or computing power specifications are deployed in the cloud-side device to form a computing resource pool that provides computing power services for different registered users. After an accelerator is powered on, it can register with the computing power service control module, which is responsible for the maintenance, management, application, and allocation of each accelerator in the computing resource pool.
  • the multiple accelerators may be physical accelerators, or virtual accelerators obtained through virtualization. For example, an approach similar to CPU core virtualization may be used to virtualize the hardware computing resources deployed on the cloud-side device.
  • S601 When the application module needs to implement artificial intelligence processing, it sends a model loading request to the end-side runtime API (that is, the loading interface of the artificial intelligence model); the model loading request carries the name of the artificial intelligence model and the file path of the artificial intelligence model.
  • the artificial intelligence model can be, for example, a HiAI, TensorFlow, or Android Neural Networks API (NNAPI) model.
  • TensorFlow™ is a symbolic mathematics system based on dataflow programming, which is widely used in the programming implementation of various machine learning algorithms.
  • HiAI is an AI capability open platform for smart terminals.
  • NNAPI is a C language API based on the Android system that can run computationally intensive operations related to machine learning on mobile devices.
  • NNAPI will provide underlying support for higher-level machine learning frameworks that can build and train neural networks.
  • S602 After the end-side runtime API receives the model loading request, it determines whether the artificial intelligence inference operation is to be executed by the cloud-side device or locally.
  • execution by the cloud-side device is taken as an example here. Specifically, whether to execute on the cloud side or locally can be determined based on the configuration information; once it is determined that the cloud-side device executes, the model loading request is sent to the computing power service driver module.
  • the user of the end-side device can set whether artificial intelligence processing is executed by the cloud-side device or locally through the operation interface provided by the registration module or the end-side runtime environment (Runtime).
  • that is, the registration module not only provides the user with an operation interface for registering for the computing power service with the cloud-side device, but can also provide the user with an operation interface for configuring whether the cloud-side device performs artificial intelligence processing.
  • alternatively, the end-side runtime environment provides users with an operation interface for configuring whether the cloud-side device performs artificial intelligence processing.
  • the computing power service driving module sends a resource application request 1 to the cloud-side device, so that the computing power service proxy module (RDCA Daemon or RDCA proxy function) of the cloud-side device receives the resource application request 1 from the end-side device.
  • Resource application request 1 is used to request the computing resource information required to realize the artificial intelligence function, that is, the computing power requirement.
  • the resource application request 1 carries hardware specifications or computing power specifications.
  • the resource application request 1 may also carry an agent ID.
  • the agent ID may include a resource ID, and the agent ID may also include at least one of a user ID and an ID of the end-side device.
  • the user ID and the ID of the end-side device can be used for subsequent billing for the use of computing resources; an illustrative structure is sketched below.
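An illustrative shape for the agent ID follows: it contains a resource ID and optionally a user ID and/or the end-side device ID, which the cloud side can later use for billing. The field names are assumed for the sketch.

```python
# Hypothetical agent ID structure; only the composition is from the text.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentId:
    resource_id: str                 # identifies the requested resource/spec
    user_id: Optional[str] = None    # optional, usable for billing
    device_id: Optional[str] = None  # optional end-side device ID

agent_id = AgentId(resource_id="spec-tx2", user_id="user-1", device_id="dev-7")
```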
  • after receiving the resource application request 1, the computing power service agent module sends a resource application request 2 to the computing power service control module.
  • the resource application request 2 carries the agent ID.
  • after receiving the resource application request 2, the computing power service control module allocates computing resources to the end-side device according to the computing resource information (such as computing power specifications or hardware specifications) corresponding to the resource ID.
  • the computing power service control module may send the ID of the computing resource to the computing power service agent module.
  • the ID of the computing resource may be carried in a resource application response 2 and sent to the computing power service agent module, so the computing power service agent module can maintain the correspondence between the agent ID and the ID of the computing resource.
  • the ID of the computing resource includes the instance ID of the computing resource, and may also include at least one of the ID of the hardware resource or the communication IP address of the hardware.
  • the hardware resource can be a board card.
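To make the allocation step tangible, here is a toy resource pool in which accelerators (for example, board cards) registered under different specifications are allocated and released by the control module; the class, pool contents, and returned fields are assumptions for illustration.

```python
# Toy computing resource pool keyed by hardware/computing power specification.
import uuid

class ResourcePool:
    def __init__(self):
        # Registered accelerators grouped by specification (assumed contents).
        self.free = {"nvidia-tx2": ["board-1", "board-2"], "ascend-310": ["board-3"]}
        self.allocated = {}  # instance ID -> (spec, board)

    def allocate(self, spec: str) -> dict:
        board = self.free[spec].pop()
        instance_id = str(uuid.uuid4())
        self.allocated[instance_id] = (spec, board)
        # The computing resource ID contains the instance ID and may also
        # identify the hardware resource (the board card).
        return {"instance_id": instance_id, "hardware_id": board}

    def release(self, instance_id: str):
        spec, board = self.allocated.pop(instance_id)
        self.free[spec].append(board)

pool = ResourcePool()
resource = pool.allocate("nvidia-tx2")
pool.release(resource["instance_id"])
```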
  • the computing power service agent module sends a resource application response 1 to the computing power service driving module; the resource application response 1 is used to indicate that the computing resource application is successful.
  • the agent ID can be carried in the resource application response 1.
  • the computing power service driving module provides the computing power service agent module with the first artificial intelligence model required for realizing artificial intelligence processing.
  • the computing power service driving module may directly send the first artificial intelligence model to the computing power service agent module.
  • alternatively, the computing power service driving module may carry the first artificial intelligence model in the resource application request and send it to the computing power service agent module, or send it to the computing power service agent module separately. Figure 6 takes this method as an example for illustration.
  • once the computing power service agent module receives the resource application response 2 sent by the computing power service control module, it can directly execute S608, that is, call the cloud-side runtime module to load the first artificial intelligence model.
  • alternatively, the computing power service driving module may send the download address of the first artificial intelligence model to the computing power service agent module, so that the computing power service agent module obtains the first artificial intelligence model according to the download address.
  • the computing power service driving module can upload the first artificial intelligence model to a server on the network, and send the URL of the server to the computing power service agent module.
  • the computing power service agent module downloads the first artificial intelligence model according to the URL.
  • whether the computing power service driving module uploads the first artificial intelligence model to the network server or the cloud-side device downloads the first artificial intelligence model from the network server, a reliable transmission technology can be used, for example integrity verification with an MD5 checksum; a sketch follows.
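One plausible reading of the MD5-based reliable transmission mentioned above is checksum verification: the uploader publishes an MD5 digest alongside the download URL and the downloader recomputes the digest after fetching the file. The URL and payload below are placeholders; the actual transport is not specified by the embodiments.

```python
# Sketch of integrity verification with an MD5 checksum (an interpretation).
import hashlib

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

model_bytes = b"...serialized first artificial intelligence model..."
published = {"url": "https://example.invalid/model.bin",  # placeholder URL
             "md5": md5_of(model_bytes)}

# On the cloud side, after downloading `model_bytes` from published["url"]:
assert md5_of(model_bytes) == published["md5"], "model corrupted in transit"
```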
  • the computing power service agent module invokes the cloud Runtime module to load the first artificial intelligence model according to the pre-stored correspondence between the agent ID and the ID of the computing resource.
  • the computing power service driving module sends an indication that the model is successfully loaded to the application module through the end-side runtime API.
  • the application module sends the data to be analyzed to the computing power service driving module through the end-side runtime API.
  • when the application module sends the data to be analyzed to the computing power service driving module, it can either send the data to be analyzed directly, or send the storage path of the data to be analyzed to the computing power service driving module, so that the computing power service driving module can obtain the data to be analyzed from the storage path.
  • the computing power service driving module may provide the data to be analyzed to the computing power service agent module.
  • when the computing power service driving module provides the data to be analyzed to the computing power service agent module, this can be implemented in the following ways:
  • the computing power service driving module can directly send the data to be analyzed to the computing power service agent module. This method is taken as an example in Figure 6 for illustration.
  • the computing power service driving module may send the download address of the data to be analyzed to the computing power service agent module, so that the computing power service agent module obtains the data to be analyzed according to the download address of the data to be analyzed.
  • the download address of the data to be analyzed may be a URL.
  • for example, the computing power service driving module can upload the data to be analyzed to a server on the network and send the URL of the server to the computing power service agent module, so the computing power service agent module downloads the data to be analyzed according to the URL.
  • whether the computing power service driving module uploads the data to be analyzed to the network server or the computing power service agent module downloads the data to be analyzed from the network server, a reliable transmission technology can be used, for example integrity verification with an MD5 checksum.
  • after obtaining the data to be analyzed, the computing power service agent module calls the cloud-side runtime module to run the first artificial intelligence model. Specifically, the computing power service agent module sends a model running request to the cloud-side runtime module; the model running request carries the agent ID, the data to be analyzed, and the ID of the computing resource.
  • the cloud-side runtime module invokes the computing resource (ie, hardware resource) corresponding to the ID of the computing resource to run the first artificial intelligence model to perform inference on the data to be analyzed to obtain the inference result.
  • S615 The cloud-side runtime module sends the inference result to the computing power service agent module.
  • S616 The computing power service agent module sends the inference result to the computing power service driving module.
  • the computing power service driving module sends the inference result to the application module through the end-side runtime API.
  • the application module can apply for artificial intelligence model destruction. For example, the application module can call the model destruction interface to send a resource release request to the computing power service driver module.
  • the computing power service driving module sends the resource release request to the computing power service agent module.
  • the resource release request can carry the name of the artificial intelligence model and the Agent ID.
  • the computing power service agent module applies to the cloud-side runtime module to release the first artificial intelligence model. Specifically, the computing power service agent module sends a model release request to the cloud-side runtime module, and the model release request carries the name of the artificial intelligence model.
  • the computing power service agent module notifies the computing power service control module to release computing resources according to the Agent ID.
  • after determining that the resource release is complete, the computing power service agent module sends a resource release response to the computing power service driving module, where the resource release response is used to indicate that the resource has been successfully released.
  • the computing power service driving module forwards the resource release response to the application module.
  • the computing power service control module can record the bill for the use of computing resources this time according to the application time and release time of computing resources, which can be used for subsequent billing for the user's use of cloud computing power services.
  • FIG. 7 is a schematic diagram of another possible deployment mode.
  • the accelerator is deployed in the end-side device in FIG. 7, that is, the end-side device includes hardware resources for implementing acceleration.
  • the end-side equipment also includes an end-side runtime environment loaded with artificial intelligence models.
  • whether the tasks processed by the artificial intelligence model are executed by the cloud-side device, by the end-side device, or by the cloud-side device and the end-side device in cooperation can be determined in combination with the actual conditions.
  • the end-side device can determine according to the configuration information that part of the tasks of the artificial intelligence processing of the data to be analyzed is performed by the cloud-side device, and the other part of the task is performed by the end-side device.
  • the end-side device can divide the tasks processed by artificial intelligence into two parts according to the computing power of the local accelerator and the computing power of the registered second computing resource, transferring one part of the task to the cloud-side device for execution while the other part is executed by the end-side device.
  • a splitting module is also deployed in the end-side device to split the tasks processed by artificial intelligence, for example splitting the artificial intelligence model used and splitting the data to be analyzed.
  • when the splitting module splits the artificial intelligence model, any of the following examples can apply:
  • the model loading request triggered by the application module carries a splitting instruction, and the splitting instruction is used to indicate a splitting rule for splitting the artificial intelligence model. Therefore, after the splitting module receives the splitting instruction, it performs splitting processing on the artificial intelligence model.
  • the splitting module can be configured with splitting rules corresponding to different artificial intelligence models. For example, there is a one-to-one correspondence between the names of different artificial intelligence models and the split rules.
  • the model loading request carries the name of the artificial intelligence model, and the splitting module can split the artificial intelligence model according to the splitting rule corresponding to the name of the artificial intelligence model.
  • the splitting module can also be configured with a general splitting rule; for artificial intelligence models that do not match any configured correspondence, the general splitting rule can be used for the splitting processing, as sketched below.
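The rule-selection logic described in these examples might look like the following sketch: a per-model table is consulted first and a general rule is the fallback. The rule contents are invented for illustration.

```python
# Hypothetical splitting-rule lookup: per-model rules with a general fallback.
SPLIT_RULES = {
    # Assumed one-to-one mapping from model name to its splitting rule.
    "resnet50": {"cloud_layers": range(0, 40), "end_layers": range(40, 50)},
}
GENERAL_RULE = {"cloud_fraction": 0.8}  # assumed general splitting rule

def pick_split_rule(model_name: str) -> dict:
    # Use the rule configured for this model name if present,
    # otherwise fall back to the general splitting rule.
    return SPLIT_RULES.get(model_name, GENERAL_RULE)

print(pick_split_rule("resnet50"))
print(pick_split_rule("unknown-model"))
```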
  • the splitting module splits the artificial intelligence model according to the computing power of the local accelerator and the computing power of the requested computing resource.
  • the splitting module may perform the splitting before the resource application, performing the splitting processing according to the computing power of the local accelerator and the computing power of the registered second computing resource.
  • the splitting module may also perform the splitting after the resource application, performing the splitting processing according to the computing power of the local accelerator and the computing power of the applied-for second computing resource.
  • the two parts of the task split by the splitting module can be executed in serial mode, in parallel mode, or in a mixed serial-parallel mode.
  • in serial mode, the inference result produced by the local accelerator is sent to the cloud-side device, and the cloud-side device continues the inference operation to obtain the final inference result.
  • alternatively, the data to be analyzed may be sent to the cloud-side device first, and the inference result of the inference operation performed by the cloud-side device is returned to the splitting module, so that the splitting module continues the inference operation based on the received inference result to obtain the final inference result.
  • in parallel mode, the inference result obtained by the local accelerator and the inference result obtained by the cloud-side device may be combined.
  • the splitting module can split the data to be analyzed into two parts, such as data to be analyzed 1 and data to be analyzed 2.
  • the splitting module can split the artificial intelligence model into two parts, the cloud-side model content and the end-side model content.
  • the serial-parallel hybrid mode can be executed serially first and then executed in parallel, or it can be executed in parallel first and then serially executed. In this way, the splitting process is performed according to the splitting rule corresponding to the serial-parallel hybrid mode.
  • the splitting module coordinates the hardware resources of the cloud-side device and the end-side device to perform inference, and fuses the inference results produced by the hardware resources of the cloud-side device and the end-side device.
  • refer to FIG. 8 for a schematic flowchart of a communication method provided by an embodiment of this application.
  • Figure 8 only schematically describes the process of the parallel mode.
  • the end-side device can subscribe to or purchase cloud computing power services from the cloud-side device as required through the web page provided by the registration module or the console module.
  • here, the second computing resource is taken as an example of a hardware resource that the end-side device obtains from the cloud-side device by purchasing or activating the cloud computing power service.
  • S801, refer to S601, which will not be repeated here.
  • the model loading request carries the artificial intelligence model used, or the storage path of the artificial intelligence model.
  • the splitting module analyzes the artificial intelligence model and splits it, for example into artificial intelligence model 1 and artificial intelligence model 2.
  • artificial intelligence model 1 includes the cloud-side model content after the split, and artificial intelligence model 2 includes the end-side model content after the split.
  • the splitting module sends the artificial intelligence model 1 to the computing power service driving module.
  • the computing power service driving module sends a resource application request 1 to the computing power service agent module.
  • the resource application request 1 carries the artificial intelligence model 1.
  • resource application request 1 and artificial intelligence model 1 can be sent to the computing power service agent module separately.
  • the computing power service agent module calls the cloud-side runtime module to load the artificial intelligence model 1.
  • after the model is successfully loaded, the computing power service agent module sends a resource application response 1 to the computing power service driving module.
  • the resource application response 1 carries an indication that the cloud-side model is successfully loaded.
  • S810 refer to S610, which will not be repeated here.
  • the execution sequence of S804-S810 and S812 is not limited.
  • the application module sends the data to be analyzed to the splitting module.
  • the splitting module splits the data to be analyzed into data to be analyzed 1 and data to be analyzed 2.
  • the splitting module sends the to-be-analyzed data 1 to the computing power service driving module.
  • the computing power service driving module sends the inference result 1 to the splitting module.
  • the splitting module sends the to-be-analyzed data 2 to the end-side runtime.
  • the end-side runtime uses the artificial intelligence model 2 loaded by the accelerator to perform inference on the data 2 to be analyzed to obtain the inference result 2.
  • the splitting module performs fusion processing on the inference result 1 and the inference result 2 to obtain the inference result 3.
  • the splitting module sends the inference result 3 to the application module; the overall parallel flow is sketched below.
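The parallel flow of FIG. 8 can be condensed into the following toy sketch, where every function stands in for a module in the figure; the data, the "models", and the step numbers in the comments (which follow the approximate ordering described above) are placeholders.

```python
# Toy end-to-end version of the parallel flow in FIG. 8.
def infer_cloud(model1, data1):   # via the computing power service modules
    return {"part": "cloud", "input": data1}

def infer_local(model2, data2):   # end-side runtime plus local accelerator
    return {"part": "end", "input": data2}

def fuse(result1, result2):       # fusion performed by the splitting module
    return [result1, result2]

data = ["sample-a", "sample-b"]
data1, data2 = data[0], data[1]   # the splitting module splits the data
result3 = fuse(infer_cloud("model-1", data1), infer_local("model-2", data2))
print(result3)                    # inference result 3, returned to the app
```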
  • the method may include:
  • the end-side device sends a resource application request to the cloud-side device, so that the cloud-side device receives the resource application request from the end-side device.
  • the resource application request is used to request the computing resources required to realize the artificial intelligence function.
  • S902 The cloud-side device allocates computing resources to the end-side device according to the resource application request.
  • the cloud-side device may virtualize the accelerators it deploys, for example using a core virtualization method.
  • when the cloud-side device allocates computing resources to the end-side device according to the computing resource information, it can allocate one or more cores to the end-side device according to the computing power specification, with different cores corresponding to different computing power instances, as sketched below.
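An illustrative core-allocation routine follows; the mapping from computing power specification to core count is an assumption, since the embodiments only state that one or more cores may be allocated and that different cores correspond to different computing power instances.

```python
# Hypothetical allocation of virtualized cores by computing power specification.
CORES_PER_SPEC = {"small": 1, "medium": 2, "large": 4}  # assumed mapping

def allocate_cores(spec: str, free_cores: list) -> list:
    n = CORES_PER_SPEC[spec]
    if len(free_cores) < n:
        raise RuntimeError("not enough free cores for the requested spec")
    # Each allocated core acts as a separate computing power instance.
    return [free_cores.pop() for _ in range(n)]

free = ["core-0", "core-1", "core-2", "core-3"]
print(allocate_cores("medium", free))  # e.g. ['core-3', 'core-2']
```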
  • S903 The cloud-side device sends a resource request response to the end-side device, where the resource request response is used to indicate that the cloud-side device successfully allocates computing resources to the end-side device.
  • S904 The end-side device generates the first calculation instruction and the first calculation data when running the first artificial intelligence model for realizing the artificial intelligence function to perform inference on the first data to be analyzed.
  • for example, the end-side device may first load the first artificial intelligence model and obtain the first data to be analyzed, and then run the first artificial intelligence model to perform inference on the first data to be analyzed, generating the first calculation instruction and the first calculation data.
  • the end-side device running the first artificial intelligence model to perform inference on the first to-be-analyzed data can generate one or more calculation instructions and one or more calculation data.
  • for example, one calculation instruction and multiple pieces of calculation data are generated, and the one calculation instruction is used to process the multiple pieces of calculation data.
  • or K calculation instructions and K pieces of calculation data are generated, with the calculation instructions corresponding one-to-one to the calculation data.
  • or K calculation instructions and M pieces of calculation data are generated, where the calculation instructions do not correspond one-to-one to the calculation data.
  • the multiple calculation instructions and multiple pieces of calculation data in the embodiments of the present application may be generated at one time, or generated over multiple occasions and sent to the cloud-side device at different times.
  • sending the first calculation instruction and the first calculation data to the cloud-side device is taken as an example.
  • the first calculation instruction is any calculation instruction generated by the end-side device running the first artificial intelligence model to perform inference on the first to-be-analyzed data
  • the first calculation data is the calculation data corresponding to the first calculation instruction.
  • S905 The end-side device sends the first calculation instruction and the first calculation data to the cloud-side device.
  • S906 The cloud-side device executes the first calculation instruction through the computing resource to calculate the first calculation data to obtain the first calculation result.
  • S907 The cloud-side device sends the first calculation result to the end-side device, so the end-side device receives the first calculation result from the cloud-side device; steps S904-S907 are sketched below.
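The instruction-level offload of S904-S907 can be illustrated with a toy executor: running the model on the end side yields (calculation instruction, calculation data) pairs, and the cloud executes each pair on the allocated computing resource. The two-instruction "instruction set" here is invented purely for the sketch.

```python
# Toy cloud-side executor for offloaded calculation instructions (S906).
def cloud_execute(instruction: str, data: list) -> float:
    if instruction == "sum":
        return sum(data)
    if instruction == "max":
        return max(data)
    raise ValueError(f"unknown instruction {instruction!r}")

# S904: the end side generates one or more instruction/data pairs while
# running the first artificial intelligence model on the data to be analyzed.
pairs = [("sum", [1.0, 2.0, 3.0]), ("max", [4.0, 0.5])]

# S905-S907: each pair is sent to the cloud, executed there, and the
# calculation result is returned to the end-side device.
results = [cloud_execute(instruction, data) for instruction, data in pairs]
print(results)  # [6.0, 4.0]
```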
  • the end-side device may also provide the first calculation instruction and the first calculation data to the cloud-side device in the following manner:
  • the end-side device may send the download address of the web server for storing the first calculation instruction and the first calculation data to the cloud-side device, so that the cloud-side device obtains the first calculation instruction and the first calculation data according to the download address.
  • the end-side device may upload the first calculation instruction and the first calculation data to a server on the network, and send the URL of the server to the cloud-side device. Therefore, the cloud-side device downloads the first calculation instruction and the first calculation data according to the URL.
  • whether the end-side device uploads the first calculation instruction and the first calculation data to a network server, or the cloud-side device downloads them from the network server, a reliable transmission technology can be adopted, for example integrity verification with an MD5 checksum.
  • the end-side device may register or open an account with the cloud-side device, and obtain cloud-side computing power services from the cloud-side device as needed.
  • registration or account opening methods please refer to the description of the embodiment corresponding to FIG. 4, which is not repeated here.
  • the end-side device can continue to use the computing resources of the cloud-side device to implement artificial intelligence processing as required. For example, after the end-side device obtains the first calculation result, if second data to be analyzed exists on the end-side device, the end-side device can obtain a second calculation instruction and second calculation data based on the loaded first artificial intelligence model (or another artificial intelligence model) and the second data to be analyzed; the cloud-side device then continues to execute the second calculation instruction through the computing resource to calculate the second calculation data, obtains a second calculation result, and sends the second calculation result to the end-side device. The end-side device thus receives the second calculation result sent by the cloud-side device.
  • the end-side device running the first artificial intelligence model to perform inference on the second data to be analyzed may also generate one or more calculation instructions and one or more calculation data.
  • sending the second calculation instruction and the second calculation data to the cloud-side device is taken as an example.
  • the second calculation instruction is any calculation instruction generated by the end-side device running the first artificial intelligence model to perform inference on the second to-be-analyzed data
  • the second calculation data is the calculation data corresponding to the second calculation instruction.
  • after the end-side device finishes using the computing resources, it may apply to the cloud-side device to release them. For example, after the end-side device completes its use of the computing resources, it sends a resource release request to the cloud-side device, and the cloud-side device releases the computing resources after receiving the resource release request sent by the end-side device.
  • the cloud-side device may send a resource release response to the end-side device, and the resource release response is used to indicate that the computing resource is successfully released.
  • the cloud-side device can record the bill for using the computing resource this time according to the application time and release time of the computing resource, which can be used to charge the user for using the cloud computing power service.
  • the end-side device determines whether part or all of the tasks processed by artificial intelligence need to be executed by the cloud-side device.
  • the end-side device determines, according to the configuration information, that the task of performing the inference operation on the data to be analyzed is transferred to the cloud-side device for execution.
  • S901-S907 are executed.
  • the end-side device determines, according to the configuration information, that tasks processed by artificial intelligence are executed by the end-side device itself; S901-S907 need not be performed in this scenario.
  • the end-side device can load the artificial intelligence model through the internal accelerator, and then use the loaded artificial intelligence model to infer the data to be analyzed to obtain the inference result.
  • the end-side device determines according to the configuration information that part of the tasks of the inference operation of the data to be analyzed is performed by the cloud-side device, and another part of the tasks is performed by the end-side device.
  • when the end-side device determines that part of the tasks processed by artificial intelligence is processed by the cloud-side device,
  • the end-side device runs the first artificial intelligence model to perform inference on the first data to be analyzed, generating the first calculation instruction and the first calculation data to be executed on the cloud side;
  • a third calculation instruction and third calculation data to be executed on the end side are also generated;
  • the end-side device executes the third calculation instruction to calculate the third calculation data and obtains a third calculation result;
  • the end-side device then merges the first calculation result and the third calculation result to obtain the inference result of the first artificial intelligence model performing inference on the first data to be analyzed, as sketched below.
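The cooperative case just described reduces to running one instruction/data pair on each side and merging the partial results on the end side; the merge below is a simple list concatenation, chosen only for illustration.

```python
# Sketch of end/cloud cooperation with a final merge on the end side.
def run_on_cloud(first_instruction, first_data):
    # Stand-in for S905-S907: the cloud executes the first calculation
    # instruction on the first calculation data.
    return ["first-calculation-result"]

def run_locally(third_instruction, third_data):
    # Stand-in for the end side executing the third calculation instruction.
    return ["third-calculation-result"]

first_result = run_on_cloud("ins-1", "data-1")
third_result = run_locally("ins-3", "data-3")
# Merge the two partial results into the inference result of the first model.
inference_result = first_result + third_result
print(inference_result)
```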
  • the end-side equipment includes application modules, runtime modules (including runtime API, runtime environment, and HAL), and computing power service driver modules.
  • the application module can be an artificial intelligence application (APP).
  • the application module is used to perform the collection and preprocessing of the data to be analyzed, and can also be used to provide an artificial intelligence model.
  • the application module belongs to the application layer. Below the application layer is the runtime layer, below the runtime layer there can be a driver layer, and below the driver layer is the hardware resource layer. The runtime module is located in the runtime layer.
  • the computing power service driver module can be used to provide the driving function of virtual computing power acceleration hardware.
  • the computing power service driver module can be called a remote direct computing access (RDCA) agent or an RDCA driver (driver) function.
  • the computing power service driver module is also used to call the computing power service data plane in the cloud-side device to perform inference calculation processing; it also has a reliable upload function, for example providing the calculation instructions and calculation data generated by the runtime layer (such as the first calculation instruction and the first calculation data, or the second calculation instruction and the second calculation data) to the cloud-side device.
  • the computing power service driver module can be deployed under the runtime layer of the end-side device, such as in the driver layer.
  • the cloud-side equipment includes a computing power service agent module and a computing power service control module.
  • the computing power service proxy module can also be called RDCA Daemon or RDCA proxy function.
  • the computing power service agent module is responsible for receiving and authenticating the resource application request of the computing power service driving module, applying to the computing power service control module for the computing power required by the end-side device after the authentication passes, and allocating computing resources to the end side according to the computing power service control module's response.
  • the computing power service agent module is also responsible for obtaining the artificial intelligence model and the data to be analyzed provided by the computing power service driving module, loading the artificial intelligence model through the computing resources, and performing inference operations on the data to be analyzed to obtain the inference result, which is returned to the end-side device.
  • the computing power service control module can be called RDCA Control function or RDCA Manager function.
  • the computing power service control module is responsible for the management and allocation of cloud-side computing resources.
  • the computing power service control module also supports application for and recovery of computing resources based on computing power or device type, and can record bills for the use of computing resources.
  • the cloud-side device may also include a cloud runtime module, which is used to call computing resources to load the artificial intelligence model and perform inference on the data to be analyzed to obtain the inference result.
  • the cloud runtime module can also be referred to as a cloud runtime (Cloud Runtime) function.
  • the end-side device may also include an end-side runtime API for connecting the application module and the computing power service driver module.
  • the end-side runtime API can also be used to determine whether the artificial intelligence processing is executed by the cloud side or the end side.
  • the cloud-side device supports the registration of the computing power service of the end-side device.
  • the end-side device can also deploy a registration module.
  • Cloud-side devices can deploy computing power service modules.
  • the computing power service module can also be called RDCA service function.
  • the computing power service module provides the functions of registering for, opening an account for, recharging, and purchasing resources of the cloud computing power service, and can also generate tenant bills based on the recorded bills for the use of computing resources.
  • the registration module can be a plug-in or APP installed on the end-side device.
  • the registration module may be, for example, an RDCA client (client) APP.
  • the registration module and the computing power service module can be connected through the cloud computing power service management and control plane API.
  • the registration module is responsible for providing operation interfaces for users to register, open an account, recharge, and purchase resources for the cloud computing power service, and for calling the cloud computing power service management and control plane API to implement the corresponding functions according to the user's operations.
  • the registration module can also set the working status and working mode of the computing power service driver module according to the user's settings, for example whether to prompt the user when the account is in arrears, or the virtual computing power specification applied for, such as a virtual Nvidia TX2 accelerator card.
  • the cloud-side device may also include a console module, which provides web-based operation pages supporting user registration, account opening, recharge, resource purchase, and so on.
  • the console module can also be called the RDCA console (Console) function.
  • refer to FIG. 11 for a schematic flowchart of a communication method provided by an embodiment of this application.
  • in this flow, the end-side device determines according to the configuration information that the task of performing the inference operation on the data to be analyzed is executed by the cloud-side device.
  • the registration module sends a resource application request 1 to the computing power service driving module.
  • the resource application request 1 is used to request computing resources required to implement the artificial intelligence function, that is, computing power requirements.
  • the resource application request 1 carries an agent ID.
  • the agent ID may include a resource ID, and may also include at least one of a user ID and an ID of the end-side device.
  • the user ID and the ID of the end-side device can be used for subsequent billing for the use of computing resources.
  • after receiving the resource application request 1, the computing power service driver module sends the resource application request 1 to the cloud-side device, so that the computing power service agent module (RDCA Daemon or RDCA proxy function) of the cloud-side device receives the resource application request 1 from the end-side device.
  • after receiving the resource application request 1, the computing power service agent module sends a resource application request 2 to the computing power service control module.
  • the resource application request 2 carries the Agent ID.
  • after receiving the resource application request 2, the computing power service control module allocates computing resources to the end-side device according to the computing resource information corresponding to the resource ID.
  • the computing power service control module may send the ID of the computing resource to the computing power service agent module.
  • the ID of the computing resource may be carried in a resource application response 2 and sent to the computing power service agent module, so the computing power service agent module can maintain the correspondence between the agent ID and the ID of the computing resource.
  • the ID of the computing resource includes the instance ID of the computing resource, and may also include at least one of the ID of the hardware resource or the communication IP address of the hardware.
  • the hardware resource can be a board card.
  • the computing power service agent module sends a resource application response 1 to the computing power service driving module; the resource application response 1 is used to indicate that the computing resource application is successful.
  • the agent ID can be carried in the resource application response 1.
  • the computing power service driving module forwards the resource application response 1 to the registration module.
  • after receiving the model loading request, the runtime module loads the artificial intelligence model and sends a model loading response to the application module.
  • the application module sends a run-model instruction to the runtime module; the run-model instruction carries the data to be analyzed or the storage path of the data to be analyzed.
  • the runtime module runs the artificial intelligence model to perform inference on the data to be analyzed, obtains first calculation data and first calculation instructions, and sends the first calculation data and first calculation instructions to the computing power service driving module.
  • the computing power service driving module may provide the first calculation data and the first calculation instruction to the computing power service agent module.
  • after obtaining the first calculation data and the first calculation instruction, the computing power service agent module sends them to the cloud-side runtime module. Specifically, the computing power service agent module sends a model running request carrying the first calculation instruction and the first calculation data to the cloud-side runtime module.
  • after the cloud-side runtime module receives the model running request, it calls the computing resource (that is, the hardware resource) corresponding to the ID of the computing resource to execute the first calculation instruction on the first calculation data to obtain the first calculation result.
  • S1114 The cloud-side runtime module sends the first calculation result to the computing power service agent module.
  • the computing power service agent module sends the first calculation result to the computing power service driving module.
  • the computing power service driving module sends the first calculation result to the application module through the end-side runtime module.
  • the registration module may apply for resource release. For example, the registration module can send a resource release request to the computing power service driver module.
  • the computing power service driving module sends a resource release request to the computing power service agent module.
  • the resource release request can carry the ID of the computing power instance and the Agent ID.
  • the computing power service agent module notifies the computing power service control module to release computing resources according to the Agent ID.
  • after the computing power service agent module determines that the resource has been released, it sends a resource release response to the computing power service driving module; the resource release response is used to indicate that the resource has been released successfully.
  • the computing power service driving module forwards the resource release response to the registration module.
  • the computing power service control module can record the bill for the use of computing resources this time according to the application time and release time of computing resources, which can be used for subsequent billing for the user's use of cloud computing power services.
  • FIG. 12 is a schematic diagram of another possible deployment mode.
  • the deployment method shown in FIG. 12 differs from the deployment method shown in FIG. 10 in that the accelerator is deployed in the end-side device in FIG. 12, that is, the end-side device includes hardware resources for implementing acceleration.
  • whether the tasks processed by the artificial intelligence model are executed by the cloud-side device, by the end-side device, or by the cloud-side device and the end-side device in cooperation can be determined according to the actual situation.
  • the end-side device may determine, according to the configuration information, that part of the tasks processed by the artificial intelligence are executed by the cloud-side device, and the other part of the tasks are executed by the end-side device.
  • the end-side device can divide the tasks processed by artificial intelligence into two parts according to the computing power of the local accelerator and the computing power of the registered second computing resource, transferring one part of the task to the cloud-side device for execution while the other part is executed by the end-side device.
  • the function of splitting tasks can be realized by the end-side runtime module, which splits the tasks processed by artificial intelligence, for example by splitting the calculation instructions and calculation data generated when the artificial intelligence model is run to perform inference on the data to be analyzed.
  • the run-model instruction received by the end-side runtime module carries a splitting instruction, and the splitting instruction is used to indicate a splitting rule. After the end-side runtime module receives the splitting instruction, it splits, according to the splitting instruction, the calculation instructions and calculation data generated by running the artificial intelligence model to perform inference on the data to be analyzed.
  • split rules corresponding to different artificial intelligence models can be configured in the end-side runtime module.
  • the running model instruction carries the name of the artificial intelligence model
  • the splitting module can split the calculation instruction and the calculation data according to the split rule corresponding to the name of the artificial intelligence model.
  • the end-side runtime module can also be configured with a general splitting rule; for artificial intelligence models that do not match any configured correspondence, the general splitting rule can be used for the splitting processing.
  • the end-side runtime module splits the artificial intelligence model according to the computing power of the local accelerator and the computing power of the requested computing resource.
  • the end-side runtime module can perform the splitting before the resource application, performing the splitting processing according to the computing power of the local accelerator and the computing power of the registered second computing resource.
  • the splitting module can also perform the splitting after the resource application, performing the splitting processing based on the computing power of the local accelerator and the computing power of the applied-for second computing resource.
  • the two parts of the task obtained by the splitting can be executed in serial mode, in parallel mode, or in a mixed serial-parallel mode.
  • Figure 13 shows a schematic diagram of another device.
  • the apparatus 1300 may be an end-side device, and may be a chip, a chip system, or a processor that supports the end-side device to implement the foregoing method.
  • the device can be used to implement the method executed by the end-side device in the foregoing method embodiment.
  • the device has the function of implementing the end-side device described in the embodiment of this application.
  • the device includes modules or units or means corresponding to the steps, performed by the end-side device, described in the embodiments of the present application (such as FIG. 5, FIG. 7, FIG. 10).
  • the functions or units or means can be realized by software, or by hardware, or by hardware executing corresponding software, or by a combination of software and hardware.
  • the apparatus 1300 may include one or more processors 1301, and the processor 1301 may also be referred to as a processing unit, which may implement certain control functions.
  • the processor 1301 may be a general-purpose processor or a special-purpose processor. For example, it can be a central processing unit.
  • the central processing unit can be used to control communication devices (such as base stations, baseband chips, terminals, terminal chips, DU or CU, etc.), execute software programs, and process data in the software programs.
  • the processor 1301 may also store instructions and/or data 1303, and the instructions and/or data 1303 may be executed by the processor, so that the apparatus 1300 executes the method described in the above method embodiments.
  • the processor 1301 may include a transceiver unit for implementing receiving and sending functions.
  • the transceiver unit may be a transceiver circuit, or an interface, or an interface circuit.
  • the transceiver circuits, interfaces, or interface circuits used to implement the receiving and transmitting functions can be separate or integrated.
  • the foregoing transceiver circuit, interface, or interface circuit can be used for reading and writing code/data, or for transmitting or delivering signals.
  • the apparatus 1300 may include a circuit, and the circuit may implement the sending or receiving or communication functions in the foregoing method embodiments.
  • the apparatus 1300 may include one or more memories 1302, on which instructions 1304 may be stored, and the instructions may be executed on the processor, so that the apparatus 1300 executes the method described in the above method embodiments.
  • data may also be stored in the memory.
  • instructions and/or data may also be stored in the processor.
  • the processor and the memory can be provided separately or integrated together. For example, the corresponding relationship described in the foregoing method embodiment may be stored in a memory or in a processor.
  • the apparatus 1300 may further include a transceiver 1305 and/or an antenna 1306.
  • the processor 1301 may be referred to as a processing unit, and controls the device 1300.
  • the transceiver 1305 may be called a transceiver unit, a transceiver, a transceiver circuit, a transceiver device or a transceiver module, etc., for implementing the transceiver function.
  • the apparatus 1300 in the embodiment of the present application may be used to execute the method described in the foregoing embodiment of the present application.
  • the processor and transceiver described in this application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed-signal IC, an application-specific integrated circuit (ASIC), a printed circuit board (PCB), electronic equipment, and so on.
  • the processor and transceiver can also be manufactured with various IC process technologies, such as complementary metal-oxide-semiconductor (CMOS), N-type metal-oxide-semiconductor (NMOS), P-type metal-oxide-semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), and gallium arsenide (GaAs).
  • the apparatus described in the above embodiments may be an end-side device, but the scope of the apparatus described in this application is not limited thereto, and the structure of the apparatus is not limited by Figure 13.
  • the apparatus may be a stand-alone device or may be part of a larger device.
  • for example, the apparatus may be:
    (1) a stand-alone integrated circuit (IC), a chip, or a chip system or subsystem;
    (2) a set of one or more ICs, where, optionally, the IC set may also include storage components for storing data and/or instructions;
    (3) an ASIC, such as a modem (MSM);
    (4) a module that can be embedded in other devices;
    (5) a receiver, a terminal, an intelligent terminal, a cellular phone, a wireless device, a handset, a mobile unit, a vehicle-mounted device, a network device, a cloud device, an artificial intelligence device, a machine device, a home device, a medical device, an industrial device, or the like;
    (6) others.
  • in another possible scenario, the apparatus 1300 may be applied to a cloud-side device.
  • the apparatus 1300 may be used to implement the method executed by the cloud-side device in the foregoing method embodiments.
  • the apparatus has the functions of the cloud-side device described in the embodiments of the present application.
  • the apparatus includes modules or units or means corresponding to the steps performed by the cloud-side device described in the embodiments of the present application.
  • the functions or units or means can be implemented by software, by hardware, by hardware executing corresponding software, or by a combination of software and hardware.
  • the apparatus 1300 may include one or more processors 1301, and the processor 1301 may also be referred to as a processing unit, which may implement certain control functions.
  • the processor 1301 may be a general-purpose processor or a special-purpose processor.
  • the processor 1301 may also store instructions and/or data 1303, and the instructions and/or data 1303 may be executed by the processor, so that the apparatus 1300 executes the method described in the foregoing method embodiments.
  • the processor 1301 may include a transceiver unit for implementing receiving and sending functions.
  • the transceiver unit may be a transceiver circuit, or an interface, or an interface circuit.
  • the transceiver circuits, interfaces, or interface circuits used to implement the receiving and transmitting functions can be separate or integrated.
  • the foregoing transceiver circuit, interface, or interface circuit can be used for reading and writing code/data, or for transmitting or transferring signals.
  • the apparatus 1300 may include a circuit, and the circuit may implement the sending or receiving or communication functions in the foregoing method embodiments.
  • the apparatus 1300 may include one or more memories 1302, on which instructions 1304 may be stored, and the instructions may be executed on the processor, so that the apparatus 1300 executes the method described in the foregoing method embodiments.
  • data may also be stored in the memory.
  • instructions and/or data may also be stored in the processor.
  • the processor and the memory can be provided separately or integrated together. For example, the corresponding relationship described in the foregoing method embodiment may be stored in a memory or in a processor.
  • the apparatus 1300 may further include a transceiver 1305.
  • the processor 1301 may be referred to as a processing unit, and controls the device 1300.
  • the transceiver 1305 may be called a transceiver unit, a transceiver, a transceiver circuit, a transceiver device or a transceiver module, etc., for implementing the transceiver function.
  • the apparatus 1300 in the embodiment of the present application may be used to execute the method executed by the cloud-side device described in the foregoing embodiment of the present application.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments may be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processing unit used to execute these techniques at a communication device can be implemented in one or more general-purpose processors, DSPs, digital signal processing devices, ASICs, programmable logic devices, FPGAs or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, or any combination of the foregoing.
  • the general-purpose processor may be a microprocessor.
  • the general-purpose processor may also be any traditional processor, controller, microcontroller, or state machine.
  • the processor can also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.
  • the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the present application also provides a computer-readable medium on which a computer program is stored, and when the computer program is executed by a computer, the function of any of the foregoing method embodiments is realized.
  • This application also provides a computer program product, which, when executed by a computer, realizes the functions of any of the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state disk (SSD)).
  • the terms "system" and "network" are often used interchangeably herein.
  • the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A may be singular or plural and B may be singular or plural.
  • the character "/" generally indicates an "or" relationship between the associated objects.
  • "at least one of" herein means all of, or any combination of, the listed items; for example, "at least one of A, B, and C" may indicate the following six cases: only A exists, only B exists, only C exists, both A and B exist, both B and C exist, and A, B, and C all exist, where A may be singular or plural, B may be singular or plural, and C may be singular or plural.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • however, determining B based on A does not mean that B is determined based on A only; B may also be determined based on A and/or other information.
  • pre-definition in this application can be understood as definition, pre-definition, storage, pre-storage, pre-negotiation, pre-configuration, hardcoding, or pre-burning (for example, into firmware).
  • the systems, devices, and methods described in this application can also be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division into units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application essentially, or the part contributing to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

This application discloses a communication method and apparatus, to solve the possible problem that the computing capability of a dedicated accelerator provided inside an end-side device cannot meet the requirements of software and applications. An end-side device applies to a cloud-side device for computing resources on demand. On the one hand, the end-side device may send an artificial intelligence model and data to be analyzed to the cloud-side device; the cloud-side device loads and runs the artificial intelligence model with the computing resources, performs inference on the data to be analyzed, and feeds the inference result back to the end-side device. On the other hand, the end-side device may itself load the artificial intelligence model, run the model on the data to be analyzed to generate computing instructions and computing data, and send the computing instructions and computing data to the cloud-side device; the cloud-side device then executes the computing instructions on the computing data to obtain a computing result, which is fed back to the end-side device.

Description

一种通信方法及装置
相关申请的交叉引用
本申请要求在2020年03月31日提交中国专利局、申请号为202010242173.X、申请名称为“一种通信方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及通信技术领域,尤其涉及一种通信方法及装置。
背景技术
随着人工智能、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)的兴起,产生了大量与之相关的软件和应用。这些软件和应用对计算能力要求较高,通用中央处理单元(central processing unit,CPU)已经无法满足要求。为了满足这些软件和应用对计算能力的要求,通常使用专用加速器来执行这些软件和应用的计算。专用加速器比如可以是神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、现场可编程门阵列(field programmable gate array,FPGA)等。
目前,端侧设备内部设置专用加速器,来实现软件和应用的计算。但由于成本、功耗、空间等的限制,某些端侧设备内部设置的专用加速器的计算能力可能还是不能满足所有软件和应用的要求。
发明内容
本申请实施例提供一种通信方法及装置,用以解决可能存在的端侧设备内部设置的专用加速器的计算能力不能满足软件和应用的要求的问题。
第一方面,本申请实施例提供一种通信方法,该方法应用于端侧设备,比如可以由端侧设备的芯片或者芯片系统执行。该通信方法包括:端侧设备向云侧设备发送资源申请请求,并向云侧设备提供用于实现人工智能处理所需的第一人工智能模型;该资源申请请求用于请求实现人工智能功能所需的计算资源;然后,端侧设备接收云侧设备发送的加载完成消息,加载完成消息用于指示云侧设备基于资源申请请求为端侧设备分配的计算资源已成功加载第一人工智能模型;进一步地,端侧设备向云侧设备提供第一待分析数据,从而云侧设备可以基于第一人工智能模型对第一待分析数据进行推理得到第一推理结果,发送给端侧设备,进而端侧设备接收云侧设备发送的第一待分析数据的第一推理结果;其中,第一推理结果是基于第一待分析数据运行第一人工智能模型得到的。
通过上述方案,端侧设备根据自身的需求向云侧设备申请计算资源,然后由云侧设备结合计算资源协助端侧设备执行模型加载以及使用加载的模型执行待分析数据的计算,即使端侧设备内部设置的专用计算器的计算能力无法满足软件和应用的要求,能够借助云侧设备来实现数据推理。另外,应用的开发无需结合云侧设备的部署,应用侧仅需将需要卸载到云侧的模型以及数据发送给云侧设备即可,简化应用的开发人员的开发难度, 降低开发人员的工作量。
在一种可能的设计中,上述方法还可以包括:
端侧设备向云侧设备发送算力服务注册请求,算力服务注册请求用于请求云侧设备为端侧设备提供算力服务;
端侧设备接收云侧设备发送的算力服务注册响应,算力服务注册响应用于指示端侧设备已成功请求到云侧设备的算力服务。
通过上述设备,端侧设备可以提前向云侧注册申请算力服务,可以按照需求申请算力服务,后续针对用户来说,可以无感知的使用云侧提供的算力服务。
在一种可能的设计中,端侧设备向云侧设备提供用于实现人工智能处理所需的第一人工智能模型,包括:
端侧设备向云侧设备发送第一人工智能模型;或者,
端侧设备将第一人工智能模型的下载地址发送给云侧设备。
在一种可能的设计中,端侧设备向云侧设备提供第一待分析数据,包括:
端侧设备向云侧设备发送第一待分析数据;或者,
端侧设备将第一人工智能模型的下载地址发送给云侧设备。
在一种可能的设计中,端侧设备接收云侧设备发送的第一待分析数据的第一推理结果后,还包括:
端侧设备将第二待分析数据提供给云侧设备,并接收云侧设备发送的第二待分析数据的第二推理结果;其中,第二推理结果是基于第二待分析数据运行第一人工智能模型得到的。
上述设计中,端侧设备获取第一待分析数据的推理结果后,如果还需要针对第二待分析数据进行推理,从而仅需要将第二待分析数据发送给云侧设备即可,无需再次申请资源,实现较简单。
在一种可能的设计中,端侧设备接收云侧设备发送的第一待分析数据的第一推理结果后,还包括:
端侧设备向云侧设备提供第二人工智能模型,并向云侧设备提供第三待分析数据;端侧设备接收云侧设备发送的第三待分析数据的第三推理结果;其中,第三推理结果是计算资源基于第二待分析数据运行第一人工智能模型得到的。
上述设计中,端侧设备获取第一待分析数据的推理结果后,如果还需要针对第二待分析数据采用其它的人工智能模型进行推理,从而仅需要将第二待分析数据以及需要的人工智能模型发送给云侧设备即可,无需再次申请资源,实现较简单。
在一种可能的设计中,还包括:端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,资源释放请求用于请求释放计算资源;端侧设备接收云侧设备发送的资源释放响应,资源释放响应用于指示成功释放计算资源以及计算资源运行的人工智能模型。上述设计中,端侧设备在确定完成计算资源的使用后,及时地通知云侧释放掉资源,避免资源浪费。
在一种可能的设计中,端侧设备向云侧设备发送资源申请请求之前,还包括:端侧设备确定人工智能处理的部分任务或者全部任务由云侧设备处理。
上述设计中,端侧设备可以结合自身情况,确定是否需要云侧设备协助执行人工智能处理任务。
在一种可能的设计中,还包括:
端侧设备确定人工智能处理的部分任务由云侧设备处理时,端侧设备将待使用的人工智能模型拆解为第一人工智能模型以及第三人工智能模型;
端侧设备向云侧设备提供第一待分析数据之前,端侧设备加载第三人工智能模型,并在端侧设备接收云侧设备发送的加载完成消息时,将待分析数据拆分为第一待分析数据以及第四待分析数据;
端侧设备将待分析数据拆分为第一待分析数据以及第四待分析数据后,运行加载的第三人工智能模型对第四待分析数据进行推理得到第四推理结果;
端侧设备接收到第一推理结果后,对第一推理结果和第四推理结果进行融合处理。
上述设计中,端侧设备可以结合自身情况,将一部分业务卸载到云侧设备,可以减轻端侧设备的加速器的负担。
第二方面,本申请实施例提供另一种通信方法,该通信方法应用于云侧设备。比如可以由云侧设备上的芯片或者芯片系统来实现。方法可以包括;云侧设备接收来自端侧设备的资源申请请求,并获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,资源申请请求用于请求实现人工智能功能所需的计算资源;云侧设备根据资源申请请求为端侧设备分配计算资源;云侧设备通过计算资源成功加载第一人工智能模型后,向端侧设备发送加载完成消息,加载完成消息用于指示云侧设备上的计算资源已成功加载第一人工智能模型;云侧设备获取端侧设备提供的第一待分析数据,通过运行第一人工智能模型对第一待分析数据进行推理得到第一推理结果;并向端侧设备发送第一推理结果。
通过上述方案,端侧设备根据自身的需求向云侧设备申请计算资源,然后由云侧设备结合计算资源协助端侧设备执行模型加载以及使用加载的模型执行待分析数据的计算,即使端侧设备内部设置的专用计算器的计算能力无法满足软件和应用的要求,能够借助云侧设备来实现数据推理。另外,应用的开发无需结合云侧设备的部署,应用侧仅需将需要卸载到云侧的模型以及数据发送给云侧设备即可,简化应用的开发人员的开发难度,降低开发人员的工作量。
在一种可能的设计中,云侧设备接收来自端侧设备的资源申请请求之前,还包括:云侧设备接收端侧设备发送的算力服务注册请求,算力服务注册请求用于请求云侧设备为端侧设备的用户提供算力服务;云侧设备向端侧设备发送算力服务注册响应,算力服务注册响应用于指示端侧设备的用户已成功请求到云侧设备的算力服务。
通过上述设备,端侧设备可以提前向云侧注册申请算力服务,可以按照需求申请算力服务,后续针对用户来说,可以无感知的使用云侧提供的算力服务。
在一种可能的设计中,算力服务注册请求中携带计算资源信息,计算资源信息用于表征端侧设备所申请的算力规格;算力服务注册响应携带云侧设备为端侧设备分配的资源ID,资源ID用于标识计算资源信息;资源申请请求中携带资源ID,云侧设备根据资源申请请求为端侧设备分配计算资源,包括:云端设备根据资源ID对应的计算资源信息为端侧设备分配计算资源。
在一种可能的设计中,资源申请请求中携带计算资源信息,计算资源信息用于表征端侧设备所申请的算力规格;云侧设备根据资源申请请求为端侧设备分配计算资源,包 括:云端设备根据计算资源信息为端侧设备分配计算资源。
在一种可能的设计中,云侧设备获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,包括:云侧设备接收端侧设备发送第一人工智能模型;或者,云侧设备接收端侧设备发送的第一人工智能模型的下载地址,并根据第一人工智能模型的下载地址下载第一人工智能模型。
在一种可能的设计中,云侧设备获取端侧设备提供的第一待分析数据,包括:云侧设备接收端侧设备发送第一待分析数据;或者,云侧设备接收端侧设备发送的第一待分析数据的下载地址,并根据第一待分析数据的下载地址下载待分析数据。
在一种可能的设计中,云侧设备向端侧设备发送第一推理结果后,还包括:云侧设备获取端侧设备提供的第二待分析数据,通过运行第一人工智能模型对第二待分析数据进行推理得到第二推理结果;并向端侧设备发送第二推理结果。
在一种可能的设计中,云侧设备向端侧设备发送第一推理结果后,还包括:云侧设备获取端侧设备提供的第二人工智能模型,获取端侧设备提供的第三待分析数据;云侧设备通过运行第二人工智能模型对第三待分析数据进行推理得到第三推理结果;并向端侧设备发送第三推理结果。
在一种可能的设计中,还包括:云侧设备接收端侧设备发送的资源释放请求,资源释放请求用于请求释放计算资源;云侧设备释放计算资源,并释放计算资源运行的人工智能模型;云侧设备向端侧设备发送资源释放响应,资源释放响应用于指示成功释放计算资源以及计算资源运行的人工智能模型。
第二方面的具体有益效果,可以参见第一方面的描述,此处不再赘述。
第三方面,本申请实施例提供一种通信方法,该通信方法应用于端侧设备,比如由端侧设备的芯片或者芯片系统执行。该通信方法包括:端侧设备向云侧设备发送资源申请请求,资源申请请求用于请求实现人工智能功能所需的计算资源;端侧设备接收云侧设备发送的资源申请响应,资源申请响应用于指示云侧设备成功为端侧设备分配计算资源;端侧设备在运行用于实现人工智能功能的第一人工智能模型对第一待分析数据执行推理时产生第一计算指令以及第一计算数据,并将第一计算指令以及第一计算数据发送给云侧设备;端侧设备接收云侧设备发送的第一计算结果;其中,第一计算结果是计算资源执行第一计算指令对第一计算数据进行计算得到的计算结果。
通过上述方案,端侧设备根据自身的需求向云侧设备申请计算资源,端侧设备自身加载人工智能模型,然后在针对待分析数据运行人工智能模型产生计算指令以及计算数据,进而由云侧设备结合计算资源协助端侧设备执行计算指令执行计算数据的计算得到计算结果,即使端侧设备内部设置的专用计算器的计算能力无法满足软件和应用的要求,能够借助云侧设备来实现数据推理。另外,应用的开发无需结合云侧设备的部署,应用侧仅需将需要卸载到云侧的模型以及数据发送给云侧设备即可,简化应用的开发人员的开发难度,降低开发人员的工作量。
在一种可能的设计中,还包括:端侧设备向云侧设备发送算力服务注册请求,算力服务注册请求用于请求云侧设备为端侧设备提供算力服务;端侧设备接收云侧设备发送的算力服务注册响应,算力服务注册响应用于指示端侧设备已成功请求到云侧设备的算力服务。
在一种可能的设计中,端侧设备接收云侧设备发送的第一计算结果后,还包括:端 侧设备运行第一人工智能模型对第二待分析数据执行推理,获得第二计算指令以及第二计算数据;端侧设备接收云侧设备发送的第二计算结果;其中,第二计算结果是计算资源执行第二计算指令对第二计算数据进行计算得到的计算结果。
在一种可能的设计中,该方法还包括:端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,资源释放请求用于请求释放计算资源;端侧设备接收云侧设备发送的资源释放响应,资源释放响应用于指示成功释放计算资源。
在一种可能的设计中,端侧设备向云侧设备发送资源申请请求之前,该方法还包括:端侧设备确定人工智能处理的部分任务或者全部任务由云侧设备处理。
在一种可能的设计中,该方法还包括:端侧设备确定人工智能处理的部分任务由云侧设备处理时,端侧设备在运行第一人工智能模型对第一待分析数据执行推理时,还产生第三计算指令以及第三计算数据;端侧设备执行第三计算执令对第三计算数据进行计算得到第三计算结果;端侧设备接收到接收云侧设备发送的第一计算结果后,端侧设备将第一计算结果和第三计算结果进行融合处理得到第一人工智能模型对第一待分析数据执行推理的推理结果。
第四方面,本申请实施例提供一种通信方法,包括:云侧设备接收来自端侧设备的资源申请请求,并获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,资源申请请求用于请求实现人工智能功能所需的计算资源;云侧设备根据资源申请请求为端侧设备分配计算资源;云侧设备向端侧设备发送资源申请响应,资源申请响应用于指示云侧设备成功为端侧设备分配计算资源;云侧设备接收端侧设备发送的第一计算指令以及第一计算数据;云侧设备通过计算资源执行第一计算指令对第一计算数据进行计算得到计算结果;云侧设备将计算结果发送给端侧设备。
通过上述方案,端侧设备根据自身的需求向云侧设备申请计算资源,端侧设备自身加载人工智能模型,然后在针对待分析数据运行人工智能模型产生计算指令以及计算数据,进而由云侧设备结合计算资源协助端侧设备执行计算指令执行计算数据的计算得到计算结果,即使端侧设备内部设置的专用计算器的计算能力无法满足软件和应用的要求,能够借助云侧设备来实现数据推理。另外,应用的开发无需结合云侧设备的部署,应用侧仅需将需要卸载到云侧的模型以及数据发送给云侧设备即可,简化应用的开发人员的开发难度,降低开发人员的工作量。
在一种可能的设计中,云侧设备接收来自端侧设备的资源申请请求之前,还包括:云侧设备接收端侧设备发送的算力服务注册请求,算力服务注册请求用于请求云侧设备为端侧设备的用户提供算力服务;云侧设备向端侧设备发送算力服务注册响应,算力服务注册响应用于指示端侧设备的用户已成功请求到云侧设备的算力服务。
在一种可能的设计中,算力服务注册请求中携带计算资源信息,计算资源信息用于表征端侧设备所申请的算力规格;算力服务注册响应携带云侧设备为端侧设备分配的资源ID,资源ID用于标识计算资源信息;资源申请请求中携带资源ID,云侧设备根据资源申请请求为端侧设备分配计算资源,包括:
云端设备根据资源ID对应的计算资源信息为端侧设备分配计算资源。
在一种可能的设计中,资源申请请求中携带计算资源信息,计算资源信息用于表征端侧设备所申请的算力规格;云侧设备根据资源申请请求为端侧设备分配计算资源,包 括:
云端设备根据计算资源信息为端侧设备分配计算资源。
在一种可能的设计中,云侧设备向端侧设备发送第一计算结果后,还包括:
云侧设备接收端侧设备发送的第二计算指令以及第二计算数据,通过计算资源运行第二计算指令对第二计算数据进行推理得到第二计算结果;并向端侧设备发送第二计算结果。
在一种可能的设计中,还包括:
云侧设备接收端侧设备发送的资源释放请求,资源释放请求用于请求释放计算资源;
云侧设备释放计算资源,并向端侧设备发送资源释放响应,资源释放响应用于指示成功释放计算资源。
第四方面的具体有益效果,可以参见第一方面的描述,此处不再赘述。
第五方面,本申请提供一种通信装置,用于端侧设备或端侧设备的芯片,包括用于执行前述第一方面或第一方面的任意可能的实现方式中的方法的单元或手段(means),或者包括用于执行前述第三方面或第三方面的任意可能的实现方式中的方法的单元或手段。
第六方面,本申请提供一种通信装置,用于云侧设备或云侧设备的芯片,包括用于执行前述第二方面或第二方面的任意可能的实现方式中的方法的单元或手段,或者包括用于执行前述第四方面或第四方面的任意可能的实现方式中的方法的单元或手段。
第七方面,本申请提供一种通信装置,用于端侧设备或端侧设备的芯片,包括至少一个处理元件和至少一个存储元件,其中所述至少一个存储元件用于存储程序和数据,所述至少一个处理元件用于执行前述第一方面或第一方面的任意可能的实现方式中的方法,或者用于执行前述第三方面或第三方面的任意可能的实现方式中的方法。
第八方面,本申请提供一种通信装置,用于云侧设备或云侧设备的芯片,包括至少一个处理元件和至少一个存储元件,其中所述至少一个存储元件用于存储程序和数据,所述至少一个处理元件用于执行前述第二方面或第二方面的任意可能的实现方式中的方法,或者用于执行前述第四方面或第四方面的任意可能的实现方式中的方法。
第九方面,本申请提供一种通信装置,包括处理器和接口电路,所述接口电路用于接收来自所述通信装置之外的其它通信装置的信号并传输至所述处理器或将来自所述处理器的信号发送给所述通信装置之外的其它通信装置,所述处理器通过逻辑电路或执行代码指令用于实现前述第一方面或第一方面的任意可能的实现方式中的方法,或者用于实现前述第三方面或第三方面的任意可能的实现方式中的方法。
第十方面,本申请提供一种通信装置,包括处理器和接口电路,所述接口电路用于接收来自所述通信装置之外的其它通信装置的信号并传输至所述处理器或将来自所述处理器的信号发送给所述通信装置之外的其它通信装置,所述处理器通过逻辑电路或执行代码指令用于实现前述第二方面或第二方面的任意可能的实现方式中的方法,或者用于实现前述第四方面或第四方面的任意可能的实现方式中的方法。
第十一方面,本申请提供一种计算机程序产品,该计算机程序产品包括计算机指令,当该计算机指令被执行时,使得前述第一方面或第一方面的任意可能的实现方式中的方法被执行,或使得前述第二方面或第二方面的任意可能的实现方式中的方法被执行,或使得前述第三方面或第三方面的任意可能的实现方式中的方法被执行,或使得前述第四 方面或第四方面的任意可能的实现方式中的方法被执行。
第十二方面,本申请提供了一种计算机可读存储介质,该存储介质存储有计算机指令,当所述计算机指令被执行时,使得前述第一方面或第一方面的任意可能的实现方式中的方法被执行,或使得前述第二方面或第二方面的任意可能的实现方式中的方法被执行,或使得前述第三方面或第三方面的任意可能的实现方式中的方法被执行,或使得前述第四方面或第四方面的任意可能的实现方式中的方法被执行。
附图说明
图1为本申请实施例中一种可能的人工智能处理部署时实现方式示意图;
图2为本申请实施例中另一种可能的人工智能处理部署时实现方式示意图;
图3为本申请实施例中通信系统架构示意图;
图4为本申请实施例中第一种可能的通信方法流程示意图;
图5为本申请实施例中第一种可能的通信系统部署架构示意图;
图6为本申请实施例中第二种可能的通信方法流程示意图;
图7为本申请实施例中第二种可能的通信系统部署架构示意图;
图8为本申请实施例中第三种可能的通信方法流程示意图;
图9为本申请实施例中第四种可能的通信方法流程示意图;
图10为本申请实施例中第三种可能的通信系统部署架构示意图;
图11为本申请实施例中第五种可能的通信方法流程示意图;
图12为本申请实施例中第四种可能的通信系统部署架构示意图;
图13为本申请实施例中通信装置1300结构示意图。
具体实施方式
下面先对本申请涉及的技术术语进行解释说明。
1)运行时(Runtime)模块一般可以包括应用程序接口(application programming interface,API)、运行时(Runtime)环境,硬件抽象层(hardware abstraction layer,HAL)。三个模块组成。
API,主要负责提供统一的模型管理、执行接口,以用来实现模型网络定义、编译、执行等步骤。
运行时环境,作为API的执行引擎,用于完成构建人工智能模型、装填人工智能模型的数据、加载输入的数据、推理运算等。此外,还可以对代码进行优化、生成为加速器专用的机器代码等。
HAL,提供统一的接口,屏蔽不同设备厂商的实现差异。开发者只需开发一套代码,便能运行于带有各种加速器芯片的设备上。
2)端侧设备,端侧设备具备人工智能处理的能力。人工智能,比如可以是计算机视觉,包括人脸检测、美颜瘦身变脸、Deepfakes等。再比如智能安防,包括人脸识别、车辆检测、鉴黄鉴暴等。再比如AR/VR,包括AR/VR游戏、VR建模等。端侧设备可以是手机(mobile phone)、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、可穿戴设备、相机、车载设备、虚拟现实(virtual reality,VR) 设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程手术(remote medical surgery)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端,或智慧家庭(smart home)中的无线终端等。
3)云侧设备,可以是部署有加速器的物理服务器或者服务器集群。云侧设备也可以称为计算节点或者云侧计算集群等。
4)人工智能模型,用于实现人工智能的推理模型,比如神经网络模型、深度学习模型、计算机视觉模型等。
目前在实现人工智能处理时,一种可能的方式是,在端侧设备上设置加速器。加速器区别于主处理器。加速器也可以称为协处理器。主处理器一般由中央处理单元(central processing unit,CPU)来实现。加速器比如可以是图形处理器(graphics processing unit,GPU)、现场可编程门阵列(field programmable gate array,FPGA)、神经网络处理器(neural-network processing unit,NPU)或者专用集成电路(application specific integrated circuit,ASIC)等。参见图1所示,端侧设备执行人工智能处理时,端侧设备侧运行推理模型得到推理结果,实现人工智能处理。端侧设备侧应用将采集的数据进行预处理后,可以调用运行时(runtime)模块接口,加载对应的推理模型和预处理好的数据。模型根据预处理好的数据执行推理过程,实现人工智能处理。端侧设备中也可以部署加速器。端侧设备的runtime可以通过调用加速器来执行模型运行时所需的计算,从而加速推理过程。当然如果端侧设备未部署加速器,可以通过CPU来执行模型运行时所需的计算。
由于成本、功耗、空间等的限制,应用运行所需要的硬件加速能力不是所有的端侧设备都能满足。比如一些端侧设备未部署加速器,可能会导致应用在端侧设备无法运行,或者会出现应用运行慢、耗电量大的问题。或者为了提供更好的用户体验,有些应用升级,但是应用升级可能需要更高的硬件加速能力,可能有些端侧设备不具备应用升级所需的硬件加速能力。
另一种可能的方式是,端侧设备执行人工智能处理时,可以基于云服务的方式为端侧设备提供人工智能处理的能力。参见图2所示,在云服务模式下,应用开发者根据自己业务特定,将算力开销大的推理部分包装成一个运行在云侧设备上特定的人工智能服务,如人脸识别服务、语音识别服务等。
端侧设备的应用将采集到的数据进行预处理后,通过网络调用云侧设备特定人工智能服务的API接口,将待分析数据的发送到特定的云服务(比如人脸识别服务)。人脸识别服务根据待分析的数据执行推理得到推理结果,并发送给端侧设备。
上述方式中,应用的开发需要结合云侧设备的部署,来确定将那些处理卸载到云侧设备执行。因此需要开发人员具备应用开发的能力以外,还需要具备云侧设备的服务开发能力。技术难度较高,工作量较高。并且云侧设备的服务部署、日常维护升级、扩容的要求较高,维护管理的难度大。
基于此,本申请实施例提供一种通信方法、装置及系统。参见图3所示为本申请实施例提供的系统架构示意图。系统中包括端侧设备以及云侧设备。下面结合图3对本申请实施例提供的通信方法流程进行说明。
本申请实施例中,人工智能模型的加载可以在端侧执行也可以在云侧执行。下面先 对人工智能模型的加载由端侧执行的方案进行详细说明。参见图4所示,方法可以包括:
S401,端侧设备向云侧设备发送资源申请请求,从而云侧设备接收来自端侧设备的资源申请请求。资源申请请求用于请求实现人工智能功能所需的计算资源。
S402,云侧设备根据资源申请请求为端侧设备分配计算资源。
示例性地,云侧设备可以针对自身所部署的加速器进行虚拟化,比如可以采用虚拟化核的方式。云侧设备根据计算资源信息为端侧设备分配计算资源时,可以根据算力规格为端侧设备分配一个或者多个核。不同的核对应不同的算力实例。
示例性地,云侧设备可以根据计算资源信息确定算力规格。计算资源信息用于表征端侧设备的算力需求,即端侧设备的用户所请求的算力。计算资源信息比如可以是算力规格,算力规格可以是算力通用单位,比如每秒一万亿次的浮点运算(TFLOPS),或者采用每秒一万亿次的整型运算。计算资源信息还可以包括指定的硬件规格,比如某种硬件的型号。不同的硬件的型号对应不同的算力规格。
云侧设备根据计算资源信息确定算力规格时,一种方式中,计算资源信息包括算力规格,则云侧设备获得计算资源信息,即确定算力规格。另一种方式中,计算资源信息包括硬件规格,根据硬件规格确定算力规格。
云侧设备具体可以通过如下任一种示例来获得计算资源信息:
一种示例中,端侧设备可以在资源申请请求中携带该计算资源信息,从而云侧设备根据资源申请请求中的计算资源信息来为端侧设备分配计算资源。
另一种示例中,端侧设备向云侧设备注册算力服务的时候,可以将计算资源信息发送给云侧设备,从而云侧设备将计算资源信息对应的资源ID发送给端侧设备。从而端侧设备可以将资源ID携带在资源申请请求中发送给云侧设备,从而云侧设备可以根据资源ID对应的计算资源信息为端侧设备分配计算资源。具体的端侧设备向云侧设备注册算力的流程后续会详细说明,此处不进行赘述。
S403,端侧设备向云侧设备提供用于实现人工智能处理所需的第一人工智能模型。
示例性地,端侧设备向云侧设备提供用于实现人工智能处理所需的第一人工智能模型时,可以通过如下方式来实现:
一种可能的方式中,端侧设备可以直接将第一人工智能模型发送给云侧设备。作为一种示例,端侧设备可以将第一人工智能模型携带在资源申请请求中发送给云侧设备,也可以单独发送给云侧设备。
另一种可能的方式中,端侧设备可以将第一人工智能模型的下载地址发送给云侧设备,从而云侧设备根据第一人工智能模型的下载地址获取第一人工智能模型。
比如,第一人工智能模型的下载地址可以是统一资源定位符(uniform resource locator,URL)。
端侧设备可以将第一人工智能模型上传到网络的服务器上,将服务器的URL发送给云侧设备。从而云侧设备根据URL下载第一人工智能模型。端侧设备上传第一人工智能模型到网络的服务器或者云侧设备从网络的服务下载第一人工智能模型,均可以采用秒传技术,比如信息摘要算法5(message-digest algorithm,MD5)。
S404,云侧设备通过计算资源成功加载第一人工智能模型。
S405,云侧设备通过计算资源成功加载第一人工智能模型后,向端侧设备发送加载完成消息,从而端侧设备接收云侧设备发送的加载完成消息。加载完成消息用于指示云 侧设备上的计算资源已成功加载第一人工智能模型。
S406,端侧设备向云侧设备提供第一待分析数据,从而云侧设备获取端侧设备提供的第一待分析数据。
示例性地,端侧设备向云侧设备提供第一待分析数据时,可以通过如下方式来实现:
一种可能的方式中,端侧设备可以直接将第一待分析数据发送给云侧设备。
另一种可能的方式中,端侧设备可以将第一待分析数据的下载地址发送给云侧设备,从而云侧设备根据第一待分析数据的下载地址获取第一待分析数据。
比如,第一待分析数据的下载地址可以是URL。端侧设备可以将第一待分析数据上传到网络的服务器上,并将服务器的URL发送给云侧设备。从而云侧设备根据第一待分析数据的URL下载第一待分析数据。端侧设备上传第一待分析数据到网络的服务器或者云侧设备从网络的服务下载第一待分析数据,均可以采用秒传技术,比如信息摘要算法5(message-digest algorithm,MD5)。
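The "instant transfer" (秒传) technique mentioned in the two paragraphs above can be sketched as follows: the sender computes an MD5 digest first, and the server skips the byte transfer when it already holds content with that digest. The has_digest and store calls are hypothetical stand-ins for the network server, not interfaces defined by this application.

```python
import hashlib

def md5_of(path: str) -> str:
    # Stream the file so a large model or dataset need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def upload(path: str, server) -> str:
    digest = md5_of(path)
    if not server.has_digest(digest):          # hypothetical lookup on the server
        with open(path, "rb") as f:
            server.store(digest, f.read())     # hypothetical store; only runs on a miss
    return digest                              # on a hit, no payload bytes are sent
```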
S407,云侧设备获取端侧设备提供的第一待分析数据后,通过所述计算资源运行加载的第一人工智能模型对第一待分析数据进行推理得到第一推理结果。云侧设备通过计算资源运行所述加载的第一人工智能模型,并将第一待分析数据输入到运行的第一人工智能模型得到第一推理结果。
S408,云侧设备将第一推理结果发送给端侧设备。
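As a rough, non-normative illustration of steps S401–S408, the sketch below encodes the exchange as JSON messages over a socket-like object. All field names (type, spec_tflops, payload) and the line-based framing are assumptions made for the sketch; this application does not prescribe a wire format.

```python
import json

def end_side_flow(sock, model_bytes: bytes, data: bytes) -> bytes:
    reader = sock.makefile("rb")

    def send(msg: dict) -> None:
        sock.sendall(json.dumps(msg).encode() + b"\n")

    def recv() -> dict:
        return json.loads(reader.readline())

    send({"type": "resource_request", "spec_tflops": 1.0})   # S401: apply for computing resources
    send({"type": "model", "payload": model_bytes.hex()})    # S403: provide the first AI model
    assert recv()["type"] == "load_complete"                 # S405: cloud side reports the model is loaded
    send({"type": "data", "payload": data.hex()})            # S406: provide the first data to be analyzed
    return bytes.fromhex(recv()["payload"])                  # S408: receive the first inference result
```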
在一种可能的实施方式中,端侧设备向云侧设备发送资源申请请求之前,端侧设备可以向云侧设备注册或开户,按需从云侧设备而获得云侧算力服务。
云侧设备可以为用户提供使用云算力服务的注册、开户、充值或者算力购买等操作界面,用户可以根据操作界面指引执行相应的操作,调用云侧设备上云算力服务管控面的API,获得云算力服务。一方面,端侧设备可以部署与云侧设备(比如与云算力服务管控面)进行连接的插件或者应用((Application,APP),从而端侧设备可以根据插件或者APP获得操作界面。另一方面,云侧设备可以提供Web化的操作页面,支持用户的注册、开户、充值或资源购买等。
端侧设备的用户首次注册云算力服务时,可以先进行用户注册,用户注册完成后,然后申请算力服务。比如端侧设备响应用于用户的操作,向云侧设备发送用户注册请求,用户注册请求中可以携带用户的用户信息。用户信息用于表明用户身份。比如,用户信息可以包括用户的用户名或者用户标识(Identity,ID)、注册密码等。用户信息中还可以包括其它用于表明用户身份的信息,比如指纹信息等。用户信息可以用于用户后续向云侧设备再次执行充值、资源申请或购买等操作。
云侧设备接收到用户注册请求后,云侧设备可以在用于表明已申请算力服务的用户列表中记录该用户ID。端侧设备可以在向云侧设备发送的资源申请请求中携带用户ID,从而端侧设备验证用户ID所标识的用户已申请过算力服务时,进而执行根据计算资源信息为端侧设备分配所述计算资源。
用户注册完成后,端侧设备可以执行算力服务注册。比如,端侧设备响应用于用户获得云算力服务的操作,向云侧设备发送算力服务注册请求。云侧设备接收到算力服务注册请求后,云侧设备可以向端侧设备发送算力服务注册响应,算力服务注册响应用于指示端侧设备已成功请求所述云侧设备的算力服务。
作为一种示例,算力服务注册请求中可以携带计算资源信息,云侧设备在接收到算力服务注册请求后,可以为端侧设备的用户分配资源ID,资源ID用于标识计算资源信息。云侧设备可以关联保存计算资源信息和资源ID,并将资源ID发送给端侧设备,进而端侧设备可以在资源申请请求中携带该资源ID。
在一种可能的场景中,端侧设备可以按需求继续使用云侧设备加载的第一人工智能模型对新数据(比如第二待分析数据)进行推理运算。比如,端侧设备获得第一待分析数据的推理结果,若端侧设备上存在第二待分析数据,端侧设备将第二待分析数据提供给云侧设备,从而云侧设备获得第二待分析数据后,通过加载的第一人工智能模型对第二待分析数据进行推理得到第二推理结果。即云侧设备将第二待分析数据输入到加载的第一人工智能模型得到第二推理结果。然后,云侧设备将第二推理结果发送给端侧设备。
另一种可能的场景中,端侧设备可以按照需求继续使用已申请的计算资源加载第二人工智能模型,对新数据(比如第三待分析数据)进行推理运算。端侧设备向云侧设备提供第二人工智能模型,从而云侧设备获得第二人工智能模型后,采用计算资源加载所述第二人工智能模型。端侧设备向云侧设备提供第二人工智能模型之前,可以先向云侧设备申请人工智能模型销毁,比如向云侧设备发送模型销毁请求,模型销毁请求中可以携带第一人工智能模型的标识(比如模型名称),云侧设备接收到模型销毁请求后,释放计算资源所加载的第一人工智能模型。云侧设备可以向端侧设备发送模型销毁响应,端侧设备接收到模型销毁响应时,确定云侧设备执行第一人工智能模型销毁完成,可以向云侧设备提供第二人工智能模型。
上述端侧设备向云侧设备提供第二人工智能模型的方式,与向云侧设备提供第一人工智能模型的方式类似,此处不再赘述。上述端侧设备向云侧设备提供第二待分析数据或第三待分析数据的方式,与向云侧设备提供第一待分析数据的方式类似,此处不再赘述。
在一种可能的实施方式中,端侧设备在使用完计算资源完成推理后,可以向云侧设备申请释放计算资源。比如,端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,云侧设备接收到端侧设备发送的资源释放请求后,释放计算资源加载的人工智能模型,并释放所述计算资源。云侧设备可以向端侧设备资源释放响应,资源释放响应用于指示成功释放所述计算资源。
作为一种示例,云侧设备可以根据计算资源的申请时间、以及释放时间,记录本次使用计算资源的话单,可以用于对用户使用云算力服务的计费。
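A hedged sketch of the call-detail record (话单) described above, derived only from the application time and release time of the computing resource; the field names and hourly pricing are assumptions for illustration, not a billing format defined by this application.

```python
from datetime import datetime

def make_cdr(agent_id: str, applied_at: datetime, released_at: datetime,
             price_per_hour: float) -> dict:
    # Record one session of computing-resource usage for later billing.
    hours = (released_at - applied_at).total_seconds() / 3600.0
    return {
        "agent_id": agent_id,
        "applied_at": applied_at.isoformat(),
        "released_at": released_at.isoformat(),
        "duration_hours": round(hours, 4),
        "charge": round(hours * price_per_hour, 2),
    }
```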
在一种可能的实施方式中,端侧设备在执行S401向云侧设备发送资源申请请求之前,先待分析数据的推理运算是否需要转移到云侧设备执行。
一种场景中,端侧设备根据配置信息确定人工智能处理的任务转移到云侧设备执行。从而执行S401-S408。
另一种场景中,端侧设备根据配置信息确定人工智能处理的任务由端侧设备执行。在该场景下无需再执行S401-S408。端侧设备可以通过内部的加速器加载人工智能模型,然后通过加载的人工智能模型对待分析数据进行推理得到推理结果。
又一种场景中,端侧设备根据配置信息确定人工智能处理的任务中一部分任务由云 侧设备执行,另一部分任务由端侧设备执行。
端侧设备拆分的两部分任务,可以是串行模式执行、或者是并行模式执行、又或者是串并行混合模式执行。
串行模式,可以是将本地加速器执行的推理结果发送给到云侧设备,由云侧设备继续执行推理运算得到最终的推理结果。或者可以是将待分析数据发送给云侧设备,由云侧设备执行推理运算得到的推理结果返回端侧设备,从而端侧设备根据接收到的推理结果继续执行推理运算得到最终的推理结果。
并行模式,可以是本地加速器执行推理运算得到的推理结果,与云侧设备执行推理运算得到的推理结果进行合并。
串并混合模式,可以是先串行执行后并行执行,又或者可以是先并行执行后串行执行。
又一种场景中,端侧设备可以根据本地加速器的算力以及注册的第二计算资源的算力将人工智能处理的任务分成两部分,将一部分任务转到云侧设备执行,另一部分任务由端侧设备执行。
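One way to read the proportional split described in this scenario is sketched below: the work items are divided according to the ratio between the local accelerator's computing power and that of the registered (or applied) second computing resource. The TFLOPS figures in the usage line are illustrative assumptions.

```python
def split_by_power(items, local_tflops: float, cloud_tflops: float):
    # Divide the task in proportion to the two computing powers; the first part
    # stays on the end-side device, the second is transferred to the cloud side.
    cut = round(len(items) * local_tflops / (local_tflops + cloud_tflops))
    return items[:cut], items[cut:]

# Example: a 2-TFLOPS local accelerator and an 8-TFLOPS cloud resource keep
# roughly 20% of the batches local and offload the rest.
local_part, cloud_part = split_by_power(list(range(10)), 2.0, 8.0)
```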
下面结合附图对本申请实施例中端侧设备以及云侧设备的部署进行说明。
参见图5所示,为一种可能的部署方式的示意图。端侧设备中包括应用模块、算力服务驱动模块。应用模块,可以是人工智能应用(APP)。应用模块,用于执行待分析数据的采集、预处理,还可以用于提供人工智能模型。
应用模块属于应用层。应用层之下为runtime层,runtime层之下可以包括驱动层。驱动层之下为硬件资源层,比如包括加速硬件,网卡等。运行时API,运行时环境、HAL位于runtime层。
算力服务驱动模块,可以用于提供虚拟算力加速硬件的驱动功能。算力服务驱动模块可以称为远程直接计算访问(remote direct computation access,RDCA)agent或者RDCA驱动(driver)功能。算力服务驱动模块,还用于调用云侧设备中的算力服务数据面,执行推理运算处理;还具备可靠上传功能,比如将应用模块的人工智能模型和待分析数据提供给云侧设备。算力服务驱动模块可以部署于runtime层。算力服务驱动模块具有虚拟运行时环境,可以理解为云侧运行时模块的代理。算力服务驱动模块具备虚拟驱动的功能,用于连接网卡与云端设备之间进行通信。
云侧设备包括算力服务代理模块,以及算力服务控制模块。
算力服务代理模块,还可以称为RDCA Daemon或者RDCA proxy功能。算力服务代理模块负责接收&鉴权算力服务驱动模块的资源申请请求,鉴权通过后按需从算力服务控制模块申请端侧设备所需的算力。并根据算力服务控制模块为端侧分配计算资源。算力服务代理模块还负责获取算力服务驱动模块提供的人工智能模型以及待分析数据,进而通过计算资源加载人工智能模型,并对待分析数据进行推理运算得到推理结果,返回给端侧设备。
算力服务控制模块可以称为RDCA Control功能或者RDCA Manager功能。算力服务控制模块负责云侧计算资源的管理与分配。算力服务控制模块还支持按算力/设备类型的计算资源的申请与回收。还可以支持计算资源的使用话单记录。
云侧设备还可以包括云端运行时模块,用于调用计算资源执行加载人工智能模型与 待分析数据得到推理结果。云端运行时模块也可以称为云运行时(Cloud Runtime)功能。
端侧设备也可以包括端侧运行时API,用于连接应用模块与算力服务驱动模块,端侧运行时API还可以用于判断人工智能处理由云侧执行或者由端侧执行。
在一种可能的实施方式中,云侧设备支持端侧设备的算力服务注册。端侧设备还可以部署注册模块。云侧设备可以部署算力服务模块。算力服务模块也可以称为RDCA service功能。算力服务模块,具备实现云算力服务的注册、开户、充值、资源购买等功能。还可以负责根据计算资源的使用话单记录,生成租户账单。注册模块可以是安装在端侧设备上插件或者APP。注册模块比如可以是RDCA客户端(client)APP。注册模块与算力服务模块之间可以通过云算力服务管控面API连接。注册模块负责提供用户使用云算力服务的注册、开户、充值、资源购买等操作界面,并根据用户的操作,调用云算力服务管控面API与实现对应功能。注册模块还可以根据用户的设置,设定算力服务驱动模块的工作状态与工作模式,比如欠费时提示用户、所申请的虚拟算力规格,如虚拟Nvidia TX2加速卡等。
云侧设备上还可以包括控制台模块,支持提供Web化的操作页面,支持用户的注册、开户、充值或资源购买等。控制台模块也可以称为RDCA控制台(Console)功能。
下面结合图5所提供的云侧设备以及端侧设备的部署架构对本申请实施例提供的方案进行详细说明。参见图6为本申请实施例提供的通信方法流程示意图。图6中,以端侧设备根据配置信息确定待分析数据的推理运算的任务由云侧设备执行为例。以端侧设备在算力服务注册过程中获取到云侧设备分配的资源ID为例。具体关于算力服务注册流程的相关说明,参见图4对应的实施例,此处不再赘述。
云端设备中部署多种具备不同硬件规格或算力规则的加速器,构成计算资源池,用于为不同的注册用户提供算力服务。加速器上电启动后,可以注册到算力服务控制模块,由算力服务控制模块负责计算资源池中各个加速器的维护、管理、申请与分配。多个加速器可以物理加速器,还可以是经过虚拟化处理得到的虚拟加速器。比如,采用CPU虚拟化核的类似方式对运算设备上部署硬件计算资源进行虚拟化处理。
S601,应用模块需要实现人工智能处理时,向端侧运行时API(即人工智能模型的加载接口)发送模型加载请求,模型加载请求中携带人工智能模型的名称和人工智能模型的文件路径。
人工智能模型,比如可以是HiAI,或者Tensorflow或者安卓神经网络(Android Neural Networks API,NNAPI)模型。TensorFlow TM是一个基于数据流编程(dataflow programming)的符号数学系统,被广泛应用于各类机器学习(machine learning)算法的编程实现。
HiAI是面向智能终端的AI能力开放平台。NNAPI是一个基于Android系统的用于可在移动设备上运行与机器学习相关的计算密集型操作的C语言API,NNAPI将为更高层次的可构建和训练神经网络的机器学习框架提供底层支持。
S602,端侧运行时API接收到模型加载请求后,确定人工智能的推理运算由云侧设备执行或者在本地执行。图6中以由云侧设备执行为例。具体的可以根据配置信息确定由云侧设别执行或者在本地执行。确定由云侧设备执行,将模型加载请求发送给算力服务驱动模块。
示例性地,端侧设备的用户可以根据注册模块或者端侧运行时环境(Runtime)提供的操作界面,设置人工智能处理由云侧设备执行或者在本地执行。
作为一种示例,注册模块不仅为用户提供向云侧设备注册算力服务的操作界面,还可以为用户提供用于配置是否由云侧设备执行人工智能处理的操作界面。
作为另一种示例,端侧运行时环境为用户提供用于配置是否由云侧设备执行人工智能处理的操作界面。
S603,算力服务驱动模块向云侧设备发送资源申请请求1,从而云侧设备的算力服务代理模块(RDCA Daemon或者RDCA proxy功能)接收来自端侧设备的资源申请请求1。资源申请请求1用于请求实现人工智能功能所需的计算资源信息,即算力需求。
示例性地,资源申请请求1携带硬件规格或算力规格。资源申请请求1中还可以携带代理(agent)ID。代理ID可以包括资源ID,代理ID还可以包括用户ID、端侧设备的ID中至少一项。用户ID以及端侧设备的ID可以用于后续计算资源的使用计费。
S604,算力服务代理模块接收到资源申请请求1后,向算力服务控制模块发送资源申请响应2。资源申请响应2中携带Agent ID。
S605,算力服务控制模块接收到资源申请响应2后,根据资源ID对应的计算资源信息(比如算力规格或硬件规格)为端侧设备分配计算资源。算力服务控制模块可以将计算资源的ID发送给算力服务代理模块,比如可以将计算资源的ID携带资源申请响应2中发送给算力服务代理模块。从而算力服务代理模块可以维护代理ID与计算资源的ID之间的对应关系。计算资源的ID包括计算资源的实例ID,还可以包括硬件资源的ID或硬件的通信IP地址等中的至少一项。比如,硬件资源可以是板卡。
S606,算力服务代理模块向算力服务驱动模块发送资源申请响应1,算力服务响应1用于指示算力资源申请成功。算力服务响应中可以携带agent ID。
S607,算力服务驱动模块向算力服务代理模块提供用于实现人工智能处理所需的第一人工智能模型。
一种可能的方式中,算力服务驱动模块可以直接将第一人工智能模型发送给算力服务代理模块。作为一种示例,算力服务代理模块可以将第一人工智能模型携带在资源申请请求中发送给算力服务代理模块,也可以单独发送给算力服务代理模块。图6中以该种方式为例进行说明。在该方式下,算力服务代理模块接收到算力服务控制模块发送的算力服务响应2后,可以直接执行S608,即调用云侧运行时模块加载所述第一人工智能模型。
另一种可能的方式中,算力服务驱动模块可以将第一人工智能模型的下载地址发送给算力服务代理模块,从而算力服务代理模块根据第一人工智能模型的下载地址获取第一人工智能模型。
算力服务驱动模块可以将第一人工智能模型上传到网络的服务器上,将服务器的URL发送给算力服务代理模块。从而算力服务代理模块根据URL下载第一人工智能模型。算力服务驱动模块上传第一人工智能模型到网络的服务器或者云侧设备从网络的服务下载第一人工智能模型,均可以采用秒传技术,比如MD5。
S608,算力服务代理模块根据预先存储的代理ID与计算资源的ID之间的对应关系,调用云(cloud)Runtime模块加载所述第一人工智能模型。
S609,算力服务代理模块加载成功后,向算力服务驱动模块发送模型加载成功的指 示。
S610,算力服务驱动模块通过端侧运行时API将模型加载成功的指示发送给应用模块。
S611,应用模块将待分析数据通过端侧运行时API发送给算力服务驱动模块。
示例性地,应用模块将待分析数据发送给算力服务驱动模块时,可以直接发送待分析数据,或者待分析数据的存储路径发送给算力服务驱动模块,从而算力服务驱动模块根据待分析数据的存储路径获得待分析数据。
S612,算力服务驱动模块获得待分析数据后,可以将待分析数据提供给算力服务代理模块。
示例性地,算力服务驱动模块向算力服务代理模块提供待分析数据时,可以通过如下方式来实现:
一种可能的方式中,算力服务驱动模块可以直接将待分析数据发送给算力服务代理模块。图6中以该种方式为例进行说明。
另一种可能的方式中,算力服务驱动模块可以将待分析数据的下载地址发送给算力服务代理模块,从而算力服务代理模块根据待分析数据的下载地址获取待分析数据。
比如,待分析数据的下载地址可以是URL。算力服务驱动模块可以将待分析数据上传到网络的服务器上,并将服务器的URL发送给算力服务代理模块。从而算力服务代理模块根据待分析数据的URL下载待分析数据。算力服务驱动模块上传待分析数据到网络的服务器或者算力服务代理模块从网络的服务下载待分析数据,均可以采用秒传技术,比如MD5。
S613,算力服务代理模块获得待分析数据后,调用云侧runtime模块,执行第一人工智能模型的运行。具体的,算力服务代理模块向云侧运行时模块发送模型运行请求,模型运行请求携带Agent ID、待分析数据、计算资源的ID。
S614,云侧运行时模块接收到模型运行请求后,调用计算资源的ID对应的计算资源(即硬件资源)运行第一人工智能模型对待分析数据进行推理得到推理结果。
S615,云侧运行时模块将推理结果发送给算力服务代理模块。
S616,算力服务代理模型将推理结果发送给算力服务驱动模块。
S617,算力服务驱动模块通过端侧运行时API将推理结果发送给应用模块。
S618,应用模块确定不需要执行人工智能处理时,可以申请进行人工智能模型销毁。比如,应用模块可以调用模型销毁接口向算力服务驱动模块发送资源释放请求。
S619,算力服务驱动模块将资源释放请求发送给算力服务代理模块。资源释放请求可以携带人工智能模型的名称以及Agent ID。
S620,算力服务代理模块向云侧运行时模块申请释放第一人工智能模型。具体的,算力服务代理模块向云侧运行时模块发送模型释放请求,模型释放请求中携带人工智能模型的名称。
S621,云侧运行时模块释放人工智能模型成功后,可以向算力服务代理模块发送模型成功释放指示。
S622,算力服务代理模块根据Agent ID通知算力服务控制模块释放计算资源。
S623,算力服务控制模块完成释放计算资源后,向算力服务代理模块发送资源成功释放指示。
S624,算力服务代理模块确定完成资源释放后,向算力服务驱动模块发送资源释放响应,资源释放响应用于指示资源释放成功。
S625,算力服务驱动模块向应用模型转发所述资源释放响应。
作为一种示例,算力服务控制模块可以根据计算资源的申请时间、以及释放时间,记录本次使用计算资源的话单,可以用于后续对用户使用云算力服务的计费。
参见图7所示,为另一种可能的部署方式的示意图。图7所示的部署方式与图5所示的部署方式的区别在于:图7中端侧设备中部署加速器,即端侧设备中包括用于实现加速的硬件资源。端侧设备中还包括加载人工智能模型的端侧运行时环境。在图7所示的部署方式下,人工智能模型处理的任务可以结合实际情况来确定由云侧设备执行,或者由端侧设备执行,或者由云侧设备与端侧设备协同执行。
下面对由云侧设备与端侧设备协同执行的情况进行详细说明。
端侧设备可以根据配置信息确定待分析数据的人工智能处理的任务中一部分任务由云侧设备执行,另一部分任务由端侧设备执行。端侧设备可以根据本地加速器的算力以及注册的第二计算资源的算力将人工智能处理的任务分成两部分,将一部分任务转到云侧设备执行,另一部分任务由端侧设备执行。在情况下,端侧设备中还部署拆分模块,用于将人工智能处理的任务进行拆分处理,比如将使用的人工智能模型进行拆分处理,以及将分析的数据进行拆分处理。
拆分模块对人工智能模型进行拆分处理时,可以通过如下任一种示例提供的方式进行说明:
一种可能的示例中,应用模块触发的模型加载请求中携带拆分指示,拆分指示用于指示拆分人工智能模型的拆分规则。从而拆分模块接收到拆分指示后,对人工智能模型进行拆分处理。
另一种可能的示例中,可以在拆分模块配置不同的人工智能模型分别对应的拆分规则。比如不同的人工智能模型名称与拆分规则存在一一对应关系。模型加载请求中携带人工智能模型的名称,拆分模块可以根据该人工智能模型的名称对应的拆分规则对人工智能模型进行拆分处理。另外,拆分模块还可以配置通用的拆分规则,针对无法匹配对应关系的人工智能模型,可以采用通用的拆分规则执行拆分处理。
又一种示例中,拆分模块根据本地加速器的算力以及申请的计算资源的算力拆分人工智能模型。
在示例下,拆分模块可以在资源申请之前,执行拆分,则根据本地加速器的算力以及注册的第二计算资源的算力执行拆分处理。拆分模块也可以在进行资源申请之后,执行拆分,则根据本地加速器的算力以及申请的第二计算资源的算力执行拆分处理。
拆分模块拆分的两部分任务,可以是串行模式执行、或者是并行模式执行、又或者是串并行混合模式执行。
串行模式,可以是将本地加速器执行的推理结果发送给到云侧设备,由云侧设备继续执行推理运算得到最终的推理结果。或者可以是将待分析数据发送给云侧设备,由云侧设备执行推理运算得到的推理结果返回拆分模块,从而拆分模块根据接收到的推理结果继续执行推理运算得到最终的推理结果。
并行模式,可以是本地加速器执行推理运算得到的推理结果,与云侧设备执行推理 运算得到的推理结果进行合并。在该并行模式下,拆分模块可以将待分析数据拆分为两部分,比如待分析数据1和待分析数据2。拆分模块可以将人工智能模型拆分为两部分,云侧模型内容以及端侧模型内容。
串并混合模式,可以是先串行执行后并行执行,又或者可以是先并行执行后串行执行。在该方式根据串并行混合模式对应的拆分规则进行拆分处理。拆分模块协调云侧设备和端侧设备的硬件资源执行推理,并协调云侧设备和端侧设备的硬件资源的推理结果。
参见图8为本申请实施例提供的通信方法流程示意图。图8仅示意性的描述并行模型的流程。
端侧设备可以通过注册模块或者控制台模块提供的Web页面,按照需求向云侧设备开通或者购买云算力服务。开通或者购买云算力服务的方式可以参见图4或图6对应的实施例中的相关说明,此处不再赘述。端侧设备从云侧设备购买或开通云算力服务获得的硬件资源以第二计算资源为例。
S801,参见S601,此处不再赘述。
S802,端侧运行时API接收到模型加载请求后,将模型加载请求发送给拆分模块。
模型加载请求中携带使用的人工智能模型,或者人工智能模型的存储路径。
S803,拆分模块解析人工智能模型,并对使用的人工智能模型进行拆分处理,比如拆分为人工智能模型1和人工智能模型2。人工智能模型1包括拆分后云侧模型内容,人工智能模型2包括拆分后端侧模型内容。
S804,拆分模块将人工智能模型1发送给算力服务驱动模块。
S805,算力服务驱动模块向算力服务代理模块发送给资源申请请求1。资源申请请求1中携带人工智能模型1。另外,资源申请请求1和人工智能模型1可以分开发送给算力服务代理模块。
S806-S807,参见S604-S605,此处不再赘述。
S808,算力服务代理模块调用云侧运行时模块加载人工智能模型1。
S809,算力服务代理模块加载成功后,向算力驱动模块发送给资源申请响应1。资源申请响应1中携带云侧模型加载成功的指示。
S810,参见S610,此处不再赘述。
S811,拆分模块拆解人工智能模型后,调用端侧runtime加载人工智能模型2。
S812,在加载人工智能模型2完成后,通过端侧运行时API将端侧模型加载成功指示以及云侧模型加载成功指示合并为模型加载成功指示后发送给应用模块。
本申请实施例中不限定S804-S810,与S812之间的先后执行顺序。
S813,应用模块将待分析数据发送给拆分模块。
S814,拆分模块将待分析数据拆分为待分析数据1和待分析数据2。
S815,拆分模块将待分析数据1发送给算力服务驱动模块。
S816-S820,参见S612-S616,此处不再赘述。
S821,算力服务驱动模块将推理结果1发送给拆分模块。
S822,拆分模块将待分析数据2发送给端侧runtime。
S823,端侧runtime通过加速器运行加载的人工智能模型2对待分析数据2进行推理得到推理结果2。
S824,端侧runtime将推理结果2发送给拆分模块。
S825,拆分模块对推理结果1和推理结果2进行融合处理得到推理结果3。
S826,拆分模块将推理结果3发送给应用模块。
下面对人工智能模型的加载由云侧执行的方案进行详细说明。参见图9所示,方法可以包括:
S901,端侧设备向云侧设备发送资源申请请求,从而云侧设备接收来自端侧设备的资源申请请求。资源申请请求用于请求实现人工智能功能所需的计算资源。
关于计算资源信息的相关说明参见图4对应的实施例,此处不再赘述。
S902,云侧设备根据资源申请请求为端侧设备分配计算资源。
示例性地,云侧设备可以针对自身所部署的加速器进行虚拟化,比如可以采用虚拟化核的方式。云侧设备根据计算资源信息为端侧设备分配计算资源时,可以根据算力规格为端侧设备分配一个或者多个核。不同的核对应不同的算力实例。
云侧设备根据资源申请请求为端侧设备分配计算资源的方式可以参见图4对应的实施例,此处不再赘述。
S903,云侧设备向端侧设备发送资源申请响应,所述资源申请响应用于指示所述云侧设备成功为所述端侧设备分配计算资源。
S904,端侧设备在运行用于实现人工智能功能的第一人工智能模型对第一待分析数据执行推理时产生第一计算指令以及第一计算数据。
具体的,端侧设备可以先加载第一人工智能模型,并获得第一待分析数据。然后运行第一人工智能模型对第一待分析数据执行推理,产生第一计算指令以及第一计算数据。
需要说明的是,端侧设备在运行第一人工智能模型对第一待分析数据执行推理可以产生一个或者多个计算指令和一个或多个计算数据。比如产生一个计算指令,多个计算数据,该一个计算指令用于执行多个计算数据。再比如,产生K个计算指令和K个计算数据,计算指令与一个计算数据一一对应。再比如,产生K个计算指令和M个计算数据,计算指令与一个计算数据不一一对应。另外,本申请实施例中多个计算指令和多个计算数据可以一次产生,还可以分多次产生,分不同次发送给云侧设备。本申请实施例中以向云侧设备发送第一计算指令和第一计算数据为例。第一计算指令是端侧设备运行第一人工智能模型对第一待分析数据执行推理产生的任一计算指令,第一计算数据是第一计算指令对应的计算数据。
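Since the note above allows the correspondence between computing instructions and computing data to be one-to-many or not one-to-one, and allows them to be sent to the cloud-side device in several batches, a hedged dispatch loop might look like this (execute_on_cloud is a hypothetical stand-in for the S905 send and the S907 result reception):

```python
def offload(tasks, execute_on_cloud):
    # tasks: iterable of (instruction, [data batches]) pairs; one computing
    # instruction may apply to several pieces of computing data.
    results = []
    for instruction, batches in tasks:
        for batch in batches:
            results.append(execute_on_cloud(instruction, batch))
    return results
```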
S905,端侧设备将第一计算指令以及第一计算数据发送给云侧设备。
S906,云侧设备通过计算资源执行所述第一计算指令对第一计算数据进行计算得到第一计算结果。
S907,云侧设备将所述第一计算结果发送给端侧设备。从而端侧设备从云侧设备接收第一计算结果。
示例性地,除了S905所示的方式外,端侧设备也可以通过如下方式将第一计算指令以及第一计算数据提供给云侧设备:
端侧设备可以将用于存储第一计算指令和第一计算数据的网络服务器的下载地址发送给云侧设备,从而云侧设备根据下载地址获取第一计算指令以及第一计算数据。
比如,端侧设备可以将第一计算指令和第一计算数据上传到网络的服务器上,将服务器的URL发送给云侧设备。从而云侧设备根据URL下载第一计算指令和第一计算数据。 端侧设备上传第一计算指令和第一计算数据到网络的服务器或者云侧设备从网络的服务下载第一计算指令和第一计算数据,均可以采用秒传技术,比如MD5。
在一种可能的实施方式中,端侧设备向云侧设备发送资源申请请求之前,端侧设备可以向云侧设备注册或开户,按需从云侧设备而获得云侧算力服务。具体的注册或开户的方式可以参见图4对应的实施例的描述,此处不再赘述。
在一种可能的场景中,端侧设备可以按需求继续使用云侧设备的计算资源实现人工智能处理。比如,端侧设备获得第一计算结果,若端侧设备上存在第二待分析数据,端侧设备可以根据加载的第一人工智能模型(或者采用其它的人工智能模型)以及第二待分析数据获得第二计算指令以及第二计算数据;从而云侧设备通过计算资源继续执行第二计算指令对第二计算数据进行计算得到第二计算结果,并将第二计算结果发送给端侧设备。从而端侧设备接收云侧设备发送的第二计算结果。需要说明的是,端侧设备在运行第一人工智能模型对第二待分析数据执行推理也可以产生一个或者多个计算指令和一个或多个计算数据。本申请实施例中以向云侧设备发送第二计算指令和第二计算数据为例。第二计算指令是端侧设备运行第一人工智能模型对第二待分析数据执行推理产生的任一计算指令,第二计算数据是第二计算指令对应的计算数据。
在一种可能的实施方式中,端侧设备在使用完计算资源完成计算后,可以向云侧设备申请释放计算资源。比如,端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,云侧设备接收到端侧设备发送的资源释放请求后,并释放所述计算资源。云侧设备可以向端侧设备资源释放响应,资源释放响应用于指示成功释放所述计算资源。
作为一种示例,云侧设备可以根据计算资源的申请时间、以及释放时间,记录本次使用计算资源的话单,可以用于对用户使用云算力服务的计费。
在一种可能的实施方式中,端侧设备在执行S901向云侧设备发送资源申请请求之前,端侧设备确定人工智能处理的部分任务或者全部任务是否需要由云侧设备执行。
一种场景中,端侧设备根据配置信息确定待分析数据的推理运算的任务转移到云侧设备执行。从而执行S901-S907。
另一种场景中,端侧设备根据配置信息确定人工智能处理的任务由端侧设备执行。在该场景下无需再执行S901-S907。端侧设备可以通过内部的加速器加载人工智能模型,然后通过加载的人工智能模型对待分析数据进行推理得到推理结果。
又一种场景中,端侧设备根据配置信息确定待分析数据的推理运算的任务中一部分任务由云侧设备执行,另一部分任务由端侧设备执行。
示例性地,端侧设备确定人工智能处理的部分任务由云侧设备处理时,端侧设备在运行第一人工智能模型对第一待分析数据执行推理时,在产生由云侧执行的第一计算指令和第一计算数据的基础上,还产生由端侧执行的第三计算指令以及第三计算数据;端侧设备执行第三计算执令对第三计算数据进行计算得到第三计算结果;端侧设备接收到接收云侧设备发送的第一计算结果后,端侧设备将第一计算结果和第三计算结果进行融合处理得到第一人工智能模型对第一待分析数据执行推理的推理结果。
具体可以参见图4对应的实施例的相关说明,此处不再赘述。
下面结合附图对本申请实施例中端侧设备以及云侧设备的部署进行说明。
参见图10所示,为一种可能的部署方式的示意图。端侧设备中包括应用模块、运行时(runtime)模块(包括运行时API,运行时环境、HAL),算力服务驱动模块。应用模块,可以是人工智能应用(APP)。应用模块,用于执行待分析数据的采集、预处理,还可以用于提供人工智能模型。
应用模块属于应用层。应用层之下为runtime层,runtime之下可以包括驱动层。驱动层之下为硬件资源层。Runtime模块位于runtime层。
算力服务驱动模块,可以用于提供虚拟算力加速硬件的驱动功能。算力服务驱动模块可以称为远程直接计算访问(remote direct computation access,RDCA)agent或者RDCA驱动(driver)功能。算力服务驱动模块,还用于调用云侧设备中的算力服务数据面,执行推理运算处理;还具备可靠上传功能,比如将runtime层产生的计算指令和计算数据(比如第一计算指令和第一计算数据,或者第二计算指令和第二计算数据)提供给云侧设备。算力服务驱动模块可以部署于端侧设备的runtime层之下,比如部署于驱动层。
云侧设备包括算力服务代理模块,以及算力服务控制模块。
算力服务代理模块,还可以称为RDCA Daemon或者RDCA proxy功能。算力服务代理模块负责接收&鉴权算力服务驱动模块的资源申请请求,鉴权通过后按需从算力服务控制模块申请端侧设备所需的算力。并根据算力服务控制模块为端侧分配计算资源。算力服务代理模块还负责获取算力服务驱动模块提供的人工智能模型以及待分析数据,进而通过计算资源加载人工智能模型,并对待分析数据进行推理运算得到推理结果,返回给端侧设备。
算力服务控制模块可以称为RDCA Control功能或者RDCA Manager功能。算力服务控制模块负责云侧计算资源的管理与分配。算力服务控制模块还支持按算力/设备类型的计算资源的申请与回收。还可以支持计算资源的使用话单记录。
云侧设备还可以包括云端运行时模块,用于调用计算资源执行加载人工智能模型与待分析数据得到推理结果。云端运行时模块也可以称为云运行时(Cloud Runtime)功能。
端侧设备也可以包括端侧运行时API,用于连接应用模块与算力服务驱动模块,端侧运行时API还可以用于判断人工智能处理由云侧执行或者由端侧执行。
在一种可能的实施方式中,云侧设备支持端侧设备的算力服务注册。端侧设备还可以部署注册模块。云侧设备可以部署算力服务模块。算力服务模块也可以称为RDCA service功能。算力服务模块,具备实现云算力服务的注册、开户、充值、资源购买等功能。还可以负责根据计算资源的使用话单记录,生成租户账单。注册模块可以是安装在端侧设备上插件或者APP。注册模块比如可以是RDCA客户端(client)APP。注册模块与算力服务模块之间可以通过云算力服务管控面API连接。注册模块负责提供用户使用云算力服务的注册、开户、充值、资源购买等操作界面,并根据用户的操作,调用云算力服务管控面API与实现对应功能。注册模块还可以根据用户的设置,设定算力服务驱动模块的工作状态与工作模式,比如欠费时提示用户、所申请的虚拟算力规格,如虚拟Nvidia TX2加速卡等。
云侧设备上还可以包括控制台模块,支持提供Web化的操作页面,支持用户的注册、开户、充值或资源购买等。控制台模块也可以称为RDCA控制台(Console)功能。
下面结合图10所提供的云侧设备以及端侧设备的部署架构对本申请实施例提供的方案进行详细说明。参见图11为本申请实施例提供的通信方法流程示意图。图11中,以端侧设备根据配置信息确定待分析数据的推理运算的任务由云侧设备执行为例。
1101,注册模块向算力服务驱动模块发送资源申请请求1,资源申请请求1用于请求实现人工智能功能所需的计算资源,即算力需求。
示例性地,资源申请请求1携带代理(agent)ID。代理ID可以包括资源ID,还包括用户ID、端侧设备的ID中至少一项。用户ID以及端侧设备的ID可以用于后续计算资源的使用计费。
S1102,算力服务驱动模块接收到资源申请请求1后,向云侧设备发送资源申请请求1,从而云侧设备的算力服务代理模块(RDCA Daemon或者RDCA proxy功能)接收来自端侧设备的资源申请请求1。
S1103,算力服务代理模块接收到资源申请请求1后,向算力服务控制模块发送资源申请请求2。资源申请请求2中携带Agent ID。
S1104,算力服务控制模块接收到资源申请请求2后,根据资源ID对应的计算资源信息为端侧设备分配计算资源。算力服务控制模块可以将计算资源的ID发送给算力服务代理模块,比如可以将计算资源的ID携带资源申请响应2中发送给算力服务代理模块。从而算力服务代理模块可以维护代理ID与计算资源的ID之间的对应关系。计算资源的ID包括计算资源的实例ID,还可以包括硬件资源的ID或硬件的通信IP地址等中的至少一项。比如,硬件资源可以是板卡。
S1105,算力服务代理模块向算力服务驱动模块发送资源申请响应1,算力服务响应1用于指示算力资源申请成功。算力服务响应中可以携带agent ID。
S1106,算力服务驱动模块将资源申请响应1转发给注册模块。
S1107,应用模块需要实现人工智能处理时,向端侧runtime模块(即人工智能模型的加载接口)发送模型加载请求,模型加载请求中携带人工智能模型的名称和人工智能模型的文件路径。
S1108,runtime模块接收到模型加载请求后,加载所述人工智能模型,并向应用模块发送模型加载响应。
S1109,应用模块向runtime模块发送运行模型指令,运行模型指令携带模型待分析数据或者待分析数据的存储路径。
S1110,runtime模块运行所述人工智能模型执行所述待分析数据的推理,获得第一计算数据以及第一计算指令,并将第一计算数据和第一计算指令发送给算力服务驱动模块。
S1111,算力服务驱动模块获得第一计算数据以及第一计算指令后,可以将第一数据以及第一计算指令提供给算力服务代理模块。
S1112,算力服务代理模块获得第一计算数据以及第一计算指令后,将第一计算数据以及第一计算指令发送给云侧运行时模块。具体的,算力服务代理模块向云侧运行时模块发送计算指令。
S1113,云侧运行时模块接收到模型运行请求后,调用计算资源的ID对应的计算资源(即硬件资源)执行第一计算指令对第一计算数据进行计算得到第一计算结果。
S1114,云侧运行时模块将第一计算结果发送给算力服务代理模块。
S1115,算力服务代理模型将第一计算结果发送给算力服务驱动模块。
S1116,算力服务驱动模块通过端侧runtime模块将第一计算结果发送给应用模块。
S1117,注册模块确定不需要执行人工智能处理时,可以申请进行资源释放。比如,注册模块可以向算力服务驱动模块发送资源释放请求。
S1118,算力服务驱动模块将资源释放请求发送给算力服务代理模块。资源释放请求可以携带算力实例的ID以及Agent ID。
S1119,算力服务代理模块根据Agent ID通知算力服务控制模块释放计算资源。
S1120,算力服务控制模块完成释放计算资源后,向算力服务代理模块发送资源成功释放指示。
S1121,算力服务代理模块确定完成资源释放后,向算力服务驱动模块发送资源释放响应,资源释放响应用于指示资源释放成功。算力服务驱动模块向注册模块转发所述资源释放响应。
作为一种示例,算力服务控制模块可以根据计算资源的申请时间、以及释放时间,记录本次使用计算资源的话单,可以用于后续对用户使用云算力服务的计费。
参见图12所示,为另一种可能的部署方式的示意图。图12所示的部署方式与图10所示的部署方式的区别在于:图12中端侧设备中部署加速器,即端侧设备中包括用于实现加速的硬件资源。在图12所示的部署方式下,人工智能模型处理的任务可以结合实际情况来确定由云侧设备执行,或者由端侧设备执行,或者由云侧设备与端侧设备协同执行。
下面对由云侧设备与端侧设备协同执行的情况进行详细说明。
端侧设备可以根据配置信息确定人工智能处理的任务中一部分任务由云侧设备执行,另一部分任务由端侧设备执行。端侧设备可以根据本地加速器的算力以及注册的第二计算资源的算力将人工智能处理的任务分成两部分,将一部分任务转到云侧设备执行,另一部分任务由端侧设备执行。拆分任务的功能可以由端侧runtime模块来实现,用于将人工智能处理的任务进行拆分处理,比如将运行人工智能模型对待分析数据进行推理产生的计算指令和计算数据分别进行拆分处理。
端侧runtime模块对人工智能模型进行拆分处理时,可以通过如下任一种示例提供的方式进行说明:
一种可能的示例中,端侧runtime模块触发的运行模型指令中携带拆分指示,拆分指示用于指示拆分人工智能模型的拆分规则。从而端侧runtime模块接收到拆分指示后,根据拆分指示,将运行人工智能模型对待分析数据进行推理产生的计算指令和计算数据,分别进行拆分处理。
另一种可能的示例中,可以在端侧runtime模块配置不同的人工智能模型分别对应的拆分规则。比如不同的人工智能模型名称与拆分规则存在一一对应关系。运行模型指令中携带人工智能模型的名称,拆分模块可以根据该人工智能模型的名称对应的拆分规则对计算指令和计算数据进行拆分处理。另外,端侧runtime模块还可以配置通用的拆分规则,针对无法匹配对应关系的人工智能模型,可以采用通用的拆分规则执行拆分处理。
又一种示例中,端侧runtime模块根据本地加速器的算力以及申请的计算资源的算力拆分人工智能模型。
在示例下,端侧runtime模块可以在资源申请之前,执行拆分,则根据本地加速器的算力以及注册的第二计算资源的算力执行拆分处理。拆分模块也可以在进行资源申请之 后,执行拆分,则根据本地加速器的算力以及申请的第二计算资源的算力执行拆分处理。
拆分模块拆分的两部分任务,可以是串行模式执行、或者是并行模式执行、又或者是串并行混合模式执行。
图13给出了另一种装置的结构示意图。
一种可能场景中,所述装置1300可以是端侧设备,可以是支持端侧设备实现上述方法的芯片、芯片系统、或处理器等。该装置可用于实现上述方法实施例中端侧设备执行的方法,具体可以参见上述方法实施例中的说明。所述装置具备实现本申请实施例描述的端侧设备的功能,比如,所述装置包括端侧设备执行本申请实施例描述的终端涉及步骤所对应的模块(比如图5、图7、图10和图11中端侧设备中的模块)或单元或手段(means),所述功能或单元或手段可以通过软件实现,或者通过硬件实现,也可以通过硬件执行相应的软件实现,还可以通过软件和硬件结合的方式实现。详细可进一步参考前述对应方法实施例中的相应描述。
所述装置1300可以包括一个或多个处理器1301,所述处理器1301也可以称为处理单元,可以实现一定的控制功能。所述处理器1301可以是通用处理器或者专用处理器等。例如可以是中央处理器。中央处理器可以用于对通信装置(如,基站、基带芯片,终端、终端芯片,DU或CU等)进行控制,执行软件程序,处理软件程序的数据。
在一种可选的设计中,处理器1301也可以存有指令和/或数据1303,所述指令和/或数据1303可以被所述处理器运行,使得所述装置1300执行上述方法实施例中描述的方法。
在另一种可选的设计中,处理器1301中可以包括用于实现接收和发送功能的收发单元。例如该收发单元可以是收发电路,或者是接口,或者是接口电路。用于实现接收和发送功能的收发电路、接口或接口电路可以是分开的,也可以集成在一起。上述收发电路、接口或接口电路可以用于代码/数据的读写,或者,上述收发电路、接口或接口电路可以用于信号的传输或传递。
在又一种可能的设计中,装置1300可以包括电路,所述电路可以实现前述方法实施例中发送或接收或者通信的功能。
可选的,所述装置1300中可以包括一个或多个存储器1302,其上可以存有指令1304,所述指令可在所述处理器上被运行,使得所述装置1300执行上述方法实施例中描述的方法。可选的,所述存储器中还可以存储有数据。可选的,处理器中也可以存储指令和/或数据。所述处理器和存储器可以单独设置,也可以集成在一起。例如,上述方法实施例中所描述的对应关系可以存储在存储器中,或者存储在处理器中。
可选的,所述装置1300还可以包括收发器1305和/或天线1306。所述处理器1301可以称为处理单元,对所述装置1300进行控制。所述收发器1305可以称为收发单元、收发机、收发电路、收发装置或收发模块等,用于实现收发功能。
可选的,本申请实施例中的装置1300可以用于执行本申请上述实施例描述的方法。
本申请中描述的处理器和收发器可实现在集成电路(integrated circuit,IC)、模拟IC、射频集成电路RFIC、混合信号IC、专用集成电路(application specific integrated circuit,ASIC)、印刷电路板(printed circuit board,PCB)、电子设备等上。该处理器和收发器也可以用各种IC工艺技术来制造,例如互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)、N型金属氧化物半导体(nMetal-oxide-semiconductor,NMOS)、P型金属氧化物半导体(positive channel metal oxide semiconductor,PMOS)、双极结型晶体 管(Bipolar Junction Transistor,BJT)、双极CMOS(BiCMOS)、硅锗(SiGe)、砷化镓(GaAs)等。
以上实施例描述中的装置可以是端侧设备,但本申请中描述的装置的范围并不限于此,而且装置的结构可以不受图13的限制。装置可以是独立的设备或者可以是较大设备的一部分。例如所述装置可以是:
(1)独立的集成电路IC,或芯片,或,芯片系统或子系统;
(2)具有一个或多个IC的集合,可选的,该IC集合也可以包括用于存储数据和/或指令的存储部件;
(3)ASIC,例如调制解调器(MSM);
(4)可嵌入在其他设备内的模块;
(5)接收机、终端、智能终端、蜂窝电话、无线设备、手持机、移动单元、车载设备、网络设备、云设备、人工智能设备、机器设备、家居设备、医疗设备、工业设备等等;
(6)其他等等。
另一种可能场景中,所述装置1300可以是应用于云侧设备。该装置1300可用于实现上述方法实施例中云侧设备执行的方法,具体可以参见上述方法实施例中的说明。所述装置具备实现本申请实施例描述的云侧设备的功能,比如,所述装置包括云侧设备执行本申请实施例描述的云侧设备涉及步骤所对应的模块或单元或手段(means),所述功能或单元或手段可以通过软件实现,或者通过硬件实现,也可以通过硬件执行相应的软件实现,还可以通过软件和硬件结合的方式实现。详细可进一步参考前述对应方法实施例中的相应描述。
所述装置1300可以包括一个或多个处理器1301,所述处理器1301也可以称为处理单元,可以实现一定的控制功能。所述处理器1301可以是通用处理器或者专用处理器等。
在一种可选的设计中,处理器1301也可以存有指令和/或数据1303,所述指令和/或数据1303可以被所述处理器运行,使得所述装置1300执行上述方法实施例中描述的方法。
在另一种可选的设计中,处理器1301中可以包括用于实现接收和发送功能的收发单元。例如该收发单元可以是收发电路,或者是接口,或者是接口电路。用于实现接收和发送功能的收发电路、接口或接口电路可以是分开的,也可以集成在一起。上述收发电路、接口或接口电路可以用于代码/数据的读写,或者,上述收发电路、接口或接口电路可以用于信号的传输或传递。
在又一种可能的设计中,装置1300可以包括电路,所述电路可以实现前述方法实施例中发送或接收或者通信的功能。
可选的,所述装置1300中可以包括一个或多个存储器1302,其上可以存有指令1304,所述指令可在所述处理器上被运行,使得所述装置1300执行上述方法实施例中描述的方法。可选的,所述存储器中还可以存储有数据。可选的,处理器中也可以存储指令和/或数据。所述处理器和存储器可以单独设置,也可以集成在一起。例如,上述方法实施例中所描述的对应关系可以存储在存储器中,或者存储在处理器中。
可选的,所述装置1300还可以包括收发器1305。所述处理器1301可以称为处理单元,对所述装置1300进行控制。所述收发器1305可以称为收发单元、收发机、收发电路、收发装置或收发模块等,用于实现收发功能。
可选的,本申请实施例中的装置1300可以用于执行本申请上述实施例描述的云侧设备执行的方法。
可以理解的是,本申请实施例中的一些可选的特征,在某些场景下,可以不依赖于其他特征,比如其当前所基于的方案,而独立实施,解决相应的技术问题,达到相应的效果,也可以在某些场景下,依据需求与其他特征进行结合。相应的,本申请实施例中给出的装置也可以相应的实现这些特征或功能,在此不予赘述。
本领域技术人员还可以理解到本申请实施例列出的各种说明性逻辑块(illustrative logical block)和步骤(step)可以通过电子硬件、电脑软件,或两者的结合进行实现。这样的功能是通过硬件还是软件来实现取决于特定的应用和整个系统的设计要求。本领域技术人员对于相应的应用,可以使用各种方法实现所述的功能,但这种实现不应被理解为超出本申请实施例保护的范围。
可以理解,本申请实施例中的处理器可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
本申请所描述的方案可通过各种方式来实现。例如,这些技术可以用硬件、软件或者硬件结合的方式来实现。对于硬件实现,用于在通信装置(例如,基站,终端、网络实体、或芯片)处执行这些技术的处理单元,可以实现在一个或多个通用处理器、DSP、数字信号处理器件、ASIC、可编程逻辑器件、FPGA、或其它可编程逻辑装置,离散门或晶体管逻辑,离散硬件部件,或上述任何组合中。通用处理器可以为微处理器,可选地,该通用处理器也可以为任何传统的处理器、控制器、微控制器或状态机。处理器也可以通过计算装置的组合来实现,例如数字信号处理器和微处理器,多个微处理器,一个或多个微处理器联合一个数字信号处理器核,或任何其它类似的配置来实现。
可以理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请还提供了一种计算机可读介质,其上存储有计算机程序,该计算机程序被计算机执行时实现上述任一方法实施例的功能。
本申请还提供了一种计算机程序产品,该计算机程序产品被计算机执行时实现上述任一方法实施例的功能。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(digital video disc,DVD))、或者半导体介质(例如,固态硬盘(solid state disk,SSD))等。
可以理解,说明书通篇中提到的“实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各个实施例未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。可以理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
可以理解,在本申请中,“当…时”、“若”以及“如果”均指在某种客观情况下装置会做出相应的处理,并非是限定时间,且也不要求装置实现时一定要有判断的动作,也不意味着存在其它限定。
本申请中的“同时”可以理解为在相同的时间点,也可以理解为在一段时间段内,还可以理解为在同一个周期内。
本领域技术人员可以理解:本申请中涉及的第一、第二等各种数字编号仅为描述方便进行的区分,并不用来限制本申请实施例的范围。本申请中的编号(也可被称为索引)的具体取值、数量的具体取值、以及位置仅作为示意的目的,并不是唯一的表示形式,也并不用来限制本申请实施例的范围。本申请中涉及的第一个、第二个等各种数字编号也仅为描述方便进行的区分,并不用来限制本申请实施例的范围。
本申请中对于使用单数表示的元素旨在用于表示“一个或多个”,而并非表示“一个且仅一个”,除非有特别说明。本申请中,在没有特别说明的情况下,“至少一个”旨在用于表示“一个或者多个”,“多个”旨在用于表示“两个或两个以上”。
另外,本文中术语“系统”和“网络”在本文中常被可互换使用。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A可以是单数或者复数,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。
本文中术语“……中的至少一个”或“……中的至少一种”,表示所列出的各项的全部或任意组合,例如,“A、B和C中的至少一种”,可以表示:单独存在A,单独存在B,单 独存在C,同时存在A和B,同时存在B和C,同时存在A、B和C这六种情况,其中A可以是单数或者复数,B可以是单数或者复数,C可以是单数或者复数。
可以理解,在本申请各实施例中,“与A相应的B”表示B与A相关联,根据A可以确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
本申请中的预定义可以理解为定义、预先定义、存储、预存储、预协商、预配置、固化、或预烧制。
本领域普通技术人员可以理解,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本领域普通技术人员可以理解,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
可以理解,本申请中描述的系统、装置和方法也可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请中各个实施例之间相同或相似的部分可以互相参考。在本申请中各个实施例、以及各实施例中的各个实施方式/实施方法/实现方法中,如果没有特殊说明以及逻辑冲突,不同的实施例之间、以及各实施例中的各个实施方式/实施方法/实现方法之间的术语和/或描述具有一致性、且可以相互引用,不同的实施例、以及各实施例中的各个实施方式/实施方法/实现方法中的技术特征根据其内在的逻辑关系可以组合形成新的实施例、实施方式、实施方法、或实现方法。以上所述的本申请实施方式并不构成对本申请保护范围的 限定。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。

Claims (33)

  1. 一种通信方法,其特征在于,包括:
    端侧设备向云侧设备发送资源申请请求,并向所述云侧设备提供用于实现人工智能处理所需的第一人工智能模型;所述资源申请请求用于请求实现人工智能功能所需的计算资源;
    所述端侧设备接收所述云侧设备发送的加载完成消息,所述加载完成消息用于指示所述云侧设备基于所述资源申请请求为所述端侧设备分配的计算资源已成功加载所述第一人工智能模型;
    所述端侧设备向所述云侧设备提供第一待分析数据,并接收所述云侧设备发送的第一待分析数据的第一推理结果;
    其中,所述第一推理结果是基于所述第一待分析数据运行所述第一人工智能模型得到的。
  2. 如权利要求1所述的方法,其特征在于,还包括:
    所述端侧设备向云侧设备发送算力服务注册请求,所述算力服务注册请求用于请求所述云侧设备为所述端侧设备提供算力服务;
    所述端侧设备接收云侧设备发送的算力服务注册响应,所述算力服务注册响应用于指示所述端侧设备已成功请求到所述云侧设备的算力服务。
  3. 如权利要求1或2所述的方法,其特征在于,端侧设备向所述云侧设备提供用于实现人工智能处理所需的第一人工智能模型,包括:
    所述端侧设备向所述云侧设备发送所述第一人工智能模型;或者,
    所述端侧设备将所述第一人工智能模型的下载地址发送给所述云侧设备。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述端侧设备向所述云侧设备提供第一待分析数据,包括:
    所述端侧设备向所述云侧设备发送所述第一待分析数据;或者,
    所述端侧设备将所述第一人工智能模型的下载地址发送给所述云侧设备。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述端侧设备接收所述云侧设备发送的第一待分析数据的第一推理结果后,还包括:
    端侧设备将第二待分析数据提供给云侧设备,并接收所述云侧设备发送的所述第二待分析数据的第二推理结果;
    其中,所述第二推理结果是基于所述第二待分析数据运行所述第一人工智能模型得到的。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述端侧设备接收所述云侧设备发送的第一待分析数据的第一推理结果后,还包括:
    所述端侧设备向云侧设备提供第二人工智能模型,并向云侧设备提供第三待分析数据;
    所述端侧设备接收所述云侧设备发送的第三待分析数据的第三推理结果;
    其中,所述第三推理结果是所述计算资源基于所述第二待分析数据运行所述第一人工智能模型得到的。
  7. 如权利要求1-6任一项所述的方法,其特征在于,还包括:
    所述端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,所述资源释放请求用于请求释放所述计算资源;
    所述端侧设备接收所述云侧设备发送的资源释放响应,所述资源释放响应用于指示成功释放所述计算资源以及所述计算资源运行的人工智能模型。
  8. 如权利要求1-7任一项所述的方法,其特征在于,所述端侧设备向云侧设备发送资源申请请求之前,还包括:
    所述端侧设备确定所述人工智能处理的部分任务或者全部任务由所述云侧设备处理。
  9. 如权利要求8所述的方法,其特征在于,还包括:
    所述端侧设备确定所述人工智能处理的部分任务由所述云侧设备处理时,所述端侧设备将待使用的人工智能模型拆解为所述第一人工智能模型以及第三人工智能模型;
    所述端侧设备向所述云侧设备提供第一待分析数据之前,所述端侧设备加载所述第三人工智能模型,并在所述端侧设备接收所述云侧设备发送的加载完成消息时,将待分析数据拆分为所述第一待分析数据以及第四待分析数据;
    所述端侧设备将待分析数据拆分为所述第一待分析数据以及第四待分析数据后,运行加载的所述第三人工智能模型对所述第四待分析数据进行推理得到第四推理结果;
    所述端侧设备接收到第一推理结果后,对所述第一推理结果和所述第四推理结果进行融合处理。
  10. 一种通信方法,其特征在于,包括:
    云侧设备接收来自端侧设备的资源申请请求,并获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,所述资源申请请求用于请求实现人工智能功能所需的计算资源;
    所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源;
    所述云侧设备通过所述计算资源成功加载所述第一人工智能模型后,向所述端侧设备发送加载完成消息,所述加载完成消息用于指示所述云侧设备上的计算资源已成功加载所述第一人工智能模型;
    所述云侧设备获取所述端侧设备提供的第一待分析数据,通过运行所述第一人工智能模型对所述第一待分析数据进行推理得到第一推理结果;并向所述端侧设备发送所述第一推理结果。
  11. 如权利要求10所述的方法,其特征在于,云侧设备接收来自端侧设备的资源申请请求之前,还包括:
    所述云侧设备接收所述端侧设备发送的算力服务注册请求,所述算力服务注册请求用于请求所述云侧设备为所述端侧设备的用户提供算力服务;
    所述云侧设备向所述端侧设备发送算力服务注册响应,所述算力服务注册响应用于指示所述端侧设备的用户已成功请求到所述云侧设备的算力服务。
  12. 如权利要求11所述的方法,其特征在于,所述算力服务注册请求中携带计算资源信息,所述计算资源信息用于表征所述端侧设备所申请的算力规格;所述算力服务注册响应携带所述云侧设备为所述端侧设备分配的资源ID,所述资源ID用于标识所述计算资源信息;
    所述资源申请请求中携带所述资源ID,所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源,包括:
    所述云端设备根据所述资源ID对应的计算资源信息为所述端侧设备分配所述计算资源。
  13. 如权利要求10或11所述的方法,其特征在于,所述资源申请请求中携带计算资源信息,所述计算资源信息用于表征所述端侧设备所申请的算力规格;
    所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源,包括:
    所述云端设备根据所述计算资源信息为所述端侧设备分配所述计算资源。
  14. 如权利要求10-13任一项所述的方法,其特征在于,所述云侧设备获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,包括:
    所述云侧设备接收所述端侧设备发送所述第一人工智能模型;或者,
    所述云侧设备接收所述端侧设备发送的所述第一人工智能模型的下载地址,并根据所述第一人工智能模型的下载地址下载所述第一人工智能模型。
  15. 如权利要求10-14任一项所述的方法,其特征在于,所述云侧设备获取端侧设备提供的第一待分析数据,包括:
    所述云侧设备接收所述端侧设备发送所述第一待分析数据;或者,
    所述云侧设备接收所述端侧设备发送的所述第一待分析数据的下载地址,并根据所述第一待分析数据的下载地址下载所述待分析数据。
  16. 如权利要求10-15任一项所述的方法,其特征在于,所述云侧设备向所述端侧设备发送所述第一推理结果后,还包括:
    所述云侧设备获取所述端侧设备提供的第二待分析数据,通过运行所述第一人工智能模型对所述第二待分析数据进行推理得到第二推理结果;并向所述端侧设备发送所述第二推理结果。
  17. 如权利要求10-16任一项所述的方法,其特征在于,所述云侧设备向所述端侧设备发送所述第一推理结果后,还包括:
    所述云侧设备获取所述端侧设备提供的第二人工智能模型,获取所述端侧设备提供的第三待分析数据;
    所述云侧设备通过运行所述第二人工智能模型对所述第三待分析数据进行推理得到第三推理结果;并向所述端侧设备发送所述第三推理结果。
  18. 如权利要求10-17任一项所述的方法,其特征在于,还包括:
    所述云侧设备接收端侧设备发送的资源释放请求,所述资源释放请求用于请求释放所述计算资源;
    所述云侧设备释放所述计算资源,并释放所述计算资源运行的人工智能模型;
    所述云侧设备向所述端侧设备发送资源释放响应,所述资源释放响应用于指示成功释放所述计算资源以及所述计算资源运行的人工智能模型。
  19. 一种通信方法,其特征在于,包括:
    端侧设备向云侧设备发送资源申请请求,所述资源申请请求用于请求实现人工智能功能所需的计算资源;
    所述端侧设备接收所述云侧设备发送的资源申请响应,所述资源申请响应用于指示所述云侧设备成功为所述端侧设备分配计算资源;
    所述端侧设备在运行用于实现人工智能功能的第一人工智能模型对第一待分析数据执行推理时产生第一计算指令以及第一计算数据,并将所述第一计算指令以及第一计算 数据发送给所述云侧设备;
    所述端侧设备接收所述云侧设备发送的第一计算结果;
    其中,所述第一计算结果是所述计算资源执行第一计算指令对所述第一计算数据进行计算得到的计算结果。
  20. 如权利要求19所述的方法,其特征在于,还包括:
    所述端侧设备向云侧设备发送算力服务注册请求,所述算力服务注册请求用于请求所述云侧设备为所述端侧设备提供算力服务;
    所述端侧设备接收云侧设备发送的算力服务注册响应,所述算力服务注册响应用于指示所述端侧设备已成功请求到所述云侧设备的算力服务。
  21. 如权利要求19或20所述的方法,其特征在于,所述端侧设备接收所述云侧设备发送的第一计算结果后,还包括:
    所述端侧设备运行所述第一人工智能模型对第二待分析数据执行推理,获得第二计算指令以及第二计算数据;
    所述端侧设备接收所述云侧设备发送的第二计算结果;
    其中,所述第二计算结果是所述计算资源执行第二计算指令对所述第二计算数据进行计算得到的计算结果。
  22. 如权利要求19-21任一项所述的方法,其特征在于,还包括:
    所述端侧设备完成计算资源的使用后,向云侧设备发送资源释放请求,所述资源释放请求用于请求释放所述计算资源;
    所述端侧设备接收所述云侧设备发送的资源释放响应,所述资源释放响应用于指示成功释放所述计算资源。
  23. 如权利要求19-22任一项所述的方法,其特征在于,所述端侧设备向云侧设备发送资源申请请求之前,还包括:
    所述端侧设备确定所述人工智能处理的部分任务或者全部任务由所述云侧设备处理。
  24. 如权利要求23所述的方法,其特征在于,还包括:
    所述端侧设备确定所述人工智能处理的部分任务由所述云侧设备处理时,所述端侧设备在运行所述第一人工智能模型对第一待分析数据执行推理时,还产生第三计算指令以及第三计算数据;
    所述端侧设备执行所述第三计算执令对所述第三计算数据进行计算得到第三计算结果;
    所述端侧设备接收到所述接收所述云侧设备发送的第一计算结果后,所述端侧设备将所述第一计算结果和所述第三计算结果进行融合处理得到所述第一人工智能模型对所述第一待分析数据执行推理的推理结果。
  25. 一种通信方法,其特征在于,包括:
    云侧设备接收来自端侧设备的资源申请请求,并获取端侧设备提供的用于实现人工智能处理所需的第一人工智能模型,所述资源申请请求用于请求实现人工智能功能所需的计算资源;
    所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源;
    所述云侧设备向所述端侧设备发送资源申请响应,所述资源申请响应用于指示所述云侧设备成功为所述端侧设备分配计算资源;
    所述云侧设备接收所述端侧设备发送的第一计算指令以及第一计算数据;
    所述云侧设备通过所述计算资源执行第一计算指令对所述第一计算数据进行计算得到计算结果;
    所述云侧设备将所述计算结果发送给所述端侧设备。
  26. 如权利要求25所述的方法,其特征在于,云侧设备接收来自端侧设备的资源申请请求之前,还包括:
    所述云侧设备接收所述端侧设备发送的算力服务注册请求,所述算力服务注册请求用于请求所述云侧设备为所述端侧设备的用户提供算力服务;
    所述云侧设备向所述端侧设备发送算力服务注册响应,所述算力服务注册响应用于指示所述端侧设备的用户已成功请求到所述云侧设备的算力服务。
  27. 如权利要求26所述的方法,其特征在于,所述算力服务注册请求中携带计算资源信息,所述计算资源信息用于表征所述端侧设备所申请的算力规格;所述算力服务注册响应携带所述云侧设备为所述端侧设备分配的资源ID,所述资源ID用于标识所述计算资源信息;
    所述资源申请请求中携带所述资源ID,所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源,包括:
    所述云端设备根据所述资源ID对应的计算资源信息为所述端侧设备分配所述计算资源。
  28. 如权利要求26或27所述的方法,其特征在于,所述资源申请请求中携带计算资源信息,所述计算资源信息用于表征所述端侧设备所申请的算力规格;
    所述云侧设备根据所述资源申请请求为所述端侧设备分配计算资源,包括:
    所述云端设备根据所述计算资源信息为所述端侧设备分配所述计算资源。
  29. 如权利要求26-28任一项所述的方法,其特征在于,所述云侧设备向所述端侧设备发送所述第一计算结果后,还包括:
    所述云侧设备接收所述端侧设备发送的第二计算指令以及第二计算数据,通过所述计算资源运行所述第二计算指令对所述第二计算数据进行推理得到第二计算结果;并向所述端侧设备发送所述第二计算结果。
  30. 如权利要求26-29任一项所述的方法,其特征在于,还包括:
    所述云侧设备接收端侧设备发送的资源释放请求,所述资源释放请求用于请求释放所述计算资源;
    所述云侧设备释放所述计算资源,并向所述端侧设备发送资源释放响应,所述资源释放响应用于指示成功释放所述计算资源。
  31. 一种通信装置,其特征在于,包括用于执行如权利要求1至9或10-18或19至24或25-30中的任一项所述方法的模块。
  32. 一种通信装置,其特征在于,包括处理器和接口电路,所述接口电路用于接收来自所述通信装置之外的其它通信装置的信号并传输至所述处理器或将来自所述处理器的信号发送给所述通信装置之外的其它通信装置,所述处理器通过逻辑电路或执行代码指令用于实现如权利要1至9或10-18或19至24或25-30中任一项所述的方法。
  33. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机指令,当所述计算机指令被执行时,使得权利要求1至9或10-18或19至24或25-30中 任一项所述的方法被执行。
PCT/CN2021/082483 2020-03-31 2021-03-23 一种通信方法及装置 WO2021197144A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010242173.XA CN113472830B (zh) 2020-03-31 2020-03-31 一种通信方法及装置
CN202010242173.X 2020-03-31

Publications (1)

Publication Number Publication Date
WO2021197144A1 true WO2021197144A1 (zh) 2021-10-07

Family

ID=77865267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082483 WO2021197144A1 (zh) 2020-03-31 2021-03-23 一种通信方法及装置

Country Status (2)

Country Link
CN (1) CN113472830B (zh)
WO (1) WO2021197144A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401301A (zh) * 2022-01-17 2022-04-26 东云睿连(武汉)计算技术有限公司 一种带有遥控装置的边缘计算设备
WO2023098662A1 (zh) * 2021-11-30 2023-06-08 维沃移动通信有限公司 定位方法及通信设备
CN116414559A (zh) * 2023-01-28 2023-07-11 北京神州泰岳软件股份有限公司 算力统一标识建模、分配的方法、存储介质及电子设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117061345A (zh) * 2022-05-06 2023-11-14 华为技术有限公司 一种通信方法、装置及设备
CN117439958A (zh) * 2022-07-12 2024-01-23 维沃移动通信有限公司 一种ai网络模型交互方法、装置和通信设备
CN115934323B (zh) * 2022-12-02 2024-01-19 北京首都在线科技股份有限公司 云端计算资源的调用方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180332102A1 (en) * 2018-03-22 2018-11-15 Michael Sheidaei Cloud-based system for collaborating engineering, operations, maintenance, project management, procurement and vendor data and activities
CN109067840A (zh) * 2018-06-29 2018-12-21 优刻得科技股份有限公司 Method, system and storage medium for artificial intelligence online services
CN110750312A (zh) * 2019-10-17 2020-02-04 中科寒武纪科技股份有限公司 Hardware resource configuration method and apparatus, cloud-side device, and storage medium
CN110750359A (zh) * 2019-10-17 2020-02-04 中科寒武纪科技股份有限公司 Hardware resource configuration method and apparatus, cloud-side device, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103002044B (zh) * 2012-12-18 2016-05-11 武汉大学 Method for improving the processing capability of multi-platform intelligent terminals
CN108243216B (zh) * 2016-12-26 2020-02-14 华为技术有限公司 Data processing method, end-side device, cloud-side device and end-cloud collaboration system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023098662A1 (zh) * 2021-11-30 2023-06-08 维沃移动通信有限公司 Positioning method and communication device
CN114401301A (zh) * 2022-01-17 2022-04-26 东云睿连(武汉)计算技术有限公司 Edge computing device with a remote control apparatus
CN114401301B (zh) * 2022-01-17 2023-07-14 东云睿连(武汉)计算技术有限公司 Edge computing device with a remote control apparatus
CN116414559A (zh) * 2023-01-28 2023-07-11 北京神州泰岳软件股份有限公司 Method for unified identification modeling and allocation of computing power, storage medium and electronic device

Also Published As

Publication number Publication date
CN113472830A (zh) 2021-10-01
CN113472830B (zh) 2023-03-10

Similar Documents

Publication Publication Date Title
WO2021197144A1 (zh) Communication method and apparatus
CN107291456B (zh) Multi-screen display control method and system
CN102939579B (zh) Method and apparatus for binding user interface elements and granular reflection processing
KR102326521B1 (ko) MEC platform, digital twin service system having the same, and operation method thereof
US10624022B2 Method for establishing wireless LAN communication connection and electronic device therefor
US8843631B2 Dynamic local function binding apparatus and method
WO2011116556A1 (zh) Internet of Things-based wireless communication terminal and application method thereof
US20130091502A1 System and method of providing virtual machine using device cloud
CN107409436B (zh) Cloud platform, method for running applications, and access network unit
CN114584613B (zh) Message pushing method, message pushing system and electronic device
WO2019095154A1 (zh) Method and apparatus for scheduling acceleration resources, and acceleration system
WO2022165771A1 (zh) Virtual electronic card management method and system, security chip, terminal and storage medium
CN111414249A (zh) Vehicle-mounted host system
AU2019256257A1 Processor core scheduling method and apparatus, terminal, and storage medium
CN109829546B (zh) Platform-as-a-service cloud server and machine learning data processing method thereof
CN107278294A (zh) Input device implementation method and implementation apparatus thereof
CN108874554B (zh) Information communication method and apparatus
CN111654539B (zh) Cloud-native-based Internet of Things operating system construction method and system, and electronic device
CN114090483A (zh) Coroutine-based RDMA communication method, apparatus and storage medium
CN110474891A (zh) Service access control method and apparatus based on multi-system intelligent devices
CN116932234A (zh) Inter-application communication method and apparatus, storage medium and program product
CN104471541A (zh) Hybrid application environment
US11829791B2 Providing device abstractions to applications inside a virtual machine
CN112615928B (zh) Data processing method, device and storage medium
CN114398082B (zh) Compatible operation method and apparatus for framework-based blockchain applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21780962
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21780962
    Country of ref document: EP
    Kind code of ref document: A1