WO2022012129A1 - Model processing method for cloud service system, and cloud service system - Google Patents

Model processing method for cloud service system, and cloud service system

Info

Publication number
WO2022012129A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
model
local
local server
cloud server
Prior art date
Application number
PCT/CN2021/092942
Other languages
English (en)
French (fr)
Inventor
宁伟康
杨学文
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21843138.5A priority Critical patent/EP4174736A4/en
Publication of WO2022012129A1 publication Critical patent/WO2022012129A1/zh
Priority to US18/152,970 priority patent/US20230164030A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L 41/082 Configuration setting characterised by the conditions triggering a change of settings, the condition being updates or upgrades of network functionality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • the present application relates to the technical field of cloud computing, and in particular, to a model processing method of a cloud service system and a cloud service system.
  • Edge computing is a specific implementation of cloud computing technology.
  • cloud servers can provide computing tools such as machine learning models to terminal devices, and edge devices use the machine learning models provided by cloud servers to perform edge computing.
  • This calculation method can effectively reduce the computational load on the cloud server, thereby improving the operating efficiency of the entire cloud service system.
  • To keep computation accurate, the supplier needs to continuously update the machine learning model provided by the cloud server.
  • In one update technology, the latest computing data used by all terminal devices is sent to the cloud server, which updates the machine learning model based on these data. This, however, increases the computational load on the cloud server and reduces the operating efficiency of the entire cloud service system.
  • In another update technology, the terminal devices and the cloud server jointly update the model: a federated learning client is set on each terminal device, which updates the machine learning model according to its own computing data and sends the resulting gradient value to the cloud server, while a federated learning service set on the cloud server updates the machine learning model according to the received gradient values. This increases the computational load on the terminal devices, and since the computing power of most terminal devices cannot meet this demand, it also affects the overall operation of the cloud service system.
  • The present application provides a model processing method for a cloud service system and a cloud service system, which solve the technical problem in the prior art of how to update the machine learning model without affecting the overall operating efficiency of the cloud service system.
  • A first aspect of the present application provides a cloud service system, including a cloud server and a plurality of local servers. A first local server among the plurality of local servers is connected to the cloud server through a network, and the first local server is also connected to at least one edge device. The first local server is configured to: obtain a data set of the at least one edge device, the data set including data used when the at least one edge device performs computation using a first model provided by the cloud server; determine a first gradient value for updating the first model according to the data set of the at least one edge device; and send the first gradient value to the cloud server. The cloud server is configured to update the first model according to the first gradient value, and send the updated first model to the first local server.
  • In the cloud service system provided in this embodiment, the whole model-update process neither relies entirely on the cloud server for data calculation nor on the edge devices themselves, but instead uses the computing power provided by the local server to update the model. This ensures that the model provided by the cloud server is kept up to date while reducing the amount of data exchanged between the edge devices and the server, lowering the computing-power requirements on both the cloud server and the edge devices, and thereby improving the operating efficiency of the entire cloud service system.
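As an illustrative sketch only (not part of the claims), the division of labor above can be expressed in a few lines. The patent does not fix the model family, so a least-squares linear model stands in for the "first model"; `local_gradient` plays the role of the first local server and `cloud_update` that of the cloud server.

```python
def local_gradient(weights, samples):
    """Local server: compute a gradient for updating the model from the
    data set pooled from the connected edge devices.

    Each sample is (features, label); the gradient is the mean-squared-
    error gradient of an assumed linear model over the local data set.
    """
    n = len(samples)
    grad = [0.0] * len(weights)
    for features, label in samples:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / n
    return grad


def cloud_update(weights, gradient, lr=0.1):
    """Cloud server: apply a gradient value received from a local server
    and return the updated model weights (learning rate is assumed)."""
    return [w - lr * g for w, g in zip(weights, gradient)]
```

The edge devices only generate data; neither the gradient computation nor the model update runs on them, which is the point of interposing the local server.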
  • The cloud server is further configured to send multiple models to the first local server. The first local server is further configured to: receive and store the multiple models sent by the cloud server; determine at least one model corresponding to a first edge device in the at least one edge device; and send the at least one model to the first edge device.
  • The first local server thus also has the functions of storing models and determining the different models corresponding to different edge devices, further reducing the calculation required of the cloud server: the cloud server only needs to send the trained models to the local server, and the local server distributes the models to the corresponding edge devices in a more targeted manner. This also makes the model used by the first edge device more suitable, improving the accuracy of the first edge device's model calculation and further improving the operating efficiency of the entire cloud service system.
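A minimal sketch of this targeted distribution, under assumptions the patent leaves open: the matching criteria between devices and models are not specified, so a simple task tag is used here, and the model names are hypothetical.

```python
# Hypothetical registry of models stored on the first local server,
# each tagged with the device task it serves.
MODEL_STORE = {
    "animal-classifier-v2": "image",
    "anomaly-detector-v1": "sensor",
    "speech-tagger-v1": "audio",
}


def models_for_device(device_task, store=MODEL_STORE):
    """Return the names of the stored models the local server would
    deliver to an edge device with the given task tag."""
    return sorted(name for name, task in store.items() if task == device_task)
```

A device reporting the task `"image"` would receive only the image model, rather than every model the cloud server has published.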
  • The cloud server is further configured to send a construction tool and an annotation tool to the first local server, where the construction tool is used for the construction of the first local server and the annotation tool is used to label the data in the data set.
  • Because the cloud server can send the construction tool and the labeling tool to the first local server, the first local server can be built, and its related functions implemented, according to the tools sent by the cloud server. This completes the cloud service system as a whole, so that the operator of the cloud service system can finish the construction and deployment of the first local server through the cloud server.
  • The first local server is further configured to label, by using the labeling tool, the first data in the data set of the at least one edge device to obtain multiple labeling results. When the multiple labeling results are the same, the first data is added to a local data set, and the local data set is used to determine the first gradient value for updating the first model. When the multiple labeling results are not identical, the first data is sent to a first device, and after confirmation information sent by the first device is received, the first data is added to the local data set.
  • The first local server can therefore use the labeling tool to label the data in the data set of the edge devices and add only data with consistent labeling results to the local data set, improving the accuracy of later calculations that use the local data set to update the model. Manual labeling is applied to data with inconsistent labeling results, which further ensures that the data added to the local data set are labeled correctly.
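The consistency check above can be sketched as follows. This is illustrative only: the labeler callables stand in for the labeling tool, whose internals the patent does not specify.

```python
def triage_sample(sample, labelers):
    """Label one data item with several labeling functions and decide
    its fate, mirroring the consistency check described above.

    Returns ("accept", label) when every labeler agrees, so the item can
    join the local data set directly; otherwise returns
    ("manual_review", results) to stand in for sending the item to the
    first device for human confirmation.
    """
    results = [labeler(sample) for labeler in labelers]
    if len(set(results)) == 1:
        return "accept", results[0]
    return "manual_review", results
```

Only unanimously labeled items flow into the local data set automatically; disagreements are routed to a person, matching the two branches in the text.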
  • The first local server is further configured to determine performance parameters of the connected at least one edge device when it performs computation using the multiple models stored in the first local server, sort the multiple models according to the performance parameters, and send the sorting information of the multiple models to the cloud server. The cloud server is configured to sort the multiple models according to the sorting information.
  • The first local server thus also has a sorting function. Through the first local server's sorting of the multiple models, the composition of the models provided by the cloud server can be continuously optimized, realizing "survival of the fittest" among the models and improving the performance of subsequent edge devices that compute with these models, thereby further improving the operating efficiency of the entire cloud service system.
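A toy version of producing the sorting information. The patent does not fix the performance metric, so a per-device score where higher is better is assumed.

```python
def rank_models(perf_reports):
    """Sort models best-first by their mean performance parameter across
    the connected edge devices.

    `perf_reports` maps model name -> list of per-device performance
    values. The returned ordering is the "sorting information" the local
    server would send to the cloud server.
    """
    mean_perf = {name: sum(vals) / len(vals) for name, vals in perf_reports.items()}
    return sorted(mean_perf, key=mean_perf.get, reverse=True)
```

The cloud server could then retire consistently bottom-ranked models, which is one way to read the "survival of the fittest" remark.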
  • The cloud server is specifically configured to update the first model jointly according to the first gradient value and the gradient values sent by at least one second local server among the multiple local servers.
  • The cloud service system provided in this embodiment can update the model used by the edge devices through collaborative updating between the cloud server and the local servers, and this collaborative structure can realize federated learning: a federated learning client can be deployed on the local server so that the local server, instead of the terminal devices, updates the model and interacts with the cloud server. This further reduces the calculation performed by the terminal devices, reduces the amount of data exchanged between the edge devices and the server, and improves the operating efficiency of the entire cloud service system.
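The joint update from several local servers can be sketched as a weighted average of their gradient values, in the spirit of federated averaging; the patent names federated learning but not a specific aggregation rule, so this rule is an assumption.

```python
def aggregate_gradients(gradients, sample_counts):
    """Combine gradient values from several local servers into one
    update, weighting each by its local sample count (a common choice
    in federated averaging, assumed here).

    `gradients` is a list of equal-length gradient vectors, one per
    local server.
    """
    total = sum(sample_counts)
    dim = len(gradients[0])
    combined = [0.0] * dim
    for grad, count in zip(gradients, sample_counts):
        for i in range(dim):
            combined[i] += grad[i] * count / total
    return combined
```

A local server holding three times as much edge data contributes three times the weight, so the aggregated update tracks where the data actually is.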
  • A second aspect of the present application provides a model processing method for a cloud service system. After a local server set between the cloud server and the edge devices obtains the data used when the edge devices compute with the model, the local server can update the model and send the updated gradient value of the model to the cloud server; finally, the cloud server updates the model according to the gradient value from the local server.
  • In the model processing method provided by this embodiment, the entire model-update process neither relies entirely on the cloud server for data calculation nor on the edge devices themselves, but uses the computing power provided by the local server to update the model. On the basis of ensuring that the model provided by the cloud server is updated, it also reduces the amount of data exchanged between the edge devices and the server and lowers the computing-power requirements on the cloud server and the edge devices, thereby improving the operating efficiency of the entire cloud service system.
  • The cloud server may also deliver models to the edge devices through the first local server. Specifically, after receiving and storing the multiple models sent by the cloud server, the first local server determines the model corresponding to each edge device in the at least one edge device; for example, after determining at least one model corresponding to the first edge device, it sends the determined model to the first edge device.
  • the first local server also has the functions of model storage and determining different models corresponding to different edge devices, thereby further reducing the calculation required by the cloud server.
  • The cloud server only needs to send the trained models to the local server, and the local server distributes the models to the corresponding edge devices in a more targeted manner. This also makes the model used by the first edge device more suitable, improving the accuracy of the first edge device's model calculation and further improving the operating efficiency of the entire cloud service system.
  • In order to implement the cloud service system, before the first local server starts to acquire the data set of the at least one edge device, the first local server may receive the construction tool and the annotation tool sent by the cloud server; the construction tool is then used to build the first local server, and the annotation tool is used to annotate the data in the data set.
  • Because the cloud server can send the construction tool and the labeling tool to the first local server, the first local server can be built, and its related functions implemented, according to the tools sent by the cloud server. This completes the cloud service system as a whole, so that the operator of the cloud service system can finish the construction and deployment of the first local server through the cloud server.
  • The labeling of the data set by the first local server specifically includes: the first local server uses the labeling tool to label the first data in the data set of the at least one edge device.
  • When the multiple labeling results are the same, the first local server adds the first data to a local data set, which the first local server uses when updating the first model. When the labeling results are not exactly the same, a manual rechecking step is entered: the first local server sends the first data to a first device used by the staff, so that the staff can label the first data manually, and the first data is added to the local data set only after the confirmation information sent by the first device is received.
  • In this way, the first local server can use the labeling tool to label the data in the data set of the edge devices and add only data with consistent labeling results to the local data set, improving the accuracy of later calculations that use the local data set to update the model. Manual labeling is applied to data with inconsistent labeling results, which further ensures that the data added to the local data set are labeled correctly.
  • the first local server also has a function of sorting the models.
  • Specifically, the first local server may determine the performance parameters of the connected at least one edge device when it performs calculations using the multiple models, sort the multiple models according to these performance parameters, and then send the sorting information of the multiple models to the cloud server.
  • This model processing method enables the cloud server, after sorting the multiple models, to continuously optimize the composition of the models it provides, realizing "survival of the fittest" among the models and improving the performance of subsequent edge devices when computing with the models, thereby further improving the operating efficiency of the entire cloud service system.
  • A third aspect of the present application provides a model processing method for a cloud service system, including: after receiving the first gradient value sent by the first local server, the cloud server updates the first model according to the first gradient value and sends the updated first model to the first local server.
  • From the perspective of the cloud server, the model processing method provided in this embodiment only needs the cloud server to cooperate with the local server to update the first model of the edge devices. It neither relies entirely on the cloud server for data calculation nor on the edge devices themselves to update the model, but instead updates the model through the computing power provided by the local server. On the basis of ensuring that the model provided by the cloud server is updated, it also reduces the amount of data exchanged between the edge devices and the server and lowers the computing-power requirements on the cloud server and the edge devices, thereby improving the operating efficiency of the entire cloud service system.
  • Specifically, the cloud server updates the first model jointly according to the first gradient value sent by the first local server and the gradient values sent by at least one second local server.
  • the model processing method of the cloud service system can update the model used by the edge device through the collaborative updating of the cloud server and the local server, and this collaborative updating structure can realize federated learning.
  • The federated learning client can be deployed on the local server, so that the local server, instead of the terminal devices, updates the model and interacts with the cloud server. This further reduces the calculation performed by the terminal devices, reduces the data exchanged between the edge devices and the server, and improves the operating efficiency of the entire cloud service system.
  • The cloud server may send the construction tool and the annotation tool to the first local server, so that the first local server can be built according to the construction tool, and the labeling tool can be used to label the data in the data set.
  • Because the cloud server can send the construction tool and the labeling tool to the first local server, the first local server can be built, and its related functions implemented, according to the tools sent by the cloud server. This completes the cloud service system as a whole, so that the operator of the cloud service system can finish the construction and deployment of the first local server through the cloud server.
  • the first local server also has a function of sorting models.
  • The cloud server can receive the sorting information of the multiple models sent by the first local server. After sorting the multiple models, the cloud server can continuously optimize the composition of the models it provides, realize "survival of the fittest" among the models, and improve the performance of subsequent edge devices when computing with the models, thereby further improving the operating efficiency of the entire cloud service system.
  • A fourth aspect of the present application provides a model processing apparatus for a cloud service system, which can serve as the first local server in each of the embodiments of the first and second aspects of the present application and execute the method performed by the first local server.
  • The apparatus includes: an acquisition module configured to acquire a data set of at least one edge device, the data set including data used when the at least one edge device performs computation using a first model provided by a cloud server; a processing module configured to determine a first gradient value for updating the first model according to the data set of the at least one edge device; and a transmission module configured to send the first gradient value to the cloud server.
  • the transmission module is further configured to receive multiple models sent by the cloud server, and store the multiple models in the storage module; the processing module is further configured to determine at least one model corresponding to a first edge device in the at least one edge device; the transmission module is further configured to send the at least one model to the first edge device.
  • The transmission module is further configured to receive a construction tool and an annotation tool sent by the cloud server, where the construction tool is used for the construction of the first local server and the labeling tool is used to label the data in the data set.
  • the processing module is further configured to, by using the labeling tool, label the first data in the data set of the at least one edge device to obtain multiple labeling results;
  • The processing module is further configured to, when the multiple labeling results are the same, add the first data to a local data set, and the local data set is used to determine the first gradient value for updating the first model; the transmission module is further configured to, when the multiple labeling results are not identical, send the first data to a first device and, after receiving the confirmation information sent by the first device, add the first data to the local data set.
  • the processing module is further configured to determine a performance parameter of the connected at least one edge device when using multiple models stored in the first local server to perform computation, and sort the multiple models according to the performance parameters; the transmission module is further configured to send the sorting information of the multiple models to the cloud server.
  • A fifth aspect of the present application provides a cloud service system model processing apparatus, which can serve as the cloud server in each of the embodiments of the first and third aspects of the present application and execute the method performed by the cloud server.
  • The apparatus includes: a transmission module configured to receive a first gradient value sent by a first local server, where the first gradient value is used to update the first model provided by the cloud server; and a processing module configured to update the first model according to the first gradient value; the transmission module is further configured to send the updated first model to the first local server.
  • The processing module is specifically configured to update the first model jointly according to the first gradient value and the gradient values sent by at least one second local server among the plurality of local servers.
  • The transmission module is further configured to send a construction tool and an annotation tool to the first local server, where the construction tool is used for the construction of the first local server and the labeling tool is used to label the data in the data set.
  • The transmission module is further configured to receive the sorting information of the multiple models sent by the first local server; the processing module is further configured to sort the plurality of models according to the sorting information.
  • an embodiment of the present application provides a computing device, including: a processor and a communication interface.
  • the processor sends data through the communication interface; the processor is configured to implement the method performed by the first local server in the first aspect or the second aspect.
  • The above computing device further includes a memory; the memory is used to store program code, and the processor executes the program code stored in the memory, so that the computing device performs the method performed by the first local server in the above first aspect or second aspect.
  • an embodiment of the present application provides a computing device, including: a processor and a communication interface.
  • the processor sends data through the communication interface; the processor is configured to implement the method executed by the cloud server in the first aspect or the third aspect.
  • The above computing device further includes a memory; the memory is used to store program code, and the processor executes the program code stored in the memory, so that the computing device performs the method performed by the cloud server in the above first aspect or third aspect.
  • FIG. 1 is a schematic diagram of an application scenario of the present application
  • FIG. 2 is a schematic structural diagram of a cloud service system
  • FIG. 3 is a schematic structural diagram of an embodiment of a cloud service system provided by the present application.
  • FIG. 4 is a schematic flowchart of an embodiment of a model processing method of a cloud service system provided by the present application
  • FIG. 5 is a schematic flowchart of synchronous model updating provided by the present application.
  • FIG. 6 is a schematic flowchart of asynchronous model updating provided by the present application.
  • FIG. 7 is a schematic flowchart of an embodiment of a method for processing a cloud service system model provided by the present application.
  • FIG. 8 is a schematic flowchart of an embodiment of a method for processing a cloud service system model provided by the present application
  • FIG. 10 is a schematic flowchart of an embodiment of a model processing method of a cloud service system provided by the present application.
  • FIG. 11 is a schematic structural diagram of another cloud service system provided by the application.
  • FIG. 12 is a schematic structural diagram of an embodiment of a cloud service system model processing apparatus provided by the present application.
  • FIG. 13 is a schematic structural diagram of an embodiment of a cloud service system model processing apparatus provided by the present application.
  • FIG. 14 is a schematic structural diagram of a computing device provided by the present application.
  • FIG. 1 is a schematic diagram of an application scenario of the present application, which can be applied in the field of cloud computing technology.
  • In this scenario, the provider of cloud computing services can set up one or more cloud servers 3 in the Internet 2, and the cloud servers 3 provide cloud computing services.
  • When the terminal device 1 used by a user needs certain software and hardware computing resources, it can directly use, apply to the supplier for, or pay the supplier a certain fee to obtain, the software and hardware resources provided by the cloud server 3. The terminal device 1 thus uses the cloud computing service provided by the provider. Since the computing resources used by the terminal device 1 are provided by the cloud server 3 that the supplier has set up on the network side, this scenario of using network resources for computing can also be called "cloud computing", and the cloud server 3 and the terminal device 1 together can be called a "cloud service system".
  • the terminal device 1 may be an edge device for implementing edge computing.
  • Edge computing means that devices on the side of the cloud service system close to the objects or data sources can provide computing services; that is, in FIG. 1, the terminal device 1 can cooperate with the cloud server 3 to perform edge computing, and a terminal device 1 that performs edge computing may also be referred to as an "edge device".
  • The terminal device 1 can process local data with low delay and then send the processed data to the cloud server 3, so that the terminal device 1 does not need to send all data to the cloud server 3 for calculation. This reduces the computing pressure on the cloud server 3 and improves the operating efficiency of the cloud service system.
  • the training and calculation of the machine learning model is a common edge computing method in cloud service systems.
  • For example, after collecting a large amount of training data, the supplier of the cloud server 3 trains, with the help of a high-performance server, a machine learning model 31 that can be used to identify animal types in images, and delivers the machine learning model 31 to the terminal devices 1 that need to use it.
  • The cloud server 3 can deliver the machine learning model 31 to the three terminal devices 1 labeled 11-13. In this way, the model provided by the cloud server 3 performs its calculations in the terminal devices 1, realizing the edge computing scenario.
  • Since the computing data may change at any time, the computing accuracy of the machine learning model 31 during edge computing may decrease. Therefore, in the above edge computing scenario, after sending the machine learning model 31 to the terminal devices 1, the cloud server 3 can continue to update the machine learning model 31 and send the updated machine learning model 31 to the terminal devices 1, improving the calculation accuracy of the edge computing performed by the terminal devices 1 using the machine learning model 31.
  • In one update technology, each terminal device 1 sends the data it has computed using the machine learning model 31 to the cloud server 3; the cloud server 3 updates the machine learning model 31 according to the data sent by each terminal device 1 and sends the updated machine learning model 31 to each terminal device 1 again.
  • However, this update method depends entirely on the computing capability of the cloud server 3 and adds a large amount of interactive data between the cloud server 3 and the terminal devices 1, increasing bandwidth requirements and thereby reducing the operating efficiency of the entire cloud service system.
  • In addition, some sensitive data processed by the terminal device 1 is sent directly to the cloud server 3, and the security of the data cannot be guaranteed during this process.
  • Moreover, since each terminal device 1 uploads data directly to the cloud server, data sharing cannot be achieved between different terminal devices, resulting in a "data island" problem.
  • FIG. 2 is a schematic structural diagram of a cloud service system.
  • The system shown in FIG. 2 is based on the scenario shown in FIG. 1 and uses a federated learning service to update the machine learning model 31: a federated learning service (FLS) is deployed in the cloud server 3, a federated learning client (FLC) is deployed in each terminal device 1, and all FLCs can connect to the FLS through a front-end proxy server. This structure can also be called an "edge-cloud collaboration" update structure.
  • The FLC deployed on each terminal device 1 can update the machine learning model 31 by itself according to the data used when that terminal device 1 computes with the machine learning model 31, and sends the gradient value obtained from the update to the FLS through the front-end proxy server 4.
  • the FLS can update the machine learning model 31 according to the gradient values sent by the multiple FLCs, and then send the updated machine learning model 31 to each terminal device 1 .
• this update method places higher requirements on the computing capability of the terminal device 1, because the terminal device 1 also needs to calculate the gradient value for updating the machine learning model 31. Many terminal devices 1 have limited computing power and find it difficult to participate directly in the update of the machine learning model 31.
  • the present application provides a model processing method of a cloud service system and a cloud service system.
• the local server can cooperate with the cloud server to update the machine learning model according to the data of at least one connected terminal device, so as to ensure the update of the machine learning model while reducing the requirements on the computing power of the cloud server and the terminal devices and the data interaction between the two, thereby improving the operating efficiency of the entire cloud service system.
  • FIG. 3 is a schematic structural diagram of an embodiment of a cloud service system provided by this application.
  • the cloud service system shown in FIG. 3 includes: a cloud server 3 and a plurality of local servers 5.
• the plurality of local servers 5 in FIG. 3 are respectively connected to the cloud server 3.
  • the local server 5 may also be connected to at least one edge device 1, and the edge device 1 may be a terminal device capable of performing edge computing.
  • one local server 5 may be connected with multiple edge devices 1 as an example in FIG. 3 .
• the local server 5 may be a server located at the site of multiple edge devices.
• for example, if company B located in city A has set up a cloud server in its premises and provides a machine learning model, and company D located in city C uses multiple edge devices, then a local server can be set up in company D, so that the multiple edge devices of company D connect to the local server set up in company D, and at the same time the local server set up in company D connects to company B's cloud server.
  • the cloud server 3 may provide machine learning models to edge devices 1 that require machine learning models, and the number of machine learning models provided by the cloud server 3 to each edge device 1 may be one or more.
• the cloud server 3 may, after training multiple machine learning models, send the multiple machine learning models to the connected local server 5, and the local server 5 sends the machine learning models to the corresponding edge devices 1. For example, after machine learning models for realizing different functions are trained in the cloud server 3, assuming that at least one edge device connected to the local server 5 needs to use a machine learning model for recognizing animal types, the cloud server 3 sends the multiple machine learning models for recognizing animal types to the local server 5, and the local server 5 then sends the multiple machine learning models to the connected edge devices.
  • the edge devices that receive the machine learning models can perform edge computing for animal category recognition according to the machine learning models.
  • the local server 5 can play the role of a gateway.
• on the basis of realizing the above-mentioned edge computing, in the cloud service system shown in FIG. 3 provided by this application, the machine learning model provided by the cloud server can also be updated by the local server 5 and the cloud server 3 in coordination.
• taking a local server connected to the cloud server (referred to as the first local server) and an edge device connected to the first local server (referred to as the first edge device) as examples, the following describes, in the model processing method of the cloud service system provided by this embodiment, the process in which the first local server and the cloud server cooperate to update the machine learning model used by the first edge device.
  • FIG. 4 is a schematic flowchart of an embodiment of a model processing method for a cloud service system provided by the present application.
• the method shown in FIG. 4 can be applied to the cloud service system shown in FIG. 3, and is executed by the cloud server 3 and any one of the local servers 5 connected to the cloud server 3, where the local server is also connected to at least one edge device 1.
  • the first local server obtains a data set of at least one connected edge device.
• the model processing method provided in this embodiment assumes that the cloud server has already sent the machine learning model to the local server, and that the local server has sent the machine learning model to the edge devices for use. Then, in S101, in order to update the machine learning model, all edge devices connected to the first local server send the data used when performing calculation with the machine learning model to the first local server.
• there may be one or more machine learning models used by each edge device, and any one of them is taken as the first model for description. Then, in S101, the data used when each edge device performs calculation with the first model is recorded as a data set, and the first local server receives the data sets sent by the connected edge devices.
• for example, the cloud server sends the first model, which recognizes the animal category in an image as cat or dog, to the first local server, and after the first local server sends the first model to the two edge devices connected to it, a data set sent by one edge device may be received, including two images of cats used when calculating with the first model, and a data set sent by the other edge device may be received, including three images of dogs used when calculating with the first model.
  • the first local server calculates, according to the data set of at least one edge device obtained in S101, a first gradient value for updating the first model.
• the first local server provided in this embodiment not only provides the function of a gateway, but also has the ability to obtain parameters for model updating. After obtaining a certain number of data sets, the first local server can calculate parameters such as the gradient value used to update the first model. Because the cloud server does not participate in this calculation and the first model itself is not directly updated, this calculation can be called a "local update".
• continuing the above example, after obtaining the two images of cats and the three images of dogs, the first local server can use these five images to locally update the first model to obtain the first gradient value. Assuming that a parameter in the first model is 2, and the parameter after the first local server locally updates the first model is 2.1, the first gradient value is the change between the two, i.e., 0.1.
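The local update in S102 can be sketched as follows. This is an illustrative assumption rather than the patent's actual training procedure: `local_update` and its toy single-parameter rule are hypothetical stand-ins, and a real first local server would run a training pass of the first model over the images.

```python
# Hypothetical sketch of the "local update" in S102: the first local server
# trains on its local data set and reports only the parameter change (the
# first gradient value), never the updated model itself. local_update() and
# the single-parameter model are illustrative stand-ins for real training.

def local_update(parameter, data_set):
    # Toy training rule chosen to mirror the example: with five images the
    # parameter moves from 2 to 2.1, so the gradient value is 0.1.
    updated_parameter = parameter + 0.02 * len(data_set)
    gradient_value = updated_parameter - parameter
    return gradient_value

# The two images of cats plus three images of dogs from the example above.
images = ["cat_1", "cat_2", "dog_1", "dog_2", "dog_3"]
first_gradient_value = local_update(2.0, images)  # 0.1, as in the example
```

Note that only `first_gradient_value` would be sent to the cloud server; the local parameter itself is discarded, matching the point that the first local server does not update the first model.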
• after obtaining the first gradient value of the first model in S102, the first local server sends the obtained first gradient value to the cloud server in S103, and correspondingly, the cloud server receives the first gradient value sent by the first local server.
  • the calculation performed by the first local server only obtains parameters for updating the first model, but does not update the first model.
  • the cloud server updates the first model according to the first gradient value.
• although the first local server does not actually complete the update of the first model, the first local server participates in the cloud server's update of the first model (by calculating the first gradient value for updating the first model), so this process can also be called a "co-update" of the first model by the cloud server and the local server.
  • S104 The cloud server updates the first model according to the first gradient value sent by the first local server.
  • the cloud server may update the first model in cooperation with the local server in a synchronous update or asynchronous update manner.
• FIG. 5 is a schematic flowchart of synchronous model update provided by the present application, which can be applied to the cloud service system shown in FIG. 3, where all local servers connected to the cloud server other than the first local server in the above example are recorded as second local servers.
• after the cloud server trains and obtains the first model, it first sends the first model to all local servers, and then, in actual use, each local server calculates the gradient value for updating the first model through the steps of S101-S103.
  • the first local server calculates and obtains the first gradient value and sends it to the cloud server
  • the second local server 1 calculates and obtains the second gradient value 1 and sends it to the cloud server . . .
• all local servers can calculate the gradient value of the first model according to their respective data, and then send the gradient values to the cloud server at the same time. Then, for the cloud server, after simultaneously receiving the gradient values for updating the first model sent by multiple local servers, gradient aggregation can be performed on all the gradient values to finally update the first model.
• continuing the above example, if the first gradient value and a second gradient value are both 0.1, the cloud server can add these gradient values to the parameter 2 to obtain an updated parameter of 2.2. After updating the first model, the cloud server can send the first model to all local servers again, and the process shown in FIG. 5 can continue to be executed cyclically.
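The synchronous aggregation can be sketched as below. Additive aggregation is assumed only because it matches the 2 + 0.1 + 0.1 = 2.2 example; a real FLS might average or weight the gradients, and `synchronous_update` is a hypothetical name.

```python
# Hypothetical sketch of the synchronous update: the cloud server waits for
# the gradient values from all local servers, aggregates them, and applies
# the result once. Additive aggregation mirrors the numerical example.

def synchronous_update(parameter, gradient_values):
    # Gradient aggregation over all local servers' contributions.
    return parameter + sum(gradient_values)

# First gradient value 0.1 and a second gradient value 0.1, as above.
updated_parameter = synchronous_update(2.0, [0.1, 0.1])  # 2.2
```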
• FIG. 6 is a schematic flowchart of asynchronous model update provided by the present application, where the execution bodies are the same as those in FIG. 5.
• after the cloud server trains and obtains the first model, it first sends the first model to all local servers; then, in actual use, after each local server calculates the gradient value for updating the first model through the steps of S101-S103, each local server sends its gradient value to the cloud server separately.
  • the first local server calculates the first gradient value and sends it to the cloud server.
  • the cloud server can update the first model and return the updated first model to the first local server.
• for example, the cloud server updates the first model according to the first gradient value, and then updates the first model according to the second gradient value 1 and returns the updated first model to the second local server 1... When the cloud server has received the gradient values sent by all the local servers and has updated the model accordingly, the entire asynchronous update process is completed, and the process shown in FIG. 6 can continue to be executed cyclically.
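The asynchronous variant can be sketched as follows; `CloudServer` and `on_gradient` are illustrative names, and the single-parameter model is an assumption carried over from the earlier example.

```python
# Hypothetical sketch of the asynchronous update: each gradient value is
# applied as soon as it arrives, and the updated model is returned only to
# the local server that sent it, rather than waiting for all gradients.

class CloudServer:
    def __init__(self, parameter):
        self.parameter = parameter  # single-parameter stand-in for the first model

    def on_gradient(self, sender, gradient_value):
        # Update immediately, without waiting for the other local servers.
        self.parameter += gradient_value
        # Return the updated model to the sending local server only.
        return sender, self.parameter

cloud = CloudServer(2.0)
cloud.on_gradient("first_local_server", 0.1)     # model parameter becomes 2.1
cloud.on_gradient("second_local_server_1", 0.1)  # model parameter becomes 2.2
```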
  • the cloud server sends the updated first model to the first local server.
• for the first local server, the updated first model sent by the cloud server is received and sent to the corresponding edge devices, so that these edge devices can subsequently use the updated first model for calculation.
  • the corresponding edge device may be an edge device that needs to use the first model, or an edge device that already includes the first model but needs to update the first model.
• in the model processing method of the cloud service system, after the local server set between the cloud server and the edge devices obtains the data used when the edge devices perform calculation with the model, the local server can locally update the model and send the resulting gradient value to the cloud server, and the cloud server finally updates the model according to the gradient value from the local server. Because the model is updated with the computing power provided by the local server, the cloud server can still provide an updated model while the amount of data interaction between the edge devices and the servers is reduced, and the computing power requirements on the cloud server and the edge devices are also reduced, thereby improving the operating efficiency of the entire cloud service system.
• more specifically, the FLC can be deployed in the first local server and the FLS can be deployed in the cloud server.
  • the first local server can be used instead of the edge device to implement the method shown in FIG. 2 .
• because the computing capability of the first local server provided in this embodiment may be greater than that of the edge device, and the edge device no longer needs to update the model, compared with the federated learning update technology of deploying the FLC in the edge device as shown in FIG. 2, this arrangement can also reduce the computing power requirements on the edge devices, thereby improving the operating efficiency of the cloud service system.
• FIG. 7 is a schematic flowchart of an embodiment of a model processing method for a cloud service system provided by the present application. The method shown in FIG. 7 can be applied to the cloud service system shown in FIG. 3, and is executed before S101 in the embodiment shown in FIG. 4.
  • the cloud server pre-trains multiple models.
  • the cloud server can obtain multiple machine learning models according to the training data set provided by the supplier. For example, after the supplier collects images of different animals and labels the images of cats and dogs, the model trained by the cloud server can be used to identify whether the animals in the images are cats or dogs.
  • the cloud server sends the multiple models trained in S201 to the first local server. Then, for the first local server, multiple models sent by the cloud server are received.
  • the first local server determines at least one model corresponding to the first edge device.
• the multiple models pre-trained by the cloud server in this embodiment may all be sent to the first local server, or only some of them may be sent. Then, after receiving the multiple models, the first local server determines at least one model corresponding to each connected edge device. Denote any edge device connected to the first local server as the first edge device; the first local server can determine the model corresponding to the first edge device according to the computing power of the first edge device, its calculation requirements, the supported model types, etc. For example, if there are multiple models that recognize the animal category in an image as cat or dog, and the sizes of the models differ, then when the computing performance of the first edge device is good, it can be determined that the first edge device corresponds to a larger model, and when the computing performance of the first edge device is poor, it can be determined that the first edge device corresponds to a smaller model.
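The model-selection rule above can be sketched as follows. The model table, the size/capability units, and the function name `select_model` are all illustrative assumptions; the patent does not specify how computing power is measured.

```python
# Hypothetical sketch of S203: the first local server picks, for the first
# edge device, the largest model the device can handle ("good computing
# performance -> larger model"). Sizes and capabilities share toy units.

def select_model(models, device_capability):
    """models maps model name -> model size; capability is in the same units."""
    feasible = {name: size for name, size in models.items()
                if size <= device_capability}
    if not feasible:
        # Device weaker than every model: fall back to the smallest one.
        return min(models, key=models.get)
    return max(feasible, key=feasible.get)

cat_dog_models = {"cat_dog_large": 100, "cat_dog_small": 10}
select_model(cat_dog_models, device_capability=120)  # 'cat_dog_large'
select_model(cat_dog_models, device_capability=50)   # 'cat_dog_small'
```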
  • the first local server sends the at least one model determined in S204 to the first edge device.
  • the first local server may determine a model corresponding to each edge device to which it is connected, and send the corresponding model to each edge device respectively. Meanwhile, for the first edge device, after receiving the model, the model can be used for calculation. It can be understood that, the at least one model sent by the first local server to the first edge device includes the first model in the foregoing embodiment.
• in this embodiment, the first local server also has the functions of storing the models and determining the model corresponding to each edge device, thereby further reducing the calculation required of the cloud server: the cloud server only needs to send the trained models to the local server, and the local server delivers the models to the corresponding edge devices in a more targeted manner. This can also make the model used by the first edge device more suitable, improving the accuracy of the first edge device's model-based calculation and further improving the operating efficiency of the entire cloud service system.
  • FIG. 8 is a schematic flowchart of an embodiment of a cloud service system model processing method provided by the present application.
  • the embodiment shown in FIG. 8 shows the construction process of the cloud service system shown in FIG. 3 .
  • the cloud server firstly constructs the functions on the cloud server side.
  • the cloud server may deploy a federated learning server.
  • the first local server sends request information to the cloud server, requesting to build the first local server.
  • S303 The cloud server performs authentication and registration on the first local server according to the request information.
  • the cloud server sends the construction tool and the annotation tool to the first local server.
  • the construction tool is used for building the first local server, and the labeling tool is used to label the data in the dataset.
  • the first local server constructs the functions on the side of the first local server according to the received construction tool.
  • the first local server may deploy a federated learning client.
• after receiving the labeling tool, the first local server can label the data, and update the local data set through S307.
• FIG. 9 is a schematic flowchart of data labeling provided by this embodiment of the application. After the first local server receives the data set sent by the connected at least one edge device, the data in the data set can be labeled; the data being labeled by the first local server is recorded as the first data.
• specifically, the first local server can use the labeling tool to label the first data to obtain multiple labeling results.
• the labeling tool may be multiple pre-trained models, for example, multiple models trained on the cloud server for classifying the animal category in an image as cat or dog. The first data is an image of a cat or dog, and each pre-trained model can label the first data to obtain a result of cat or dog.
• the first local server can compare the results of the multiple pre-trained models. When the multiple labeling results of the multiple models are the same, the first local server adds the first data to the local data set, and the local data set is subsequently used by the first local server when updating the first model. When the multiple labeling results of the multiple models are not identical, a manual recheck step is entered: the first local server can send the first data to a first device used by the staff, let the staff manually label the first data, and then add the first data to the local data set after receiving the confirmation information sent by the staff through the first device. In addition, if the staff member believes that the sample is abnormal, the first local server may delete the first data without subsequent processing after receiving the abnormality information sent by the staff member through the first device.
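The labeling flow of FIG. 9 can be sketched as below. `label_first_data` and the callables passed to it are illustrative assumptions; real pre-trained models and the manual-recheck channel through the first device would replace them.

```python
# Hypothetical sketch of the labeling flow in FIG. 9: every pre-trained
# model labels the first data; a unanimous result goes straight into the
# local data set, otherwise the sample is routed to a manual recheck.

def label_first_data(pretrained_models, first_data, manual_recheck):
    results = [model(first_data) for model in pretrained_models]
    if len(set(results)) == 1:
        return results[0]  # identical results: add to the local data set
    # Results differ: the staff labels the sample via the first device;
    # manual_recheck may return None when the sample is flagged as abnormal,
    # in which case the first data would be deleted.
    return manual_recheck(first_data)

cat_model_a = lambda image: "cat"
cat_model_b = lambda image: "cat"
label_first_data([cat_model_a, cat_model_b], "image_1", lambda image: "dog")
# unanimous, so the result is "cat" without any manual recheck
```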
  • the local data set is stored in the first local server, and cannot be accessed by other local servers, but can be accessed by at least one edge device connected to the first local server. Therefore, at least one edge device can realize data sharing through the first local server, and at the same time, the security of the data uploaded to the local server can also be ensured.
• for example, all edge devices of a company can be connected to one local server; then the data processed by all edge devices in the company can be added to the local data set through the above process, the local server can use the data in the local data set when updating each model, and other companies have no access to this company's data.
• when the local server updates the model, it only sends the resulting gradient value to the cloud server, and the data used is never sent over the network, thus further ensuring the security of the data.
  • FIG. 10 is a schematic flowchart of an embodiment of a model processing method for a cloud service system provided by the present application, which can be applied to the cloud service system shown in FIG. 3 .
  • the cloud server sends multiple pre-trained models to the first local server.
  • the cloud server may send all the multiple pre-trained models to the first local server, or the cloud server may send multiple models to be used by the edge device connected to the first local server to the first local server.
  • the first local server determines the performance parameters of the connected at least one edge device when the multiple models are used for calculation.
  • the performance parameter may be calculation accuracy or calculation speed.
• the first local server counts the performance parameters of all edge devices using different models. For example, the first local server is connected to edge devices 1-5; the average time for edge devices 1-3 to calculate a result using model a is 0.1 seconds, the average time for edge devices 2-5 to calculate a result using model b is 0.2 seconds, and so on.
• the first local server sorts the multiple models according to the performance parameters determined for them. For example, if the computing time of the connected edge devices using model a is 0.1 seconds and the computing time using model b is 0.2 seconds, the first local server can sort the multiple models in order of computing speed from fast to slow, for example: a, b, ....
  • the first local server sends the ranking information of the multiple models determined in S402 to the cloud server.
• the cloud server sorts the multiple models according to the sorting information. The cloud server can finally sort all the models it provides according to the sorting information sent by all connected local servers, and after sorting, some lower-ranked models can be deleted and replaced with other models. After that, the cloud server may repeat the step of sending the updated multiple models to the local server. At this time, since the multiple models are arranged in order, assuming that the edge device needs two models for recognizing animal categories in images, the cloud server can send the top two updated models for recognizing animal categories in images to the local server, which the local server then sends to the edge device. This ensures that the models used by the edge device are the best ranked, that is, those with better computing performance.
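The ranking and pruning described above can be sketched as follows. `rank_models` and `prune_models` are hypothetical names, and ranking purely by average computation time is an assumption based on the 0.1 s / 0.2 s example; accuracy could serve as the performance parameter instead.

```python
# Hypothetical sketch of the ranking step: models are sorted by the average
# computation time reported by the local servers (faster first), and the
# lower-ranked models are dropped so the cloud server can replace them.

def rank_models(average_times):
    """average_times maps model name -> average computation time in seconds."""
    return sorted(average_times, key=average_times.get)

def prune_models(ranking, keep):
    # "Survival of the fittest": keep only the top-ranked models.
    return ranking[:keep]

ranking = rank_models({"a": 0.1, "b": 0.2, "c": 0.3})  # ['a', 'b', 'c']
top_models = prune_models(ranking, keep=2)             # ['a', 'b']
```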
• in this way, the local server can sort the models according to the performance parameters of the connected edge devices when using them, and send the sorting information to the cloud server, and the cloud server can sort the multiple models accordingly. The composition of the models provided by the cloud server can thus be continuously optimized, realizing "survival of the fittest" among the models and improving the performance of edge devices when subsequently using the models for calculation, thereby further improving the operating efficiency of the entire cloud service system.
• the cloud service system shown in FIG. 3 takes the cloud server connecting multiple local servers as an example.
  • the cloud server may also be directly connected to edge devices to realize hybrid deployment.
• FIG. 11 is a schematic structural diagram of another cloud service system provided by the present application. On the basis of the embodiment shown in FIG. 3, the cloud server 3 can also be directly connected to an edge device 1, for example the edge device labeled 6 in the figure. The local server 5 can cooperate with the cloud server 3 to perform the model update processing in the foregoing embodiments of this application, while the edge device 6 directly connected to the cloud server 3 may not participate in the model update.
  • the cloud service system provided in this embodiment has strong deployment flexibility, and can reduce the number of local servers in the cloud service system to a certain extent.
  • the cloud server and the first local server serving as executive bodies may include hardware structures and/or software modules, and implement the above functions in the form of hardware structures, software modules, or hardware structures plus software modules. Whether one of the above functions is performed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • FIG. 12 is a schematic structural diagram of an embodiment of a cloud service system model processing device provided by the present application.
• the apparatus shown in FIG. 12 can be used as the first local server in the foregoing embodiments of the present application, and execute the method performed by the first local server.
  • the apparatus 120 shown in FIG. 12 includes: an acquisition module 1201 , a processing module 1202 and a transmission module 1203 .
  • the acquisition module 1201 is used to acquire a data set of at least one edge device, and the data set includes data used when the at least one edge device uses the first model provided by the cloud server to perform calculation;
• the processing module 1202 is configured to determine, according to the data set of the at least one edge device, the first gradient value for updating the first model;
  • the transmission module 1203 is configured to send the first gradient value to the cloud server.
  • the transmission module 1203 is further configured to receive multiple models sent by the cloud server, and store the multiple models in the storage module; the processing module 1202 is further configured to determine the corresponding model of the first edge device in the at least one edge device. at least one model; the transmission module 1203 is further configured to send the at least one model to the first edge device.
  • the transmission module 1203 is further configured to receive the construction tool and the labeling tool sent by the cloud server; wherein the building tool is used for building the first local server, and the labeling tool is used to label the data in the dataset.
  • the processing module 1202 is further configured to, through an annotation tool, annotate the first data in the data set of at least one edge device to obtain multiple annotation results; and when the multiple annotation results are the same, the first local server will The first data is added to the local data set, and the local data set is used to determine the first gradient value used to update the first model; the transmission module 1203 is also used to send to the first device when the multiple annotation results are not identical the first data, and after receiving the confirmation information sent by the first device, the first data is added to the local data set.
  • the processing module 1202 is further configured to determine performance parameters of the connected at least one edge device when using multiple models stored in the first local server for calculation, and sort the multiple models according to the performance parameters; transmitting The module is also used to send the ranking information of multiple models to the cloud server.
  • FIG. 13 is a schematic structural diagram of an embodiment of a cloud service system model processing apparatus provided by the present application.
  • the apparatus shown in FIG. 13 can be used as the cloud server in the foregoing embodiments of the present application, and execute the method executed by the cloud server.
  • the apparatus 130 shown in FIG. 13 includes: a transmission module 1301 and a processing module 1302 .
• the transmission module 1301 is configured to receive the first gradient value sent by the first local server, where the first gradient value is used to update the first model provided by the cloud server; the processing module 1302 is configured to update the first model according to the first gradient value; the transmission module is further configured to send the updated first model to the first local server.
  • the processing module 1302 is specifically configured to update the first model according to the first gradient value and the gradient value sent by at least one second local server among the multiple local servers.
  • the transmission module 1301 is further configured to send the construction tool and the labeling tool to the first local server; wherein, the building tool is used for the construction of the first local server, and the labeling tool is used to label the data in the dataset.
  • the transmission module 1301 is further configured to receive the sorting information of the multiple models sent by the first local server; the processing module is further configured to sort the multiple models according to the sorting information of the multiple models.
  • each module of the above apparatus is only a division of logical functions, and may be fully or partially integrated into a physical entity in actual implementation, or may be physically separated.
  • these modules can all be implemented in the form of software calling through processing elements; they can also all be implemented in hardware; some modules can also be implemented in the form of calling software through processing elements, and some modules can be implemented in hardware.
• the processing module may be a separately established processing element, or may be integrated into a certain chip of the above-mentioned apparatus; in addition, it may also be stored in the memory of the above-mentioned apparatus in the form of program code and be invoked by a certain processing element of the above-mentioned apparatus to execute the functions of the above modules.
  • each step of the above-mentioned method or each of the above-mentioned modules can be completed by an integrated logic circuit of hardware in the processor element or an instruction in the form of software.
• the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA), etc.
  • the processing element may be a general-purpose processor, such as a central processing unit (central processing unit, CPU) or other processors that can call program codes.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).
• in the above-mentioned embodiments, the methods may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
• when implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
• the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid-state drives (SSDs)), and the like.
  • FIG. 14 is a schematic structural diagram of the computing device provided by the present application, as shown in FIG. 14 ,
  • the computing device 1400 may include a communication interface 1410 and a processor 1420 .
  • the computing device 1400 may further include a memory 1430 .
  • the memory 1430 may be provided inside the computing device, and may also be provided outside the computing device.
  • each action performed by the first local server in the foregoing FIG. 4 to FIG. 10 may be implemented by the processor 1420.
  • the processor 1420 sends data through the communication interface 1410, and is used to implement any method performed by the first local server described in FIG. 4 to FIG. 10.
  • each step of the processing flow can be completed by the integrated logic circuit of hardware in the processor 1420 or by instructions in the form of software, so as to perform the method executed by the first local server in FIGS. 4-10.
  • the program codes executed by the processor 1420 for implementing the above method may be stored in the memory 1430 .
  • the memory 1430 is connected to the processor 1420, such as a coupling connection and the like.
  • each action performed by the cloud server in FIG. 4 to FIG. 10 above may be implemented by the processor 1420.
  • the processor 1420 sends control signals and communication data through the communication interface 1410, and is used to implement any method performed by the cloud server described in FIG. 4 to FIG. 10.
  • the steps of the processing flow can be completed by the integrated logic circuit of hardware in the processor 1420 or by instructions in the form of software, so as to perform the method executed by the cloud server in FIGS. 4-10.
  • the program codes executed by the processor 1420 for implementing the above method may be stored in the memory 1430 .
  • the memory 1430 is connected to the processor 1420, such as a coupling connection and the like.
  • Some features of the embodiments of the present application may be implemented/supported by the processor 1420 executing program instructions or software codes in the memory 1430 .
  • the software components loaded in the memory 1430 can be summarized in terms of function or logic, for example, the acquisition module 1201, the processing module 1202, and the transmission module 1203 shown in FIG. 12; or the transmission module 1301 and the processing module 1302 shown in FIG. 13.
  • any communication interface involved in the embodiments of this application may be a circuit, a bus, a transceiver, or any other device that can be used for information interaction.
  • for the communication interface 1410 in the computing device 1400, for example, the other apparatus may be a device connected to the computing device; for example, when the computing device is the first local server, the other device may be the cloud server; when the computing device is the cloud server, the other device may be the first local server.
  • the processors involved in the embodiments of the present application may be general-purpose processors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, modules or modules, which may be in electrical, mechanical or other forms, and is used for information exchange between devices, modules or modules.
  • the processor may cooperate with the memory.
  • the memory can be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • the memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • connection medium among the above-mentioned communication interface, processor, and memory is not limited in the embodiments of the present application.
  • the memory, the processor and the communication interface can be connected by a bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like.
  • the connection bus between the processor and the memory is not the connection bus between the aforementioned cloud server and the first local server.
  • “at least one” refers to one or more, and "a plurality” refers to two or more.
  • “And/or”, which describes the association relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, which can indicate: the existence of A alone, the existence of A and B at the same time, and the existence of B alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the related objects before and after are an “or” relationship; in the formula, the character “/” indicates that the related objects are a “division” relationship.
  • “At least one item(s) below” or similar expressions thereof refer to any combination of these items, including any combination of single item(s) or plural items(s).
  • at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.


Abstract

A model processing method for a cloud service system, and a cloud service system. A local server (5) deployed between a cloud server (3) and edge devices (1) obtains the data used by the edge devices (1) when computing with a model, performs the update through the local server (5), and sends the gradient values of the updated model to the cloud server (3), which finally updates the model according to the gradient values from the local server (5). Throughout the model update process, on the basis of ensuring that the model provided by the cloud server (3) is updated, the volume of data exchanged between the edge devices (1) and the servers is reduced, and the computing-power requirements on the cloud server (3) and the edge devices (1) are lowered, thereby improving the operating efficiency of the whole cloud service system.

Description

Model processing method for cloud service system and cloud service system
This application claims priority to Chinese Patent Application No. 202010699825.2, filed with the China National Intellectual Property Administration on July 17, 2020 and entitled "Model Processing Method for Cloud Service System and Cloud Service System", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of cloud computing technologies, and in particular, to a model processing method for a cloud service system and a cloud service system.
Background
Edge computing is a specific implementation of cloud computing. Under a cloud server architecture, the cloud server can provide computing tools such as machine learning models to terminal devices, and the edge devices perform edge computing with the machine learning models provided by the cloud server. This computing mode effectively reduces the computation load on the cloud server, thereby improving the operating efficiency of the whole cloud service system.
To guarantee computation accuracy, the provider needs to continuously update the machine learning models provided by the cloud server. In one update technique, the latest computation data used by all terminal devices is sent to the cloud server, which updates the machine learning model according to this data; however, this increases the computation load on the cloud server and reduces the operating efficiency of the whole cloud service system. In another update technique, the terminal devices and the cloud server update the machine learning model through federated learning: a federated learning client can be deployed on each terminal device, which updates the machine learning model according to its own computation data and sends the updated gradient values to the cloud server, and a federated learning server can be deployed on the cloud server, which updates the machine learning model according to the gradient values received from the terminal devices. However, this increases the computation load on the terminal devices, and since the computing power of most current terminal devices is insufficient, it likewise affects the overall operation of the cloud service system.
Therefore, how to update the machine learning models provided by the cloud server in a cloud service system without affecting the overall operation of the cloud service system is a technical problem urgently requiring a solution in the art.
Summary
This application provides a model processing method for a cloud service system and a cloud service system, to solve the technical problem in existing cloud service systems of how to update a machine learning model without affecting the overall operating efficiency of the cloud server.
A first aspect of this application provides a cloud service system, including: a cloud server and a plurality of local servers, where a first local server among the plurality of local servers is connected to the cloud server through a network, and the first local server is further connected to at least one edge device. The first local server is configured to: obtain a dataset of the at least one edge device, where the dataset includes data used by the at least one edge device when computing with a first model provided by the cloud server; determine, according to the dataset of the at least one edge device, a first gradient value for updating the first model; and send the first gradient value to the cloud server. The cloud server is configured to: update the first model according to the first gradient value, and send the updated first model to the first local server.
In summary, in the cloud service system provided by this embodiment, the whole model update process neither relies entirely on the cloud server for data computation nor on the edge devices themselves for model updating; instead, the model is updated with the computing power provided by the local server. On the basis of ensuring that the model provided by the cloud server is updated, this reduces the volume of data exchanged between the edge devices and the servers, lowers the computing-power requirements on both the cloud server and the edge devices, and thereby improves the operating efficiency of the whole cloud service system.
In an embodiment of the first aspect of this application, the cloud server is further configured to send a plurality of models to the first local server; the first local server is further configured to: receive and store the plurality of models sent by the cloud server; determine at least one model corresponding to a first edge device among the at least one edge device; and send the at least one model to the first edge device.
In summary, in the cloud service system provided by this embodiment, the first local server further has the functions of storing models and determining the different models corresponding to different edge devices, which further reduces the computation required of the cloud server: the cloud server only needs to send the trained models to the local server, and the local server delivers the models to the corresponding edge devices in a more targeted manner. This also makes the model used by the first edge device more accurate, improves the precision of the first edge device's computations with the model, and thereby further improves the operating efficiency of the whole cloud service system.
In an embodiment of the first aspect of this application, the cloud server is further configured to send a construction tool and a labeling tool to the first local server, where the construction tool is used to set up the first local server, and the labeling tool is used to label the data in the dataset.
In summary, in the cloud service system provided by this embodiment, the cloud server can send the construction tool and the labeling tool to the first local server, so that the first local server can be built and the related functions implemented according to the tools sent by the cloud server. This completes the implementation of the whole cloud service system, enabling the operator of the cloud service system to finish the construction and deployment of the first local server through the cloud server.
In an embodiment of the first aspect of this application, the first local server is further configured to: label, using the labeling tool, first data in the dataset of the at least one edge device to obtain a plurality of labeling results; when the plurality of labeling results are all the same, add the first data to a local dataset, where the local dataset is used to determine the first gradient value for updating the first model; and when the plurality of labeling results are not all the same, send the first data to a first device, and add the first data to the local dataset after receiving confirmation information sent by the first device.
In summary, with the model processing method of the cloud service system provided by this embodiment, the first local server can label the data in the edge devices' datasets using the labeling tool, and only data whose labeling results agree is added to the local dataset, which improves the accuracy of the computations when the data in the local dataset is subsequently used to update the model. Data whose labeling results do not fully agree is labeled manually, further ensuring that the data added to the local dataset is labeled correctly.
In an embodiment of the first aspect of this application, the first local server is further configured to: determine performance parameters of the connected at least one edge device when computing with the plurality of models stored on the first local server, and rank the plurality of models according to the performance parameters; and send ranking information of the plurality of models to the cloud server. The cloud server is configured to rank the plurality of models according to the ranking information of the plurality of models.
In summary, in the cloud service system provided by this embodiment, the first local server further has a ranking function. Through the first local server's ranking of the plurality of models, the composition of the models provided by the cloud server can be continuously optimized, achieving "survival of the fittest" among the models and improving the performance of subsequent edge-device computations with the models, thereby further improving the operating efficiency of the whole cloud service system.
In an embodiment of the first aspect of this application, the cloud server is specifically configured to update the first model according to the first gradient value and gradient values sent by at least one second local server among the plurality of local servers.
In summary, the cloud service system provided by this embodiment can update the model used by the edge devices through collaborative updating between the cloud server and the local servers. This collaborative update structure can implement federated learning: a federated learning client can be deployed on the local server, so that the local server replaces the terminal devices in updating the model and interacting with the cloud server, which further reduces the computation performed by the terminal devices and the volume of data exchanged between the edge devices and the servers, thereby improving the operating efficiency of the whole cloud service system.
A second aspect of this application provides a model processing method for a cloud service system: a local server deployed between the cloud server and the edge devices obtains the data used by the edge devices when computing with a model, updates the model through the local server, and sends the gradient values of the updated model to the cloud server, which finally updates the model according to the gradient values from the local server.
In summary, in the model processing method for a cloud service system provided by this embodiment, the whole model update process neither relies entirely on the cloud server for data computation nor on the edge devices themselves for model updating; instead, the model is updated with the computing power provided by the local server. On the basis of ensuring that the model provided by the cloud server is updated, this reduces the volume of data exchanged between the edge devices and the servers, lowers the computing-power requirements on both the cloud server and the edge devices, and thereby improves the operating efficiency of the whole cloud service system.
In an embodiment of the second aspect of this application, before the first local server obtains the dataset of the at least one edge device, the cloud server may deliver models to the edge devices through the first local server. Specifically, after receiving and storing the plurality of models sent by the cloud server, the first local server determines the model corresponding to each of the at least one edge device; for example, after determining at least one model corresponding to a first edge device, it sends the determined model to the first edge device.
In summary, in the model processing method for a cloud service system provided by this embodiment, the first local server further has the functions of storing models and determining the different models corresponding to different edge devices, which further reduces the computation required of the cloud server: the cloud server only needs to send the trained models to the local server, and the local server delivers the models to the corresponding edge devices in a more targeted manner. This also makes the model used by the first edge device more accurate, improves the precision of the first edge device's computations with the model, and thereby further improves the operating efficiency of the whole cloud service system.
In an embodiment of the second aspect of this application, to implement the cloud service system, before the first local server starts to obtain the dataset of the at least one edge device, the first local server may receive the construction tool and the labeling tool sent by the cloud server, use the construction tool to set up the first local server, and use the labeling tool to label the data in the dataset.
In summary, in the model processing method for a cloud service system provided by this embodiment, the cloud server can send the construction tool and the labeling tool to the first local server, so that the first local server can be built and the related functions implemented according to the tools sent by the cloud server. This completes the implementation of the whole cloud service system, enabling the operator of the cloud service system to finish the construction and deployment of the first local server through the cloud server.
In an embodiment of the second aspect of this application, the first local server's labeling of the dataset specifically includes: the first local server labels, using the labeling tool, first data in the dataset of the at least one edge device; when the plurality of labeling results from the plurality of models are all the same, the first local server adds the first data to the local dataset, which is used subsequently by the first local server when updating the first model; when the plurality of labeling results are not all the same, a manual re-check step is entered, in which the first local server can send the first data to a first device used by staff, have the staff label the first data manually, and add the first data to the local dataset only after receiving confirmation information sent by the first device used by the staff.
In summary, with the model processing method of the cloud service system provided by this embodiment, the first local server can label the data in the edge devices' datasets using the labeling tool, and only data whose labeling results agree is added to the local dataset, which improves the accuracy of the computations when the data in the local dataset is subsequently used to update the model. Data whose labeling results do not fully agree is labeled manually, further ensuring that the data added to the local dataset is labeled correctly.
In an embodiment of the second aspect of this application, the first local server further has a model ranking function. Specifically, the first local server can determine performance parameters of the connected at least one edge device when computing with the plurality of models, rank the plurality of models according to the performance parameters, and send ranking information of the plurality of models to the cloud server.
In summary, the model processing method for a cloud service system provided by this embodiment allows the cloud server, after ranking the plurality of models, to continuously optimize the composition of the models it provides, achieving "survival of the fittest" among the models and improving the performance of subsequent edge-device computations with the models, thereby further improving the operating efficiency of the whole cloud service system.
A third aspect of this application provides a model processing method for a cloud service system, including: after receiving a first gradient value sent by a first local server, the cloud server updates a first model according to the first gradient value and sends the updated first model to the local server.
In summary, in the model processing method for a cloud service system provided by this embodiment, from the perspective of the cloud server, it only needs to collaborate with the local server to update the first model of the edge devices. The whole model update process neither relies entirely on the cloud server for data computation nor on the edge devices themselves for model updating; instead, the model is updated with the computing power provided by the local server, which, on the basis of ensuring that the model provided by the cloud server is updated, reduces the volume of data exchanged between the edge devices and the servers, lowers the computing-power requirements on the cloud server and the edge devices, and thereby improves the operating efficiency of the whole cloud service system.
In an embodiment of the third aspect of this application, the cloud server specifically updates the first model jointly, in a synchronous manner, according to the first gradient value sent by the first local server and the gradient values sent by at least one second local server.
In summary, the model processing method for a cloud service system provided by this embodiment can update the model used by the edge devices through collaborative updating between the cloud server and the local servers. This collaborative update structure can implement federated learning: a federated learning client can be deployed on the local server, so that the local server replaces the terminal devices in updating the model and interacting with the cloud server, which further reduces the computation performed by the terminal devices and the volume of data exchanged between the edge devices and the servers, thereby improving the operating efficiency of the whole cloud service system.
In an embodiment of the third aspect of this application, to implement the cloud service system, before the first local server starts to obtain the dataset of the at least one edge device, the cloud server may send the construction tool and the labeling tool to the first local server, so that the first local server can set itself up using the construction tool and label the data in the dataset using the labeling tool.
In summary, in the model processing method for a cloud service system provided by this embodiment, the cloud server can send the construction tool and the labeling tool to the first local server, so that the first local server can be built and the related functions implemented according to the tools sent by the cloud server. This completes the implementation of the whole cloud service system, enabling the operator of the cloud service system to finish the construction and deployment of the first local server through the cloud server.
In an embodiment of the third aspect of this application, the first local server further has a model ranking function; specifically, the cloud server can receive ranking information of the plurality of models sent by the first local server. After ranking the plurality of models, the cloud server can continuously optimize the composition of the models it provides, achieving "survival of the fittest" among the models, improving the performance of subsequent edge-device computations with the models, and thereby further improving the operating efficiency of the whole cloud service system.
A fourth aspect of this application provides a model processing apparatus for a cloud service system, which can serve as the first local server in the embodiments of the first and second aspects and perform the method performed by the first local server. The apparatus includes: an acquisition module, configured to obtain a dataset of at least one edge device, where the dataset includes data used by the at least one edge device when computing with a first model provided by a cloud server; a processing module, configured to determine, according to the dataset of the at least one edge device, a first gradient value for updating the first model; and a transmission module, configured to send the first gradient value to the cloud server.
In an embodiment of the fourth aspect of this application, the transmission module is further configured to receive a plurality of models sent by the cloud server and store the plurality of models in a storage module; the processing module is further configured to determine at least one model corresponding to a first edge device among the at least one edge device; and the transmission module is further configured to send the at least one model to the first edge device.
In an embodiment of the fourth aspect of this application, the transmission module is further configured to receive a construction tool and a labeling tool sent by the cloud server, where the construction tool is used to set up the first local server, and the labeling tool is used to label the data in the dataset.
In an embodiment of the fourth aspect of this application, the processing module is further configured to: label, using the labeling tool, first data in the dataset of the at least one edge device to obtain a plurality of labeling results; and when the plurality of labeling results are all the same, add the first data to a local dataset, where the local dataset is used to determine the first gradient value for updating the first model. The transmission module is further configured to: when the plurality of labeling results are not all the same, send the first data to a first device, and add the first data to the local dataset after receiving confirmation information sent by the first device.
In an embodiment of the fourth aspect of this application, the processing module is further configured to determine performance parameters of the connected at least one edge device when computing with the plurality of models stored on the first local server, and rank the plurality of models according to the performance parameters; the transmission module is further configured to send ranking information of the plurality of models to the cloud server.
A fifth aspect of this application provides a model processing apparatus for a cloud service system, which can serve as the cloud server in the embodiments of the first and third aspects and perform the method performed by the cloud server. The apparatus includes: a transmission module, configured to receive a first gradient value sent by a first local server, where the first gradient value is used to update a first model provided by the cloud server; and a processing module, configured to update the first model according to the first gradient value; the transmission module is further configured to send the updated first model to the first local server.
In an embodiment of the fifth aspect of this application, the processing module is specifically configured to update the first model according to the first gradient value and gradient values sent by at least one second local server among the plurality of local servers.
In an embodiment of the fifth aspect of this application, the transmission module is further configured to send a construction tool and a labeling tool to the first local server, where the construction tool is used to set up the first local server, and the labeling tool is used to label the data in the dataset.
In an embodiment of the fifth aspect of this application, the transmission module is further configured to receive ranking information of a plurality of models sent by the first local server; the processing module is further configured to rank the plurality of models according to the ranking information of the plurality of models.
According to a sixth aspect, an embodiment of this application provides a computing apparatus, including a processor and a communication interface. The processor sends data through the communication interface, and the processor is configured to implement the method performed by the first local server in the first or second aspect.
In a possible design, the computing apparatus further includes a memory; the memory is configured to store program code, and the processor executes the program code stored in the memory, so that the computing apparatus performs the method performed by the first local server in the first or second aspect.
According to a seventh aspect, an embodiment of this application provides a computing apparatus, including a processor and a communication interface. The processor sends data through the communication interface, and the processor is configured to implement the method performed by the cloud server in the first or third aspect.
In a possible design, the computing apparatus further includes a memory; the memory is configured to store program code, and the processor executes the program code stored in the memory, so that the computing apparatus performs the method performed by the cloud server in the first or third aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an application scenario of this application;
FIG. 2 is a schematic structural diagram of a cloud service system;
FIG. 3 is a schematic structural diagram of an embodiment of the cloud service system provided by this application;
FIG. 4 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application;
FIG. 5 is a schematic flowchart of synchronous model updating provided by this application;
FIG. 6 is a schematic flowchart of asynchronous model updating provided by this application;
FIG. 7 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application;
FIG. 8 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application;
FIG. 9 is a schematic flowchart of data labeling provided by an embodiment of this application;
FIG. 10 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application;
FIG. 11 is a schematic structural diagram of another cloud service system provided by this application;
FIG. 12 is a schematic structural diagram of an embodiment of the model processing apparatus for a cloud service system provided by this application;
FIG. 13 is a schematic structural diagram of an embodiment of the model processing apparatus for a cloud service system provided by this application;
FIG. 14 is a schematic structural diagram of the computing apparatus provided by this application.
Detailed Description
FIG. 1 is a schematic diagram of an application scenario of this application. This application can be applied in the field of cloud computing technologies: a provider of cloud computing services can deploy one or more cloud servers 3 in the Internet 2, and the cloud servers 3 provide cloud computing services. For example, when the terminal device 1 used by a user needs certain software and hardware computing resources, it can directly use, apply to the provider for, or pay the provider a fee to obtain the software and hardware resources provided by the cloud server 3, so that the terminal device 1 uses the cloud computing services offered by the provider. Because the computing resources used by the terminal device 1 are provided by the cloud server 3 deployed on the network side by the provider, this scenario of computing with network resources may also be called "cloud computing", and the cloud server 3 together with the terminal devices 1 may be called a "cloud service system".
In a specific implementation of the scenario shown in FIG. 1, the terminal device 1 may be an edge device for edge computing. Edge computing means that devices in the cloud service system on the side close to things or data sources can provide computing services; that is, in FIG. 1, the terminal device 1 can perform edge computing in collaboration with the cloud server 3, and a terminal device 1 performing edge computing may also be called an "edge device". For example, the terminal device 1 can process local data with low latency and then send the processed data to the cloud server 3, so that the terminal device 1 does not need to send the raw data to the cloud server 3 for computation, which reduces the computation pressure on the cloud server 3 and improves the operating efficiency of the cloud service system.
More specifically, the training and computation of machine learning models (abbreviated as "models" in the embodiments of this application) is a common form of edge computing in cloud service systems. For example, the provider of the cloud server 3 collects a large amount of training data and, with the help of higher-performance servers, trains a machine learning model 31 that can identify the animal category in an image, and delivers the machine learning model 31 to the terminal devices 1 that need to use it. As shown in FIG. 1, the cloud server 3 can deliver the machine learning model 31 to the three terminal devices 1 numbered 11-13, and each terminal device 1 can identify the animal category in the images it collects using the received machine learning model 31, thereby realizing an edge computing scenario in which a model provided by the cloud server 3 computes on the terminal devices 1.
Meanwhile, since the training data collected by the provider may differ from the computation data used by the terminal devices 1 in edge computing, and the computation data may change at any time as external conditions change, the computation accuracy of the machine learning model 31 in edge computing may decline. Therefore, in the above edge computing scenario, after sending the machine learning model 31 to the terminal devices 1, the cloud server 3 can also continue to update the machine learning model 31 and send the updated machine learning model 31 to the terminal devices 1, so as to improve the computation accuracy of the terminal devices 1 performing edge computing with the machine learning model 31.
In a first way of updating the machine learning model 31, each terminal device 1 sends all the data it computes with the machine learning model 31 to the cloud server 3, which updates the machine learning model 31 according to the data sent by the terminal devices 1 and sends the updated machine learning model 31 back to each terminal device 1. However, this update mode relies entirely on the computing power of the cloud server 3, adds a large volume of interaction data between the cloud server 3 and the terminal devices 1, and increases bandwidth requirements, thereby reducing the operating efficiency of the whole cloud service system. Moreover, some sensitive data processed by terminal devices 1 is sent directly to the cloud server 3, and data security cannot be guaranteed in this process. In addition, because each terminal device 1 uploads its data directly to the cloud server, data cannot be shared between different terminal devices, creating a "data silo" problem.
In a second way of updating the machine learning model 31, FIG. 2 is a schematic structural diagram of a cloud service system. The system shown in FIG. 2 updates the machine learning model 31 through a federated learning service on the basis of the scenario shown in FIG. 1, where the federated learning service (federated learning service, FLS) is deployed in the cloud server 3, a federated learning client (federated learning client, FLC) is deployed in each terminal device 1, and all FLCs can connect to the FLS through a front-end proxy (agent) server; this structure may also be called an "edge-cloud collaborative" update structure. In the cloud service system shown in FIG. 2, the FLC deployed on each terminal device 1 can update the machine learning model 31 itself according to the data the terminal device 1 uses when computing with the model, and send the gradient values obtained from updating the machine learning model 31 to the FLS through the front-end proxy server 4. The FLS can then update the machine learning model 31 according to the gradient values sent by multiple FLCs and send the updated machine learning model 31 back to each terminal device 1. However, this update mode places high demands on the computing power of the terminal devices 1: besides computing with the machine learning model 31, a terminal device 1 also needs to compute the gradient values for updating it, yet in practice most terminal devices 1 have limited computing power and can hardly participate directly in updating the machine learning model 31.
In summary, both of the above ways of updating the machine learning model 31 have their own shortcomings: either the update relies on the cloud server 3, degrading system performance and causing the data silo problem, or it relies on the terminal devices 1 and is hard to realize due to their limited computing power. This application provides a model processing method for a cloud service system and a cloud service system: a local server is deployed between the cloud server and the terminal devices, and the local server updates the machine learning model together with the cloud server according to the data of the at least one connected terminal device, so that while the machine learning model is kept updated, the computing-power requirements on the cloud server and the terminal devices as well as the data interaction between them are reduced, thereby improving the operating efficiency of the whole cloud service system.
The technical solutions of this application are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 3 is a schematic structural diagram of an embodiment of the cloud service system provided by this application. The cloud service system shown in FIG. 3 includes a cloud server 3 and a plurality of local servers 5; for example, in FIG. 3 the plurality of local servers 5 are each connected to the cloud server 3. Meanwhile, a local server 5 can also connect to at least one edge device 1, where the edge device 1 can be a terminal device capable of edge computing; for example, FIG. 3 takes one local server 5 connected to a plurality of edge devices 1 as an example. The local server 5 may be a server deployed at the site of the plurality of edge devices. In an example scenario, company B in city A sets up a cloud server on its premises and provides machine learning models; when company D in city C uses a plurality of edge devices, it can set up a local server on its premises, so that company D's edge devices connect to the local server set up within company D, while the local server at company D connects to the cloud server at company B through the Internet.
Specifically, the cloud server 3 can provide machine learning models to the edge devices 1 that need them, and the number of machine learning models the cloud server 3 provides to each edge device 1 can be one or more. For example, in the system shown in FIG. 3, after training a plurality of machine learning models, the cloud server 3 can send them to the connected local servers 5, and the local servers 5 send the machine learning models to the corresponding edge devices 1. For example, after the cloud server 3 has trained machine learning models for different functions, suppose the at least one edge device connected to a local server 5 needs to use machine learning models that identify animal categories; the cloud server 3 sends the plurality of machine learning models for identifying animal categories to the local server 5, and the local server 5 then sends them to the connected edge devices. An edge device that receives a machine learning model can perform edge computing for animal category identification with it; in this process, the local server 5 can act as a gateway.
Further, on the basis of implementing the above edge computing, in the cloud service system shown in FIG. 3 provided by this application, the local server 5 and the cloud server 3 can also collaboratively update the machine learning models provided by the cloud server. Taking one local server connected to the cloud server (denoted the first local server) and one edge device connected to the first local server (denoted the first edge device) as an example, the following describes the flow in which, in the model processing method of the cloud service system provided by this embodiment, the first local server and the cloud server collaborate to update the machine learning model used by the first edge device.
FIG. 4 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application. The method shown in FIG. 4 can be applied in the cloud service system shown in FIG. 3 and is performed by the cloud server 3 and any local server 5 connected to the cloud server 3, where the local server is further connected to at least one edge device 1.
S101: The first local server obtains the dataset of the connected at least one edge device.
Specifically, the model processing method provided by this embodiment assumes that the cloud server has already sent the machine learning model to the local server, and the local server has sent the machine learning model to the edge devices for use. Then in S101, in order to update the machine learning model, all edge devices connected to the first local server send the data they use when computing with the machine learning model to the first local server.
It can be understood that each edge device may use one or more machine learning models; any one of them is taken as the first model for description. Then in S101, the data each edge device uses when computing with the first model is recorded as a dataset, and the first local server receives the data, sent by the connected edge devices, that they use when computing with the first model.
For example, the cloud server sends the first model, which identifies whether the animal in an image is a cat or a dog, to the first local server, and the first local server sends the first model to the two edge devices it is connected to. Then in S101, it may receive a dataset sent by one edge device, including two images of cats used when computing with the first model, and a dataset sent by the other edge device, including two images of dogs and one image of a cat used when computing with the first model.
S102: The first local server computes, according to the datasets of the at least one edge device obtained in S101, a first gradient value for updating the first model.
Specifically, besides providing the gateway function, the first local server provided in this embodiment also has the ability to obtain model-update parameters; after a certain amount of data has been collected, it can compute the parameters for updating the first model, for example the gradient value used to update the first model. Because the cloud server is not involved in this update mode, and the computation does not directly update the first model but rather yields the parameters for updating it, it may be called a "local update".
For example, in the above illustration, since the images of cats and dogs collected by the cloud server when training the first model differ somewhat from the images the edge devices actually compute on with the first model, the first local server can take the received two dog images and three cat images and use these five images to locally update the first model, obtaining the first gradient value. Suppose the parameter in the first model is 2 and the first local server's locally updated parameter is 2.1; the first gradient value is then the change between the two, 0.1.
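The local update of S102 can be sketched as follows. This is a minimal illustration only: the patent does not fix a model family or loss, so a hypothetical one-parameter linear model with squared-error loss is assumed, and the numeric pairs are made-up stand-ins for the edge devices' pooled image data.

```python
# Sketch of the "local update" (S102): the first local server computes a
# gradient-based delta (the "first gradient value") for the first model from
# the pooled edge-device data. Hypothetical model: y = w * x, squared loss.

def first_gradient_value(w, data, lr=0.05):
    """Return the delta to be sent to the cloud server in S103."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return -lr * grad  # gradient-descent step on the local loss

# Made-up pooled data from the connected edge devices.
dataset = [(1.0, 2.2), (2.0, 4.1), (3.0, 6.3)]
w = 2.0                                  # current parameter of the first model
delta = first_gradient_value(w, dataset)
# Only `delta` is uploaded; the raw dataset never leaves the local server.
```

The point mirrored from the text is that the local server's output is the update delta rather than the data itself, which is what keeps the edge-to-cloud traffic small.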
S103: Having obtained the first gradient value of the first model in S102, the first local server sends the obtained first gradient value to the cloud server in S103; correspondingly, the cloud server receives the first gradient value sent by the first local server.
Specifically, due to the limitations of the computation data available to the first local server, the computation performed by the first local server only yields the parameters for updating the first model without actually updating it. After the first local server sends the first gradient value to the cloud server, the cloud server updates the first model according to the first gradient value. In this process, although the first local server does not actually complete the update of the first model, it participates in the cloud server's update computation (by computing the first gradient value used to update the first model), so this process may also be called a "collaborative update" of the first model by the cloud server and the local server.
S104: The cloud server updates the first model according to the first gradient value sent by the first local server.
Specifically, in the embodiments of this application, the cloud server can collaborate with the local servers to update the first model in either a synchronous or an asynchronous manner, described below with reference to the drawings:
1. Synchronous update:
FIG. 5 is a schematic flowchart of synchronous model updating provided by this application, applicable in the cloud service system shown in FIG. 3, where all local servers connected to the cloud server other than the first local server of the above example are denoted second local servers. After the cloud server has trained the first model, it first sends the first model to all local servers; then, in actual use, each local server computes the gradient value for updating the first model through steps S101-S103. For example, the first local server computes the first gradient value and sends it to the cloud server, second local server 1 computes second gradient value 1 and sends it to the cloud server, and so on. All local servers can compute gradient values for the first model from their own data and send them to the cloud server at the same moment. The cloud server, upon simultaneously receiving the gradient values for updating the first model from the plurality of local servers, performs gradient aggregation over all gradient values and finally updates the first model. As a simple example, suppose the parameter in the first model is 2 and the gradient values received by the cloud server are 0.1, -0.2, and 0.3; the cloud server can add these gradients to obtain an updated first-model parameter of 2.2. After updating the first model, the cloud server can send it to all local servers again, and the flow shown in FIG. 5 can continue in a loop.
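The synchronous aggregation step can be illustrated with a short sketch. Following the text's simple worked example, aggregation is assumed to be a plain sum of the received deltas applied to a single scalar parameter; a real model would apply the same idea per weight.

```python
# Sketch of synchronous updating (S104, FIG. 5): the cloud server waits for a
# gradient value from every local server, aggregates them all at once, and
# only then updates the first model.

def synchronous_update(param, gradient_values):
    """Gradient aggregation over all local servers, then one model update."""
    return param + sum(gradient_values)

param = 2.0                         # parameter of the first model
deltas = [0.1, -0.2, 0.3]           # from the first and second local servers
param = synchronous_update(param, deltas)
# param is now 2.2, matching the worked example in the text
```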
2. Asynchronous update:
FIG. 6 is a schematic flowchart of asynchronous model updating provided by this application, with the same executing entities as in FIG. 5. After the cloud server has trained the first model, it first sends the first model to all local servers; then, in actual use, after each local server computes the gradient value for updating the first model through steps S101-S103, it can send its updated gradient value to the cloud server individually. For example, when the first local server computes the first gradient value and sends it to the cloud server, the cloud server can immediately update the first model and return the updated first model to the first local server. Subsequently, when second local server 1 computes second gradient value 1 and sends it to the cloud server, the cloud server, on the basis of having already updated the first model with the first gradient value, updates the first model again with second gradient value 1 and returns the updated first model to second local server 1, and so on. When the cloud server has received the gradient values sent by all local servers and performed the respective updates, the whole asynchronous update flow is complete, and the flow shown in FIG. 6 can continue in a loop.
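A corresponding sketch of the asynchronous variant of FIG. 6, again with a single scalar parameter and hypothetical server names; what distinguishes it from the synchronous flow of FIG. 5 is that each delta is applied, and the refreshed model returned, as soon as it arrives.

```python
# Sketch of asynchronous updating (FIG. 6): the cloud server applies each
# local server's delta on arrival and immediately returns the refreshed
# model to that server, instead of waiting for all servers.

def asynchronous_updates(param, arrivals):
    """Apply deltas in arrival order; record what each server gets back."""
    returned = []
    for server, delta in arrivals:
        param += delta
        returned.append((server, param))  # refreshed model goes back here
    return returned

arrivals = [("first local server", 0.1),
            ("second local server 1", -0.2),
            ("second local server 2", 0.3)]
returned = asynchronous_updates(2.0, arrivals)
# each server receives the model state as of its own update: the first local
# server sees the parameter at 2.1, before the later deltas are applied
```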
S105: The cloud server sends the updated first model to the first local server. The first local server receives the updated first model sent by the cloud server and sends it to the corresponding edge devices, so that those edge devices can subsequently compute with the updated first model. The corresponding edge devices may be edge devices that need to use the first model, or edge devices that already have the first model but need it updated.
In summary, in the model processing method for a cloud service system provided by this embodiment, the local server deployed between the cloud server and the edge devices obtains the data the edge devices use when computing with the model, updates the model through the local server, and sends the gradient values of the updated model to the cloud server, which finally updates the model according to the local server's gradient values. The whole update process neither relies entirely on the cloud server for data computation nor on the edge devices themselves for model updating; instead, the model is updated with the computing power provided by the local server. On the basis of ensuring that the model provided by the cloud server is updated, this reduces the volume of data exchanged between the edge devices and the servers, lowers the computing-power requirements on the cloud server and the edge devices, and thereby improves the operating efficiency of the whole cloud service system.
Optionally, in a specific implementation of the above embodiment, an FLC can be deployed in the first local server and an FLS can be deployed in the cloud server. In this case, the first local server can replace the edge devices in implementing the federated learning update technique shown in FIG. 2. Since the computing power of the first local server provided in this embodiment can exceed that of the edge devices, and the edge devices need not update the model, compared with the technique of deploying FLCs on edge devices shown in FIG. 2, this likewise reduces the computing-power requirements on the edge devices and thereby improves the operating efficiency of the cloud service system.
Further, the local server provided in this embodiment can also have the function of storing machine learning models, and can send the stored models to the corresponding edge devices according to the needs of different edge devices. For example, FIG. 7 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application. The method shown in FIG. 7 can be applied in the cloud service system shown in FIG. 3 and can be performed before S101 in the embodiment shown in FIG. 4.
S201: The cloud server pre-trains a plurality of models, which it can obtain from the training datasets provided by the provider. For example, the provider collects images of different animals and labels the images of cats and dogs among them, and the models trained by the cloud server can then identify whether the animal in an image is a cat or a dog.
S202: The cloud server sends the plurality of models trained in S201 to the first local server, and the first local server receives the plurality of models sent by the cloud server.
S203: After receiving the plurality of models, the first local server stores them in its storage space.
S204: The first local server determines at least one model corresponding to the first edge device.
Specifically, the plurality of models pre-trained by the cloud server in this embodiment can be sent to the first local server in whole or in part. After receiving the plurality of models, the first local server determines at least one model corresponding to each connected edge device. Denoting any edge device connected to the first local server as the first edge device, the first local server can determine the model corresponding to the first edge device according to the first edge device's computing power, computation requirements, supported model types, and the like. For example, if there are multiple models of different sizes for identifying whether the animal in an image is a cat or a dog, a larger model can be assigned when the first edge device's computing performance is good, and a smaller model when its computing performance is poor.
S205: The first local server sends the at least one model determined in S204 to the first edge device.
It can be understood that the first local server can determine the corresponding model for each edge device it is connected to and send the corresponding model to each edge device separately. Meanwhile, after receiving a model, the first edge device can compute with it. It can be understood that the at least one model the first local server sends to the first edge device includes the first model of the foregoing embodiments.
In summary, in the model update method for a cloud service system provided by this embodiment, the first local server further has the functions of storing models and determining the models corresponding to the edge devices, which further reduces the computation required of the cloud server: the cloud server only needs to send the trained models to the local server, and the local server delivers them to the corresponding edge devices in a more targeted manner. This also makes the model used by the first edge device more accurate, improves the precision of the first edge device's computations with the model, and thereby further improves the operating efficiency of the whole cloud service system.
Optionally, to implement the cloud service system provided in the embodiments of this application, before the foregoing methods are carried out, the provider can also set up the whole cloud service system. FIG. 8 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application; the embodiment shown in FIG. 8 illustrates the setup flow of the cloud service system shown in FIG. 3.
S301: The cloud server first sets up the functions on the cloud-server side; for example, in a specific implementation, the cloud server can deploy the federated learning server.
S302: The first local server sends request information to the cloud server, requesting to set up the first local server.
S303: The cloud server authenticates and registers the first local server according to the request information.
S304: After successful authentication and registration, the cloud server sends the construction tool and the labeling tool to the first local server, where the construction tool is used to set up the first local server and the labeling tool is used to label the data in the dataset.
S305: The first local server sets up the functions on the first-local-server side according to the received construction tool; for example, the first local server can deploy the federated learning client.
S306: After receiving the labeling tool, the first local server can label data, and updates the local dataset through S307.
Specifically, the flow of S306-S307 can refer to the example shown in FIG. 9, which is a schematic flowchart of data labeling provided by an embodiment of this application. After receiving the dataset sent by the connected at least one edge device, the first local server can begin labeling the data in the dataset; the data the first local server is currently labeling is denoted the first data.
The first local server can first label the first data with the labeling tool to obtain a plurality of labeling results. The labeling tool can be a plurality of pre-trained models, for example, the plurality of models trained by the cloud server for identifying whether the animal in an image is a cat or a dog; the first data is an image of a cat or a dog, and each pre-trained model can label the first data with a cat or dog result. The first local server can then assess the results of the plurality of pre-trained models: when the plurality of labeling results from the plurality of models are all the same, the first local server adds the first data to the local dataset, which is used subsequently by the first local server when updating the first model; when the plurality of labeling results are not all the same, a manual re-check step is entered, in which the first local server can send the first data to a first device used by staff, have the staff label the first data manually, and, after receiving confirmation information sent by the staff through the first device, add the first data to the local dataset. In addition, if the staff consider the sample abnormal, the first local server, upon receiving abnormality information sent by the staff through the first device, can delete the first data without further processing.
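The unanimous-vote triage of FIG. 9 can be sketched as below; the labeler functions stand in for the pre-trained models of the labeling tool and are purely illustrative.

```python
# Sketch of the labeling flow (FIG. 9): several pre-trained models label the
# same first data; unanimous agreement adds it to the local dataset, while
# any disagreement routes it to manual re-check on the staff's first device.

def triage(sample, labelers):
    labels = {labeler(sample) for labeler in labelers}
    if len(labels) == 1:                 # all labeling results are the same
        return ("local_dataset", labels.pop())
    return ("manual_review", None)       # staff confirm, or reject as abnormal

# Hypothetical stand-ins for the pre-trained cat/dog models.
agree = [lambda s: "cat", lambda s: "cat", lambda s: "cat"]
split = [lambda s: "cat", lambda s: "cat", lambda s: "dog"]

route_a = triage("img_001", agree)   # unanimous: joins the local dataset
route_b = triage("img_001", split)   # one model disagrees: manual re-check
```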
Optionally, the local dataset is stored in the first local server; other local servers cannot access it, while the at least one edge device connected to the first local server can. Thus the at least one edge device achieves data sharing through the first local server while the security of the data uploaded to the local server is also guaranteed. For example, all edge devices of a company can connect to one local server, so the data processed by all edge devices within the company can be added to the local dataset through the above flow; the local server can use the data in the local dataset when updating each model, while other companies cannot obtain this company's data. In addition, after updating a model, the local server sends only the updated gradient values to the cloud server; the data used is not sent to the network, which further guarantees data security.
Further, in the cloud service system provided by this application, the first local server also has the function of ranking models. Specifically, FIG. 10 is a schematic flowchart of an embodiment of the model processing method for a cloud service system provided by this application, applicable in the cloud service system shown in FIG. 3.
S401: The cloud server sends the pre-trained plurality of models to the first local server. The cloud server can send all of the pre-trained models to the first local server, or send the plurality of models needed by the edge devices connected to the first local server.
S402: The first local server determines the performance parameters of the connected at least one edge device when computing with the plurality of models. Optionally, the performance parameter can be computation accuracy or computation speed. In S402, the first local server tallies the performance parameters of all edge devices when using the different models. For example, the first local server is connected to edge devices 1-5, and tallies that edge devices 1-3 take an average of 0.1 seconds to obtain a result when computing with model a, that edge devices 2-5 take an average of 0.2 seconds when computing with model b, and so on.
S403: The first local server ranks the plurality of models according to the performance parameters determined in S402. For example, if the first local server finds that its connected edge devices take 0.1 seconds to compute with model a, 0.2 seconds with model b, and so on, it can rank the models in order of computation speed from fastest to slowest, for example: a, b, ...
S404: The first local server sends the ranking information of the plurality of models determined in S403 to the cloud server.
S405: The cloud server ranks the plurality of models according to the ranking information. Ultimately, the cloud server can rank all the models it provides according to the ranking information sent by all connected local servers. After ranking, it can delete some low-ranked models and replace them with others. Thereafter, the cloud server can repeat S401 and send the updated plurality of models to the local servers. Since the models are now arranged in order, if an edge device needs two models for identifying animal categories in images, the cloud server can send the two top-ranked updated models for identifying animal categories to the local server, which sends them to the edge device, ensuring that the models used by the edge devices are the top-ranked ones, that is, those with better computation performance.
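Steps S402-S403 can be sketched as follows. The timing figures are hypothetical, mirroring the 0.1 s / 0.2 s example, and ranking by average latency (fastest first) is one concrete reading of the "performance parameter" the text leaves open (accuracy could be ranked the same way, highest first).

```python
# Sketch of model ranking (S402-S403): the first local server averages each
# model's measured computation time across its edge devices and orders the
# models fastest-first; only this ordering is sent to the cloud server (S404).

def rank_models(timings):
    """timings: model name -> list of latencies reported by edge devices."""
    avg = {model: sum(ts) / len(ts) for model, ts in timings.items()}
    return sorted(avg, key=avg.get)     # fastest average first

timings = {"a": [0.1, 0.1, 0.1],        # measurements for model a
           "b": [0.2, 0.2],             # measurements for model b
           "c": [0.15]}                 # measurements for model c
ranking = rank_models(timings)          # ordering sent to the cloud server
```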
In summary, in the model update method for a cloud service system provided by this embodiment, the local server can rank the models by the performance parameters of the connected edge devices using them and send the ranking information to the cloud server. After ranking the plurality of models, the cloud server can continuously optimize the composition of the models it provides, achieving "survival of the fittest" among the models and improving the performance of subsequent edge-device computations with the models, thereby further improving the operating efficiency of the whole cloud service system.
Optionally, the cloud service system shown in FIG. 3 takes a cloud server connected to a plurality of local servers as an example; in a specific implementation, the cloud server can also connect directly to edge devices for hybrid deployment. For example, FIG. 11 is a schematic structural diagram of another cloud service system provided by this application. On the basis of the embodiment shown in FIG. 3, the cloud server 3 can also connect directly to edge devices 1, taking the edge device numbered 6 in the figure as an example. The local server 5 can collaborate with the cloud server 3 to perform the model updating and other processing of the foregoing embodiments of this application, while the edge device 6 directly connected to the cloud server 3 may not participate in model updating; however, after the cloud server 3 updates the model, besides sending the updated model to the local server 5, which sends it to the connected edge devices 1, it also sends the updated model to the directly connected edge device 6. Therefore, the cloud service system provided by this embodiment offers strong deployment flexibility and can reduce, to some extent, the number of local servers in the cloud service system.
The foregoing embodiments have introduced the cloud service system and the model processing method for a cloud service system provided by the embodiments of this application. To implement the functions of the above model processing method, the cloud server and the first local server acting as executing entities may include hardware structures and/or software modules, and implement the above functions in the form of hardware structures, software modules, or hardware structures plus software modules. Whether a given function is performed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
For example, FIG. 12 is a schematic structural diagram of an embodiment of the model processing apparatus for a cloud service system provided by this application. The apparatus shown in FIG. 12 can serve as the first local server in the foregoing embodiments of this application and perform the method performed by the first local server. The apparatus 120 shown in FIG. 12 includes an acquisition module 1201, a processing module 1202, and a transmission module 1203. The acquisition module 1201 is configured to obtain a dataset of at least one edge device, where the dataset includes data used by the at least one edge device when computing with a first model provided by a cloud server; the processing module 1202 is configured to determine, according to the dataset of the at least one edge device, a first gradient value for updating the first model; and the transmission module 1203 is configured to send the first gradient value to the cloud server.
Optionally, the transmission module 1203 is further configured to receive a plurality of models sent by the cloud server and store them in a storage module; the processing module 1202 is further configured to determine at least one model corresponding to a first edge device among the at least one edge device; and the transmission module 1203 is further configured to send the at least one model to the first edge device.
Optionally, the transmission module 1203 is further configured to receive a construction tool and a labeling tool sent by the cloud server, where the construction tool is used to set up the first local server and the labeling tool is used to label the data in the dataset.
Optionally, the processing module 1202 is further configured to label, using the labeling tool, first data in the dataset of the at least one edge device to obtain a plurality of labeling results, and, when the plurality of labeling results are all the same, add the first data to a local dataset, where the local dataset is used to determine the first gradient value for updating the first model; the transmission module 1203 is further configured to, when the plurality of labeling results are not all the same, send the first data to a first device, and add the first data to the local dataset after receiving confirmation information sent by the first device.
Optionally, the processing module 1202 is further configured to determine performance parameters of the connected at least one edge device when computing with the plurality of models stored on the first local server, and rank the plurality of models according to the performance parameters; the transmission module is further configured to send the ranking information of the plurality of models to the cloud server.
For the specific working manner and principle of the model processing apparatus shown in FIG. 12, refer to the description of the first local server in the foregoing methods of this application; details are not repeated here.
FIG. 13 is a schematic structural diagram of an embodiment of the model processing apparatus for a cloud service system provided by this application. The apparatus shown in FIG. 13 can serve as the cloud server in the foregoing embodiments of this application and perform the method performed by the cloud server. The apparatus 130 shown in FIG. 13 includes a transmission module 1301 and a processing module 1302. The transmission module 1301 is configured to receive a first gradient value sent by a first local server, where the first gradient value is used to update a first model provided by the cloud server; the processing module 1302 is configured to update the first model according to the first gradient value; and the transmission module is further configured to send the updated first model to the first local server.
Optionally, the processing module 1302 is specifically configured to update the first model according to the first gradient value and gradient values sent by at least one second local server among the plurality of local servers.
Optionally, the transmission module 1301 is further configured to send a construction tool and a labeling tool to the first local server, where the construction tool is used to set up the first local server and the labeling tool is used to label the data in the dataset.
Optionally, the transmission module 1301 is further configured to receive ranking information of a plurality of models sent by the first local server; the processing module is further configured to rank the plurality of models according to the ranking information of the plurality of models.
For the specific working manner and principle of the model processing apparatus shown in FIG. 13, refer to the description of the cloud server in the foregoing methods of this application; details are not repeated here.
It should be noted that the division of the modules of the above apparatus is merely a division of logical functions; in actual implementation they can be fully or partially integrated into one physical entity or be physically separate. These modules can all be implemented in the form of software invoked by a processing element, or all in the form of hardware, or some modules in the form of software invoked by a processing element and others in the form of hardware. For example, the processing module can be a separately established processing element, or be integrated into a chip of the above apparatus, or be stored in the memory of the above apparatus in the form of program code and be invoked by a processing element of the above apparatus to perform the functions of the above determination module. The implementation of the other modules is similar. In addition, these modules can be fully or partially integrated, or implemented independently. The processing element described here can be an integrated circuit with signal processing capability. In the implementation process, the steps of the above methods or the above modules can be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules can be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented in the form of a processing element scheduling program code, the processing element can be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. For another example, these modules can be integrated together and implemented in the form of a system-on-a-chip (SoC).
The above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they can be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, microwave). The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media can be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, solid-state drives (SSDs)), and the like.
In addition, the embodiments of this application further provide another computing apparatus structure applicable to implementing the first local server or the cloud server provided by this application. FIG. 14 is a schematic structural diagram of the computing apparatus provided by this application. As shown in FIG. 14, the computing apparatus 1400 can include a communication interface 1410 and a processor 1420. Optionally, the computing apparatus 1400 can further include a memory 1430, which can be located inside or outside the computing apparatus.
For example, the actions performed by the first local server in FIG. 4 to FIG. 10 above can all be implemented by the processor 1420. The processor 1420 sends data through the communication interface 1410 and is used to implement any method performed by the first local server described in FIG. 4 to FIG. 10. In the implementation process, the steps of the processing flow can be completed by the integrated logic circuit of hardware in the processor 1420 or by instructions in the form of software to perform the method executed by the first local server in FIG. 4 to FIG. 10; for brevity, details are not repeated here. The program code executed by the processor 1420 to implement the above method can be stored in the memory 1430, which is connected to the processor 1420, for example by coupling.
For another example, the actions performed by the cloud server in FIG. 4 to FIG. 10 above can all be implemented by the processor 1420. The processor 1420 sends control signals and communication data through the communication interface 1410 and is used to implement any method performed by the cloud server described in FIG. 4 to FIG. 10. In the implementation process, the steps of the processing flow can be completed by the integrated logic circuit of hardware in the processor 1420 or by instructions in the form of software to perform the method executed by the cloud server in FIG. 4 to FIG. 10; for brevity, details are not repeated here. The program code executed by the processor 1420 to implement the above method can be stored in the memory 1430, which is connected to the processor 1420, for example by coupling.
Some features of the embodiments of this application can be completed/supported by the processor 1420 executing program instructions or software code in the memory 1430. The software components loaded in the memory 1430 can be summarized functionally or logically, for example, as the acquisition module 1201, the processing module 1202, and the transmission module 1203 shown in FIG. 12, or the transmission module 1301 and the processing module 1302 shown in FIG. 13.
Any communication interface involved in the embodiments of this application can be a circuit, a bus, a transceiver, or any other apparatus usable for information exchange, for example, the communication interface 1410 in the computing apparatus 1400. The other apparatus can be a device connected to the computing apparatus: when the computing apparatus is the first local server, the other device can be the cloud server; when the computing apparatus is the cloud server, the other device can be the first local server.
The processors involved in the embodiments of this application can be general-purpose processors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses or modules, which can be electrical, mechanical, or in other forms, and is used for information exchange between the apparatuses or modules.
The processor may operate in cooperation with the memory. The memory can be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM). The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The embodiments of this application do not limit the specific connection medium among the above communication interface, processor, and memory. For example, the memory, processor, and communication interface can be connected by a bus, which can be divided into an address bus, a data bus, a control bus, and the like. Of course, the connection bus between the processor and the memory is not the connection bus between the aforementioned cloud server and the first local server.
In the embodiments of this application, "at least one" refers to one or more, and "a plurality of" refers to two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships can exist; for example, A and/or B can indicate: A exists alone, both A and B exist, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the preceding and following associated objects; in a formula, the character "/" indicates a "division" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
It can be understood that the various numerical designations involved in the embodiments of this application are merely distinctions made for convenience of description and are not intended to limit the scope of the embodiments of this application. It can also be understood that, in the embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of this application rather than limit them. Although this application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. 一种云服务系统,其特征在于,包括:
    云服务器和多个本地服务器,所述多个本地服务器中的第一本地服务器通过网络与所述云服务器连接,所述第一本地服务器还连接至少一个边缘设备;
    所述第一本地服务器用于:获取所述至少一个边缘设备的数据集,所述数据集包括所述至少一个边缘设备使用所述云服务器提供的第一模型进行计算时使用的数据;根据所述至少一个边缘设备的数据集,确定用于对所述第一模型进行更新的第一梯度值;并将所述第一梯度值发送至所述云服务器;
    所述云服务器用于:根据所述第一梯度值对所述第一模型进行更新,并向所述第一本地服务器发送更新后的所述第一模型。
  2. 根据权利要求1所述的系统,其特征在于,
    所述云服务器还用于,向所述云服务器发送多个模型;
    所述第一本地服务器还用于:接收并存储所述云服务器发送的多个模型;确定所述至少一个边缘设备中第一边缘设备对应的至少一个模型;并向所述第一边缘设备发送所述至少一个模型。
  3. 根据权利要求1或2所述的系统,其特征在于,
    所述云服务器还用于:向所述第一本地服务器发送构建工具和标注工具;其中,所述构建工具用于所述第一本地服务器的搭建,所述标注工具用于对所述数据集中的数据进行标注。
  4. 根据权利要求3所述的系统,其特征在于,
    所述第一本地服务器还用于:确定所连接的所述至少一个边缘设备在使用所述第一本地服务器所存储的多个模型进行计算时的性能参数,并按照所述性能参数对所述多个模型进行排序;向所述云服务器发送所述多个模型的排序信息;
    所述云服务器用于:根据所述多个模型的排序信息,对所述多个模型进行排序。
  5. 根据权利要求1-4任一项所述的系统,其特征在于,
    所述云服务器具体用于:根据所述第一梯度值,以及所述多个本地服务器中至少一个第二本地服务器发送的梯度值,对所述第一模型进行更新。
  6. 一种云服务系统的模型处理方法,其特征在于,
    所述云服务系统包括云服务器和多个本地服务器,所述多个本地服务器中的第一本地服务器通过网络与所述云服务器连接,所述第一本地服务器还连接至少一个边缘设备;
    所述方法包括:
    所述第一本地服务器获取所述至少一个边缘设备的数据集,所述数据集包括所述至少一个边缘设备使用所述云服务器提供的第一模型进行计算时使用的数据;
    所述第一本地服务器根据所述至少一个边缘设备的数据集,确定用于对所述第一模型进行更新的第一梯度值;
    所述第一本地服务器向所述云服务器发送所述第一梯度值。
  7. 根据权利要求6所述的方法,其特征在于,所述第一本地服务器获取所述至少 一个边缘设备的数据集之前,所述方法还包括:
    所述第一本地服务器接收并存储所述云服务器发送的多个模型;
    所述第一本地服务器确定所述至少一个边缘设备中第一边缘设备对应的至少一个模型;
    所述第一本地服务器向所述第一边缘设备发送所述至少一个模型。
  8. 根据权利要求6或7所述的方法,其特征在于,所述第一本地服务器获取所述至少一个边缘设备的数据集之前,所述方法还包括:
    所述第一本地服务器接收所述云服务器发送的构建工具和标注工具;其中,所述构建工具用于所述第一本地服务器的搭建,所述标注工具用于对所述数据集中的数据进行标注。
  9. 根据权利要求8所述的方法,其特征在于,所述第一本地服务器获取所述至少一个边缘设备的数据集之后,所述方法还包括:
    所述第一本地服务器通过所述标注工具,对所述至少一个边缘设备的数据集中的第一数据进行标注得到多个标注结果;
    当所述多个标注结果均相同时,所述第一本地服务器将所述第一数据加入本地数据集,所述本地数据集用于确定用于对所述第一模型进行更新的第一梯度值;
    当所述多个标注结果不完全相同时,所述第一本地服务器向第一设备发送所述第一数据,并在接收到所述第一设备发送的确认信息后,将所述第一数据加入所述本地数据集。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    所述第一本地服务器确定所连接的所述至少一个边缘设备在使用所述第一本地服务器所存储的多个模型进行计算时的性能参数,并按照所述性能参数对所述多个模型进行排序;
    所述第一本地服务器向所述云服务器发送所述多个模型的排序信息。
  11. 一种云服务系统的模型处理方法,其特征在于,
    所述云服务系统包括云服务器和多个本地服务器,所述多个本地服务器中的第一本地服务器通过网络与所述云服务器连接,所述第一本地服务器还连接至少一个边缘设备;
    所述方法包括:
    所述云服务器接收所述第一本地服务器发送的第一梯度值,其中,所述第一梯度值用于对所述云服务器提供的第一模型进行更新;
    所述云服务器根据所述第一梯度值对所述第一模型进行更新;
    所述云服务器向所述第一本地服务器发送更新后的所述第一模型。
  12. 根据权利要求11所述的方法,其特征在于,所述云服务器根据所述第一梯度值对所述第一模型进行更新,包括:
    所述云服务器根据所述第一梯度值,以及所述多个本地服务器中至少一个第二本地服务器发送的梯度值,对所述第一模型进行更新。
  13. 根据权利要求11或12所述的方法,其特征在于,所述云服务器接收所述第一本地服务器发送的第一梯度值之前,所述方法还包括:
    所述云服务器向所述第一本地服务器发送构建工具和标注工具;其中,所述构建工具用于所述第一本地服务器的搭建,所述标注工具用于对数据集中的数据进行标注。
  14. 根据权利要求11-13任一项所述的方法,其特征在于,所述方法还包括:
    所述云服务器接收所述第一本地服务器发送的多个模型的排序信息;
    所述云服务器根据所述多个模型的排序信息,对所述多个模型进行排序。
  15. 一种云服务系统的模型处理装置,其特征在于,包括:
    获取模块,用于获取至少一个边缘设备的数据集,所述数据集包括所述至少一个边缘设备使用云服务器提供的第一模型进行计算时使用的数据;
    处理模块,用于根据所述至少一个边缘设备的数据集,确定用于对所述第一模型进行更新的第一梯度值;
    传输模块,用于向所述云服务器发送所述第一梯度值。
  16. 根据权利要求15所述的装置,其特征在于,
    所述传输模块还用于,接收所述云服务器发送的构建工具和标注工具;其中,所述构建工具用于第一本地服务器的搭建,所述标注工具用于对所述数据集中的数据进行标注。
  17. 根据权利要求16所述的装置,其特征在于,
    所述处理模块还用于,确定所连接的所述至少一个边缘设备在使用所述第一本地服务器所存储的多个模型进行计算时的性能参数,并按照所述性能参数对所述多个模型进行排序;
    所述传输模块还用于,向所述云服务器发送所述多个模型的排序信息。
  18. 一种云服务系统的模型处理装置,其特征在于,包括:
    传输模块,用于接收第一本地服务器发送的第一梯度值,其中,所述第一梯度值用于对云服务器提供的第一模型进行更新;
    处理模块,用于根据所述第一梯度值对所述第一模型进行更新;
    所述传输模块还用于,向所述第一本地服务器发送更新后的所述第一模型。
  19. 根据权利要求18所述的装置,其特征在于,
    所述传输模块还用于,向所述第一本地服务器发送构建工具和标注工具;其中,所述构建工具用于所述第一本地服务器的搭建,所述标注工具用于对数据集中的数据进行标注。
  20. The apparatus according to claim 18 or 19, wherein
    the transmission module is further configured to receive ranking information of a plurality of models sent by the first local server; and
    the processing module is further configured to rank the plurality of models according to the ranking information of the plurality of models.
PCT/CN2021/092942 2020-07-17 2021-05-11 Model processing method for cloud service system and cloud service system WO2022012129A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21843138.5A EP4174736A4 (en) 2020-07-17 2021-05-11 MODEL PROCESSING METHOD FOR CLOUD SERVICE SYSTEM AND CLOUD SERVICE SYSTEM
US18/152,970 US20230164030A1 (en) 2020-07-17 2023-01-11 Model processing method for cloud service system and cloud service system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010699825.2 2020-07-17
CN202010699825.2A CN113946434A (zh) 2020-07-17 2020-07-17 Model processing method for cloud service system and cloud service system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/152,970 Continuation US20230164030A1 (en) 2020-07-17 2023-01-11 Model processing method for cloud service system and cloud service system

Publications (1)

Publication Number Publication Date
WO2022012129A1 true WO2022012129A1 (zh) 2022-01-20

Family

ID=79326906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092942 WO2022012129A1 (zh) 2020-07-17 2021-05-11 Model processing method for cloud service system and cloud service system

Country Status (4)

Country Link
US (1) US20230164030A1 (zh)
EP (1) EP4174736A4 (zh)
CN (1) CN113946434A (zh)
WO (1) WO2022012129A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107707657A (zh) * 2017-09-30 2018-02-16 苏州涟漪信息科技有限公司 基于多传感器的安全监护系统
US20190138693A1 (en) * 2017-11-09 2019-05-09 General Electric Company Methods and apparatus for self-learning clinical decision support
CN111008709A (zh) * 2020-03-10 2020-04-14 支付宝(杭州)信息技术有限公司 联邦学习、资料风险评估方法、装置和系统
CN111324440A (zh) * 2020-02-17 2020-06-23 深圳前海微众银行股份有限公司 自动化流程的执行方法、装置、设备及可读存储介质
CN111369009A (zh) * 2020-03-04 2020-07-03 南京大学 一种能容忍不可信节点的分布式机器学习方法


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4174736A4

Also Published As

Publication number Publication date
EP4174736A1 (en) 2023-05-03
CN113946434A (zh) 2022-01-18
EP4174736A4 (en) 2023-12-20
US20230164030A1 (en) 2023-05-25

Similar Documents

Publication Publication Date Title
US10360257B2 (en) System and method for image annotation
US9009648B2 (en) Automatic deadlock detection and avoidance in a system interconnect by capturing internal dependencies of IP cores using high level specification
US20230042747A1 (en) Message Processing Method and Device, Storage Medium, and Electronic Device
US8762125B2 (en) Emulated multi-tasking multi-processor channels implementing standard network protocols
US8095495B2 (en) Exchange of syncronization data and metadata
US11218403B2 (en) Methods, devices and systems for determining a target path in a network
CN108027789A (zh) 具有多级仲裁的互连件的服务质量
WO2017107868A1 (zh) 实现移动终端上网流量借用的方法、系统、程序及介质
US20170163533A1 (en) Forwarding Packet In Stacking System
US10193956B2 (en) Grouping and transferring omic sequence data for sequence analysis
WO2022012576A1 (zh) Path planning method and apparatus, path planning device, and storage medium
US10805241B2 (en) Database functions-defined network switch and database system
WO2022012129A1 (zh) Model processing method for cloud service system and cloud service system
TW200939047A (en) Processor-server hybrid system for processing data
CN111970497A (zh) Video stream processing method and apparatus, SDN controller, and storage medium
CN115080771A (zh) Artificial-intelligence-based data processing method and apparatus, medium, and gateway device
CN103999435A (zh) Apparatus and method for efficient network address translation and application layer gateway processing
CN105095248A (zh) Database cluster system, recovery method therefor, and management node
WO2022057355A1 (zh) Data packet identification method and apparatus
WO2023093065A1 (zh) Data transmission method, computing device, and computing system
WO2024114332A1 (zh) Communication method and apparatus, electronic device, and computer-readable medium
WO2023206049A1 (zh) AI service execution method and apparatus, network element, storage medium, and chip
WO2024077999A1 (zh) Collective communication method and computing cluster
WO2023179741A1 (zh) Computing system and data transmission method
WO2021057844A1 (zh) Method and apparatus for creating a PM task

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21843138

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021843138

Country of ref document: EP

Effective date: 20230125

NENP Non-entry into the national phase

Ref country code: DE