CN114153525A - AI model service sharing method and system for power grid regulation and control business - Google Patents


Info

Publication number
CN114153525A
CN114153525A (application CN202111447806.1A; granted as CN114153525B)
Authority
CN
China
Prior art keywords
model
service
characteristic value
warehouse
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111447806.1A
Other languages
Chinese (zh)
Other versions
CN114153525B (en)
Inventor
沈嘉灵
李佳阳
陈佳佳
闫妮
张瑞智
王宇冬
陈子韵
徐丽燕
李昊
季学纯
张珂珩
劳莹莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
Original Assignee
Nari Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd filed Critical Nari Technology Co Ltd
Priority to CN202111447806.1A priority Critical patent/CN114153525B/en
Publication of CN114153525A publication Critical patent/CN114153525A/en
Application granted granted Critical
Publication of CN114153525B publication Critical patent/CN114153525B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44505Configuring for program initiating, e.g. using registry, configuration files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/176Support for shared access to files; File sharing support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45562Creating, deleting, cloning virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses an AI model service sharing method and system for power grid regulation and control business, which comprises the following steps: the method comprises the steps of obtaining an AI model warehouse, wherein the AI model warehouse comprises a plurality of model sets, and a plurality of AI models of the same type are stored in each model set; model online service for creating an AI model in a cluster system by taking a model set in an AI model warehouse as granularity; storing the AI model in the AI model warehouse into a cluster system by starting the model online service; and loading the AI model from the cluster system according to an external request. The invention adopts a batch access AI model mode, reduces network transmission frequency, effectively improves access efficiency, provides AI model service integrated release based on Kubernetes, realizes rapid and smooth upgrade and capacity expansion, improves hardware resource utilization rate, is convenient, rapid, safe, reliable and low in cost, and constructs an ecological environment shared by AI model achievements in a regulation and control system.

Description

AI model service sharing method and system for power grid regulation and control business
Technical Field
The invention relates to an AI model service sharing method and system for power grid regulation and control business, and belongs to the technical field of power grid regulation and control.
Background
At present, artificial intelligence technology in the power grid regulation and control field has achieved preliminary results, but most business applications adopt an independent AI model management mode. This brings problems such as repeated construction of bottom-layer resources, insufficient compression and conversion capability for models from different algorithm frameworks, inability to trace model sources and versions, and weak model deployment and service release capability across heterogeneous systems; no co-constructed, shared ecological environment for applying AI model results has been formed.
Given the development trend of current power grid regulation and control business, an artificial intelligence model full-life-cycle management framework and system are urgently needed, providing storage, version control and retrieval of artificial intelligence models, on-demand cloud-edge-end model deployment, and servitized sharing of model reasoning capability.
Disclosure of Invention
In order to solve the problems that AI model management is weak and models cannot be shared in the prior art, the invention provides a power grid regulation and control business-oriented AI model service sharing method and system, wherein the AI model is uniformly stored and version-managed, model calling is carried out through a cluster system, business application AI model management cost is reduced, management efficiency is improved, and co-construction and sharing of application AI model results in a power grid regulation and control system are realized.
In order to solve the technical problems, the invention adopts the following technical means:
in a first aspect, the invention provides an AI model service sharing method for power grid regulation and control service, which comprises the following steps:
the method comprises the steps of obtaining an AI model warehouse, wherein the AI model warehouse comprises a plurality of model sets, and a plurality of AI models of the same type are stored in each model set;
model online service for creating an AI model in a cluster system by taking a model set in an AI model warehouse as granularity;
storing the AI model in the AI model warehouse into a cluster system by starting the model online service;
and loading the AI model from the cluster system according to an external request.
With reference to the first aspect, further, each model set in the AI model repository is configured with a unique model set ID, the same model set includes one or more model set versions, and each model version is configured with a unique model set version ID; each AI model in the AI model repository is configured with a unique model UID.
With reference to the first aspect, further, a model set information table, a model set version information table, and an AI model information table, all in a two-dimensional table form, are provided in the AI model warehouse; the model set information table comprises a model set ID, a model set name, a model type and a model source; the model set version information table comprises a model set ID, a model set version ID and updating time; the AI model information table includes a model UID, a model set ID, a model set version ID, and an update time.
With reference to the first aspect, further, the method for constructing the AI model warehouse includes:
acquiring a plurality of AI models through power grid regulation and control service training, and temporarily storing the acquired AI models in a memory block form;
calculating the characteristic value of the memory block by using a checking algorithm, and recording the characteristic value as a first characteristic value;
the memory block and the first characteristic value are sent to a file management service of the server together;
the file management service calculates the characteristic value of the memory block again by using a check algorithm and records the characteristic value as a second characteristic value;
and judging whether the first characteristic value is the same as the second characteristic value, if so, acquiring the AI model from the memory block, and storing the AI model into a model set corresponding to an AI model warehouse.
With reference to the first aspect, further, the method for creating a model online service includes:
acquiring a service name, an instance number, a CPU, a GPU and a memory;
according to the model set ID, the model set version ID and the service name, setting the cluster label of the model online service as 'app=<service name>-<cluster ID>';
calling Kubernetes API to create a Deployment element, wherein a label of the Deployment element is a cluster label;
creating a ReplicaSet through the Deployment, and using the ReplicaSet to create, in the system background, a number of Pods equal to the number of instances;
calling a Kubernetes API to create Service, and setting the selector of the Service to be consistent with the label of the Deployment;
calling the Kubernetes API to create Ingress and associating the Ingress with Service.
With reference to the first aspect, further, the method for saving the AI model in the AI model repository to the cluster system by starting the model online service includes:
acquiring a corresponding model UID list from an AI model warehouse according to the model set ID and the model set version ID;
acquiring AI models in batches from an AI model warehouse through file management service according to the model UID list, and temporarily storing the acquired AI models in a memory block form;
calculating the characteristic value of the memory block by using a checking algorithm, and recording the characteristic value as a first characteristic value;
sending the first characteristic value and the memory block to a cluster system through a file management service;
the cluster system calculates the characteristic value of the memory block again by using a verification algorithm and records the characteristic value as a second characteristic value;
and judging whether the first characteristic value is the same as the second characteristic value, and if so, storing the memory block to the cluster system.
With reference to the first aspect, further, in the cluster system, blue-green cluster deployment and upgrade of the model online service are performed by modifying the backend configuration information of the Ingress object.
With reference to the first aspect, further, in the cluster system, the Kubernetes API is called to modify the replicas parameter of the Deployment, so as to increase or decrease the number of Pods; the Kubernetes API is called to modify the requests and limits parameters of the Deployment, thereby modifying the CPU, GPU and memory of each Pod.
With reference to the first aspect, further, the method for loading an AI model from a cluster system according to an external request includes:
acquiring an http request in a cluster external URL format through Ingress, and performing rule matching on the received http request according to a preset rule list;
after the rule matching is successful, the Ingress forwards the http request to the Service according to the Service name and the port number;
the Service proxies the http request to the corresponding Pod of the Deployment according to the selector;
and parsing the http request by the model online service, and loading the AI model in the cluster system by using the model loading mode corresponding to each AI algorithm framework, according to the parsed model UID.
In a second aspect, the present invention provides an AI model service sharing system for power grid regulation and control service, including:
the AI model warehouse is used for storing AI models in the form of model sets, and each model set stores the same type of AI models;
the model management module is used for accessing the AI models in batches, updating the AI models and managing model set information and model set versions in the AI model warehouse;
the model online service deployment module is used for deploying the model online service of the AI model in the cluster system by taking the model set in the AI model warehouse as granularity;
the service starting module is used for storing the AI model in the AI model warehouse into the cluster system by starting the model on-line service;
and the model loading module loads the AI model from the cluster system according to the external request.
The following advantages can be obtained by adopting the technical means:
the invention provides an AI model service sharing method and system for power grid regulation and control service, wherein AI model building in the power grid regulation and control field is realized by converging, regulating and controlling and applying AI model results through an AI model warehouse; aiming at the characteristics of small and large quantity of AI model files applied to the power grid regulation and control service, the invention manages by taking the model set as granularity, supports batch persistent storage and AI model acquisition, reduces the network transmission frequency and effectively improves the model access efficiency; the invention provides AI model service integrated release based on Kubernetes, can rapidly and smoothly upgrade and expand the model, improve the utilization rate of hardware resources and realize the sharing of the AI model capability of a cross-scheduling mechanism and a service system; through blue-green deployment and upgrading of the model online service, uninterrupted external service provision is guaranteed, and as long as the service of the old version is not deleted, the model online service corresponding to the old version can be switched to at any time, so that the risk of upgrading the AI model is effectively reduced.
The method and the system can obviously reduce the communication cost and the labor cost of business application on the AI model management, effectively improve the model management efficiency and realize the co-construction and sharing of the AI model application results in the regulation and control system.
Drawings
FIG. 1 is a flowchart illustrating steps of an AI model service-oriented sharing method for power grid regulation and control business according to the present invention;
FIG. 2 is a flow chart of AI model management and servicing of shared data in an embodiment of the invention;
FIG. 3 is a flow chart of creating a model online service in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the accompanying drawings as follows:
in order to realize co-construction and sharing of AI models in a power grid dispatching system, the invention provides an AI model management method facing to the power grid regulation and control field, wherein the AI models in the power grid regulation and control field are stored and managed by taking a model set as granularity, and an AI model warehouse is constructed, and the specific operations are as follows:
s1, a user creates a plurality of model sets as required, each model set is configured with a unique model set ID, each model set only stores AI models of the same type, and the correspondence between common artificial intelligence algorithm frames and model file types is shown in Table 1.
TABLE 1
[Table 1 appears as an image in the original patent and is not reproduced here; it lists the correspondence between common artificial intelligence algorithm frameworks (e.g. Spark, TensorFlow, Keras, Scikit-learn) and their model file types.]
And S2, acquiring a plurality of AI models through power grid regulation and control service training, configuring each AI model with a unique model UID, and temporarily storing the acquired AI models in a memory block form.
And S3, calculating the characteristic value of the memory block by using a checking algorithm, recording the characteristic value as a first characteristic value, and sending the memory block and the first characteristic value to the file management service of the server together.
S4, the file management service calculates the characteristic value of the memory block again by using the verification algorithm and records the characteristic value as a second characteristic value.
And S5, judging whether the first characteristic value is the same as the second characteristic value, if so, acquiring the AI model from the memory block, and storing the AI model into a model set corresponding to the AI model warehouse.
In embodiments of the present invention, the checking algorithm includes, but is not limited to, CRC-32, CRC-64, SHA-1, SHA-256, MD5, and the like.
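As an illustrative sketch of the two-step characteristic-value check described above (sender computes the first characteristic value, receiver recomputes the second and compares), using SHA-256, one of the listed algorithms; the function names are illustrative, not from the patent:

```python
import hashlib

def feature_value(block: bytes) -> str:
    # Characteristic value of a memory block; SHA-256 chosen here, but the
    # patent equally allows CRC-32, CRC-64, SHA-1 or MD5.
    return hashlib.sha256(block).hexdigest()

def receive_block(block: bytes, first_feature_value: str) -> bool:
    # Receiver side: recompute the second characteristic value and accept
    # the block only if the two values match.
    second_feature_value = feature_value(block)
    return second_feature_value == first_feature_value

model_bytes = b"...serialized AI model..."
fv1 = feature_value(model_bytes)            # first characteristic value
assert receive_block(model_bytes, fv1)      # intact transfer accepted
assert not receive_block(b"corrupt", fv1)   # corrupted transfer rejected
```

The same check is reused in both directions: when persisting models into the warehouse (steps S3-S5) and when pushing them to the cluster system (steps C03-C04).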
The invention supports version control of model sets: the same model set can include one or more model set versions, and each model set version is configured with a unique model set version ID. When any AI model in a model set is updated, a new version of the model set is generated; the AI model can be updated manually or automatically, that is, the new model set version can be created manually or automatically. Manual updating generally means that a user actively uploads a new AI model, or modifies the parameters of an existing AI model and retrains it. Automatic updating retrains the AI models on different periods (daily, weekly, monthly, quarterly, yearly, etc.) according to the different application requirements of the power grid regulation and control business, and performs batch AI model updating. Replacing manual maintenance with internal automatic updating reduces communication and labor costs and effectively improves management efficiency.
In the AI model warehouse, model set information, model set version information, and AI model information are stored in a two-dimensional table format and are respectively recorded as a model set information table, a model set version information table, and an AI model information table. The model set information comprises a model set ID, a model set name, a model type, a model source and the like; where the model sources may be platform-customized or user-customized. The model set version information table comprises a model set ID, a model set version ID, version updating time and the like; the AI model information includes a model UID, a model set ID, a model set version ID, an update time, and the like.
In the field of power grid regulation and control, hundreds of AI models of the same type need to be constructed for different power grid devices; the model files are relatively small but relatively numerous, so managing them with the model set as the granularity effectively reduces communication and labor costs and improves management efficiency. In addition, the invention persists AI models in batches, which reduces the network transmission frequency and effectively improves transmission and storage efficiency.
According to the constructed AI model warehouse, the invention provides an AI model service sharing method facing to power grid regulation and control business, as shown in fig. 1 and 2, the method specifically comprises the following steps:
and A, acquiring an AI model warehouse.
B, establishing model online service of the AI model in the cluster system by taking the model set in the AI model warehouse as granularity; the model online service may be automatically created or manually created, and the model online service may provide functions of model prediction, manual or automatic update, and the like to the outside, as shown in fig. 3, specifically including the following steps:
and step B01, acquiring machine resources required by the service name, the number of instances, the CPU, the GPU, the memory and the like, wherein the required machine resources are generally set manually.
For any model set in the AI model warehouse, there is only one model online service with a unique name; the service name corresponds to the model set ID and is stored in a two-dimensional table. One model set version corresponds to exactly one cluster, and since one model set may include multiple versions, a model online service may own multiple clusters (i.e., model set versions); however, only one of these clusters is accessible from outside the cluster system through the URL: that cluster is the green cluster, and the others are blue clusters. The cluster ID is a UID and is stored in a two-dimensional table together with the model set ID, model set version ID, number of instances, CPU, GPU, memory, cluster state, etc., ensuring the integrity of the mapping relation.
In the embodiment of the present invention, it is assumed that the model set is G1, the model set version is v1, the model online service name is "test", the cluster ID is d1, the number of instances is 2, the number of CPU cores is 2, the number of GPU cores is 0, and the memory size is 2G.
And step B02, setting the cluster label of the model online service as 'app=<service name>-<cluster ID>' according to the model set ID, the model set version ID and the service name.
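The label construction in step B02 is simple string assembly; a minimal sketch (function name illustrative):

```python
def cluster_label(service_name: str, cluster_id: str) -> str:
    # Build the cluster label "app=<service name>-<cluster ID>" used as
    # the Deployment label and the Service selector.
    return f"app={service_name}-{cluster_id}"

print(cluster_label("test", "d1"))  # app=test-d1
```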
And step B03, creating a deployment.yaml file and calling the Kubernetes API to create the Deployment, wherein the label of the Deployment is the cluster label set in step B02.
Kubernetes uses the Deployment to create a ReplicaSet, which creates, in the background, a number of Pods equal to the set number of instances. Each Pod corresponds to a container created from the basic mirror image; the container starts the model online service, provides functions such as model prediction and manual or automatic updating to the outside, and calls the corresponding function each time an http request is received.
The basic mirror image has pre-installed the Java/Python modules required by the model online service, such as Spark, TensorFlow, Keras and Scikit-learn.
In the embodiment of the present invention, the name of the Deployment object created by calling the Kubernetes API is "test-d1" and its label is "app=test-d1". Assuming that the externally exposed port number of the model online service is 12345, an example of the deployment.yaml file is as follows:
[The deployment.yaml example appears as an image in the original patent and is not reproduced here.]
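Because the deployment.yaml example is only an image in the source, the following is a hedged reconstruction from the values stated in the embodiment (name "test-d1", label "app=test-d1", 2 instances, 2 CPU cores, 2G memory, port 12345). The Kubernetes API also accepts JSON manifests, so the manifest is expressed here as a Python dict; the container name and base image name are assumptions not given in the patent.

```python
import json

# Sketch of the Deployment manifest described in step B03; the container
# name "model-online-service" and image "model-base:latest" are assumed.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "test-d1", "labels": {"app": "test-d1"}},
    "spec": {
        "replicas": 2,  # number of instances from the embodiment
        "selector": {"matchLabels": {"app": "test-d1"}},
        "template": {
            "metadata": {"labels": {"app": "test-d1"}},
            "spec": {
                "containers": [{
                    "name": "model-online-service",
                    "image": "model-base:latest",
                    "ports": [{"containerPort": 12345}],
                    "resources": {
                        "requests": {"cpu": "2", "memory": "2Gi"},
                        "limits": {"cpu": "2", "memory": "2Gi"},
                    },
                }]
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

Kubernetes treats a YAML manifest and its JSON equivalent identically, so this dict could be submitted to the apps/v1 Deployment endpoint as-is.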
Step B04, creating a service.yaml file and calling the Kubernetes API to create the Service, setting the selector of the Service consistent with the label of the Deployment, so that the group of Pods created in step B03 can be accessed through the Service.
In this embodiment, calling the Kubernetes API creates a Service named "test-d1" with the selector "app=test-d1", which proxies requests onto the Pods created in step B03. An example of the service.yaml file is as follows:
[The service.yaml example appears as an image in the original patent and is not reproduced here.]
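The service.yaml example is likewise an image in the source; this is a hedged reconstruction from the embodiment's values, again as a JSON-equivalent Python dict (the targetPort value is an assumption, taken to match the exposed port 12345).

```python
# Sketch of the Service manifest described in step B04. The selector
# must match the Deployment label so the Service proxies to those Pods.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "test-d1"},
    "spec": {
        "selector": {"app": "test-d1"},
        "ports": [{"port": 12345, "targetPort": 12345}],
    },
}

print(service["metadata"]["name"], service["spec"]["selector"])
```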
And step B05, creating an ingress.yaml file, calling the Kubernetes API to create the Ingress, and associating the Ingress with the Service.
The Service and the Pods can only be accessed via IP address inside the Kubernetes cluster network and cannot be accessed from outside the cluster, so the invention uses Ingress to send http requests to the designated Pods through a URL outside the cluster. The Ingress sets a rule list for inbound requests; after receiving an http request, it matches the rules and forwards the traffic to the corresponding Service name and port number.
In the embodiment of the present invention, an Ingress object named "test" is created, which matches the http rule /modelService/test and forwards the traffic to port 12345 of the Service object named "test-d1" created in step B04. An example of the ingress.yaml file is as follows:
[The ingress.yaml example appears as an image in the original patent and is not reproduced here.]
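The ingress.yaml example is also an image in the source. Below is a hedged reconstruction matching the stated rule (/modelService/test forwarding to port 12345 of Service "test-d1"), using the networking.k8s.io/v1 Ingress schema; the pathType value is an assumption.

```python
# Sketch of the Ingress manifest described in step B05, as a
# JSON-equivalent dict of a networking.k8s.io/v1 Ingress.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "test"},
    "spec": {
        "rules": [{
            "http": {
                "paths": [{
                    "path": "/modelService/test",
                    "pathType": "Prefix",  # assumed; not stated in the patent
                    "backend": {
                        "service": {"name": "test-d1",
                                    "port": {"number": 12345}},
                    },
                }]
            }
        }]
    },
}

backend = ingress["spec"]["rules"][0]["http"]["paths"][0]["backend"]
print(backend["service"]["name"])
```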
step C, saving the AI model in the AI model warehouse into the cluster system by starting the model online service, wherein the specific operation is as follows:
and step C01, starting the model online service, and acquiring a corresponding model UID list from the AI model warehouse according to the model set ID and the model set version ID corresponding to the model online service.
And searching the model set information table, the model set version information table and the AI model information table in the AI model warehouse according to the model set ID and the model set version ID, and obtaining all AI model UIDs meeting the requirements to form a model UID list.
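The lookup in step C01 amounts to filtering the AI model information table on the (model set ID, model set version ID) pair. A minimal sketch with illustrative rows (the UIDs and table representation are examples, not from the patent):

```python
# Illustrative rows of the AI model information table described earlier
# (model UID, model set ID, model set version ID).
ai_model_table = [
    {"model_uid": "m-001", "model_set_id": "G1", "model_set_version_id": "v1"},
    {"model_uid": "m-002", "model_set_id": "G1", "model_set_version_id": "v1"},
    {"model_uid": "m-003", "model_set_id": "G1", "model_set_version_id": "v2"},
]

def model_uid_list(model_set_id: str, model_set_version_id: str) -> list:
    # Collect all model UIDs belonging to the given set and version.
    return [row["model_uid"] for row in ai_model_table
            if row["model_set_id"] == model_set_id
            and row["model_set_version_id"] == model_set_version_id]

print(model_uid_list("G1", "v1"))  # ['m-001', 'm-002']
```

The resulting UID list then drives the batch fetch of step C02, so one request retrieves many small model files at once.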
And step C02, acquiring the AI models in batch from the AI model warehouse through the file management service according to the model UID list, and temporarily storing the acquired AI models in a memory block form.
Step C03, calculating the characteristic value of the memory block by using a checking algorithm, and recording the characteristic value as a first characteristic value; and sending the first characteristic value and the memory block to the cluster system through the file management service.
Step C04, the cluster system calculates the characteristic value of the memory block again by using the check algorithm, records it as the second characteristic value, and determines whether the first characteristic value is the same as the second characteristic value; if so, the memory block is saved in the cluster system. The AI model is stored temporarily in the cluster system, and is deleted, with its machine resources released, when the model online service cluster is stopped.
After the model online service is started, it is released externally, and external callers can then request the model online service.
By repeating steps C01-C04, the AI model can be updated by the user through an external URL request or automatically from inside the system; the full model set can be updated to a specified version, and any AI model of any version can be updated. This one-stop AI model upgrade meets the requirements of both manual updating and automatic periodic updating of AI models under different business application scenarios.
For the same model online service, if a new model set version appears or the model set version corresponding to the service needs to be replaced, blue-green cluster deployment and upgrade of the model online service can be carried out by modifying the backend configuration information of the Ingress object. During the blue-green deployment and upgrade process, the model online service stays online, so uninterrupted external service is guaranteed. Meanwhile, bringing the new version online does not modify any content of the old version, and the deployment does not affect the old version's state; as long as the old version's service is not deleted, the model online service can be switched back to the old version at any time, reducing the risk of upgrading the AI model.
In the embodiment of the present invention, assume the new model set version is v2 (corresponding to v1 above), with an instance count of 1, 2 CPU cores, 0 GPU cores, and 2 GB of memory. The blue-green cluster deployment and upgrade proceeds as follows:
(1) According to the configuration information (model set, model set version, service name, instance count, CPU, GPU, memory, etc.), set a new cluster label "app=test-d2" and call the Kubernetes API to create a Deployment whose default object name is "test-d2" and whose label is "app=test-d2", with as many Pods as the instance count. At this point this group of Pods is the blue cluster, and the group of Pods created in step B03 is the green cluster.
(2) Create a service.yaml file and call the Kubernetes API to create a Service named "test-d2" with selector "app=test-d2", which proxies requests to the Pods created in step (1).
(3) Call the Kubernetes API to modify the configuration information of the Ingress object created in step B05, switching the service traffic.
Specifically, modify the backend configuration information of the Ingress object named "test" so that traffic is forwarded to port 12345 of the Service object "test-d2" created in step (2). After the switch, the group of Pods created in step (1) becomes the green cluster, and the group of Pods created in step B03 becomes the blue cluster.
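The blue-green switch in steps (1)-(3) amounts to repointing the Ingress backend from one Service to the other. A minimal sketch using plain dictionaries as stand-ins for the Kubernetes objects (the names "test", "test-d1", "test-d2" and port 12345 are taken from the embodiment):

```python
def make_ingress(name: str, backend_service: str, port: int) -> dict:
    # Minimal stand-in for the Ingress object created in step B05.
    return {"metadata": {"name": name},
            "backend": {"serviceName": backend_service, "servicePort": port}}

def switch_backend(ingress: dict, new_service: str) -> None:
    # Blue-green switch: only the backend pointer changes. The old Service
    # object is left untouched, so traffic can be switched back at any time.
    ingress["backend"]["serviceName"] = new_service

ingress = make_ingress("test", "test-d1", 12345)  # old version serving
switch_backend(ingress, "test-d2")                # new version goes live
```

Because the switch is a single configuration change on the Ingress, requests are never routed to a half-deployed cluster, which is what keeps the service online throughout the upgrade.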
In the cluster system, when a higher load capacity is required of a model service, the Kubernetes API can be called to modify the replicas parameter of the Deployment to increase the number of Pods, and the requests and limits parameters can be modified according to cluster resource usage to adjust the CPU, GPU, and memory resources of each Pod.
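The scaling described above boils down to patching the replicas field and the per-Pod requests/limits. A sketch with plain dictionaries standing in for the Deployment object (the resource values are illustrative):

```python
def scale_deployment(deployment: dict, replicas: int,
                     cpu: str, memory: str) -> None:
    # Increase or decrease the Pod count via the replicas parameter, and
    # adjust each Pod's CPU/memory via the requests and limits parameters.
    deployment["spec"]["replicas"] = replicas
    resources = deployment["spec"]["template"]["resources"]
    resources["requests"] = {"cpu": cpu, "memory": memory}
    resources["limits"] = {"cpu": cpu, "memory": memory}

deployment = {"spec": {"replicas": 1, "template": {"resources": {}}}}
scale_deployment(deployment, replicas=3, cpu="2", memory="2Gi")
```

Setting requests equal to limits, as done here, is one simple policy; in practice requests may be set lower than limits to overcommit cluster resources.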
Step D, loading the AI model from the cluster system according to an external request, completing the model call, and providing model prediction and related functions externally.
Step D01, the Ingress receives http requests addressed to the cluster-external URL and matches each received request against a preset rule list.
Step D02, after rule matching succeeds, the Ingress forwards the http request to the Service according to the service name and port number.
Step D03, the Service proxies the http request to a Pod of the corresponding Deployment according to its selector, and the model online service in the container on that Pod receives the http request.
Step D04, the model online service parses the http request, loads the AI model from the cluster system using the model loading method corresponding to each AI algorithm framework according to the parsed model UID, and returns the model prediction result, completing the AI model call.
Externally, a model set version can be accessed through its URL, and multiple AI models within that version of the model set can be called at the same time.
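Step D04's framework-specific loading can be sketched as a dispatch table keyed by framework name. The registry, index, framework names, and paths below are illustrative assumptions, not part of the patent:

```python
# Hypothetical loader registry: one loading routine per AI algorithm framework.
MODEL_LOADERS = {
    "sklearn": lambda path: ("sklearn-model", path),
    "pytorch": lambda path: ("pytorch-model", path),
}

# Hypothetical metadata index keyed by model UID.
MODEL_INDEX = {
    "uid-001": {"framework": "sklearn", "path": "/models/v1/uid-001"},
}

def load_model(model_uid: str):
    # Step D04: resolve the UID to its framework, then apply that
    # framework's loading routine to the stored model.
    meta = MODEL_INDEX[model_uid]
    return MODEL_LOADERS[meta["framework"]](meta["path"])
```

A dispatch table like this is what lets one online service host models from several AI frameworks behind a single URL.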
The method of the invention also comprises the following steps:
step E: stopping an unused model online service cluster according to its cluster ID. The specific operations are as follows:
step E01: calling the Kubernetes API to delete the Deployment named "service name-cluster ID". In this embodiment, the Kubernetes API is called to delete the Deployment named "test-d1".
Step E02: calling the Kubernetes API to delete the ReplicaSet labeled "app=service name-cluster ID". In this embodiment, the Kubernetes API is called to delete the ReplicaSet labeled "app=test-d1".
Step E03: calling the Kubernetes API to delete the Pods labeled "app=service name-cluster ID". In this embodiment, the Kubernetes API is called to delete the Pods labeled "app=test-d1".
Step E04: calling the Kubernetes API to delete the Service named "service name-cluster ID". In this embodiment, the Kubernetes API is called to delete the Service named "test-d1".
Step E05: deleting the model online service cluster configuration information according to the cluster ID. In this embodiment, the configuration information for the cluster with ID d1 is deleted.
Step F: deleting the model online service according to its service name. The specific operations are as follows:
step F01: querying the list of all clusters according to the service name.
Step F02: repeating steps E01-E05 to delete all clusters in turn. In this embodiment, the Kubernetes API is called to delete the Deployment named "test-d2", the ReplicaSet labeled "app=test-d2", the Pods labeled "app=test-d2", and the Service named "test-d2", and the model online service cluster configuration information with cluster ID d2 is deleted.
Step F03: calling the Kubernetes API to delete the Ingress named "service name". In this embodiment, the Kubernetes API is called to delete the Ingress named "test".
Step F04: deleting the model online service information according to the service name. In this embodiment, the model online service information with service name "test" is deleted.
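The teardown order of steps E01-E05 and F01-F04 can be sketched as follows, with `api` standing in for a hypothetical wrapper around the Kubernetes API and the configuration store (method names here are illustrative):

```python
def stop_cluster(api, service_name: str, cluster_id: str) -> None:
    # Steps E01-E05: tear down one model online service cluster.
    name = f"{service_name}-{cluster_id}"
    label = f"app={name}"
    api.delete_deployment(name)            # E01
    api.delete_replicaset(label)           # E02
    api.delete_pods(label)                 # E03
    api.delete_service(name)               # E04
    api.delete_cluster_config(cluster_id)  # E05

def delete_online_service(api, service_name: str, cluster_ids) -> None:
    # Steps F01-F04: remove every cluster, then the Ingress and the record.
    for cluster_id in cluster_ids:         # F01/F02
        stop_cluster(api, service_name, cluster_id)
    api.delete_ingress(service_name)       # F03
    api.delete_service_info(service_name)  # F04
```

The ordering matters: workload objects (Deployment, ReplicaSet, Pods) go first, then the Service, and only after every cluster is gone are the shared Ingress and the service record removed.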
Based on Kubernetes, the invention provides integrated release of AI model services, realizes rapid and smooth upgrading and capacity expansion, improves hardware resource utilization, and achieves AI model capability sharing across dispatching organizations and business systems. It can significantly reduce the communication and labor costs of business applications in AI model management, effectively improve management efficiency, and realize joint construction and sharing of AI model results within the regulation and control system.
The invention also provides an AI model service sharing system for power grid regulation and control business, which mainly comprises an AI model warehouse, a model management module, a model online service deployment module, a service starting module, and a model loading module.
The AI model warehouse is mainly used for storing AI models in the form of model sets, each model set storing AI models of the same type.
The model management module is mainly used for batch access to AI models, updating AI models, and managing model set information and model set versions in the AI model warehouse.
The model management module comprises the following sub-modules:
Module M101: the model set management submodule provides adding, deleting, and modifying of model set information, and version control of model sets.
Module M102: the AI model batch access submodule performs batch persistent storage and retrieval of AI models and generates characteristic values with the check algorithm.
Module M103: the AI model update submodule supports updating, via http request or automatically and periodically, the full model set to a specified version as well as any single AI model of any version.
The model online service deployment module is mainly used for deploying the model online service of AI models in the cluster system at model-set granularity; its specific operation is consistent with step B of the method.
The model online service deployment module comprises the following sub-modules:
Module M201: the Deployment creation submodule sets the service name, instance count, and machine resources (CPU, GPU, memory, etc.) according to the specified model set and model set version, creates a deployment.yaml file, and calls the Kubernetes API to create the Deployment.
Module M202: the Service creation submodule creates a service.yaml file and calls the Kubernetes API to create the Service.
Module M203: the Ingress creation submodule creates the Ingress.
Module M204: the traffic switching submodule switches the blue-green cluster ID as required and calls the Kubernetes API to modify the Ingress configuration information.
Module M205: the cluster state query submodule queries the states of all Pods in a cluster according to the cluster ID; common Pod states are shown in Table 2:
TABLE 2

Service state name    Service state description
Creating              ContainerCreating
Running               Running
Stopping              Terminating
Waiting               Pending
Failed                Fail
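The mapping of Table 2 can be expressed as a small lookup table for the cluster state query submodule; this dictionary is an illustrative sketch, not part of the patent:

```python
# Service state names of Table 2, keyed by the raw Pod state.
POD_STATE_NAMES = {
    "ContainerCreating": "Creating",
    "Running": "Running",
    "Terminating": "Stopping",
    "Pending": "Waiting",
    "Fail": "Failed",
}

def service_state(pod_state: str) -> str:
    # Pod states outside Table 2 fall back to "Unknown".
    return POD_STATE_NAMES.get(pod_state, "Unknown")
```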
The service starting module stores the AI models in the AI model warehouse into the cluster system by starting the model online service; its specific operation is consistent with step C of the method.
The model loading module loads AI models from the cluster system according to external requests; its specific operation is consistent with step D of the method.
The method and system can significantly reduce the communication and labor costs of business applications in AI model management, effectively improve model management efficiency, and realize joint construction and sharing of AI model application results within the regulation and control system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. An AI model service sharing method oriented to power grid regulation and control business is characterized by comprising the following steps:
the method comprises the steps of obtaining an AI model warehouse, wherein the AI model warehouse comprises a plurality of model sets, and a plurality of AI models of the same type are stored in each model set;
model online service for creating an AI model in a cluster system by taking a model set in an AI model warehouse as granularity;
storing the AI model in the AI model warehouse into a cluster system by starting the model online service;
and loading the AI model from the cluster system according to an external request.
2. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 1, wherein each model set in the AI model warehouse is configured with a unique model set ID, the same model set comprises one or more model set versions, and each model set version is configured with a unique model set version ID; each AI model in the AI model warehouse is configured with a unique model UID.
3. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 2, wherein the AI model warehouse is provided with a model set information table, a model set version information table, and an AI model information table, all in the form of two-dimensional tables; the model set information table comprises a model set ID, a model set name, a model type, and a model source; the model set version information table comprises a model set ID, a model set version ID, and an update time; the AI model information table comprises a model UID, a model set ID, a model set version ID, and an update time.
4. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 1, wherein the AI model warehouse is constructed by:
acquiring a plurality of AI models through power grid regulation and control service training, and temporarily storing the acquired AI models in a memory block form;
calculating the characteristic value of the memory block by using a checking algorithm, and recording the characteristic value as a first characteristic value;
the memory block and the first characteristic value are sent to a file management service of the server together;
the file management service calculates the characteristic value of the memory block again by using a check algorithm and records the characteristic value as a second characteristic value;
and judging whether the first characteristic value is the same as the second characteristic value, if so, acquiring the AI model from the memory block, and storing the AI model into a model set corresponding to an AI model warehouse.
5. The AI model service sharing method oriented to the power grid regulation and control business as claimed in claim 1 or 2, wherein the model online service creation method comprises:
acquiring a service name, an instance number, a CPU, a GPU and a memory;
setting the cluster label of the model online service to "app=service name-cluster ID" according to the model set ID, the model set version ID, and the service name;
calling the Kubernetes API to create a Deployment whose label is the cluster label;
creating a ReplicaSet through the Deployment, and using the ReplicaSet to create, in the system background, Pods equal in number to the instance count;
calling a Kubernetes API to create Service, and setting the selector of the Service to be consistent with the label of the Deployment;
calling the Kubernetes API to create Ingress and associating the Ingress with Service.
6. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 1, wherein the method for saving the AI models in the AI model warehouse into the cluster system by starting the model online service comprises:
acquiring a corresponding model UID list from an AI model warehouse according to the model set ID and the model set version ID;
acquiring AI models in batches from an AI model warehouse through file management service according to the model UID list, and temporarily storing the acquired AI models in a memory block form;
calculating the characteristic value of the memory block by using a checking algorithm, and recording the characteristic value as a first characteristic value;
sending the first characteristic value and the memory block to a cluster system through a file management service;
the cluster system calculates the characteristic value of the memory block again by using a verification algorithm and records the characteristic value as a second characteristic value;
and judging whether the first characteristic value is the same as the second characteristic value, and if so, storing the memory block to the cluster system.
7. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 5, wherein, in the cluster system, the model online service is deployed and upgraded in blue-green fashion by modifying the backend configuration information of the Ingress object.
8. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 5, wherein the Kubernetes API is called to modify the replicas parameter of the Deployment, thereby increasing or decreasing the number of Pods; and the Kubernetes API is called to modify the requests and limits parameters of the Deployment, thereby modifying the CPU, GPU, and memory of each Pod.
9. The AI model service sharing method oriented to power grid regulation and control business as claimed in claim 5, wherein the method for loading an AI model from the cluster system according to an external request comprises:
acquiring an http request in a cluster external URL format through Ingress, and performing rule matching on the received http request according to a preset rule list;
after the rule matching is successful, the Ingress forwards the http request to the Service according to the Service name and the port number;
the Service proxies the http request to the corresponding Pod of the Deployment according to the selector;
the model online service parses the http request and, according to the parsed model UID, loads the AI model in the cluster system using the model loading method corresponding to each AI algorithm framework.
10. An AI model service sharing system oriented to power grid regulation and control business is characterized by comprising:
the AI model warehouse is used for storing AI models in the form of model sets, and each model set stores the same type of AI models;
the model management module is used for accessing the AI models in batches, updating the AI models and managing model set information and model set versions in the AI model warehouse;
the model online service deployment module is used for deploying the model online service of the AI model in the cluster system by taking the model set in the AI model warehouse as granularity;
the service starting module is used for storing the AI model in the AI model warehouse into the cluster system by starting the model on-line service;
and the model loading module loads the AI model from the cluster system according to the external request.
CN202111447806.1A 2021-11-30 2021-11-30 AI model servitization sharing method and system for power grid regulation and control service Active CN114153525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447806.1A CN114153525B (en) 2021-11-30 2021-11-30 AI model servitization sharing method and system for power grid regulation and control service


Publications (2)

Publication Number Publication Date
CN114153525A true CN114153525A (en) 2022-03-08
CN114153525B CN114153525B (en) 2024-01-05

Family

ID=80455361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447806.1A Active CN114153525B (en) 2021-11-30 2021-11-30 AI model servitization sharing method and system for power grid regulation and control service

Country Status (1)

Country Link
CN (1) CN114153525B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764808A (en) * 2018-03-29 2018-11-06 北京九章云极科技有限公司 Data Analysis Services system and its on-time model dispositions method
CN110413294A (en) * 2019-08-06 2019-11-05 中国工商银行股份有限公司 Service delivery system, method, apparatus and equipment
CN110688539A (en) * 2019-09-30 2020-01-14 北京九章云极科技有限公司 Model management system and method
CN111414233A (en) * 2020-03-20 2020-07-14 京东数字科技控股有限公司 Online model reasoning system
CN111538563A (en) * 2020-04-14 2020-08-14 北京宝兰德软件股份有限公司 Event analysis method and device for Kubernetes
CN112418438A (en) * 2020-11-24 2021-02-26 国电南瑞科技股份有限公司 Container-based machine learning procedural training task execution method and system
CN112783646A (en) * 2021-01-13 2021-05-11 中国工商银行股份有限公司 Stateful application containerization deployment method and device
CN113642948A (en) * 2020-05-11 2021-11-12 腾讯科技(深圳)有限公司 Model management method, device and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU XIAOLIANG: "Design and Implementation of a Container Cloud Platform Based on Kubernetes", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 139-198 *

Also Published As

Publication number Publication date
CN114153525B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
US10511506B2 (en) Method and device for managing virtualized network function
US9852220B1 (en) Distributed workflow management system
CN110932912A (en) Method for realizing unified management of configuration files under micro-service architecture
US20210326161A1 (en) Apparatus and method for multi-cloud service platform
CN113961346A (en) Data cache management and scheduling method and device, electronic equipment and storage medium
CN112463290A (en) Method, system, apparatus and storage medium for dynamically adjusting the number of computing containers
CN114077602A (en) Data migration method and device, electronic equipment and storage medium
CN113297031A (en) Container group protection method and device in container cluster
CN115185679A (en) Task processing method and device for artificial intelligence algorithm, server and storage medium
CN113204368B (en) Application processing method, server and storage medium
CN112199200A (en) Resource scheduling method and device, computer equipment and storage medium
CN114153525B (en) AI model servitization sharing method and system for power grid regulation and control service
CN113590304A (en) Business data processing method and device, computer equipment and storage medium
US20200133709A1 (en) System and method for content - application split
CN115357198B (en) Mounting method and device of storage volume, storage medium and electronic equipment
WO2021245447A1 (en) Stable references for network function life cycle management automation
CN111767126A (en) System and method for distributed batch processing
CN104796465B (en) Cloud platform method for processing business and system
CN111767345B (en) Modeling data synchronization method, modeling data synchronization device, computer equipment and readable storage medium
CN114020368A (en) Information processing method and device based on state machine and storage medium
US20230246911A1 (en) Control device, control method, control program and control system
US20110154374A1 (en) Apparatus and method for managing customized application
CN113641641A (en) Switching method, switching system, equipment and storage medium of file storage service
CN107493316B (en) Application operation management method, server and computer readable storage medium
CN111090530A (en) Distributed cross-interprocess communication bus system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant