CN114745264A - Inference service deployment method and device and processor readable storage medium

Info

Publication number: CN114745264A
Application number: CN202011539937.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 王浩 (Wang Hao)
Assignee: Datang Mobile Communications Equipment Co Ltd
Legal status: Pending
Prior art keywords: inference, service, requirement, inference service, deployment


Classifications

    • H04L 41/0806: Configuration setting for initial configuration or provisioning, e.g. plug-and-play (under H04L 41/00, arrangements for maintenance, administration or management of data switching networks)
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The embodiments of the present application provide an inference service deployment method and apparatus and a processor-readable storage medium. The method comprises the following steps: receiving a first registration request sent by an inference requirement end; determining, according to requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information; and deploying the inference service in the inference service operation resource. By determining an inference service and an inference service operation resource matched with the requirement information, the method achieves inference service deployment that satisfies different requirements.

Description

Inference service deployment method and device and processor readable storage medium
Technical Field
The present application relates to the field of wireless communication technologies, and in particular, to a method and an apparatus for deploying inference services, and a processor-readable storage medium.
Background
AI (Artificial Intelligence)/ML (Machine Learning) models are increasingly involved in application research on the RAN (Radio Access Network). AI/ML models provide inference services for RAN service assurance and resource optimization, and these inference services support decisions and judgments in network planning, network optimization, user service assurance, cell resource tuning, and the like. The requirements in specific RAN application scenarios are varied and complex, for example traffic prediction, user count prediction, access storm prediction, power consumption prediction, and interference prediction. In the prior art, deployment schemes that adapt and optimize only for hardware resource availability and a single delay-based index cannot deploy inference services according to such different requirements.
Disclosure of Invention
In view of the shortcomings of the prior art, the present application provides an inference service deployment method and apparatus and a processor-readable storage medium to address the technical deficiencies described above.
In a first aspect, an inference service deployment method is provided, executed by a deployment management end and comprising:
receiving a first registration request sent by an inference requirement end;
determining, according to requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information;
and deploying the inference service in the inference service operation resource.
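For illustration only, the following is a minimal Python sketch of this three-step flow; all names (DeploymentManager, RegistrationRequest, the dictionary keys) are assumptions introduced here, not part of the claimed method.

```python
from dataclasses import dataclass

@dataclass
class RegistrationRequest:
    """First registration request sent by the inference requirement end."""
    requirement_info: dict  # function type, performance, deployment policy, ...

class DeploymentManager:
    """Hypothetical deployment management end implementing the first aspect."""

    def __init__(self):
        self.first_list = {}   # registered inference services (preset first list)
        self.second_list = {}  # registered operation resources (preset second list)

    def handle_first_registration(self, request: RegistrationRequest):
        info = request.requirement_info
        # Step 2: match an inference service and an operation resource to the info.
        service = self.match_inference_service(info)
        resource = self.match_operation_resource(info)
        # Step 3: deploy the matched service in the matched operation resource.
        resource["deployed_service"] = service
        return service, resource

    def match_inference_service(self, info: dict) -> dict:
        # Toy matching on the function type; elaborated in the embodiments below.
        return self.first_list.get(info.get("function_type"), {})

    def match_operation_resource(self, info: dict) -> dict:
        # Toy matching on the requested location level.
        for resource in self.second_list.values():
            if resource.get("location_level") == info.get("location_level"):
                return resource
        return {}
```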
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
determining, according to at least one of a function type requirement and an inference performance requirement included in the requirement information, a function classification matching the function type requirement and/or a deployment position and inference resources matching the inference performance requirement, and determining the inference service based on the function classification and/or the deployment position and inference resources;
and determining, according to a deployment policy requirement included in the requirement information, an independent policy or a general policy matching the deployment policy requirement, and determining the inference service operation resource based on the independent policy or the general policy.
In one embodiment, deploying the inference service in the inference service operation resource comprises:
when the deployment policy requirement selects an independent policy and no inference service is deployed in the inference service operation resource yet, retrieving, from a preset first list, the artificial intelligence model corresponding to the inference service, and deploying the inference service provided by that artificial intelligence model in the inference service operation resource.
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a second registration request sent by an inference service providing end, wherein the second registration request includes related information of the inference service, the related information comprising at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service;
and storing the related information of the inference service in the preset first list.
In one embodiment, determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information comprises:
determining, according to the deployment policy requirement included in the requirement information, related information of an inference service matching the deployment policy requirement;
and determining the general policy matching the deployment policy requirement according to at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service in the first list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service includes at least one of a regional network level, a base station level, a cell level, and a slice level; the resource requirements of the inference service include at least one of a central processor type requirement, a central processor resource requirement amount, a graphics processor type requirement, a graphics processor resource requirement amount, a storage type requirement, a storage requirement amount, and a container management platform type requirement.
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a third registration request sent by an inference service deployment resource providing end, wherein the third registration request includes related information of an inference service operation resource, the related information comprising at least one of a location level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource;
and storing the related information of the inference service operation resource in a preset second list.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the location level classification of the inference service operation resource and the resource configuration classification of the inference service operation resource included in the related information of the inference service operation resource in the second list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the location level classification of the inference service operation resource includes at least one of a regional cloud, an edge cloud, a base station, a centralized unit, and a distributed unit; the resource configuration classification of the inference service operation resource includes at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
In one embodiment, after deploying the inference service in the inference service operation resource, the method further comprises:
sending a response message to the inference requirement end, wherein the response message includes an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score represents the degree to which at least one of the inference service and the inference service operation resource matches the requirement information.
In one embodiment, after sending the response message to the inference requirement end, the method further comprises:
when it is determined that an updated inference service and an updated inference service operation resource matching the requirement information exist and the first satisfaction score is smaller than a second satisfaction score, sending a notification message including the second satisfaction score to the inference requirement end so that the inference requirement end can determine whether to send an update request to the deployment management end;
wherein the update request instructs the deployment management end to deploy the updated inference service in the updated inference service operation resource, and the second satisfaction score represents the degree to which at least one of the updated inference service and the updated inference service operation resource matches the requirement information.
In a second aspect, an inference service deployment apparatus is provided, acting as the deployment management end and comprising:
a memory for storing a computer program; a transceiver for transceiving data under the control of a processor; and the processor, for reading the computer program in the memory and performing the following:
receiving a first registration request sent by an inference requirement end;
determining, according to requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information;
and deploying the inference service in the inference service operation resource.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
determining, according to at least one of a function type requirement and an inference performance requirement included in the requirement information, a function classification matching the function type requirement and/or a deployment position and inference resources matching the inference performance requirement, and determining the inference service based on the function classification and/or the deployment position and inference resources;
and determining, according to a deployment policy requirement included in the requirement information, an independent policy or a general policy matching the deployment policy requirement, and determining the inference service operation resource based on the independent policy or the general policy.
In one embodiment, deploying the inference service in the inference service operation resource comprises:
when the deployment policy requirement selects an independent policy and no inference service is deployed in the inference service operation resource yet, retrieving, from a preset first list, the artificial intelligence model corresponding to the inference service, and deploying the inference service provided by that artificial intelligence model in the inference service operation resource.
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a second registration request sent by an inference service providing end, wherein the second registration request includes related information of the inference service, the related information comprising at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service;
and storing the related information of the inference service in the preset first list.
In one embodiment, determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information comprises:
determining, according to the deployment policy requirement included in the requirement information, related information of an inference service matching the deployment policy requirement;
and determining the general policy matching the deployment policy requirement according to at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service in the first list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service includes at least one of a regional network level, a base station level, a cell level, and a slice level; the resource requirements of the inference service include at least one of a central processor type requirement, a central processor resource requirement amount, a graphics processor type requirement, a graphics processor resource requirement amount, a storage type requirement, a storage requirement amount, and a container management platform type requirement.
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a third registration request sent by an inference service deployment resource providing end, wherein the third registration request includes related information of an inference service operation resource, the related information comprising at least one of a location level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource;
and storing the related information of the inference service operation resource in a preset second list.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the location level classification of the inference service operation resource and the resource configuration classification of the inference service operation resource included in the related information of the inference service operation resource in the second list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the location level classification of the inference service operation resource includes at least one of a regional cloud, an edge cloud, a base station, a centralized unit, and a distributed unit; the resource configuration classification of the inference service operation resource includes at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
In one embodiment, after deploying the inference service in the inference service operation resource, the method further comprises:
sending a response message to the inference requirement end, wherein the response message includes an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score represents the degree to which at least one of the inference service and the inference service operation resource matches the requirement information.
In one embodiment, after sending the response message to the inference requirement end, the method further comprises:
when it is determined that an updated inference service and an updated inference service operation resource matching the requirement information exist and the first satisfaction score is smaller than a second satisfaction score, sending a notification message including the second satisfaction score to the inference requirement end so that the inference requirement end can determine whether to send an update request to the deployment management end;
wherein the update request instructs the deployment management end to deploy the updated inference service in the updated inference service operation resource, and the second satisfaction score represents the degree to which at least one of the updated inference service and the updated inference service operation resource matches the requirement information.
In a third aspect, the present application provides an inference service deployment apparatus, acting as the deployment management end and comprising:
a first processing unit for receiving a first registration request sent by an inference requirement end;
a second processing unit for determining, according to requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information;
and a third processing unit for deploying the inference service in the inference service operation resource.
In a fourth aspect, a processor-readable storage medium is provided, the processor-readable storage medium storing a computer program for causing a processor to perform the method of the first aspect.
The technical solutions provided by the embodiments of the present application have at least the following beneficial effects:
the deployment management end receives a first registration request sent by the inference requirement end; determines, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information; and deploys the inference service in the inference service operation resource. Inference service deployment that satisfies different requirements is thus achieved by determining an inference service and an inference service operation resource matched with the requirement information.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of a system architecture provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a deployment method of inference services according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
FIG. 4 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
FIG. 5 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
FIG. 6 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
FIG. 7 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
FIG. 8 is a schematic diagram of a deployment of inference services provided by embodiments of the present application;
fig. 9 is a schematic structural diagram of a deployment apparatus of an inference service provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of a deployment apparatus of an inference service according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
In the embodiment of the present application, the term "and/or" describes an association relationship of associated objects, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
For better understanding and description of aspects of embodiments of the present application, some of the techniques involved in embodiments of the present application are briefly described below.
Wireless network intelligence extends network intelligence requirements from the core network and the management domain to the wireless network. The data analysis function is a key element of this intelligence, and 3GPP has introduced data analysis functions for the core network and the management plane on the basis of the service-based architecture of the core network and the management functions.
3GPP defines a Network Data Analytics Function (NWDAF) in the core network. The NWDAF acquires wireless network data through the network manager, combines it with core network data to generate UE-level data analysis and prediction results, and assists the core network in QoS (Quality of Service) parameter formulation, mobility management, and other functions. The NWDAF mainly optimizes for the core network; because it cannot obtain real-time wireless network information and the core network is concentrated at a relatively high deployment location, it cannot achieve policy control optimization on the wireless side.
3GPP defines a Management Data Analytics Function (MDAF) in the management plane. The MDAF performs analysis and prediction based on network management data and mainly optimizes scenarios such as network construction and maintenance, for example parameter configuration optimization and slice management at the site level. Because network management data has poor real-time performance, the granularity of wireless network management is relatively coarse.
A real-time data analysis function therefore needs to be introduced into the radio access network to support fine-grained control and optimization on the wireless side at the UE level, service level, QoS level, and slice level; this capability complements the core network and the management plane and provides wireless intelligent service capability for services.
To flexibly meet varied challenges from an unknown world, the wireless network needs AI/ML models to assist its evolution toward high intelligence. When an intelligent wireless network faces external requirements, it first needs to complete requirement mapping and to acquire and store a large amount of varied wireless state information and service state information from different network elements. By acquiring global information across network elements and layers, the wireless intelligence system learns, trains, and generates various AI/ML models, deploys them as inference services, and assists the network elements in building capabilities of statistical analysis and comprehensive perception or prediction of the wireless environment and service quality, such as perception of the interference environment, perception of service characteristics, prediction of wireless network bandwidth, prediction of delay, and prediction of network congestion state. Based on these perception/prediction capabilities, combined with the basic communication mechanisms, complex logic analysis and reasoning are completed, and finally proactive wireless network optimization decisions are made.
An AI/ML model is typically generated by selecting an algorithm and then performing data acquisition and data training; the model is then deployed as an inference service and provided to a specific network unit for use. Because acquisition and training involve large amounts of data storage and time, they are basically concentrated on a cloud or edge computing platform. Deployment offers more choices: the inference service can be deployed on the cloud, an edge computing platform, a fog computing platform, an access network element, and the like, where the access network element may be a base station.
The technical solutions provided by the embodiments of the present application are applicable to various systems, in particular 5G systems. For example, suitable systems include global system for mobile communications (GSM) systems, code division multiple access (CDMA) systems, wideband code division multiple access (WCDMA) systems, general packet radio service (GPRS) systems, long term evolution (LTE) systems, LTE frequency division duplex (FDD) systems, LTE time division duplex (TDD) systems, LTE-advanced (LTE-A) systems, universal mobile telecommunications system (UMTS), worldwide interoperability for microwave access (WiMAX) systems, and new radio (NR) systems. These systems include terminal devices and network devices, and may further include a core network portion, such as the evolved packet system (EPS) or the 5G system (5GS).
Multiple Input Multiple Output (MIMO) transmission may be performed between the network device and the terminal device using one or more antennas; the MIMO transmission may be Single User MIMO (SU-MIMO) or Multi-User MIMO (MU-MIMO). Depending on the form and number of antenna combinations, the MIMO transmission may be 2D-MIMO, 3D-MIMO, FD-MIMO, or massive-MIMO, or may be diversity transmission, precoding transmission, beamforming transmission, or the like.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A schematic diagram of a network architecture provided in an embodiment of the present application is shown in Fig. 1. The network architecture includes a deployment management end 110, an inference requirement end 120, an inference service providing end 130, and an inference service deployment resource providing end 140. These four ends may be network nodes; the inference requirement end 120 may be a cloud core network, a cloud service server, an edge service server of an edge node, a network manager of an edge node, a base station, a CU (Centralized Unit), a DU (Distributed Unit), a user terminal UE, and the like. The deployment management end 110, the inference requirement end 120, the inference service providing end 130, and the inference service deployment resource providing end 140 may be deployed in an access network, for example in the NG-RAN (Next Generation Radio Access Network) of a 5G system.
The UE referred to in the embodiments of the present application may refer to a device providing voice and/or data connectivity to a user, a handheld device having a wireless connection function, or another processing device connected to a wireless modem. Types of UEs include cell phones, vehicle user terminals, tablets, laptops, personal digital assistants, mobile internet appliances, wearable devices, and the like.
The network node according to the embodiments of the present application may be a base station, and the base station may include multiple cells providing services for UEs. Depending on the specific application, a base station may also be called an access point, or may be a device in the access network that communicates over the air interface, through one or more sectors, with UEs, or may have other names. The network node may be used to exchange received air frames with Internet Protocol (IP) packets, acting as a router between the UE and the rest of the access network, which may include an IP communication network. The network node may also coordinate attribute management for the air interface. For example, the network node according to the embodiments of the present application may be a Base Transceiver Station (BTS) in the Global System for Mobile communications (GSM) or Code Division Multiple Access (CDMA), a NodeB in Wideband Code Division Multiple Access (WCDMA), an evolved Node B (eNB or e-NodeB) in a Long Term Evolution (LTE) system, a 5G base station (gNB) in a 5G network architecture (next generation system), a Home evolved Node B (HeNB), a relay node, a femto base station, a pico base station, and the like, which are not limited in the embodiments of the present application. In some network architectures, the network node may include a Centralized Unit (CU) node and a Distributed Unit (DU) node, which may also be geographically separated.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
An embodiment of the present application provides an inference service deployment method, executed by a deployment management end; a flow diagram of the method is shown in Fig. 2, and the method includes:
s101, receiving a first registration request sent by an inference requirement end.
In one embodiment, the inference requirement end initiates registration, that is, the inference requirement end sends a first registration request to the deployment management end.
In one embodiment, as shown in Fig. 3, the inference service providing end can include a model training mechanism for training and generating various AI/ML models and a model storage mechanism for storing the AI/ML models. The deployment management end can be an inference service deployment unit. The inference service deployment resource providing end can comprise a cloud inference service deployment mechanism, an edge node inference service deployment mechanism, and a wireless network side inference service deployment mechanism, where the wireless network side deployment mechanism may be a gNB, a CU, a DU, or the like when the base station is deployed in split form.
In one embodiment, the model training mechanism generates an AI model ready for deployment and stores it in the model storage mechanism; according to the first registration request sent by the inference requirement end, the inference service deployment management unit deploys the AI model onto operation resources provided by deployment mechanisms at different levels to form an inference service, and provides the inference service address and interface to the inference requirement end.
In one embodiment, as shown in Fig. 4, the deployment management end may be a deployment management unit that manages a database, inference service management, inference operation resource management, inference requirement management, and inference policy management. The inference requirement end can be a cloud core network, a cloud service server, an edge service server of an edge node, a network manager of an edge node, a gNB, a CU, a DU, and the like, where the cloud core network and the cloud service server correspond to cloud inference requirements, the edge service server and network manager of an edge node correspond to edge node inference requirements, and the gNB, CU, and DU correspond to wireless network side inference requirements.
S102, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
determining, according to at least one of a function type requirement and an inference performance requirement included in the requirement information, a function classification matching the function type requirement and/or a deployment position and inference resources matching the inference performance requirement, and determining the inference service based on the function classification and/or the deployment position and inference resources;
and determining, according to a deployment policy requirement included in the requirement information, an independent policy or a general policy matching the deployment policy requirement, and determining the inference service operation resource based on the independent policy or the general policy.
In one embodiment, the requirement information includes a function type requirement, an inference performance requirement, a satisfaction requirement, a deployment policy requirement, and the like; the function type requirement is used to match the function classification of the inference service; the performance requirement is used to match the deployment position and inference resources of the inference service; the satisfaction requirement assists deployment under an independent policy, and can indicate that the requirement is partially satisfied or completely satisfied.
In one embodiment, as shown in Fig. 5, the deployment management architecture corresponding to the deployment management end includes inference service management, inference resource management, deployment management, inference requirement management, and policy management. Inference service management covers the inference service type (the service type of the inference service), the inference resource requirement (the resource requirements of the inference service), and the inference service level (the service level of the inference service). Inference resource management covers the management of cloud resources, edge resources, base station resources, CU resources, and DU resources. Deployment management covers function matching, resource matching, deployment according to policy, and satisfaction management. Inference requirement management covers the management of function requirements, performance requirements, satisfaction requirements, and deployment policies. Policy management covers the management of independent policies and general policies.
In one embodiment, the independent policy is used to satisfy policy configuration requirements specific to a particular regional network, base station, or cell, such as resource sharing priority or delay priority; the independent policy is provided by the inference requirement end during the inference requirement registration process, which includes the inference requirement end sending the first registration request. The general policy provides a generic or default policy configuration, such as cloud priority or terminal priority; when the inference requirement end does not provide an independent policy, a default policy configuration can be set according to the service type, service level, resource requirements, and the like of the inference service, and inference service deployment can be optimized according to operating experience.
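As a sketch of how the independent/general policy choice above could be resolved, consider the following; the function name and dictionary keys are illustrative assumptions.

```python
def select_policy(requirement_info: dict, general_policies: dict) -> dict:
    """Prefer the independent policy carried in the requirement information;
    otherwise fall back to a general (default) policy keyed by service type."""
    independent = requirement_info.get("deployment_policy")
    if independent is not None:
        return {"kind": "independent", "config": independent}
    service_type = requirement_info.get("function_type", "default")
    config = general_policies.get(service_type, general_policies["default"])
    return {"kind": "general", "config": config}

# No independent policy supplied, so the general cloud-priority default applies.
general_policies = {"default": {"level": "cloud_priority"}}
print(select_policy({"function_type": "traffic_prediction"}, general_policies))
```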
S103, deploying the inference service in the inference service operation resource.
In one embodiment, deploying the inference service in the inference service operation resource comprises:
when the deployment policy requirement selects an independent policy and no inference service is deployed in the inference service operation resource yet, retrieving, from a preset first list, the artificial intelligence model corresponding to the inference service, and deploying the inference service provided by that artificial intelligence model in the inference service operation resource.
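A minimal sketch of this embodiment, assuming the first list maps a service type to its registered artificial intelligence model (names and structure are assumptions):

```python
def deploy_with_independent_policy(resource: dict, service_type: str,
                                   first_list: dict) -> bool:
    """Deploy only when the independent policy applies and the chosen
    operation resource does not yet host an inference service."""
    if resource.get("deployed_service") is not None:
        return False  # resource already occupied
    model = first_list[service_type]["model"]  # AI model from the preset first list
    resource["deployed_service"] = {"service_type": service_type, "model": model}
    return True

first_list = {"traffic_prediction": {"model": "traffic_model_v1"}}
resource = {"id": "edge-node-3", "deployed_service": None}
assert deploy_with_independent_policy(resource, "traffic_prediction", first_list)
```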
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a second registration request sent by an inference service providing end, wherein the second registration request includes related information of the inference service, the related information comprising at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service;
and storing the related information of the inference service in the preset first list.
In one embodiment, the inference service providing end initiates registration, that is, the inference service providing end sends the second registration request to the deployment management end.
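A sketch of the second-registration handler under these assumptions (the message layout, key names, and the stored model field are illustrative):

```python
def register_inference_service(second_request: dict, first_list: dict) -> None:
    """Store the related information of the inference service carried in the
    second registration request into the preset first list."""
    info = second_request["related_info"]
    first_list[info["service_type"]] = {
        "service_level": info.get("service_level"),
        "resource_requirements": info.get("resource_requirements"),
        "model": info.get("model"),
    }

first_list: dict = {}
register_inference_service(
    {"related_info": {"service_type": "power_consumption_prediction",
                      "service_level": "base_station",
                      "resource_requirements": {"cpu": "2 cores"},
                      "model": "power_model_v1"}},
    first_list,
)
```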
In one embodiment, determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information comprises:
determining, according to the deployment policy requirement included in the requirement information, related information of an inference service matching the deployment policy requirement;
and determining the general policy matching the deployment policy requirement according to at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the service type of the inference service, the service level of the inference service, and the resource requirements of the inference service included in the related information of the inference service in the first list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service includes at least one of a regional network level, a base station level, a cell level, and a slice level; the resource requirements of the inference service include at least one of a central processor type requirement, a central processor resource requirement amount, a graphics processor type requirement, a graphics processor resource requirement amount, a storage type requirement, a storage requirement amount, and a container management platform type requirement.
In one embodiment, the service type of the inference service is used to distinguish inference service functions and to determine inference inputs and outputs, service interface type descriptions, and the like; examples are a user location inference function, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability. The service level of the inference service distinguishes the network-level range the inference service can satisfy; for example, the regional network level may include multiple base stations, the base station level multiple cells, the cell level multiple slices, the slice level multiple users, the user level multiple time-frequency resources, and so on. The finer the inference service level, the more targeted the service and the faster it can be matched to a requirement. The resource requirements of the inference service determine the basic computing resources, storage resources, operation resources, software environment, and the like needed to run the inference service, for example: a CPU (Central Processing Unit) type requirement, a CPU resource requirement amount, a GPU (Graphics Processing Unit) type requirement, a GPU resource requirement amount, a storage type requirement, a storage requirement amount, a container management platform type requirement, and the like.
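The enumerations above could be captured in data structures such as the following sketch (field names and defaults, including the container platform value, are assumptions):

```python
from dataclasses import dataclass
from enum import Enum

class ServiceType(Enum):   # service types named in this embodiment
    USER_LOCATION = "user_location"
    CELL_TRAFFIC_DEMAND = "cell_traffic_demand"
    TRANSMISSION_BANDWIDTH_DEMAND = "transmission_bandwidth_demand"
    USER_QOS = "user_qos"
    POWER_CONSUMPTION_PREDICTION = "power_consumption_prediction"

class ServiceLevel(Enum):  # network-level ranges, from coarse to fine
    REGIONAL_NETWORK = "regional_network"
    BASE_STATION = "base_station"
    CELL = "cell"
    SLICE = "slice"

@dataclass
class ResourceRequirements:  # resources needed to run the inference service
    cpu_type: str = "generic"
    cpu_amount: int = 1
    gpu_type: str = "none"
    gpu_amount: int = 0
    storage_type: str = "ssd"
    storage_amount_gb: int = 10
    container_platform: str = "generic"  # container management platform type
```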
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further comprises:
receiving a third registration request sent by an inference service deployment resource providing end, wherein the third registration request includes related information of an inference service operation resource, the related information comprising at least one of a location level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource;
and storing the related information of the inference service operation resource in a preset second list.
In one embodiment, the inference service deployment resource providing end initiates registration, that is, the inference service deployment resource providing end sends the third registration request to the deployment management end.
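Under the same illustrative message layout, the third registration might be handled like this sketch:

```python
def register_operation_resource(third_request: dict, second_list: dict) -> None:
    """Store the related information of the inference service operation
    resource carried in the third registration request into the preset
    second list, keyed by the providing node's ID."""
    info = third_request["related_info"]
    second_list[info["node_id"]] = {
        "location_level": info["location_level"],
        "resource_config": info["resource_config"],
        "deployed_service": None,  # nothing deployed on this resource yet
    }

second_list: dict = {}
register_operation_resource(
    {"related_info": {"node_id": "edge-cloud-7",
                      "location_level": "edge_cloud",
                      "resource_config": {"cpu_type": "arm64", "gpu_type": "none",
                                          "storage_type": "nvme",
                                          "nic_type": "25GbE"}}},
    second_list,
)
```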
In one embodiment, as shown in Fig. 6, the roles are an inference service provider, an inference deployment resource provider, and an inference demander. The deployment management architecture corresponding to the deployment management end includes inference service management (inference service type, inference resource requirement, and inference service level), inference resource management (cloud, edge, base station, CU, and DU resources), deployment management (function matching, resource matching, deployment according to policy, and satisfaction management), inference requirement management (function requirements, performance requirements, satisfaction requirements, and deployment policies), and policy management (independent and general policies). The deployment management end, the inference requirement end, the inference service providing end, and the inference service deployment resource providing end register through registration management; for example, the deployment management end receives the first registration request sent by the inference requirement end, the second registration request sent by the inference service providing end, and the third registration request sent by the inference service deployment resource providing end.
In one embodiment, as shown in Fig. 7, the deployment management end pulls a service image from the inference service provider through the deployment management architecture, where the service image is the artificial intelligence model that provides the inference service; the deployment management end then deploys the service image on the inference deployment resource provider and sends the address and interface of the artificial intelligence model to the inference demander through the deployment management architecture.
In one embodiment, as shown in Fig. 8, the deployment management end, the inference demander, the inference service provider, and the inference deployment resource provider register and respond through registration management; for example, the deployment management end performs service registration and response with the inference service provider, resource registration and response with the inference deployment resource provider, and inference requirement registration and response with the inference demander. The deployment management end pulls an inference service image (the artificial intelligence model providing the inference service) from the inference service provider through the deployment management architecture, deploys the image on the inference deployment resource provider, and pushes the address and interface of the artificial intelligence model corresponding to the inference service to the inference demander. Information management stores the data used by registration management and the deployment management architecture.
In one embodiment, determining, according to the requirement information included in the first registration request, an inference service and an inference service operation resource that match the requirement information comprises:
when the requirement information is received, retrieving at least one of the location level classification of the inference service operation resource and the resource configuration classification of the inference service operation resource included in the related information of the inference service operation resource in the second list, and matching it against the deployment policy requirement included in the requirement information.
In one embodiment, the location level classification of the inference service operation resource includes at least one of a regional cloud, an edge cloud, a base station, a centralized unit, and a distributed unit; the resource configuration classification of the inference service operation resource includes at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
In one embodiment, for the inference service deployment resource providing end, the location level classification of the inference service operation resources covers network element classifications such as a regional cloud, an edge cloud, an integrated base station, a CU, and a DU, and the providing end carries node location information and an identification ID during registration; the resource configuration classification of the inference service operation resources covers a CPU type, a GPU type, a storage type, a network interface type, and the like.
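As with the service-side classifications, these could be captured in an enumeration plus a registration entry, sketched below (value strings and the entry layout are assumptions):

```python
from enum import Enum

class LocationLevel(Enum):
    """Location level classifications of inference service operation resources."""
    REGIONAL_CLOUD = "regional_cloud"
    EDGE_CLOUD = "edge_cloud"
    BASE_STATION = "base_station"  # integrated base station
    CENTRALIZED_UNIT = "cu"
    DISTRIBUTED_UNIT = "du"

# Per the paragraph above, a registration carries node location information and
# an identification ID alongside the resource configuration classification.
example_entry = {
    "node_id": "du-042",  # identification ID (format assumed)
    "location_level": LocationLevel.DISTRIBUTED_UNIT,
    "resource_config": {"cpu_type": "x86_64", "gpu_type": "none",
                        "storage_type": "ssd", "nic_type": "10GbE"},
}
```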
In one embodiment, after deploying the inference service in the inference service operation resource, the method further comprises:
sending a response message to the inference requirement end, wherein the response message includes an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score represents the degree to which at least one of the inference service and the inference service operation resource matches the requirement information.
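A sketch of this response message under assumed field names:

```python
def build_response(model_address: str, model_interface: str,
                   first_satisfaction_score: float) -> dict:
    """Response sent to the inference requirement end after deployment: the
    AI model's address and interface plus a first satisfaction score
    (a [0, 1] scale is assumed here)."""
    return {
        "model_address": model_address,        # e.g. a service URL
        "model_interface": model_interface,    # e.g. an API endpoint
        "first_satisfaction_score": first_satisfaction_score,
    }

print(build_response("http://edge-cloud-7.example/infer", "/v1/predict", 0.8))
```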
In one embodiment, after sending the response message to the inference requirement end, the method further comprises:
when it is determined that an updated inference service and an updated inference service operation resource matching the requirement information exist and the first satisfaction score is smaller than a second satisfaction score, sending a notification message including the second satisfaction score to the inference requirement end so that the inference requirement end can determine whether to send an update request to the deployment management end;
wherein the update request instructs the deployment management end to deploy the updated inference service in the updated inference service operation resource, and the second satisfaction score represents the degree to which at least one of the updated inference service and the updated inference service operation resource matches the requirement information.
In one embodiment, registration events and satisfaction scores may trigger updates of the inference service. After deployment, the inference service is updated and maintained: when an inference requirement is not fully met, the deployment management end records what the current inference service provides, and when new elements that better satisfy the requirement become available, such as an artificial intelligence model, operation resources or a deployment position, the deployment management end notifies the inference demand end to decide whether to re-apply for inference service deployment, and updates the inference service accordingly.
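A minimal sketch of this satisfaction-score-driven update check, assuming a full score of 100 and illustrative record layouts (neither is specified as code in the patent):

```python
def check_for_updates(deployed, best_candidate, notify):
    # deployed: {demander: (service, resource, first_score)}
    # best_candidate(demander) -> (service, resource, second_score) for the
    # best match currently available; notify(demander, message) delivers the
    # notification so the demander can decide whether to request an update.
    for demander, (_service, _resource, first_score) in deployed.items():
        if first_score >= 100:          # requirement already fully met
            continue
        service, resource, second_score = best_candidate(demander)
        if second_score > first_score:  # a better match now exists
            notify(demander, {"second_satisfaction_score": second_score,
                              "service": service, "resource": resource})
```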
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
the deployment management end receives a first registration request sent by the inference demand end; determines, according to the requirement information included in the first registration request, the inference service and inference service operation resources matching the requirement information; and deploys the inference service in the inference service operation resources. Inference service deployment meeting different requirements is thus realized by determining the inference service and inference service operation resources matched with the requirement information.
The deployment method of the inference service according to the above embodiments of the present application is described in detail through the following embodiments:
In embodiment 1, steps A1-A4 of establishing and publishing the inference service deployment manager are as follows:
step a1, the inference service deployment manager may provide service registration capabilities, resource registration capabilities, requirement registration capabilities, registration information management and matching capabilities, on-demand deployment capabilities, feedback results capabilities, continuous maintenance capabilities, and the like.
It should be noted that the deployment management end may be an inference service deployment manager.
Step A2: in terms of physical resources, because the network range to be managed is large, the inference service deployment manager is deployed in the cloud, and is provided with computing capability, storage capability, network capability, database capability, and the like.
Step A3: the inference service deployment manager comprises functional units such as a registration service, a database service, a deployment service, and a maintenance service; the configuration administrator configures the general policy of the inference service deployment manager according to the basic objectives of the current network, combined with the expected traffic and resources over a period of time.
For example, the default general policy in terms of deployment hierarchy is cloud-first, and no general policy is configured for any specific inference service type.
Step A4: after the inference service deployment manager is running in the cloud, the access addresses and APIs (Application Programming Interfaces) of each of the above services are published externally.
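The general policy of step A3 and the externally published API addresses of step A4 might look like the following (a sketch under assumed names; the URLs are placeholders, not real endpoints):

```python
# Assumed shape of the administrator-configured general policy (step A3)
# and the externally published service APIs (step A4).
general_policy = {
    "deployment_hierarchy": "cloud_first",  # default deployment hierarchy
    "per_service_type": {},                 # no per-type general policy set
}

published_apis = {
    "registration_service": "https://manager.example/api/v1/register",
    "database_service":     "https://manager.example/api/v1/records",
    "deployment_service":   "https://manager.example/api/v1/deploy",
    "maintenance_service":  "https://manager.example/api/v1/maintain",
}
```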
In embodiment 2, registration steps B1-B8 of the inference service provider and the resource provider are as follows:
Step B1: collect cell historical data and generate, through offline training, an AI model capable of predicting user service experience.
In one embodiment, the cell historical data includes, for example, user uplink and downlink wireless network service data characteristics from the base station, air-interface channel data from the UE, and the like. The uplink and downlink wireless network service data from the base station may be characterized by packet length, packet count, transmission jitter, and the like; the air-interface channel data from the UE may include ACK (Acknowledgement), NACK (Negative Acknowledgement), CQI (Channel Quality Indication), MCS (Modulation and Coding Scheme), BLER (Block Error Rate), and the like.
Step B2: according to the characteristics of the AI model, its service type is classified as user service quality prediction: its input is the user's service data samples and air-interface channel information, and its output is a service quality evaluation.
Step B3: according to the range of prediction capability the AI model can cover, its service level is defined as a combined capability set supporting several levels, namely regional network, base station, cell and slice.
Step B4: according to the input and output required by the AI model and its operating characteristics, the resource requirements are defined as 1 vCPU and 16 Mb of memory; a GPU is optional, and if a GPU is configured its memory requirement is 1 Mb; the transmission resource is 100 Kbps.
Step B5: based on the information in steps B2-B4, the inference service provider completes registration through the inference service deployment management unit.
It should be noted that the deployment management end may be an inference service deployment management unit.
Step B6: an operation and maintenance administrator manages an equipment room with 10 servers (2 CPUs per server, 16 cores per CPU, no GPU) that can be incorporated into the office's inference service resources; the location level is an edge computing platform, the resource configuration is 320 vCPUs and 640 GB of memory, and the network transmission bandwidth is 10 × 1 Gbps.
Step B7: based on the information in step B6, the inference service deployment resource provider registers with the inference service deployment management unit.
Step B8: all registered inference services and deployable resources are stored in the inference deployment management database to form a resource information list.
It should be noted that the first list and the second list may be resource information lists.
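The two registrations of steps B2-B7 could be stored in the resource information list roughly as follows (a sketch; the field names are assumptions, the values follow the figures given above):

```python
first_list = [{                    # inference services (steps B2-B5)
    "service_type": "user_qos_prediction",
    "service_level": ["regional_network", "base_station", "cell", "slice"],
    "resource_requirements": {"vCPU": 1, "memory": "16Mb",
                              "gpu": "optional", "gpu_memory": "1Mb",
                              "transmission": "100Kbps"},
}]

second_list = [{                   # deployable resources (steps B6-B7)
    "node_id": "office-room-1",
    "location_level": "edge_computing_platform",
    "resource_config": {"vCPU": 320, "memory": "640GB",
                        "gpu": None, "network_bandwidth": "10x1Gbps"},
}]
```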
In embodiment 3, the registration steps C1-C2 of the inference requirement side are as follows:
Step C1: the latest software version of a base station contains an inference service requirement for user service quality prediction. The requirement registration content is organized according to the function classification and performance classification scheme provided by the system: in terms of prediction performance there is no feedback-latency requirement, and the prediction is performed periodically in an assisting role; the registration policy is therefore cloud deployment first, and for satisfaction only the function type needs to be met.
Step C2: the base station sends the registration information organized in step C1 to the inference service deployment management unit.
It should be noted that the deployment management end may be an inference service deployment management unit.
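The base station's requirement registration of step C1 might be organized as follows (hypothetical field names, values taken from the example above):

```python
requirement_registration = {
    "function_type": "user_qos_prediction",      # function classification
    "performance": {
        "feedback_latency": None,                # no latency requirement
        "mode": "periodic_assisted_prediction",  # periodic, assisting role
    },
    "deployment_policy": "cloud_first",          # independent policy
    "satisfaction": {"must_match": ["function_type"]},
}
```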
In embodiment 4, steps D1-D6 of matching the inference requirement and deploying the inference service are as follows:
Step D1: the registration service of the inference service deployment manager receives the registration request from the inference demand end; according to the function classification in the registration request, it queries and matches the service quality prediction inference service registered in step B5; it further matches the performance of that service, finds that the performance also meets the requirement, and determines that the inference service can be deployed to satisfy the requirement.
It should be noted that, the deployment management end may be an inference service deployment manager, and the requirement information included in the first registration request may be an inference requirement.
Step D2: the inference service deployment manager queries the deployment policy of the user service quality prediction inference requirement and finds that its specific deployment policy is cloud-first, so it first queries whether the service is already deployed in the current cloud; the specific deployment policy is an independent policy.
Step D3: if the inference service is not yet deployed, the inference service deployment manager obtains the AI model from the model repository in preparation for deployment.
Step D4: for what the demand side has not specified beyond the cloud preference, the inference service deployment manager queries whether the administrator has configured a general policy for cloud deployment.
Step D5: the current general policy for the cloud is priority-ordered deployment, and the inference service deployment manager queries, in order of cloud resource priority, for inference service operation resources that meet the operation resource requirements of the inference service.
Step D6: the inference service deployment manager deploys the AI model in the resource selected in step D5 to realize the inference service, sends the access address and interface of the inference service to the inference demand end, and carries a satisfaction score of 100 for the inference service with respect to the requirement.
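Steps D1-D6 can be summarized in the following sketch, which reuses the list shapes from the earlier sketches; the matching and deployment internals are elided and all names are assumptions:

```python
def match_and_deploy(requirement, first_list, second_list, general_policy):
    # D1: match by function classification; performance matching is elided.
    services = [s for s in first_list
                if s["service_type"] == requirement["function_type"]]
    if not services:
        return {"status": "match_failed"}        # outcome of embodiment 6

    # D2/D4: the independent (demand-carried) policy takes precedence over
    # the administrator-configured general policy.
    policy = (requirement.get("deployment_policy")
              or general_policy["deployment_hierarchy"])
    preferred = "cloud" if policy == "cloud_first" else "local"

    # D5: pick an operation resource, preferring the policy's location level.
    resource = sorted(second_list,
                      key=lambda r: r["location_level"] != preferred)[0]

    # D3/D6: fetch the AI model from the repository and deploy it on the
    # chosen resource (both elided), then answer the demander with the
    # access address, the interface, and the satisfaction score of 100.
    return {"address": f"http://{resource['node_id']}/qos",
            "interface": "/predict",
            "satisfaction_score": 100}
```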
In embodiment 5, on the basis of embodiment 4, when the inference service is found to already have a deployed instance, the access address and interface of the deployed inference service are sent directly to the inference demand end, carrying a satisfaction score of 100 for the inference requirement.
In embodiment 6, on the basis of embodiment 3, steps F1-F2 of matching the inference requirement and deploying the inference service are as follows:
Step F1: the registration service of the inference service deployment manager receives the registration request of the inference requirement; according to the function classification in the registration request, it matches no inference service registered for service quality prediction, so the next stage of matching cannot proceed.
It should be noted that the deployment management end may be an inference service deployment manager, and the requirement information included in the first registration request may be an inference requirement.
Step F2: the inference service deployment manager returns a matching failure to the inference demand end.
In embodiment 7, on the basis of embodiment 3, the deployment policy carried by the inference requirement is local-first; steps G1-G6 of matching the inference requirement and deploying the inference service are as follows:
Step G1: the registration service of the inference service deployment manager receives the registration request of the inference requirement; queries and matches the service quality prediction inference service registered in step B5 according to the function classification in the registration request; further matches the performance of that service, finds that the performance also meets the requirement, and determines that the service quality prediction inference service can be deployed to satisfy the inference requirement.
It should be noted that the deployment management end may be an inference service deployment manager, the requirement information included in the first registration request may be an inference requirement, and the inference service is a service quality prediction inference service.
Step G2: the inference service deployment manager queries the deployment policy of the requirement, finds that its specific deployment policy is local-first, and first queries whether the inference service has already been deployed locally.
Step G3: if the inference service is not yet deployed, the inference service deployment manager obtains the AI model from the AI model repository in preparation for deployment.
Step G4: for what the inference demand end has not specified beyond the local preference, the inference service deployment manager queries whether resources for running the inference service are available locally.
Step G5: finding no such resources, and determining that the satisfaction requirement carried in the registration information covers only the function type, the inference service deployment manager automatically allocates edge computing resources closer to the local side as the deployment platform of the inference service.
It should be noted that the inference service deployment resource provider may be an edge computing resource.
Step G6: the inference service deployment manager deploys the inference service in the edge computing resource selected in step G5, sends the access address and interface of the inference service to the inference demand end, and carries a satisfaction score of 70 for the inference service with respect to the inference requirement.
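The fallback of steps G4-G6 reduces the satisfaction score because only the function-type requirement is guaranteed; a compact sketch (the score values follow the example above, all names are illustrative):

```python
def place_with_fallback(local_resources, edge_resources):
    # G2-G4: local-first policy, so local resources are tried first.
    if local_resources:
        return local_resources[0], 100  # full match, as in embodiment 4
    # G5-G6: no local resources; fall back to nearby edge computing
    # resources with a reduced satisfaction score.
    return edge_resources[0], 70
```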
In embodiment 8, on the basis of embodiment 7, new resources available for deploying the inference service are registered locally at the inference requirement side; steps H1-H3 of update maintenance after inference service deployment are as follows:
Step H1: the deployment management unit periodically queries the list of registered and already-served requests whose satisfaction score is below 100.
It should be noted that the deployment management end may be a deployment management unit, and the requirement information included in the first registration request may be an inference requirement. The first list and the second list may be request lists.
Step H2: when the deployment management unit determines that a new inference service capability exists locally at the inference requirement end of embodiment 7, and that this capability can serve the requirement with a satisfaction score of 100, it sends a service capability availability notification to the inference requirement end.
Step H3: the inference demand end receives the notification of step H2 and decides, according to the current service quality and index requirements, whether to submit a service update request to the deployment management unit.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
based on a classification-and-matching principle, a requirement-centered, policy-guided deployment method combines factors such as inference service capability, cloud deployment resources, edge node deployment resources and the requirements of the inference service, and performs flexible automatic deployment driven by policy; inference service deployment aimed at maximally satisfying the requirement is thereby realized, together with a satisfaction-score-based service update capability after deployment.
Based on the same inventive concept, the embodiment of the present application further provides a deployment apparatus of the inference service, which is executed by a deployment management end, and a schematic structural diagram of the deployment apparatus is shown in fig. 9, where the transceiver 1200 is configured to receive and transmit data under the control of the processor 1210.
Wherein in fig. 9, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 1210, and various circuits, represented by memory 1220, being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 1200 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium including wireless channels, wired channels, fiber optic cables, and the like. The processor 1210 is responsible for managing the bus architecture and general processing, and the memory 1220 may store data used by the processor 1210 in performing operations.
The processor 1210 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Complex Programmable Logic Device (CPLD), and may also have a multi-core architecture.
A processor 1210 for reading the computer program in the memory and performing the following operations:
receiving a first registration request sent by an inference demand end;
determining inference service and inference service operation resources matched with the demand information according to the demand information included in the first registration request;
and deploying the inference service in the inference service operation resource.
In one embodiment, determining the inference service and inference service operation resources matching the requirement information according to the requirement information included in the first registration request includes:
determining a function classification matched with the function type requirement and/or a deployment position and inference resources of inference service matched with the inference performance requirement according to at least one item of the function type requirement and the inference performance requirement included in the requirement information, and determining the inference service based on the function classification and/or the deployment position and inference resources of the inference service;
and determining an independent strategy or a general strategy matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information, and determining inference service operation resources based on the independent strategy or the general strategy.
In one embodiment, deploying the inference service in the inference service operation resources includes:
when the deployment strategy requirement is used for selecting an independent strategy and no inference service is deployed in the inference service operation resource, calling an artificial intelligence model corresponding to the inference service in a preset first list, and deploying the inference service provided by the artificial intelligence model in the inference service operation resource.
In one embodiment, before receiving the first registration request sent by the inference requirement end, the method further includes:
receiving a second registration request sent by the inference service provider, wherein the second registration request comprises relevant information of inference services, and the relevant information of the inference services comprises at least one of service types of the inference services, service levels of the inference services and resource requirements of the inference services;
and storing the related information of the inference service in a preset first list.
In one embodiment, determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information includes:
determining relevant information of inference service matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information;
and determining a general strategy matched with the deployment strategy requirement according to at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, which are included in the relevant information of the inference service.
In one embodiment, determining the inference service and inference service operation resources matching the requirement information according to the requirement information included in the first registration request includes:
when the requirement information is received, invoking at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, included in the relevant information of the inference service in the first list, and matching it against the deployment strategy requirement included in the requirement information.
In one embodiment, the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service comprises at least one of a regional network level, a base station level, a cell level and a slice level; the resource requirements of the inference service include at least one of central processor type requirements, central processor resource requirements, graphics processor type requirements, graphics processor resource requirements, storage type requirements, storage requirements, and container management platform type requirements.
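These classifications could be represented as enumerations, for example (the identifier names are assumptions mapping the wording above to code):

```python
from enum import Enum

class ServiceType(Enum):
    USER_LOCATION = "user location inference"
    CELL_TRAFFIC_DEMAND = "cell traffic demand inference"
    TRANSPORT_BANDWIDTH_DEMAND = "transmission network bandwidth demand inference"
    USER_QOS = "user quality of service inference"
    POWER_CONSUMPTION = "power consumption prediction inference"

class ServiceLevel(Enum):
    REGIONAL_NETWORK = "regional network"
    BASE_STATION = "base station"
    CELL = "cell"
    SLICE = "slice"
```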
In one embodiment, before receiving the first registration request sent by the inference requirement terminal, the method further includes:
receiving a third registration request sent by the inference service deployment resource provider, wherein the third registration request comprises relevant information of inference service operating resources, and the relevant information of the inference service operating resources comprises at least one of position level classification of the inference service operating resources and resource configuration classification of the inference service operating resources;
and storing the related information of the inference service operation resource in a preset second list.
In one embodiment, determining the inference service and inference service operation resources matching the requirement information according to the requirement information included in the first registration request includes:
when the requirement information is received, invoking at least one of the position level classification of the inference service operation resources and the resource configuration classification of the inference service operation resources, included in the relevant information of the inference service operation resources in the second list, and matching it against the deployment strategy requirement included in the requirement information.
In one embodiment, the location level classification of the inference service operation resource comprises at least one of a regional cloud, an edge cloud, a base station, a centralized unit (CU), and a distributed unit (DU); the resource configuration classification of the inference service operation resource comprises at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
In one embodiment, after deploying the inference service in the inference service operation resource, the method further includes:
sending a response message to the inference demand end, wherein the response message comprises an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score is used for representing the matching degree of at least one of the inference service and the inference service operation resources with the demand information.
In one embodiment, after sending the response message to the inference requirement end, the method further includes:
when it is determined that an updated inference service and updated inference service operation resources matched with the demand information exist, and the first satisfaction score is smaller than the second satisfaction score, sending a notification message including the second satisfaction score to the inference demand end, so that the inference demand end can determine whether to send an update request to the deployment management end;
the update request is used for instructing the deployment management end to deploy the updated inference service in the updated inference service operation resources; the second satisfaction score is used for representing the matching degree of at least one of the updated inference service and the updated inference service operation resources with the demand information.
It should be noted that the apparatus provided in this embodiment of the present application can implement all the method steps of the foregoing method embodiment and achieve the same technical effect; detailed description of the parts and beneficial effects identical to those of the method embodiment is omitted here.
Based on the same inventive concept as the foregoing embodiment, this embodiment further provides a deployment apparatus for the inference service, executed by a deployment management end; a schematic structural diagram of the apparatus is shown in fig. 10, where the inference service deployment apparatus 30 includes a first processing unit 301, a second processing unit 302, and a third processing unit 303.
The first processing unit 301 is configured to receive a first registration request sent by an inference requirement end;
the second processing unit 302 is configured to determine, according to the requirement information included in the first registration request, inference services and inference service operation resources that are matched with the requirement information;
and the third processing unit 303 is configured to deploy the inference service in the inference service operation resource.
In an embodiment, the second processing unit 302 is specifically configured to determine, according to at least one of the function type requirement and the inference performance requirement included in the requirement information, a deployment location and an inference resource of the function classification matching the function type requirement and/or the inference service matching the inference performance requirement, and determine the inference service based on the deployment location and the inference resource of the function classification and/or the inference service; and determining an independent strategy or a general strategy matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information, and determining inference service operation resources based on the independent strategy or the general strategy.
In an embodiment, the third processing unit 303 is specifically configured to, when the deployment policy requirement is used to select an independent policy and no inference service is deployed in the inference service operating resource, invoke an artificial intelligence model corresponding to the inference service in the preset first list, and deploy the inference service provided by the artificial intelligence model in the inference service operating resource.
In one embodiment, before receiving the first registration request sent by the inference requirement terminal, the first processing unit 301 is further configured to receive a second registration request sent by the inference service provider, where the second registration request includes information related to the inference service, and the information related to the inference service includes at least one of a service type of the inference service, a service level of the inference service, and a resource requirement of the inference service; and storing the related information of the reasoning service in a preset first list.
In an embodiment, the second processing unit 302 is specifically configured to determine, according to the deployment policy requirement included in the requirement information, relevant information of the inference service that matches the deployment policy requirement; and determining a general strategy matched with the deployment strategy requirement according to at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, which are included in the relevant information of the inference service.
In an embodiment, the second processing unit 302 is specifically configured to, when the requirement information is received, invoke at least one of a service type of the inference service, a service level of the inference service, and a resource requirement of the inference service included in the relevant information of the inference service in the first list to match the deployment policy requirement included in the requirement information.
In one embodiment, the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service comprises at least one of a regional network level, a base station level, a cell level and a slice level; the resource requirements of the inference service include at least one of central processor type requirements, central processor resource requirements, graphics processor type requirements, graphics processor resource requirements, storage type requirements, storage requirements, and container management platform type requirements.
In an embodiment, before receiving the first registration request sent by the inference requirement terminal, the first processing unit 301 is further configured to receive a third registration request sent by the inference service deployment resource provider, where the third registration request includes information related to inference service operation resources, and the information related to inference service operation resources includes at least one of location level classification of the inference service operation resources and resource configuration classification of the inference service operation resources;
and storing the related information of the inference service operation resource in a preset second list.
In an embodiment, the second processing unit 302 is specifically configured to, when the requirement information is received, invoke at least one of a location level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource included in the related information of the inference service operation resource in the second list to match the deployment policy requirement included in the requirement information.
In one embodiment, the location level classification of the inference service operating resource comprises at least one of a regional cloud, an edge cloud, a base station, a concentration unit, and a distribution unit; the resource configuration classification of the inference service operation resource comprises at least one of a central processing unit, a graphic processor, a storage type and a network interface type.
In one embodiment, after the inference service is deployed in the inference service operation resources, the third processing unit 303 is further configured to send a response message to the inference demand end, where the response message includes an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score is used for representing the matching degree of at least one of the inference service and the inference service operation resources with the demand information.
In one embodiment, after sending the response message to the inference requirement end, the third processing unit 303 is further configured to: when it is determined that an updated inference service and updated inference service operation resources matching the demand information exist, and the first satisfaction score is smaller than the second satisfaction score, send a notification message including the second satisfaction score to the inference requirement end, so that the inference requirement end determines whether to send an update request to the deployment management end; the update request is used for instructing the deployment management end to deploy the updated inference service in the updated inference service operation resources; the second satisfaction score is used for representing the matching degree of at least one of the updated inference service and the updated inference service operation resources with the demand information.
It should be noted that the apparatus provided in this embodiment of the present application can implement all the method steps of the foregoing method embodiment and achieve the same technical effect; detailed description of the parts and beneficial effects identical to those of the method embodiment is omitted here.
It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a processor-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Based on the same inventive concept, the embodiment of the present application further provides a processor-readable storage medium, which stores a computer program, where the computer program is used for implementing, when executed by a processor, the steps of the method for deploying the inference service provided in any one of the embodiments or any one of the optional implementation manners of the embodiment of the present application.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (26)

1. A deployment method of inference services, executed by a deployment management end, characterized by comprising:
receiving a first registration request sent by an inference demand end;
determining inference service and inference service operation resources matched with the requirement information according to the requirement information included in the first registration request;
and deploying the inference service in the inference service operation resource.
2. The method according to claim 1, wherein the determining inference service and inference service operation resource matching with the requirement information according to the requirement information included in the first registration request comprises:
determining a function classification matched with the function type requirement and/or a deployment position and inference resources of the inference service matched with the inference performance requirement according to at least one of the function type requirement and the inference performance requirement included in the requirement information, and determining the inference service based on the function classification and/or the deployment position and inference resources of the inference service;
and determining an independent strategy or a general strategy matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information, and determining the inference service operation resource based on the independent strategy or the general strategy.
3. The method of claim 2, wherein said deploying the inference service in the inference service operation resource comprises:
when the deployment strategy requirement is used for selecting the independent strategy and the inference service is not deployed in the inference service operation resource, calling an artificial intelligence model corresponding to the inference service in a preset first list, and deploying the inference service provided by the artificial intelligence model in the inference service operation resource.
4. The method according to claim 2, before said receiving the first registration request sent by the inference requirement side, further comprising:
receiving a second registration request sent by an inference service provider, wherein the second registration request comprises relevant information of the inference service, and the relevant information of the inference service comprises at least one of a service type of the inference service, a service level of the inference service and a resource requirement of the inference service;
and storing the related information of the inference service in a preset first list.
5. The method according to claim 4, wherein the determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information includes:
determining relevant information of the inference service matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information;
and determining a general strategy matched with the deployment strategy requirement according to at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, which are included in the relevant information of the inference service.
6. The method according to claim 4, wherein the determining inference service and inference service operation resource matching with the requirement information according to the requirement information included in the first registration request comprises:
when the requirement information is received, invoking at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, included in the relevant information of the inference service in the first list, and matching it against the deployment strategy requirement included in the requirement information.
7. The method of claim 5, wherein the service type of the inference service comprises at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service comprises at least one of a regional network level, a base station level, a cell level and a slice level; the resource requirements of the inference service comprise at least one of central processor type requirements, central processor resource requirements, graphics processor type requirements, graphics processor resource requirements, storage type requirements, storage requirements and container management platform type requirements.
8. The method according to claim 2, before said receiving the first registration request sent by the inference requirement terminal, further comprising:
receiving a third registration request sent by an inference service deployment resource provider, wherein the third registration request comprises relevant information of the inference service operation resource, and the relevant information of the inference service operation resource comprises at least one of a position level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource;
and storing the related information of the inference service operation resources in a preset second list.
9. The method according to claim 8, wherein the determining inference service and inference service operation resource matching with the requirement information according to the requirement information included in the first registration request comprises:
when the requirement information is received, invoking at least one of the position level classification of the inference service operation resource and the resource configuration classification of the inference service operation resource, included in the related information of the inference service operation resource in the second list, and matching it against the deployment strategy requirement included in the requirement information.
10. The method of claim 8, wherein the location level classification of the inference service operation resources comprises at least one of a regional cloud, an edge cloud, a base station, a centralized unit, and a distributed unit; the resource configuration classification of the inference service operation resources comprises at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
11. The method of claim 4, further comprising, after said deploying the inference service in the inference service operation resource:
sending a response message to the inference demand end, wherein the response message comprises an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score is used for representing the matching degree of at least one of the inference service and the inference service operation resource with the requirement information.
12. The method according to claim 11, further comprising, after said sending a response message to said inference requirement terminal:
when it is determined that an updated inference service and updated inference service operation resources matched with the requirement information exist, and the first satisfaction score is smaller than a second satisfaction score, sending a notification message including the second satisfaction score to the inference demand end, so that the inference demand end determines whether to send an update request to the deployment management end;
the update request is used for instructing the deployment management end to deploy the updated inference service in the updated inference service operation resources; the second satisfaction score is used for representing the matching degree of at least one of the updated inference service and the updated inference service operation resources with the requirement information.
13. A deployment apparatus of inference services, executed by a deployment management end, comprising a memory, a transceiver, and a processor, wherein:
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the following operations:
receiving a first registration request sent by a reasoning demand end;
determining inference service and inference service operation resources matched with the requirement information according to the requirement information included in the first registration request;
and deploying the inference service in the inference service operating resource.
14. The apparatus according to claim 13, wherein said determining inference services and inference service operation resources matching with the requirement information according to the requirement information included in the first registration request comprises:
determining a function classification matched with the function type requirement and/or a deployment position and inference resources of the inference service matched with the inference performance requirement according to at least one of the function type requirement and the inference performance requirement included in the requirement information, and determining the inference service based on the function classification and/or the deployment position and inference resources of the inference service;
and determining an independent strategy or a general strategy matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information, and determining the inference service operation resource based on the independent strategy or the general strategy.
15. The apparatus of claim 14, wherein said deploying the inference service in the inference service operation resource comprises:
when the deployment strategy requirement is used for selecting the independent strategy and the inference service is not deployed in the inference service operation resource, calling an artificial intelligence model corresponding to the inference service in a preset first list, and deploying the inference service provided by the artificial intelligence model in the inference service operation resource.
16. The apparatus according to claim 14, further comprising, before said receiving the first registration request sent by the inference requirement side:
receiving a second registration request sent by an inference service provider, wherein the second registration request comprises relevant information of the inference service, and the relevant information of the inference service comprises at least one of a service type of the inference service, a service level of the inference service and a resource requirement of the inference service;
and storing the related information of the inference service in a preset first list.
17. The apparatus according to claim 16, wherein the determining a general policy matching the deployment policy requirement according to the deployment policy requirement included in the requirement information includes:
determining relevant information of the inference service matched with the deployment strategy requirement according to the deployment strategy requirement included in the requirement information;
and determining a general strategy matched with the deployment strategy requirement according to at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, which are included in the relevant information of the inference service.
18. The apparatus according to claim 16, wherein said determining inference services and inference service operation resources matching with the requirement information according to the requirement information included in the first registration request comprises:
when the requirement information is received, invoking at least one of the service type of the inference service, the service level of the inference service and the resource requirement of the inference service, included in the relevant information of the inference service in the first list, and matching it against the deployment strategy requirement included in the requirement information.
19. The apparatus according to claim 17, wherein the service type of the inference service includes at least one of a user location inference capability, a cell traffic demand inference capability, a transmission network bandwidth demand inference capability, a user quality of service inference capability, and a power consumption prediction inference capability; the service level of the inference service comprises at least one of a regional network level, a base station level, a cell level and a slice level; the resource requirements of the inference service comprise at least one of central processor type requirements, central processor resource requirements, graphics processor type requirements, graphics processor resource requirements, storage type requirements, storage requirements and container management platform type requirements.
20. The apparatus according to claim 13, further comprising, before said receiving the first registration request sent by the inference requirement side:
receiving a third registration request sent by an inference service deployment resource provider, wherein the third registration request comprises relevant information of the inference service operation resource, and the relevant information of the inference service operation resource comprises at least one of a position level classification of the inference service operation resource and a resource configuration classification of the inference service operation resource;
and storing the related information of the inference service operation resources in a preset second list.
21. The apparatus according to claim 20, wherein said determining inference services and inference service operation resources matching with the requirement information according to the requirement information included in the first registration request comprises:
when the requirement information is received, invoking at least one of the position level classification of the inference service operation resource and the resource configuration classification of the inference service operation resource, included in the related information of the inference service operation resource in the second list, and matching it against the deployment strategy requirement included in the requirement information.
22. The apparatus of claim 20, wherein the location level classification of the inference service operation resources comprises at least one of a regional cloud, an edge cloud, a base station, a centralized unit, and a distributed unit; the resource configuration classification of the inference service operation resources comprises at least one of a central processing unit type, a graphics processor type, a storage type, and a network interface type.
23. The apparatus of claim 16, further comprising, after said deploying the inference service in the inference service operation resource:
sending a response message to the inference demand end, wherein the response message comprises an address and an interface of the artificial intelligence model and a first satisfaction score; the first satisfaction score is used for representing the matching degree of at least one of the inference service and the inference service operation resource with the requirement information.
24. The apparatus according to claim 23, further comprising, after said sending the response message to the inference requirement end:
when it is determined that an updated inference service and updated inference service operation resources matched with the requirement information exist, and the first satisfaction score is smaller than a second satisfaction score, sending a notification message including the second satisfaction score to the inference demand end, so that the inference demand end determines whether to send an update request to the deployment management end;
the update request is used for instructing the deployment management end to deploy the updated inference service in the updated inference service operation resources; the second satisfaction score is used for representing the matching degree of at least one of the updated inference service and the updated inference service operation resources with the requirement information.
25. A deployment apparatus of inference services, executed by a deployment management end, comprising:
the first processing unit is used for receiving a first registration request sent by the inference demand end;
the second processing unit is used for determining inference service and inference service operation resources matched with the requirement information according to the requirement information included in the first registration request;
and the third processing unit is used for deploying the inference service in the inference service operation resource.
26. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program for causing a processor to perform the method of any one of claims 1 to 12.
CN202011539937.8A 2020-12-23 2020-12-23 Inference service deployment method and device and processor readable storage medium Pending CN114745264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011539937.8A CN114745264A (en) 2020-12-23 2020-12-23 Inference service deployment method and device and processor readable storage medium

Publications (1)

Publication Number Publication Date
CN114745264A 2022-07-12

Family

ID=82274270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011539937.8A Inference service deployment method and device and processor readable storage medium 2020-12-23 2020-12-23 (Pending)

Country Status (1)

Country Link
CN (1) CN114745264A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130361A1 (en) * 2017-10-26 2019-05-02 Monangku HAZARIKA Method for rendering human talent management-as-a-service (htmaas) in cloud computing based human talent management system
CN112015521A (en) * 2020-09-30 2020-12-01 北京百度网讯科技有限公司 Configuration method and device of inference service, electronic equipment and storage medium
CN112035516A (en) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Processing method and device for operator service, intelligent workstation and electronic equipment

Similar Documents

Publication Publication Date Title
Sciancalepore et al. Mobile traffic forecasting for maximizing 5G network slicing resource utilization
EP3855842A1 (en) Method and apparatus for dynamically allocating radio resources in a wireless communication system
US11824736B2 (en) First entity, second entity, third entity, and methods performed thereby for providing a service in a communications network
EP3855840A1 (en) Method and apparatus for orthogonal resource allocation in a wireless communication system
Fiandrino et al. A machine-learning-based framework for optimizing the operation of future networks
CN113826080A (en) System and method for distributing application logic in a digital network
Khatibi et al. A model for virtual radio resource management in virtual RANs
CN113923694B (en) Network resource arranging method, system, device and storage medium
Debbabi et al. An overview of interslice and intraslice resource allocation in b5g telecommunication networks
US20230042545A1 (en) Methods for intelligent resource allocation based on throttling of user equipment traffic and related apparatus
CN105850173A (en) On-demand radio coordination in a software-defined network
Khan et al. Joint QoS-control and handover optimization in backhaul aware SDN-based LTE networks
US20220104127A1 (en) Method and apparatus for power management in a wireless communication system
Lee et al. Multi-objective resource allocation for LTE/LTE-A femtocell/HeNB networks using ant colony optimization
Khedher et al. Processing time evaluation and prediction in cloud-ran
US11212822B2 (en) Systems and methods for managing service level agreements over network slices
Skondras et al. A network slicing algorithm for 5g vehicular networks
Adamuz-Hinojosa et al. Harmonizing 3GPP and NFV description models: Providing customized RAN slices in 5G networks
Matoussi et al. User slicing scheme with functional split selection in 5G cloud-RAN
WO2021152629A1 (en) Method and apparatus for dynamically allocating radio resources in a wireless communication system
Xu et al. Towards next generation software-defined radio access network–architecture, deployment, and use case
US11622322B1 (en) Systems and methods for providing satellite backhaul management over terrestrial fiber
WO2020057952A1 (en) Cellular telecommunications network
CN114745264A (en) Inference service deployment method and device and processor readable storage medium
WO2021152633A1 (en) Method and apparatus for orthogonal resource allocation in a wireless communication system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2022-07-12)