WO2022237484A1 - Inference system, method, apparatus, and related device - Google Patents

Inference system, method, apparatus, and related device

Info

Publication number
WO2022237484A1
WO2022237484A1 · PCT/CN2022/088086 · CN2022088086W
Authority
WO
WIPO (PCT)
Prior art keywords
inference
reasoning
model
update
updating
Application number
PCT/CN2022/088086
Other languages
English (en)
French (fr)
Inventor
谢达奇
王烽
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2022237484A1

Classifications

    • G06N 3/02 Neural networks (computing arrangements based on biological models)
    • G06N 3/08 Learning methods (neural networks)
    • G06N 5/04 Inference or reasoning models (computing arrangements using knowledge-based models)

Definitions

  • The present application relates to the field of artificial intelligence (AI), and in particular to an inference system, method, apparatus, and related device.
  • Machine learning is an important method and means in the field of AI.
  • The amount of resources available for model inference usually has an important impact on the effect of model inference.
  • A two-level inference mechanism can be configured according to the resource limits of the model deployment environment.
  • For example, inference models of different specifications can be deployed on the edge side and in the cloud; because the edge side has fewer computing resources than the cloud, the specification of the edge-side inference model is usually smaller than that of the cloud-side inference model.
  • Accordingly, the inference effect (such as inference accuracy and efficiency) of the cloud-side inference model on an input sample is usually better than that of the edge-side inference model.
  • The edge-side inference model can be used first to infer an input sample; when its confidence in the inference result for that sample is too low, the sample is sent to the cloud, where the larger inference model infers it again, improving the accuracy of the final inference result.
  • Because transmitting input samples from the edge side to the cloud consumes transmission bandwidth, users in practical application scenarios usually limit the proportion of input samples transmitted to the cloud, to avoid too many samples occupying a large amount of bandwidth.
  • Under a fixed transmission condition, the performance of an inference system based on this mechanism may be difficult to keep at a high level; for example, during a certain period the accuracy of the inference results it determines for input samples may be low. An inference scheme is therefore urgently needed that keeps the performance of the inference system at a high level.
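The two-level mechanism described above can be sketched as a simple cascade: a small edge-side model answers first, and the sample is escalated to the larger cloud-side model only when the edge model's confidence is too low. The function below is an illustrative sketch, not code from the application; the model callables, the labels, and the 0.8 threshold are assumptions.

```python
def cascade_infer(sample, edge_model, cloud_model, conf_threshold=0.8):
    """Two-level inference: try the small edge model first, and escalate
    to the larger cloud model only when confidence is below the threshold."""
    label, confidence = edge_model(sample)
    if confidence >= conf_threshold:
        return label, confidence, "edge"
    # Low confidence: transmit the input sample and re-infer in the cloud.
    label, confidence = cloud_model(sample)
    return label, confidence, "cloud"

# Toy stand-ins for the two inference models (hypothetical):
edge = lambda s: ("helmet", 0.55 if s == "blurry" else 0.95)
cloud = lambda s: ("helmet", 0.97)

print(cascade_infer("clear", edge, cloud))   # served at the edge
print(cascade_infer("blurry", edge, cloud))  # escalated to the cloud
```

Only the low-confidence sample consumes edge-to-cloud bandwidth, which is why the proportion of escalated samples is the quantity the user later caps.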
  • The present application provides an inference system for keeping the accuracy of inference on input samples at a high level.
  • the present application also provides a reasoning method, an update device, a computer device, a computer-readable storage medium, and a computer program product.
  • the present application provides an inference system, which includes a first inference device, a second inference device, an update device, and a decision device.
  • The first inference device is configured to use the first inference model to perform inference on an input sample.
  • The decision device is configured to determine to transmit the input sample to the second inference device when the inference result of the first inference model for the input sample satisfies the transmission condition.
  • The second inference device is configured to use the second inference model to perform inference on the received input sample, where the specification of the second inference model used by the second inference device is larger than that of the first inference model used by the first inference device.
  • Because the transmission condition used to decide whether to transmit input samples to the second inference device can be adjusted dynamically, the inference system can maintain high inference performance after the condition is adjusted according to actual application requirements. For example, when the transmission bandwidth between the first and second inference devices increases, the system updates the transmission condition so that more input samples are inferred by the second inference model with the larger specification, keeping inference accuracy high within the limited transmission bandwidth. When that bandwidth decreases, the system updates the transmission condition to reduce the number of input samples transmitted to the second inference device, reducing the bandwidth the system occupies.
  • the first reasoning device and the second reasoning device may be implemented by software or hardware.
  • the first reasoning device and the second reasoning device when implemented by software, may be, for example, a virtual machine running on a computing device or the like.
  • the first reasoning device and the second reasoning device may include one or more computing devices, such as one or more servers.
  • The first inference device and the second inference device may be deployed in different environments.
  • For example, the first inference device can be deployed on an edge network and the second in the cloud; or the first on a local network and the second on an edge network, and so on.
  • The transmission condition may specifically be that the confidence of the inference result is lower than a confidence threshold; that is, the decision device may determine to transmit the input sample to the second inference device when the first inference model's confidence in its result for the input sample is below the threshold, improving inference accuracy for that sample.
  • When the updating device updates the transmission condition, it may specifically adjust the confidence threshold, for example increasing or decreasing it. By dynamically adjusting the threshold, the updating device can keep the performance of the inference system at a high level.
  • the transmission condition further includes that the ratio of the input samples sent by the first reasoning device to the second reasoning device relative to the total input samples received by the first reasoning device does not exceed the transmission ratio upper limit.
  • The transmission ratio upper limit may be preset by the user. When the confidence of the first inference model's result for an input sample is low, the decision device can first check whether sending that sample to the second inference device would push the proportion of transmitted samples, relative to the total input samples received by the first inference device, above the preconfigured upper limit.
  • If it would, the decision device still returns the first model's inference result to the terminal device, preventing the traffic between the two inference devices from exceeding the transmission bandwidth upper limit. If not, the decision device may instruct the first inference device to send the input sample to the second inference device, to obtain a more accurate inference result for that sample.
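The two checks above (confidence below the threshold, and the cumulative transmitted share staying under the user-configured cap) can be sketched together as follows. The class name, the counters, and the example figures are illustrative assumptions, not taken from the application.

```python
class TransmitDecision:
    """Decides whether an input sample is sent to the second (cloud-side)
    inference device, enforcing an upper limit on the transmitted proportion."""

    def __init__(self, conf_threshold, ratio_cap):
        self.conf_threshold = conf_threshold  # transmit when confidence is below this
        self.ratio_cap = ratio_cap            # max share of samples sent onward
        self.total = 0                        # samples received by the first device
        self.sent = 0                         # samples forwarded to the second device

    def should_transmit(self, confidence):
        self.total += 1
        if confidence >= self.conf_threshold:
            return False  # the edge result is confident enough
        # Would forwarding this sample push the transmitted share over the cap?
        if (self.sent + 1) / self.total > self.ratio_cap:
            return False  # keep the edge result to respect the bandwidth limit
        self.sent += 1
        return True

decider = TransmitDecision(conf_threshold=0.8, ratio_cap=0.2)
flags = [decider.should_transmit(c) for c in [0.9, 0.9, 0.9, 0.9, 0.5]]
# Only the fifth, low-confidence sample is forwarded: 1 of 5 = 20%, within the cap.
```

A sixth low-confidence sample would be refused, since forwarding it would raise the transmitted share to 2 of 6, above the 20% cap.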
  • The first update trigger condition includes at least one of: the average inference accuracy of the first inference model within the first time period being lower than the first accuracy threshold, or the transmission bandwidth between the first inference device and the second inference device increasing.
  • When the updating device updates the confidence threshold, it may specifically increase it. By raising the threshold, the updating device sends more input samples to the second inference device for inference, improving the overall inference accuracy of the system for input samples.
  • the updating device may specifically increase the confidence threshold when the average remaining transmission bandwidth between the first reasoning device and the second reasoning device within the first time period is higher than a preset threshold. In this way, after the updating device increases the confidence threshold, there is sufficient transmission bandwidth between the first reasoning device and the second reasoning device to support transmission of a larger number of input samples.
  • If the average remaining bandwidth is not above the preset threshold, the updating device may leave the confidence threshold unchanged. This avoids a situation in which, after the threshold is raised, the larger number of input samples transmitted to the second inference device exceeds what the bandwidth between the two devices can carry.
  • The first update trigger condition may alternatively include the transmission bandwidth between the first and second inference devices decreasing, or the proportion of input samples sent by the first inference device to the second, relative to the total input samples the first device received, exceeding the transmission ratio upper limit.
  • In that case, when the updating device updates the transmission condition, it may specifically reduce the confidence threshold. Based on the reduced threshold, the system uploads fewer samples to the second inference device, reducing the transmission bandwidth consumed between the two devices.
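A minimal sketch of this elastic update of the transmission condition, combining the trigger conditions above: raise the confidence threshold when edge accuracy is too low or bandwidth grew (provided spare bandwidth can absorb the extra traffic), and lower it when bandwidth shrank or the transmitted share exceeded the cap. All parameter names and the 0.05 step are assumptions for illustration.

```python
def update_conf_threshold(thr, *, avg_edge_accuracy, accuracy_floor,
                          bandwidth_increased, bandwidth_decreased,
                          avg_spare_bandwidth, spare_floor,
                          sent_ratio, ratio_cap, step=0.05):
    """One elastic-update step for the confidence threshold in the
    transmission condition (illustrative sketch)."""
    # Bandwidth shrank, or the transmitted share exceeds the cap: send fewer samples.
    if bandwidth_decreased or sent_ratio > ratio_cap:
        return max(0.0, thr - step)
    # Edge accuracy too low or bandwidth grew, and the spare bandwidth can
    # absorb the extra traffic: send more samples to the larger model.
    if (avg_edge_accuracy < accuracy_floor or bandwidth_increased) \
            and avg_spare_bandwidth > spare_floor:
        return min(1.0, thr + step)
    return thr  # otherwise leave the threshold unchanged

# Edge accuracy dropped and spare bandwidth is ample: the threshold is raised.
raised = update_conf_threshold(0.80, avg_edge_accuracy=0.72, accuracy_floor=0.75,
                               bandwidth_increased=False, bandwidth_decreased=False,
                               avg_spare_bandwidth=30.0, spare_floor=10.0,
                               sent_ratio=0.10, ratio_cap=0.20)
# Bandwidth between the two devices decreased: the threshold is lowered.
lowered = update_conf_threshold(0.80, avg_edge_accuracy=0.90, accuracy_floor=0.75,
                                bandwidth_increased=False, bandwidth_decreased=True,
                                avg_spare_bandwidth=5.0, spare_floor=10.0,
                                sent_ratio=0.10, ratio_cap=0.20)
```

The decrease branch is checked first so that a shrinking link always wins over an accuracy-driven increase, matching the bandwidth-protection intent described above.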
  • The updating device is further configured to update the first inference model and/or the second inference model when a second update trigger condition is met. In this way, the updating device can improve the inference accuracy of the system for input samples by updating the inference models themselves.
  • the updating device can improve the inference accuracy of the input samples by updating the inference model when the inference accuracy of the inference model is low.
  • The second accuracy threshold may be greater than the first accuracy threshold.
  • When the updating device updates an inference model, it may first obtain incremental training samples, which may be, for example, the input samples inferred by the system during the most recent time period, with labeling completed by a user or annotator. The updating device may then use the incremental training samples to perform incremental training on the first inference model and/or the second inference model. The incrementally trained model can then perform more accurate inference on input samples similar to the incremental training samples.
  • When the updating device updates the first inference model, it may first determine, for example by prediction, the amount of resources available to the first inference device within a second time period, and then update the specification of the first inference model according to that amount: when the amount of available resources decreases, the updating device may reduce the model's specification, and when it increases, the updating device may enlarge it.
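The specification update in the last point can be sketched as choosing the largest model variant that fits the predicted resources available to the first inference device in the coming period. The variant table and the memory figures below are hypothetical, purely for illustration.

```python
# Hypothetical first-inference-model variants, ordered by specification.
VARIANTS = [
    {"name": "tiny",   "mem_gb": 1, "params_m": 5},
    {"name": "small",  "mem_gb": 2, "params_m": 25},
    {"name": "medium", "mem_gb": 4, "params_m": 90},
]

def pick_first_model_spec(predicted_free_mem_gb):
    """Choose the largest model variant that fits the predicted amount of
    resources available to the first inference device."""
    fitting = [v for v in VARIANTS if v["mem_gb"] <= predicted_free_mem_gb]
    if not fitting:
        raise RuntimeError("no model variant fits the available resources")
    return max(fitting, key=lambda v: v["mem_gb"])

# Resources shrink -> smaller specification; resources grow -> larger one.
print(pick_first_model_spec(1.5)["name"])  # tiny
print(pick_first_model_spec(4.0)["name"])  # medium
```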
  • the present application provides a reasoning method, which is applied to an updating device in a reasoning system, and the reasoning system further includes a first reasoning device, a second reasoning device, and a decision-making device, and the method includes:
  • The updating device acquires resource information and/or inference results of the inference system, the inference results including the results of the first inference device performing inference on input samples using the first inference model, where an input sample is transmitted to the second inference device when the first inference model's result for that sample satisfies the transmission condition in the decision device. Based on the resource information and/or the inference results of the inference system, the updating device determines that the system satisfies the first update trigger condition, and then updates the transmission condition.
  • the transmission condition includes that the confidence of the inference result is lower than a confidence threshold
  • the updating device updating the transmission condition includes: the updating device updating the confidence threshold
  • the transmission condition further includes that the ratio of the input samples transmitted to the second reasoning device relative to the total input samples received by the first reasoning device does not exceed the transmission ratio upper limit.
  • The first update trigger condition includes at least one of: the average inference accuracy of the first inference model within the first time period being lower than a first accuracy threshold, or the transmission bandwidth between the first inference device and the second inference device increasing; and the updating device updating the transmission condition includes the updating device increasing the confidence threshold.
  • The updating device updating the transmission condition includes: increasing the confidence threshold when the average remaining transmission bandwidth between the first inference device and the second inference device within the first time period is higher than a preset threshold.
  • The first update trigger condition includes at least one of: the transmission bandwidth between the first inference device and the second inference device decreasing, or the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, exceeding the transmission ratio upper limit; and the updating device updating the transmission condition includes the updating device reducing the confidence threshold.
  • the method further includes: updating, by the updating device, the first inference model and/or the second inference model when a second update trigger condition is satisfied.
  • The updating device updating the first inference model and/or the second inference model includes: updating the first inference model when its average inference accuracy within the first time period is lower than a first accuracy threshold and the remaining transmission bandwidth between the first inference device and the second inference device is lower than a preset threshold; and/or updating the second inference model when its average inference accuracy within the first time period is lower than a second accuracy threshold.
  • The updating device updating the first inference model and/or the second inference model includes: the updating device acquiring incremental training samples, and using them to perform incremental training on the first inference model and/or the second inference model.
  • The updating device updating the first inference model includes: the updating device determining the amount of resources available to the first inference device within a second time period, and updating the specification of the first inference model according to that amount.
  • The present application provides an updating device applied to an inference system, the inference system further including a first inference device, a second inference device, and a decision device. The updating device includes: a collection module configured to acquire resource information and/or inference results of the inference system, the inference results including results obtained by the first inference device performing inference on input samples using a first inference model, where an input sample is transmitted to the second inference device when the first inference model's result for that sample satisfies the transmission condition in the decision device; a monitoring module configured to determine, based on the resource information and/or the inference results, that the inference system meets the first update trigger condition; and an update module configured to update the transmission condition.
  • the transmission condition includes that the confidence of the reasoning result is lower than a confidence threshold
  • the updating module is specifically configured to update the confidence threshold
  • the transmission condition further includes that the ratio of the input samples transmitted to the second reasoning device relative to the total input samples received by the first reasoning device does not exceed the transmission ratio upper limit.
  • The first update trigger condition includes at least one of: the average inference accuracy of the first inference model within the first time period being lower than a first accuracy threshold, or the transmission bandwidth between the first inference device and the second inference device increasing; the update module is specifically configured to increase the confidence threshold.
  • the updating module is configured to: when the average remaining transmission bandwidth between the first reasoning device and the second reasoning device within the first time period is higher than a preset threshold , increase the confidence threshold.
  • The first update trigger condition includes at least one of: the transmission bandwidth between the first inference device and the second inference device decreasing, or the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, exceeding the transmission ratio upper limit; the update module is specifically configured to reduce the confidence threshold.
  • the update module is further configured to update the first inference model and/or the second inference model when a second update trigger condition is satisfied.
  • The update module is configured to: update the first inference model when its average inference accuracy within a first time period is lower than a first accuracy threshold and the remaining transmission bandwidth between the first inference device and the second inference device is lower than a preset threshold; and/or update the second inference model when its average inference accuracy within the first time period is lower than a second accuracy threshold.
  • the update module is configured to: acquire incremental training samples; use the incremental training samples to perform incremental training on the first inference model and/or the second inference model .
  • The update module is configured to: determine the amount of resources available to the first inference device within the second time period, and update the specification of the first inference model according to that amount.
  • The updating device provided by the third aspect corresponds to the inference system provided by the first aspect. For the technical effects of the third aspect and any of its possible implementations, refer to the technical effects of the corresponding first aspect and its possible implementations, which this embodiment does not repeat.
  • the present application provides a computer device, the computer device includes a processor and a memory; the memory is used to store instructions, and when the computer device is running, the processor executes the instructions stored in the memory, so that the The computer device executes the second aspect above or the reasoning method in any possible implementation manner of the second aspect.
  • the memory may be integrated in the processor, or independent of the processor.
  • The computer device may also include a bus, through which the processor is connected to the memory.
  • The memory may include a read-only memory and a random access memory.
  • The present application provides a computer-readable storage medium storing instructions which, when run on a computer device, cause the computer device to execute the method described in the second aspect or any implementation thereof.
  • the present application provides a computer program product containing instructions, which, when run on a computer device, causes the computer device to execute the method described in the second aspect or any implementation manner of the second aspect.
  • FIG. 1 is a schematic diagram of the architecture of an inference system provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of another reasoning system provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an exemplary interactive interface provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of an exemplary elastic update configuration interface provided by the embodiment of the present application.
  • FIG. 5 is a schematic flow chart of an inference method provided in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computer device 600 provided by an embodiment of the present application.
  • the reasoning system 100 includes a first reasoning device 101 , a second reasoning device 102 , a decision device 103 and an updating device 104 .
  • the first reasoning device 101 and the second reasoning device 102 may be implemented by software or hardware.
  • the first reasoning device 101 and the second reasoning device 102 may be software running on a computer device, such as a virtual machine.
  • both the first reasoning device 101 and the second reasoning device 102 may include at least one computing device.
  • the first reasoning device 101 and the second reasoning device 102 respectively include multiple servers as an example.
  • the computing devices constituting the first reasoning device 101 and the second reasoning device 102 may also be other devices with computing capabilities, and are not limited to the servers shown in FIG. 1 .
  • The first inference device 101 and the second inference device 102 may be deployed in different environments. Exemplarily, as shown in FIG. 1, the first inference device 101 may be deployed on the edge network to execute the corresponding computation there, such as the inference process based on the first inference model described below, while the second inference device 102 may be deployed in the cloud to execute the corresponding computation there, such as the inference process based on the second inference model described below.
  • the first reasoning device 101 may be deployed in a local network on the user side, such as a local terminal or server; the second reasoning device 102 may be deployed in an edge network.
  • the specific deployment manners of the first reasoning device 101 and the second reasoning device 102 are not limited.
  • Both the decision device 103 and the updating device 104 can be deployed in the same environment as the first inference device 101, for example together with it on the edge network as shown in the figure, or alternatively on a local network.
  • the decision-making means 103 and the updating means 104 can be realized by software.
  • the decision-making means 103 and the updating means 104 may be application programs applied on a computing device, and the computing device is deployed in the same environment as the first reasoning means 101 .
  • the decision-making device 103 can also be implemented by hardware.
  • The decision device 103 may be a computing device, such as a server, located in the same environment as the first inference device 101, or a device implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be implemented by complex programmable logic device (complex programmable logical device, CPLD), field-programmable gate array (field-programmable gate array, FPGA), general array logic (generic array logic, GAL) or any combination thereof.
  • the update device 104 may also be deployed in the same environment as the second reasoning device 102, such as deployed in the cloud.
  • the first inference device 101 may include, for example, a plurality of edge servers, and the first inference device 101 may receive the input sample sent by the terminal device 105 on the user side.
  • The input sample may be, for example, an image captured by the terminal device 105 (or by another device).
  • The first inference device 101 can use the pretrained first inference model to perform inference on the acquired input sample and obtain an inference result, for example using the model to detect objects such as safety helmets in the captured image. The first inference model can also output a confidence for the inference result, representing the degree of belief that the result is correct.
  • the decision-making device 103 may instruct the first reasoning device 101 to send the input sample to the second reasoning device 102 .
  • the second reasoning device 102 may use the pre-trained second reasoning model to reason the received input samples and obtain the reasoning result.
  • The accuracy of the inference result obtained with the second inference model is usually higher (that is, the confidence that the result is correct is higher), so that the inference accuracy of the inference system 100 for this input sample reaches a high level.
  • The transmission bandwidth between the first inference device 101 and the second inference device 102 is usually limited; therefore, limiting the proportion of input samples sent to the second inference device 102 prevents the transmitted samples from occupying too much of that bandwidth.
  • Before the decision device 103 determines whether to transmit an input sample to the second inference device 102, it may check whether doing so would push the proportion of samples received by the second inference device 102, relative to the input samples received by the first inference device 101, above the transmission ratio upper limit specified by the user.
  • If it would not, the decision device 103 determines to transmit the input sample, so that the second inference model with the larger specification improves the inference accuracy for that sample; if it would, the decision device 103 may refuse to transmit it.
  • In the latter case, the inference result of the system 100 for the input sample is the one obtained by the first inference device using the first inference model, which lowers the system's inference accuracy for that sample.
  • A fixed confidence threshold may make it difficult to keep the performance of the inference system 100 at a high level.
  • For example, the available transmission bandwidth between the first inference device 101 and the second inference device 102 may increase, yet input samples whose inference-result confidence is not below the current threshold are still not transmitted to the second inference device 102. The inference results the system 100 outputs for a large number of input samples then all come from the first inference model with the smaller specification, making it difficult to improve the system's overall inference accuracy.
  • the available transmission bandwidth between the first reasoning device 101 and the second reasoning device 102 may also be reduced.
  • The updating device 104 can dynamically update and adjust the conditions (hereinafter, transmission conditions) used by the decision device 103 to judge whether to transmit an input sample to the second inference device 102.
  • The updating device 104 can detect whether the inference system 100 currently meets an update trigger condition; if so, it may update the confidence threshold that the decision device 103 uses to decide whether to transmit an input sample, so that the decision device 103 thereafter applies the updated threshold when deciding whether to transmit input samples to the second inference device 102.
  • the performance of the reasoning system 100 can be kept at a relatively high level.
  • the updating device 104 may correspondingly increase the confidence threshold, so that more input samples (whose reasoning results have low confidence The increased confidence threshold) can be transmitted to the second reasoning device 102, and the second reasoning model with a larger specification in the second reasoning device 102 is used to reason the input samples, thereby improving the overall performance of the reasoning system 100 inference accuracy.
• the updating device 104 can reduce the confidence threshold accordingly, so as to reduce the number of input samples transmitted to the second reasoning device 102, thereby reducing both the time delay of the inference system 100 in inferring the input samples and the transmission bandwidth consumed.
  • the reasoning system 100 shown in FIG. 1 is only used as an example, and is not intended to limit the specific implementation of the reasoning system.
• the reasoning system 100 may include more functional modules to support the reasoning system 100 in providing other functions; or, the decision-making device 103 and the updating device 104 in the reasoning system 100 may be integrated into one functional module, etc.
  • FIG. 2 is a schematic structural diagram of an inference system provided by an embodiment of the present application.
  • the reasoning system 100 shown in FIG. 2 is deployed in an edge-cloud collaborative scenario, that is, the first reasoning device 101 , the decision-making device 103 and the updating device 104 are all deployed in the edge network, and the second reasoning device 102 is deployed in the cloud.
  • the update device 104 in the inference system 100 shown in FIG. 2 includes a collection module 1041 , a monitoring module 1042 and an update module 1043 .
• the first reasoning device 101 in the reasoning system 100 is pre-configured with a first reasoning model, and the second reasoning device 102 is pre-configured with a second reasoning model.
• the computing performance of the cloud is usually higher (for example, the amount of available computing resources is larger), so the second reasoning device 102 located in the cloud can perform relatively complex computing tasks, while the first reasoning device 101 located in the edge network performs relatively simple computing tasks.
  • the specification of the first inference model configured for the first inference device 101 is smaller than the specification of the second inference model configured for the second inference device 102 .
• for example, the file size of the first inference model is 50 MB (megabytes), and the file size of the second inference model is 200 MB.
• the first reasoning model and the second reasoning model may be machine learning models constructed based on machine learning algorithms. That the specification of the first reasoning model is smaller than that of the second reasoning model may specifically mean that the number of neural network layers in the first reasoning model is less than the number of neural network layers in the second reasoning model, or that the number of parameters included in the first reasoning model is less than the number of parameters included in the second reasoning model.
• the computing workload (FLOPs) of the first inference model is less than that of the second inference model, and correspondingly, the computing resource requirement of the first inference model at runtime is also less than that of the second inference model.
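As a concrete illustration of the specification comparison above, one common way to quantify a model's computing workload is to count the FLOPs of one forward pass. The sketch below does this for plain fully connected networks; the layer widths are made-up examples, not values from this disclosure.

```python
def dense_layer_flops(in_features, out_features):
    # one multiply and one add per weight in the layer
    return 2 * in_features * out_features

def model_flops(layer_widths):
    """Total FLOPs of one forward pass through a fully connected
    network described only by its layer widths."""
    return sum(dense_layer_flops(a, b)
               for a, b in zip(layer_widths, layer_widths[1:]))

# A smaller-specification model (fewer layers, fewer parameters) needs
# fewer FLOPs, and hence fewer runtime computing resources, than a larger one.
small_spec = model_flops([128, 64, 10])
large_spec = model_flops([128, 256, 256, 10])
```

This is why the edge-side first inference model, with fewer layers and parameters, can run within the edge network's tighter resource budget.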
  • the reasoning system 100 may configure reasoning models for the first reasoning device 101 and the second reasoning device 102 through the updating device 104 , or configure through other devices.
  • the configuration of the inference model by the updating device 104 is taken as an example below for illustrative description.
• the updating device 104 may present an interactive interface as shown in FIG. 3 to the user side, prompting the user to specify constraints for the reasoning system 100 through the interactive interface and to provide training samples for model training.
• the constraints can be, for example, the inference accuracy, the specification of the inference model, the AI framework supported by the first inference device 101 located in the edge network, the inference target of the inference system 100, the upper limit of the transmission bandwidth (or inference delay) between the first inference device 101 and the second inference device 102, and the maximum ratio of the number of input samples inferred by the second inference device 102 to the number of input samples received by the first inference device 101 (hereinafter referred to as the upper limit of the transmission ratio).
• the constraints specified by the user may also include other content, such as the conditions, described below, for triggering updates to the confidence threshold, the transmission ratio upper limit, or the inference models.
• the AI framework supported by the first reasoning device 101 may be, for example, the TensorFlow framework, the PyTorch framework, or the MindSpore framework; different AI frameworks support reasoning models in different file formats.
  • the reasoning target of the reasoning system 100 is used to indicate the application scenario of the reasoning model, such as using the reasoning model to perform object detection, image classification, and the like.
  • the update means 104 can use the update module 1043 to construct an initial reasoning model according to the constraints specified by the user.
• the specification of the constructed initial inference model is the specification specified by the user, the file format of the initial inference model is the file format supported by the AI framework specified by the user, and the inference target of the initial inference model is the inference target specified by the user.
  • the update module 1043 can use the training samples provided by the user to train the constructed initial inference model, and stop the training until the inference accuracy of the initial inference model reaches the inference accuracy specified by the user.
  • the update module 1043 may send the trained initial inference model to the second inference device 102, so as to use the initial inference model as the second inference model configured to the second inference device 102.
  • the updating module 1043 may generate the first reasoning model according to the second reasoning model.
  • the update module 1043 may instruct the collection module 1041 to feed back the amount of available resources in the first inference device 101 .
• the collection module 1041 sends a resource detection request to the first reasoning device 101 to detect the amount of resources currently available to the first reasoning device 101, and feeds back the detection result to the update module 1043.
  • available resources on the first inference device 101 may include, for example, computing resources (such as CPU, etc.), storage resources (such as cloud disks, etc.), and the like.
  • the update module 1043 can determine the specification of the first inference model to be generated according to the amount of available resources obtained.
• for example, the update module 1043 can determine, according to the number of available processor cores, that the specification of the first inference model to be generated is specification 1, so that the first inference device 101 has sufficient resources to support the operation of the first inference model.
• the update module 1043 can process the second inference model by means of model compression, model distillation, etc. to obtain the first inference model, and send the first inference model to the first reasoning device 101, so as to configure the first reasoning model for the first reasoning device 101. Further, before sending the first reasoning model to the first reasoning device 101, the above-mentioned training samples can be used to train the first reasoning model again, and the trained first reasoning model is then sent to the first reasoning device 101.
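The model distillation mentioned above is typically driven by a loss that makes the small student model imitate the large teacher model's softened output distribution. The minimal sketch below shows that core loss term only; the temperature value and logits are illustrative, and a real distillation pipeline within the update module 1043 would wrap this in a full training loop.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's: the quantity minimized when distilling the second (large)
    inference model into the first (small) one."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))
```

The loss is smallest when the student reproduces the teacher's distribution, which is how the small edge model inherits accuracy from the large cloud model.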
  • the updating module 1043 may simultaneously construct the first reasoning model and the second reasoning model of different specifications, and use the same training samples to complete the training of the first reasoning model and the second reasoning model respectively.
• the update module 1043 can also configure the transmission conditions for the decision-making device 103 according to the inference accuracy, the upper limit of the transmission bandwidth and the upper limit of the transmission ratio in the constraints specified by the user; the transmission condition may be, for example, a confidence threshold or a discriminant model (the decision device 103 decides whether to transmit the input samples to the second reasoning device 102 based on the discriminant model).
• for example, the update module 1043 can calculate the confidence threshold according to the data volume of the input samples received by the first reasoning device 101 per unit time and the upper limit of the transmission bandwidth between the first reasoning device 101 and the second reasoning device 102; the confidence threshold is such that the average inference accuracy of the inference system 100 for the input samples is not lower than the inference accuracy specified by the user. Further, the confidence threshold calculated by the update module 1043 can ensure that the bandwidth occupied by sending input samples to the second inference device 102 per unit time does not exceed the upper limit of the transmission bandwidth, that is, the ratio of the number of input samples transmitted to the second inference device 102 to the total number of input samples does not exceed the upper limit of the transmission ratio.
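One simple way to realize the calibration described above is to pick the highest confidence threshold whose forwarding fraction stays within the transmission ratio upper limit, using confidences observed on a held-out sample set. The patent does not fix the exact calculation, so this is an assumed sketch; the function name and the ranking strategy are illustrative.

```python
def pick_confidence_threshold(confidences, max_transmit_ratio):
    """Choose the highest threshold such that the fraction of samples whose
    confidence falls below it (and that would thus be forwarded to the
    second inference device) does not exceed the transmission-ratio cap."""
    n = len(confidences)
    ranked = sorted(confidences)
    k = int(max_transmit_ratio * n)   # at most k samples may be forwarded
    return ranked[k] if k < n else 1.0
```

A larger cap yields a higher threshold (more samples go to the cloud model, raising accuracy); a smaller cap yields a lower threshold (saving bandwidth), matching the trade-off described in this section.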
• the first reasoning device 101 can receive an input sample and use the configured first reasoning model to perform reasoning on the input sample, so that the first reasoning model outputs a reasoning result, a confidence degree of the reasoning result, and the like.
• for example, the first reasoning device 101 may receive an image captured and sent by the terminal device 105 on the user side, which includes one or more staff members; the first reasoning device 101 then uses the first reasoning model to recognize the image, outputs each worker in the image and whether each worker is wearing a safety helmet, and gives the confidence of the recognition result.
• when the confidence of the reasoning result is not lower than the confidence threshold, the decision-making device 103 can output the inference result to the terminal device 105 on the user side, so that the terminal device 105 can take the appropriate action.
• if the terminal device 105 determines from the reasoning result that some workers are not wearing safety helmets, the terminal device 105 can trigger a monitoring alarm so that the monitoring personnel can promptly instruct those workers to wear safety helmets correctly.
• when the confidence of the inference result output by the first inference model is less than the confidence threshold, it indicates that the accuracy of the inference result obtained by using the first inference model with the smaller specification is low. In this case, the decision-making device 103 may instruct the first inference device 101 to send the input sample to the second inference device 102.
• the second inference device 102 may use the configured second inference model to perform inference on the received input samples, and send the inference result output by the second inference model to the terminal device 105. Since the specification of the second inference model is relatively large, the accuracy of the inference result obtained by using the second inference model to infer the input sample is relatively high, thereby ensuring that the inference accuracy of the inference system 100 for the input sample remains at a high level.
• in some embodiments, the transmission condition used by the decision-making device 103 to determine whether to transmit the input samples to the second reasoning device 102 may also include whether the proportion of the sent input samples relative to the total input samples received by the first reasoning device 101 (that is, all input samples) exceeds the pre-configured upper limit of the transmission ratio.
• if the ratio of the sent input samples to the total input samples received by the first inference device 101 exceeds the pre-configured transmission ratio upper limit, then even if the confidence with which the first reasoning model infers the input sample is low, the decision-making device 103 still sends the inference result to the terminal device 105, so as to prevent the transmission bandwidth between the first reasoning device 101 and the second reasoning device 102 from exceeding the upper limit of the transmission bandwidth. If the ratio does not exceed the upper limit, the decision-making device 103 may instruct the first reasoning device 101 to send the input sample to the second reasoning device 102, so as to obtain a more accurate reasoning result for the input sample.
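The routing rule of the decision-making device 103 described in the preceding paragraphs can be sketched as follows. The class name, counters, and in-process state are illustrative assumptions; an actual decision device would apply the same two checks (confidence threshold plus running transmission ratio) per input sample.

```python
class TransmitDecider:
    """Sketch of the decision device's rule: forward a sample to the
    second (cloud) inference model only if the first (edge) model's
    confidence is below the threshold AND doing so keeps the running
    transmission ratio within the configured upper limit."""

    def __init__(self, conf_threshold, ratio_limit):
        self.conf_threshold = conf_threshold
        self.ratio_limit = ratio_limit
        self.total = 0   # input samples seen so far
        self.sent = 0    # samples forwarded to the cloud so far

    def should_transmit(self, confidence):
        self.total += 1
        low_confidence = confidence < self.conf_threshold
        within_ratio = (self.sent + 1) / self.total <= self.ratio_limit
        if low_confidence and within_ratio:
            self.sent += 1
            return True
        # Otherwise the edge model's own result is returned to the terminal.
        return False
```

Note that when the ratio cap is binding, a low-confidence sample is still answered by the edge model, exactly as the paragraph above specifies, so the bandwidth upper limit is never exceeded.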
• the transmission conditions, such as the confidence threshold configured for the decision-making device 103 by the updating device 104, can be dynamically adjusted according to the operation of the inference system 100, so that the inference system 100 can maintain high performance.
• the updating device 104 can monitor whether the inference system 100 satisfies a preset first update trigger condition, and when the inference system 100 satisfies the first update trigger condition, the updating device 104 may update the value of the configured confidence threshold according to the first update trigger condition.
• when the updating device 104 updates the value of the confidence threshold, it may specifically increase the confidence threshold.
  • the preset first update trigger condition may specifically include:
  • the average inference accuracy of the first inference model within the first time period is lower than the first accuracy threshold.
• the updating device 104 can increase the number of input samples transmitted from the first inference device 101 to the second inference device 102 by increasing the confidence threshold (that is, input samples whose original inference results have a confidence lower than the increased confidence threshold will also be transmitted to the second inference device 102). In this way, for input samples whose inference results output by the first inference model have low confidence, inference can be performed through the second inference model in the second inference device 102, so as to improve the accuracy with which the inference system 100 infers the input samples.
• the average inference accuracy of the first inference model within the first time period may be, for example, the average of the confidences of the inference results of the first inference model for each input sample within the first time period; that is, the confidence of an inference result is used as the inference accuracy of the first inference model for the corresponding input sample.
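The averaging just described (mean confidence over a time window as a proxy for average inference accuracy) can be sketched with a sliding window. The window size and the class/method names are illustrative; the monitoring module 1042 could use any equivalent bookkeeping.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks the edge model's mean confidence over a sliding window,
    used as a proxy for its average inference accuracy within the
    first time period."""

    def __init__(self, window=100):
        self.confidences = deque(maxlen=window)  # drops oldest entries

    def record(self, confidence):
        self.confidences.append(confidence)

    def average_accuracy(self):
        if not self.confidences:
            return None
        return sum(self.confidences) / len(self.confidences)

    def below(self, accuracy_threshold):
        """True when the windowed average has fallen below the threshold,
        i.e. when the first update trigger condition may be met."""
        avg = self.average_accuracy()
        return avg is not None and avg < accuracy_threshold
```

The monitoring module would feed `below(first_accuracy_threshold)` to the update module 1043, which then decides whether to adjust the confidence threshold.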
  • the updating device 104 can monitor the average inference accuracy of the first inference model through the monitoring module 1042, and the update module 1043 determines whether to update the confidence threshold according to the average inference accuracy.
  • the updating means 104 may also determine whether the average remaining transmission bandwidth between the first reasoning means 101 and the second reasoning means 102 within the first time period is higher than a preset threshold.
  • the remaining transmission bandwidth refers to the difference between the preset transmission bandwidth upper limit and the used transmission bandwidth between the first reasoning device 101 and the second reasoning device 102 .
  • the average remaining transmission bandwidth refers to an average value of remaining transmission bandwidths at multiple moments in the first time period.
  • the preset threshold can be set in advance by technicians according to the requirements of actual application scenarios. When the average remaining transmission bandwidth is higher than the preset threshold, it indicates that there are relatively sufficient bandwidth resources available for data transmission between the first reasoning device 101 and the second reasoning device 102 for a long time.
• the updating device 104 can increase the confidence threshold to increase the number of input samples transmitted from the first inference device 101 to the second inference device 102, so as to improve the accuracy of the inference performed by the inference system 100 on the input samples. Conversely, when the average remaining transmission bandwidth is not higher than the preset threshold, it indicates that the bandwidth resources between the first reasoning device 101 and the second reasoning device 102 are relatively tight; at this time, the updating device 104 may refrain from increasing the confidence threshold, so as to avoid an excessively large confidence threshold aggravating the shortage of bandwidth resources between the first reasoning device 101 and the second reasoning device 102.
  • the transmission bandwidth between the first reasoning device 101 and the second reasoning device 102 increases.
• the updating device 104 can increase the number of input samples transmitted from the first inference device 101 to the second inference device 102 by increasing the confidence threshold, so as to improve the accuracy of the inference performed by the inference system 100 on the input samples.
  • the first update trigger condition may also be other conditions besides the above example, which is not limited in this embodiment.
  • the first update trigger condition may be any one of the above examples, or may include multiple conditions at the same time.
• when the updating device 104 determines that the inference system 100 satisfies the first update trigger condition, it can update the transmission conditions in the decision-making device 103 not only by increasing the confidence threshold but also by increasing the transmission ratio upper limit, so that the inference system 100 can use the second inference model to perform inference on a larger number of input samples, thereby improving the inference accuracy.
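The first update trigger condition discussed above (edge accuracy has dropped AND spare bandwidth exists) can be condensed into a small rule. The step size, parameter names, and the decision to raise only the confidence threshold (rather than the ratio upper limit) are illustrative assumptions.

```python
def maybe_raise_threshold(conf_threshold, avg_accuracy, accuracy_floor,
                          avg_remaining_bw, bw_floor, step=0.05):
    """Raise the confidence threshold only when the edge model's average
    inference accuracy has fallen below its floor AND there is enough
    remaining transmission bandwidth to absorb the extra forwarded
    samples; otherwise leave the threshold unchanged."""
    if avg_accuracy < accuracy_floor and avg_remaining_bw > bw_floor:
        return min(1.0, conf_threshold + step)
    return conf_threshold
```

Gating the increase on remaining bandwidth is what prevents the threshold update from aggravating a bandwidth shortage, as the section cautions.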
• when the updating device 104 updates the confidence threshold in the transmission condition, it can not only increase the confidence threshold but also decrease it.
  • the preset first update trigger condition may specifically include:
  • the ratio of the input samples sent by the first reasoning device 101 to the second reasoning device 102 relative to the total input samples received by the first reasoning device 101 exceeds the preset transmission ratio upper limit.
  • the first reasoning model may have different degrees of difficulty in reasoning different input samples.
  • the input samples may specifically be images taken of workers on a construction site.
• when the number of workers on the construction site is small, using the first inference model with the smaller specification to recognize the captured image (that is, the aforementioned reasoning), it is usually possible to accurately identify the workers in the captured image and whether each worker is wearing a safety helmet (the confidence of the reasoning result is high); that is, the reasoning difficulty for the first reasoning model is relatively low.
• conversely, when the number of workers in the captured image is large, the accuracy with which the first inference model recognizes the workers and safety helmets is low (the confidence of the reasoning result is low); that is, the reasoning difficulty for the first reasoning model is relatively high.
• the updating device 104 can reduce the number of input samples transmitted from the first inference device 101 to the second inference device 102 by reducing the confidence threshold; that is, input samples whose inference results have a confidence less than the confidence threshold before adjustment but greater than the adjusted confidence threshold may no longer be transmitted to the second reasoning device, so as to prevent the proportion of input samples transmitted to the second reasoning device 102 from exceeding the upper limit of the transmission ratio specified by the user.
  • the updating means 104 may increase the number of input samples transmitted from the first inference means 101 to the second inference means 102 by increasing the upper limit of the transmission ratio. In this way, for an input sample whose inference result output by the first inference model has a low confidence, inference can be performed by the second inference model in the second inference device 102 , so as to improve the accuracy of inference performed by the inference system 100 on the input sample.
  • the transmission bandwidth between the first reasoning device 101 and the second reasoning device 102 is reduced.
• since the preset transmission ratio upper limit was determined for the larger transmission bandwidth previously available between the first reasoning device 101 and the second reasoning device 102, when that transmission bandwidth decreases, if input samples are still transmitted to the second inference device 102 according to the previously set confidence threshold, the first inference device 101 may encounter insufficient transmission bandwidth when transmitting the input samples.
• the updating device 104 can reduce the number of input samples transmitted from the first inference device 101 to the second inference device 102 by reducing the confidence threshold, so as to reduce the transmission bandwidth consumption between the first inference device 101 and the second inference device 102 and adapt to the current transmission bandwidth.
  • the updating device 104 may also reduce the number of input samples transmitted from the first reasoning device 101 to the second reasoning device 102 by reducing the upper limit of the transmission ratio , so as to reduce transmission bandwidth consumption between the first reasoning device 101 and the second reasoning device 102 .
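A simple way to recompute the transmission ratio upper limit when bandwidth shrinks, as described above, is to bound it by what the link can physically carry. The function name, units, and the assumption of a fixed per-sample size are illustrative.

```python
def transmission_ratio_cap(bandwidth_bps, sample_rate_hz, sample_bytes):
    """Largest fraction of input samples that can be forwarded to the
    second inference device without exceeding the available edge-to-cloud
    bandwidth. bandwidth_bps: link capacity in bits/s; sample_rate_hz:
    input samples arriving per second; sample_bytes: size of one sample."""
    needed_bps = sample_rate_hz * sample_bytes * 8  # if everything forwarded
    return min(1.0, bandwidth_bps / needed_bps)
```

When the measured bandwidth drops, re-running this calculation yields a lower ratio upper limit, and the confidence threshold can then be lowered to match it.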
  • the first update trigger condition that triggers the update device 104 to decrease the confidence threshold may also be implemented in other manners, which is not limited in this embodiment.
  • the first update trigger condition may be any one of the above examples, or may include multiple conditions at the same time.
  • the monitoring module 1042 in the update device 104 can continuously monitor the inference system 100 to determine whether the confidence threshold in the decision device 103 needs to be updated, and After it is determined that an update is required, the specific value of the updated confidence threshold may be further determined, so that the decision-making device 103 may subsequently determine whether to transmit the input sample to the second reasoning device 102 according to the updated confidence threshold.
  • the update of the confidence threshold in the transmission condition is taken as an example for illustration.
  • updating the transmission condition may also be updating the discriminant model, that is, the decision-making device 103 may use the discriminant model to determine whether to transmit the input sample to the second reasoning device 102 .
• the discriminant model can be, for example, a binary classification model; the decision-making device 103 can input the inference result and the confidence output by the first inference model into the discriminant model, obtain the discrimination result output by the discriminant model, and then determine, according to the discrimination result, whether to transmit the input sample corresponding to the inference result to the second reasoning device 102.
• accordingly, when the updating device 104 updates the transmission conditions, it may specifically update the discriminant model in the decision-making device 103, such as updating the parameters or network structure of the discriminant model, which is not limited in this embodiment.
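A minimal instance of the binary-classification discriminant model just described is a logistic model over features of the edge inference result. The feature choice (confidence and result entropy), the weights, and the cutoff below are placeholders, not trained values from this disclosure; updating the discriminant model would mean replacing these parameters.

```python
import math

def discriminant(confidence, result_entropy, w=(-8.0, 6.0), b=4.0):
    """Logistic discriminant: outputs the probability that the input
    sample should be forwarded to the second inference device. Low
    confidence and high output entropy both push toward forwarding."""
    z = w[0] * confidence + w[1] * result_entropy + b
    return 1.0 / (1.0 + math.exp(-z))

def should_transmit(confidence, result_entropy, cutoff=0.5):
    # Binary decision derived from the discriminant's probability.
    return discriminant(confidence, result_entropy) > cutoff
```

Compared with a fixed confidence threshold, such a model lets the decision device weigh several signals at once, and the update device can retrain it rather than tune a single scalar.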
• the updating device 104 can not only update the confidence threshold in the decision-making device 103, but also update the first reasoning model configured in the first reasoning device 101 and/or the second reasoning model configured in the second reasoning device 102.
• the updating device 104 can monitor whether the inference system 100 satisfies a preset second update trigger condition, and when the inference system 100 satisfies the second update trigger condition, the updating device 104 can update the configured first inference model and/or second inference model.
  • the updating means 104 may update the specification of the reasoning model, or may retrain the reasoning model.
  • the updating means 104 may update the specification of the first reasoning model by using an elastic update mechanism.
• since the resources of the first inference device 101 deployed on the edge network are usually limited, and in actual application scenarios the first inference device 101 may not only provide inference services for the terminal device 105 but also provide other business services, such as big data search and edge cloud computing, the priorities of the different business services provided by the first reasoning device 101 may also differ. Therefore, when other business services with higher priority preempt more of the resources of the first reasoning device 101, reducing the amount of resources available for providing the reasoning service, the currently remaining available resources on the first inference device 101 may be insufficient to support the first inference device 101 in performing inference on the input samples at the edge side using the first inference model of the original specification.
• in this case, the updating device 104 may reduce the specification of the first inference model, for example by performing model distillation or model compression on the original first inference model, so that the currently remaining available resources on the first inference device 101 can support the smaller first inference model in performing inference on the input samples.
  • the resource amount of available resources on the first reasoning device 101 may be detected by the collection module 1041 in the updating device 104 .
  • the updating device 104 may also reduce the specification of the first inference model.
• the updating device 104 can increase the specification of the first inference model, for example by rebuilding the inference model according to the increased amount of available resources to generate a first inference model with a larger specification, so that the larger inference model can be used to improve the accuracy of inference on input samples at the edge side; at the same time, the larger first inference model can also improve the confidence of its inference on the input samples, so that the number (or proportion) of input samples transmitted from the first reasoning device 101 to the second reasoning device 102 can be reduced, reducing the consumption of transmission bandwidth.
• the updating device 104 may determine the amount of available resources of the first reasoning device 101 within a second time period, which may be, for example, a period of time in the past or in the future (such as a week or a month), so that the updating device 104 may update the specification of the first inference model according to the amount of available resources of the first inference device 101 within the second time period.
  • the update module 1043 can use the collection module 1041 to collect the change of the resource amount of the available resources of the first reasoning device 101 in the past period of time, and according to the change of the resource amount, predict the future of the first reasoning device 101 The amount of resources available for the second time period.
• when the predicted amount of available resources is greater than the amount of currently available resources, the update module 1043 may increase the specification of the first inference model according to the predicted amount. In this way, the first inference device 101 can use the updated, larger first inference model to perform inference on the input samples at the edge side within the second time period. Conversely, when the predicted amount of available resources is smaller than the amount of currently available resources, the update module 1043 may reduce the specification of the first reasoning model.
  • the update module 1043 may collect the average amount of available resources of the first reasoning device 101 in the past second time period through the collection module 1041, and when the average amount of resources is greater than the resource amount of the current available resources, the update module 1043 may increase the specification of the first inference model; and when the average resource amount is less than the resource amount of currently available resources, the updating module 1043 may decrease the specification of the first inference model.
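The elastic-update logic above (predict the coming period's available resources from collected history, then size the model accordingly) can be sketched as follows. The linear extrapolation and the resource-to-specification table are assumptions; the patent does not fix a prediction method or specification granularity.

```python
def predict_available(history):
    """Naive linear extrapolation of the resource-amount history collected
    by the collection module: project one step past the latest sample."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope

def pick_model_spec(predicted_resources, specs):
    """specs: list of (min_resources_needed, spec_name) in ascending
    order of resource need. Return the largest specification the
    predicted resources can support."""
    chosen = specs[0][1]
    for need, name in specs:
        if predicted_resources >= need:
            chosen = name
    return chosen
```

When the projection exceeds the current resource amount, the update module would move to a larger specification (e.g. by rebuilding the model); when it falls short, to a smaller one (e.g. by distillation or compression).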
• the inference system 100 may also present to the user (through the terminal device 105) an elastic update configuration interface as shown in FIG. 4, which includes the prompt information "Please choose whether to elastically update the inference model".
  • the updating means 104 can determine whether to automatically update the first reasoning model in the first reasoning means 101 dynamically according to the user's selection operation for elastically updating the reasoning model.
  • the updating means 104 may update the first inference model and/or the second inference model by means of incremental training.
• the distribution of data features of the input samples inferred by the inference system 100 may change, thereby reducing the inference accuracy of the first inference model and/or the second inference model for the input samples, or even causing model failure.
• for example, the first inference model and the second inference model in the inference system 100 can identify red safety helmets in the captured image (that is, the input sample), but if the helmets worn by the workers on the construction site are uniformly changed to yellow or blue, it may be difficult for the first inference model and the second inference model to recognize the yellow or blue helmets, thereby reducing the recognition accuracy of the inference system 100 for helmets.
• the monitoring module 1042 in the updating device 104 can monitor whether the average inference accuracy of the first inference model within the first time period is lower than the first accuracy threshold and whether the remaining transmission bandwidth between the first reasoning device and the second reasoning device is lower than a preset threshold, and feed back the monitoring results to the updating module 1043.
• when the update module 1043 determines that the average inference accuracy is lower than the first accuracy threshold and the remaining transmission bandwidth is lower than the preset threshold, the update module 1043 determines to update the first inference model.
  • the update module 1043 may update the first inference model by means of incremental training. Specifically, the update module 1043 can acquire incremental training samples, so that the update module 1043 can use the incremental training samples to perform incremental training on the first inference model, so as to improve the inference accuracy of the first inference model for input samples.
• the incremental training samples can be labeled by the user in advance and provided to the inference system 100; or, when the first inference model fails but the second inference model does not, the incremental training samples can be generated through the second inference model.
• in the hard-hat detection scenario, pre-annotated captured images that include yellow or blue hard hats can be used as incremental training samples to perform incremental training on the first inference model, so that the first inference model obtained by the incremental training can effectively recognize red, yellow or blue helmets in captured images.
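The incremental training just described continues gradient updates of an already-trained model on newly labelled samples only, rather than retraining from scratch. The toy below shows that continuation step on a logistic model; the learning rate, epoch count, and two-feature samples are illustrative stand-ins for a real network and real captured images.

```python
import math

def incremental_train(weights, new_samples, lr=0.1, epochs=20):
    """Continue training an existing logistic model on incremental
    samples only. new_samples: list of (feature_vector, label 0/1)."""
    w = list(weights)  # start from the already-trained weights
    for _ in range(epochs):
        for x, y in new_samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))        # current prediction
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi         # gradient step
    return w
```

Because only the new samples are used, the update is cheap enough to run at the edge or on demand when the monitoring module detects an accuracy drop.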
  • Similarly, the monitoring module 1042 can also monitor whether the average inference accuracy of the second inference model within the first time period is below the second accuracy threshold, and feed the monitoring result back to the update module 1043.
  • When the update module 1043 determines that the average inference accuracy is below the second accuracy threshold, the update module 1043 updates the second inference model.
  • The second accuracy threshold may, for example, be greater than the aforementioned first accuracy threshold.
  • The update module 1043 may likewise update the second inference model by incremental training or by rebuilding the inference model; the specific implementation is similar to the way the update module 1043 updates the first inference model described above, so reference can be made to the relevant descriptions and details are not repeated here.
  • The above implementation of incrementally updating the first inference model and the second inference model is only illustrative. In practice, the update module 1043 can also complete the updates of the first inference model and the second inference model by rebuilding and training the models, which is not limited in this embodiment.
  • Further, the update module 1043 may adjust the specification of the first inference model and, at the same time, use incremental training samples to perform incremental training on the specification-adjusted first inference model.
  • During the update process, the inference system 100 can continue to use the pre-update first inference model and second inference model to provide inference services for the terminal device 105; after the update is complete, the inference system 100 can use the updated first inference model and second inference model to provide inference services, so that updating the inference models does not interrupt the inference service provided by the inference system 100.
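One way to realize the uninterrupted switch-over described above is to serve requests from a reference that is swapped atomically once the updated model is ready. This is only a sketch of the idea (the `ModelSlot` class and the version-tagged stand-in models are hypothetical, not part of the application):

```python
import threading

class ModelSlot:
    """Holds the currently served model; readers never see a half-updated state."""
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def infer(self, sample):
        with self._lock:
            model = self._model      # grab a consistent reference
        return model(sample)         # old model keeps serving during training

    def swap(self, new_model):
        with self._lock:
            self._model = new_model  # instant cut-over once the update completes

# Hypothetical models: version tags stand in for real inference models.
slot = ModelSlot(lambda s: ("v1", s))
before = slot.infer("img")           # served by the pre-update model
slot.swap(lambda s: ("v2", s))       # update finished in the background
after = slot.infer("img")            # served by the updated model
```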
  • It should be noted that this embodiment is illustrated by taking as an example the first inference device 101 deployed on the edge network and the second inference device 102 deployed in the cloud.
  • In other embodiments, the first inference device 101 may instead be deployed on the local network while the second inference device 102 is deployed on the edge network.
  • In that case, the inference process of the inference system 100 for input samples and the processes of updating the confidence threshold and the models are similar to those described above.
  • FIG. 5 is a schematic flowchart of an inference method provided by an embodiment of the present application.
  • the reasoning method shown in FIG. 5 can be applied to the reasoning system 100 shown in FIG. 2 , or to other applicable reasoning systems.
  • For ease of description, the method is illustrated by taking as an example its application to the inference system 100 shown in FIG. 2 , with the inference system 100 performing inference on two different input samples.
  • the reasoning method shown in FIG. 5 may specifically include:
  • the first inference device 101 receives an input sample 1 .
  • Specifically, the terminal device 105 on the user side may send an input sample 1 to the first inference device 101.
  • The input sample may be, for example, a captured image, such as an image of a construction site in the safety-helmet scenario, or any other sample used as input to a model.
  • The first inference device 101 performs inference on input sample 1 using the pre-configured, smaller-specification inference model 1, and obtains an inference result 1 and a confidence 1.
  • If the inference result 1 is sufficiently accurate, the first inference device 101 may feed the inference result 1 back to the terminal device 105.
  • When the confidence 1 output by inference model 1 is less than the preset confidence threshold, the inference result 1 can be considered inaccurate.
  • In this case, the first inference device 101 can request the decision device 103 to send input sample 1 to the second inference device 102, so that the larger inference model 2 on the second inference device 102 can perform more accurate inference on input sample 1.
  • The decision device 103 allows the first inference device 101 to upload input sample 1 to the second inference device 102 provided the transmission ratio upper limit is not exceeded.
  • The transmission ratio upper limit can be specified by the user in advance: the user can input a specific value, or the inference system 100 can calculate the upper limit from the user-specified inference accuracy and the upper limit of the transmission bandwidth between the first inference device 101 and the second inference device 102.
  • Specifically, the decision device 103 can check whether, if input sample 1 were sent to the second inference device 102, the ratio of the number of sent input samples to the number of input samples processed by the first inference device 101 would exceed the pre-configured transmission ratio upper limit. If not, the first inference device 101 is allowed to upload input sample 1 to the second inference device 102. If it would, then even though inference model 1 infers input sample 1 with a low confidence 1, the decision device 103 still sends the inference result 1 to the terminal device 105 (not shown in FIG. 5 ), so as to prevent the transmission bandwidth between the first inference device 101 and the second inference device 102 from exceeding its upper limit.
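The forwarding decision in the steps above can be sketched as follows (a simplified illustration; the threshold value, the ratio cap, and the counter bookkeeping are assumptions made for this example):

```python
class OffloadDecider:
    """Decide whether a low-confidence sample may be sent to the larger model."""
    def __init__(self, conf_threshold: float, ratio_cap: float):
        self.conf_threshold = conf_threshold  # confidence threshold
        self.ratio_cap = ratio_cap            # transmission ratio upper limit
        self.total = 0                        # samples seen by the edge model
        self.sent = 0                         # samples forwarded to the cloud

    def should_offload(self, confidence: float) -> bool:
        self.total += 1
        if confidence >= self.conf_threshold:
            return False                      # edge result is trusted as-is
        if (self.sent + 1) / self.total > self.ratio_cap:
            return False                      # would exceed the ratio cap: keep edge result
        self.sent += 1
        return True                           # forward to the larger cloud model

decider = OffloadDecider(conf_threshold=0.8, ratio_cap=0.5)
decisions = [decider.should_offload(c) for c in [0.95, 0.5, 0.6, 0.9, 0.4]]
```

In this run, only two of the three low-confidence samples are forwarded, because forwarding the third would push the sent proportion above the cap.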
  • the first reasoning device 101 sends the input sample 1 to the second reasoning device 102 .
  • The second inference device 102 performs inference on input sample 1 using the pre-configured, larger-specification inference model 2, and obtains an inference result 2 (and a confidence 2).
  • The second inference device 102 sends the inference result 2 (and the confidence 2) to the terminal device 105.
  • For example, the second inference device 102 may send the inference result 2 (and the confidence 2) for input sample 1 to the terminal device 105 through the first inference device 101.
  • By monitoring the inference system 100, the update device 104 determines whether to update the confidence threshold in the decision device 103, and updates it accordingly.
  • Specifically, the update device 104 monitors whether the inference system 100 satisfies the first update trigger condition; when it is satisfied, the update device 104 can update the value of the configured confidence threshold according to the first update trigger condition. Certainly, if the inference system 100 does not satisfy the first update trigger condition, the update device 104 may leave the confidence threshold unchanged.
  • As one example, the update device 104 may increase the confidence threshold.
  • In this case, the preset first update trigger condition may specifically include:
  • the average inference accuracy of the first inference model within the first time period being below the first accuracy threshold; or
  • the transmission bandwidth between the first inference device 101 and the second inference device 102 increasing.
  • As another example, the update device 104 may decrease the confidence threshold.
  • In this case, the preset first update trigger condition may specifically include:
  • the transmission bandwidth between the first inference device 101 and the second inference device 102 decreasing; or
  • the ratio of the input samples sent by the first inference device 101 to the second inference device 102, relative to the total input samples received by the first inference device 101, exceeding the preset transmission ratio upper limit.
  • Certainly, in practice the first update trigger condition may also be implemented in other ways, which is not limited in this embodiment.
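The trigger conditions listed above can be condensed into a small rule function (a sketch only; the fixed adjustment step and the particular signal names are assumptions, not taken from the application):

```python
def update_confidence_threshold(threshold: float,
                                avg_accuracy: float,
                                accuracy_floor: float,
                                bandwidth_delta: float,
                                send_ratio: float,
                                ratio_cap: float,
                                step: float = 0.05) -> float:
    """Return the new confidence threshold given the monitored system state."""
    # Conditions favouring MORE offloading: raise the threshold.
    if avg_accuracy < accuracy_floor or bandwidth_delta > 0:
        return min(1.0, threshold + step)
    # Conditions favouring LESS offloading: lower the threshold.
    if bandwidth_delta < 0 or send_ratio > ratio_cap:
        return max(0.0, threshold - step)
    return threshold  # no trigger condition met: keep the threshold

raised = update_confidence_threshold(0.8, avg_accuracy=0.6, accuracy_floor=0.7,
                                     bandwidth_delta=0.0, send_ratio=0.1, ratio_cap=0.3)
lowered = update_confidence_threshold(0.8, avg_accuracy=0.9, accuracy_floor=0.7,
                                      bandwidth_delta=-1.0, send_ratio=0.1, ratio_cap=0.3)
```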
  • Further, when the user-specified transmission ratio upper limit is allowed to be adjusted, the update device 104 may also adjust the value of the transmission ratio upper limit, so that the performance of the inference system 100 can be kept at a high level.
  • By monitoring the inference system 100, the update device 104 determines whether to update inference model 1 in the first inference device 101 and inference model 2 in the second inference device 102, and updates them accordingly.
  • Specifically, the update device 104 monitors whether the inference models in the inference system 100 satisfy the second update trigger condition; when it is satisfied, the update device 104 can update the configured inference model 1 according to the second update trigger condition, and further may also update the configured inference model 2 according to that condition. Of course, if the inference models in the inference system 100 do not satisfy the second update trigger condition, the update device 104 may leave them unchanged.
  • For example, the update device 104 may adjust the specification of inference model 1 when the amount of resources available to the inference service provided by the first inference device 101 changes or when the load of the first inference device 101 changes: when the available resources decrease or the load increases, the update device 104 may reduce the specification of inference model 1; when the available resources increase or the load decreases, the update device 104 may increase the specification of inference model 1.
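A minimal sketch of such resource-driven specification selection (the tier names and core counts are hypothetical examples; the application only describes increasing or decreasing the specification with available resources):

```python
# Candidate model specifications, ordered from smallest to largest,
# each with the minimum number of free CPU cores it needs to run.
SPEC_TIERS = [("tiny", 8), ("small", 32), ("medium", 64), ("large", 128)]

def pick_model_spec(free_cores: int) -> str:
    """Choose the largest specification the edge device can currently support."""
    chosen = SPEC_TIERS[0][0]            # fall back to the smallest tier
    for name, required in SPEC_TIERS:
        if free_cores >= required:
            chosen = name                # keep upgrading while resources allow
    return chosen

spec_busy = pick_model_spec(free_cores=40)    # load rose, fewer cores free
spec_idle = pick_model_spec(free_cores=130)   # load dropped, more cores free
```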
  • For another example, when the update device 104 determines that the inference accuracy of inference model 1 and/or inference model 2 for input samples has decreased, or even that inference model 1 and/or inference model 2 has failed, it may update inference model 1 and/or inference model 2 by incremental training or retraining. The specific process by which the update device 104 determines that the accuracy of inference models 1 and 2 has decreased and updates the models can be found in the relevant descriptions of the foregoing embodiments and is not repeated here.
  • It should be noted that this embodiment is illustrated by taking as an example the update device 104 updating both the confidence threshold and the inference models. In actual applications, the update device 104 may update only the confidence threshold or only the inference models; this embodiment does not limit this.
  • the first reasoning device 101 receives an input sample 2 .
  • The first inference device 101 performs inference on input sample 2 using the updated inference model 1, and outputs an inference result 3 and a confidence 3.
  • The decision device 103 allows the first inference device 101 to upload input sample 2 to the second inference device 102 provided the transmission ratio upper limit is not exceeded.
  • the first inference device 101 sends the input sample 2 to the second inference device 102 .
  • the second inference device 102 uses the updated inference model 2 to infer the input sample 2, and outputs an inference result 4 (and confidence 4).
  • The second inference device 102 sends the inference result 4 (and the confidence 4) to the terminal device 105.
  • the second reasoning device 102 may send the reasoning result 4 (and the confidence 4) for the input sample 2 to the terminal device 105 and the like through the first reasoning device 101 .
  • It should be noted that the above takes as an example updating the confidence threshold, inference model 1, or inference model 2 in the interval between inferring the two input samples.
  • In actual applications, the update device 104 may also update the confidence threshold, inference model 1, or inference model 2 while input sample 2 is being inferred.
  • In a possible implementation, the update device 104 involved in the above inference process for input samples may be software configured on a computer device; by running this software, the computer device can realize the functions of the update device 104 described above.
  • Below, the computer device that implements the functions of the update device 104 involved in the process of inferring input samples is introduced in detail.
  • Figure 6 shows a computer device.
  • the computer device 600 shown in FIG. 6 can be specifically used to implement the functions of the updating apparatus 104 in the embodiment shown in FIG. 5 above.
  • the computer device 600 includes a bus 601 , a processor 602 , a communication interface 603 and a memory 604 .
  • the processor 602 , the memory 604 and the communication interface 603 communicate through the bus 601 .
  • the bus 601 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 6 , but this does not mean that there is only one bus or one type of bus.
  • the communication interface 603 is used for communicating with the outside, for example, receiving a data acquisition request sent by a terminal.
  • the processor 602 may be a central processing unit (central processing unit, CPU).
  • the memory 604 may include a volatile memory (volatile memory), such as a random access memory (random access memory, RAM).
  • the memory 604 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (read-only memory, ROM), flash memory, HDD or SSD.
  • Executable code is stored in the memory 604 , and the processor 602 executes the executable code to perform the method executed by the aforementioned update device 104.
  • An embodiment of the present application also provides a computer-readable storage medium that stores instructions; when the instructions are run on a computer device, the computer device executes the method of the update device 104 in the above embodiments.
  • an embodiment of the present application also provides a computer program product.
  • the computer program product When the computer program product is executed by a computer, the computer executes any one of the aforementioned inference methods.
  • the computer program product may be a software installation package, which can be downloaded and executed on a computer if any of the aforementioned reasoning methods needs to be used.
  • The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
  • In essence, the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which can be a personal computer, a training device, a network device, etc.) to execute the methods described in the various embodiments of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • software When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (Solid State Disk, SSD)), etc.


Abstract

An inference system comprising a first inference device (101), a second inference device (102), an update device (104), and a decision device (103). The first inference device (101) is configured to perform inference on input samples using a first inference model; the decision device (103) is configured to determine to transmit an input sample to the second inference device (102) when the first inference model's result for that input sample satisfies a transmission condition; the second inference device (102) is configured to perform inference on the input sample using a second inference model whose specification is larger than that of the first inference model; the update device (104) is configured to update the transmission condition when the inference system satisfies a first update trigger condition. Because the transmission condition can be adjusted dynamically, the inference system can maintain high inference performance on the basis of the transmission condition after it has been adjusted to actual application requirements. Corresponding methods, apparatuses, and related devices are also provided.

Description

An Inference System, Method, Apparatus, and Related Devices

Technical Field

This application relates to the field of artificial intelligence, and in particular to an inference system, method, apparatus, and related devices.

Background

In the field of artificial intelligence (AI), machine learning is an important method and means: a model is obtained by analyzing the regularities in a training data set with a machine learning algorithm, and the model is then used to continuously perform inference on unknown sample data. Usually, the amount of resources used to run model inference has a significant impact on the inference results.

At present, a two-level inference mechanism can be set up according to the resource limits of the model deployment environment. For example, in an edge-cloud collaborative inference scenario, inference models of different specifications can be deployed on the edge side and in the cloud respectively. Because the edge side has fewer computing resources than the cloud, the specification of the inference model deployed on the edge is usually smaller than that of the inference model deployed in the cloud. Accordingly, for the same input sample, the cloud model's inference results (e.g., inference accuracy, efficiency) are usually better than the edge model's. Therefore, when inferring an input sample, the edge model can be used first; when the confidence of the edge model's result for the sample is too low, the sample is sent to the cloud so that the larger cloud model can infer it, improving the accuracy of the final result. Moreover, because transmitting input samples from the edge to the cloud occupies transmission bandwidth, in practical application scenarios users usually limit the proportion of input samples transmitted to the cloud, so as to prevent the edge from transmitting too many input samples and occupying excessive bandwidth.

However, in practice the performance of an inference system based on this mechanism may be hard to keep at a high level; for example, during some time periods the accuracy of the inference results determined by the system for input samples may be low. An inference scheme that keeps the system's performance at a high level is therefore urgently needed.
Summary

This application provides an inference system for keeping the accuracy of inference on input samples at a high level. This application also provides an inference method, an update apparatus, a computer device, a computer-readable storage medium, and a computer program product.
In a first aspect, this application provides an inference system comprising a first inference device, a second inference device, an update device, and a decision device. The first inference device is configured to perform inference on input samples using a first inference model; the decision device is configured to determine to transmit an input sample to the second inference device when the first inference model's result for that sample satisfies a transmission condition; the second inference device is configured to perform inference on the received input sample using a second inference model, the specification of the second inference model being larger than that of the first inference model; the update device is configured to update the transmission condition in the decision device when the inference system satisfies a first update trigger condition.

Because the condition used to decide whether to transmit an input sample to the second inference device can be adjusted dynamically, the inference system can maintain high inference performance once the transmission condition has been adjusted to actual application requirements. For example, when the transmission bandwidth between the first and second inference devices increases, the system can update the transmission condition so that more input samples are inferred by the larger second inference model, keeping the system's inference accuracy at a high level within the limited bandwidth. When the transmission bandwidth between the two devices decreases, the system can update the transmission condition to reduce the number of input samples transmitted to the second inference device, reducing the bandwidth the system occupies.

Optionally, the first and second inference devices may be implemented in software or hardware. When implemented in software, they may be, for example, virtual machines running on computing devices. When implemented in hardware, each may comprise one or more computing devices, such as one or more servers.

Furthermore, the first and second inference devices may be deployed in different environments. For example, the first inference device may be deployed on an edge network and the second in the cloud; or the first may be deployed on a local network and the second on an edge network.

In one possible implementation, the transmission condition may specifically be that the confidence of the inference result is below a confidence threshold; that is, the decision device determines to transmit the input sample to the second inference device when the confidence of the first inference model's result for that sample is below the confidence threshold, so as to improve inference accuracy for that sample. Accordingly, updating the transmission condition may specifically mean updating the value of the confidence threshold, such as increasing or decreasing it. By dynamically adjusting the confidence threshold, the update device can bring the system's performance to a high level.

In one possible implementation, the transmission condition further includes that the proportion of input samples sent by the first inference device to the second inference device, relative to the total input samples received by the first inference device, does not exceed a transmission ratio upper limit. Illustratively, this upper limit may be set by the user in advance. Then, when the confidence of the first inference model's result for an input sample is low, the decision device can first determine whether sending the sample to the second inference device would cause the proportion of sent samples, relative to the total input samples received by the first inference device 101, to exceed the pre-configured transmission ratio upper limit. If it would, the decision device still sends the inference result to the terminal device even though the confidence of the first inference model's result is low, so as to prevent the transmission bandwidth between the two inference devices from exceeding its upper limit. If it would not, the decision device can instruct the first inference device to send the sample to the second inference device so as to obtain a more accurate inference result for that sample.

In one possible implementation, the first update trigger condition includes at least one of: the average inference accuracy of the first inference model within a first time period being below a first accuracy threshold, and the transmission bandwidth between the first and second inference devices increasing. In this case, updating the confidence threshold specifically means increasing it, so that more input samples are sent to the second inference device for inference, improving the system's overall inference accuracy for input samples.

In one possible implementation, the update device may specifically increase the confidence threshold when the average remaining transmission bandwidth between the first and second inference devices within the first time period is above a preset threshold. This ensures that, after the confidence threshold is increased, there is enough transmission bandwidth between the two inference devices to support the larger number of input samples being transmitted.

Optionally, when the average remaining transmission bandwidth between the two inference devices within the first time period is not above the preset threshold, the update device may leave the confidence threshold unchanged. This avoids a bandwidth shortage between the two inference devices caused by more input samples being transmitted to the second inference device after the threshold is increased.

In one possible implementation, the first update trigger condition may specifically be that the transmission bandwidth between the first and second inference devices decreases, or that the proportion of input samples sent by the first inference device to the second inference device, relative to the total input samples received by the first inference device, exceeds the transmission ratio upper limit. Accordingly, updating the transmission condition specifically means decreasing the confidence threshold. Based on the decreased threshold, the system can reduce the number of samples uploaded to the second inference device, reducing the transmission bandwidth consumed between the two inference devices.

In one possible implementation, the update device is further configured to update the first inference model and/or the second inference model when a second update trigger condition is satisfied. In this way, the update device can improve the system's inference accuracy for input samples by updating the inference models.

In one possible implementation, the first inference model is updated when its average inference accuracy within the first time period is below the first accuracy threshold and the remaining transmission bandwidth between the two inference devices is below a preset threshold; and/or the second inference model is updated when its average inference accuracy within the first time period is below a second accuracy threshold. Thus, when an inference model's accuracy is low, the update device can improve accuracy for input samples by updating the model.

Optionally, the second accuracy threshold may be greater than the first accuracy threshold.

In one possible implementation, when updating an inference model, the update device may first obtain incremental training samples, which may be, for example, input samples inferred by the system in a recent period and labeled by a user or annotator. The update device can then use the incremental training samples to perform incremental training on the first and/or second inference model. After incremental training, the first and/or second inference model can perform more accurate inference on input samples similar to the incremental training samples.

In one possible implementation, when updating the first inference model, the update device may first determine the amount of resources available to the first inference device within a second time period, for example by prediction, and then update the specification of the first inference model according to that amount: decreasing the specification when the available resources decrease, and increasing the specification when they increase.
In a second aspect, this application provides an inference method applied to an update device in an inference system that further includes a first inference device, a second inference device, and a decision device. The method includes: the update device obtains resource information and/or inference results of the inference system, the inference results including the result of the first inference device performing inference on an input sample with a first inference model, wherein the input sample is transmitted to the second inference device when the first inference model's result for the sample satisfies a transmission condition in the decision device; the update device determines, according to the resource information and/or the inference results of the inference system, that the inference system satisfies a first update trigger condition; and the update device updates the transmission condition.

In one possible implementation, the transmission condition includes that the confidence of the inference result is below a confidence threshold, and updating the transmission condition includes the update device updating the confidence threshold.

In one possible implementation, the transmission condition further includes that the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, does not exceed a transmission ratio upper limit.

In one possible implementation, the first update trigger condition includes at least one of: the average inference accuracy of the first inference model within a first time period being below a first accuracy threshold, and the transmission bandwidth between the first and second inference devices increasing; and updating the transmission condition includes the update device increasing the confidence threshold.

In one possible implementation, updating the transmission condition includes increasing the confidence threshold when the average remaining transmission bandwidth between the first and second inference devices within the first time period is above a preset threshold.

In one possible implementation, the first update trigger condition includes at least one of: the transmission bandwidth between the first and second inference devices decreasing, and the proportion of input samples transmitted to the second inference device relative to the total input samples received by the first inference device exceeding the transmission ratio upper limit; and updating the transmission condition includes the update device decreasing the confidence threshold.

In one possible implementation, the method further includes: when a second update trigger condition is satisfied, the update device updates the first inference model and/or the second inference model.

In one possible implementation, this updating includes: the update device updating the first inference model when its average inference accuracy within the first time period is below the first accuracy threshold and the remaining transmission bandwidth between the two inference devices is below a preset threshold; and/or the update device updating the second inference model when its average inference accuracy within the first time period is below a second accuracy threshold.

In one possible implementation, updating the first inference model and/or the second inference model includes: the update device obtains incremental training samples; and the update device uses the incremental training samples to perform incremental training on the first and/or second inference model.

In one possible implementation, updating the first inference model includes: the update device determines the amount of resources available to the first inference device within a second time period; and the update device updates the specification of the first inference model according to that amount.

Because the inference method provided in the second aspect corresponds to the inference system provided in the first aspect, the technical effects of the second aspect and any of its possible implementations can be found in the corresponding descriptions of the first aspect and its possible implementations, and are not repeated here.
In a third aspect, this application provides an update device applied to an inference system that further includes a first inference device, a second inference device, and a decision device. The update device includes: a collection module configured to obtain resource information and/or inference results of the inference system, the inference results including the result of the first inference device performing inference on an input sample with a first inference model, wherein the input sample is transmitted to the second inference device when the first inference model's result for the sample satisfies a transmission condition in the decision device; a monitoring module configured to determine, according to the resource information and/or the inference results of the inference system, that the inference system satisfies a first update trigger condition; and an update module configured to update the transmission condition.

In one possible implementation, the transmission condition includes that the confidence of the inference result is below a confidence threshold, and the update module is specifically configured to update the confidence threshold.

In one possible implementation, the transmission condition further includes that the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, does not exceed a transmission ratio upper limit.

In one possible implementation, the first update trigger condition includes at least one of: the average inference accuracy of the first inference model within a first time period being below a first accuracy threshold, and the transmission bandwidth between the first and second inference devices increasing; and the update module is specifically configured to increase the confidence threshold.

In one possible implementation, the update module is configured to increase the confidence threshold when the average remaining transmission bandwidth between the first and second inference devices within the first time period is above a preset threshold.

In one possible implementation, the first update trigger condition includes at least one of: the transmission bandwidth between the first and second inference devices decreasing, and the proportion of input samples transmitted to the second inference device relative to the total input samples received by the first inference device exceeding the transmission ratio upper limit; and the update module is specifically configured to decrease the confidence threshold.

In one possible implementation, the update module is further configured to update the first inference model and/or the second inference model when a second update trigger condition is satisfied.

In one possible implementation, the update module is configured to: update the first inference model when its average inference accuracy within the first time period is below the first accuracy threshold and the remaining transmission bandwidth between the two inference devices is below a preset threshold; and/or update the second inference model when its average inference accuracy within the first time period is below a second accuracy threshold.

In one possible implementation, the update module is configured to: obtain incremental training samples; and use the incremental training samples to perform incremental training on the first and/or second inference model.

In one possible implementation, the update module is configured to: determine the amount of resources available to the first inference device within a second time period; and update the specification of the first inference model according to that amount.

Because the update device provided in the third aspect corresponds to the inference system provided in the first aspect, the technical effects of the third aspect and any of its possible implementations can be found in the corresponding descriptions of the first aspect and its possible implementations, and are not repeated here.
In a fourth aspect, this application provides a computer device comprising a processor and a memory; the memory is configured to store instructions, and when the computer device runs, the processor executes the instructions stored in the memory so that the computer device performs the inference method of the second aspect or any possible implementation thereof. It should be noted that the memory may be integrated in the processor or may be separate from it. The computer device may further include a bus, through which the processor is connected to the memory; the memory may include readable memory and random access memory.

In a fifth aspect, this application provides a computer-readable storage medium storing instructions which, when run on a computer device, cause the computer device to perform the method of the second aspect or any implementation thereof.

In a sixth aspect, this application provides a computer program product containing instructions which, when run on a computer device, cause the computer device to perform the method of the second aspect or any implementation thereof.

On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Clearly, the drawings described below are only some embodiments recorded in this application; a person of ordinary skill in the art may derive other drawings from them.

FIG. 1 is a schematic architecture diagram of an inference system according to an embodiment of this application;

FIG. 2 is a schematic architecture diagram of another inference system according to an embodiment of this application;

FIG. 3 is a schematic diagram of an exemplary interactive interface according to an embodiment of this application;

FIG. 4 is a schematic diagram of an exemplary elastic update configuration interface according to an embodiment of this application;

FIG. 5 is a schematic flowchart of an inference method according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of a computer device 600 according to an embodiment of this application.
Detailed Description

The terms "first", "second", and the like in the specification, claims, and above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms so used are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished in the description of the embodiments of this application.
Referring to FIG. 1, which is a schematic architecture diagram of an inference system. As shown in FIG. 1, the inference system 100 includes a first inference device 101, a second inference device 102, a decision device 103, and an update device 104. The first inference device 101 and the second inference device 102 may be implemented in software or hardware. When implemented in software, they may be software running on computer devices, such as virtual machines. When implemented in hardware, each may include at least one computing device; FIG. 1 takes as an example the case where the first inference device 101 and the second inference device 102 each include multiple servers. In practice, the computing devices constituting the first and second inference devices may also be other devices with computing capability and are not limited to the servers shown in FIG. 1. The first and second inference devices may be deployed in different environments. Illustratively, as shown in FIG. 1, the first inference device 101 may be deployed on an edge network to execute corresponding computation on the edge side, such as the inference process based on the first inference model described below; the second inference device 102 may be deployed in the cloud to execute corresponding computation there, such as the inference process based on the second inference model described below. In other examples, the first inference device 101 may be deployed on a user-side local network, such as a local terminal or server, and the second inference device 102 on an edge network. This embodiment does not limit the specific deployment of the first inference device 101 and the second inference device 102.

Both the decision device 103 and the update device 104 may be deployed in the same environment as the first inference device 101; for example, both may be deployed with the first inference device 101 on the edge-side network as shown in the figure, or on a local network. The decision device 103 and the update device 104 may be implemented in software, in which case they may be application programs on a computing device deployed in the same environment as the first inference device 101. The decision device 103 may also be implemented in hardware; in that case, it may be a computing device, such as a server, located in the same environment as the first inference device 101, or a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. In other possible deployments, the update device 104 may also be deployed in the same environment as the second inference device 102, for example in the cloud.

When the inference system 100 infers input samples, as shown in FIG. 1, the first inference device 101 may, for example, include multiple edge servers and may receive input samples sent by a user-side terminal device 105; an input sample may be, for example, an image captured by the terminal device 105 (or by another device). The first inference device 101 can then perform inference on the acquired input sample with a pre-trained first inference model and obtain an inference result, for example detecting objects such as safety helmets in a captured image with the first inference model; the first inference model can also output the confidence of the inference result (characterizing how trustworthy the result is). When the decision device 103 determines that the confidence of the result for an input sample is low (specifically, below a preset confidence threshold), indicating that the result obtained with the first inference model may be inaccurate, the decision device 103 can instruct the first inference device 101 to send the input sample to the second inference device 102. The second inference device 102 can then perform inference on the received input sample with a pre-trained second inference model and obtain an inference result. Because the specification of the second inference model is usually higher than that of the first, the accuracy of results obtained with the second inference model (i.e., the confidence that the result is correct) is usually higher, so the system's inference accuracy for the sample reaches a high level.

In practical application scenarios, the transmission bandwidth between the first inference device 101 and the second inference device 102 is usually limited, so the proportion of input samples sent to the second inference device 102 can be limited to prevent sample transmission from occupying too much bandwidth. Specifically, when the confidence of the first inference model's result for an input sample is low, the decision device 103, in determining whether to transmit the sample to the second inference device 102, can check whether transmitting it would cause the proportion of input samples received by the second inference device 102, relative to the input samples received by the first inference device 101, to exceed the user-specified transmission ratio upper limit. If not, the decision device 103 determines to transmit the sample to the second inference device, so that the larger second inference model can improve the inference accuracy for that sample. If it would, the decision device 103 can refuse to transmit the sample; the system's inference result for that sample is then the result obtained by the first inference device with the first inference model, which lowers the system's inference accuracy for that sample.
In this process, if the pre-configured confidence threshold is a statically configured fixed value, it may keep the performance of the inference system 100 from remaining at a high level. For example, in practice the available transmission bandwidth between the first inference device 101 and the second inference device 102 may increase; if the originally set confidence threshold is then too small, input samples whose result confidence is not below the threshold are not transmitted to the second inference device 102, so the results the system outputs for a large number of input samples are all results of the smaller first inference model, and the system's overall inference accuracy cannot improve. Conversely, the available transmission bandwidth between the two devices may also decrease; if the originally set threshold is then too large, a larger number of input samples may be transmitted to the second inference device 102 because their confidence is below the threshold, so the increased number of transmitted samples occupies more bandwidth; moreover, many samples must queue for a long time before being transmitted to the second inference device 102, which also increases the system's latency in inferring some input samples.

Based on this, in the inference system 100 provided by this application, the update device 104 can dynamically update and adjust the condition used by the decision device 103 to judge whether to transmit an input sample to the second inference device 102 (hereinafter, the transmission condition). Taking the transmission condition to be that the confidence of the inference result does not exceed the above confidence threshold as an example: while the inference system 100 provides inference services, the update device 104 can detect whether the system currently satisfies an update trigger condition; if so, the update device 104 can update the confidence threshold used by the decision device 103 to judge whether to transmit input samples, and the decision device 103 then uses the updated threshold to determine whether to transmit an input sample to the second inference device 102. By dynamically adjusting the confidence threshold in the decision device 103, the performance of the inference system 100 can be kept at a high level. For example, when the available transmission bandwidth between the first and second inference devices increases, the update device 104 can correspondingly increase the confidence threshold so that more input samples (those whose result confidence is below the increased threshold) can be transmitted to the second inference device 102 and inferred by its larger second inference model, improving the system's overall inference accuracy. When the available bandwidth between the two devices decreases, the update device 104 can correspondingly decrease the threshold, reducing the number of input samples that need to be transmitted to the second inference device 102 and thereby reducing the system's inference latency and the transmission bandwidth it consumes.
It should be noted that the inference system 100 shown in FIG. 1 is only an illustrative example and does not limit the specific implementation of the inference system. For example, in other possible implementations, the inference system 100 may include more functional modules to support more functions; or the decision device 103 and the update device 104 in the inference system 100 may be integrated into one functional module.

For ease of understanding, embodiments of this application are described below with reference to the accompanying drawings.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an inference system according to an embodiment of this application. The inference system 100 shown in FIG. 2 is deployed in an edge-cloud collaboration scenario: the first inference device 101, the decision device 103, and the update device 104 are all deployed on the edge network, while the second inference device 102 is deployed in the cloud. Building on the inference system 100 shown in FIG. 1, the update device 104 in the inference system 100 of FIG. 2 includes a collection module 1041, a monitoring module 1042, and an update module 1043.

In this embodiment, the first inference device 101 in the inference system 100 is pre-configured with a first inference model, and the second inference device 102 is pre-configured with a second inference model. In practical scenarios, the computing performance of the cloud is usually higher than that of the edge side (e.g., more computing resources are available), so the second inference device 102 in the cloud can execute relatively complex computing tasks, while the first inference device 101 on the edge network can execute relatively simple ones. Accordingly, in this embodiment, the specification of the first inference model configured for the first inference device 101 is smaller than that of the second inference model configured for the second inference device 102. For example, the file size of the first inference model is 50 MB while that of the second inference model is 200 MB. As an example, the first and second inference models may be machine learning models built with machine learning algorithms; the first model's specification being smaller than the second's may specifically mean that the first inference model has fewer neural network layers than the second, or fewer parameters than the second. In that case, the computational cost (FLOPs) of the first inference model is lower than that of the second, and correspondingly the first inference model's demand for computing resources at run time is also lower than the second's.

The inference system 100 may configure the inference models for the first inference device 101 and the second inference device 102 through the update device 104, or through other devices. For ease of description, the following takes the update device 104 configuring the inference models as an example.
In a possible implementation, the update device 104 may present to the user an interactive interface such as that shown in FIG. 3, prompting the user to specify constraints for the inference system 100 and to provide training samples for model training. The constraints may include, for example, the inference accuracy, the specification of the inference model, the AI framework supported by the first inference device 101 on the edge network, the inference objective of the inference system 100, the upper limit of the transmission bandwidth between the first and second inference devices (or the inference latency), and the maximum proportion of input samples inferred by the second inference device 102 relative to those inferred by the first inference device 101 (hereinafter, the transmission ratio upper limit). In practice, the user-specified constraints may also include other content, such as the conditions described below that trigger updates of the confidence threshold, the transmission ratio upper limit, or the inference models.

The AI framework supported by the first inference device 101 may be, for example, the TensorFlow, PyTorch, or MindSpore framework; different AI frameworks support inference models in different file formats. The inference objective of the inference system 100 indicates the application scenario of the inference model, such as object detection or image classification with the model. The update device 104 can then use the update module 1043 to build an initial inference model according to the user-specified constraints: the specification of the built initial model is the one specified by the user; its file format is one supported by the user-specified AI framework; and its inference objective is the one specified by the user. Next, the update module 1043 can train the initial inference model with the user-provided training samples, stopping when the initial model's inference accuracy reaches the user-specified accuracy. The update module 1043 can then send the trained initial model to the second inference device 102, making it the second inference model configured for the second inference device 102.

After training the second inference model, the update module 1043 can generate the first inference model from it. Illustratively, after the second inference model is generated, the update module 1043 may instruct the collection module 1041 to report the amount of resources available on the first inference device 101. The collection module 1041 sends a resource probe request to the first inference device 101 to detect its currently available resources and feeds the detection result back to the update module. The available resources on the first inference device 101 may include, for example, computing resources (such as CPUs) and storage resources (such as cloud disks). The update module 1043 can determine the specification of the first inference model to be generated from the obtained amount of available resources. For example, suppose running a model of specification 1 requires 64 processor cores and running a model of specification 2 requires 128 processor cores (specification 2 being larger than specification 1); if the obtained resource amount indicates that 88 processor cores on the first inference device 101 are currently idle, the update module 1043 can set the specification of the first inference model to specification 1, so that the first inference device 101 has enough resources to support running it. Having determined the specification, the update module 1043 can process the second inference model by model compression, model distillation, or similar techniques to generate a first inference model of that specification, and send the generated first inference model to the first inference device 101 to complete its configuration. Further, before sending the first inference model to the first inference device 101, the first inference model may be trained again with the above training samples, and the trained model then sent to the first inference device 101.

It is worth noting that the above way of generating the first and second inference models is only an illustrative example; other ways may be used in practice. For instance, in other possible implementations, the update module 1043 may simultaneously build first and second inference models of different specifications and train the first and second inference models separately with the same training samples.

Meanwhile, the update module 1043 can also configure the transmission condition for the decision device 103 according to the inference accuracy, transmission bandwidth upper limit, and transmission ratio upper limit in the user-specified constraints; for example, it may configure the confidence threshold in the transmission condition, or configure a discrimination model (based on which the decision device 103 decides whether to transmit an input sample to the second inference device 102). Taking configuring the confidence threshold in the transmission condition as an example, the update module 1043 can calculate the confidence threshold from the amount of input sample data acquired by the first inference device 101 per unit time and the upper limit of the transmission bandwidth between the first and second inference devices, such that the system's average inference accuracy for input samples is not lower than the user-specified accuracy. Further, the calculated confidence threshold can ensure that the bandwidth occupied by sending input samples to the second inference device 102 per unit time does not exceed the bandwidth upper limit; that is, the proportion of input samples transmitted to the second inference device 102, relative to all input samples, does not exceed the transmission ratio upper limit.
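As an illustrative back-of-the-envelope calculation of how such a limit could be derived from the bandwidth cap (the formula and units are assumptions for this sketch, not values taken from the application):

```python
def max_transmission_ratio(bandwidth_cap_mbps: float,
                           sample_size_mb: float,
                           samples_per_second: float) -> float:
    """Largest fraction of input samples that can be uploaded to the cloud
    without exceeding the edge-cloud bandwidth cap."""
    # Mb/s available divided by Mb per sample gives uploads per second.
    uploads_per_second = bandwidth_cap_mbps / (sample_size_mb * 8)
    return min(1.0, uploads_per_second / samples_per_second)

# Hypothetical numbers: an 80 Mbps cap, 2 MB images, 10 images per second.
ratio = max_transmission_ratio(bandwidth_cap_mbps=80.0,
                               sample_size_mb=2.0,
                               samples_per_second=10.0)
```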
After the configuration of the first inference device 101, the second inference device 102, and the decision device 103 is complete, the first inference device 101 can receive input samples and perform inference on them with the configured first inference model, which outputs an inference result and the confidence of that result. For example, in a safety-helmet recognition scenario, the first inference device 101 may receive an image captured and sent by the user-side terminal device 105, the image containing one or more workers; the first inference device 101 then recognizes the image with the first inference model, outputs each worker in the image and whether each worker is wearing a safety helmet, and gives the confidence of the recognition result.

Usually, when the confidence of the inference result output by the first inference model is greater than the preset confidence threshold, the decision device 103 can output the result to the user-side terminal device 105 so that the terminal device 105 can perform corresponding operations based on it. For example, in the helmet detection scenario, when the terminal device 105 determines from the result that some workers are not wearing helmets, it can trigger a monitoring alarm so that monitoring staff can promptly remind the workers to wear their helmets correctly. When the confidence of the result output by the first inference model is less than the confidence threshold, indicating that the result obtained with the smaller first inference model is of low accuracy, the decision device 103 can instruct the first inference device 101 to send the input sample to the second inference device 102. The second inference device 102 can perform inference on the received input sample with the configured second inference model and send the result output by the second inference model to the terminal device 105. Because the specification of the second inference model is relatively large, the accuracy of the result obtained by inferring the sample with it is relatively high, ensuring that the system's inference accuracy for the sample stays at a high level.

In a further possible implementation, the transmission condition by which the decision device 103 determines whether to transmit an input sample to the second inference device 102 may also include whether the proportion of already-sent input samples, relative to the total input samples (i.e., all input samples) received by the first inference device 101, exceeds the pre-configured transmission ratio upper limit. If sending the sample to the second inference device 102 would push the proportion of sent samples, relative to the total input samples received by the first inference device 101, over the pre-configured limit, then even though the first inference model's confidence for the sample is low, the decision device 103 still sends the inference result to the terminal device 105, so as to prevent the transmission bandwidth between the first and second inference devices from exceeding its upper limit. If it would not, the decision device 103 can instruct the first inference device 101 to send the sample to the second inference device 102 so as to obtain a more accurate inference result for that sample.
值得注意的是,本实施例中,更新装置104为决策装置103配置的置信度阈值等传输条件,可以根据推理系统100的运行情况进行动态调整,以便于推理系统100能够保持较高的性能。
具体实现时,以更新传输条件具体为更新置信度阈值的取值为例,更新装置104可以监测推理系统100是否满足预先设定的第一更新触发条件,并且,当推理系统100满足该第一更新触发条件时,更新装置104可以根据第一更新触发条件对已配置的置信度阈值的取值进行更新。
作为一种示例,更新装置104在对置信度阈值的取值进行更新时,具体可以是增大该置信度阈值,相应的,预先设定的第一更新触发条件,具体可以包括:
1、第一推理模型在第一时间段内的平均推理精度低于第一精度阈值。
可以理解,如果推理系统100基于第一推理模型向终端设备105反馈推理结果,则因为第一推理模型的推理精度较低而拉低了整个推理系统100的推理精度。因此,更新装置104可以通过增大置信度阈值的方式,增加第一推理装置101向第二推理装置102传输的输入样本的数量(即原先推理结果的置信度低于增大后的置信度阈值的输入样本也会被传输至第二推理装置102)。这样,对于更多数量的第一推理模型输出的推理结果置信度较低的输入样本,可以通过第二推理装置102中的第二推理模型进行推理,以便提高推理系统100对输入样本进行推理的准确性。
示例性地,第一推理模型在第一时间段内的平均推理精度,例如可以是在该第一时间段内第一推理模型针对各个输入样本的推理结果的置信度的平均值,即可以将推理结果的置信度作为第一推理模型针对输入样本的推理精度。本实施例中,更新装置104可以通过监测模块1042监测得到第一推理模型的平均推理精度,并由更新模块1043根据该平均推理精度确定是否对置信度阈值进行更新。
进一步地,在增大置信度阈值之前,更新装置104还可以确定第一推理装置101与第二推理装置102之间在该第一时间段内的平均剩余传输带宽是否高于预设阈值。其中,剩余传输带宽是指预设的传输带宽上限与第一推理装置101和第二推理装置102之间已使用的传输带宽之间的差值。相应的,平均剩余传输带宽,是指第一时间段内的多个时刻的剩余传输带宽的平均值。并且,该预设阈值可以预先由技术人员根据实际应用场景的需求进行设定。当平均剩余传输带宽高于预设阈值时,表征第一推理装置101与第二推理装置102之间长时间具有较为充足的带宽资源可用来传输数据,此时,更新装置104可以通过增大置信度阈值的方式,增加第一推理装置101向第二推理装置102传输的输入样本的数量,以便提高推理系统100对该输入样本进行推理的准确性。而当平均剩余传输带宽不高于预设阈值时,表征第一推理装置101与第二推理装置102之间的带宽资源较为紧张,此时,更新装置104可以不增大置信度阈值,以此避免置信度阈值过大而加剧第一推理装置101与第二推理装置102之间的带宽资源紧张的问题。
2、第一推理装置101与第二推理装置102之间的传输带宽增加。
可以理解,当第一推理装置101与第二推理装置102之间的传输带宽增加时,表征二者之间的带宽资源更加充足,由于利用规格较大的第二推理模型推理输入样本的准确性通常高于利用规格较小的第一推理模型推理输入样本的准确性,因此,更新装置104可以通过增大置信度阈值的方式,增加第一推理装置101向第二推理装置102传输的输入样本的数量,以便提高推理系统100对该输入样本进行推理的准确性。
当然,实际应用时,除了上述示例之外,第一更新触发条件也可以是其它条件,本实施例对此并不进行限定。并且,第一更新触发条件可以是上述示例中的任意一种条件,也可以是同时包括多种。
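上述条件1中"平均剩余传输带宽高于预设阈值"的判断,可用如下草图示意(带宽的单位与各取值均为假设):

```python
def can_raise_threshold(used_bandwidths, bandwidth_cap, reserve_threshold):
    """仅当第一时间段内多个时刻的平均剩余传输带宽高于预设阈值时,
    才允许增大置信度阈值;剩余带宽 = 传输带宽上限 - 已使用带宽。"""
    remaining = [bandwidth_cap - used for used in used_bandwidths]
    avg_remaining = sum(remaining) / len(remaining)
    return avg_remaining > reserve_threshold
```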
在进一步可能的实施方式中,当用户指定的传输比例上限允许被调整时,如用户在交互界面上指定传输比例上限后,可以在交互界面中进一步指定该传输比例上限允许被自适应调整,此时,更新装置104在确定推理系统100满足第一更新触发条件后,不仅可以通过上述增大置信度阈值的方式来实现对决策装置103中的传输条件进行更新,也可以是通过增大传输比例上限的方式来实现更新传输条件,以使得推理系统100能够利用第二推理模型对更多数量的输入样本进行推理,从而提高推理精度。
另外,更新装置104在更新传输条件中的置信度阈值时,不仅可以增大置信度阈值,也可以是减小置信度阈值。作为另一种示例,更新装置104在减小该置信度阈值时,预先设定的第一更新触发条件,具体可以包括:
1、第一推理装置101发送给第二推理装置102的输入样本相对于第一推理装置101所接收到的总输入样本的比例,超过预先设定的传输比例上限。
实际应用场景中,第一推理模型推理不同的输入样本的难易程度可能存在差异。比如,在安全帽检测场景中,输入样本具体可以是针对工地的工作人员的拍摄图像。在非工作时间段,如0:00至9:00以及18:00至24:00,到达工地的工作人员的数量通常较少,相应的,该拍摄图像中出现的需要被检测是否佩戴安全帽的工作人员的数量较少,则利用规格较小的第一推理模型对该拍摄图像进行识别(也即前述的推理),通常能够较为准确地识别出该拍摄图像中的工作人员以及各工作人员是否佩戴安全帽(推理结果的置信度较高),即第一推理模型的推理难度较低。而在工作时间段,如9:00至18:00等,到达工地的工作人员的数量较多,此时,由于工作人员之间的相互遮挡等原因,导致第一推理模型识别该拍摄图像中的工作人员以及安全帽的准确度较低(推理结果的置信度较低),也即第一推理模型的推理难度较高。
因此,当第一推理模型推理的输入样本中,置信度低于置信度阈值的输入样本的数量相对于所有输入样本的数量的占比,超过预先设定的传输比例上限时,表征当前存在大量准确性较低的推理结果。此时,更新装置104可以通过减小置信度阈值的方式,减少第一推理装置101向第二推理装置102传输的输入样本的数量,即推理结果的置信度小于调整前的置信度阈值但是大于调整后的置信度阈值的输入样本可以不被传输至第二推理装置102,以此避免向第二推理装置102传输输入样本的占比超出用户指定的传输比例上限。
实际应用场景中,当传输比例上限允许被调整时,更新装置104可以通过增大传输比例上限的方式,增加第一推理装置101向第二推理装置102传输的输入样本的数量。这样,对于第一推理模型输出的推理结果置信度较低的输入样本,可以通过第二推理装置102中的第二推理模型进行推理,以便提高推理系统100对该输入样本进行推理的准确性。
2、第一推理装置101与第二推理装置102之间的传输带宽减少。
可以理解,预先设定的传输比例上限与第一推理装置101和第二推理装置102之间较大的传输带宽相关,当第一推理装置101与第二推理装置102之间的传输带宽减少时,若仍按照原先设定的置信度阈值向第二推理装置102传输输入样本,则第一推理装置101在传输输入样本时,存在传输带宽不足的问题。为此,更新装置104可以通过减小置信度阈值的方式,减少第一推理装置101向第二推理装置102传输的输入样本的数量,以便减少第一推理装置101与第二推理装置102之间的传输带宽消耗,以适应当前的传输带宽。
在其它可能的实施方式中,当传输比例上限允许被调整时,更新装置104也可以是通过减小传输比例上限的方式,减少第一推理装置101向第二推理装置102传输的输入样本的数量,以便减少第一推理装置101与第二推理装置102之间的传输带宽消耗。
实际应用时,除了上述示例之外,触发更新装置104减小置信度阈值的第一更新触发条件也可以采用其它实现方式,本实施例对此并不进行限定。并且,第一更新触发条件可以是上述示例中的任意一种条件,也可以是同时包括多种。
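综合上述增大与减小置信度阈值的各类第一更新触发条件,阈值更新逻辑可示意如下(调整步长step以及各触发条件用布尔参数表示,均为本文假设的简化):

```python
def update_threshold(threshold, step, precision_low=False, bandwidth_up=False,
                     ratio_exceeded=False, bandwidth_down=False):
    """按第一更新触发条件调整置信度阈值:平均推理精度偏低或传输带宽增加
    时增大阈值;上传占比超出传输比例上限或传输带宽减少时减小阈值。"""
    if precision_low or bandwidth_up:
        return min(1.0, threshold + step)  # 增大阈值,上限钳位到1.0
    if ratio_exceeded or bandwidth_down:
        return max(0.0, threshold - step)  # 减小阈值,下限钳位到0.0
    return threshold  # 未满足任何触发条件,保持不变
```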
在推理系统100持续为终端设备105提供推理服务时,可以由更新装置104中的监测模块1042对推理系统100进行持续监测,以确定是否需要对决策装置103中的置信度阈值进行更新,并在确定需要进行更新后,可以进一步确定更新后的置信度阈值的具体取值,从而决策装置103后续可以根据更新后的置信度阈值确定是否将输入样本传输至第二推理装置102。
值得注意的是,上述各实施方式中,是以更新传输条件中的置信度阈值为例进行示例性说明。实际应用时,更新传输条件也可以是更新判别模型,即决策装置103可以利用判别模型确定是否将输入样本传输至第二推理装置102。具体的,该判别模型例如可以是二分类模型等,并且,决策装置103可以将第一推理模型输出的推理结果以及置信度输入至该判别模型中,并由该判别模型输出判别结果,从而决策装置103可以根据该判别结果确定是否向第二推理装置102传输该推理结果对应的输入样本。相应的,更新装置104在更新传输条件时,具体可以是对决策装置103中的判别模型进行更新,如更新判别模型中的参数或者网络结构等,本实施例对此并不进行限定。
本实施例中,更新装置104不仅可以对决策装置103中的置信度阈值进行更新,还可以对为第一推理装置101配置的第一推理模型进行更新,和/或,对为第二推理装置102配置的第二推理模型进行更新。
具体的,更新装置104可以监测推理系统100是否满足预先设定的第二更新触发条件,并且,当推理系统100满足该第二更新触发条件时,更新装置104可以根据第二更新触发条件对已配置的第一推理模型和/或第二推理模型进行更新。
其中,更新装置104在更新推理模型时,可以是更新推理模型的规格,或者可以是对推理模型进行重训练。
在一种示例中,更新装置104可以采用弹性更新机制对第一推理模型的规格进行更新。
具体的,部署于边缘网络的第一推理装置101的资源通常有限,并且,实际应用场景中,第一推理装置101可以不仅仅用于为终端设备105提供推理服务,还可能提供其它的业务服务,如大数据搜索、边缘云计算等,并且第一推理装置101提供不同业务服务的优先级也可以不同。因此,当第一推理装置101在提供优先级更高的其它业务服务时被抢占了较多的资源,导致第一推理装置101提供推理服务的可用资源的资源量减少时,第一推理装置101上当前剩余的可用资源可能难以支持第一推理装置101利用原先规格的第一推理模型在边缘侧对输入样本进行推理。因此,更新装置104可以减小第一推理模型的规格,例如可以是通过对原先的第一推理模型进行模型蒸馏或者模型压缩的方式来减小第一推理模型的规格,以使得第一推理装置101上当前剩余的可用资源能够支持规格更小的第一推理模型对输入样本进行推理。其中,第一推理装置101上可用资源的资源量可以由更新装置104中的采集模块1041进行探测。或者,当第一推理装置101的负荷较大时,如第一推理装置101上的CPU利用率持续达到预设值(如80%等)的时长超出预设时长,或者图形处理器(graphics processing unit,GPU)的显存利用率超出利用率上限等,更新装置104也可以减小第一推理模型的规格。
反之,当第一推理装置101提供推理服务的可用资源的资源量增加或者第一推理装置101的负荷较小时,更新装置104可以增大第一推理模型的规格,如通过重新构建推理模型等方式,根据增加后的可用资源的资源量,生成更大规格的第一推理模型,以便利用更大规格的推理模型来提高在边缘侧推理输入样本的推理精度。同时,规格更大的第一推理模型对输入样本进行推理的置信度也能得到提高,从而可以减少第一推理装置101向第二推理装置102传输输入样本的数量(或者比例),减少传输带宽的消耗。
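上文提到的"CPU利用率持续达到预设值(如80%等)的时长超出预设时长"这一触发减小规格的负荷判断,可用如下草图示意(以采样点序列近似时长,利用率阈值0.8与持续3个采样点均为假设值):

```python
def should_shrink_model(cpu_utilizations, util_limit=0.8, max_sustained=3):
    """检测CPU利用率是否连续达到预设值并超出预设时长,
    用于触发减小第一推理模型的规格。"""
    streak = 0  # 当前连续高负荷的采样点数
    for util in cpu_utilizations:
        streak = streak + 1 if util >= util_limit else 0
        if streak > max_sustained:
            return True  # 高负荷持续超出预设时长,应减小模型规格
    return False
```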
或者,更新装置104可以确定第一推理装置101在第二时间段内的可用资源的资源量,该第二时间段例如可以是过去或者未来的一段时间(如一个星期、一个月等),从而更新装置104可以根据第一推理装置101在第二时间段内的可用资源的资源量,对第一推理模型的规格进行更新。举例来说,更新模块1043可以通过采集模块1041采集第一推理装置101在过去一段时间内的可用资源的资源量的变化情况,并根据该资源量变化情况,预测第一推理装置101在未来的第二时间段内的可用资源的资源量。当预测的可用资源的资源量大于当前可用资源的资源量时,更新模块1043可以根据预测的可用资源的资源量,增大第一推理模型的规格。这样,第一推理装置101可以在第二时间段内,利用更新的、规格更大的第一推理模型在边缘侧对输入样本进行推理。反之,当预测的可用资源的资源量小于当前可用资源的资源量时,更新模块1043可以减小第一推理模型的规格。或者,更新模块1043也可以通过采集模块1041采集第一推理装置101在过去的第二时间段内的可用资源的平均资源量,并且当该平均资源量大于当前可用资源的资源量时,更新模块1043可以增大第一推理模型的规格;而当该平均资源量小于当前可用资源的资源量时,更新模块1043可以减小第一推理模型的规格。
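上述根据第二时间段内可用资源的资源量(例如以历史平均资源量作为预测值)决定规格调整方向的逻辑,可用如下草图示意(以处理器核数为例,简单均值预测方法为本文假设):

```python
def plan_spec_update(current_available_cores, history):
    """以过去第二时间段内可用核数的平均值作为对可用资源量的预测,
    并据此决定第一推理模型规格的调整方向。"""
    predicted = sum(history) / len(history)  # 平均资源量作为预测值
    if predicted > current_available_cores:
        return "增大规格"
    if predicted < current_available_cores:
        return "减小规格"
    return "保持规格"
```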
实际应用时,推理系统100还可以(通过终端设备105)向用户呈现如图4所示的弹性更新配置界面,该弹性更新配置界面中可以呈现有提示用户是否选择对第一推理模型进行弹性更新的提示信息,如图4所示的“请选择是否弹性更新推理模型”。这样,更新装置104可以根据用户针对弹性更新推理模型的选择操作,确定是否自动对第一推理装置101中的第一推理模型进行动态更新。
而在另一种示例中,更新装置104可以通过增量训练的方式对第一推理模型和/或第二推理模型进行更新。
实际应用场景中,推理系统100所推理的输入样本,可能会出现数据特征分布发生变化的情况,从而降低了第一推理模型和/或第二推理模型对于输入样本的推理精度,甚至发生模型失效等。仍以安全帽检测场景为例,推理系统100中的第一推理模型以及第二推理模型可以识别出拍摄图像(也即输入样本)中的红色安全帽,但是如果在工地上作业的工作人员佩戴的安全帽颜色统一更换为黄色或者蓝色等,则第一推理模型以及第二推理模型可能难以识别黄色或者蓝色的安全帽,从而降低了推理系统100对于安全帽的识别精度。
为此,在推理系统100提供推理服务的过程中,更新装置104中的监测模块1042可以监测第一推理模型在第一时间段内的平均推理精度是否低于第一精度阈值且第一推理装置101与第二推理装置102之间的剩余传输带宽是否低于预设阈值,并将监测结果反馈给更新模块1043。当更新模块1043确定平均推理精度低于第一精度阈值且剩余传输带宽低于预设阈值时,更新模块1043确定对第一推理模型进行更新。
作为一种实现示例,更新模块1043可以通过增量训练的方式更新第一推理模型。具体的,更新模块1043可以获取增量训练样本,从而更新模块1043可以利用该增量训练样本对第一推理模型进行增量训练,以提高第一推理模型对于输入样本的推理精度。其中,增量训练样本可以由用户预先完成标注并提供给推理系统100;或者,当第一推理模型失效而第二推理模型未发生失效时,可以通过第二推理模型生成该增量训练样本等。举例来说,在安全帽检测场景中,可以利用预先完成标注并且包括黄色或者蓝色安全帽的拍摄图像作为增量训练样本,并利用该拍摄图像对第一推理模型进行增量训练,这使得增量训练所得到的第一推理模型能够有效推理出拍摄图像中的红色、黄色或者蓝色的安全帽。
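上述增量训练的思想(利用新标注样本更新模型,而无需重新使用全部历史训练数据)可用一个极简的草图类比说明。这里用一维特征的类别均值(原型)代替真实的神经网络参数,纯属本文假设的示意,并非实际的模型训练实现:

```python
def incremental_update(centroids, counts, new_samples):
    """极简增量训练草图:用新标注样本在线更新各类别原型(均值),
    历史数据无需重新参与计算。"""
    for label, feature in new_samples:
        counts[label] = counts.get(label, 0) + 1
        old_mean = centroids.get(label, 0.0)
        # 增量式均值更新:new_mean = old_mean + (x - old_mean) / n
        centroids[label] = old_mean + (feature - old_mean) / counts[label]
    return centroids, counts
```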
在更新第一推理模型的同时,监测模块1042还可以监测第二推理模型在第一时间段内的平均推理精度是否低于第二精度阈值,并将监测结果反馈给更新模块1043。当更新模块1043确定该平均推理精度低于第二精度阈值时,更新模块1043执行对第二推理模型的更新过程。示例性地,该第二精度阈值例如可以大于前述第一精度阈值。其中,更新模块1043也可以是通过增量训练或者重新构建推理模型的方式对第二推理模型进行更新,其具体实现方式与上述更新模块1043更新第一推理模型的实现方式类似,可参见前述部分的相关之处描述,在此不做赘述。
当然,上述增量更新第一推理模型以及第二推理模型的实现方式仅作为示例性说明,在其它实现方式中,更新模块1043也可以是通过重新构建模型并训练的方式完成对于第一推理模型以及第二推理模型的更新,本实施例对此并不进行限定。其中,在更新第一推理模型时,更新模块1043在调整第一推理模型的规格的同时,可以利用增量训练样本对经过规格调整后的第一推理模型进行增量训练。
实际应用时,在对第一推理模型以及第二推理模型完成更新之前,推理系统100可以继续利用更新之前的第一推理模型以及第二推理模型为终端设备105提供推理服务,而在完成推理模型的更新后,推理系统100可以利用更新后的第一推理模型以及第二推理模型为终端设备105提供推理服务,以此避免更新推理模型而导致推理系统100提供的推理服务发生中断。
值得注意的是,本实施例是以第一推理装置101部署于边缘网络、第二推理装置102部署于云端为例进行示例性说明,在其它实现方式中,第一推理装置101也可以部署于本地网络,而第二推理装置102部署于边缘网络,此时,推理系统100对于输入样本的推理过程以及更新置信度阈值与模型的过程,与上述过程类似,具体可参见前述实施例的相关之处描述,在此不做赘述。
参见图5,图5为本申请实施例提供的一种推理方法的流程示意图。其中,图5所示的推理方法可以应用于图2所示的推理系统100,或者应用于其它可适用的推理系统中。为便于说明,本实施例中以应用于图2所示的推理系统100,并且推理系统100对两个不同的输入样本进行推理为例进行示例性说明。
基于图2所示的推理系统100,图5所示的推理方法具体可以包括:
S501:第一推理装置101接收输入样本1。
示例性地,用户侧的终端设备105可以向第一推理装置101发送输入样本1,该输入样本例如可以是拍摄图像,如在安全帽检测场景中针对施工工地的拍摄图像等,或者可以是其它用于作为模型输入的样本。
S502:第一推理装置101利用预先配置的规格较小的推理模型1对输入样本1进行推理,得到推理结果1以及置信度1。
S503:当置信度1大于置信度阈值时,第一推理装置101将推理结果1反馈给终端设备105;而当置信度1小于置信度阈值时,第一推理装置101向决策装置103请求将输入样本1发送给第二推理装置102。
通常情况下,若推理模型1输出的置信度1大于预设的置信度阈值,表征推理模型1输出的推理结果1为正确的可信程度较高,也即可以视为该推理结果1的准确度较高。此时,第一推理装置101可以将较为准确的推理结果1反馈给终端设备105。反之,若推理模型1输出的置信度1小于预设的置信度阈值,则可以视为该推理结果1不准确。此时,第一推理装置101可以请求决策装置103将该输入样本1发送至第二推理装置102,以便利用第二推理装置102上的规格更大的推理模型2对该输入样本1进行更加准确的推理。
S504:决策装置103在确定不超过传输比例上限的条件下,允许第一推理装置101将输入样本1上传至第二推理装置102。
其中,传输比例上限可以预先由用户指定,具体可以是由用户输入传输比例上限的具体取值,或者可以由推理系统100根据用户指定的推理精度、第一推理装置101与第二推理装置102之间的传输带宽上限计算出传输比例上限。
作为一种实现示例,决策装置103可以判断:若将该输入样本1发送至第二推理装置102,已发送的输入样本的数量相对于第一推理装置101处理的输入样本的数量的占比是否超出预先配置的传输比例上限。若未超出,则允许第一推理装置101将输入样本1上传至第二推理装置102。而若超出,则即使推理模型1推理该输入样本1的置信度1较低,决策装置103仍将推理结果1发送至终端设备105(图5中未示出),以此避免第一推理装置101与第二推理装置102之间的传输带宽超出传输带宽上限。
S505:第一推理装置101将输入样本1发送给第二推理装置102。
S506:第二推理装置102利用预先配置的规格较大的推理模型2对输入样本1进行推理,得到推理结果2(以及置信度2)。
S507:第二推理装置102将推理结果2(以及置信度2)发送给终端设备105。
实际应用时,第二推理装置102可以通过第一推理装置101将针对于输入样本1的推理结果2(以及置信度2)发送给终端设备105等。
S508:更新装置104通过对推理系统100进行检测,确定并更新决策装置103中的置信度阈值。
具体实现时,更新装置104监测推理系统100是否满足第一更新触发条件,并且当满足第一更新触发条件时,更新装置104可以根据第一更新触发条件对已配置的置信度阈值的取值进行更新。当然,若推理系统100不满足第一更新触发条件,则更新装置104可以不对置信度阈值进行更新。
示例性地,更新装置104可以是增大置信度阈值,此时,预先设定的第一更新触发条件,具体可以包括:
1、第一推理模型在第一时间段内的平均推理精度低于第一精度阈值。
2、第一推理装置101与第二推理装置102之间的传输带宽增加。
在另一个示例中,更新装置104可以是减小置信度阈值,此时,预先设定的第一更新触发条件,具体可以包括:
1、第一推理装置101与第二推理装置102之间的传输带宽减少。
2、第一推理装置101发送给第二推理装置102的输入样本相对于第一推理装置101所接收到的总输入样本的比例,超过预先设定的传输比例上限。
其中,第一更新触发条件的具体实现方式,可以参见前述实施例中的相关之处描述,在此不做赘述。实际应用时,第一更新触发条件也可以是采用其它方式进行实现,本实施例对此并不进行限定。并且,当用户允许对传输比例上限进行调整时,在满足第一更新触发条件的情况下,更新装置104也可以是对该传输比例上限的取值进行调整,以使得推理系统100的性能保持在较高水平。
S509:更新装置104通过对推理系统100进行检测,确定对第一推理装置101中的推理模型1进行更新以及对第二推理装置102中的推理模型2进行更新。
具体实现时,更新装置104监测推理系统100中的推理模型是否满足第二更新触发条件,并且当满足第二更新触发条件时,更新装置104可以根据第二更新触发条件对已配置的推理模型1进行更新,进一步的,更新装置104还可以根据第二更新触发条件对已配置的推理模型2进行更新。当然,若推理系统100中的推理模型不满足第二更新触发条件,则更新装置104可以不对推理模型进行更新。
示例性地,更新装置104可以是在第一推理装置101提供推理服务的可用资源的资源量变化或者第一推理装置101的负荷变化时,对推理模型1的规格进行调整。例如,当第一推理装置101的可用资源的资源量减少或者第一推理装置101的负荷增大时,更新装置104可以减小推理模型1的规格;而当第一推理装置101的可用资源的资源量增加或者第一推理装置101的负荷减小时,更新装置104可以增大推理模型1的规格。
或者,更新装置104可以在确定推理模型1和/或推理模型2对于输入样本的推理精度降低,甚至是推理模型1和/或推理模型2发生失效时,通过增量训练或者重新训练的方式对推理模型1和/或推理模型2进行更新。其中,更新装置104确定推理模型1以及推理模型2的推理精度降低以及更新推理模型的具体实现过程,可以参见前述实施例的相关之处描述,在此不做赘述。
值得注意的是,本实施例中,是以更新装置104同时更新置信度阈值以及推理模型为例进行示例性说明,实际应用时,更新装置104可以仅更新置信度阈值,或者仅更新推理模型,本实施例对此并不进行限定。
S510:第一推理装置101接收输入样本2。
S511:第一推理装置101利用更新后的推理模型1对输入样本2进行推理,并输出推理结果3以及置信度3。
S512:当置信度3大于更新后的置信度阈值时,第一推理装置101将推理结果3反馈给终端设备105;而当置信度3小于更新后的置信度阈值时,第一推理装置101向决策装置103请求将输入样本2发送给第二推理装置102。
S513:决策装置103在确定不超过传输比例上限的条件下,允许第一推理装置101将输入样本2上传至第二推理装置102。
S514:第一推理装置101向第二推理装置102发送输入样本2。
S515:第二推理装置102利用更新后的推理模型2对输入样本2进行推理,并输出推理结果4(以及置信度4)。
S516:第二推理装置102将推理结果4(以及置信度4)发送给终端设备105。
实际应用时,第二推理装置102可以通过第一推理装置101将针对于输入样本2的推理结果4(以及置信度4)发送给终端设备105等。
上述实施例中,是以在推理两个输入样本的间隙更新置信度阈值、推理模型1或推理模型2为例进行示例性说明,在其它实施例中,更新装置104也可以是在推理输入样本2的过程中,完成对于置信度阈值、推理模型1或推理模型2的更新。
上述各实施例中,针对输入样本的推理过程中所涉及到的更新装置104可以是配置于计算机设备上的软件,并且,通过在计算机设备上运行该软件,可以使得计算机设备实现上述更新装置104所具有的功能。下面,基于硬件设备实现的角度,对推理输入样本的过程中所涉及的更新装置104进行详细介绍。
图6示出了一种计算机设备。图6所示的计算机设备600具体可以用于实现上述图5所示实施例中更新装置104的功能。
计算机设备600包括总线601、处理器602、通信接口603和存储器604。处理器602、存储器604和通信接口603之间通过总线601通信。总线601可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图6中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。通信接口603用于与外部通信,例如接收终端发送的数据获取请求等。
其中,处理器602可以为中央处理器(central processing unit,CPU)。存储器604可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器604还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,HDD或SSD。
存储器604中存储有可执行代码,处理器602执行该可执行代码以执行前述更新装置104所执行的方法。
具体地,在实现图5所示实施例的情况下,且图5所示实施例中所描述的更新装置104为通过软件实现的情况下,执行图5中的更新装置104的功能所需的软件或程序代码存储在存储器604中,更新装置104与其它设备的交互通过通信接口603实现,处理器602用于执行存储器604中的指令,实现更新装置104所执行的方法。
此外,本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当其在计算机设备上运行时,使得计算机设备执行上述实施例更新装置104所执行的方法。
此外,本申请实施例还提供了一种计算机程序产品,所述计算机程序产品被计算机执行时,所述计算机执行前述推理方法的任一方法。该计算机程序产品可以为一个软件安装包,在需要使用前述推理方法的任一方法的情况下,可以下载该计算机程序产品并在计算机上执行该计算机程序产品。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (33)

  1. 一种推理系统,其特征在于,所述推理系统包括第一推理装置、第二推理装置、更新装置以及决策装置;
    所述第一推理装置,用于利用第一推理模型对输入样本进行推理;
    所述决策装置,用于在所述第一推理模型针对所述输入样本的推理结果满足传输条件的情况下,确定将所述输入样本传输给所述第二推理装置;
    所述第二推理装置,用于利用第二推理模型对所述输入样本进行推理,其中,所述第一推理模型的规格小于所述第二推理模型的规格;
    所述更新装置,用于当所述推理系统满足第一更新触发条件时,更新所述传输条件。
  2. 根据权利要求1所述的推理系统,其特征在于,所述传输条件包括所述推理结果的置信度低于置信度阈值;
    所述更新装置,用于更新所述置信度阈值。
  3. 根据权利要求2所述的推理系统,其特征在于,所述传输条件还包括所述第一推理装置发送至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例,不超过传输比例上限。
  4. 根据权利要求2或3所述的推理系统,其特征在于,所述第一更新触发条件包括所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值、所述第一推理装置与所述第二推理装置之间的传输带宽增加中的至少一种;
    所述更新装置,用于增大所述置信度阈值。
  5. 根据权利要求4所述的推理系统,其特征在于,所述更新装置,用于当所述第一推理装置与所述第二推理装置之间在所述第一时间段内的平均剩余传输带宽高于预设阈值时,增大所述置信度阈值。
  6. 根据权利要求2或3所述的推理系统,其特征在于,所述第一更新触发条件包括所述第一推理装置与所述第二推理装置之间的传输带宽减少、所述第一推理装置发送至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例超过传输比例上限中的至少一种;
    所述更新装置,用于减小所述置信度阈值。
  7. 根据权利要求1至6任一项所述的推理系统,其特征在于,所述更新装置,还用于当满足第二更新触发条件时,更新所述第一推理模型和/或所述第二推理模型。
  8. 根据权利要求7所述的推理系统,其特征在于,所述更新装置,用于当所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值且所述第一推理装置与所述第二推理装置之间的剩余传输带宽低于预设阈值时,更新所述第一推理模型;和/或,当所述第二推理模型在所述第一时间段内的平均推理精度低于第二精度阈值时,更新所述第二推理模型。
  9. 根据权利要求7或8所述的推理系统,其特征在于,所述更新装置,用于获取增量训练样本;利用所述增量训练样本对所述第一推理模型和/或所述第二推理模型进行增量训练。
  10. 根据权利要求7至9任一项所述的推理系统,其特征在于,所述更新装置,用于确定所述第一推理装置在第二时间段内的可用资源的资源量;根据所述第一推理装置在所述第二时间段内的可用资源的资源量,更新所述第一推理模型的规格。
  11. 一种推理方法,其特征在于,所述推理方法应用于推理系统中的更新装置,所述推理系统还包括第一推理装置、第二推理装置以及决策装置,所述方法包括:
    所述更新装置获取所述推理系统的资源信息和/或推理结果,所述推理结果包括所述第一推理装置利用第一推理模型对输入样本进行推理的结果,其中,当所述第一推理模型针对所述输入样本进行推理的结果满足所述决策装置中的传输条件时,所述输入样本被传输至所述第二推理装置;
    所述更新装置根据所述推理系统的资源信息和/或所述推理系统的推理结果确定所述推理系统满足第一更新触发条件;
    所述更新装置更新所述传输条件。
  12. 根据权利要求11所述的方法,其特征在于,所述传输条件包括所述推理结果的置信度低于置信度阈值,所述更新装置更新所述传输条件,包括:
    所述更新装置更新所述置信度阈值。
  13. 根据权利要求12所述的方法,其特征在于,所述传输条件还包括传输至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例,不超过传输比例上限。
  14. 根据权利要求12或13所述的方法,其特征在于,所述第一更新触发条件包括所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值、所述第一推理装置与所述第二推理装置之间的传输带宽增加中的至少一种;
    所述更新装置更新所述传输条件,包括:
    所述更新装置增大所述置信度阈值。
  15. 根据权利要求14所述的方法,其特征在于,所述更新装置更新所述传输条件,包括:
    当所述第一推理装置与所述第二推理装置之间在所述第一时间段内的平均剩余传输带宽高于预设阈值时,增大所述置信度阈值。
  16. 根据权利要求12或13所述的方法,其特征在于,所述第一更新触发条件包括所述第一推理装置与所述第二推理装置之间的传输带宽减少、传输至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例超过传输比例上限中的至少一种;
    所述更新装置更新所述传输条件,包括:
    所述更新装置减小所述置信度阈值。
  17. 根据权利要求11至16任一项所述的方法,其特征在于,所述方法还包括:
    当满足第二更新触发条件时,所述更新装置更新所述第一推理模型和/或所述第二推理模型。
  18. 根据权利要求17所述的方法,其特征在于,所述当满足第二更新触发条件时,所述更新装置更新所述第一推理模型和/或所述第二推理模型,包括:
    当所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值且所述第一推理装置与所述第二推理装置之间的剩余传输带宽低于预设阈值时,所述更新装置更新所述第一推理模型;和/或,当所述第二推理模型在所述第一时间段内的平均推理精度低于第二精度阈值时,所述更新装置更新所述第二推理模型。
  19. 根据权利要求17或18所述的方法,其特征在于,所述更新装置更新所述第一推理模型和/或所述第二推理模型,包括:
    所述更新装置获取增量训练样本;
    所述更新装置利用所述增量训练样本对所述第一推理模型和/或所述第二推理模型进行增量训练。
  20. 根据权利要求17至19任一项所述的方法,其特征在于,所述更新装置更新所述第一推理模型,包括:
    所述更新装置确定所述第一推理装置在第二时间段内的可用资源的资源量;
    所述更新装置根据所述第一推理装置在第二时间段内的可用资源的资源量,更新所述第一推理模型的规格。
  21. 一种更新装置,其特征在于,所述更新装置应用于推理系统,所述推理系统还包括第一推理装置、第二推理装置以及决策装置,所述更新装置包括:
    采集模块,用于获取所述推理系统的资源信息和/或推理结果,所述推理结果包括所述第一推理装置利用第一推理模型对输入样本进行推理的结果,其中,当所述第一推理模型针对所述输入样本进行推理的结果满足所述决策装置中的传输条件时,所述输入样本被传输至所述第二推理装置;
    监测模块,用于根据所述推理系统的资源信息和/或所述推理系统的推理结果确定所述推理系统满足第一更新触发条件;
    更新模块,用于更新所述传输条件。
  22. 根据权利要求21所述的更新装置,其特征在于,所述传输条件包括所述推理结果的置信度低于置信度阈值,所述更新模块,具体用于更新所述置信度阈值。
  23. 根据权利要求22所述的更新装置,其特征在于,所述传输条件还包括传输至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例,不超过传输比例上限。
  24. 根据权利要求22或23所述的更新装置,其特征在于,所述第一更新触发条件包括所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值、所述第一推理装置与所述第二推理装置之间的传输带宽增加中的至少一种;
    所述更新模块,具体用于增大所述置信度阈值。
  25. 根据权利要求24所述的更新装置,其特征在于,所述更新模块,用于当所述第一推理装置与所述第二推理装置之间在所述第一时间段内的平均剩余传输带宽高于预设阈值时,增大所述置信度阈值。
  26. 根据权利要求22或23所述的更新装置,其特征在于,所述第一更新触发条件包括所述第一推理装置与所述第二推理装置之间的传输带宽减少、传输至所述第二推理装置的输入样本相对于所述第一推理装置接收的总输入样本的比例超过传输比例上限中的至少一种;
    所述更新模块,具体用于减小所述置信度阈值。
  27. 根据权利要求21至26任一项所述的更新装置,其特征在于,所述更新模块,还用于当满足第二更新触发条件时,更新所述第一推理模型和/或所述第二推理模型。
  28. 根据权利要求27所述的更新装置,其特征在于,所述更新模块,用于当所述第一推理模型在第一时间段内的平均推理精度低于第一精度阈值且所述第一推理装置与所述第二推理装置之间的剩余传输带宽低于预设阈值时,更新所述第一推理模型;和/或,当所述第二推理模型在所述第一时间段内的平均推理精度低于第二精度阈值时,更新所述第二推理模型。
  29. 根据权利要求27或28所述的更新装置,其特征在于,所述更新模块,用于获取增量训练样本;利用所述增量训练样本对所述第一推理模型和/或所述第二推理模型进行增量训练。
  30. 根据权利要求27至29任一项所述的更新装置,其特征在于,所述更新模块,用于确定所述第一推理装置在第二时间段内的可用资源的资源量;根据所述第一推理装置在第二时间段内的可用资源的资源量,更新所述第一推理模型的规格。
  31. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器;
    所述处理器用于执行所述存储器中存储的指令,以使得所述计算机设备执行权利要求11至20中任一项所述的方法。
  32. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当其在计算设备上运行时,使得所述计算设备执行如权利要求11至20任一项所述的方法。
  33. 一种包含指令的计算机程序产品,其特征在于,当其在计算设备上运行时,使得所述计算设备执行如权利要求11至20中任一项所述的方法。
PCT/CN2022/088086 2021-05-12 2022-04-21 一种推理系统、方法、装置及相关设备 WO2022237484A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110517626 2021-05-12
CN202110517626.X 2021-05-12
CN202111071022.3A CN115345305A (zh) 2021-05-12 2021-09-13 一种推理系统、方法、装置及相关设备
CN202111071022.3 2021-09-13

Publications (1)

Publication Number Publication Date
WO2022237484A1 true WO2022237484A1 (zh) 2022-11-17

Family

ID=83977684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088086 WO2022237484A1 (zh) 2021-05-12 2022-04-21 一种推理系统、方法、装置及相关设备

Country Status (2)

Country Link
CN (1) CN115345305A (zh)
WO (1) WO2022237484A1 (zh)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103080955A (zh) * 2010-09-13 2013-05-01 西门子公司 用于在计算机辅助的逻辑系统中处理数据的设备及相应的方法
CN108718249A (zh) * 2018-04-27 2018-10-30 广州西麦科技股份有限公司 基于sdn网络的网络加速方法、装置与计算机可读存储介质
CN111797983A (zh) * 2020-05-25 2020-10-20 华为技术有限公司 一种神经网络构建方法以及装置
CN112215357A (zh) * 2020-09-29 2021-01-12 三一专用汽车有限责任公司 模型优化方法、装置、设备和计算机可读存储介质
US20210019652A1 (en) * 2019-07-18 2021-01-21 Qualcomm Incorporated Concurrent optimization of machine learning model performance
CN112334917A (zh) * 2018-12-31 2021-02-05 英特尔公司 对采用人工智能的系统进行防护


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610868A (zh) * 2023-07-13 2023-08-18 支付宝(杭州)信息技术有限公司 样本标注方法、端边云协同训练方法及装置
CN116610868B (zh) * 2023-07-13 2023-09-29 支付宝(杭州)信息技术有限公司 样本标注方法、端边云协同训练方法及装置

Also Published As

Publication number Publication date
CN115345305A (zh) 2022-11-15

Similar Documents

Publication Publication Date Title
US11102123B2 (en) Sensor network system
US11983909B2 (en) Responding to machine learning requests from multiple clients
WO2021143155A1 (zh) 模型训练方法及装置
US11902396B2 (en) Model tiering for IoT device clusters
US11412574B2 (en) Split predictions for IoT devices
JP2018007245A (ja) データ送信システム、及びデータ送信方法
US20240054354A1 (en) Federated learning method and apparatus
WO2022237484A1 (zh) 一种推理系统、方法、装置及相关设备
CN114095438B (zh) 数据传输方法、装置、设备、存储介质及计算机程序产品
US11368482B2 (en) Threat detection system for mobile communication system, and global device and local device thereof
WO2023093053A1 (zh) 推理实现方法、网络、电子设备及存储介质
CN105740178A (zh) 芯片网络系统以及其形成方法
US9542459B2 (en) Adaptive data collection
US10965572B2 (en) Data transfer control
CN115576534A (zh) 原子服务的编排方法、装置、电子设备及存储介质
CN114073112A (zh) 认知地控制数据传送
WO2023185825A1 (zh) 调度方法、第一计算节点、第二计算节点以及调度系统
CN111901425B (zh) 基于Pareto算法的CDN调度方法、装置、计算机设备及存储介质
US11190317B2 (en) Transmitting terminal, transmitting method, information processing terminal, and information processing method
US11113562B2 (en) Information processing apparatus, control method, and program
US20220357991A1 (en) Information processing apparatus, computer-readable recording medium storing aggregation control program, and aggregation control method
US20230064500A1 (en) Optimizing machine learning as-a-service performance for cellular communication systems
US20240205140A1 (en) Low latency path failover to avoid network blackholes and scheduler for central processing unit engines for hardware offloaded artificial intelligence/machine learning workloads and low power system for acoustic event detection
US20230394355A1 (en) Apparatus and methods for artificial intelligence model management
US20210232411A1 (en) Secure configuration corrections using artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806454

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22806454

Country of ref document: EP

Kind code of ref document: A1