CN115345305A - Inference system, method, device and related equipment

Info

Publication number: CN115345305A
Application number: CN202111071022.3A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 谢达奇, 王烽
Assignee: Huawei Cloud Computing Technologies Co Ltd (application filed by Huawei Cloud Computing Technologies Co Ltd)
Legal status: Pending
Related application: PCT/CN2022/088086, published as WO2022237484A1

Classifications

    • G06N 3/02 Neural networks (computing arrangements based on biological models)
    • G06N 3/08 Learning methods
    • G06N 5/04 Inference or reasoning models (computing arrangements using knowledge-based models)


Abstract

The application provides an inference system comprising a first inference device, a second inference device, an updating device, and a decision device. The first inference device performs inference on an input sample using a first inference model; the decision device determines to transmit the input sample to the second inference device when the inference result of the first inference model for the input sample satisfies a transmission condition; the second inference device performs inference on the input sample using a second inference model, whose specification is larger than that of the first inference model; and the updating device updates the transmission condition when the inference system satisfies a first update trigger condition. Because the transmission condition can be dynamically adjusted according to actual application requirements, the inference system can maintain high inference performance based on the adjusted condition. In addition, the application provides a corresponding method, apparatus, and related device.

Description

Inference system, method, device and related equipment
The present application claims priority to Chinese patent application No. 202110517626.X, entitled "A Model Updating Method", filed with the China National Intellectual Property Administration on May 12, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an inference system, method, apparatus, and related device.
Background
In the field of artificial intelligence (AI), machine learning is an important method and means. It aims to analyze the patterns in a training data set through a machine learning algorithm to obtain a model, and then to continuously use that model to infer on unknown sample data. In general, the amount of resources available for model inference has an important influence on the inference effect.
Currently, a two-tier inference mechanism can be set up according to the resource limitations of the model deployment environment. For example, in an edge-cloud collaborative inference scenario, inference models of different specifications may be deployed on the edge side and the cloud side respectively. Because the edge side has fewer computing resources than the cloud, the specification of the inference model deployed on the edge side is usually smaller than that of the model deployed in the cloud. Accordingly, for the same input sample, the inference effect (such as inference precision and inference efficiency) of the cloud-side model is generally better than that of the edge-side model. Therefore, when inferring an input sample, the edge-side model can be used first, and when the confidence of the edge-side model's inference result for the input sample is too low, the input sample is sent to the cloud so that the larger-specification cloud-side model infers it, improving the accuracy of the final inference result. Moreover, since transmitting input samples from the edge side to the cloud occupies transmission bandwidth, in practical application scenarios a user usually limits the proportion of input samples transmitted to the cloud, so as to prevent the edge side from sending too many input samples and occupying excessive bandwidth.
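To make the mechanism concrete, the following is a minimal sketch of this two-tier decision flow. It assumes hypothetical `edge_model` and `cloud_client` interfaces and a simple running counter for the transmission-ratio cap; none of these names come from the application itself:

```python
# Illustrative sketch of the two-tier edge/cloud inference mechanism.
from dataclasses import dataclass

@dataclass
class InferenceResult:
    label: str
    confidence: float  # in [0, 1], reported by the model

def infer(sample,
          edge_model,               # small-specification model on the edge side
          cloud_client,             # stub for the large cloud-side model
          confidence_threshold: float,
          sent: int,                # samples forwarded to the cloud so far
          total: int,               # samples received so far (incl. this one)
          ratio_cap: float) -> tuple[InferenceResult, int]:
    result = edge_model.predict(sample)           # edge-side inference first
    if result.confidence >= confidence_threshold:
        return result, sent                       # edge result is trusted
    # Low confidence: forward to the cloud only if the share of forwarded
    # samples stays within the user-configured transmission-ratio upper limit.
    if (sent + 1) / total <= ratio_cap:
        return cloud_client.predict(sample), sent + 1
    return result, sent                           # cap reached: keep edge result
```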
However, in practice the performance of an inference system based on such a mechanism may be difficult to maintain at a high level; for example, the accuracy of the inference results the system determines for input samples may be low during part of a time period. Therefore, an inference scheme is needed that keeps the performance of the inference system at a high level.
Disclosure of Invention
The present application provides an inference system for keeping the accuracy of inference on input samples at a high level. In addition, the application provides an inference method, an updating apparatus, a computer device, a computer-readable storage medium, and a computer program product.
In a first aspect, the present application provides an inference system comprising a first inference device, a second inference device, an updating device, and a decision device. The first inference device is configured to perform inference on an input sample using a first inference model; the decision device is configured to determine to transmit the input sample to the second inference device when the inference result of the first inference model for the input sample satisfies a transmission condition; the second inference device is configured to perform inference on the received input sample using a second inference model, the specification of which is larger than that of the first inference model; and the updating device is configured to update the transmission condition in the decision device when the inference system satisfies a first update trigger condition.
Because the transmission condition used to decide whether to transmit input samples to the second inference device can be dynamically adjusted, the inference system can maintain high inference performance after the condition is adjusted according to actual application requirements. For example, when the transmission bandwidth between the first inference device and the second inference device increases, the inference system updates the transmission condition so that more input samples are inferred by the larger-specification second model, keeping the inference accuracy of the system at a high level for a longer time within the limited transmission bandwidth. Conversely, when the transmission bandwidth between the two devices decreases, the inference system updates the transmission condition to reduce the number of input samples transmitted to the second inference device, thereby reducing the transmission bandwidth the system occupies.
Alternatively, the first inference device and the second inference device may be implemented by software or hardware. When implemented by software, the first inference device and the second inference device may be, for example, virtual machines or the like running on a computing device. When implemented by hardware, the first inference apparatus and the second inference apparatus may include one or more computing devices, such as one or more servers.
Also, the first inference means and the second inference means may be deployed in different environments. For example, a first inference apparatus may be deployed in the edge network, and a second inference apparatus may be deployed in the cloud; alternatively, the first inference means may be deployed in a local network, while the second inference means may be deployed in an edge network, etc.
In a possible embodiment, the transmission condition may specifically be that the confidence of the inference result is lower than a confidence threshold, i.e. the decision means may determine to transmit the input sample to the second inference means when the confidence of the first inference model for the inference result of the input sample is lower than the confidence threshold, in order to improve the inference accuracy for the input sample. Accordingly, the updating means may specifically update the size of the confidence threshold, such as increasing or decreasing the confidence threshold, when updating the transmission condition. In this way, the updating device can make the performance of the inference system reach a higher level by dynamically adjusting the confidence threshold.
In a possible embodiment, the transmission condition further comprises that the proportion of input samples sent by the first inference device to the second inference device, relative to the total input samples received by the first inference device, does not exceed a transmission ratio upper limit, which may, for example, be set in advance by a user. In this way, when the confidence of the first inference model's inference result for an input sample is low, the decision device may first determine whether sending this input sample to the second inference device would cause the proportion of sent input samples, relative to the total input samples received by the first inference device, to exceed the preconfigured transmission ratio upper limit. If it would, the decision device sends the inference result to the terminal device even though the confidence of the first inference model's result is low, so as to prevent the transmission bandwidth between the first and second inference devices from exceeding its upper limit. If it would not, the decision device may instruct the first inference device to send the input sample to the second inference device in order to obtain a more accurate inference result.
In a possible embodiment, the first update trigger condition includes at least one of: the average inference precision of the first inference model over a first time period being below a first precision threshold, or an increase in the transmission bandwidth between the first inference device and the second inference device. In this case, the updating device specifically increases the confidence threshold when updating it. By increasing the confidence threshold, the updating device causes more input samples to be sent to the second inference device for inference, which improves the overall inference accuracy of the system for input samples.
In a possible embodiment, the updating means may specifically increase the confidence threshold when the average remaining transmission bandwidth between the first inference means and the second inference means in the first time period is higher than a preset threshold. In this way, after the updating means increases the confidence threshold, the first inference means and the second inference means have sufficient transmission bandwidth to support transmission of a larger number of input samples.
Alternatively, the updating means may not update the confidence threshold when the average remaining transmission bandwidth between the first inference means and the second inference means is not higher than a preset threshold in the first period of time. In this way, it is avoided that the updating means, after increasing the confidence threshold, causes insufficient transmission bandwidth between the first and second inference means due to a larger number of input samples being transmitted to the second inference means.
In a possible embodiment, the first update trigger condition may specifically be a reduction in the transmission bandwidth between the first inference device and the second inference device, or the proportion of input samples sent by the first inference device to the second inference device, relative to the total input samples it received, exceeding the transmission ratio upper limit. Accordingly, the updating device specifically decreases the confidence threshold when updating the transmission condition. Based on the reduced threshold, the inference system can reduce the number of input samples uploaded to the second inference device, thereby reducing the consumption of transmission bandwidth between the two devices.
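Taken together, the two trigger directions described above amount to a small policy rule. The sketch below is one illustrative rendering; the `stats` fields and the fixed `step` size are assumptions for illustration, not the application's API:

```python
# Minimal sketch of the confidence-threshold update policy.
def update_confidence_threshold(th: float, stats, step: float = 0.05) -> float:
    # Raise the threshold so more samples reach the larger second model,
    # but only while transmission bandwidth has headroom.
    if (stats.bandwidth_increased or
            (stats.avg_edge_precision < stats.precision_threshold_1 and
             stats.avg_remaining_bandwidth > stats.bandwidth_headroom)):
        return min(1.0, th + step)
    # Opposite trigger: lower the threshold to cut uploads when bandwidth
    # shrinks or the transmission-ratio upper limit has been exceeded.
    if stats.bandwidth_decreased or stats.sent_ratio > stats.ratio_cap:
        return max(0.0, th - step)
    return th   # no trigger satisfied: keep the current threshold
```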
In a possible embodiment, the updating device is further configured to update the first inference model and/or the second inference model when a second update trigger condition is satisfied. In this way, the updating device can improve the inference precision of the inference system by updating the inference models.
In one possible embodiment, the first inference model is updated when its average inference precision over the first time period is below the first precision threshold and the remaining transmission bandwidth between the first and second inference devices is below a preset threshold; and/or the second inference model is updated when its average inference precision over the first time period is below a second precision threshold. In this way, the updating device can improve the precision of inference on input samples by updating an inference model whose inference precision has become low.
Alternatively, the second precision threshold may be greater than the first precision threshold.
In a possible embodiment, when updating an inference model, the updating device may obtain incremental training samples, which may be, for example, input samples inferred by the inference system in a recent time period and labeled afterwards by a user or an annotator. The updating device may then incrementally train the first inference model and/or the second inference model using these samples. After incremental training, a model can make more accurate inferences on input samples similar to the incremental training samples.
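A hedged sketch of what such incremental training could look like, using PyTorch as an assumed framework (the application does not prescribe one); `labeled_recent_samples` stands for recently inferred input samples that a user or annotator has since labeled:

```python
# Illustrative incremental fine-tuning on newly labeled samples.
import torch

def incremental_train(model, labeled_recent_samples, epochs=3, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in labeled_recent_samples:   # samples inferred recently,
            opt.zero_grad()                   # then labeled by an annotator
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```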
In a possible embodiment, when updating the first inference model, the updating device may first determine, for example by prediction, the amount of resources that will be available to the first inference device in a second time period, and may then update the specification of the first inference model according to that amount. For example, when the amount of available resources decreases, the updating device may decrease the specification of the first inference model; when it increases, the updating device may increase the specification, and so on.
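For illustration, specification selection from the predicted resource amount might reduce to a lookup like the following; the core counts mirror the 64/128-core example given in the detailed description below, but the table itself is an assumption:

```python
# Illustrative mapping from predicted idle cores to a model specification.
SPEC_REQUIREMENTS = {            # specification -> idle cores needed to run it
    "spec2_large": 128,
    "spec1_small": 64,
}

def select_specification(predicted_idle_cores: int) -> str:
    # Dict is ordered largest-first, so the biggest spec that fits wins.
    for spec, cores_needed in SPEC_REQUIREMENTS.items():
        if predicted_idle_cores >= cores_needed:
            return spec
    return "spec1_small"   # fall back to the smallest specification
```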
In a second aspect, the present application provides an inference method, which is applied to an update apparatus in an inference system, the inference system further includes a first inference apparatus, a second inference apparatus, and a decision-making apparatus, and the method includes: the updating device acquires resource information and/or inference results of the inference system, wherein the inference results comprise a result of the first inference device performing inference on the input sample by using a first inference model, and the input sample is transmitted to the second inference device when a result of the first inference model performing inference on the input sample meets a transmission condition in the decision device; the updating device determines that the reasoning system meets a first updating trigger condition according to the resource information of the reasoning system and/or the reasoning result of the reasoning system; the updating means updates the transmission condition.
In a possible embodiment, the transmission condition includes that the confidence of the inference result is lower than a confidence threshold, and the updating means updates the transmission condition, including: the updating means updates the confidence threshold.
In a possible embodiment, said transmission conditions further comprise the proportion of input samples transmitted to said second inference means with respect to the total input samples received by said first inference means, not exceeding a transmission proportion ceiling.
In a possible embodiment, the first update trigger condition includes at least one of: the average inference precision of the first inference model over a first time period being below a first precision threshold, or an increase in the transmission bandwidth between the first inference device and the second inference device; and the updating means updates the transmission condition by increasing the confidence threshold.
In a possible implementation, the updating means updates the transmission condition, including: and when the average residual transmission bandwidth between the first inference device and the second inference device in the first time period is higher than a preset threshold value, increasing the confidence threshold value.
In a possible embodiment, the first update trigger condition includes at least one of: a reduction in the transmission bandwidth between the first inference device and the second inference device, or the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, exceeding the transmission ratio upper limit; and the updating means updates the transmission condition by reducing the confidence threshold.
In one possible embodiment, the method further comprises: the updating means updates the first inference model and/or the second inference model when a second update triggering condition is satisfied.
In a possible implementation, the updating means updates the first inference model and/or the second inference model when a second update triggering condition is satisfied, including: when the average inference precision of the first inference model in a first time period is lower than a first precision threshold value and the residual transmission bandwidth between the first inference device and the second inference device is lower than a preset threshold value, the updating device updates the first inference model; and/or when the average inference precision of the second inference model in the first time period is lower than a second precision threshold value, the updating device updates the second inference model.
In a possible embodiment, said updating means updates said first inference model and/or said second inference model, comprising: the updating device acquires an incremental training sample; and the updating device performs incremental training on the first inference model and/or the second inference model by using the incremental training samples.
In a possible implementation, said updating means updates said first inference model, including: the updating device determines the resource amount of the available resources of the first reasoning device in a second time period; the updating device updates the specification of the first inference model according to the resource quantity of the available resources of the first inference device in the second time period.
Since the inference method provided by the second aspect corresponds to the inference system provided by the first aspect, for a technical effect of any one of possible implementation manners of the second aspect and the second aspect, reference may be made to the technical effect of any one of possible implementation manners of the first aspect and the first aspect corresponding to the second aspect, which is not described in detail in this embodiment.
In a third aspect, the present application provides an updating apparatus, where the updating apparatus is applied to an inference system, the inference system further includes a first inference apparatus, a second inference apparatus, and a decision apparatus, and the updating apparatus includes: the acquisition module is used for acquiring resource information and/or reasoning results of the reasoning system, wherein the reasoning results comprise the result of reasoning the input sample by the first reasoning device by using a first reasoning model, and when the result of reasoning the input sample by the first reasoning model meets the transmission condition in the decision device, the input sample is transmitted to the second reasoning device; the monitoring module is used for determining that the reasoning system meets a first updating triggering condition according to the resource information of the reasoning system and/or the reasoning result of the reasoning system; and the updating module is used for updating the transmission condition.
In a possible embodiment, the transmission condition includes that the confidence of the inference result is lower than a confidence threshold, and the updating module is specifically configured to update the confidence threshold.
In a possible embodiment, said transmission conditions further comprise a ratio of input samples transmitted to said second reasoning means with respect to the total input samples received by said first reasoning means, not exceeding a transmission ratio upper limit.
In a possible embodiment, the first update trigger condition includes at least one of: the average inference precision of the first inference model over a first time period being below a first precision threshold, or an increase in the transmission bandwidth between the first inference device and the second inference device; and the update module is specifically configured to increase the confidence threshold.
In a possible implementation, the updating module is configured to increase the confidence threshold when an average remaining transmission bandwidth between the first inference device and the second inference device in the first time period is higher than a preset threshold.
In a possible embodiment, the first update trigger condition includes at least one of: a reduction in the transmission bandwidth between the first inference device and the second inference device, or the proportion of input samples transmitted to the second inference device, relative to the total input samples received by the first inference device, exceeding the transmission ratio upper limit; and the update module is specifically configured to reduce the confidence threshold.
In a possible implementation, the updating module is further configured to update the first inference model and/or the second inference model when a second update triggering condition is satisfied.
In a possible implementation, the update module is configured to: updating the first inference model when the average inference precision of the first inference model in a first time period is lower than a first precision threshold and the residual transmission bandwidth between the first inference device and the second inference device is lower than a preset threshold; and/or updating the second inference model when the average inference precision of the second inference model in the first time period is lower than a second precision threshold.
In a possible implementation, the update module is configured to: obtaining an incremental training sample; and carrying out incremental training on the first inference model and/or the second inference model by using the incremental training sample.
In a possible implementation, the update module is configured to: determining an amount of resources of available resources for the first inference engine over a second time period; and updating the specification of the first inference model according to the resource quantity of the available resources of the first inference device in the second time period.
As the updating apparatus provided in the third aspect corresponds to the inference system provided in the first aspect, for a technical effect of any one of the possible implementation manners of the third aspect and the third aspect, reference may be made to the technical effect of the first aspect and the corresponding one of the possible implementation manners of the first aspect, which is not described in detail in this embodiment.
In a fourth aspect, the present application provides a computer device comprising a processor and a memory. The memory is configured to store instructions, and when the computer device runs, the processor executes the instructions stored in the memory to cause the computer device to perform the inference method in the second aspect or any one of its possible implementations. It should be noted that the memory may be integrated into the processor or may be independent of it. The computer device may also include a bus, through which the processor is connected to the memory. The memory may include, for example, a read-only memory and a random access memory.
In a fifth aspect, the present application provides a computer-readable storage medium having stored therein instructions which, when executed on a computer device, cause the computer device to perform the method in the second aspect or any one of its possible implementations.
In a sixth aspect, the present application provides a computer program product comprising instructions which, when run on a computer device, cause the computer device to perform the method in the second aspect or any one of its possible implementations.
On the basis of the implementations provided by the above aspects, the present application may combine them further to provide additional implementations.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present application, and those skilled in the art may derive other drawings from them.
Fig. 1 is a schematic architecture diagram of an inference system according to an embodiment of the present application;
fig. 2 is a schematic architecture diagram of another inference system provided in the embodiment of the present application;
FIG. 3 is a schematic diagram of an exemplary interaction interface provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an exemplary flexible update configuration interface provided by an embodiment of the present application;
fig. 5 is a schematic flowchart of an inference method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device 600 according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the relative ease with which objects of the same nature are described in embodiments of the application.
Referring to fig. 1, an architecture diagram of an inference system is shown. As shown in fig. 1, the inference system 100 includes a first inference device 101, a second inference device 102, a decision device 103, and an update device 104. The first inference device 101 and the second inference device 102 may be implemented by software or hardware. When implemented by software, the first inference means 101 and the second inference means 102 may be software running on a computer device, such as a virtual machine or the like. When implemented by hardware, each of the first inference apparatus 101 and the second inference apparatus 102 may include at least one computing device, and fig. 1 illustrates an example in which each of the first inference apparatus 101 and the second inference apparatus 102 includes a plurality of servers. In actual application, the computing devices constituting the first inference apparatus 101 and the second inference apparatus 102 may be other devices with computing capability, and are not limited to the server shown in fig. 1. The first 101 and second 102 inference means may be deployed in different environments. Exemplarily, as shown in fig. 1, the first inference apparatus 101 may be deployed in the edge network, and configured to execute a corresponding calculation process on the edge side, such as the following inference process based on the first inference model; the second inference apparatus 102 may be deployed in the cloud, and configured to perform a corresponding calculation process on the cloud, such as an inference process based on the second inference model described below. In other examples, the first inference apparatus 101 may be deployed in a local network on the user side, such as a local terminal or server; the second inference engine 102 may be deployed at an edge network. In this embodiment, the specific arrangement of the first inference device 101 and the second inference device 102 is not limited.
The decision device 103 and the updating device 104 may be deployed in the same environment as the first inference device 101, for example, the decision device 103 and the updating device 104 may be deployed in an edge side network as shown in the figure with the first inference device 101, or may be deployed in a local network. The decision device 103 and the updating device 104 may be implemented by software. In this case, the decision-making means 103 and the updating means 104 may be application programs applied to a computing device deployed in the same environment as the first inference means 101. In addition, the decision device 103 may also be implemented by hardware, in this case, the decision device 103 may be a computing device located in the same environment as the first inference device 101, such as a server; alternatively, the decision device 103 may be a device implemented by an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or the like. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof. In other possible deployments, the updating apparatus 104 may also be deployed in the same environment as the second inference apparatus 102, such as in the cloud.
When the inference system 100 infers an input sample, as shown in fig. 1, the first inference device 101 may receive an input sample sent by the terminal device 105 on the user side; the input sample may be, for example, an image captured by the terminal device 105 (or by another device). The first inference device 101 may then infer the acquired input sample using a pre-trained first inference model to obtain an inference result, for example detecting objects such as safety helmets in the captured image; the first inference model may also output a confidence for the inference result (characterizing how trustworthy the result is). When the decision device 103 determines that the confidence of the inference result for the input sample is low (specifically, lower than a preset confidence threshold), this indicates that the accuracy of the result obtained from the first inference model may be poor, so the decision device 103 may instruct the first inference device 101 to send the input sample to the second inference device 102. The second inference device 102 may then infer the received input sample with a pre-trained second inference model and obtain an inference result. Since the specification of the second inference model is generally higher than that of the first, the accuracy of its inference results is generally higher, which allows the inference system 100 to achieve a high level of inference accuracy for the input sample.
In practical application scenarios, the transmission bandwidth between the first inference device 101 and the second inference device 102 is usually limited; the proportion of input samples sent to the second inference device 102 can therefore be limited to keep the bandwidth occupied by transmitting them from growing too large. Specifically, when the confidence of the first inference model's result for an input sample is low, the decision device 103, in deciding whether to transmit the sample to the second inference device 102, may determine whether doing so would cause the proportion of input samples received by the second inference device 102, relative to the input samples received by the first inference device 101, to exceed the user-specified transmission ratio upper limit. If not, the decision device 103 determines to transmit the input sample to the second inference device, so that the larger-specification second inference model improves the inference accuracy for the sample. If so, the decision device 103 may refuse to transmit it; the inference result of the inference system 100 for that sample is then the result obtained by the first inference device using the first inference model, which lowers the system's inference accuracy for the sample.
In this process, if the preconfigured confidence threshold is a statically configured fixed value, it may be difficult to maintain the performance of the inference system 100 at a high level. For example, in practice the available transmission bandwidth between the first inference device 101 and the second inference device 102 may increase; if the originally set confidence threshold is too small, input samples whose inference results have a confidence above that threshold (but still not high) are not transmitted to the second inference device 102, so the results that the inference system 100 outputs for a large number of input samples come from the smaller-specification first inference model, and the overall inference accuracy of the system is difficult to improve. Conversely, the available transmission bandwidth between the two devices may decrease; if the originally set confidence threshold is too large, a large number of input samples are transmitted to the second inference device 102 because their confidence falls below the threshold, so the occupied transmission bandwidth rises as the number of transmitted samples grows, and many input samples must queue for a long time before reaching the second inference device 102, which also increases the latency with which the inference system 100 infers some input samples.
Based on this, in the inference system 100 provided in the present application, the updating device 104 can dynamically update and adjust the condition (hereinafter, the transmission condition) used by the decision device 103 to determine whether to transmit an input sample to the second inference device 102. Take the case where the transmission condition is that the confidence of the inference result is below the confidence threshold. While the inference system 100 provides inference services, the updating device 104 may detect whether the system currently satisfies an update trigger condition, and if so, update the confidence threshold used in the decision device 103 for deciding whether to transmit input samples, so that the decision device 103 decides using the updated threshold. In this way, by dynamically adjusting the confidence threshold in the decision device 103, the performance of the inference system 100 can be kept at a high level. For example, when the available transmission bandwidth between the first inference device 101 and the second inference device 102 increases, the updating device 104 may raise the confidence threshold so that more input samples (those whose inference results have a confidence below the raised threshold) are transmitted to the second inference device 102 and inferred by its larger-specification second inference model, improving the overall inference accuracy of the system. When the available transmission bandwidth decreases, the updating device 104 may lower the confidence threshold, reducing the number of input samples that need to be transmitted to the second inference device 102 and thereby reducing both the inference latency of the system and the transmission bandwidth consumed.
It should be noted that the inference system 100 shown in fig. 1 is only used as an exemplary illustration, and is not used to limit the specific implementation of the inference system. For example, in other possible embodiments, the inference system 100 may include more functional modules to support the inference system 100 with more other functions; alternatively, the decision device 103 and the updating device 104 in the inference system 100 may be integrated into one functional module, etc.
For the sake of understanding, embodiments of the present application will be described below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an inference system provided in an embodiment of the present application. The inference system 100 shown in fig. 2 is deployed in an edge-cloud collaborative scenario: the first inference device 101, the decision device 103, and the updating device 104 are all deployed in an edge network, while the second inference device 102 is deployed in the cloud. Compared with the inference system 100 shown in fig. 1, the updating device 104 in fig. 2 comprises an acquisition module 1041, a monitoring module 1042, and an updating module 1043.
In this embodiment, the first inference device 101 in the inference system 100 is preconfigured with a first inference model, and the second inference device 102 is preconfigured with a second inference model. In practical application scenarios, the computing performance of the cloud is generally higher than that of the edge side (e.g., more computing resources are available), so the second inference device 102 in the cloud can perform relatively complex computing tasks while the first inference device 101 in the edge network performs relatively simple ones. For this reason, the specification of the first inference model configured for the first inference device 101 is smaller than that of the second inference model configured for the second inference device 102; for example, the file size of the first inference model is 50 MB while that of the second inference model is 200 MB. As an example, the first and second inference models may be machine learning models built with a machine learning algorithm, in which case the smaller specification of the first inference model may specifically mean that it has fewer neural network layers or fewer parameters than the second inference model. The computational cost (FLOPs) of the first inference model is then lower than that of the second inference model, and accordingly its demand for computing resources at runtime is also lower.
In the inference system 100, the inference models may be configured for the first inference device 101 and the second inference device 102 by the updating device 104 or by other devices. For ease of description, the following takes configuration by the updating device 104 as an example.
In one possible implementation, the updating device 104 may present an interactive interface, as shown in fig. 3, to the user side, prompting the user to specify constraints for the inference system 100 through the interface and to provide training samples for model training. The constraints may include, for example, the inference precision, the specification of the inference model, the AI framework supported by the first inference device 101 in the edge network, the inference target of the inference system 100, the upper limit of the transmission bandwidth (or the inference latency) between the first inference device 101 and the second inference device 102, and the maximum proportion of input samples inferred by the second inference device 102 relative to the input samples received by the first inference device 101 (hereinafter, the transmission ratio upper limit). In practical applications, the user-specified constraints may also include other items, such as the conditions that trigger updating of the confidence threshold, the transmission ratio upper limit, or the inference models, as described below.
The AI framework supported by the first inference device 101 may be, for example, the TensorFlow, PyTorch, or MindSpore framework; different AI frameworks support inference models with different file formats. The inference target of the inference system 100 indicates the application scenario of the inference model, such as object detection or image classification. The updating device 104 may then use the updating module 1043 to construct an initial inference model according to the user-specified constraints: the specification of the constructed model is the one specified by the user, its file format is one supported by the user-specified AI framework, and its inference target is the user-specified target. Next, the updating module 1043 may train the constructed initial inference model with the training samples provided by the user, stopping once the model's inference precision reaches the user-specified level. At that point, the updating module 1043 may send the trained initial inference model to the second inference device 102 so that it serves as the second inference model configured for the second inference device 102.
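As an illustration only, the user-specified constraints and the train-until-target loop might look as follows; every field name and helper here (`train_step`, `evaluate`) is hypothetical, not part of the application:

```python
# Illustrative constraint set and train-until-target loop for the
# initial (second) inference model.
user_constraints = {
    "inference_precision": 0.95,        # target precision specified by the user
    "model_spec": "large",              # specification of the initial model
    "ai_framework": "TensorFlow",       # framework supported on the edge side
    "inference_target": "object_detection",
    "bandwidth_upper_limit_mbps": 100,
    "transmission_ratio_cap": 0.2,
}

def train_to_target(model, train_step, evaluate, target, max_rounds=1000):
    for _ in range(max_rounds):
        train_step(model)               # one pass over the training samples
        if evaluate(model) >= target:   # stop once the user-specified
            break                       # inference precision is reached
    return model
```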
After the second inference model is trained, the updating module 1043 may generate the first inference model from it. For example, after generating the second inference model, the updating module 1043 may instruct the acquisition module 1041 to report the amount of available resources on the first inference device 101. The acquisition module 1041 sends a resource detection request to the first inference device 101 to detect its currently available resources, and feeds the result back to the updating module. The available resources on the first inference device 101 may include, for example, computing resources (such as CPUs) and storage resources (such as cloud disks). The updating module 1043 may determine the specification of the first inference model to be generated from the reported amount of available resources. For example, suppose a specification-1 model needs 64 processor cores to run and a specification-2 model needs 128 (specification 2 being greater than specification 1); if the reported resources indicate that 88 processor cores are currently idle on the first inference device 101, the updating module 1043 may choose specification 1 for the first inference model, so that the first inference device 101 has enough resources to run it. After determining the specification, the updating module 1043 may process the second inference model through model compression, model distillation, and the like, generate a first inference model of that specification, and send it to the first inference device 101 as its configured model. Furthermore, before being sent to the first inference device 101, the first inference model may be trained again with the training samples, and the retrained model sent instead.
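Of the generation techniques mentioned (model compression, model distillation), a standard knowledge-distillation step could look like the sketch below. This is generic PyTorch distillation code under assumed defaults (temperature, loss weights), not the application's prescribed procedure:

```python
# Illustrative knowledge-distillation step: the smaller first model (student)
# learns from the trained second model (teacher).
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, y, optimizer, T=4.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)            # soft targets from second model
    student_logits = student(x)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, y)  # ground-truth labels
    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```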
It should be noted that the above-mentioned manner of generating the first inference model and the second inference model is only an exemplary illustration, and other manners may be used for generating the first inference model and the second inference model in practical applications. For example, in other possible embodiments, the updating module 1043 may also construct the first inference model and the second inference model with different specifications at the same time, and use the same training sample to complete the training for the first inference model and the second inference model respectively.
Meanwhile, the updating module 1043 may also configure the transmission condition for the decision device 103 according to the inference precision, transmission bandwidth upper limit, and transmission ratio upper limit in the user-specified constraints; for example, it may configure a confidence threshold in the transmission condition, or configure a discriminant model based on which the decision device 103 decides whether to transmit input samples to the second inference device 102. Taking the confidence threshold as an example, the updating module 1043 may calculate it from the amount of input-sample data acquired by the first inference device 101 per unit time and the transmission bandwidth upper limit between the first inference device 101 and the second inference device 102, such that the average inference precision of the inference system 100 on input samples is not lower than the user-specified precision. Further, the calculated threshold may be such that the bandwidth occupied by sending input samples to the second inference device 102 per unit time does not exceed the transmission bandwidth upper limit, that is, the proportion of input samples transmitted to the second inference device 102 relative to the total number of input samples does not exceed the transmission ratio upper limit.
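One plausible way to realize such a calculation, offered purely as an assumption, is to pick the threshold as a quantile of recent confidence scores, so that the expected share of forwarded samples respects both the bandwidth budget and the ratio cap:

```python
# Illustrative confidence-threshold calibration from the user constraints.
import numpy as np

def calibrate_threshold(recent_confidences,     # confidences on held-out data
                        samples_per_sec: float,
                        sample_size_mb: float,
                        bandwidth_cap_mbps: float,
                        ratio_cap: float) -> float:
    # Largest forwardable share allowed by the bandwidth budget
    # (samples/s * MB/sample * 8 = Mbps needed if everything were forwarded).
    bw_share = bandwidth_cap_mbps / (samples_per_sec * sample_size_mb * 8)
    share = min(ratio_cap, bw_share)
    # Threshold below which roughly `share` of confidence scores fall.
    return float(np.quantile(np.asarray(recent_confidences), share))
```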
After the first inference device 101, the second inference device 102, and the decision device 103 are configured, the first inference device 101 may receive an input sample and infer it with the configured first inference model, which outputs an inference result together with its confidence. For example, in a safety-helmet detection scenario, the first inference device 101 may receive an image captured and sent by the terminal device 105 on the user side, containing one or more workers; the first inference device 101 then analyzes the image with the first inference model, identifies each worker and whether each is wearing a safety helmet, and gives the confidence of the recognition result.
In general, when the confidence of the inference result output by the first inference model is greater than the preset confidence threshold, the decision device 103 may output the result to the terminal device 105 on the user side so that the terminal device 105 can act on it. For example, in the safety-helmet detection scenario, when the terminal device 105 determines from the inference result that some workers are not wearing safety helmets, it may trigger a monitoring alarm so that monitoring staff can promptly remind those workers to wear their helmets correctly. When the confidence of the result output by the first inference model is smaller than the confidence threshold, indicating that the accuracy of the result obtained from the smaller-specification first inference model is low, the decision device 103 may instruct the first inference device 101 to send the input sample to the second inference device 102. The second inference device 102 may infer the received sample with the configured second inference model and send the resulting output to the terminal device 105. Because the specification of the second inference model is relatively large, the accuracy of its inference results is relatively high, which keeps the inference precision of the inference system 100 for input samples at a high level.
In a further possible embodiment, the transmission condition for sending input samples to the second inference device 102 may also require the decision device 103 to determine whether the proportion of sent input samples, relative to the total input samples (i.e., all input samples) received by the first inference device 101, exceeds a preconfigured transmission ratio upper limit. If sending the current input sample would push that proportion over the limit, the decision device 103 sends the inference result to the terminal device 105 even though the confidence of the first inference model's result is low, to prevent the transmission bandwidth between the first inference device 101 and the second inference device 102 from exceeding its upper limit. If not, the decision device 103 may instruct the first inference device 101 to send the input sample to the second inference device 102 to obtain a more accurate inference result.
It should be noted that, in this embodiment, the transmission conditions, such as the confidence threshold, configured by the updating device 104 for the decision device 103 may be dynamically adjusted according to the operation condition of the inference system 100, so that the inference system 100 can maintain high performance.
In a specific implementation, taking updating the value of the confidence threshold as the example of updating the transmission condition, the updating device 104 may monitor whether the inference system 100 satisfies a preset first update trigger condition, and when the inference system 100 satisfies it, update the configured value of the confidence threshold according to that condition.
As an example, when updating the value of the confidence threshold, the updating device 104 may specifically increase the confidence threshold, and accordingly, the preset first update triggering condition may specifically include:
1. The average inference precision of the first inference model over the first time period is below a first precision threshold.
It can be understood that if the inference system 100 feeds back inference results to the terminal device 105 based only on the first inference model, the inference accuracy of the entire system drops because the first model's inference precision is low. Therefore, the updating device 104 may increase the number of input samples the first inference device 101 transmits to the second inference device 102 by raising the confidence threshold (input samples whose inference-result confidence falls below the raised threshold, though it cleared the original one, are now also transmitted to the second inference device 102). In this way, a larger number of input samples with low-confidence first-model results can be inferred by the second inference model in the second inference device 102, improving the accuracy with which the inference system 100 infers input samples.
Illustratively, the average inference precision of the first inference model in the first time period may be, for example, an average of the confidence values of the inference results of the first inference model for the respective input samples in the first time period, that is, the confidence value of the inference result may be used as the inference precision of the first inference model for the input samples. In this embodiment, the updating device 104 may obtain the average inference precision of the first inference model through monitoring by the monitoring module 1042, and determine whether to update the confidence level threshold according to the average inference precision by the updating module 1043.
Further, before increasing the confidence threshold, the updating device 104 may also determine whether the average remaining transmission bandwidth between the first inference device 101 and the second inference device 102 in the first time period is higher than a preset threshold. The remaining transmission bandwidth refers to the difference between the preset upper limit of the transmission bandwidth and the transmission bandwidth actually used between the first inference device 101 and the second inference device 102; accordingly, the average remaining transmission bandwidth refers to the average of the remaining transmission bandwidths at multiple time points within the first time period. The preset threshold may be set in advance by technicians according to the requirements of the actual application scenario. When the average remaining transmission bandwidth is higher than the preset threshold, this indicates that sufficient bandwidth resources are available for data transmission between the first inference device 101 and the second inference device 102 over a long period, and the updating device 104 may then increase the number of input samples transmitted from the first inference device 101 to the second inference device 102 by increasing the confidence threshold, so as to improve the accuracy with which the inference system 100 infers input samples. When the average remaining transmission bandwidth is not higher than the preset threshold, this indicates that bandwidth resources between the first inference device 101 and the second inference device 102 are relatively tight, and the updating device 104 may refrain from increasing the confidence threshold, so as to avoid an excessively large confidence threshold aggravating the bandwidth shortage between the two devices.
2. The transmission bandwidth between the first inference device 101 and the second inference device 102 increases.
It can be understood that, when the transmission bandwidth between the first inference device 101 and the second inference device 102 increases, the bandwidth resource between the two is more sufficient, and since the accuracy of inferring the input sample by using the second inference model with a larger specification is generally higher than the accuracy of inferring the input sample by using the first inference model with a smaller specification, the update device 104 may increase the number of the input samples transmitted by the first inference device 101 to the second inference device 102 by increasing the confidence threshold value, so as to improve the accuracy of inferring the input sample by the inference system 100.
Of course, in actual application, the first update trigger condition may be a condition other than the above examples, which is not limited in this embodiment. The first update trigger condition may be any one of the conditions in the above examples, or may include several of them at the same time. A sketch combining these threshold-increase conditions with the threshold-decrease conditions described below is given after the second set of examples.
In a further possible embodiment, the upper transmission ratio limit specified by the user may be allowed to be adjusted; for example, after specifying the upper transmission ratio limit on the interactive interface, the user further indicates on the interface that the upper limit may be adaptively adjusted. In this case, after determining that the inference system 100 satisfies the first update trigger condition, the updating device 104 may update the transmission condition in the decision device 103 by increasing the confidence threshold or by increasing the upper transmission ratio limit, so that the inference system 100 can use the second inference model to infer a greater number of input samples, thereby improving the inference accuracy.
In addition, when updating the confidence threshold in the transmission condition, the updating device 104 may increase the confidence threshold or decrease the confidence threshold. As another example, when the updating apparatus 104 decreases the confidence threshold, the preset first update triggering condition may specifically include:
1. The ratio of the input samples sent by the first inference device 101 to the second inference device 102, relative to the total input samples received by the first inference device 101, exceeds a preset upper transmission ratio limit.
In practical application scenarios, the difficulty of inference by the first inference model may differ across input samples. For example, in a helmet detection scenario, the input sample may be a captured image of workers at a worksite. In non-working periods, such as from 0:00 to 9:00 and after 18:00, the number of staff arriving at the worksite is generally small, and accordingly few staff members need to be checked for whether they wear safety helmets. Identifying the captured image with the smaller-specification first inference model (i.e. performing the aforementioned inference) can then generally recognize the staff in the image, and whether each of them wears a safety helmet, accurately (the confidence of the inference result is high); that is, the inference difficulty for the first inference model is low. During working periods, such as from 9:00 to 18:00, a large number of staff are usually present at the worksite, the captured images contain many workers, the inference difficulty for the first inference model is correspondingly higher, and the confidence of its inference results is lower.
Therefore, when, among the input samples inferred by the first inference model, the proportion of input samples whose confidence is below the confidence threshold, relative to all input samples, exceeds the preset upper transmission ratio limit, this indicates that a large number of inference results with low accuracy currently exist. At this time, the updating device 104 may reduce the number of input samples transmitted by the first inference device 101 to the second inference device 102 by reducing the confidence threshold; that is, input samples whose inference-result confidence is smaller than the confidence threshold before the adjustment but larger than the confidence threshold after the adjustment are no longer transmitted to the second inference device 102, so as to prevent the proportion of input samples transmitted to the second inference device 102 from exceeding the upper transmission ratio limit specified by the user.
In practical application scenarios, when the upper limit of the transmission ratio is allowed to be adjusted, the updating means 104 may increase the number of input samples transmitted by the first inference means 101 to the second inference means 102 by increasing the upper limit of the transmission ratio. In this way, for the input sample with low confidence of the inference result output by the first inference model, inference can be performed through the second inference model in the second inference device 102, so as to improve the accuracy of the inference system 100 in inferring the input sample.
2. The transmission bandwidth between the first inference device 101 and the second inference device 102 is reduced.
It can be understood that the preset upper transmission ratio limit is determined with reference to the originally larger transmission bandwidth between the first inference device 101 and the second inference device 102. When the transmission bandwidth between the two decreases, if input samples are still transmitted to the second inference device 102 according to the originally set confidence threshold, the first inference device 101 may run short of transmission bandwidth when sending them. For this reason, the updating device 104 may reduce the number of input samples transmitted by the first inference device 101 to the second inference device 102 by reducing the confidence threshold, so as to reduce the consumption of transmission bandwidth between the two devices and adapt to the currently available transmission bandwidth.
In other possible embodiments, when the upper limit of the transmission ratio is allowed to be adjusted, the updating means 104 may also reduce the number of input samples transmitted by the first inference means 101 to the second inference means 102 by reducing the upper limit of the transmission ratio, so as to reduce the consumption of transmission bandwidth between the first inference means 101 and the second inference means 102.
In practical applications, in addition to the above examples, the first update triggering condition that triggers the updating apparatus 104 to reduce the confidence threshold may also be implemented in other ways, and this embodiment does not limit this. The first update trigger condition may be any one of the conditions in the above examples, or may include a plurality of types at the same time.
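The two sets of trigger conditions above can be summarised in a minimal sketch, assuming a fixed adjustment step; the function name, parameter names, and the step size are illustrative assumptions, not values given in this application.

```python
def update_confidence_threshold(
    threshold: float,
    avg_precision: float,            # average inference precision of the first model in the first time period
    first_precision_threshold: float,
    avg_remaining_bandwidth: float,  # average remaining bandwidth in the first time period
    bandwidth_floor: float,          # preset threshold for "sufficient" remaining bandwidth
    bandwidth_delta: float,          # > 0 if the transmission bandwidth increased, < 0 if it decreased
    sent_ratio: float,               # forwarded samples / total samples received
    ratio_cap: float,                # upper transmission ratio limit
    step: float = 0.05,              # assumed fixed adjustment step
) -> float:
    # Raise the threshold so that more low-confidence samples reach the second model,
    # but only while enough bandwidth remains for the extra transmissions.
    if avg_precision < first_precision_threshold and avg_remaining_bandwidth > bandwidth_floor:
        return min(1.0, threshold + step)
    if bandwidth_delta > 0:
        return min(1.0, threshold + step)
    # Lower the threshold so that fewer samples are forwarded.
    if bandwidth_delta < 0 or sent_ratio > ratio_cap:
        return max(0.0, threshold - step)
    return threshold
```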
When the inference system 100 continuously provides inference services for the terminal device 105, the monitoring module 1042 in the updating apparatus 104 may continuously monitor the inference system 100 to determine whether the confidence threshold in the decision apparatus 103 needs to be updated, and after it is determined that the confidence threshold needs to be updated, a specific value of the updated confidence threshold may be further determined, so that the decision apparatus 103 may subsequently determine whether to transmit the input sample to the second inference apparatus 102 according to the updated confidence threshold.
It should be noted that the above embodiments are described by taking updating the confidence threshold in the transmission condition as an example. In practice, updating the transmission condition may also mean updating a discriminant model, that is, the decision device 103 may use a discriminant model to determine whether to transmit the input sample to the second inference device 102. Specifically, the discriminant model may be, for example, a binary classification model: the decision device 103 may input the inference result and the confidence output by the first inference model into the discriminant model, which outputs a decision result, and the decision device 103 then determines, according to the decision result, whether to transmit the input sample corresponding to the inference result to the second inference device 102. Accordingly, when the updating device 104 updates the transmission condition, it may specifically update the discriminant model in the decision device 103, such as updating the parameters or the network structure of the discriminant model, which is not limited in this embodiment.
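For the discriminant-model variant, a minimal sketch follows. The text only states that the model may be a binary classification model taking the inference result and confidence as input, so the logistic gate, the feature choice, and all parameter values below are illustrative assumptions.

```python
import math

class DiscriminantGate:
    def __init__(self, weights, bias):
        self.weights = weights  # learned parameters; these are what the updating device would update
        self.bias = bias

    def should_forward(self, features) -> bool:
        """Logistic gate over the first model's outputs (e.g. confidence, predicted class)."""
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        prob = 1.0 / (1.0 + math.exp(-z))  # probability that the second model is needed
        return prob > 0.5

# Example with assumed features [confidence, predicted_class_id]:
gate = DiscriminantGate(weights=[-4.0, 0.1], bias=2.0)
print(gate.should_forward([0.35, 3]))  # low confidence -> likely forwarded (True)
```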
In this embodiment, the updating device 104 may update not only the confidence threshold in the decision device 103, but also a first inference model configured in the first inference device 101 and/or a second inference model configured in the second inference device 102.
Specifically, the updating device 104 may monitor whether the inference system 100 satisfies a second update triggering condition set in advance, and when the inference system 100 satisfies the second update triggering condition, the updating device 104 may update the configured first inference model and/or the second inference model according to the second update triggering condition.
The updating device 104 may update the specification of the inference model when updating the inference model, or may retrain the inference model.
In one example, the updating device 104 may employ an elastic update mechanism to update the specification of the first inference model.
Specifically, the resources of the first inference device 101 deployed in the edge network are generally limited. In an actual application scenario, the first inference device 101 may be used not only to provide inference services for the terminal device 105 but also to provide other business services, such as big data search or edge cloud computing, and different business services may have different priorities. When other business services with higher priority occupy more of the resources of the first inference device 101, the amount of resources available for providing inference services decreases, and the currently remaining available resources on the first inference device 101 may be insufficient for inferring input samples on the edge side with the first inference model at its original specification. The updating device 104 may therefore reduce the specification of the first inference model, for example by performing model distillation or model compression on the original first inference model, so that the currently remaining available resources on the first inference device 101 can support inference of input samples with the smaller-specification first inference model. The amount of available resources on the first inference device 101 may be detected by the acquisition module 1041 in the updating device 104. Alternatively, the updating device 104 may also reduce the specification of the first inference model when the load of the first inference device 101 is high, for example when the CPU utilization on the first inference device 101 continuously stays at or above a preset value (e.g. 80%) for longer than a preset duration, or when the video memory utilization of the graphics processing unit (GPU) exceeds an upper utilization limit.
Conversely, when the amount of resources available for providing inference services on the first inference device 101 increases, or its load decreases, the updating device 104 may increase the specification of the first inference model, for example by reconstructing the inference model according to the increased amount of available resources. Inferring input samples on the edge side with a larger-specification model improves the inference accuracy and, at the same time, raises the confidence of the first inference model's results, thereby reducing the number (or proportion) of input samples transmitted by the first inference device 101 to the second inference device 102 and reducing the consumption of transmission bandwidth.
Alternatively, the updating device 104 may determine the amount of available resources of the first inference device 101 in a second time period, which may be, for example, a past or future period (such as a week or a month), so that the updating device 104 can update the specification of the first inference model according to the amount of available resources of the first inference device 101 in that period. For example, the updating module 1043 may collect, through the acquisition module 1041, the variation in the amount of available resources of the first inference device 101 over a past period, and predict from that variation the amount of available resources in a future second time period. When the predicted amount of available resources is larger than the current amount, the updating module 1043 may increase the specification of the first inference model accordingly; in this way, the first inference device 101 can infer input samples on the edge side during the second time period using the updated, larger-specification first inference model. Conversely, when the predicted amount of available resources is smaller than the current amount, the updating module 1043 may decrease the specification of the first inference model. Alternatively, the updating module 1043 may collect, through the acquisition module 1041, the average amount of available resources of the first inference device 101 over a past second time period; when that average is larger than the current amount of available resources, the updating module 1043 may increase the specification of the first inference model, and when it is smaller, the updating module 1043 may decrease the specification.
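A minimal sketch of this elastic specification update follows; the specification table, the moving-average forecast, and the resource units are all illustrative assumptions.

```python
MODEL_SPECS = [
    # (minimum available resource units required, model specification)
    (8.0, "large"),
    (4.0, "medium"),
    (0.0, "small"),
]

def forecast_available_resources(history: list[float]) -> float:
    """Predict resources for the second time period as a moving average of the past period."""
    return sum(history) / len(history)

def select_specification(history: list[float]) -> str:
    """Pick the largest specification the predicted resources can support."""
    predicted = forecast_available_resources(history)
    for min_resources, spec in MODEL_SPECS:
        if predicted >= min_resources:
            return spec
    return "small"

# Example: shrinking resources over the past period -> a smaller specification.
print(select_specification([9.0, 6.0, 3.0]))  # average 6.0 -> "medium"
```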
In practice, the inference system 100 may also present to the user (through the terminal device 105) an elastic update configuration interface as shown in fig. 4, which may display prompt information asking the user whether to enable elastic updating of the first inference model, such as "please select whether to flexibly update the inference model" shown in fig. 4. In this way, the updating device 104 can determine, according to the user's selection for elastically updating the inference model, whether to automatically update the first inference model in the first inference device 101.
In yet another example, the updating apparatus 104 may update the first inference model and/or the second inference model by way of incremental training.
In an actual application scenario, the data feature distribution of the input samples inferred by the inference system 100 may change, so that the inference accuracy of the first inference model and/or the second inference model on the input samples decreases, and the models may even fail. Still taking the helmet detection scenario as an example, the first inference model and the second inference model in the inference system 100 may be able to identify a red safety helmet in a captured image (i.e. an input sample), but if the helmets worn by workers at the worksite are changed to yellow or blue ones, the two models may have difficulty identifying the yellow or blue helmets, thereby reducing the accuracy with which the inference system 100 recognizes helmets.
To this end, while the inference system 100 provides inference services, the monitoring module 1042 in the updating device 104 may monitor whether the average inference precision of the first inference model in the first time period is lower than the first precision threshold and whether the remaining transmission bandwidth between the first inference device 101 and the second inference device 102 is lower than the preset threshold, and feed the monitoring result back to the updating module 1043. When the updating module 1043 determines that the average inference precision is lower than the first precision threshold and the remaining transmission bandwidth is lower than the preset threshold, the updating module 1043 determines to update the first inference model.
As an implementation example, the updating module 1043 may update the first inference model by way of incremental training. Specifically, the updating module 1043 may obtain incremental training samples and use them to incrementally train the first inference model, so as to improve its inference accuracy on input samples. The incremental training samples may be labeled in advance by the user and provided to the inference system 100; alternatively, when the first inference model fails but the second inference model does not, the incremental training samples may be generated by the second inference model. For example, in the helmet detection scenario, captured images that include yellow or blue helmets and are labeled in advance can be used as incremental training samples; incrementally training the first inference model with these images enables the resulting model to effectively recognize red, yellow, or blue helmets in captured images.
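A minimal sketch of such incremental training is given below, assuming both models are classifiers that output logits and using the second model's predictions as pseudo-labels, which is one of the labeling options mentioned above; the PyTorch-based loop, the optimizer, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

def incremental_train(first_model: nn.Module,
                      second_model: nn.Module,
                      samples: torch.Tensor,
                      epochs: int = 3,
                      lr: float = 1e-4) -> None:
    """Fine-tune the first (smaller) model on samples labeled by the second (larger) model."""
    optimizer = torch.optim.Adam(first_model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    second_model.eval()
    first_model.train()
    for _ in range(epochs):
        for x in samples.split(32):  # mini-batches of 32 input samples
            with torch.no_grad():
                pseudo = second_model(x).argmax(dim=1)  # label with the larger model
            optimizer.zero_grad()
            loss = loss_fn(first_model(x), pseudo)
            loss.backward()
            optimizer.step()
```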
Similarly, the monitoring module 1042 may also monitor whether the average inference precision of the second inference model in the first time period is lower than a second precision threshold, and feed the monitoring result back to the updating module 1043. When the updating module 1043 determines that the average inference precision is lower than the second precision threshold, the updating module 1043 updates the second inference model. This second precision threshold may, for example, be greater than the aforementioned first precision threshold. The updating module 1043 may likewise update the second inference model by incremental training or by reconstructing the inference model; the specific implementation is similar to that of updating the first inference model, and reference may be made to the description of the relevant parts above, which is not repeated here.
Of course, the above manner of incrementally updating the first inference model and the second inference model is only an exemplary illustration; in other implementations, the updating module 1043 may also complete the update of the first and second inference models by reconstructing and retraining them, which is not limited in this embodiment. When updating the first inference model, the updating module 1043 may also adjust the specification of the first inference model and, at the same time, incrementally train the specification-adjusted model with the incremental training samples.
In practical applications, before the first inference model and the second inference model are updated, the inference system 100 may continue to provide inference services for the terminal device 105 by using the first inference model and the second inference model before updating, and after the inference model is updated, the inference system 100 may provide inference services for the terminal device 105 by using the first inference model and the second inference model after updating, so as to avoid interruption of the inference services provided by the inference system 100 due to updating of the inference model.
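The uninterrupted switchover just described is commonly realised by serving with the current model while the replacement is prepared, then swapping a reference atomically; the following minimal sketch of that pattern, including its names and the lock-based swap, is an illustrative assumption rather than a mechanism spelled out in this application.

```python
import threading

class ModelHolder:
    """Serve inference requests with the current model while an update is in progress."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def infer(self, x):
        with self._lock:
            model = self._model  # take a reference under the lock
        return model(x)          # serve outside the lock, so updates never block inference

    def swap(self, new_model):
        with self._lock:
            self._model = new_model  # requests arriving after this point use the new model
```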
It should be noted that this embodiment is described by taking as an example the deployment of the first inference device 101 in the edge network and the second inference device 102 in the cloud. In other implementations, the first inference device 101 may instead be deployed in the local network and the second inference device 102 in the edge network; in this case, the inference process of the inference system 100 for input samples and the processes of updating the confidence threshold and the models are similar to those described above, and reference may be made to the relevant descriptions of the foregoing embodiments, which are not repeated here.
Referring to fig. 5, fig. 5 is a schematic flowchart of an inference method provided in an embodiment of the present application. The inference method shown in fig. 5 can be applied to the inference system 100 shown in fig. 2 or to other applicable inference systems. For convenience of explanation, this embodiment takes the inference system 100 shown in fig. 2 as an example, with the inference system 100 inferring two different input samples.
Based on the inference system 100 shown in fig. 2, the inference method shown in fig. 5 may specifically include:
S501: the first inference device 101 receives an input sample 1.
Illustratively, the user-side terminal device 105 may send an input sample 1 to the first inference device 101. The input sample may be, for example, a captured image, such as an image of a construction site in the helmet detection scenario, or another sample to be used as model input.
S502: the first inference device 101 infers the input sample 1 by using the inference model 1 with a smaller specification configured in advance, and obtains an inference result 1 and a confidence 1.
S503: when the confidence 1 is greater than the confidence threshold, the first inference device 101 feeds back the inference result 1 to the terminal equipment 105; and when the confidence 1 is less than the confidence threshold, the first inference means 101 requests the decision means 103 to send the input sample 1 to the second inference means 102.
In general, if the confidence 1 output by inference model 1 is greater than the preset confidence threshold, this indicates that the inference result 1 output by inference model 1 is more likely to be correct, that is, inference result 1 can be regarded as relatively accurate. At this time, the first inference device 101 may feed back the relatively accurate inference result 1 to the terminal device 105. Conversely, if the confidence 1 output by inference model 1 is smaller than the preset confidence threshold, inference result 1 can be regarded as inaccurate. At this time, the first inference device 101 may request the decision device 103 to send the input sample 1 to the second inference device 102, so that the larger-specification inference model 2 on the second inference device 102 can make a more accurate inference on input sample 1.
S504: the decision device 103 allows the first inference device 101 to upload the input sample 1 to the second inference device 102 upon determining that the upper transmission ratio limit is not exceeded.
The upper limit of the transmission ratio may be specified by a user in advance, and specifically may be a specific value of the upper limit of the transmission ratio input by the user, or the inference system 100 may calculate the upper limit of the transmission ratio according to the inference precision specified by the user and the upper limit of the transmission bandwidth between the first inference device 101 and the second inference device 102.
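Where the cap is derived from the bandwidth budget, one plausible reading is simple rate arithmetic; this application does not give the formula, so the following sketch and its parameters are purely an illustrative assumption.

```python
def transmission_ratio_cap(bandwidth_limit_mbps: float,
                           sample_size_mb: float,
                           samples_per_second: float) -> float:
    """Largest fraction of samples that can be uploaded without exceeding the bandwidth limit."""
    required_mbps_for_all = sample_size_mb * 8 * samples_per_second  # Mb/s needed to send every sample
    return min(1.0, bandwidth_limit_mbps / required_mbps_for_all)

# Example: a 100 Mb/s uplink with 0.5 MB images arriving at 50 images/s -> cap of 0.5.
print(transmission_ratio_cap(100.0, 0.5, 50.0))
```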
As an implementation example, the decision device 103 may check whether, after the input sample 1 is sent to the second inference device 102, the ratio of the number of input samples sent to the second inference device 102 to the number of input samples processed by the first inference device 101 would exceed the pre-configured upper transmission ratio limit. If not, the first inference device 101 is allowed to upload the input sample 1 to the second inference device 102. If the upper limit would be exceeded, the decision device 103 sends the inference result 1 to the terminal device 105 (not shown in fig. 5) even though the confidence 1 of inference model 1 for input sample 1 is below the threshold, so as to prevent the transmission bandwidth between the first inference device 101 and the second inference device 102 from exceeding the upper limit of the transmission bandwidth.
S505: the first inference means 101 sends the input sample 1 to the second inference means 102.
S506: the second inference device 102 uses the pre-configured inference model 2 with a large specification to perform inference on the input sample 1, so as to obtain an inference result 2 (and confidence 2).
S507: the second inference means 102 sends the inference result 2 (and the confidence 2) to the terminal device 105.
In actual application, the second inference apparatus 102 may send the inference result 2 (and confidence 2) for the input sample 1 to the terminal device 105 or the like through the first inference apparatus 101.
S508: the updating means 104 determines and updates the confidence threshold in the decision means 103 by detecting the inference system 100.
In particular implementation, the updating device 104 monitors whether the inference system 100 satisfies a first update trigger condition, and when the first update trigger condition is satisfied, the updating device 104 may update the value of the configured confidence threshold according to the first update trigger condition. Of course, the updating means 104 may not update the confidence threshold if the inference system 100 does not satisfy the first update triggering condition.
For example, the updating device 104 may increase the confidence threshold, and at this time, the preset first update triggering condition may specifically include:
1. The average inference precision of the first inference model over the first time period is below a first precision threshold.
2. The transmission bandwidth between the first inference device 101 and the second inference device 102 increases.
In another example, the updating device 104 may decrease the confidence threshold, and at this time, the preset first update triggering condition may specifically include:
1. The transmission bandwidth between the first inference device 101 and the second inference device 102 is reduced.
2. The proportion of the input samples sent by the first inference device 101 to the second inference device 102, relative to the total input samples received by the first inference device 101, exceeds the preset upper transmission ratio limit.
The specific implementation manner of the first update triggering condition may refer to the description of relevant parts in the foregoing embodiments, and details are not described here. In practical application, the first update triggering condition may also be implemented in other manners, which is not limited in this embodiment. Moreover, when the user allows to adjust the upper limit of the transmission ratio, the updating device 104 may also adjust the value of the upper limit of the transmission ratio so as to keep the performance of the inference system 100 at a high level when the first update triggering condition is satisfied.
S509: the updating means 104 determine to update the inference model 1 in the first inference means 101 and to update the inference model 2 in the second inference means 102 by detecting the inference system 100.
In a specific implementation, the updating apparatus 104 monitors whether the inference model in the inference system 100 satisfies the second update trigger condition, and when the second update trigger condition is satisfied, the updating apparatus 104 may update the configured inference model 1 according to the second update trigger condition, and further, the updating apparatus 104 may update the configured inference model 2 according to the second update trigger condition. Of course, if the inference model in the inference system 100 does not satisfy the second update trigger condition, the updating means 104 may not update the inference model.
Illustratively, the updating device 104 may adjust the specification of inference model 1 when the amount of resources available to the first inference device 101 for providing inference services changes, or when the load of the first inference device 101 changes. For example, when the amount of available resources of the first inference device 101 decreases or its load increases, the updating device 104 may decrease the specification of inference model 1; and when the amount of available resources increases or the load decreases, the updating device 104 may increase the specification of inference model 1.
Alternatively, the updating device 104 may update inference model 1 and/or inference model 2 by incremental training or retraining when it determines that the inference accuracy of inference model 1 and/or inference model 2 on input samples has decreased, or even that a model has failed. For how the updating device 104 determines that the inference precision of inference model 1 or inference model 2 has decreased, and how it updates the models, reference may be made to the description of the relevant parts of the foregoing embodiments, which is not repeated here.
It should be noted that, in the present embodiment, it is exemplified that the updating device 104 updates the confidence level threshold and the inference model at the same time, and in actual application, the updating device 104 may update only the confidence level threshold or only the inference model, which is not limited in this embodiment.
S510: the first inference means 101 receives the input sample 2.
S511: the first inference device 101 infers the input sample 2 using the updated inference model 1, and outputs an inference result 3 and a confidence 3.
S512: when the confidence 3 is greater than the updated confidence threshold, the first inference device 101 feeds back the inference result 3 to the terminal device 105; and when the confidence 3 is less than the updated confidence threshold, the first inference device 101 requests the decision device 103 to send the input sample 2 to the second inference device 102.
S513: the decision means 103 allow the first inference means 101 to upload the input sample 2 to the second inference means 102, on condition that it is determined that the upper limit of the transmission ratio is not exceeded.
S514: the first inference means 101 sends the input sample 2 to the second inference means 102.
S515: the second inference means 102 infers the input sample 2 using the updated inference model 2 and outputs an inference result 4 (and confidence 4).
S516: the second inference means 102 sends the inference result 4 (and the confidence 4) to the terminal device 105.
In practical applications, the second inference means 102 may send the inference result 4 (and the confidence 4) for the input sample 2 to the terminal device 105 or the like through the first inference means 101.
In the above embodiment, the confidence threshold, inference model 1, and inference model 2 are updated in the interval between inferring the two input samples; in other embodiments, the updating device 104 may also complete the update of the confidence threshold, inference model 1, or inference model 2 while input sample 2 is being inferred.
In the above embodiments, the updating apparatus 104 involved in the inference process for input samples may be software configured on a computer device; running this software on the computer device enables the computer device to implement the functions of the updating apparatus 104. The updating apparatus 104 involved in the process of inferring input samples is described in detail below from the perspective of its implementation as a hardware device.
Fig. 6 shows a computer device. The computer device 600 shown in fig. 6 may be specifically used to implement the functions of the updating apparatus 104 in the embodiment shown in fig. 5.
Computer device 600 includes bus 601, processor 602, communication interface 603, and memory 604. The processor 602, memory 604, and communication interface 603 communicate over a bus 601. The bus 601 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus. The communication interface 603 is used for communication with the outside, such as receiving a data acquisition request transmitted by a terminal.
The processor 602 may be a Central Processing Unit (CPU). The memory 604 may include a volatile memory (volatile memory), such as a Random Access Memory (RAM). The memory 604 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, an HDD, or an SSD.
The memory 604 stores executable code, and the processor 602 executes the executable code to perform the method performed by the updating apparatus 104.
Specifically, in the case where the embodiment shown in fig. 5 is implemented and the updating apparatus 104 described in that embodiment is implemented by software, the software or program code required for performing the functions of the updating apparatus 104 in fig. 5 is stored in the memory 604, the interaction of the updating apparatus 104 with other devices is implemented through the communication interface 603, and the processor 602 is configured to execute the instructions in the memory 604 to implement the method performed by the updating apparatus 104.
In addition, an embodiment of the present application further provides a computer-readable storage medium, in which instructions are stored, and when the instructions are executed on a computer device, the computer device is caused to execute the method executed by the update apparatus 104 according to the above-described embodiment.
In addition, the embodiment of the present application also provides a computer program product, and when the computer program product is executed by a computer, the computer executes any one of the foregoing inference methods. The computer program product may be a software installation package which can be downloaded and executed on a computer in the event that any of the foregoing inference methods need to be used.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is usually preferable. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the method according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Claims (33)

1. An inference system, characterized in that it comprises first inference means, second inference means, updating means and decision means;
the first reasoning device is used for reasoning the input sample by utilizing a first reasoning model;
the decision-making device is used for determining to transmit the input sample to the second reasoning device under the condition that the reasoning result of the first reasoning model for the input sample meets the transmission condition;
the second reasoning device is used for reasoning the input sample by using a second reasoning model, wherein the specification of the first reasoning model is smaller than that of the second reasoning model;
and the updating device is used for updating the transmission condition when the inference system meets a first updating trigger condition.
2. The inference system according to claim 1, wherein the transmission conditions include that a confidence of the inference result is below a confidence threshold;
the updating device is used for updating the confidence coefficient threshold value.
3. The inference system according to claim 2, characterized in that the transmission conditions further comprise that the ratio of input samples sent by the first inference means to the second inference means with respect to the total input samples received by the first inference means does not exceed a transmission ratio ceiling.
4. The inference system according to claim 2 or 3, characterized in that the first update trigger condition comprises at least one of an average inference precision of the first inference model being below a first precision threshold over a first time period, an increase in transmission bandwidth between the first inference means and the second inference means;
the updating means is configured to increase the confidence level threshold.
5. The inference system according to claim 4, wherein the updating means is configured to increase the confidence threshold when an average remaining transmission bandwidth between the first inference apparatus and the second inference apparatus in the first time period is higher than a preset threshold.
6. The inference system according to claim 2 or 3, characterized in that the first update trigger condition comprises at least one of a reduction in transmission bandwidth between the first inference apparatus and the second inference apparatus, and a ratio of input samples sent by the first inference apparatus to the second inference apparatus relative to the total input samples received by the first inference apparatus exceeding an upper transmission ratio limit;
the updating device is used for reducing the confidence coefficient threshold value.
7. The inference system according to any of claims 1 to 6, characterized in that the updating means are further configured to update the first inference model and/or the second inference model when a second update triggering condition is met.
8. The inference system according to claim 7, characterized in that said updating means is configured to update said first inference model when the average inference precision of said first inference model in a first time period is lower than a first precision threshold and the remaining transmission bandwidth between said first inference means and said second inference means is lower than a preset threshold; and/or to update the second inference model when the average inference precision of the second inference model in the first time period is lower than a second precision threshold.
9. The inference system according to claim 7 or 8, characterized in that said updating means is configured to obtain incremental training samples; and to perform incremental training on the first inference model and/or the second inference model using the incremental training samples.
10. The inference system according to any of claims 7 to 9, characterized in that said updating means is configured to determine the amount of available resources of the first inference means during a second time period; and to update the specification of the first inference model according to the amount of available resources of the first inference means in the second time period.
11. An inference method, applied to an updating apparatus in an inference system, the inference system further including a first inference apparatus, a second inference apparatus and a decision-making apparatus, the method comprising:
the updating device acquires resource information and/or reasoning results of the reasoning system, wherein the reasoning results comprise the result of reasoning the first reasoning device on the input sample by using a first reasoning model, and the input sample is transmitted to the second reasoning device when the result of reasoning the input sample by the first reasoning model meets the transmission condition in the decision device;
the updating device determines that the reasoning system meets a first updating triggering condition according to the resource information of the reasoning system and/or the reasoning result of the reasoning system;
the updating means updates the transmission condition.
12. The method of claim 11, wherein the transmission condition comprises that the confidence of the inference result is below a confidence threshold, and wherein the updating means updates the transmission condition comprising:
the updating means updates the confidence threshold.
13. The method of claim 12, wherein the transmission conditions further include that a ratio of input samples transmitted to the second inference apparatus relative to the total input samples received by the first inference apparatus does not exceed a transmission ratio upper limit.
14. The method according to claim 12 or 13, wherein the first update triggering condition includes at least one of an average inference precision of the first inference model being below a first precision threshold over a first time period, an increase in transmission bandwidth between the first inference apparatus and the second inference apparatus;
the updating means updates the transmission condition, including:
the updating means increases the confidence threshold.
15. The method of claim 14, wherein the updating means updates the transmission condition, comprising:
and when the average residual transmission bandwidth between the first inference device and the second inference device in the first time period is higher than a preset threshold value, increasing the confidence threshold value.
16. The method according to claim 12 or 13, wherein the first update trigger condition comprises at least one of a reduction in transmission bandwidth between the first inference device and the second inference device, and a ratio of input samples transmitted to the second inference device relative to the total input samples received by the first inference device exceeding a transmission ratio upper limit;
the updating means updates the transmission condition, including:
the updating means reduces the confidence threshold.
17. The method according to any one of claims 11 to 16, further comprising:
the updating means updates the first inference model and/or the second inference model when a second update triggering condition is satisfied.
18. The method according to claim 17, wherein said updating means updates said first inference model and/or said second inference model when a second update triggering condition is met, comprising:
when the average inference precision of the first inference model in a first time period is lower than a first precision threshold value and the residual transmission bandwidth between the first inference device and the second inference device is lower than a preset threshold value, the updating device updates the first inference model; and/or when the average inference precision of the second inference model in the first time period is lower than a second precision threshold value, the updating device updates the second inference model.
19. Method according to claim 17 or 18, wherein said updating means updates said first inference model and/or said second inference model, comprising:
the updating device acquires an incremental training sample;
and the updating device utilizes the incremental training samples to carry out incremental training on the first inference model and/or the second inference model.
20. The method according to any of claims 17 to 19, wherein said updating means updates said first inference model, comprising:
the updating device determines the resource amount of the available resources of the first reasoning device in a second time period;
the updating means updates the specification of the first inference model according to the resource amount of the available resource of the first inference means in the second time period.
21. An updating apparatus, applied to an inference system, the inference system further including a first inference apparatus, a second inference apparatus and a decision-making apparatus, the updating apparatus comprising:
the acquisition module is used for acquiring resource information and/or reasoning results of the reasoning system, wherein the reasoning results comprise the result of reasoning the input sample by the first reasoning device by using a first reasoning model, and when the result of reasoning the input sample by the first reasoning model meets the transmission condition in the decision device, the input sample is transmitted to the second reasoning device;
the monitoring module is used for determining that the reasoning system meets a first updating triggering condition according to the resource information of the reasoning system and/or the reasoning result of the reasoning system;
and the updating module is used for updating the transmission condition.
22. The updating apparatus according to claim 21, wherein the transmission condition comprises that the confidence of the inference result is below a confidence threshold, and the updating module is specifically configured to update the confidence threshold.
23. The updating apparatus of claim 22, wherein the transmission conditions further include that a ratio of input samples transmitted to the second inference apparatus relative to the total input samples received by the first inference apparatus does not exceed a transmission ratio upper limit.
24. The updating apparatus according to claim 22 or 23, wherein the first updating trigger condition comprises at least one of an average inference precision of the first inference model being lower than a first precision threshold value in a first time period, and an increase of a transmission bandwidth between the first inference apparatus and the second inference apparatus;
the update module is specifically configured to increase the confidence threshold.
25. The updating apparatus according to claim 24, wherein the updating module is configured to increase the confidence threshold when an average remaining transmission bandwidth between the first inference apparatus and the second inference apparatus in the first time period is higher than a preset threshold.
26. The updating apparatus according to claim 22 or 23, wherein the first updating trigger condition comprises at least one of a reduction in transmission bandwidth between the first inference apparatus and the second inference apparatus, a ratio of input samples transmitted to the second inference apparatus relative to total input samples received by the first inference apparatus exceeding a transmission ratio upper limit;
the update module is specifically configured to reduce the confidence threshold.
27. The updating apparatus according to any of claims 21 to 26, wherein the updating module is further configured to update the first inference model and/or the second inference model when a second update triggering condition is satisfied.
28. The updating apparatus according to claim 27, wherein the updating module is configured to update the first inference model when the average inference precision of the first inference model in a first time period is lower than a first precision threshold and the remaining transmission bandwidth between the first inference apparatus and the second inference apparatus is lower than a preset threshold; and/or updating the second inference model when the average inference precision of the second inference model in the first time period is lower than a second precision threshold value.
29. The updating apparatus according to claim 27 or 28, wherein the updating module is configured to obtain incremental training samples; and carrying out incremental training on the first inference model and/or the second inference model by using the incremental training sample.
30. The updating apparatus according to any one of claims 27 to 29, wherein the updating module is configured to determine a resource amount of the available resource of the first inference apparatus in a second time period; and updating the specification of the first inference model according to the resource quantity of the available resources of the first inference device in the second time period.
31. A computer device, wherein the computer device comprises a processor and a memory;
the processor is to execute instructions stored in the memory to cause the computer device to perform the method of any of claims 11 to 20.
32. A computer-readable storage medium having stored therein instructions that, when executed on a computing device, cause the computing device to perform the method of any of claims 11 to 20.
33. A computer program product comprising instructions which, when run on a computing device, cause the computing device to perform the method of any of claims 11 to 20.
CN202111071022.3A 2021-05-12 2021-09-13 Inference system, method, device and related equipment Pending CN115345305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/088086 WO2022237484A1 (en) 2021-05-12 2022-04-21 Inference system and method, apparatus, and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110517626 2021-05-12
CN202110517626X 2021-05-12

Publications (1)

Publication Number Publication Date
CN115345305A true CN115345305A (en) 2022-11-15

Family

ID=83977684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071022.3A Pending CN115345305A (en) 2021-05-12 2021-09-13 Inference system, method, device and related equipment

Country Status (2)

Country Link
CN (1) CN115345305A (en)
WO (1) WO2022237484A1 (en)

Also Published As

Publication number Publication date
WO2022237484A1 (en) 2022-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination