CN117453377A - Model scheduling method, terminal equipment and server - Google Patents


Info

Publication number
CN117453377A
Authority
CN
China
Prior art keywords
target model
model
server
terminal equipment
request
Prior art date
Legal status
Granted
Application number
CN202311768543.3A
Other languages
Chinese (zh)
Other versions
CN117453377B (en)
Inventor
徐士立
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311768543.3A
Publication of CN117453377A
Application granted
Publication of CN117453377B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5072: Grid computing

Abstract

The embodiments of the present application provide a model scheduling method, a terminal device and a server. The method relates to the technical field of model scheduling, is applicable to the terminal device, and includes the following steps: receiving an operation request sent by a target object, and sending a parameter request to a server in response to the operation request; receiving a parameter response message sent by the server, where the parameter response message includes the model parameters; determining, based on the model parameters, whether the terminal device meets the operation condition for running the target model on the terminal device; if the operation condition is met, acquiring the target model and running the target model on the terminal device; and if the operation condition is not met, sending a first scheduling request to the server and receiving an operation result sent by the server for the target model, where the first scheduling request is used for requesting the server to schedule a computing node of the cloud to run the target model. The method can save cloud computing resources on the basis of guaranteeing user experience.

Description

Model scheduling method, terminal equipment and server
Technical Field
The present disclosure relates to the field of model scheduling technologies, and more particularly, to a model scheduling method, a terminal device, and a server.
Background
In general, a large model runs in the cloud, and running the large model in the cloud consumes a large amount of computing resources, which is costly; moreover, when the traffic is heavy, the run may not be completed on time due to insufficient resources, which affects the user experience.
Therefore, there is a need in the art for a model scheduling method to save cloud computing resources on the basis of guaranteeing user experience.
Disclosure of Invention
The embodiment of the application provides a model scheduling method, terminal equipment and a server, which can save cloud computing resources on the basis of guaranteeing user experience.
In a first aspect, an embodiment of the present application provides a method for scheduling a model, where the method is applicable to a terminal device, and the method includes:
receiving an operation request sent by a target object, and responding to the operation request to send a parameter request to a server; the running request is used for requesting to run a target model, and the parameter request is used for requesting model parameters of the target model;
receiving a parameter response message sent by the server, wherein the parameter response message comprises the model parameter;
determining whether the terminal device meets the running condition of the target model running on the terminal device based on the model parameters;
If the operation condition is met, acquiring the target model and operating the target model on the terminal equipment;
if the operation condition is not met, a first scheduling request is sent to the server, and an operation result sent by the server aiming at the target model is received; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
In a second aspect, an embodiment of the present application provides a model scheduling method, where the method is applicable to a server, and the method includes:
receiving a parameter request sent by a terminal device, and responding to the parameter request to send a parameter response message to the terminal device, wherein the parameter request is used for requesting a model parameter of a target model, and the parameter response message comprises the model parameter; the model parameters are used for the terminal equipment to determine whether the terminal equipment meets the operation conditions of the target model operated on the terminal equipment;
receiving a first scheduling request sent by the terminal equipment in response to the terminal equipment not meeting the operation condition;
determining a first computing node for running the target model in the computing nodes of the cloud in response to the first scheduling request;
the method comprises the steps of sending the target model and input parameters of the target model to the first computing node, receiving an operation result of the first computing node aiming at the target model, and forwarding the operation result to the terminal equipment.
In a third aspect, an embodiment of the present application provides a terminal device, including:
the communication unit is used for receiving the operation request sent by the target object and responding to the operation request to send a parameter request to the server; the running request is used for requesting to run a target model, and the parameter request is used for requesting model parameters of the target model;
the communication unit is further configured to receive a parameter response message sent by the server, where the parameter response message includes the model parameter;
the processing unit is used for determining whether the terminal equipment meets the running condition of the target model running on the terminal equipment or not based on the model parameters;
if the operation condition is met, the processing unit is further used for acquiring the target model and operating the target model on the terminal equipment;
if the operation condition is not met, the communication unit is further used for sending a first scheduling request to the server and receiving an operation result sent by the server aiming at the target model; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
In a fourth aspect, embodiments of the present application provide a server, including:
the communication unit is used for receiving a parameter request sent by the terminal equipment and responding to the parameter request to send a parameter response message to the terminal equipment, wherein the parameter request is used for requesting a model parameter of a target model, and the parameter response message comprises the model parameter; the model parameters are used for the terminal equipment to determine whether the terminal equipment meets the operation conditions of the target model operated on the terminal equipment;
The communication unit is further configured to receive a first scheduling request sent by the terminal device in response to the terminal device not meeting the operation condition;
the processing unit is used for responding to the first scheduling request and determining a first computing node for running the target model from the computing nodes in the cloud;
the communication unit is also for: the method comprises the steps of sending the target model and input parameters of the target model to the first computing node, receiving an operation result of the first computing node aiming at the target model, and forwarding the operation result to the terminal equipment.
In a fifth aspect, embodiments of the present application provide an electronic device, including:
a processor adapted to implement computer instructions; the method comprises the steps of,
a computer readable storage medium storing computer instructions adapted to be loaded by a processor and to perform the method provided in the first or second aspect referred to above.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when read and executed by a processor of a computer device, cause the computer device to perform the method provided by the first or second aspects referred to above.
In a seventh aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, the processor executes the computer instructions, causing the computer device to perform the method provided in the first or second aspect referred to above.
For the model scheduling method provided in the first aspect, the method includes: receiving an operation request sent by a target object, and sending a parameter request to a server in response to the operation request, where the operation request is used for requesting to run a target model and the parameter request is used for requesting model parameters of the target model; receiving a parameter response message sent by the server, where the parameter response message includes the model parameters; determining, based on the model parameters, whether the terminal device meets the operation condition for running the target model on the terminal device; if the operation condition is met, acquiring the target model and running the target model on the terminal device; and if the operation condition is not met, sending a first scheduling request to the server and receiving an operation result sent by the server for the target model, where the first scheduling request is used for requesting the server to schedule a computing node of the cloud to run the target model. In other words, when the target model needs to be run, the model parameters are first requested from the server, and whether the terminal device meets the operation condition is determined based on the model parameters; if the terminal device meets the operation condition, the target model is preferentially run on the terminal device, that is, the terminal device acquires the target model and runs it locally; if the terminal device does not meet the operation condition, the first scheduling request is sent to the server and the operation result sent by the server for the target model is received. In this way, the computing resources of the terminal device can be fully utilized, the computing resources of the cloud are saved, and the overall operation cost is reduced; moreover, when the terminal device does not meet the operation condition, the normal running of the target model can still be ensured through the computing node of the cloud, so that the user experience is not affected. Therefore, cloud computing resources can be saved on the basis of guaranteeing user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an example of an application scenario provided in an embodiment of the present application.
Fig. 2 is a schematic flowchart of a model scheduling method provided in an embodiment of the present application.
Fig. 3 is another schematic flow chart of a model scheduling method provided in an embodiment of the present application.
Fig. 4 is another schematic flow chart of a model scheduling method provided in an embodiment of the present application.
Fig. 5 is a schematic flowchart of a method for detecting a state of a terminal device according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a terminal device provided in an embodiment of the present application.
Fig. 7 is a schematic block diagram of a server provided in an embodiment of the present application.
Fig. 8 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
In order to facilitate understanding of the technical solutions provided in the present application, the following description of related terms is provided.
Computing resources: hardware and software resources capable of providing computing power include hardware resources such as processors, memories, storage devices, and software resources such as operating systems, compilers, and the like.
Computing power: refers to the ability of a computer system or application to perform a computing task, and may be measured in instructions or floating point operations that can be performed per second.
Calculated amount: refers to the number of computations that a computer system or application requires to perform a computing task.
It is noted that the terminology used in the description section of the present application is used for the purpose of explaining the examples of the present application only and is not intended to limit the present application.
For example, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The term "at least one item" describes a combination of the enumerated objects and indicates that one or more of them may be present; for example, "at least one of A, B and C" may represent: A alone, B alone, C alone, A and B, A and C, B and C, or A, B and C together. The term "plurality" refers to two or more. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
For another example, the term "corresponding" may indicate a direct or indirect correspondence between two objects, an association between them, or a relationship of indicating and being indicated, configuring and being configured, and the like. The term "indication" may be a direct indication, an indirect indication, or an indication of an association relationship. For example, "A indicates B" may mean that A indicates B directly, e.g., B may be obtained from A; that A indicates B indirectly, e.g., A indicates C and B may be obtained from C; or that there is an association between A and B. The terms "predefined" or "preconfigured" may refer to corresponding codes, tables or other information usable for indication being pre-stored in the device, or may refer to what is agreed in a protocol, where "protocol" may refer to a standard protocol in the art. The term "at ..." may be interpreted as "if", "when ..." or "in response to"; similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined", "in response to determining", "when detecting (the stated condition or event)" or "in response to detecting (the stated condition or event)", depending on the context. The terms "first", "second", "third", "fourth", "A", "B" and the like are used to distinguish different objects rather than to describe a particular order. The terms "comprising", "including" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion.
The solution provided in the present application may relate to the field of artificial intelligence (Artificial Intelligence, AI) technology.
AI is a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
It should be appreciated that artificial intelligence techniques are a comprehensive discipline involving a wide range of fields, both hardware-level and software-level techniques. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
With research and advancement of artificial intelligence technology, research and application of artificial intelligence technology is being developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, automatic driving, unmanned aerial vehicles, robots, smart medical treatment, smart customer service, etc., and it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and with increasing importance value.
In particular, embodiments of the present application may relate to Machine Learning (ML) in artificial intelligence technology, where ML is a multi-domain interdisciplinary discipline involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity, and the like. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, and the like.
More particularly, the solution provided herein may relate to the technical field of scheduling ML-based models (e.g., large models).
In addition, the solution provided in the present application may relate to cloud technology.
Specifically, embodiments of the present application may relate to the field of cloud computing (cloud computing) technology in cloud technology. With the development of the internet, real-time data flow and diversification of connected devices, and the promotion of demands of search services, social networks, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Unlike the previous parallel distributed computing, the generation of cloud computing will promote the revolutionary transformation of the whole internet mode and enterprise management mode in concept. Cloud technology refers to the delivery and usage modes of IT infrastructure, meaning that required resources are obtained in an on-demand and easily-extensible manner through a network; broad cloud computing may refer to modes of delivery and use of services, meaning that desired services are obtained in an on-demand, easily scalable manner over a network. Such services may be IT, software, internet related, or other services. Cloud Computing is a product of fusion of traditional computer and network technology developments such as Grid Computing (Grid Computing), distributed Computing (Distributed Computing), parallel Computing (Parallel Computing), utility Computing (Utility Computing), network storage (Network Storage Technologies), virtualization (Virtualization), load balancing (Load balancing), and the like.
More specifically, the solution provided in the present application may relate to the technical field of running models by means of cloud computing.
Fig. 1 is an example of an application scenario 100 provided in an embodiment of the present application.
As shown in fig. 1, the application scenario 100 may include: application 110, server 120, terminal device 130, and computing node 140. Terminal device 130 may communicate with server 120 through application 110. For example, the application 110 and the server 120 may communicate over a network. The server 120 may be a background server for the application 110. The computing node 140 may be a computing node on the cloud end.
The application 110 may be an application installed on the terminal device 130, and may be any application that needs to run based on a model, for example, including but not limited to: any one of an online video program, a short video program, a picture sharing program, a sound social program, a cartoon program, a wallpaper program, a news pushing program, a supply and demand information pushing program, an academic communication program, a technical communication program, a policy communication program, a program containing a comment mechanism, a program containing a view publishing mechanism, and a knowledge sharing program. The server 120 includes at least one of a single server, a plurality of servers, a cloud server, a cloud computing platform and a virtualization center, and the server 120 is configured to provide background services for the application 110. The terminal device 130 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an electronic book reader, and a laptop portable computer. The computing node 140 may be a device with cloud computing capability, such as a cloud server or a cloud computing platform.
In particular, the application scenario may be applied to a method for scheduling a model (e.g., a computing task of a model). When the application 110 is running and has a demand for running a model, the application 110 first determines whether the computing resources of the terminal device 130 can meet the demand; if so, it obtains the latest model from the server 120 and collects the raw data required by the model. The terminal device 130 then starts running the model and detects the running state in real time during the process. When an abnormal situation occurs, the application 110 determines whether the run needs to be rescheduled to the computing node 140 of the cloud; if so, the terminal device 130 saves the current running breakpoint, which is sent to the server 120 through the application 110. After receiving the rescheduling request, the server 120 confirms whether there are idle computing resource nodes, selects an appropriate node (i.e., the computing node 140), and sends the model and the breakpoint information to it. After receiving the scheduling request of the server 120, the computing node 140 loads the corresponding model and breakpoint information (if any), starts the run, and returns the running result to the server 120. The server 120 then sends the result back to the application 110, which finally presents it to the user. The user does not need to care whether the model runs on the terminal device 130 or on the computing node 140 of the cloud; the application scenario 100 completes the scheduling automatically and gives the user the final result within the expected time frame.
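As a rough illustration of this flow, the following self-contained Python sketch mirrors the decision points described above. All identifiers (handle_run_request, run_locally, cloud_schedule) and the numeric values are illustrative assumptions, not something defined by the patent.

    # Minimal sketch of the application-side flow described above; every identifier and
    # value here is an illustrative assumption.

    def run_locally(model: str, data: bytes) -> str:
        # Placeholder for on-device execution; assumed to raise RuntimeError if the
        # running state of the device becomes abnormal.
        return f"result of {model} computed on the terminal device"

    def cloud_schedule(model: str, data: bytes, breakpoint_blob: bytes | None = None) -> str:
        # Placeholder for asking the server to schedule a cloud computing node,
        # optionally resuming from a saved breakpoint.
        return f"result of {model} computed on a cloud node"

    def handle_run_request(model: str, data: bytes) -> str:
        required_ops = 5e9            # first calculation amount (assumed)
        device_ops_per_sec = 4e9      # remaining computing capability of the device (assumed)
        time_limit_s = 2.0            # operation duration limit threshold (assumed)

        if required_ops / device_ops_per_sec > time_limit_s:
            return cloud_schedule(model, data)                 # first scheduling request
        try:
            return run_locally(model, data)
        except RuntimeError:
            # Abnormal running state: save a breakpoint and hand the run off to the cloud.
            return cloud_schedule(model, data, b"serialized breakpoint")  # second scheduling request

    print(handle_run_request("target-model", b"raw input data"))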
The model scheduling method will be described below based on the respective modules of the application 110, the server 120, the terminal device 130, and the computing node 140.
Specifically, the application 110 may include the following modules:
The system state detection module: before and during the running of the model, the system state detection module may be configured to obtain the state of the operation module in the terminal device 130 in real time, for use by the main logic module and the model scheduling module in the application 110.
Model management module: when a model needs to be run, the model management module acquires the latest model. In addition, the model management module is further configured to cache all historically used models; when a model is needed, it checks whether the server 120 has an update, uses the cached model directly if there is no update, and requests the latest model from the server 120 if there is an update.
The model scheduling module is used for confirming the computing resources and timeliness requirements of the model when there is a demand for running it; if it confirms that the terminal device 130 can meet them, it prepares the relevant model and data and starts the run on the terminal device 130; otherwise, it sends the server 120 a scheduling request for execution on a computing node of the cloud. While the model is running on the terminal device 130, the model scheduling module is further configured to receive a rescheduling request from the main logic module in the application 110, send the execution breakpoint information of the terminal device 130 to the server 120, and have the run rescheduled by the server 120 to the cloud.
The main logic module is used for controlling the main logic of running the model on the terminal device 130, including communicating with the server 120 and managing the detection, running and scheduling of the model on the terminal device 130.
The terminal device 130 may include the following modules:
The state management module is configured to obtain the current running state of the terminal device 130 and return it to the application 110; when the operation of the terminal device 130 is abnormal, it notifies the operation module to terminate the run and notifies the application 110 through the interaction module.
The interaction module is used for receiving requests from the application 110 and maintaining a long-lived connection with it, receiving model scheduling, state detection and run termination requests from the application 110; when an exception occurs in the operation module or the terminal device 130, the application 110 is notified by callback.
The operation module can be used for running the model according to the model and related raw data acquired by the interaction module and outputting the computation result; in addition, when the state management module detects an abnormality, the operation module saves the breakpoint information of the current run according to the instruction of the state management module and terminates the run.
The server 120 may include the following modules:
the model management module is used for managing the models and returning the latest models according to the request of the application program 110.
The model scheduling module selects an appropriate computing node (i.e., the computing node 140) to execute the model run according to the request of the application 110, or continues the model run according to the breakpoint information uploaded by the application 110.
The communication module can be used for communicating with the application program 110, receiving a model scheduling (including breakpoint rescheduling) request and a model acquisition request, and calling the corresponding module to reply to the application program 110 after finishing processing; in addition, it is also used to communicate with the computing node 140, obtain the state of the computing node 140, schedule the related model computation to the computing node 140 according to the request of the scheduling module, and obtain the computation result and return to the application program 110.
The computing node 140 may include the following modules:
a logic module for detecting the status of the computing node 140 (mainly real-time available computing resources) and reporting to the server 120; in addition, it may be used to receive queries and model operation scheduling requests from the server 120.
The operation module can be used for executing corresponding model operation according to the instruction of the logic module and the issued data.
It should be understood that fig. 1 is only an example of the present application and should not be construed as limiting the present application. Since the application 110 may be an application installed on the terminal device 130, in other alternative embodiments, the terminal device 130 may also include the application 110, or the application 110 may be a module in the terminal device 130.
Fig. 2 shows a schematic flow diagram of a model scheduling method 200 according to an embodiment of the present application, which method 200 may be performed by any electronic device having data processing capabilities. For example, the electronic device may be implemented as a terminal device. For example, the electronic device may be implemented as the terminal device 130 shown in fig. 1.
As shown in fig. 2, the method 200 may include:
s210, the terminal equipment receives an operation request sent by a target object, and sends a parameter request to a server in response to the operation request; the run request is for requesting to run a target model, and the parameter request is for requesting model parameters of the target model.
The operation request may be, for example, a request triggered through a target application: the terminal device may generate the operation request in response to an operation performed by the target object on the target application and send the parameter request to the server. The operation may be, for example, clicking a button or an input operation. The operation request may include an identification of the target model, such as the name of the target model. Accordingly, the parameter request may also include the identification of the target model. The model parameters may be parameters related to the running of the target model.
Exemplary description of the model parameters is described below in connection with table 1.
TABLE 1
As shown in Table 1, after receiving the parameter request, the server obtains the corresponding parameters, according to the information in the request, from a database or other data source storing model parameters, and returns them to the terminal device. The model parameters may include one or more of the fields in Table 1, where "string" and "int" are two different data types, representing a character string and an integer, respectively. "string" is a data type used to represent text and may contain letters, digits, symbols and other characters; a string may be static or dynamic and may contain multiple characters or text paragraphs, and in programming strings are typically used to store and process text data. "int" is a data type used to represent an integer; it contains only digits, without a decimal point, and the integer may be positive, negative or zero. In programming, integers are commonly used for counting, comparison and mathematical operations.
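For illustration only, such a parameter response could be represented on the terminal side roughly as below. The field names are assumptions inferred from the parameters discussed in this description (operation duration limit, supported device types, calculation amount and its type); they are not the actual fields of Table 1.

    # Hedged sketch of a model-parameter record; field names and types are assumptions
    # based on the parameters discussed in the text, not the actual contents of Table 1.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class ModelParameters:
        model_name: str                               # "string": identifies the target model
        run_duration_limit_s: int                     # "int": operation duration limit threshold
        supported_device_types: list[str] = field(default_factory=list)
        supported_os_types: list[str] = field(default_factory=list)
        calc_amount_type: str = "fixed"               # "fixed" or "dynamic"
        fixed_calc_amount: Optional[int] = None       # carried directly when the amount is fixed
        ops_per_byte: Optional[int] = None            # "a" in N = L * a + b, when dynamic
        base_ops: Optional[int] = None                # "b" in N = L * a + b, when dynamic

    params = ModelParameters(
        model_name="target-model",
        run_duration_limit_s=2,
        supported_device_types=["smartphone", "tablet"],
        supported_os_types=["android", "ios"],
        calc_amount_type="dynamic",
        ops_per_byte=40,
        base_ops=1_000_000,
    )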
S220, the terminal equipment receives a parameter response message sent by the server, wherein the parameter response message comprises the model parameters.
In other words, the server receives the parameter request sent by the terminal device, and sends a parameter response message to the terminal device in response to the parameter request. The parameter response message includes the model parameters of the target model. The model parameters included in the parameter response message may be predefined, or may be the parameters indicated in the parameter request, which is not specifically limited in this application.
S230, the terminal equipment determines whether the terminal equipment meets the operation condition of the target model operated on the terminal equipment or not based on the model parameters.
The operating condition may be, for example, a condition related to the model parameter. For example, when the model parameter includes an operation duration limit threshold, the operation condition may be that a total duration required for the target model to operate on the terminal device is less than or equal to the operation duration limit threshold.
For example, after the terminal device receives the parameter response message, it is determined whether the terminal device meets the operating condition of the target model running on the terminal device based on the model parameters in the parameter response message. The parameter response message may include the operating condition, i.e. the parameter response message may include the model parameters as well as the operating condition. Alternatively, the parameter response message may only include the model parameter, in which case the operating condition may be a predefined condition, for example, an operating condition stored in the terminal device in relation to the model parameter.
S240, if the operation condition is met, the terminal equipment acquires the target model and operates the target model on the terminal equipment.
For example, if the operation condition is met, the terminal device acquires the target model, runs the target model on the terminal device and obtains the running result of the target model. This process may include loading the target model, collecting and setting the input parameters of the target model, starting the run, and so on; the specific steps depend on the implementation and requirements of the target model.
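A minimal Python sketch of these local run steps is given below; load_model, collect_inputs and the trivial stand-in model are illustrative assumptions.

    # Sketch of running the target model locally; all names are assumptions.

    def load_model(model_name: str):
        # Placeholder: in practice the model obtained from the server would be loaded here.
        return lambda inputs: sum(inputs)          # trivial stand-in "model"

    def collect_inputs() -> list[int]:
        # Placeholder for collecting the raw data the target model needs.
        return [1, 2, 3]

    def run_target_model_locally(model_name: str) -> int:
        model = load_model(model_name)             # load the target model
        inputs = collect_inputs()                  # collect and set the input parameters
        return model(inputs)                       # start the run and return the result

    print(run_target_model_locally("target-model"))  # 6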
S250, if the operation condition is not met, the terminal equipment sends a first scheduling request to the server and receives an operation result sent by the server aiming at the target model; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
For example, if the terminal device does not meet the operation condition, the terminal device is triggered to send a first scheduling request to the server, and the server is requested to schedule the computing node of the cloud to operate the target model. The first scheduling request may include an identification of the model, such as a name of the target model. Of course, the first scheduling request may further include an input parameter of the target model, so that after the server sends the input parameter to a computing node in the cloud, the computing node runs the target model based on the input parameter.
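As a hedged example, a first scheduling request might be carried in a message such as the following; the field names are assumptions, and no particular message format is prescribed by this description.

    import json

    # Hedged example of a first scheduling request body; the field names are assumptions.
    first_scheduling_request = {
        "type": "first_scheduling_request",
        "model_name": "target-model",               # identification of the target model
        "input_parameters": {"text": "raw input"},  # optionally forwarded to the cloud node
    }
    print(json.dumps(first_scheduling_request))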
In this embodiment, an operation request sent by a target object is received, and a parameter request is sent to the server in response to the operation request, where the operation request is used for requesting to run a target model and the parameter request is used for requesting the model parameters of the target model; a parameter response message sent by the server is received, where the parameter response message includes the model parameters; whether the terminal device meets the operation condition for running the target model on the terminal device is determined based on the model parameters; if the operation condition is met, the target model is acquired and run on the terminal device; and if the operation condition is not met, a first scheduling request is sent to the server and an operation result sent by the server for the target model is received, where the first scheduling request is used for requesting the server to schedule a computing node of the cloud to run the target model. In other words, when the target model needs to be run, the model parameters are first requested from the server, and whether the terminal device meets the operation condition is determined based on the model parameters; if the terminal device meets the operation condition, the target model is preferentially run on the terminal device, that is, the terminal device acquires the target model and runs it locally; if the terminal device does not meet the operation condition, the first scheduling request is sent to the server and the operation result sent by the server for the target model is received. In this way, the computing resources of the terminal device can be fully utilized, the computing resources of the cloud are saved, and the overall operation cost is reduced; moreover, when the terminal device does not meet the operation condition, the normal running of the target model can still be ensured through the computing node of the cloud, so that the user experience is not affected. Therefore, cloud computing resources can be saved on the basis of guaranteeing user experience.
In some embodiments, the model parameters include an operation duration limit threshold of the target model; in this case, S230 may include:
the terminal device determines a first duration required for the target model to run on the terminal device; if the first duration is less than or equal to the operation duration limit threshold, it determines that the operation condition is met; and if the first duration is greater than the operation duration limit threshold, it determines that the operation condition is not met.
The terminal device first determines the first duration, i.e., the time required for the target model to run on the terminal device, based on information such as the characteristics of the target model and its input parameters. The first duration may be affected by a plurality of factors, such as the remaining first computing capability of the terminal device, the complexity of the target model, the first calculation amount required for running the target model, and the input parameters of the target model (i.e., the data processed by the target model). The terminal device then compares the estimated first duration with the operation duration limit threshold. If the first duration is less than or equal to the operation duration limit threshold, the terminal device is able to complete the run of the target model within the given time, so it determines that the operation condition is met and may run the target model on the terminal device. If the first duration is greater than the operation duration limit threshold, the terminal device may not be able to complete the run of the target model within the given time, so it determines that the operation condition is not met and does not run the target model on the terminal device.
In this embodiment, the terminal device determines the first duration required for the target model to run on the terminal device; if the first duration is less than or equal to the operation duration limit threshold, it determines that the operation condition is met; and if the first duration is greater than the operation duration limit threshold, it determines that the operation condition is not met. Introducing the first duration helps ensure that the run of the target model does not exceed the required time, avoiding the situation where the running result is obtained too late because the target model runs for too long; that is, the timeliness of the running result of the target model can be ensured, which improves the user experience.
Of course, in other alternative embodiments, whether the operation condition is met may also be determined in other ways. For example, the terminal device determines the first calculation amount required for the target model to run on the terminal device; if the first calculation amount is less than or equal to a preset calculation threshold, it determines that the operation condition is met and runs the target model on the terminal device; if the first calculation amount is greater than the preset calculation threshold, it determines that the operation condition is not met. The present application is not specifically limited in this respect.
In some embodiments, the model parameters further include device types supported by the target model; wherein, the determining, by the terminal device, the first time length required by the target model when running on the terminal device may be implemented as:
if the device types supported by the target model include the device type of the terminal device, the terminal device determines the first duration.
For example, when the model parameters further include the device types supported by the target model, the terminal device may first check, when determining the first duration, whether the device types supported by the target model include its own device type. Only if they do does the terminal device determine the first duration and further determine, based on the first duration, whether it meets the operation condition for running the target model on the terminal device; otherwise, it may directly determine that it does not meet the operation condition. For example, the device types supported by the target model may be given as a device type list; the terminal device may check whether its own device type is in the list, and if so, determine the first duration and further determine whether the operation condition is met; otherwise, it may directly determine that the operation condition is not met.
In this embodiment, only if the device type supported by the target model includes the device type of the terminal device, the terminal device determines the first duration, and further determines whether the terminal device meets an operation condition of the target model for operating on the terminal device based on the first duration, so that the target model can be ensured to operate only on the device supporting the operation of the target model, the problems of low operation efficiency or operation error and the like of the target model caused by incompatibility of the terminal device and the target model are avoided, and the operation efficiency and stability of the target model are improved.
Of course, in other alternative embodiments, the model parameters may also include the operating system types supported by the target model; in this case, the determining, by the terminal device, of the first duration required for the target model to run on the terminal device may be implemented as: if the operating system types supported by the target model include the operating system type of the terminal device, the terminal device determines the first duration. The present application is not specifically limited in this respect.
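The compatibility checks described in the last two embodiments can be sketched as follows; the parameter layout and field names are assumptions.

    # Sketch of the device-type / operating-system compatibility check;
    # the parameter layout and field names are assumptions.
    def is_compatible(model_params: dict, device_type: str, os_type: str) -> bool:
        return (device_type in model_params.get("supported_device_types", [])
                and os_type in model_params.get("supported_os_types", []))

    params = {"supported_device_types": ["smartphone", "tablet"],
              "supported_os_types": ["android", "ios"]}
    print(is_compatible(params, device_type="smartphone", os_type="android"))  # True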
In some embodiments, the determining, by the terminal device, the first time period required for the target model to run on the terminal device may be implemented as:
The terminal device acquires its remaining first computing capability; acquires the first calculation amount required for running the target model; and calculates the first duration based on the first computing capability and the first calculation amount.
The first computing capability may reflect performance states of the terminal device, such as remaining battery power, processor occupancy, etc., for example. The first calculation amount may be estimated based on the complexity of the model and the magnitude of the input parameters of the target model.
The first computing capability may be used to characterize the number of operations the terminal device can perform per second. After the terminal device obtains the first computing capability and the first calculation amount, the first duration may be calculated based on them, for example by dividing the first calculation amount by the first computing capability, or by using another algorithm.
In this embodiment, the first duration is calculated based on the first computing capability and the first computing amount, so that accuracy of the first duration can be ensured, and further, accuracy of a decision of whether to operate the target model on the terminal device can be improved, which is helpful for simultaneously considering a purpose of saving cloud computing resources and a purpose of ensuring operation of the target model, and further, user experience can be ensured.
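In code, the estimate described here reduces to a single division followed by a comparison with the operation duration limit threshold; the sketch below is illustrative only and the values are assumed.

    # Sketch: estimate the first duration from the remaining computing capability
    # (operations per second) and the first calculation amount (operations); values assumed.
    def estimate_first_duration(first_calc_amount: float, first_computing_capability: float) -> float:
        return first_calc_amount / first_computing_capability

    def meets_operation_condition(first_duration: float, duration_limit: float) -> bool:
        return first_duration <= duration_limit

    duration = estimate_first_duration(first_calc_amount=5e9, first_computing_capability=4e9)
    print(duration, meets_operation_condition(duration, duration_limit=2.0))  # 1.25 True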
In some embodiments, the model parameters further comprise the first calculation amount or a calculation mode of the first calculation amount; when the calculation mode of the first calculation amount is the first calculation mode, the terminal device obtains the first calculation amount required by the operation of the target model, which can be implemented as follows:
determining the input parameters of the target model; multiplying the byte length of the input parameters by the number of operations required per byte to obtain a first value; and adding the first value to a preset second value to obtain the first calculation amount.
The terminal device may parse the configuration file of the target model and collect the input parameters based on the parsing result. The input parameters may be the data the target model needs to compute on, such as images, text or sound. The terminal device then calculates the byte length of the input parameters and the number of operations required per byte; this can be understood as multiplying the number of operations required for each byte by the byte length (i.e., the number of bytes) of the input parameters, which yields the first value. Finally, the terminal device adds the first value to the preset second value to obtain the first calculation amount. The second value may be fixed, or may be set according to the characteristics of the model or the device; it represents a fixed number of operations that is independent of the input length.
In this embodiment, by determining the input parameters, calculating the byte length and the number of operations required by the bytes, the terminal device may obtain a relatively accurate calculation amount estimated value, that is, may ensure the accuracy of the first calculation amount, which may improve the accuracy of the first duration, and further may improve the accuracy of the decision of whether to operate the target model on the terminal device, which may be helpful for simultaneously achieving the purpose of saving cloud computing resources and ensuring the operation of the target model, and further may ensure user experience.
Illustratively, the first calculation amount may be determined by the following formula:
N = L * a + b
where N represents the first calculation amount, L represents the byte length of the input parameters, a represents the number of operations required per byte, and b represents a fixed number of operations independent of L.
In other words, the terminal device may determine the first calculation amount based on the above formula, and then calculate the first duration consumed by the target model to run on the terminal device based on the first calculation capacity remaining in the terminal device and the first calculation amount.
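As a concrete illustration of the calculation mode N = L * a + b, with assumed values for a and b:

    # Sketch of the calculation mode N = L * a + b; the values of a and b are assumptions.
    def first_calc_amount(input_bytes: bytes, ops_per_byte: int, fixed_ops: int) -> int:
        return len(input_bytes) * ops_per_byte + fixed_ops   # N = L * a + b

    n = first_calc_amount(b"example input", ops_per_byte=40, fixed_ops=1_000_000)
    print(n)  # 13 bytes * 40 + 1_000_000 = 1_000_520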
In some embodiments, the model parameters further include a type of the first computation amount; when the type of the first calculated amount is a fixed calculated amount, the model parameter comprises the first calculated amount; alternatively, when the type of the first calculation amount is a non-fixed calculation amount, the model parameter includes a calculation mode of the first calculation amount.
The model parameters illustratively include type information of the first calculation amount. This type of information may be used to indicate whether the first amount of computation is fixed or whether it is dynamically computed. When the type of the first calculation amount is a fixed calculation amount, the calculation amount is directly contained in the model parameter. This means that the amount of computation is a fixed value and does not change due to changes in the operating environment or input parameters. In this case, the terminal device directly uses the fixed value provided in the model parameter when acquiring the first calculation amount. When the type of the first calculation amount is a non-fixed calculation amount, the calculation mode of the calculation amount is contained in the model parameters. This calculation mode may be an algorithm, a formula or a model for dynamically calculating the first calculation amount based on the operating environment or input parameters. In this case, when the terminal device acquires the first calculation amount, it is necessary to perform calculation according to the calculation mode among the model parameters.
In this embodiment, the model parameter further includes the type of the first calculation amount, so that the calculation of the first calculation amount can have greater flexibility, that is, for a fixed calculation amount, the model parameter can directly include the required calculation amount, so that the preparation process of the target model during operation can be simplified. For non-fixed calculation amount, the calculation mode provided in the model parameters can ensure that the first calculation amount can be dynamically adjusted according to actual conditions so as to adapt to different operation environments and input parameters. The method is favorable for better evaluating the first time length required by the operation of the target model, further, the accuracy of the decision of whether the target model is operated on the terminal equipment can be improved, the purpose of saving cloud computing resources and the purpose of guaranteeing the operation of the target model are simultaneously considered, and further, the user experience can be guaranteed.
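A short sketch of how the terminal device might branch on the type of the first calculation amount is given below; the parameter layout and field names are assumptions.

    # Sketch: obtain the first calculation amount depending on its type;
    # the parameter layout and field names are assumptions.
    def get_first_calc_amount(model_params: dict, input_bytes: bytes) -> int:
        if model_params["calc_amount_type"] == "fixed":
            return model_params["fixed_calc_amount"]         # value carried directly in the parameters
        # non-fixed: compute dynamically from the calculation mode N = L * a + b
        return len(input_bytes) * model_params["ops_per_byte"] + model_params["base_ops"]

    print(get_first_calc_amount({"calc_amount_type": "fixed", "fixed_calc_amount": 5_000_000}, b""))
    print(get_first_calc_amount(
        {"calc_amount_type": "dynamic", "ops_per_byte": 40, "base_ops": 1_000_000},
        b"example input"))  # 1_000_520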
In some embodiments, the method 200 may further include, while the terminal device is running the object model:
acquiring the running state of the terminal device; and if the running state changes, determining whether to send a second scheduling request to the server, where the second scheduling request is used for requesting the server to schedule a computing node of the cloud to continue running the target model from the breakpoint of the target model, the second scheduling request includes data characterizing the running breakpoint of the target model, and the change of the running state includes the running state becoming abnormal.
For example, the terminal device may acquire its own running state in real time or periodically. The running state information may include the resource usage of the device (e.g., processor occupancy, memory usage), temperature, computing capability, performance metrics, whether there is a task conflict, whether a system exception occurs, the running state of the target model (e.g., whether it is running and which step it has reached), the network connection state, and so on. If the running state changes, in particular if an abnormality occurs, the terminal device detects this change. The abnormality may be insufficient resources, excessive temperature, reduced computing capability, a degraded performance metric, a task conflict, a system exception, an error in running the target model, a network disconnection, and so on. Once a change in the running state is detected, the terminal device judges whether the second scheduling request needs to be sent to the server, and constructs and sends the request if necessary. The second scheduling request includes data characterizing the running breakpoint of the target model, so that the server can schedule a computing node of the cloud to continue running the target model from that breakpoint. The second scheduling request may further include exception information, so that the exception can be recorded, for example in a log, for later processing.
In addition, since a change in the running state of the terminal device (especially an abnormal state) may mean that the terminal device cannot complete the remaining task amount in the remaining time, a computing node of the cloud may need to be rescheduled to continue running the target model from the breakpoint of the target model. To speed this up, the running breakpoint can be saved by serializing the target model during the calculation (for example, serializing the running state of the target model); the serialized data is then deserialized at the computing node of the cloud, so that the running breakpoint of the target model can be determined and execution can continue directly from the breakpoint. The part already completed on the terminal device does not need to be recalculated, the overall time consumption is reduced, and the user experience is better. The serialization and deserialization methods may depend on the target model, and any method capable of serialization and deserialization may be used; they are not described further here.
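As a rough illustration only, the snippet below uses Python's `pickle` to serialize a hypothetical run state; the real breakpoint contents and format would depend on the target model and are not specified in this application.

```python
import pickle

# Hypothetical run state: the real breakpoint contents depend on the target model.
run_state = {
    "model_id": "target-model-v1",
    "completed_step": 42,           # last fully finished computation step
    "intermediate": [0.13, 0.87],   # partial results needed to continue
}

# Terminal device side: serialize the breakpoint when the run is interrupted.
breakpoint_blob = pickle.dumps(run_state)

# Cloud computing node side: deserialize and continue from the recorded step,
# so the part already finished on the terminal device is not recomputed.
restored = pickle.loads(breakpoint_blob)
next_step = restored["completed_step"] + 1
print(f"resuming {restored['model_id']} from step {next_step}")
```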
In this embodiment, when the target model runs on the terminal device, the running state of the terminal device is detected during the running of the target model. If the running state changes, the running breakpoint can be saved and the second scheduling request sent to the server, so that the server schedules the unfinished target model to a cloud computing node for continued execution and returns the running result to the terminal device when the operation is finished, thereby ensuring the reliability of running the target model. Specifically, by monitoring the running state of the terminal device in real time, abnormal conditions can be found and handled promptly, avoiding interruptions or errors in the model run. Meanwhile, when a second scheduling request is needed, additional computing resources can be requested or the model run can be adjusted to meet actual requirements and optimize performance, which improves the efficiency and stability of the model run and provides a better user experience.
In some embodiments, if the running state changes, determining whether to send the second scheduling request to the server may be implemented as:
and if the running state changes, sending the second scheduling request to the server by default.
The terminal device will, for example, continuously monitor its own operating state and record or evaluate changes in this state. Once the operating state changes, the terminal device will default to send a second scheduling request to the server. After the server receives the second scheduling request, the computing node of the cloud can be scheduled to continue to run the target model from the breakpoint of the target model.
In this embodiment, the change of the running state of the terminal device is processed by an automatic mechanism, and the second scheduling request is sent to the server by default, so that the abnormality of the terminal can be relieved in time, and the stability and reliability of the running of the model can be ensured.
In some embodiments, if the running state changes, determining whether to send the second scheduling request to the server may be implemented as:
if the running state changes, acquiring the remaining second computing capacity of the terminal equipment from the running state; acquiring the remaining calculation amount of the target model; calculating a remaining time period based on the second computing capacity and the remaining calculation amount; subtracting the running time of the target model on the terminal equipment from the running time limit threshold of the target model to obtain a second time; and if the remaining time period is smaller than the second time, sending the second scheduling request to the server.
For example, when the operating state of the terminal device changes, the device may first obtain the remaining second computing capacity from the operating state, where the second computing capacity is a real-time or recent evaluation of computing capability reflecting the current performance state of the device. Next, the terminal device acquires the remaining calculation amount of the target model, which can be estimated from information such as the current task amount and the completed task amount of the target model. The remaining time period is then calculated based on the second computing capacity and the remaining calculation amount, for example by dividing the remaining calculation amount by the second computing capacity. The terminal device may then subtract the time that the target model has already run on the terminal device from the running time limit threshold of the target model to obtain a second time period, which can be understood as the time still available within the total time limit. Finally, if the remaining time period is less than the second time period, this means that the remaining task amount cannot be completed in the remaining time. Therefore, the terminal device can send a second scheduling request to the server, requesting the server to schedule a computing node of the cloud to continue running the target model from the breakpoint of the target model.
In this embodiment, the terminal device may evaluate its own computing power and task amount by monitoring the running state of the device and the running condition of the target model in real time, and send a second scheduling request to the server when the remaining task amount cannot be completed in the remaining time, so as to request the computing node of the server scheduling cloud to continue running the target model from the breakpoint of the target model, which not only can timely alleviate the abnormality of the terminal, but also helps to ensure the stability and reliability of the model running. In addition, by comparing the remaining duration with the second duration, the target model can be preferentially operated on the terminal device as much as possible, i.e. cloud computing resources can be saved as much as possible.
Of course, in other alternative embodiments, if the remaining time period is less than the second time period, the terminal device may not send the second scheduling request to the server, but may request additional computing resources, or may send a third scheduling request to the server, where the third scheduling request is used to request the computing node of the server scheduling cloud to share the remaining computing amount of the target model.
In some embodiments, the S240 may include:
if the buffer memory of the terminal equipment comprises the target model, sending a query request to a server and receiving a query response message sent by the server; if the query response message indicates that the server does not update the target model, the target model is acquired from the buffer memory; if the query response message indicates that the server has updated the target model, the target model is downloaded from the server and the target model stored in the buffer memory is updated using the downloaded target model.
For example, if the target model is included in the buffer memory of the terminal device, this may mean that the model has already been run on the terminal device. In order to confirm whether the server has the latest update, the terminal device sends a query request to the server. The query request may include an identification of the target model, such as a name and version check identification of the target model, to request the server to confirm the current version of the target model or whether an update is available. After receiving the inquiry request, the server performs corresponding processing and sends an inquiry response message to the terminal device. This message may indicate whether the server has the latest update available. After receiving the inquiry response message, the terminal device analyzes the message to obtain feedback from the server. If the inquiry response message indicates that the server does not update the target model, the terminal device acquires the target model from the current buffer memory and continues to use the target model. If the query response message indicates that the server has updated the target model, this means that the version of the model in the buffer memory on the terminal device may have become outdated. In this case, the terminal device will download the new object model from the server. After downloading the new object model, the terminal device may update the object model stored in the buffer memory with the new model. This may involve loading the new model into a buffer memory, replacing the old model's location, or updating the old model's version information, etc.
In this embodiment, by sending a query request to the server periodically or as needed, the target model is downloaded and updated when needed, which helps to improve the accuracy and efficiency of the operation of the target model, and provides a better user experience. Meanwhile, through the use of a buffer memory, a copy of the target model can be locally saved, so that the load of the server is reduced, and the response speed is improved.
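As a rough illustration of this cache-and-update flow, the sketch below assumes hypothetical helper callbacks (`query_update` and `download_model`) standing in for the query request and download described above; it is not an interface defined in this application.

```python
from typing import Callable, Dict

def get_target_model(cache: Dict[str, dict], model_id: str,
                     query_update: Callable[[str, str], bool],
                     download_model: Callable[[str], bytes]) -> bytes:
    """Return the target model, refreshing the local cache when the server reports an update."""
    cached = cache.get(model_id)
    if cached is not None:
        # Model already cached: ask the server whether it has been updated.
        if not query_update(model_id, cached["version"]):
            return cached["data"]          # no update: use the cached copy
    # No cached copy, or the server reports an update: download and refresh the cache.
    data = download_model(model_id)
    cache[model_id] = {"version": "latest", "data": data}
    return data

if __name__ == "__main__":
    cache = {"m1": {"version": "1.0", "data": b"old-model-bytes"}}
    model = get_target_model(
        cache, "m1",
        query_update=lambda mid, ver: True,       # pretend the server has a newer version
        download_model=lambda mid: b"new-model-bytes",
    )
    print(model, cache["m1"]["version"])
```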
Fig. 4 shows a schematic flow chart of a model scheduling method 300 according to an embodiment of the present application, which method 300 may be performed by any electronic device having data processing capabilities. For example, the electronic device may be implemented as a server. For example, the electronic device may be implemented as the server 120 shown in FIG. 1.
As shown in fig. 4, the method 300 may include:
s310, the server receives a parameter request sent by the terminal equipment and responds to the parameter request to send a parameter response message to the terminal equipment, wherein the parameter request is used for requesting model parameters of a target model, and the parameter response message comprises the model parameters; the model parameters are used by the terminal device to determine whether the terminal device satisfies operating conditions for the target model to operate on the terminal device.
S320, the server receives a first scheduling request sent by the terminal equipment in response to the terminal equipment not meeting the operation condition;
s330, the server responds to the first scheduling request, and a first computing node for running the target model is determined in the computing nodes of the cloud;
s340, the server sends the target model and the input parameters of the target model to the first computing node, receives the operation result of the first computing node aiming at the target model, and forwards the operation result to the terminal equipment.
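For illustration only, the following sketch shows one way a server-side scheduler might pick the first computing node as described in S330; the node bookkeeping and the "most idle capability" tie-break are assumptions, not requirements of this application.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class ComputeNode:
    node_id: str
    supported_models: Set[str]
    idle_capability: float          # e.g. operations per second currently available

def pick_first_compute_node(nodes: List[ComputeNode], model_id: str,
                            required_capability: float) -> Optional[ComputeNode]:
    """Return a node that supports the target model and has enough idle capability."""
    candidates = [n for n in nodes
                  if model_id in n.supported_models
                  and n.idle_capability >= required_capability]
    # Prefer the node with the most idle capability; None if no node qualifies.
    return max(candidates, key=lambda n: n.idle_capability, default=None)

if __name__ == "__main__":
    nodes = [ComputeNode("node-a", {"target-model"}, 2.0e9),
             ComputeNode("node-b", {"target-model"}, 8.0e9)]
    chosen = pick_first_compute_node(nodes, "target-model", 5.0e9)
    print(chosen.node_id if chosen else "no node available")   # node-b
```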
In some embodiments, the method 300 may further comprise:
and receiving a second scheduling request sent by the terminal equipment, wherein the scheduling request is used for requesting the server to schedule the computing node of the cloud to continue to operate the target model from the breakpoint of the target model, the second scheduling request comprises data used for representing the operation breakpoint of the target model, and the operation state change comprises the abnormal operation state.
In some embodiments, the method 300 may further comprise:
receiving a query request sent by the terminal equipment, and sending a query response message to the terminal equipment;
the query response message indicates whether the server updates the target model;
If the query response message indicates that the server has updated the target model, the method 300 may further include:
and sending the target model to the terminal equipment.
In combination with a terminal device: when an application program of the terminal device needs to run a target model, the terminal device may first confirm, according to the type (for example, the model number) of the terminal device, whether it supports running the target model. If so, the terminal device determines whether it meets the operation condition for running the target model on the terminal device (for example, by determining whether the duration consumed by running the target model on the terminal device is less than or equal to the running time limit threshold in the model parameters). If the operation condition is met, the terminal device directly acquires the input parameters of the target model and runs the target model locally; otherwise, it notifies the server to schedule a computing node of the cloud to run the target model. When the server schedules a computing node of the cloud to run the target model, the server may determine, according to the data reported by each computing node, a first computing node that supports running the target model and has enough idle computing resources, and send the target model and its input parameters to the first computing node. The first computing node completes the remaining operation and returns the running result, which is forwarded to the terminal device and finally presented to the user.
It should be understood that the steps in the method 300 correspond to the corresponding steps and descriptions in the method 200, and are not described herein again for brevity.
The model scheduling method is described above from the point of view of the terminal device and the server, respectively, and the model scheduling method provided in the present application is described below from the point of view of an application program (e.g., a client on which the application program is installed). For example, the client on which the application program is installed may be a terminal device for running the model, or may be a device different from the device for running the model.
The model scheduling method comprises the following steps:
the application program receives an operation request sent by a target object and responds to the operation request to send a parameter request to a server; the running request is used for requesting to run a target model, and the parameter request is used for requesting model parameters of the target model;
the application program receives a parameter response message sent by the server, wherein the parameter response message comprises the model parameters;
the application program determines whether the terminal equipment meets the running condition of the target model running on the terminal equipment or not based on the model parameters;
if the running condition is met, the application program acquires the target model and dispatches the terminal equipment to run the target model;
If the operation condition is not met, a first scheduling request is sent to the server, and an operation result sent by the server aiming at the target model is received; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
In some embodiments, the model parameters include a run-time limit threshold for the target model; wherein the application program determines whether the terminal device meets the operation condition that the target model operates on the terminal device, and can be implemented as follows:
the application program determines a first time length required by the target model when running on the terminal equipment;
if the first duration is less than or equal to the operation duration limiting threshold, the application program determines that the operation condition is met;
if the first duration is greater than the operation duration limiting threshold, the application program determines that the operation condition is not met.
In some embodiments, the model parameters further include device types supported by the target model; wherein the application program determines a first time length required by the target model when running on the terminal device, and can be implemented as follows:
in the case that the device types supported by the object model include the device type of the terminal device, the application program determines the first duration.
In some embodiments, the application program determining the first duration required for the target model to run on the terminal device may be implemented as:
the application program obtains the residual first computing capacity of the terminal equipment;
the application program obtains a first calculated amount required by the operation of the target model;
the application calculates the first duration based on the first computing power and the first amount of computation.
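As a minimal numeric sketch of the estimate and decision described above (the figures, units and the operations-per-second model of computing capability are illustrative assumptions):

```python
def decide_where_to_run(first_calc_amount: float,
                        first_capability: float,
                        run_time_limit_s: float) -> str:
    """Estimate the first duration and decide whether to run locally or schedule the cloud."""
    first_duration_s = first_calc_amount / first_capability   # time needed on the terminal
    if first_duration_s <= run_time_limit_s:
        return "run on terminal device"
    return "send first scheduling request to server"

if __name__ == "__main__":
    # 3e9 operations, 1e9 op/s of remaining capability, 5 s limit -> 3 s needed, run locally
    print(decide_where_to_run(3e9, 1e9, 5.0))
    # 3e9 operations, 0.4e9 op/s, 5 s limit -> 7.5 s needed, schedule the cloud
    print(decide_where_to_run(3e9, 0.4e9, 5.0))
```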
In some embodiments, the model parameters further comprise the first calculation amount or a calculation mode of the first calculation amount; when the calculation mode of the first calculation amount is the first calculation mode, the application program obtains the first calculation amount required by the operation of the target model, and may be implemented as follows:
the application program determines input parameters of the target model;
the application program multiplies the operation times required by the bytes by the byte length of the input parameter to obtain a first numerical value;
the application program adds the first value and a preset second value to obtain the first calculated amount.
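As a worked example of this first calculation mode (the per-byte operation count and the preset second value below are placeholders, not values given in this application):

```python
def first_calc_amount(input_params: bytes, ops_per_byte: int, preset_second_value: int) -> int:
    """First value = byte length of the input parameters * operations required per byte;
    first calculation amount = first value + preset second value."""
    first_value = len(input_params) * ops_per_byte
    return first_value + preset_second_value

if __name__ == "__main__":
    params = b"\x00" * 1024   # 1 KiB of input parameters
    # 1024 * 500 + 100000 = 612000 operations
    print(first_calc_amount(params, ops_per_byte=500, preset_second_value=100_000))
```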
In some embodiments, the model parameters further include a type of the first computation amount; when the type of the first calculated amount is a fixed calculated amount, the model parameter comprises the first calculated amount; alternatively, when the type of the first calculation amount is a non-fixed calculation amount, the model parameter includes a calculation mode of the first calculation amount.
In some embodiments, the application obtains an operating state of the terminal device when the terminal device operates the target model; if the running state changes, the application program determines whether to send a second scheduling request to the server, where the scheduling request is used to request the server to schedule the computing node of the cloud to continue running the target model from the breakpoint of the target model, the second scheduling request includes data used to characterize the running breakpoint of the target model, and the running state changes include that the running state is abnormal.
In some embodiments, if the running state changes, the application determines whether to send a second scheduling request to the server, which may be implemented as:
if the running state changes, the application program defaults to sending the second scheduling request to the server.
In some embodiments, if the running state changes, the application determines whether to send a second scheduling request to the server, which may be implemented as:
if the running state changes, the application program acquires the remaining second computing capacity of the terminal equipment from the running state;
the application program obtains the operation residual calculated amount of the target model;
The application program calculates a remaining time period based on the second calculation capability and the remaining calculation amount;
the application program subtracts the running time of the target model on the terminal equipment from the running time limit threshold of the target model to obtain a second time;
if the remaining time length is less than the second time length, the application program sends the second scheduling request to the server.
In some embodiments, the application program obtains the target model, including:
if the buffer memory of the terminal equipment comprises the target model, the application program sends a query request to a server and receives a query response message sent by the server;
if the query response message indicates that the server does not update the target model, the application program acquires the target model from the buffer memory;
if the query response message indicates that the server has updated the target model, the application downloads the target model from the server and updates the target model stored in the buffer memory using the downloaded target model.
It should be understood that the steps in the method embodiment for the application program correspond to the corresponding steps and descriptions in the method 200, and are not described herein again for brevity.
The model scheduling method provided in the present application will be described below with reference to fig. 4 and 5 in the framework shown in fig. 1.
Fig. 4 is another schematic flow chart of a model scheduling method 400 provided by an embodiment of the present application.
As shown in fig. 4, the model scheduling method 400 may include:
s401, starting calculation.
The application program receives a request of a user for starting calculation of the target model, and starts a calculation flow.
S402, requesting model parameters.
The application program sends a parameter request to the server to request the model parameters of the target model, and the server returns a corresponding parameter response message including the model parameters. The model parameters may be as in Table 1 in the example above, and are not described again here to avoid repetition.
S403, obtaining model parameters.
After receiving the parameter response message returned by the server, the application program parses it to obtain the model parameters of the target model. In addition, the application program obtains the input parameters of the target model, namely the input data on which the target model needs to perform its calculation.
S404, confirming whether the support is supported.
And the application program determines whether the current terminal equipment supports the calculation of the target model according to the model parameters returned by the server.
S405, the server determines a first computing node.
If the application program determines, based on the model parameters (for example, the device types supported by the target model included in the model parameters), that the terminal device does not support the calculation of the target model, the application program sends a first scheduling request to the server, where the first scheduling request is used to request the server to schedule a computing node of the cloud to run the target model. After receiving the first scheduling request sent by the application program, the server selects a suitable first computing node according to the calculation amount of the target model and the state information reported by each computing node. The selection principle for the first computing node may include whether it has enough computing resources to complete the computation within a given time.
S406, starting calculation.
And the first computing node starts computing after receiving the model issued by the server and corresponding input parameters.
S407, returning the operation result.
After the first computing node finishes running, the running result is returned to the server, the server is returned to the application program, and the application program finally presents the running result to the user.
S408, acquiring an operation state.
If the application program determines, based on the model parameters (e.g., the device types supported by the target model included in the model parameters), that the terminal device supports the calculation of the target model, the application program requests the running state of the terminal device from the terminal device to determine the remaining first computing capability of the terminal device.
S409, whether it can be completed.
The application program obtains the first computing capability from the running state of the terminal device, determines the first duration required to run the target model on the terminal device based on the first calculation amount required to run the target model, and determines, based on the first duration and the model parameters issued by the server (such as the running time limit threshold in the model parameters), whether the terminal device can complete the calculation of the target model within the given running time limit threshold. If not, the application program requests the server to schedule a computing node of the cloud to run the target model.
S410, acquiring a target model.
If the application program determines that the terminal device is capable of completing the calculation of the target model within the given operation duration limiting threshold, the target model is requested from the server and is operated on the terminal device.
S411, starting detection.
After the application program obtains the target model and the input parameters of the target model, it informs the terminal device to run the target model; after the target model is successfully started, the application program starts detection of the running process so as to detect the running state of the terminal device.
S412, starting calculation.
The terminal device starts the calculation of the target model according to the request of the application program.
S413, the flow ends.
And the target model finishes the flow after the calculation of the terminal equipment or the cloud is completed.
In this embodiment, when the target model needs to be run, the application program requests the model parameters from the server, determines based on the model parameters whether the terminal device supports the calculation of the target model and, if it does, whether the terminal device can complete that calculation. The terminal device is preferentially requested to run the target model, that is, the target model is obtained and the terminal device is requested to run it; if the terminal device does not support the calculation of the target model or cannot complete it, the first computing node of the cloud is scheduled by the server to run the target model. In this way, the computing resources of the terminal device can be fully utilized, the computing resources of the cloud are saved, and the overall operating cost is reduced; when the terminal device does not meet the running conditions, the computing nodes of the cloud ensure the normal running of the target model, so that the user experience is not affected. Cloud computing resources can therefore be saved while user experience is guaranteed.
Fig. 5 is a schematic flowchart of a method for detecting a state of a terminal device according to an embodiment of the present application.
As shown in fig. 5, the state detection method 500 may include:
s501, starting detection.
After the target model is normally started and calculated by the terminal equipment, an application program starts the real-time detection of the running state of the target model.
S502, acquiring an operation state.
The application program obtains the running state of the target model from the terminal equipment so as to obtain the calculation capacity allocated to the target model by the terminal equipment and the calculation progress of the target model, and the calculation progress can be converted into the residual calculation amount.
S503, evaluating the completion time.
The application program can calculate the remaining duration according to the remaining calculation amount and the computing capability allocated by the terminal device to the target model.
S504, whether the calculation can be completed.
If the calculation of the target model is completed, the application program informs the user of the running result, and the flow is ended; if the calculation has not been completed, it is determined whether the completion time is within the expected range. I.e. whether the remaining time is less than or equal to the difference obtained by subtracting the running time of the target model from the running time limit threshold in the model parameters, if so, the completion can be achieved within the expected range; otherwise, it is not within the expected range, i.e. cannot be completed.
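A compact sketch of the evaluation in S503 and S504, under the assumption of a simple operations-per-second model of computing capability (all figures are illustrative):

```python
def can_finish_in_time(remaining_calc_amount: float,
                       allocated_capability: float,
                       run_time_limit_s: float,
                       elapsed_s: float) -> bool:
    """True if the remaining work fits within the time still allowed by the run-time limit."""
    remaining_duration_s = remaining_calc_amount / allocated_capability
    budget_left_s = run_time_limit_s - elapsed_s
    return remaining_duration_s <= budget_left_s

if __name__ == "__main__":
    # 2e9 operations left, 0.5e9 op/s allocated, 10 s limit, 4 s elapsed -> 4 s needed vs 6 s left
    print(can_finish_in_time(2e9, 0.5e9, 10.0, 4.0))   # True: keep running on the terminal
    # 2e9 operations left, 0.2e9 op/s allocated            -> 10 s needed vs 6 s left
    print(can_finish_in_time(2e9, 0.2e9, 10.0, 4.0))   # False: stop, save the breakpoint, offload
```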
S505, stopping operation.
If the target model is not within the expected range, namely can not be completed, the terminal device is informed to terminate the operation of the target model.
S506, saving breakpoint data.
And after receiving a request for terminating the operation of the target model sent by the application program, the terminal equipment stores the data of the operation breakpoint of the target model.
S507, stopping detection.
After the application program sends a request for terminating the operation of the target model to the terminal device, the detection of the running state of the terminal device is synchronously stopped.
S508, a second scheduling request is sent.
And the application program sends the data of the running breakpoint of the target model and the input parameters of the target model to the server, and requests the server to schedule the computing node of the cloud to continue to run the target model at the running breakpoint.
S509, determining a first computing node.
And the server selects a proper first computing node according to the state information reported by each computing node and the actual requirement of the target model, and sends the target model and the input parameters of the target model to the first computing node so as to request the first computing node to continue to operate the target model at the operation breakpoint. The input parameters may include data for the run breakpoint.
S510, recovering the breakpoint.
After receiving the request issued by the server, the first computing node first restores the operation breakpoint of the target model according to the data of the operation breakpoint.
S511, the calculation is continued.
And after the first computing node completes the recovery of the running breakpoint, continuing the computation of the target model from the breakpoint.
S512, returning an operation result.
And after the first computing node completes the operation of the target model, returning the operation result of the target model to the server. The server returns the running result to the application program.
S513, the calculation is completed.
After receiving the operation result sent by the terminal device or the server, the application program informs the user of the calculation result, and the whole process is finished.
In addition, the method 500 may further include a process of the first computing node reporting the status to the server at regular time, which may specifically include:
s514, starting a reporting process.
The first computing node starts a reporting process of the state at regular time so that the server can acquire the state of the first computing node in real time, and the server can determine the available computing capacity of the first computing node.
S515, reporting the state.
After the first computing node starts the reporting process, the remaining available computing capacity is calculated according to the maximum computing capacity and the computing capacity which is currently used, and the remaining available computing capacity is reported to the server.
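A small sketch of such a status report; the message fields are illustrative assumptions rather than a format defined in this application.

```python
import json
import time

def build_status_report(node_id: str, max_capability: float, used_capability: float) -> str:
    """Remaining available capability = maximum capability - capability currently in use."""
    report = {
        "node_id": node_id,
        "timestamp": time.time(),
        "available_capability": max_capability - used_capability,
    }
    return json.dumps(report)

if __name__ == "__main__":
    # e.g. 10e9 op/s in total, 6.5e9 op/s currently used -> report 3.5e9 op/s available
    print(build_status_report("node-a", 10e9, 6.5e9))
```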
S516, storing the state.
After the server acquires the state information reported by the first computing node in real time, the state of the first computing node in the database is updated.
S517, ending the reporting process.
After the server completes the updating of the state of the first computing node, the reporting process is ended.
In this embodiment, when the terminal device runs the target model, by detecting the running state of the terminal device, not only the computing resources of the terminal device can be fully utilized, the computing resources of the cloud end are saved, and the overall operation cost is reduced, but also the computing resources sufficient in the cloud end can be utilized when the computing resources of the terminal device are insufficient, so that the experience of the user is not affected, and the computing resources of the cloud end can be saved on the basis of guaranteeing the user experience.
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the individual features described in the above-mentioned embodiments may be combined in any suitable manner, without contradiction, and various possible combinations are not described further in this application in order to avoid unnecessary repetition. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be considered as disclosed herein.
It should also be understood that, in the various method embodiments of the present application, the size of the sequence numbers of each process referred to above does not mean the order of execution, and the order of execution of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The method provided by the embodiment of the application is described above, and the device provided by the embodiment of the application is described below.
Fig. 6 is a schematic block diagram of a terminal device 600 provided in an embodiment of the present application.
As shown in fig. 6, the terminal device 600 may include:
a communication unit 610, configured to receive an operation request sent by a target object, and send a parameter request to a server in response to the operation request; the running request is used for requesting to run a target model, and the parameter request is used for requesting model parameters of the target model;
the communication unit 610 is further configured to receive a parameter response message sent by the server, where the parameter response message includes the model parameter;
a processing unit 620, configured to determine, based on the model parameter, whether the terminal device meets an operation condition of the target model for operation on the terminal device;
if the operation condition is satisfied, the processing unit 620 is further configured to acquire the target model and operate the target model on the terminal device;
If the operation condition is not satisfied, the communication unit 610 is further configured to send a first scheduling request to the server, and receive an operation result sent by the server for the target model; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
In some embodiments, the model parameters include a run-time limit threshold for the target model;
wherein, the processing unit 620 is specifically configured to:
determining a first time period required by the target model when the target model runs on the terminal equipment;
if the first duration is less than or equal to the operation duration limiting threshold, determining that the operation condition is met;
and if the first time length is greater than the operation time length limiting threshold value, determining that the operation condition is not met.
In some embodiments, the model parameters further include device types supported by the target model;
wherein, the processing unit 620 is specifically configured to:
and determining the first duration in the case that the device types supported by the target model comprise the device type of the terminal device.
In some embodiments, the processing unit 620 is specifically configured to:
acquiring the residual first computing capacity of the terminal equipment;
acquiring a first calculated amount required by the operation of the target model;
The first duration is calculated based on the first computing capacity and the first calculated amount.
In some embodiments, the model parameters further comprise the first calculation amount or a calculation mode of the first calculation amount;
wherein, the processing unit 620 is specifically configured to:
determining input parameters of the target model;
multiplying the byte length of the input parameter by the operation times required by the byte to obtain a first numerical value;
and adding the first numerical value and a preset second numerical value to obtain the first calculated amount.
In some embodiments, the model parameters further include a type of the first computation amount;
when the type of the first calculated amount is a fixed calculated amount, the model parameter comprises the first calculated amount; or,
when the type of the first calculation amount is a non-fixed calculation amount, the model parameter includes a calculation mode of the first calculation amount.
In some embodiments, when the terminal device runs the target model, the processing unit 620 is further configured to:
acquiring the running state of the terminal equipment;
if the running state changes, determining whether to send a second scheduling request to the server, wherein the scheduling request is used for requesting the server to schedule the computing node of the cloud to continue running the target model from the breakpoint of the target model, the second scheduling request comprises data for representing the running breakpoint of the target model, and the running state changes comprise that the running state is abnormal.
In some embodiments, the processing unit 620 is specifically configured to:
and if the running state changes, sending the second scheduling request to the server by default.
In some embodiments, the processing unit 620 is specifically configured to:
if the running state changes, acquiring the remaining second computing capacity of the terminal equipment from the running state;
acquiring the operation residual calculated amount of the target model;
calculating a remaining time period based on the second calculation capability and the remaining calculation amount;
subtracting the running time of the target model on the terminal equipment from the running time limit threshold of the target model to obtain a second time;
and if the remaining time length is smaller than the second time length, sending the second scheduling request to the server.
In some embodiments, the processing unit 620 is specifically configured to:
if the buffer memory of the terminal equipment comprises the target model, sending a query request to a server and receiving a query response message sent by the server;
if the query response message indicates that the server does not update the target model, the target model is acquired from the buffer memory;
if the query response message indicates that the server has updated the target model, the target model is downloaded from the server and the target model stored in the buffer memory is updated using the downloaded target model.
It should be understood that apparatus embodiments and method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the terminal device 600 may correspond to the corresponding execution body of the methods 200 and 400-500 in the embodiments of the present application, and each unit in the terminal device 600 is respectively configured to implement the corresponding flow in the methods 200 and 400-500; for brevity, details are not described herein again.
Fig. 7 is a schematic block diagram of a server 700 provided in an embodiment of the present application.
As shown in fig. 7, the server 700 may include:
a communication unit 710, configured to receive a parameter request sent by a terminal device, and send a parameter response message to the terminal device in response to the parameter request, where the parameter request is for requesting a model parameter of a target model, and the parameter response message includes the model parameter; the model parameters are used for the terminal equipment to determine whether the terminal equipment meets the operation conditions of the target model operated on the terminal equipment;
in response to the terminal device not meeting the operation condition, the communication unit 710 is further configured to receive a first scheduling request sent by the terminal device;
a processing unit 720, configured to determine, in response to the first scheduling request, a first computing node for running the target model among computing nodes in the cloud;
The communication unit 710 is further configured to: the method comprises the steps of sending the target model and input parameters of the target model to the first computing node, receiving an operation result of the first computing node aiming at the target model, and forwarding the operation result to the terminal equipment.
In some embodiments, the communication unit 710 is further configured to:
and receiving a second scheduling request sent by the terminal equipment, wherein the scheduling request is used for requesting the server to schedule the computing node of the cloud to continue to operate the target model from the breakpoint of the target model, the second scheduling request comprises data used for representing the operation breakpoint of the target model, and the operation state change comprises the abnormal operation state.
In some embodiments, the communication unit 710 is further configured to:
receiving a query request sent by the terminal equipment, and sending a query response message to the terminal equipment;
the query response message indicates whether the server updates the target model;
if the query response message indicates that the server has updated the target model, the communication unit 710 is further configured to:
and sending the target model to the terminal equipment.
It should be understood that apparatus embodiments and method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. Specifically, the server 700 may correspond to the corresponding execution body of the methods 300 to 500 in the embodiments of the present application, and each unit in the server 700 is respectively configured to implement the corresponding flow in the methods 300 to 500; for brevity, details are not described herein again.
It should be further understood that the units in the terminal device 600 or the server 700 according to the embodiments of the present application are divided based on logical functions. In practical applications, the functions of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit, or these functions may even be implemented with the assistance of one or more other units. For example, some or all of the units in the terminal device 600 or the server 700 may be combined into one or several other units. For another example, some unit(s) in the terminal device 600 or the server 700 may be further split into multiple functionally smaller units, which can achieve the same operation without affecting the technical effects of the embodiments of the present application. For another example, the terminal device 600 or the server 700 may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented by multiple units in cooperation.
It should also be understood that the term "module" or "unit" referred to in the embodiments of the present application refers to a computer program or a part of a computer program having a predetermined function and working together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
According to another embodiment of the present application, the terminal device 600 or the server 700 related to the embodiments of the present application may be constructed by running a computer program (including a program code) capable of executing the steps involved in the respective methods on a general-purpose computing device of a general-purpose computer including a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read only storage medium (ROM), and the like, and implementing the methods of the embodiments of the present application. The computer program may be recorded on a computer readable storage medium and loaded on an electronic device through the computer readable storage medium, and the computer program is used to implement the corresponding method of the embodiments of the present application. In other words, the units referred to above may be implemented in hardware, or may be implemented by instructions in software, or may be implemented in a combination of hardware and software. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by executing by using a hardware decoding processor, or by executing by using a combination of hardware and software in the decoding processor. Alternatively, the software may reside in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. Software in the memory may be run by the processor to perform the steps in the method embodiments referred to above.
Fig. 8 is a schematic structural diagram of an electronic device 800 provided in an embodiment of the present application.
As shown in fig. 8, the electronic device 800 includes at least a processor 810 and a computer-readable storage medium 820. Wherein the processor 810 and the computer-readable storage medium 820 may be connected by a bus or other means. The computer-readable storage medium 820 is configured to store a computer program 821, the computer program 821 including computer instructions, and the processor 810 is configured to execute the computer instructions stored by the computer-readable storage medium 820. Processor 810 is a computing core and a control core of electronic device 800 that are adapted to implement one or more computer instructions, in particular to load and execute one or more computer instructions to implement a corresponding method flow or a corresponding function.
By way of example, the processor 810 may also be referred to as a central processing unit (Central Processing Unit, CPU). The processor 810 may include, but is not limited to: general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete element gate or transistor logic devices, discrete hardware components, and so forth.
By way of example, computer-readable storage medium 820 may be high-speed RAM memory or Non-volatile memory (Non-Volatilememory), such as at least one magnetic disk memory; alternatively, it may be at least one computer-readable storage medium located remotely from the aforementioned processor 810. In particular, computer-readable storage media 820 includes, but is not limited to: volatile memory and/or nonvolatile memory. The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable EPROM (EEPROM), or a flash Memory. The volatile memory may be random access memory (Random Access Memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (Double Data Rate SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and Direct memory bus RAM (DR RAM).
As shown in fig. 8, the electronic device 800 may also include a transceiver 830.
The processor 810 may control the transceiver 830 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. Transceiver 830 may include a transmitter and a receiver. Transceiver 830 may further include antennas, the number of which may be one or more.
It should be appreciated that the various components in the electronic device 800 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus. It is noted that the electronic device 800 may be any electronic device having data processing capabilities; the computer-readable storage medium 820 has stored therein computer instructions; computer instructions stored in computer-readable storage medium 820 are loaded and executed by processor 810 to implement the corresponding steps performed by the terminal device or server in the method embodiments; in particular, the computer instructions in the computer-readable storage medium 820 are loaded by the processor 810 and perform the corresponding steps, and for avoiding repetition, a detailed description is omitted here.
According to another aspect of the present application, embodiments of the present application provide a chip. The chip may be an integrated circuit chip with signal processing capability, and may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present application. The chip may also be referred to as a system-on-chip, a system-on-chip or a system-on-chip, etc. The chip can be applied to various electronic devices capable of mounting the chip, so that the device mounted with the chip can execute the respective steps in the methods or logic blocks disclosed in the embodiments of the present application. For example, the chip may be adapted to implement one or more computer instructions, in particular to load and execute one or more computer instructions to implement the corresponding method flow or corresponding functions.
According to another aspect of the present application, embodiments of the present application provide a computer-readable storage medium (Memory). The computer-readable storage medium is a memory device of a computer for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in a computer, and of course, may include an extended storage medium supported by a computer. The computer-readable storage medium provides a storage space that stores an operating system of the electronic device. The memory space holds computer instructions adapted to be loaded and executed by a processor, which when read and executed by the processor of a computer device, cause the computer device to perform the respective steps of the methods or logic blocks disclosed in the embodiments of the present application.
According to another aspect of the present application, embodiments of the present application provide a computer program product or computer program. The computer program product or computer program includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the respective steps of the methods or logic blocks disclosed in the embodiments of the present application. In other words, when the solution provided in the present application is implemented using software, it may be implemented in whole or in part in the form of a computer program product or a computer program. The computer program product or computer program includes one or more computer instructions. When loaded and executed on a computer, the computer program instructions run, in whole or in part, the processes or implement the functions of embodiments of the present application.
Notably, the computer to which the present application relates may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions referred to herein may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, by wire (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.) from one website site, computer, server, or data center.
Those of ordinary skill in the art will appreciate that the elements and process steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. In other words, the skilled person may use different methods for each specific application to achieve the described functionality, but such implementation should not be considered to be beyond the scope of protection of the present application.
Finally, it should be noted that the above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about the changes or substitutions within the technical scope of the present application, and the changes or substitutions are covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims. For example, the individual technical features described in the above-described embodiments may be combined in any suitable manner without contradiction. As another example, any combination of the various embodiments of the present application may be made without departing from the basic concepts of the present application, which should also be considered as disclosed herein.

Claims (15)

1. A method for model scheduling, the method being applicable to a terminal device, the method comprising:
receiving an operation request sent by a target object, and responding to the operation request to send a parameter request to a server; the operation request is used for requesting to operate a target model, and the parameter request is used for requesting model parameters of the target model;
receiving a parameter response message sent by the server, wherein the parameter response message comprises the model parameters;
determining whether the terminal equipment meets the running condition of the target model running on the terminal equipment based on the model parameters;
if the operation condition is met, acquiring the target model and operating the target model on the terminal equipment;
if the operation condition is not met, a first scheduling request is sent to the server, and an operation result sent by the server aiming at the target model is received; the first scheduling request is used for requesting the server to schedule the computing node of the cloud to run the target model.
2. The method of claim 1, wherein the model parameters include a run-time limit threshold for the target model;
wherein the determining whether the terminal device meets the running condition for the target model to run on the terminal device includes:
determining a first duration required by the target model when the target model runs on the terminal device;
if the first duration is less than or equal to the run-time limit threshold, determining that the running condition is met;
and if the first duration is greater than the run-time limit threshold, determining that the running condition is not met.
3. The method of claim 2, wherein the model parameters further comprise device types supported by the target model;
wherein the determining the first duration required for the target model to run on the terminal device comprises:
determining the first duration in the case that the device types supported by the target model include the device type of the terminal device.
4. The method of claim 2, wherein the determining the first duration required for the target model to run on the terminal device comprises:
acquiring the remaining first computing capability of the terminal device;
acquiring a first calculation amount required for running the target model;
and calculating the first duration based on the first computing capability and the first calculation amount.
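A minimal sketch of the check in claims 2 to 4, assuming the first calculation amount is expressed in operations and the remaining computing capability in operations per second; these units and the parameter names are assumptions for illustration, not part of the claims.

def meets_running_condition(first_calculation_amount_ops: float,
                            remaining_computing_capability_ops_per_s: float,
                            run_time_limit_s: float) -> bool:
    """Claims 2-4: estimate the first duration and compare it with the run-time limit threshold."""
    # First duration = first calculation amount / remaining first computing capability.
    first_duration_s = first_calculation_amount_ops / remaining_computing_capability_ops_per_s
    # The running condition is met only if the estimated duration fits within the threshold.
    return first_duration_s <= run_time_limit_s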
5. The method of claim 4, wherein the model parameters further comprise the first calculation amount or a calculation mode of the first calculation amount;
wherein, when the calculation mode of the first calculation amount is a first calculation mode, the acquiring the first calculation amount required for running the target model includes:
determining input parameters of the target model;
multiplying the byte length of the input parameters by the number of operations required per byte to obtain a first value;
and adding the first value and a preset second value to obtain the first calculation amount.
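The first calculation mode of claim 5 amounts to a linear estimate over the input size. A one-function sketch, assuming the per-byte operation count and the preset second value are supplied as model parameters (the names used here are hypothetical):

def first_calculation_amount(input_parameters: bytes,
                             ops_per_byte: float,
                             preset_second_value_ops: float) -> float:
    # Claim 5: first value = byte length of the input parameters x operations required per byte;
    # the first calculation amount is the first value plus a preset second value.
    first_value = len(input_parameters) * ops_per_byte
    return first_value + preset_second_value_ops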
6. The method of claim 5, wherein the model parameters further comprise a type of the first calculation amount;
wherein, when the type of the first calculation amount is a fixed calculation amount, the model parameters include the first calculation amount; or,
when the type of the first calculation amount is a non-fixed calculation amount, the model parameters include the calculation mode of the first calculation amount.
7. The method according to any one of claims 1 to 6, wherein, when the terminal device runs the target model, the method further comprises:
acquiring the running state of the terminal device;
if the running state changes, determining whether to send a second scheduling request to the server, wherein the second scheduling request is used for requesting the server to schedule a computing node of the cloud to continue running the target model from the breakpoint of the target model, the second scheduling request comprises data used for representing the running breakpoint of the target model, and a change in the running state includes the running state becoming abnormal.
8. The method of claim 7, wherein the determining whether to send a second scheduling request to the server if the running state changes comprises:
if the running state changes, sending the second scheduling request to the server by default.
9. The method of claim 7, wherein the determining whether to send a second scheduling request to the server if the running state changes comprises:
if the running state changes, acquiring the remaining second computing capability of the terminal device from the running state;
acquiring the remaining calculation amount required for the target model to finish running;
calculating a remaining duration based on the second computing capability and the remaining calculation amount;
subtracting the time for which the target model has already run on the terminal device from the run-time limit threshold of the target model to obtain a second duration;
and if the remaining duration is less than the second duration, sending the second scheduling request to the server.
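A sketch of the re-evaluation in claim 9. The comparison is kept exactly as the claim words it (the second scheduling request is sent when the remaining duration is less than the second duration); variable names and units are illustrative assumptions.

def should_send_second_scheduling_request(remaining_calculation_amount_ops: float,
                                          second_computing_capability_ops_per_s: float,
                                          run_time_limit_s: float,
                                          elapsed_run_time_s: float) -> bool:
    # Remaining duration estimated from the remaining calculation amount and the
    # remaining second computing capability (claim 9).
    remaining_duration_s = remaining_calculation_amount_ops / second_computing_capability_ops_per_s
    # Second duration = run-time limit threshold minus the time already run (claim 9).
    second_duration_s = run_time_limit_s - elapsed_run_time_s
    # Claim 9, as worded, sends the second scheduling request when the
    # remaining duration is less than the second duration.
    return remaining_duration_s < second_duration_s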
10. The method according to any one of claims 1 to 6, wherein the acquiring the target model comprises:
if the cache of the terminal device includes the target model, sending a query request to the server, and receiving a query response message sent by the server;
if the query response message indicates that the server has not updated the target model, acquiring the target model from the cache;
and if the query response message indicates that the server has updated the target model, downloading the target model from the server, and updating the target model stored in the cache with the downloaded target model.
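A minimal sketch of the cache check in claim 10. The query exchange is reduced to a single hypothetical server.model_updated_since() call, and the cache-miss fallback (downloading directly) is an assumption beyond the case claim 10 actually recites.

def acquire_target_model(server, cache, model_id: str):
    """Claim 10: prefer the cached copy unless the server reports an update."""
    cached = cache.get(model_id)          # returns None if the target model is not cached
    if cached is not None:
        # Query the server: has the target model been updated since this copy was cached?
        if not server.model_updated_since(model_id, cached.version):
            return cached.model           # server has not updated the model: use the cache
    # Model updated on the server (or not cached at all, an assumed fallback):
    # download it and refresh the cache with the downloaded copy.
    model = server.download_model(model_id)
    cache.put(model_id, model)
    return model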
11. A model scheduling method, wherein the method is applicable to a server, and the method comprises:
receiving a parameter request sent by a terminal device, and sending a parameter response message to the terminal device in response to the parameter request, wherein the parameter request is used for requesting model parameters of a target model, and the parameter response message comprises the model parameters; the model parameters are used for the terminal device to determine whether the terminal device meets the running condition for the target model to run on the terminal device;
receiving a first scheduling request sent by the terminal device in response to the terminal device not meeting the running condition;
determining, in response to the first scheduling request, a first computing node for running the target model from among the computing nodes of the cloud;
and sending the target model and input parameters of the target model to the first computing node, receiving an operation result of the first computing node for the target model, and forwarding the operation result to the terminal device.
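A minimal server-side sketch of claim 11. The node-selection policy and the messaging helpers are hypothetical; the claim only requires that a first computing node be determined among the cloud's computing nodes and that the operation result be forwarded to the terminal device.

def handle_first_scheduling_request(cloud_nodes, terminal, model_store,
                                    model_id: str, input_parameters):
    """Claim 11: schedule a cloud computing node to run the target model for the terminal device."""
    # Determine a first computing node among the cloud's computing nodes; the
    # selection policy (here: least loaded) is an implementation choice, not part of the claim.
    first_node = min(cloud_nodes, key=lambda node: node.current_load)
    # Send the target model and its input parameters to the first computing node and run it.
    result = first_node.run_model(model_store.get(model_id), input_parameters)
    # Forward the operation result to the terminal device.
    terminal.send_result(result)
    return result

Keeping the parameter exchange and the scheduling path on the same server lets the terminal make the local-versus-cloud decision without knowing anything about the cloud's computing nodes.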
12. A terminal device, comprising:
a communication unit, configured to receive an operation request sent by a target object, and to send a parameter request to a server in response to the operation request; the operation request is used for requesting to run a target model, and the parameter request is used for requesting model parameters of the target model;
the communication unit is further configured to receive a parameter response message sent by the server, where the parameter response message includes the model parameters;
a processing unit, configured to determine, based on the model parameters, whether the terminal device meets the running condition for the target model to run on the terminal device;
if the running condition is met, the processing unit is further configured to acquire the target model and run the target model on the terminal device;
if the running condition is not met, the communication unit is further configured to send a first scheduling request to the server and to receive an operation result sent by the server for the target model; the first scheduling request is used for requesting the server to schedule a computing node of the cloud to run the target model.
13. A server, comprising:
a communication unit, configured to receive a parameter request sent by a terminal device, and to send a parameter response message to the terminal device in response to the parameter request, where the parameter request is used for requesting model parameters of a target model, and the parameter response message includes the model parameters; the model parameters are used for the terminal device to determine whether the terminal device meets the running condition for the target model to run on the terminal device;
the communication unit is further configured to receive a first scheduling request sent by the terminal device in response to the terminal device not meeting the running condition;
a processing unit, configured to determine, in response to the first scheduling request, a first computing node for running the target model from among the computing nodes of the cloud;
the communication unit is further configured to: send the target model and input parameters of the target model to the first computing node, receive an operation result of the first computing node for the target model, and forward the operation result to the terminal device.
14. An electronic device, comprising:
a processor adapted to execute a computer program;
a computer readable storage medium having stored therein a computer program which, when executed by the processor, implements the method of any one of claims 1 to 10 or implements the method of claim 11.
15. A computer readable storage medium for storing a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 10 or to perform the method of claim 11.
CN202311768543.3A 2023-12-21 2023-12-21 Model scheduling method, terminal equipment and server Active CN117453377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311768543.3A CN117453377B (en) 2023-12-21 2023-12-21 Model scheduling method, terminal equipment and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311768543.3A CN117453377B (en) 2023-12-21 2023-12-21 Model scheduling method, terminal equipment and server

Publications (2)

Publication Number Publication Date
CN117453377A true CN117453377A (en) 2024-01-26
CN117453377B CN117453377B (en) 2024-04-26

Family

ID=89584017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311768543.3A Active CN117453377B (en) 2023-12-21 2023-12-21 Model scheduling method, terminal equipment and server

Country Status (1)

Country Link
CN (1) CN117453377B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109887098A (en) * 2019-02-13 2019-06-14 浙江传媒学院 A kind of web AR data presentation mode based on distributed computing
WO2022198853A1 (en) * 2021-03-22 2022-09-29 北京市商汤科技开发有限公司 Task scheduling method and apparatus, electronic device, storage medium, and program product
CN116310935A (en) * 2022-12-01 2023-06-23 杭州谐云科技有限公司 Cloud edge collaboration-based real-time video intelligent processing method and system
CN117014507A (en) * 2023-08-08 2023-11-07 中国银行股份有限公司 Training method of task unloading model, task unloading method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG Xiaoheng et al., "Research on Edge Computing Resource Collaboration Based on Comprehensive Trust" (基于综合信任的边缘计算资源协同研究), Journal of Computer Research and Development (《计算机研究与发展》), no. 03, 15 March 2018 (2018-03-15), pages 5-33 *

Also Published As

Publication number Publication date
CN117453377B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN108566290B (en) Service configuration management method, system, storage medium and server
US20160142477A1 (en) Connection control device, connection control system, and non-transitory computer readable medium
EP2977898B1 (en) Task allocation in a computing environment
CN110677462B (en) Access processing method, system, device and storage medium for multi-block chain network
CN110430142B (en) Method and device for controlling flow
CN112507263B (en) Page loading updating method and device, electronic equipment and storage medium
WO2019075845A1 (en) Construction method and device for link call relationship, computer device and storage medium
CN111949389B (en) Slurm-based information acquisition method and device, server and computer-readable storage medium
EP3570567B1 (en) Method and device for operating instance resources
CN112115039B (en) Test case generation method, device and equipment
CN103186536A (en) Method and system for scheduling data shearing devices
CN117453377B (en) Model scheduling method, terminal equipment and server
US11003508B2 (en) Apparatus and methods for load balancing across a network of nodes
CN111831503B (en) Monitoring method based on monitoring agent and monitoring agent device
CN116701191A (en) Optimization method, device, equipment, storage medium and program product for quantization loop
CN114090268B (en) Container management method and container management system
CN113779412B (en) Message touch method, node and system based on blockchain network
CN116185578A (en) Scheduling method of computing task and executing method of computing task
CN112653720B (en) FOTA upgrading method and device
CN112669091B (en) Data processing method, device and storage medium
CN112688980B (en) Resource distribution method and device, and computer equipment
CN114612212A (en) Business processing method, device and system based on risk control
CN113094041A (en) Component management method and device of application program and computer equipment
CN112817992A (en) Method, device, electronic equipment and readable storage medium for executing change task
US8437983B2 (en) Method for determining definite clock and node apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant