CN117972360A - AI large model optimization method, device, terminal equipment and storage medium

AI large model optimization method, device, terminal equipment and storage medium

Info

Publication number
CN117972360A
CN117972360A
Authority
CN
China
Prior art keywords
model
sub
optimization
configuration
optimizing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410361556.7A
Other languages
Chinese (zh)
Other versions
CN117972360B (en)
Inventor
陈季春
黄德安
陈子文
闫超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Imyfone Technology Co ltd
Original Assignee
Shenzhen Imyfone Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Imyfone Technology Co ltd filed Critical Shenzhen Imyfone Technology Co ltd
Priority to CN202410361556.7A
Publication of CN117972360A
Application granted
Publication of CN117972360B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44505Configuring for program initiating, e.g. using registry, configuration files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to the technical field of AI models, and provides an AI large model optimization method, an AI large model optimization device, a terminal device and a storage medium. The optimization method of the AI large model comprises the following steps: acquiring a demand text input by a user; selecting a target sub-model from the AI large model based on the demand text; performing parameter configuration on the target sub-model to obtain a configured configuration sub-model; performing performance analysis on the configuration sub-model to obtain an analysis result; and optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model. The embodiments of the application can select a suitable sub-model according to the demand text, which improves the user experience. Meanwhile, optimizing the selected sub-model can improve its response speed and throughput, further improving the user experience.

Description

AI large model optimization method, device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of AI models, and particularly relates to an AI large model optimization method, an AI large model optimization device, terminal equipment and a storage medium.
Background
With the rapid development of AI technology, more and more AI large models have emerged. AI large models are widely applied in fields such as intelligent customer service, automatic question answering, machine translation and sentiment analysis, helping people complete various tasks more quickly and accurately and improving work efficiency and quality of life. In general, an AI large model may include multiple sub-models, and different sub-models may provide different functions; for example, a question-answer sub-model may provide a question-answer function, and a drawing sub-model may provide a drawing function. However, many sub-models currently suffer from low response speed and low throughput, resulting in a poor user experience.
Disclosure of Invention
The embodiments of the application provide an optimization method, an optimization device, a terminal device and a storage medium for an AI large model, which can solve the problem in the related art that the low response speed and low throughput of sub-models in an AI large model lead to a poor user experience.
In a first aspect, an embodiment of the present application provides an optimization method for an AI large model, including:
acquiring a demand text input by a user;
selecting a target sub-model from the AI large model based on the demand text;
performing parameter configuration on the target sub-model to obtain a configured configuration sub-model;
performing performance analysis on the configuration sub-model to obtain an analysis result;
and optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model.
In a second aspect, an embodiment of the present application provides an apparatus for optimizing an AI large model, including:
an acquisition module, used for acquiring a demand text input by a user;
a selecting module, used for selecting a target sub-model from the AI large model based on the demand text;
a configuration module, used for performing parameter configuration on the target sub-model to obtain a configured configuration sub-model;
an analysis module, used for performing performance analysis on the configuration sub-model to obtain an analysis result;
and an optimizing module, used for optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above-mentioned optimization method of the AI large model when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method for optimizing an AI large model.
In a fifth aspect, an embodiment of the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the above-mentioned optimization method of the AI large model.
Compared with the prior art, the embodiments of the application have the following beneficial effects: a demand text input by a user is acquired, a target sub-model is selected from the AI large model based on the demand text, parameter configuration is performed on the target sub-model to obtain a configured configuration sub-model, performance analysis is performed on the configuration sub-model to obtain an analysis result, and the configuration sub-model is optimized according to the analysis result to obtain an optimized optimization sub-model. The embodiments of the application can select a suitable sub-model according to the demand text, improving the user experience. Meanwhile, optimizing the selected sub-model can improve its response speed and throughput, further improving the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or the description of the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an implementation flow of an AI large model optimization method provided by an embodiment of the application;
FIG. 2 is a schematic structural diagram of an AI large model optimizing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present application, fall within the scope of protection of the present application.
It is noted that the terms "comprising", "including" and "having", and any variations thereof, in the description and claims of the application and in the foregoing figures, are intended to cover non-exclusive inclusions. For example, a process, method, terminal, article or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements, but may include other steps or elements not listed or inherent to such a process, method, article or apparatus. In the claims, specification and drawings of the present application, relational terms such as "first" and "second" are used solely to distinguish one entity/operation/object from another, without necessarily requiring or implying any actual relationship or order between such entities/operations/objects.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
With the rapid development of AI technology, more and more AI large models have emerged. AI large models are widely applied in fields such as intelligent customer service, automatic question answering, machine translation and sentiment analysis, helping people complete various tasks more quickly and accurately and improving work efficiency and quality of life. In general, an AI large model may include multiple sub-models, and different sub-models may provide different functions; for example, a question-answer sub-model may provide a question-answer function, and a drawing sub-model may provide a drawing function. However, many sub-models currently suffer from low response speed and low throughput, resulting in a poor user experience.
In view of this, the embodiments of the application select a suitable sub-model according to the demand text, which improves the user experience. Meanwhile, the selected sub-model is optimized, which improves its response speed and throughput and further improves the user experience.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
Fig. 1 shows a schematic implementation flow diagram of an AI large model optimization method provided by an embodiment of the present application, where the method may be applied to a terminal device. The terminal device may be a mobile phone, tablet computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, etc.
Specifically, the above-described optimization method of the AI large model may include the following steps S101 to S105.
Step S101, obtaining a demand text input by a user.
The demand text is a specific request or question entered by the user; for example, it may be "What is the weather like tomorrow?", "Please recommend me a good movie.", etc.
In the embodiment of the application, a user can input a demand text on the front-end interface of the terminal device on which the AI large model is deployed. The terminal device can receive the demand text entered by the user.
Step S102, selecting a target sub-model from the AI large model based on the demand text.
The AI large model may be a large model set comprising a plurality of sub-models, each of which may specialize in handling a certain class of tasks or problems. The target sub-model is the sub-model, selected based on the user input, that is best suited to handling the demand.
In the embodiment of the application, after obtaining the demand text, the terminal device can analyze the demand text, so that a proper sub-model is selected from the AI large models to serve as a target sub-model.
Step S103, performing parameter configuration on the target sub-model to obtain a configured configuration sub-model.
The configuration sub-model may be a sub-model that, after being specifically configured, is used to process a specific task.
In the embodiment of the application, after the target sub-model is selected, the terminal equipment can configure key parameters, such as text model parameters, picture model parameters and the like, for the target sub-model to obtain a configured sub-model.
Step S104, performing performance analysis on the configuration sub-model to obtain an analysis result.
Wherein the analysis results may be used to characterize the performance of the configuration sub-model in processing the demand text.
In the embodiment of the application, the terminal equipment can perform performance analysis on the configuration sub-model, and particularly can analyze response time, throughput and the like of the configuration sub-model, thereby obtaining an analysis result.
Step S105, optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model.
Wherein the optimization sub-model is an optimized sub-model.
In the embodiment of the application, after the analysis result is obtained, the terminal device can optimize the configuration sub-model in multiple aspects according to the analysis result, specifically including network transmission optimization, cache optimization, task scheduling optimization, data compression optimization, load balancing optimization, and the like. These optimizations can reduce the delay and errors of data transmission, improve the access speed of data, allocate computing resources reasonably to ensure efficient execution of tasks, reduce the amount of data stored and transmitted to improve the processing speed, and distribute load evenly among multiple processing units or servers to avoid overload of certain parts.
Compared with the prior art, the embodiments of the application have the following beneficial effects: a demand text input by a user is acquired, a target sub-model is selected from the AI large model based on the demand text, parameter configuration is performed on the target sub-model to obtain a configured configuration sub-model, performance analysis is performed on the configuration sub-model to obtain an analysis result, and the configuration sub-model is optimized according to the analysis result to obtain an optimized optimization sub-model. The embodiments of the application can select a suitable sub-model according to the demand text, improving the user experience. Meanwhile, optimizing the selected sub-model can improve its response speed and throughput, further improving the user experience.
In some embodiments of the present application, the optimizing the configuration sub-model according to the analysis result may specifically include steps S401 to S405.
Step S401, according to the analysis result, the TCP window size and retransmission strategy in the network transmission parameters are adjusted to perform network transmission optimization.
The TCP window size is a window size in TCP (transmission control protocol) and is used for flow control, and determines the maximum data amount that a sender can continuously send without receiving acknowledgement. Retransmission policies refer to policies that need to be taken to retransmit packets when they are lost or delayed in arrival in network communications.
In the embodiment of the application, the terminal device can dynamically adjust the window size according to the analysis result, so as to adapt to network conditions and reduce network congestion and delay. The retransmission strategy can also be adjusted, for example by increasing the number of retransmissions or adjusting the retransmission interval, so as to improve the reliability of data transmission.
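A minimal Python sketch of this step might look as follows. Note that the buffer sizes, the RTT threshold and the retry parameters are illustrative assumptions, not values specified herein; the kernel socket buffers set below bound the usable TCP window, and the retransmission policy is modeled at the application layer.

```python
import socket
import time

def tune_socket(sock: socket.socket, observed_rtt_ms: float) -> None:
    """Enlarge the kernel send/receive buffers (which bound the usable TCP
    window) when the observed round-trip time is high, so that more data
    can be kept in flight. Thresholds are illustrative assumptions."""
    buf = 256 * 1024 if observed_rtt_ms > 100 else 64 * 1024
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf)

def send_with_retries(send_fn, payload: bytes, max_retries: int = 3,
                      base_interval_s: float = 0.2) -> bool:
    """Application-level retransmission policy: retry a failed send with an
    exponentially growing interval to ride out transient congestion."""
    for attempt in range(max_retries + 1):
        try:
            send_fn(payload)
            return True
        except OSError:
            if attempt == max_retries:
                return False
            time.sleep(base_interval_s * (2 ** attempt))
    return False
```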
Step S402, adjusting the cache size, the cache policy and the cache expiration time, and using the distributed cache to perform cache optimization.
The buffer size may be the amount of data that the buffer can store. The caching policy may be used to decide which data should be cached and how to remove the data from the cache. The cache expiration time may be the time when the cache data should be considered invalid and removed from the cache.
In the embodiment of the application, the terminal device can dynamically adjust the cache size according to the analysis result, so as to adapt to different workloads and data access patterns. An appropriate cache policy, such as LRU (least recently used) or LFU (least frequently used), can also be selected to improve the cache hit rate and efficiency. The cache expiration time can be adjusted dynamically according to the update frequency and access pattern of the data, so as to balance the cache hit rate and the freshness of the data. In addition, the terminal device can use a distributed cache to distribute the cached data across multiple nodes, improving the availability and performance of the cache.
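The combination of an LRU policy with a per-entry expiration time can be sketched in Python as follows; the default capacity and TTL are assumptions for illustration, and in the described system they would be retuned from the analysis result.

```python
import time
from collections import OrderedDict

class LRUCacheWithTTL:
    """Minimal LRU cache with a per-entry expiration time."""
    def __init__(self, capacity: int = 1024, ttl_s: float = 60.0):
        self.capacity = capacity
        self.ttl_s = ttl_s
        self._store = OrderedDict()   # key -> (insert_time, value)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        ts, value = item
        if time.monotonic() - ts > self.ttl_s:   # expired entry: drop it
            del self._store[key]
            return None
        self._store.move_to_end(key)             # mark as recently used
        return value

    def put(self, key, value) -> None:
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:  # evict least recently used
            self._store.popitem(last=False)
```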
Step S403, selecting a task scheduling strategy, optimizing thread synchronization and adjusting parallelism to perform task scheduling optimization.
The task scheduling policy may be used to decide how tasks are allocated and executed. Thread synchronization may refer to multiple threads coordinating their execution order through some mechanism when accessing shared resources, so as to avoid data contention and conflicts. Parallelism may be the number of operations that a system or task performs in parallel at the same time.
In the embodiment of the application, the terminal device can select an appropriate task scheduling policy according to the analysis result, such as priority-based scheduling or round-robin scheduling, so as to optimize the execution order and efficiency of tasks. Thread synchronization can be optimized, specifically by using a reasonable thread synchronization mechanism, such as locks and semaphores, to avoid contention and conflicts among threads and improve task execution efficiency. The parallelism of tasks can be dynamically adjusted according to the hardware resources and task characteristics of the system, so as to balance resource utilization and performance.
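A sketch of the three ideas together, under assumed defaults: tasks carry a priority, a lock guards the shared queue (thread synchronization), and max_workers sets the parallelism, which could be re-tuned from the analysis result.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor

class PriorityScheduler:
    """Priority-based scheduling sketch: lower priority number runs first."""
    def __init__(self, max_workers: int = 4):
        self._heap = []                  # (priority, sequence, task)
        self._lock = threading.Lock()    # guards the shared heap
        self._seq = 0                    # tie-breaker for equal priorities
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, priority: int, task) -> None:
        with self._lock:                 # avoid races on the shared queue
            heapq.heappush(self._heap, (priority, self._seq, task))
            self._seq += 1

    def drain(self) -> None:
        """Dispatch queued tasks to the pool in priority order."""
        while True:
            with self._lock:
                if not self._heap:
                    return
                _, _, task = heapq.heappop(self._heap)
            self._pool.submit(task)
```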
Step S404, selecting a compression algorithm and dynamically adjusting the compression level to perform data compression optimization.
The compression level may be a setting that determines the resources used by the compression algorithm during compression and the compression ratio that can be achieved.
In the embodiment of the application, the terminal device can select an appropriate compression algorithm, such as LZ4, Snappy or zlib, according to the analysis result, so as to achieve efficient data compression. The compression level can also be adjusted dynamically to balance the compression ratio against the consumption of computing resources.
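Using zlib as one of the named algorithms, dynamic level selection can be sketched as below; the CPU-utilization thresholds are illustrative assumptions.

```python
import zlib

def compress_adaptive(data: bytes, cpu_utilization: float) -> bytes:
    """Pick the zlib compression level from the current CPU load, trading
    compression ratio against compute. Thresholds are assumptions."""
    if cpu_utilization > 0.8:
        level = 1        # CPU is scarce: fastest, lowest ratio
    elif cpu_utilization > 0.5:
        level = 5        # balanced
    else:
        level = 9        # CPU is idle: best ratio
    return zlib.compress(data, level)
```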
Step S405, deploying a load balancer or a container orchestration system to perform load balancing optimization, so as to obtain the optimized optimization sub-model.
The load balancer may be a device or software for distributing network requests to multiple servers to achieve load balancing and fault tolerance. The container orchestration system may be a system for the automated deployment, scaling and management of containerized applications, such as Kubernetes.
In the embodiment of the application, the terminal device can deploy a load balancer such as Nginx or HAProxy according to the analysis result, distributing requests to multiple servers to achieve load balancing and fault tolerance. Alternatively, a container orchestration system such as Kubernetes can be used to automatically manage and schedule containerized applications, achieving load balancing and dynamic allocation of resources.
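As a toy illustration of the distribution idea only (a real deployment would rely on Nginx/HAProxy or Kubernetes as described, not application code), round-robin dispatch can be written in a few lines; the server addresses are placeholders.

```python
import itertools

class RoundRobinBalancer:
    """Cycle through backend servers so each request goes to the next one."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
target = balancer.pick()   # next request is routed to this backend
```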
In some embodiments of the present application, the selecting the target sub-model from the AI large model based on the demand text may specifically include steps S501 to S506.
Step S501, determining the type of data to be output according to the required text.
Where the data type refers to the output format or type that the user desires to obtain from the model.
In an embodiment of the present application, the terminal device may analyze the demand text input by the user, and recognize the subject and intention involved in the text through natural language processing technology (e.g., entity recognition, intention recognition, etc.). Based on this information, the terminal device can determine the type of output data desired by the user, such as text, images, audio, video, etc.
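As an illustrative stand-in for the NLP intent recognition described above, a keyword lookup already conveys the mapping from demand text to output data type; the keyword lists below are assumptions for the sketch, not an exhaustive vocabulary.

```python
# Assumed keyword-to-type table; a production system would use the NLP
# techniques (entity recognition, intent recognition) named in the text.
KEYWORDS_TO_TYPE = {
    "draw": "image", "picture": "image", "paint": "image",
    "translate": "text", "summarize": "text", "recommend": "text",
    "read aloud": "audio",
}

def infer_output_type(demand_text: str) -> str:
    lowered = demand_text.lower()
    for keyword, data_type in KEYWORDS_TO_TYPE.items():
        if keyword in lowered:
            return data_type
    return "text"   # default: most demands expect a textual answer

print(infer_output_type("Please draw a cat for me"))   # -> image
```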
Step S502, determining candidate sub-models based on the data types.
The candidate sub-model may be a sub-model set that may satisfy a user requirement and is screened out according to a data type.
In the embodiment of the application, the terminal device can screen out, from the AI large model library, the sub-models capable of processing and outputting data of the identified type. It will be appreciated that candidate sub-models may be trained and optimized for a particular task or domain.
Step S503, obtaining the response time, the response success rate and the cost of each candidate sub-model.
The response time may refer to the time from the receipt of a request to the output of the result by the model. The response success rate may refer to the proportion of requests that are correctly processed by the model. The cost may refer to the price paid to use the model and may include computing resources, model licensing, and the like.
In the embodiment of the application, the terminal equipment can collect or calculate the response time, the response success rate, the cost and other information of each candidate sub-model based on historical data or real-time monitoring.
Step S504, assigning weights to the response time, the response success rate and the cost respectively.
In the embodiment of the application, the terminal equipment can set weights for response time, response success rate and cost respectively according to actual requirements or preferences. These weights reflect the importance of the different indices in the model selection process.
In step S505, a score of each candidate sub-model is calculated according to the response time, the response success rate, the cost and the corresponding weights.
Wherein the score is a comprehensive evaluation index that can be used to quantify the performance of each candidate sub-model in meeting user requirements and cost effectiveness.
In the embodiment of the application, the terminal device can calculate a comprehensive score according to the response time, response success rate and cost of each candidate sub-model and the corresponding weights.
In step S506, the candidate sub-model with the highest score is determined as the target sub-model.
In the embodiment of the application, the terminal device can compare the scores of all candidate sub-models and select the model with the highest score as the target sub-model. It can be appreciated that this model performs best in terms of response time, response success rate, cost and the like, and best meets the needs and expectations of the user.
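One possible realization of steps S503 to S506 is a min-max-normalized weighted sum, sketched below; the weights, metric values and sub-model names are illustrative assumptions, with faster, more reliable and cheaper candidates scoring higher.

```python
def score_submodels(candidates, w_time=0.4, w_success=0.4, w_cost=0.2):
    """Return the name of the highest-scoring candidate sub-model."""
    def norm(values, invert=False):
        # Min-max normalize to [0, 1]; invert when smaller is better.
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(hi - v) / span if invert else (v - lo) / span for v in values]

    times = norm([c["response_time"] for c in candidates], invert=True)
    succ  = norm([c["success_rate"] for c in candidates])
    costs = norm([c["cost"] for c in candidates], invert=True)
    scored = [
        (w_time * t + w_success * s + w_cost * co, c["name"])
        for t, s, co, c in zip(times, succ, costs, candidates)
    ]
    return max(scored)[1]   # highest score -> target sub-model

candidates = [
    {"name": "qa-sub-model",   "response_time": 0.8, "success_rate": 0.99, "cost": 3.0},
    {"name": "draw-sub-model", "response_time": 2.5, "success_rate": 0.95, "cost": 5.0},
]
print(score_submodels(candidates))   # -> qa-sub-model
```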
In some embodiments of the present application, the performance analysis of the configuration sub-model may be performed to obtain an analysis result, which may specifically include step S601 and step S602.
Step S601, performance index data of the configuration sub-model is obtained.
The performance index data may be used to measure performance of the submodel, and may include response time, throughput, and response success rate.
In the embodiment of the application, the terminal equipment can collect various performance data such as response time, throughput, response success rate and the like of the configuration sub-model in the running process.
Step S602, analyzing the duration and fluctuation of the response time, the size and fluctuation of the throughput, and the response success rate, and determining the network bandwidth and CPU utilization of the configuration sub-model, so as to obtain the analysis result.
The response time may be the time required from the user sending a request to the system returning a response. The response time fluctuation may be the extent to which the response time changes in different situations. The throughput may be the number of requests successfully processed in a certain time. The throughput fluctuation may be the variation of throughput at different times or under different loads. The response success rate may be the ratio of the number of successfully processed requests to the total number of requests. The network bandwidth may be the amount of data that the network can transmit per unit time. The CPU utilization may be the percentage of the CPU that is occupied during a particular period, used to measure the load condition of the system.
In an embodiment of the application, the terminal device may analyze the length of the response time, i.e. the time required for the sub-model from receiving the request to returning the result, as well as the fluctuation of the response time, i.e. how the response time changes across different requests or time periods. It will be appreciated that if the response time is significantly high, this may be caused by insufficient network bandwidth or limited CPU processing speed; if the response time fluctuates greatly, this may be caused by network instability or CPU load imbalance. The terminal device may also analyze the throughput size and throughput fluctuation: if the throughput is below expectation, network bandwidth may be limited or CPU processing power insufficient; if the throughput fluctuates greatly, network bandwidth or CPU resources may be exhausted at certain moments. The terminal device may further analyze the response success rate: if the response success rate is low, the network may be unstable or CPU resources strained, resulting in packet loss or processing errors. After the above analysis, the terminal device can determine the network bandwidth and CPU utilization of the configuration sub-model, thereby obtaining the analysis result.
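A compact sketch of steps S601 and S602 follows; the diagnostic thresholds are illustrative assumptions, and the bandwidth and CPU figures are taken as already-measured inputs rather than derived here.

```python
import statistics

def analyze_performance(response_times_s, request_count, success_count,
                        window_s, network_bandwidth_mbps, cpu_utilization):
    """Summarize response time (duration and fluctuation), throughput and
    success rate over one window; thresholds below are assumptions."""
    result = {
        "mean_response_s": statistics.mean(response_times_s),
        "response_jitter_s": statistics.pstdev(response_times_s),
        "throughput_rps": request_count / window_s,
        "success_rate": success_count / request_count,
        "network_bandwidth_mbps": network_bandwidth_mbps,
        "cpu_utilization": cpu_utilization,
    }
    if result["mean_response_s"] > 1.0 and cpu_utilization > 0.9:
        result["diagnosis"] = "CPU-bound: limited processing speed"
    elif result["response_jitter_s"] > 0.5:
        result["diagnosis"] = "unstable network or unbalanced CPU load"
    else:
        result["diagnosis"] = "nominal"
    return result
```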
In some embodiments of the present application, the configuring the parameters of the target sub-model to obtain the configured configuration sub-model may specifically include step S701 and step S702.
Step S701, obtaining history configuration data of the target sub-model.
The historical configuration data may be configuration information accumulated in the past operation or use process of the target sub-model, including various parameter settings, adjustment records and the like.
In embodiments of the present application, the terminal device may retrieve usage records or configuration information prior to the target sub-model from a database or storage system.
Step S702, according to the demand text and the historical configuration data, an optimization algorithm is used for configuring text model parameters, picture model parameters, content length and frequency limit of the target sub-model, and the configured configuration sub-model is obtained.
The text model parameters may be parameters affecting the text generation or processing effect, such as the prompt content, the request mode, the prompt context, etc. The picture model parameters may be parameters that affect the picture generation or processing effect, such as the picture aspect ratio, picture style type, sample content, negative sample content, etc. The content length may be the length or size of the model output content (e.g., text, pictures, etc.). The frequency limit may be a limit on the frequency of use of the model (e.g., an API request rate limit) or on the frequency of output results (e.g., the rate of generating pictures).
In the embodiment of the application, the terminal device can use the demand text and the historical configuration data, in combination with an optimization algorithm (such as a machine learning algorithm or a heuristic search algorithm), to adjust and optimize a plurality of parameters of the target sub-model so as to find an optimal solution for the model parameters. Specifically, the text model parameters, picture model parameters, content length, frequency limit and the like can be configured.
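A hypothetical random-search sketch of step S702 is given below. The parameter ranges and the evaluate() callback are assumptions: evaluate() would run the configured sub-model against the demand text and return a quality/latency score, and recent historical configurations are used as warm-start seeds.

```python
import random

# Assumed parameter ranges for illustration only.
SEARCH_SPACE = {
    "content_length": [256, 512, 1024, 2048],
    "frequency_limit_rpm": [30, 60, 120],
    "prompt_context_turns": [0, 2, 4, 8],
}

def configure(target_submodel, demand_text, history, evaluate, trials=20):
    """Try recent historical configs first, then random samples, and keep
    the best-scoring configuration. evaluate() is an assumed helper."""
    best_score, best_cfg = float("-inf"), None
    seeds = history[-5:]                      # warm-start from recent configs
    for i in range(trials):
        cfg = seeds[i] if i < len(seeds) else {
            k: random.choice(v) for k, v in SEARCH_SPACE.items()
        }
        score = evaluate(target_submodel, demand_text, cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg   # parameters of the configured configuration sub-model
```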
In order to monitor the AI channel in real time and handle channel anomalies promptly, in some embodiments of the present application, after optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model, the method may further include steps S801 to S804.
Step S801, monitoring the optimization sub-model, and acquiring monitoring data in a preset time.
In the embodiment of the application, the terminal equipment can continuously monitor the optimized sub-model so as to collect various performance indexes of the sub-model in the running process. The CPU utilization rate, the memory occupation, the response time, the error rate and other data can be monitored. Such data is typically collected over a preset period of time, such as hourly, daily, or weekly.
Step S802, comparing the response time in the monitoring data with a preset threshold to obtain a comparison result, and judging whether the optimization sub-model is abnormal according to the comparison result.
In an embodiment of the present application, the terminal device may compare the response time in the collected monitoring data with a preset normal value range or threshold. If the response time exceeds a preset threshold, it is considered that the optimization sub-model may be abnormal.
In step S803, if it is determined that the optimization sub-model is abnormal based on the comparison result, a request retry is performed.
In embodiments of the present application, if the optimization sub-model is determined to be abnormal, the system will typically attempt to resend the request in the hope that the model returns to normal. For example, the terminal device may resend the user's demand text to see whether the optimization sub-model can successfully output the corresponding content.
Step S804, if the request retry fails, switching to the standby sub-model, generating an exception code according to the monitoring data, and recording the exception code to the log.
The standby sub-model may be a model prepared in advance to replace the main model when the main model has problems. The exception code may be a specific code or number used to identify and record the exception.
In an embodiment of the present application, if the request retry still fails, the terminal device may switch to the standby sub-model to continue providing the service. Meanwhile, an exception code can be generated according to the monitoring data to identify the anomaly, and recorded in a log to facilitate subsequent analysis and troubleshooting.
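Steps S801 to S804 can be sketched end to end as below. The threshold value, the exception-code format and the call_model() helper (assumed to return a response together with its elapsed time in seconds) are illustrative assumptions.

```python
import logging

logging.basicConfig(level=logging.INFO)
RESPONSE_TIME_THRESHOLD_S = 2.0   # illustrative preset threshold

def serve(demand_text, primary, standby, call_model):
    """Detect an anomaly from the monitored response time, retry the primary
    sub-model once, then fall back to the standby sub-model and log an
    exception code derived from the monitoring data."""
    response, elapsed = call_model(primary, demand_text)
    if elapsed <= RESPONSE_TIME_THRESHOLD_S:
        return response
    response, elapsed = call_model(primary, demand_text)   # request retry
    if elapsed <= RESPONSE_TIME_THRESHOLD_S:
        return response
    code = f"E_RT_{int(elapsed * 1000)}"   # exception code from monitoring data
    logging.error("anomaly %s: switching to standby sub-model", code)
    response, _ = call_model(standby, demand_text)
    return response
```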
In order to enhance the security and risk control of the AI model, in some embodiments of the present application, the method may further include step S901 and step S902.
Step S901, accessing an external sensitive word library.
Step S902, filtering sensitive words according to the external sensitive word library.
In the embodiment of the application, the terminal device can access the sensitive word libraries of external channels such as Aliyun and OpenAI, and when the AI large model generates or processes text, check whether the text contains words from the external sensitive word library. If it does, the terminal device can filter, replace or tag these words to ensure that the output content does not contain sensitive or unsuitable content.
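A minimal sketch of the filtering in steps S901 and S902 follows; the word list is a placeholder, since in the described system it would be fetched from an external provider's library, and matched words are replaced with asterisks.

```python
import re

# Placeholder list; in the described system this would come from an
# external sensitive word library.
SENSITIVE_WORDS = {"badword1", "badword2"}

_PATTERN = re.compile(
    "|".join(re.escape(w) for w in sorted(SENSITIVE_WORDS, key=len, reverse=True)),
    re.IGNORECASE,
)

def filter_sensitive(text: str) -> str:
    """Replace each sensitive word with asterisks of the same length."""
    return _PATTERN.sub(lambda m: "*" * len(m.group()), text)

print(filter_sensitive("this contains badword1"))   # -> this contains ********
```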
Fig. 2 shows a schematic structural diagram of an AI large model optimizing apparatus provided by an embodiment of the present application, where the AI large model optimizing apparatus 2 may be configured on a terminal device, and specifically, the AI large model optimizing apparatus 2 may include:
an obtaining module 201, configured to obtain a demand text input by a user;
a selection module 202, configured to select a target sub-model from the AI large model based on the demand text;
a configuration module 203, configured to perform parameter configuration on the target sub-model to obtain a configured configuration sub-model;
an analysis module 204, configured to perform performance analysis on the configuration sub-model to obtain an analysis result;
and an optimizing module 205, configured to optimize the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model.
Compared with the prior art, the embodiments of the application have the following beneficial effects: a demand text input by a user is acquired, a target sub-model is selected from the AI large model based on the demand text, parameter configuration is performed on the target sub-model to obtain a configured configuration sub-model, performance analysis is performed on the configuration sub-model to obtain an analysis result, and the configuration sub-model is optimized according to the analysis result to obtain an optimized optimization sub-model. The embodiments of the application can select a suitable sub-model according to the demand text, improving the user experience. Meanwhile, optimizing the selected sub-model can improve its response speed and throughput, further improving the user experience.
In some embodiments of the present application, the optimization module 205 described above may also be used to: according to the analysis result, adjusting the TCP window size and retransmission strategy in the network transmission parameters to perform network transmission optimization; adjusting the cache size, the cache policy and the cache expiration time, and using a distributed cache to perform cache optimization; selecting a task scheduling policy, optimizing thread synchronization and adjusting parallelism to perform task scheduling optimization; selecting a compression algorithm, and dynamically adjusting the compression level to perform data compression optimization; and deploying a load balancer or a container orchestration system to perform load balancing optimization to obtain the optimized optimization sub-model.
In some embodiments of the present application, the selection module 202 may be further configured to: determining the type of data to be output according to the demand text; determining candidate sub-models based on the data type; acquiring the response time, response success rate and cost of each candidate sub-model; assigning weights to the response time, the response success rate and the cost respectively; calculating the score of each candidate sub-model according to the response time, the response success rate, the cost and the corresponding weights; and determining the candidate sub-model with the highest score as the target sub-model.
In some embodiments of the present application, the analysis module 204 described above may also be used to: acquiring performance index data of the configuration sub-model, wherein the performance index data comprises response time, throughput and response success rate; and analyzing the duration and fluctuation of the response time, the size and fluctuation of the throughput, and the response success rate, and determining the network bandwidth and CPU utilization of the configuration sub-model to obtain the analysis result.
In some embodiments of the present application, the configuration module 203 may be further configured to: acquiring historical configuration data of the target sub-model; and configuring the text model parameters, picture model parameters, content length and frequency limit of the target sub-model by using an optimization algorithm according to the demand text and the historical configuration data, to obtain the configured configuration sub-model.
In some embodiments of the present application, the above-mentioned AI large model optimizing apparatus 2 may further include a monitoring module for: monitoring the optimization sub-model, and acquiring monitoring data within a preset time; comparing the response time in the monitoring data with a preset threshold to obtain a comparison result, and judging whether the optimization sub-model is abnormal according to the comparison result; if the optimization sub-model is judged to be abnormal according to the comparison result, performing a request retry; and if the request retry fails, switching to a standby sub-model, generating an exception code according to the monitoring data, and recording the exception code to a log.
In some embodiments of the present application, the above-mentioned AI large model optimizing apparatus 2 may further include a filtering module for: accessing an external sensitive word library; and filtering sensitive words according to the external sensitive word library.
Fig. 3 is a schematic diagram of a terminal device according to an embodiment of the present application. The terminal device 3 may include: a processor 301, a memory 302 and a computer program 303 stored in the memory 302 and executable on the processor 301, for example an optimization program of an AI large model. The processor 301, when executing the computer program 303, implements the steps in the above embodiments of the optimization method of the AI large model, such as steps S101 to S105 shown in Fig. 1. Alternatively, the processor 301 may implement the functions of the modules/units in the above apparatus embodiments when executing the computer program 303, for example the obtaining module 201, the selecting module 202, the configuring module 203, the analyzing module 204 and the optimizing module 205 shown in Fig. 2.
The computer program may be divided into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specified functions, and the instruction segments are used to describe the execution of the computer program in the terminal device.
The terminal device may include, but is not limited to, a processor 301, a memory 302. It will be appreciated by those skilled in the art that fig. 3 is merely an example of a terminal device and is not limiting of the terminal device, and may include more or fewer components than shown, or may combine some components, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 302 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 302 may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the terminal device. Further, the memory 302 may include both an internal storage unit and an external storage device of the terminal device. The memory 302 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for convenience and brevity of description, the structure of the above terminal device may also refer to a specific description of the structure in the method embodiment, which is not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
An embodiment of the application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above AI large model optimization method.
An embodiment of the application further provides a computer program product which, when run on a terminal device, implements the steps in the above AI large model optimization method.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments through a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for optimizing an AI large model, comprising:
acquiring a demand text input by a user;
selecting a target sub-model from the AI large model based on the demand text;
performing parameter configuration on the target sub-model to obtain a configured configuration sub-model;
performing performance analysis on the configuration sub-model to obtain an analysis result;
and optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model.
2. The method for optimizing an AI large model according to claim 1, wherein optimizing the configuration sub-model according to the analysis result to obtain an optimized optimization sub-model comprises:
according to the analysis result, adjusting the TCP window size and retransmission strategy in the network transmission parameters to perform network transmission optimization;
adjusting the cache size, the cache policy and the cache expiration time, and using a distributed cache to perform cache optimization;
selecting a task scheduling policy, optimizing thread synchronization and adjusting parallelism to perform task scheduling optimization;
selecting a compression algorithm, and dynamically adjusting the compression level to perform data compression optimization;
and deploying a load balancer or a container orchestration system to perform load balancing optimization, so as to obtain the optimized optimization sub-model.
3. The method for optimizing an AI large model according to claim 1, wherein selecting a target sub-model from the AI large model based on the demand text comprises:
determining the type of data to be output according to the demand text;
determining candidate sub-models based on the data type;
acquiring the response time, response success rate and cost of each candidate sub-model;
assigning weights to the response time, the response success rate and the cost respectively;
calculating the score of each candidate sub-model according to the response time, the response success rate, the cost and the corresponding weights;
and determining the candidate sub-model with the highest score as the target sub-model.
4. The method for optimizing an AI large model of claim 1, wherein performing a performance analysis on the configuration sub-model to obtain an analysis result comprises:
acquiring performance index data of the configuration sub-model, wherein the performance index data comprises response time, throughput and response success rate;
and analyzing the duration and fluctuation of the response time, the size and fluctuation of the throughput, and the response success rate, and determining the network bandwidth and CPU utilization of the configuration sub-model to obtain the analysis result.
5. The method for optimizing an AI large model according to claim 1, wherein the performing parameter configuration on the target sub model to obtain a configured configuration sub model includes:
acquiring historical configuration data of the target sub-model;
and configuring the text model parameters, picture model parameters, content length and frequency limit of the target sub-model by using an optimization algorithm according to the demand text and the historical configuration data, to obtain the configured configuration sub-model.
6. The method for optimizing an AI large model of claim 1, wherein after optimizing the configuration sub-model based on the analysis result to obtain an optimized optimization sub-model, the method further comprises:
monitoring the optimization sub-model, and acquiring monitoring data within a preset time;
comparing the response time in the monitoring data with a preset threshold to obtain a comparison result, and judging whether the optimization sub-model is abnormal according to the comparison result;
if the optimization sub-model is judged to be abnormal according to the comparison result, performing a request retry;
and if the request retry fails, switching to a standby sub-model, generating an exception code according to the monitoring data, and recording the exception code to a log.
7. The method of optimizing AI large models according to any of claims 1 to 6, characterized in that the method further comprises:
accessing an external sensitive word library;
and filtering sensitive words according to the external sensitive word library.
8. An AI large model optimizing apparatus, characterized by comprising:
the acquisition module is used for acquiring a demand text input by a user;
the selecting module is used for selecting a target sub-model from the AI large models based on the demand text;
The configuration module is used for carrying out parameter configuration on the target sub-model to obtain a configured configuration sub-model;
the analysis module is used for performing performance analysis on the configuration sub-model to obtain an analysis result;
And the optimizing module is used for optimizing the configuration sub-model according to the analysis result to obtain an optimized optimizing sub-model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for optimizing AI large models of any of claims 1-7 when the computer program is executed.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the optimization method of the AI large model of any one of claims 1 to 7.
CN202410361556.7A 2024-03-28 2024-03-28 AI large model optimization method, device, terminal equipment and storage medium Active CN117972360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410361556.7A CN117972360B (en) 2024-03-28 2024-03-28 AI large model optimization method, device, terminal equipment and storage medium


Publications (2)

Publication Number Publication Date
CN117972360A (en) 2024-05-03
CN117972360B (en) 2024-06-21

Family

ID: 90859713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410361556.7A Active CN117972360B (en) 2024-03-28 2024-03-28 AI large model optimization method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117972360B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341680A (en) * 2023-03-29 2023-06-27 昆仑芯(北京)科技有限公司 Artificial intelligence model adaptation method, device, electronic equipment and storage medium
CN117389820A (en) * 2022-06-30 2024-01-12 Oppo广东移动通信有限公司 Artificial intelligence performance evaluation method and device and electronic equipment
CN117498297A (en) * 2023-09-18 2024-02-02 国网四川省电力公司电力科学研究院 Frequency response model establishment method and system considering uncertainty of new energy
CN117675938A (en) * 2023-10-20 2024-03-08 国网智能科技股份有限公司 Online calling method and system for electric power artificial intelligent model


Also Published As

Publication number Publication date
CN117972360B (en) 2024-06-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant