CN117973635A

CN117973635A - Decision prediction method, electronic device, and computer-readable storage medium

Info

Publication number: CN117973635A
Application number: CN202410361850.8A
Authority: CN
Inventors: 胡军军; 饶建波; 李雨洋; 王尧
Original assignee: Zhongke Advanced Shenzhen Integrated Technology Co ltd
Current assignee: Zhongke Advanced Shenzhen Integrated Technology Co ltd
Priority date: 2024-03-28
Filing date: 2024-03-28
Publication date: 2024-05-03
Anticipated expiration: 2044-03-28
Also published as: CN117973635B

Abstract

The application discloses a decision prediction method, an electronic device, and a computer-readable storage medium. The decision prediction method includes: acquiring production data of a plurality of flexible production lines in real time; inputting the production data of each flexible production line into a first prediction model, so that the first prediction model outputs the strategy data adopted by the flexible production line in the environment of the production data; inputting the production data of each flexible production line into a second prediction model, so that the second prediction model outputs the value data of the flexible production line after the strategy is adopted in the environment of the production data; and determining a scheduling decision scheme of the flexible production line based on the strategy data and the value data. The method can process a plurality of flexible production lines in parallel using multiple threads, perform decision prediction for each flexible production line according to its production data, and improve data processing capacity, so that the flexible production lines carry out production management based on an optimal scheduling decision scheme, improving their flexibility and production efficiency.

Description

Decision prediction method, electronic device, and computer-readable storage medium

Technical Field

The present application relates to the field of production management technology, and in particular, to a decision prediction method, an electronic device, and a computer-readable storage medium.

Background

The distributed flexible production line is a highly-automatic and adjustable production system in the modern manufacturing industry, can quickly respond to market changes, supports diversified product manufacturing, realizes quick adjustment and configuration change in the production process, and adapts to different production requirements and changing operation conditions.

In the prior art, a distributed flexible production line usually performs decision prediction through a fixed algorithm or rule, and the conventional decision prediction method has limitations in processing complex and changing production demands due to insufficient flexibility and adaptability. For example, decision-making prediction is performed based on static planning and a prediction model, so that a distributed flexible production line is difficult to dynamically adapt to changing production conditions, has low data processing capacity, and is difficult to make a quick and accurate scheduling decision.

In order to improve the production management capability and efficiency of a distributed flexible production line, a more flexible and efficient decision prediction method is needed.

Disclosure of Invention

To solve the above technical problems, the present application provides a decision prediction method, an electronic device, and a computer-readable storage medium.

In order to solve the problems, the application provides a first technical scheme: the decision prediction method for the distributed flexible production line comprises the following steps: acquiring production data of a plurality of flexible production lines in real time; inputting the production data of each flexible production line into a first prediction model, so that the first prediction model outputs strategy data adopted by the flexible production line under the environment of the production data; inputting the production data of each flexible production line into a second prediction model, so that the second prediction model outputs the value data of the flexible production line after strategy is adopted in the environment of the production data; and determining a scheduling decision scheme of the flexible production line based on the strategy data and the value data.

Optionally, the production data includes a plurality of first state data of the flexible production line; after the step of acquiring the production data of the plurality of flexible production lines in real time, the decision prediction method further includes: acquiring action data and reward data corresponding to the plurality of first state data; inputting the first state data and the action data into the first prediction model to obtain probability prediction data of the action data output by the first prediction model; and calculating a first gradient function based on the probability prediction data and the reward data, and training the first prediction model based on the first gradient function.

Optionally, after the step of acquiring the action data and the reward data corresponding to the plurality of first state data, the decision prediction method further includes: inputting the first state data and the action data into the second prediction model to obtain benefit prediction data, output by the second prediction model, for executing the action data; and calculating a second gradient function based on the benefit prediction data and the reward data, and training the second prediction model based on the second gradient function.

Optionally, the first prediction model is configured to select an adjustment action from the action data and output second state data of the flexible production line after the adjustment action is performed; the calculating a first gradient function based on the probability prediction data and the reward data, and training the first prediction model based on the first gradient function includes: calculating a dominance function based on the second state data and the reward data; calculating the first gradient function based on the dominance function, the second state data, and the probability prediction data; and updating model parameters of the first prediction model through the first gradient function.

Optionally, before the step of calculating the dominance function based on the second state data and the reward data, the decision prediction method further includes: in response to the second state data meeting a preset condition, determining that the benefit prediction of the adjustment action is a first preset value; or, in response to the second state data not meeting the preset condition, obtaining benefit prediction data for executing the adjustment action through the second prediction model; and calculating the dominance function based on the second state data, the reward data, and the benefit prediction data.

Optionally, the first prediction model and the second prediction model are used for acquiring corresponding environmental data based on the first state data so as to predict under the environmental data.

Optionally, the action data includes an adjustment action that the flexible production line may take under the current environmental data, and the reward data includes result data of the flexible production line performing the adjustment action under the current environmental data.

Optionally, the production data includes a plurality of first state data of the flexible production line; after the step of determining the scheduling decision scheme of the flexible production line based on the strategy data and the value data, the decision prediction method further includes: performing scheduling control on at least one flexible production line according to the scheduling decision scheme; acquiring third state data of the scheduled flexible production line; and optimizing model parameters of the first prediction model and the second prediction model based on a comparison of the third state data with the first state data.

In order to solve the problems, the application provides a second technical scheme: providing an electronic device, wherein the electronic device comprises a processor and a memory connected with the processor, and the memory stores program instructions; the processor is configured to execute the program instructions stored in the memory to implement the decision prediction method.

In order to solve the above problems, the present application provides a third technical solution: there is provided a computer readable storage medium storing program instructions executable by a processor to implement a decision prediction method as above.

The application provides a decision prediction method, an electronic device, and a computer-readable storage medium. The decision prediction method acquires production data of a plurality of flexible production lines in real time; inputs the production data of each flexible production line into a first prediction model, so that the first prediction model outputs the strategy data adopted by the flexible production line in the environment of the production data; inputs the production data of each flexible production line into a second prediction model, so that the second prediction model outputs the value data of the flexible production line after the strategy is adopted in the environment of the production data; and determines the scheduling decision scheme of the flexible production line based on the strategy data and the value data. In this way, a plurality of flexible production lines can be processed in parallel by multiple threads, so that decision prediction is performed for each flexible production line according to real-time production data and the data processing capacity is improved; at the same time, the flexible production lines can carry out production management based on an optimal scheduling decision scheme, which improves their flexibility and production efficiency.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

FIG. 1 is a flow chart of a first embodiment of a decision prediction method provided by the present application;

FIG. 2 is a flow chart of a second embodiment of the decision prediction method provided by the present application;

FIG. 3 is a flow chart of a third embodiment of a decision prediction method provided by the present application;

FIG. 4 is a schematic diagram of an embodiment of an electronic device according to the present application;

FIG. 5 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.

In addition, if there is a description of "first", "second", etc. in the embodiments of the present application, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present application.

The embodiment of the application firstly provides a decision prediction method which is applied to the field of production management and is used for making decisions on the production flow of a distributed flexible production line, so that the flexible production line can be suitable for different types of production lines and changeable production tasks, intelligent decision making is provided for the flexible production line, and the production efficiency is improved.

Referring to FIG. 1, FIG. 1 is a flowchart illustrating a decision prediction method according to a first embodiment of the present application. As shown in FIG. 1, the decision prediction method of the present embodiment includes the following steps:

Step S11: acquiring production data of a plurality of flexible production lines in real time.

Specifically, a flexible production line is an automated production line formed by connecting a plurality of adjustable machine tools with an automatic conveying device, forming an automated production system that handles product processing, logistics transportation, and information collection and feedback. The production data of a flexible production line usually describes its real-time status, including but not limited to equipment running status, production speed, stock level of materials, currently performed tasks, and task progress. Acquiring the production data of a flexible production line yields its running state over a certain time period or at a certain time point.
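As a minimal sketch, the production data listed above could be held in a small record and flattened into a numeric state vector before being passed to a prediction model. The class name `ProductionSnapshot`, its field names, the units, and the encoding in `to_state_vector` are illustrative assumptions, not structures defined by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProductionSnapshot:
    """Hypothetical record of one flexible production line at one time point."""
    line_id: int
    equipment_running: bool   # equipment running status
    production_speed: float   # e.g. units per hour (the unit is an assumption)
    stock_level: float        # stock level of materials
    current_task: int         # identifier of the task being performed
    task_progress: float      # fraction of the task completed, 0.0 to 1.0

    def to_state_vector(self) -> List[float]:
        # Assumed encoding: the boolean becomes 0/1, the rest are cast to float.
        return [
            1.0 if self.equipment_running else 0.0,
            float(self.production_speed),
            float(self.stock_level),
            float(self.current_task),
            float(self.task_progress),
        ]


snapshot = ProductionSnapshot(1, True, 120.0, 35.0, 7, 0.4)
state = snapshot.to_state_vector()
```

A real deployment would likely normalize each field and encode the task identifier categorically; the flat cast here only keeps the example short.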

Step S12: inputting the production data of each flexible production line into the first prediction model, so that the first prediction model outputs the strategy data adopted by the flexible production line in the environment of the production data.

After the production data of the plurality of flexible production lines are acquired, the production data of each flexible production line are input into the first prediction model, which is used for predicting the strategy data adopted by the flexible production line in the environment of the production data. The production data serve as the basis for the strategy prediction, so that the first prediction model can analyze and predict an optimal strategy according to the current state of the flexible production line. The strategy data may represent the actions that the flexible production line may take and the probability corresponding to each action.

Step S13: inputting the production data of each flexible production line into a second prediction model, so that the second prediction model outputs the value data of the flexible production line after the strategy is adopted in the environment of the production data.

The production data of each flexible production line are input into the second prediction model, which is used for predicting the value generated by the flexible production line after it adopts a strategy in the current environment. The production data serve as the basis for the value prediction, so that the second prediction model can analyze and predict the value data of the corresponding strategy according to the current state of the flexible production line. The value data may represent the benefit or return that the flexible production line can obtain when taking a certain action or strategy in the current state.

It can be understood that the decision prediction method of this embodiment provides a plurality of prediction threads. Each prediction thread corresponds to one flexible production line and performs data processing and prediction on the production data of that line, and the prediction threads run in parallel, so that the method can process and predict for a plurality of flexible production lines at once, improving data processing capability. For example, the distributed flexible production line may include a first production line, a second production line, …, and an N-th production line. A first thread obtains the strategy data of the first production line through the first prediction model and the value data of the first production line through the second prediction model; a second thread obtains the strategy data of the second production line through the first prediction model and the value data of the second production line through the second prediction model; the other threads operate similarly and are not described in detail here.
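The one-thread-per-line arrangement can be sketched with a thread pool. This is only an illustration under stated assumptions: `first_model` and `second_model` are toy stand-ins for the two prediction models, the action names and the two-element state layout are hypothetical, and the application does not prescribe a particular threading library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def first_model(state: List[float]) -> Dict[str, float]:
    """Toy stand-in for the first prediction model: action -> probability."""
    # Assumption: favour speeding up when the stock level (index 1) is high.
    p_speed_up = 0.7 if state[1] > 30.0 else 0.2
    return {"speed_up": p_speed_up, "hold": 1.0 - p_speed_up}


def second_model(state: List[float]) -> float:
    """Toy stand-in for the second prediction model: predicted value."""
    return 0.01 * state[0] + 0.02 * state[1]


def predict_line(item: Tuple[int, List[float]]) -> Tuple[int, Dict[str, float], float]:
    line_id, state = item
    # One prediction thread queries both models for its own production line.
    return line_id, first_model(state), second_model(state)


# Hypothetical production data: line id -> [production speed, stock level].
production_data = {1: [100.0, 40.0], 2: [80.0, 10.0], 3: [120.0, 55.0]}

with ThreadPoolExecutor(max_workers=len(production_data)) as pool:
    results = list(pool.map(predict_line, production_data.items()))

predictions = {line_id: (policy, value) for line_id, policy, value in results}
```

In CPython, threads mainly help when the model calls wait on I/O or release the global interpreter lock; a process pool is a possible alternative for CPU-bound models.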

Step S14: determining a scheduling decision scheme of the flexible production line based on the strategy data and the value data.

After the strategy data and the value data of the flexible production line are obtained, whether a certain action or strategy can generate enough value can be estimated based on the strategy data and the value data so as to select and make a corresponding scheduling decision scheme, so that the scheduling decision scheme can be suitable for the current production environment.
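One simple way to combine the two outputs, shown here as a sketch only, is to take the most probable action from the strategy data and accept it when the predicted value clears a threshold. The function name `choose_scheme`, the `min_value` threshold, and the "return nothing" fallback are assumptions for illustration; the application does not specify the selection rule.

```python
from typing import Dict, Optional


def choose_scheme(policy: Dict[str, float], value: float,
                  min_value: float = 0.0) -> Optional[str]:
    """Return the most probable action if the predicted value is high enough."""
    if value < min_value:
        return None  # assumed fallback: keep the current schedule unchanged
    return max(policy, key=policy.get)


scheme_a = choose_scheme({"speed_up": 0.7, "hold": 0.3}, value=1.8, min_value=1.0)
scheme_b = choose_scheme({"speed_up": 0.7, "hold": 0.3}, value=0.5, min_value=1.0)
```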

In this way, a plurality of flexible production lines can be processed in parallel by multiple threads, so that decision prediction is performed for each flexible production line according to real-time production data, achieving intelligent decision making and resource optimization for the flexible production lines and improving data processing capacity. At the same time, the flexible production lines can carry out production management based on an optimal scheduling decision scheme, which improves their flexibility and production efficiency.

In an embodiment, referring to FIG. 2, FIG. 2 is a flowchart illustrating a decision prediction method according to a second embodiment of the present application. As shown in FIG. 2, the production data includes a plurality of first state data of the flexible production line. After step S11, the decision prediction method of the present embodiment further includes the following steps:

Step S21: action data and reward data corresponding to the first state data are obtained.

The production data comprises a plurality of first state data, the plurality of first state data can be used for representing or describing the current running state of the flexible production line, the action data is an action executed by the flexible production line under the plurality of first state data, and the reward data is rewards when the flexible production line executes the action data under the plurality of first state data. In the decision prediction method of this embodiment, after production data is obtained in real time, the production data may be stored in a certain preset database, and the action data and the reward data corresponding to the production data are stored in the database, so that the database stores first state data, action data and reward data corresponding to different time periods or time points, and the first state data, action data and reward data of each time period or time point may be used as training data of the first prediction model and the second prediction model for training.
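The storage described above can be sketched as a small store keyed by time point. The names `Transition` and `TransitionStore` and the in-memory dictionary are assumptions standing in for the preset database, whose actual form the application leaves open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Transition:
    time_point: int
    state: List[float]   # first state data
    action: str          # action data
    reward: float        # reward data


class TransitionStore:
    """In-memory stand-in for the preset database."""

    def __init__(self) -> None:
        self._by_time: Dict[int, Transition] = {}

    def add(self, transition: Transition) -> None:
        self._by_time[transition.time_point] = transition

    def training_batch(self, start: int, end: int) -> List[Transition]:
        # Records with start <= time_point <= end, ordered by time.
        return [self._by_time[t] for t in sorted(self._by_time) if start <= t <= end]


store = TransitionStore()
store.add(Transition(0, [100.0, 40.0], "speed_up", 1.0))
store.add(Transition(1, [110.0, 38.0], "hold", 0.5))
store.add(Transition(2, [110.0, 36.0], "hold", 0.4))
batch = store.training_batch(1, 2)
```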

In a possible embodiment, when the production data includes first status data of the flexible production line in a past period of time, the action data may be an action actually performed by the flexible production line in the past period of time, and the reward data may be a result or a reward obtained after the flexible production line actually performs the action. In other embodiments, where the production data includes real-time state data of the flexible production line, the decision-making prediction method may also be implemented using other prediction models; or by incorporating other algorithmic structures, predicting actions that the flexible line may take in that state and obtaining action data, and predicting results obtained when the flexible line performs an action in that state and obtaining result data.

Step S22: inputting the first state data and the action data into the first prediction model to obtain probability prediction data of the action data output by the first prediction model.

Specifically, after the first state data and the action data are acquired, they are input into the first prediction model. Since the action data indicates the actions that the flexible production line may take, the first prediction model can predict the probability of each action being taken by the flexible production line in the context of the first state data, and output probability prediction data corresponding to the action data. In a possible embodiment, the action data may comprise at least one action, the probability prediction data indicates a probability distribution over the actions of the flexible production line, and the probabilities of all actions may sum to 1.
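A probability distribution that sums to 1 is commonly produced with a softmax over per-action scores. The sketch below assumes this choice; the application does not state how the first prediction model normalizes its outputs, and the action names and scores are hypothetical.

```python
import math
from typing import Dict


def action_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over per-action scores; the resulting probabilities sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {action: math.exp(score - peak) for action, score in scores.items()}
    total = sum(exps.values())
    return {action: e / total for action, e in exps.items()}


probs = action_probabilities({"speed_up": 2.0, "hold": 1.0, "slow_down": 0.0})
```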

Step S23: a first gradient function is calculated based on the probability prediction data and the reward data, and a first prediction model is trained based on the first gradient function.

A first gradient function is calculated and accumulated based on the probability prediction data output by the first prediction model and the reward data, so as to obtain the strategy accumulation gradient of the first prediction model. After the strategy accumulation gradient reaches a certain degree or a certain number of training rounds are completed, the parameters of the first prediction model are updated.
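Gradient accumulation followed by a delayed parameter update can be sketched as follows. The sketch assumes a linear softmax policy and a reward-weighted log-probability gradient (the classic policy-gradient form); the model shape, `UPDATE_EVERY`, `LEARNING_RATE`, and the sample data are all hypothetical, and the application's actual first gradient function may differ.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights: List[List[float]], state: List[float]) -> List[float]:
    # One weight vector per action; each score is a dot product with the state.
    return softmax([sum(w * x for w, x in zip(row, state)) for row in weights])


def accumulate_policy_gradient(grad, weights, state, action_index, reward):
    """Add reward * d log pi(action | state) / d weights into grad (in place)."""
    probs = policy_probs(weights, state)
    for k, row in enumerate(grad):
        indicator = 1.0 if k == action_index else 0.0
        for j, x in enumerate(state):
            row[j] += reward * (indicator - probs[k]) * x


weights = [[0.0, 0.0], [0.0, 0.0]]   # two actions, two state features
grad = [[0.0, 0.0], [0.0, 0.0]]      # accumulated gradient
# (state, index of the action taken, reward) samples; values are made up.
samples = [([1.0, 0.5], 0, 1.0), ([0.8, 0.2], 0, 1.0), ([0.1, 0.9], 1, 0.5)]
UPDATE_EVERY = 3      # assumed number of accumulated samples per update
LEARNING_RATE = 0.1   # assumed step size

before = policy_probs(weights, [1.0, 0.5])[0]
for count, (state, action_index, reward) in enumerate(samples, start=1):
    accumulate_policy_gradient(grad, weights, state, action_index, reward)
    if count % UPDATE_EVERY == 0:
        # Gradient ascent on the accumulated gradient, then reset the accumulator.
        for k, row in enumerate(weights):
            for j in range(len(row)):
                row[j] += LEARNING_RATE * grad[k][j]
        grad = [[0.0, 0.0], [0.0, 0.0]]
after = policy_probs(weights, [1.0, 0.5])[0]
```

After the single update, the probability of the rewarded action in the first sample's state rises above its initial value of 0.5, which is the qualitative behaviour the accumulation step is meant to produce.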

According to the method for decision prediction, the action data and the reward data corresponding to the first state data are acquired, the first state data and the action data are input into the first prediction model, so that probability prediction data of the action data output by the first prediction model are acquired, a first gradient function is calculated based on the probability prediction data and the reward data, the first prediction model is trained based on the first gradient function, the trained first prediction model can take the change of the first state data under different production environments into consideration when strategy prediction is carried out, the trained first prediction model can carry out decision prediction on the flexible production line according to real-time production data, the prediction accuracy is improved, and the flexibility and the production efficiency of the flexible production line are further improved.

Optionally, after the step of storing the production data in the database and storing the action data and the reward data corresponding to the production data in the database, the decision prediction method further includes: inputting the first state data and the action data into a second prediction model to obtain benefit prediction data of execution action data output by the second prediction model; a second gradient function is calculated based on the benefit prediction data and the reward data, and a second prediction model is trained based on the second gradient function.

Specifically, the second prediction model is trained through the first state data, the action data and the rewarding data, so that the prediction data of benefits generated by the flexible production line output by the second prediction model when the action data is executed are obtained. And calculating a second gradient function based on the benefit prediction data and the reward data and carrying out gradient accumulation so as to update parameters of the second prediction model after accumulating gradients to a certain degree or completing a certain number of training times.
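For the second prediction model, a common choice is to accumulate the gradient of a squared error between the predicted benefit and a reward-based target. The sketch below assumes a linear value model and that loss; the targets, `LEARNING_RATE`, and the number of rounds are hypothetical, and the application's second gradient function is not necessarily of this form.

```python
from typing import List


def value(weights: List[float], state: List[float]) -> float:
    return sum(w * x for w, x in zip(weights, state))


def accumulate_value_gradient(grad, weights, state, target):
    """Add the gradient of 0.5 * (target - V(state))**2 into grad (in place)."""
    error = target - value(weights, state)
    for j, x in enumerate(state):
        grad[j] += -error * x


weights = [0.0, 0.0]
LEARNING_RATE = 0.1  # assumed step size
# (state, reward-based target) samples; values are made up.
samples = [([1.0, 0.5], 1.0), ([0.8, 0.2], 1.0), ([0.1, 0.9], 0.5)]


def mean_squared_error(w: List[float]) -> float:
    return sum((t - value(w, s)) ** 2 for s, t in samples) / len(samples)


loss_before = mean_squared_error(weights)
for _ in range(50):  # assumed number of training rounds
    grad = [0.0, 0.0]
    for state, target in samples:
        accumulate_value_gradient(grad, weights, state, target)
    # Gradient descent on the accumulated gradient.
    weights = [w - LEARNING_RATE * g for w, g in zip(weights, grad)]
loss_after = mean_squared_error(weights)
```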

In the embodiment of the application, the first prediction model and the second prediction model are trained through the first state data, the action data and the reward data, so that the first prediction model and the second prediction model after training can consider the change of the first state data under different production environments when predicting, the scheduling decision scheme output by the decision prediction method of the embodiment can adapt to the changed production environments, the prediction accuracy is improved, and the flexibility and the production efficiency of the flexible production line are further improved.

Optionally, referring to FIG. 3, FIG. 3 is a flowchart illustrating a third embodiment of the decision prediction method provided by the present application. As shown in FIG. 3, the first prediction model is used for selecting an adjustment action from the action data and outputting second state data of the flexible production line after the adjustment action is executed; step S23 further includes:

Step S31: calculating a dominance function based on the second state data and the reward data.

Specifically, when the first prediction model predicts, it is further used for selecting an adjustment action from the action data and executing it, so as to obtain the second state data of the flexible production line after the adjustment action is executed. After the second state data is acquired, a dominance function for the adjustment action may be calculated based on the second state data and the reward data. Because the equipment state, production efficiency, stock level, and the like may change after the flexible production line executes an action, the second state data is the updated state data corresponding to the first state data; the period in which the first prediction model performs one action is one training period, and the dominance function indicates the advantage that the action brings to the flexible production line.

The dominance function is further associated with a discount factor $\gamma$, which is used to calculate the present value of future rewards. The discount factor can also represent the relative importance of immediate and long-term benefits in an actual production scenario: when immediate benefit matters more, $\gamma$ is close to 0; when long-term benefit matters more, $\gamma$ is close to 1. In a possible implementation, the discount factor may be treated as a hyperparameter of the first prediction model and the second prediction model and kept unchanged throughout training, or may be adjusted according to the relative importance of immediate and long-term benefits in different application scenarios, which is not specifically limited here.
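The effect of the discount factor can be illustrated with the usual discounted return from reinforcement learning. The symbol $G_t$ and the three-period horizon are notation assumed here for illustration, not taken from the application. With a reward of 1 in each of three consecutive periods, a small discount factor makes the later rewards almost irrelevant, while a large one keeps most of their weight:

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{2} \gamma^{k} R_{t+k}, \qquad R_t = R_{t+1} = R_{t+2} = 1 \\
\gamma = 0.1:&\quad G_t = 1 + 0.1 + 0.01 = 1.11 \\
\gamma = 0.9:&\quad G_t = 1 + 0.9 + 0.81 = 2.71
\end{aligned}
```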

Step S32: the first gradient function is calculated based on the dominance function, the second state data, and the probability prediction data.

After the dominance function is obtained, a first gradient function of the first prediction model under the training period is calculated and gradient accumulation is carried out based on the dominance function, the second state data and the probability prediction data after the adjustment action is executed.

Step S33: model parameters of the first predictive model are updated by the first gradient function.

The first gradient function is calculated and accumulated to obtain the strategy accumulation gradient of the first prediction model. After the strategy accumulation gradient reaches a certain degree or a certain number of training rounds are completed, the parameters of the first prediction model are updated.

In this embodiment, the decision prediction method calculates the dominance function based on the second state data and the reward data, calculates the first gradient function based on the dominance function, the second state data and the probability prediction data, and updates the model parameters of the first prediction model through the first gradient function, so that the trained first prediction model can consider the changes of the first state data in different production environments to predict, the prediction accuracy is improved, and the flexibility and the production efficiency of the flexible production line are further improved.

Further, before step S31, the decision prediction method of the present embodiment further includes: in response to the second state data meeting a preset condition, determining that the benefit prediction of the adjustment action is a first preset value; or, in response to the second state data not meeting the preset condition, obtaining benefit prediction data for executing the adjustment action through the second prediction model; and calculating the dominance function based on the second state data, the reward data, and the benefit prediction data.

Specifically, the preset condition on the second state data is a specific target or key point of the flexible production line, and is usually related to the production requirements, the product strategy, and the like of the flexible production line. For example, when the number of products produced by the flexible production line reaches a preset value, or the flexible production line reaches a preset operation time, it is determined that the second state data of the flexible production line satisfies the preset condition. In one embodiment, when the state of the flexible production line meets the preset condition, the benefit of the adjustment action does not need to be calculated through the second prediction model, and the first preset value can be used directly as the benefit of the action, which reduces the computation required and improves data processing efficiency. The first preset value may be 0, or may be increased or decreased based on the actual production state, which is not specifically limited here.

In another embodiment, when the state of the flexible production line does not meet the preset condition, the benefit prediction data of the adjustment action is calculated through the second prediction model, so that the benefit prediction data can be used for representing the expected return or the expected value obtained after the adjustment action is performed.
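The two branches above, followed by the dominance calculation, can be sketched as below. The one-step form `reward + gamma * V(next state) - V(current state)` is the common advantage estimate from actor-critic reinforcement learning and is an assumption here; the function names, the product-count condition, and the toy value model are likewise hypothetical.

```python
from typing import Callable, List

FIRST_PRESET_VALUE = 0.0  # the text gives 0 as one possible preset value


def benefit_prediction(next_state: List[float],
                       meets_preset_condition: Callable[[List[float]], bool],
                       second_model: Callable[[List[float]], float]) -> float:
    """Use the preset value when the condition holds, else query the model."""
    if meets_preset_condition(next_state):
        return FIRST_PRESET_VALUE
    return second_model(next_state)


def dominance(reward: float, gamma: float,
              next_value: float, current_value: float) -> float:
    # Assumed one-step form: reward + gamma * V(next state) - V(current state).
    return reward + gamma * next_value - current_value


def reached_target(state: List[float]) -> bool:
    # Hypothetical condition: the product count (index 0) has reached 100 units.
    return state[0] >= 100.0


def toy_value_model(state: List[float]) -> float:
    return 0.05 * state[0]


v_done = benefit_prediction([100.0], reached_target, toy_value_model)
v_running = benefit_prediction([60.0], reached_target, toy_value_model)
adv = dominance(reward=1.0, gamma=0.9, next_value=v_running, current_value=2.5)
```

Skipping the model call in the first branch is what saves computation once the preset condition is met.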

Further, before training, the decision prediction method may use the first state data as initialized environmental data of the first prediction model and the second prediction model, and define initialized model parameters and thread states to build the first prediction model and the second prediction model, so that the first prediction model can use the reward data as a learning target and perform policy prediction based on the first state data and the action data, and the second prediction model can use the reward data as a learning target and perform value prediction based on the first state data and the action data. Specifically, the present embodiment may train the first prediction model and the second prediction model by:

initializing the model parameters of the first prediction model and the second prediction model, and initializing the environment data S_0;

initializing the policy cumulative gradient and the benefit cumulative gradient;

for each training period t:

acquiring the first state data S_t, the action data and the reward data R_t;

calculating the policy π using the first prediction model, and selecting and executing an adjustment action A_t to obtain the second state data S_{t+1};

calculating the value of performing the adjustment action A_t using the second prediction model;

calculating the advantage function, in which γ is a discount factor between 0 and 1;

iterating backward over the steps down to step 0 and, for each step:

calculating the policy cumulative gradient;

calculating the benefit cumulative gradient;

training and synchronously updating the first prediction model with the policy cumulative gradient, and training and synchronously updating the second prediction model with the benefit cumulative gradient, until the training condition is met.
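The training steps above resemble an n-step advantage actor-critic update. The sketch below assumes that standard form, because the update formulas themselves are not legible in this text; it should not be read as the exact formulas of the application. All function and variable names are hypothetical, and each model is reduced to a single scalar parameter so that the gradients are plain numbers.

```python
def n_step_returns(rewards, bootstrap, gamma):
    """Discounted returns, computed backward from the last step down to step 0."""
    returns = [0.0] * len(rewards)
    running = bootstrap  # preset value if the condition is met, else a value estimate
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def accumulate_gradients(returns, values, grad_log_probs, grad_values):
    """Accumulate policy and value gradients over one rollout.

    grad_log_probs[i]: d log pi(a_i | s_i) / d theta (assumed scalar parameter)
    grad_values[i]:    d V(s_i) / d w               (assumed scalar parameter)
    """
    d_theta = 0.0  # policy cumulative gradient
    d_w = 0.0      # value (benefit) cumulative gradient
    for ret, val, glp, gv in zip(returns, values, grad_log_probs, grad_values):
        advantage = ret - val               # assumed advantage: return minus value
        d_theta += glp * advantage          # policy-gradient term
        d_w += -2.0 * advantage * gv        # gradient of (return - V)^2 w.r.t. w
    return d_theta, d_w


rewards = [1.0, 0.0, 2.0]
returns = n_step_returns(rewards, bootstrap=0.0, gamma=0.5)
d_theta, d_w = accumulate_gradients(
    returns,
    values=[1.0, 1.0, 1.0],
    grad_log_probs=[1.0, 1.0, 1.0],
    grad_values=[1.0, 1.0, 1.0],
)
```

In a multi-threaded setting, each worker would typically compute `d_theta` and `d_w` on its own rollout and then apply them to shared model parameters; that synchronization step is omitted here.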

Specifically, the state data of the flexible production line is used for representing the environment in which the first prediction model and the second prediction model are located. The first and second prediction models need to interact with the production data in advance before predicting based on the production data to determine current environmental data, so that the first and second prediction models can subsequently make decision-making predictions based on the current environmental data.

In this way, the first prediction model and the second prediction model can adjust the environment data as the environment of the flexible production line changes, so that the prediction data they output reflects the latest state of the flexible production line, which ensures the efficiency and real-time performance of the scheduling decision scheme.

Further, the action data comprise adjustment actions possibly taken by the flexible production line under the current environment data, and the reward data comprise result data of the adjustment actions executed by the flexible production line under the current environment data, so that the flexibility and the production efficiency of the flexible production line are improved.

Specifically, the action data represents the adjustment actions that the flexible production line may take under the environment data; for example, an adjustment action may include adjusting the production speed, changing the priority of production tasks, or starting or stopping a particular device. The reward data is the result of the change in the environment data of the flexible production line after an adjustment action is performed; for example, the result may include an improvement in production efficiency or a reduction in cost. It can be understood that the reward data is the return that executing a certain action may bring in the actual production process of the flexible production line. The decision prediction method of this embodiment can adjust the model parameters of the first prediction model and the second prediction model by comparing the reward data with the expected return, so as to maximize the return brought by the scheduling decision scheme and improve the utilization of production resources.
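As an illustration of how action data and reward data might be represented, the sketch below enumerates the example adjustment actions and derives a reward from the change in line state. The type names, the two state fields and the weighting are assumptions made for illustration only; the application does not prescribe a particular reward formula.

```python
from dataclasses import dataclass
from enum import Enum


class AdjustmentAction(Enum):
    """Example adjustment actions named in the description."""
    ADJUST_SPEED = "adjust_speed"
    CHANGE_TASK_PRIORITY = "change_task_priority"
    START_DEVICE = "start_device"
    STOP_DEVICE = "stop_device"


@dataclass(frozen=True)
class LineState:
    efficiency: float  # hypothetical unit: products per hour
    cost: float        # hypothetical unit: cost per product


def reward(before: LineState, after: LineState, cost_weight: float = 1.0) -> float:
    """Reward rises with efficiency gains and falls with cost increases."""
    efficiency_gain = after.efficiency - before.efficiency
    cost_increase = after.cost - before.cost
    return efficiency_gain - cost_weight * cost_increase


before = LineState(efficiency=100.0, cost=5.0)
after = LineState(efficiency=110.0, cost=4.0)
r = reward(before, after)
```

Here an action that raises efficiency and lowers cost yields a positive reward; the relative weight of the two terms is a design choice that would depend on the production requirements.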

In one embodiment, the production data acquired in step S11 includes a plurality of first status data of the flexible production line. After step S14, the decision prediction method of the present embodiment further includes: scheduling control is carried out on at least one flexible production line according to a scheduling decision scheme; acquiring third state data of the scheduled flexible production line; and optimizing model parameters of the first prediction model and the second prediction model based on the comparison result of the third state data and the first state data.

Specifically, after the scheduling decision scheme is obtained, the scheduling decision scheme can be executed by calling a task scheduling algorithm, so that scheduling operations such as production flow adjustment, production speed optimization, production equipment shutdown, production task change, manual adjustment and the like are performed on at least one flexible production line, and third state data of the flexible production line are continuously monitored, wherein the third state data are used for reflecting state change of the flexible production line after the scheduling decision scheme is executed. And comparing the third state data with the first state data to optimize model parameters of the first prediction model and the second prediction model according to the comparison result, so that the flexibility and adaptability of the flexible production line are further improved, the resource configuration of the flexible production line can be quickly adapted to market change, and the production efficiency is improved.

The status data for comparison may include production line operation indexes before and after scheduling, including but not limited to production efficiency, resource utilization, production cost, and the like. In addition, the first predictive model and the second predictive model may be further tuned and optimized to continuously improve production performance by monitoring and analyzing data in real time for reduced downtime, improved product quality, and the like.
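The comparison between the third state data and the first state data can be sketched as a per-indicator difference. The indicator names follow the examples above, but the function `compare_states`, the dictionary layout and the sample values are hypothetical.

```python
def compare_states(first, third, lower_is_better=("production_cost",)):
    """Return the improvement per indicator; positive values mean improvement."""
    improvement = {}
    for name, before in first.items():
        delta = third[name] - before
        # For cost-like indicators a decrease counts as an improvement.
        improvement[name] = -delta if name in lower_is_better else delta
    return improvement


first_state = {
    "production_efficiency": 80.0,
    "resource_utilization": 0.5,
    "production_cost": 12.0,
}
third_state = {
    "production_efficiency": 90.0,
    "resource_utilization": 0.75,
    "production_cost": 10.0,
}

improvement = compare_states(first_state, third_state)
# A simple trigger: re-tune the models if any indicator got worse.
needs_retuning = any(value < 0 for value in improvement.values())
```

How the comparison result feeds back into the model parameters (for example, as an additional reward signal) is left open here, since the description does not fix a specific mechanism.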

In summary, unlike the prior art, the decision prediction method of the embodiment of the application improves the processing capability of complex production data through self-adaptive learning and multi-thread processing, realizes real-time decision and scheduling control of the flexible production line, can cope with highly dynamic and changeable production environments, improves the operation efficiency and flexibility of the flexible production line, reduces the possibility of accidents such as cost waste caused by improper scheduling, and provides a more efficient and intelligent solution for modern industrial production.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of an electronic device according to the present application. As shown in fig. 4, the electronic device of the present embodiment includes a memory 52 and a processor 51 connected to each other. The memory 52 is used to store program instructions for implementing the methods described in any of the embodiments above. The processor 51 is operative to execute program instructions stored in the memory 52.

The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 52 may be a memory module, a TF card, etc., and may store all information in the electronic device, including input raw data, computer programs, intermediate operation results, and final operation results. It stores and retrieves information at the locations specified by the controller. With the memory, the electronic device has a storage function and can operate normally. The memory of the electronic device may be classified by purpose into main memory (internal memory) and auxiliary memory (external memory). The external memory is usually a magnetic medium, an optical disc, or the like, and can store information for a long period of time. The internal memory refers to a storage component on the motherboard that holds the data and programs currently being executed; it stores them only temporarily, and the data is lost when the power is turned off.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions that cause a computer device (which may be a personal computer, a system server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. As shown in fig. 5, the computer-readable storage medium of the present application stores program instructions 61 capable of implementing all the methods described above, where the program instructions 61 may be stored in the storage medium in the form of a software product and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code, or an electronic device such as a computer, a server, a mobile phone, or a tablet.

The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims

1. A decision prediction method for distributed flexible production lines, characterized by comprising the following steps:

acquiring production data of a plurality of flexible production lines in real time;

Inputting the production data of each flexible production line into a first prediction model, so that the first prediction model outputs strategy data adopted by the flexible production line under the environment of the production data;

Inputting the production data of each flexible production line into a second prediction model, so that the second prediction model outputs the value data of the flexible production line after strategy is adopted in the production data environment;

And determining a scheduling decision scheme of the flexible production line based on the strategy data and the value data.

2. The decision prediction method of claim 1, wherein the production data comprises a plurality of first state data of the flexible production line; after the step of acquiring the production data of the plurality of flexible production lines in real time, the decision prediction method further includes:

Acquiring action data and reward data corresponding to a plurality of the first state data;

Inputting the first state data and the action data into the first prediction model to obtain probability prediction data of the action data output by the first prediction model;

A first gradient function is calculated based on the probabilistic predictive data and the reward data, and the first predictive model is trained based on the first gradient function.

3. The decision prediction method according to claim 2, wherein after the step of acquiring action data and reward data corresponding to a plurality of the first state data, the decision prediction method further comprises:

inputting the first state data and the action data into the second prediction model to obtain benefit prediction data which is output by the second prediction model and used for executing the action data;

A second gradient function is calculated based on the benefit prediction data and the reward data, and the second predictive model is trained based on the second gradient function.

4. The decision-making prediction method according to claim 2, wherein the first prediction model is configured to select an adjustment action from the action data and output second state data of the flexible production line after the adjustment action is performed;

the computing a first gradient function based on the probability prediction data and the reward data, and training the first predictive model based on the first gradient function, comprising:

calculating an advantage function based on the second state data and the reward data;

calculating the first gradient function based on the advantage function, the second state data, and the probability prediction data;

and updating model parameters of the first prediction model through the first gradient function.

5. The decision prediction method according to claim 4, wherein before the step of calculating an advantage function based on the second state data and the reward data, the decision prediction method further comprises:

determining that the benefit prediction for the adjustment action is a first preset value in response to the second state data meeting a preset condition, or obtaining benefit prediction data for executing the adjustment action through the second prediction model in response to the second state data not meeting the preset condition;

wherein the advantage function is calculated based on the second state data, the reward data, and the benefit prediction data.

6. The decision-making prediction method according to claim 2, wherein the first prediction model and the second prediction model are used for acquiring corresponding environmental data based on the first state data to make predictions under the environmental data.

7. The decision prediction method of claim 6, wherein the action data comprises an adjustment action that the flexible production line may take under the current environmental data, and the reward data comprises result data of the flexible production line performing the adjustment action under the current environmental data.

8. The decision prediction method of claim 1, wherein the production data comprises a plurality of first state data of the flexible production line; after the step of determining the scheduling decision scheme of the flexible production line based on the policy data and the value data, the decision prediction method further includes:

scheduling control is carried out on at least one flexible production line according to the scheduling decision scheme;

acquiring third state data of the scheduled flexible production line;

And optimizing model parameters of the first prediction model and the second prediction model based on the comparison result of the third state data and the first state data.

9. An electronic device comprising a processor, a memory coupled to the processor, wherein,

The memory stores program instructions;

the processor is configured to execute the program instructions stored in the memory to implement the decision prediction method according to any one of claims 1 to 8.

10. A computer readable storage medium storing program instructions executable by a processor to implement the decision prediction method of any one of claims 1 to 8.