CN111445081A - Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation - Google Patents

Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation

Info

Publication number
CN111445081A
CN111445081A (application CN202010251710.7A)
Authority
CN
China
Prior art keywords
network
deep
digital twin
reinforcement learning
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010251710.7A
Other languages
Chinese (zh)
Inventor
刘振宇
胡亮
裘辿
陈俊奇
谭建荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010251710.7A priority Critical patent/CN111445081A/en
Publication of CN111445081A publication Critical patent/CN111445081A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation. A timed S3PR net model is established for the product digital twin operation flow; a first sublayer structure for library-to-transition feature propagation is constructed; a second sublayer structure for transition-to-library feature propagation is constructed; a neural network is built to fit the mapping between the identification state of the timed S3PR net model and the scheduling return; the dynamic scheduling problem of the timed S3PR net is converted into a Markov decision model; the Markov decision model is solved with the deep Q-value network reinforcement learning method; and three experiments are designed to verify the advantages of the proposed method. By using deep reinforcement learning, the method outperforms traditional heuristic rule methods, heuristic search methods and reinforcement learning methods combined with a fully connected neural network in scheduling performance, computational efficiency and adaptability.

Description

Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation
Technical Field
The invention relates to a method for processing virtual-real simulation data of equipment, and in particular to a digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation based on deep reinforcement learning, belonging to the field of system scheduling management.
Background
Scheduling optimization is a classic research topic in workflow control and is the key to achieving coordinated linkage of unit components and full utilization of resources in complex products. The full-life-cycle virtual-real consistency and iterative optimization capability required by the digital twin architecture of a complex product pose new problems for traditional scheduling optimization methods: first, how can a scheduling method be deployed smoothly from the virtual simulation prototype to the actual physical product, and how can consistency between the two be guaranteed during the product's service; second, how can the scheduling method use operation data collected through CPS and other information technologies to optimize itself iteratively, improving its intelligence and generating scheduling strategies that better fit the real service environment; in addition, the real-time requirements of complex product operation also place emphasis on the computational efficiency of the scheduling method.
At present, some research applies reinforcement learning or deep reinforcement learning to flow scheduling problems, but the practical results are not ideal, for two reasons. First, early deep reinforcement learning directly used a deep neural network to fit the value function on the traditional Q-learning framework and was easily disturbed by the correlation among experiences during training, making the learning process unstable; this defect was only effectively addressed by the two techniques of experience replay and a double value function when the Deep Q-Network (DQN) method was proposed, whereas existing reinforcement-learning-based flow scheduling methods are still based on the earlier value-function fitting mode. Second, even the deep reinforcement learning scheduling methods fit the state with only a multi-layer fully connected neural network (Multi-Layer Perceptron, MLP); such a fitting mode ignores the structural constraints among flow elements such as operations, resources and their timing, so the implicit information contained in the flow state is not fully exploited.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a DQN method combined with a graph convolutional network (GCN), applies it to digital twin operation flow scheduling with the three characteristics of resource sharing, route flexibility and task randomness, and realizes more intelligent and stable scheduling optimization.
By using deep reinforcement learning, the method outperforms traditional heuristic rule methods, heuristic search methods and reinforcement learning methods combined with a fully connected neural network in digital twin operation scheduling performance, computational efficiency and adaptability.
In order to achieve the above purpose, the method comprises the following specific steps:
S1. Establish a timed S3PR net model for the product digital twin operation flow;
the libraries represent the operations of the product digital twin operation flow, and the transitions represent the changeovers between these operations.
S2. Construct a first sublayer structure for library-to-transition feature propagation in the timed S3PR net model;
S3. Construct a second sublayer structure for transition-to-library feature propagation in the timed S3PR net model;
S4. Build a neural network from the first sublayer structure and the second sublayer structure, and use the neural network to fit the state of the timed S3PR net model;
S5. Convert the dynamic scheduling problem of the product digital twin operation flow based on the timed S3PR net model into a Markov decision model;
S6. Solve the Markov decision model established in S5 with the deep Q-value network (Deep Q-Network) reinforcement learning method, realizing virtual-real self-adaptive iterative optimization.
In step S2, a 2-unit shared convolution kernel is constructed to compute the weighted sum of the features of each transition's preceding operation library and the features of its preceding resource library, and this weighted sum serves as the first sublayer structure.
In the specific implementation, a dummy resource library whose marking is always greater than zero is added for each autonomous transition, so that all transitions in the timed S3PR net have the same preceding-node structure.
In step S3, the second sublayer structure is constructed according to a linear shift-invariant filter. In the specific implementation, a linear shift-invariant filter on the directed weighted graph is used:

$$\tilde{F} \;=\; H(A)\,F \;=\; \sum_{k=0}^{K} h_k A^{k} F$$

where F and F̃ are respectively the original signal and the filtered signal on the directed weighted graph, H is the filter, A is the adjacency matrix of the directed weighted graph, h_k are the filter parameters, and K is the order of the filter. The filter order K is limited to 1 so that the filtered features of a library depend only on the library's own original features and the original features of its preceding transitions.
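For the limiting case used here (K = 1), the filter reduces to the first-order form below; this expansion follows directly from the formula above and is the basis of the second sublayer structure.

$$\tilde{F} \;=\; h_0 A^{0} F + h_1 A^{1} F \;=\; h_0 F + h_1 A F$$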
In step S4, the output of the first sublayer structure is used as the input of the second sublayer structure, and the two sublayer structures are combined to form the convolution layer of the timed S3PR net model. The convolution layer takes the input matrix and the output matrix of the timed S3PR net as static parameters, so the number of trainable weights depends only on the dimensions of the input and output features and is independent of the scale of the timed S3PR net, which overcomes the weight-explosion problem in deep neural network construction.
In step S5, a quintuple (S, A, Φ, r, γ) is used to describe the interactive environment of timed S3PR net scheduling, where S represents the state space, A represents the action space, Φ represents the state transition rule, r represents the return function, and γ represents the discount coefficient.
In step S6, the steps of solving the Markov decision model established in step S5 with the deep Q-value network (Deep Q-Network) reinforcement learning method are as follows:
S61. Establish a value function Q(s_τ, a) to evaluate the value of selecting action a ∈ A in the current state s_τ ∈ S at time step τ, where Q denotes the value, s_τ denotes the state at time step τ, a denotes an action, S denotes the state space, and A denotes the action space;
S62. The optimal value function obeys the Bellman Equation, and the Bellman equation is used during training to learn the value function Q(s_τ, a) in an iterative manner;
S63. The deep Q-value network reinforcement learning method fits the value function Q(s_τ, a) with two deep neural networks of identical structure, and the weighted experience replay (Prioritized Experience Replay) method is used in training;
In the specific implementation, the scheduling Agent receives a large penalty when scheduling falls into deadlock. Weighted experience replay raises the replay probability of deadlock experiences in the early stage of training, so that the scheduling Agent learns a deadlock-avoidance strategy as quickly as possible; in the later stage of training the emphasis shifts to improving scheduling performance, and states with β ≠ 0 are replayed with higher probability so that the scheduling Agent becomes more sensitive to changes in task completion time, where β denotes the total cost of the tasks completed in the current scheduling step.
S64. A mask ζ is introduced into the deep Q-value network reinforcement learning method to avoid trial and error on invalid actions during training, thereby further improving the convergence speed.
Specifically, the Adam optimization algorithm is selected, the learning rate is set to 0.0001, and the target neural network in the DQN is updated every 1,000 steps; a performance evaluation is performed every 10,000 steps, in which 100 cycles are scheduled with the current neural network and the average cycle return and deadlock rate are recorded.
In the invention, a dummy resource library whose marking is always greater than zero is first added for each autonomous transition in the timed S3PR net, so that all transitions in the timed S3PR net have the same preceding-node structure, and on this basis the first sublayer structure for library-to-transition feature propagation is constructed. Then the second sublayer structure for transition-to-library feature propagation is constructed with the construction method of a linear shift-invariant filter on a directed weighted graph. The first (P2T) sublayer and the second sublayer are then combined into a convolution layer, overcoming the difficulties faced in generalizing from the classical convolution layer of convolutional neural networks on Euclidean-space data to a graph convolution layer on a graph.
Solving the dynamic scheduling optimization problem with deep reinforcement learning requires an operating environment for the digital twin operation flow. A quintuple (S, A, Φ, r, γ) is used to describe the digital twin interactive environment of timed S3PR net scheduling, where S represents the state space, A represents the action space, Φ represents the state transition rule, r represents the reward function, and γ represents the discount coefficient.
Next, a value function Q(s_τ, a) is defined to evaluate the value of selecting action a ∈ A in the current state s_τ ∈ S at time step τ, and the Bellman equation is used during training to learn Q(s_τ, a) in an iterative manner. Q(s_τ, a) is fitted with two deep neural networks of identical structure using the DQN method, and the weighted experience replay (Prioritized Experience Replay) method is used in training: the scheduling Agent receives a large penalty when scheduling falls into deadlock, and weighted experience replay raises the replay probability of deadlock experiences in the early stage of training so that the scheduling Agent learns a deadlock-avoidance strategy as quickly as possible; in the later stage of training the emphasis shifts to improving scheduling performance, and states with β ≠ 0 are replayed with higher probability so that the scheduling Agent becomes more sensitive to changes in task completion time. In addition, a mask ζ is introduced into the DQN process to avoid trial and error on invalid actions during training, thereby further improving the convergence speed.
Compared with the prior art, the invention has the following advantages:
the invention constructs a time S3And (3) a depth map convolutional neural network of a PR network model structure. By constructing two convolution sublayers to respectively calculate the characteristic propagation from the library place to the transition and from the transition to the library place, the time S is assigned3PR networks identify the mining of deep implicit information in states. Compared with a fully connected neural network, time S3The graph convolution neural network of the PR network has less trainable weight, higher robustness and better convergence
The invention, combining the graph convolutional neural network, proposes a deep reinforcement learning optimization method for timed S3PR net dynamic scheduling. The dynamic scheduling problem of the work flow's timed S3PR net model is converted into a Markov decision model, the state, action and return in digital twin scheduling are formally defined, and the Markov decision model is then solved by combining the graph convolutional network with the deep Q-value network learning method, realizing the optimization of digital twin scheduling performance.
Drawings
FIG. 1 is a schematic diagram of the operation of the process of the present invention.
FIG. 2 is a schematic diagram of the first sublayer structural feature propagation calculation process in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a convolutional layer structure in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the partial structure and the operation flow of the chemiluminescence immunoassay analyzer for model verification in the embodiment of the invention.
FIG. 5 is a diagram of results of dynamic scheduling of a chemiluminescence immunoassay analyzer workflow according to an example model of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings, taking the workflow scheduling of a chemiluminescence immunoassay analyzer as a specific example:
the embodiment of the invention and the process thereof are as follows:
the function that a scheduling Agent based on deep reinforcement learning can play in a digital twin application is described in the figure 1.
S1. Divide the structure of the example and determine the operation flow;
as shown in fig. 2(a), the present invention is described by taking the workflow schedule of the chemiluminescence immunoassay analyzer as an example, and specifically includes the following steps:
Sorting out the local structure of the chemiluminescence immunoassay analyzer: it comprises three transport modules (T1, T2 and T3, with transport times of 2, 3 and 4 respectively) and four manipulator modules (M1, M2, M3 and M4, with action times of 8, 16, 10 and 10 respectively); each transport module can transport only one sample strip at a time, and each manipulator module is dual-channel (i.e. it can process two sample strips at the same time). This structure can handle three different types of samples (S1, S2 and S3) and interfaces with the input ports (I1, I2 and I3) and output ports (O1, O2 and O3) of each sample strip. The transport modules move sample strips between the operation modules and the input/output ports: T1 transports among I1, O1, M1, M2, M3 and M4; T2 transports among I2, O3, M2 and M3; and T3 transports among I3, O2, M1 and M4. As can be seen from the figure, the minimum completion times of the three sample types are 12, 29 and 29, respectively.
S2. Construct the graph convolutional neural network of the timed S3PR net;
As shown in FIG. 2(b), the timed S3PR net model of the chemiluminescence immunoassay analyzer is constructed, where p1 to p3 represent the inputs of samples S1, S2 and S3 respectively, and p23 to p29 represent T1, M2, M1, T3, M4, M3 and T2 respectively. The operation time of each step in the operation flow is indicated in brackets after the library name in the figure, and schedulable transitions and autonomous transitions are represented by solid black squares and black hatched squares respectively.
The specific implementation steps for constructing the graph convolutional neural network of the timed S3PR net are as follows:
S21. A dummy resource library whose marking is always greater than zero is added for each autonomous transition, so that all transitions in the timed S3PR net have the same preceding-node structure. A 2-unit shared convolution kernel can then be constructed, by analogy with a classical convolution layer, to compute the weighted sum of the features of each transition's preceding operation library and preceding resource library, and the result is taken as the feature f_t of that transition:

$$f_t \;=\; w_P f_p + w_R f_r + b$$

where f_p, f_r and the all-ones vector 1 are the d-dimensional feature vectors of the preceding operation library, the preceding resource library and the dummy resource library respectively, w_P and w_R are the d × d trainable weight matrices corresponding to the 2 kernel units, and b is a d-dimensional trainable bias. f_t is also a d-dimensional feature vector; keeping the transition and library feature dimensions consistent in the first sublayer allows the transition-to-library feature propagation to be computed subsequently. The convolution in the first sublayer can be organized through the input matrix I of the timed S3PR net, the calculation being as shown in FIG. 4. First, the input matrix I of the timed S3PR net is transposed and rearranged to obtain a new location matrix I′ ∈ {0, 1}^((2|T|)×(|P|+1)), where |T| and |P| are the cardinalities of the transition set and the library set respectively; the odd rows of I′ index the preceding operation library of each transition and the even rows index its preceding resource library. In each row of I′, only the element at the position indexed by that library is 1 and all others are 0, so I′ is a static parameter of the P2T sublayer. Then the input of the P2T sublayer, the library state feature matrix F_P ∈ R^(|P|×d), is extended by one all-ones row, representing the features of the dummy resource library, to obtain F_P′ ∈ R^((|P|+1)×d). Multiplying I′ by F_P′ yields the feature matrix used for convolution, F_P″ ∈ R^((2|T|)×d). Finally, by analogy with a classical convolution layer, F_P″ is convolved with a shared kernel of dimension 2 × 1 × d × d with a stride of 2, and the weighted sums give the desired transition feature matrix F_T ∈ R^(|T|×d). F_P and F_T together serve as the input of the second sublayer.
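A minimal sketch of this first (P2T) sublayer forward pass, written in a PyTorch style, is given below for illustration; the class and variable names are assumptions rather than the patent's reference implementation, and the 2-unit shared kernel is realized directly as the two weight matrices w_P and w_R.

```python
import torch
import torch.nn as nn


class P2TSublayer(nn.Module):
    """Library-to-transition (P2T) feature propagation, first sublayer. Sketch only.

    loc_matrix is the static {0,1} location matrix I' of shape (2*|T|, |P|+1):
    odd rows select each transition's preceding operation library, even rows its
    preceding resource library (the extra column is the dummy resource library).
    """

    def __init__(self, loc_matrix: torch.Tensor, d: int):
        super().__init__()
        self.register_buffer("loc_matrix", loc_matrix.float())  # static parameter, not trained
        self.w_p = nn.Parameter(torch.randn(d, d) * 0.01)       # kernel unit for operation libraries
        self.w_r = nn.Parameter(torch.randn(d, d) * 0.01)       # kernel unit for resource libraries
        self.b = nn.Parameter(torch.zeros(d))                    # shared bias

    def forward(self, place_feats: torch.Tensor) -> torch.Tensor:
        # place_feats: library state feature matrix F_P of shape (|P|, d)
        d = place_feats.shape[1]
        dummy = torch.ones(1, d, dtype=place_feats.dtype, device=place_feats.device)
        fp_ext = torch.cat([place_feats, dummy], dim=0)          # F_P' of shape (|P|+1, d)
        gathered = self.loc_matrix @ fp_ext                      # F_P'' of shape (2*|T|, d)
        op_feats, res_feats = gathered[0::2], gathered[1::2]     # the two kernel inputs per transition
        # stride-2 weighted sum of the 2-unit shared kernel: f_t = w_P f_p + w_R f_r + b
        return op_feats @ self.w_p + res_feats @ self.w_r + self.b   # F_T of shape (|T|, d)
```

The output F_T, together with F_P, then forms the input of the second sublayer, consistent with the description above.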
S22. A linear shift-invariant filter on the directed weighted graph is used, with the filter order K limited to 1 so that the filtered features of a library depend only on the library's own original features and the original features of its preceding transitions; this yields the second sublayer structure for transition-to-library feature propagation. Following the construction theory of the linear shift-invariant filter, the transition-to-library feature propagation formula of the second sublayer is defined as

$$F_P' \;=\; \big(h_0 F_P + h_1 \hat{O}\, F_T\big)\, W + B$$

where h_0 and h_1 are the trainable scalar weights of the first-order filter, Ô is the normalized output matrix of the timed S3PR net, W is a trainable weight matrix of dimension d × d′ that maps the original d-dimensional features to new d′-dimensional features, and B is a trainable bias.
S23. As shown in FIG. 3, the output of the first sublayer structure is used as the input of the second sublayer structure, and the two sublayer structures are combined to construct the timed S3PR net convolution layer (PNC); a neural network built from such layers is used to fit the state of the timed S3PR net model. The convolution layer takes the input matrix and the output matrix of the timed S3PR net as static parameters, so the number of trainable weights depends only on the dimensions of the input and output features and is independent of the scale of the timed S3PR net, which overcomes the weight-explosion problem in deep neural network construction.
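An illustrative composition of the two sublayer sketches above into one such PNC layer could be written as follows (an assumption for illustration, reusing the P2TSublayer and T2PSublayer classes above):

```python
import torch.nn as nn


class PNCLayer(nn.Module):
    """One timed S3PR net convolution layer: P2T followed by T2P (sketch only)."""

    def __init__(self, loc_matrix, out_matrix, d_in: int, d_out: int):
        super().__init__()
        self.p2t = P2TSublayer(loc_matrix, d_in)
        self.t2p = T2PSublayer(out_matrix, d_in, d_out)

    def forward(self, place_feats):
        trans_feats = self.p2t(place_feats)            # library -> transition
        return self.t2p(place_feats, trans_feats)      # transition -> library
```

Note that the trainable weights of such a layer depend only on d_in and d_out, not on |P| or |T|, which matches the statement that the number of trainable weights is independent of the net scale.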
S3, giving time S3The dynamic scheduling problem of the PR network is converted into a Markov decision model;
s31, utilizing quintuple
Figure BDA0002435732420000068
To describe the timing S3An interactive environment for PR mesh scheduling, in which,
Figure BDA0002435732420000069
the representation of the state space is represented by,
Figure BDA00024357324200000610
represents the motion space, phi represents the state transition rule, r represents the return function, and gamma represents the discount coefficient.
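The interactive environment described by this quintuple can be organized as a conventional reset/step interface; the skeleton below is only a sketch of how such an environment might be framed, with all names and the default γ value chosen for illustration rather than taken from the patent.

```python
class TimedS3PREnv:
    """Skeleton of the timed S3PR net scheduling environment described by the
    quintuple (S, A, Phi, r, gamma); a sketch only, method bodies omitted."""

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma        # discount coefficient gamma (0.99 is an assumed value)
        self.marking = None       # current identification state of the timed S3PR net

    def reset(self):
        """Return the initial identification state s0 (an element of the state space S)."""
        raise NotImplementedError

    def step(self, action):
        """Apply the state transition rule Phi for the chosen schedulable transition and
        return (next_state, reward, done), where reward follows the return function r."""
        raise NotImplementedError

    def valid_actions(self):
        """Return the mask of schedulable transitions enabled in the current marking
        (used later as the action mask during Q-value selection)."""
        raise NotImplementedError
```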
S4, solving the Markov decision model by using a Deep Q value network (Deep Q-network) reinforcement learning method, wherein the steps are as follows:
s41, defining a cost function
Figure BDA0002435732420000081
To evaluate the current state at time step tau
Figure BDA0002435732420000082
Selection actions
Figure BDA0002435732420000083
The value of (A) is obtained.
S42, the optimal cost function obeys Bellman Equation (Bellman Equation), and the Bellman Equation pair is utilized in the training process
Figure BDA0002435732420000084
Learning is performed in an iterative manner.
S43, fitting by using a DQN method through two deep neural networks with the same structure
Figure BDA0002435732420000085
During specific implementation, when scheduling is trapped in deadlock, a scheduling Agent obtains a large penalty, the playback probability of deadlock Experience can be improved in the early training stage through weight Experience playback, the scheduling Agent learns a strategy of deadlock avoidance as soon as possible, so that the improvement of scheduling performance is emphasized in the later training stage, the playback probability of a state containing β ≠ 0 is higher, and the scheduling Agent is more sensitive to the change of task completion time.
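One possible way to realize the priority weighting just described is sketched below; the bonus constants and the attributes of the stored transition (is_deadlock, beta) are illustrative assumptions, not values from the patent.

```python
def replay_priority(transition, td_error: float, deadlock_bonus: float = 10.0,
                    beta_bonus: float = 2.0, eps: float = 1e-3) -> float:
    """Assign a sampling priority to one stored experience (sketch only).

    transition is assumed to expose .is_deadlock and .beta (total cost of tasks
    completed in the scheduling step); higher priority => replayed more often.
    """
    priority = abs(td_error) + eps                 # usual prioritized-replay term
    if transition.is_deadlock:
        priority += deadlock_bonus                 # replay deadlock experiences early and often
    if transition.beta != 0:
        priority += beta_bonus                     # emphasize steps that complete tasks
    return priority
```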
S44. By introducing a mask ζ into the DQN, trial and error on invalid actions during training is avoided, thereby further improving the convergence speed.
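The action mask ζ can be realized by setting the Q-values of disabled transitions to negative infinity before the greedy choice; the following is a sketch under that assumption.

```python
import torch


def masked_greedy_action(q_values: torch.Tensor, action_mask: torch.Tensor) -> int:
    """Pick the best valid action (sketch only).

    q_values:    (|A|,) Q-value estimates for the current state
    action_mask: (|A|,) boolean mask zeta, True where the transition is enabled
    """
    masked_q = q_values.masked_fill(~action_mask, float("-inf"))  # invalid actions never chosen
    return int(torch.argmax(masked_q).item())
```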
(Algorithm 1, the timed S3PR net operating environment, and Algorithm 2, the scheduling Agent, are presented as figures in the original filing and are not reproduced here.)
S5. timing S for realizing algorithm 13PR network operating environment and scheduling Agent of algorithm 2 for verifying DQN method combined with PNCN3Performance on PR network dynamic scheduling;
FIG. 5 is a diagram illustrating the result of dynamic scheduling;
S51. The network comprises seven convolution layers and one fully connected layer: PNC(12)-BN-LeakyReLU-PNC(12)-BN-LeakyReLU-PNC(24)-BN-LeakyReLU-PNC(24)-BN-LeakyReLU-PNC(36)-BN-LeakyReLU-PNC(36)-BN-LeakyReLU-PNC(12)-BN-LeakyReLU-FC(18)-Linear, where BN denotes batch normalization and LeakyReLU denotes the leaky rectified linear unit activation;
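Reusing the illustrative PNCLayer sketch above, this architecture could be assembled roughly as follows; num_actions, the output size of the final linear layer and the exact placement of batch normalization are assumptions, since the patent only gives the layer string.

```python
import torch
import torch.nn as nn


class PNCQNetwork(nn.Module):
    """Sketch of the PNC(12)-...-PNC(12)-FC(18)-Linear Q-network listed in S51."""

    def __init__(self, loc_matrix, out_matrix, d_in: int, num_actions: int):
        super().__init__()
        widths = [12, 12, 24, 24, 36, 36, 12]                   # PNC layer widths from S51
        layers, prev = [], d_in
        for w in widths:
            layers += [PNCLayer(loc_matrix, out_matrix, prev, w),
                       nn.BatchNorm1d(w),                        # BN over the library dimension (illustrative)
                       nn.LeakyReLU()]
            prev = w
        self.body = nn.ModuleList(layers)
        num_places = out_matrix.shape[0]
        self.fc = nn.Linear(num_places * prev, 18)               # FC(18)
        self.out = nn.Linear(18, num_actions)                    # final linear layer of Q-values

    def forward(self, place_feats: torch.Tensor) -> torch.Tensor:
        x = place_feats                                          # (|P|, d_in) for a single state
        for layer in self.body:
            x = layer(x)
        return self.out(self.fc(x.flatten()))                    # (num_actions,) Q-value estimates
```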
S52. Three reference methods are implemented: FCFS+, D2WS and MLPQ, and they are compared with the method proposed in this patent (a deep Q-value network using the PNC layer, abbreviated PNCQ).
The settings for scheduling-Agent training in the experiment are as follows. Training starts with 10,000 warm-up steps in which FCFS+ is used to fill the weighted experience replay. Training uses the ε-greedy strategy: ε decreases linearly from 1.0 to 0.1 over the first 200,000 training steps and remains at 0.1 in the subsequent training steps. Each training step performs one back-propagation update of the neural network; the optimizer is the Adam algorithm with a learning rate of 0.0001, and the target neural network in the DQN is updated every 1,000 steps. A performance evaluation is carried out every 10,000 steps: 100 cycles are scheduled with the current neural network and the average cycle return and deadlock rate are recorded, while FCFS+ and D2WS schedule the same cycles for comparison. The experiment records a training procedure of 3,000,000 steps, and the results are shown in FIG. 5.
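The ε schedule and the periodic updates described above could be expressed as in the sketch below; only the numbers stated in the text (1.0 → 0.1 over 200,000 steps, target update every 1,000 steps, evaluation every 10,000 steps, Adam with learning rate 0.0001, 3,000,000 total steps) are taken from the experiment, and the surrounding loop structure and names are assumptions.

```python
def epsilon_at(step: int, start: float = 1.0, end: float = 0.1, decay_steps: int = 200_000) -> float:
    """Linear epsilon-greedy schedule: 1.0 -> 0.1 over the first 200,000 steps, then constant."""
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps


# Illustrative outline of the reported training schedule (q_net, target_net and
# evaluate_100_cycles are hypothetical names, not from the patent):
#
#   optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
#   for step in range(3_000_000):
#       eps = epsilon_at(step)
#       ... epsilon-greedy action selection, environment step, prioritized replay sampling,
#       ... one back-propagation update of q_net
#       if step % 1_000 == 0:
#           target_net.load_state_dict(q_net.state_dict())   # update target network
#       if step % 10_000 == 0:
#           evaluate_100_cycles(q_net)                        # record average cycle return and deadlock rate
```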
From the cycle-return curves shown in FIG. 5(a), it can be seen that the curve of PNCQ lies entirely above that of MLPQ, indicating that both the convergence speed and the scheduling performance of PNCQ are better than those of MLPQ. PNCQ reaches the performance of FCFS+ at about 500,000 training steps, whereas MLPQ needs more than 1,300,000 steps; PNCQ approaches D2WS at about 800,000 steps. The value curve in FIG. 5(c) and the loss curve in FIG. 5(d) also support that PNCQ converges faster than MLPQ. The return statistics of the last 100 evaluation cycles are shown in FIG. 5(e): the average cycle return finally reached by PNCQ is about 309.95, about 12.3% higher than FCFS+ and about 7.0% higher than MLPQ, and only about 1.3% lower than D2WS.
Although D2WS achieves the best scheduling performance, its time complexity is high and unacceptable in some real-time scenarios. The experiment measured the computation time of each method for scheduling 100 cycles on a computer equipped with an Intel i5 3.4 GHz processor and 8 GB of memory, as shown in the table below. As can be seen from the results, the slight performance advantage of D2WS over PNCQ is built on hundreds of times the computation time. PNCQ therefore combines good scheduling performance with good computational performance.
Table 1. Computation time of each scheduling method (unit: ms); the table values are given as an image in the original publication and are not reproduced here.
The above example merely illustrates the results of the invention on one embodiment, and the specific implementation of the invention is not limited to this example. Any alternative that achieves a similar effect according to the principles and concepts of the invention shall be regarded as falling within the protection scope of the invention.

Claims (5)

1. A digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation, characterized in that the method comprises the following steps:
S1. Establish a timed S3PR net model for the product digital twin operation flow;
S2. Construct a first sublayer structure for library-to-transition feature propagation in the timed S3PR net model;
S3. Construct a second sublayer structure for transition-to-library feature propagation in the timed S3PR net model;
S4. Build a neural network from the first sublayer structure and the second sublayer structure, and use the neural network to fit the state of the timed S3PR net model;
S5. Convert the dynamic scheduling problem of the product digital twin operation flow based on the timed S3PR net model into a Markov decision model;
S6. Solve the Markov decision model established in S5 with the deep Q-value network (Deep Q-Network) reinforcement learning method, realizing digital twin virtual-real self-adaptive iterative optimization.
2. The digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation according to claim 1, characterized in that:
in step S2, a 2-unit shared convolution kernel is constructed to compute the weighted sum of the features of each transition's preceding operation library and the features of its preceding resource library, and this weighted sum serves as the first sublayer structure; in the specific implementation, a dummy resource library whose marking is always greater than zero is added for each autonomous transition, so that all transitions in the timed S3PR net have the same preceding-node structure.
3. The digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation according to claim 1, characterized in that:
in step S3, the second sublayer structure is constructed according to a linear shift-invariant filter; in the specific implementation, a linear shift-invariant filter on the directed weighted graph is used:

$$\tilde{F} \;=\; H(A)\,F \;=\; \sum_{k=0}^{K} h_k A^{k} F$$

where F and F̃ are respectively the original signal and the filtered signal on the directed weighted graph, H is the filter, A is the adjacency matrix of the directed weighted graph, h_k are the filter parameters, and K is the order of the filter; the filter order K is limited to 1 so that the filtered features of a library depend only on the library's own original features and the original features of its preceding transitions.
4. The digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation according to claim 1, characterized in that:
in step S4, the output of the first sublayer structure is used as the input of the second sublayer structure, and the two sublayer structures are combined to form the convolution layer of the timed S3PR net model; the convolution layer takes the input matrix and the output matrix of the timed S3PR net as static parameters.
5. The digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation according to claim 1, characterized in that in step S6, the steps of solving the Markov decision model established in step S5 with the deep Q-value network reinforcement learning method are as follows:
S61. establish a value function Q(s_τ, a) to evaluate the value of selecting action a ∈ A in the current state s_τ ∈ S at time step τ, where Q denotes the value, s_τ denotes the state at time step τ, a denotes an action, S denotes the state space, and A denotes the action space;
S62. the optimal value function obeys the Bellman Equation, and the Bellman equation is used during training to learn the value function Q(s_τ, a) in an iterative manner;
S63. the deep Q-value network reinforcement learning method fits the value function Q(s_τ, a) with two deep neural networks of identical structure, and the weighted experience replay (Prioritized Experience Replay) method is used in training;
S64. a mask ζ is introduced into the deep Q-value network reinforcement learning method to avoid trial and error on invalid actions during training, thereby further improving the convergence speed.
CN202010251710.7A 2020-04-01 2020-04-01 Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation Pending CN111445081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010251710.7A CN111445081A (en) 2020-04-01 2020-04-01 Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010251710.7A CN111445081A (en) 2020-04-01 2020-04-01 Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation

Publications (1)

Publication Number Publication Date
CN111445081A true CN111445081A (en) 2020-07-24

Family

ID=71651016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010251710.7A Pending CN111445081A (en) 2020-04-01 2020-04-01 Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation

Country Status (1)

Country Link
CN (1) CN111445081A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200493A (en) * 2020-11-02 2021-01-08 傲林科技有限公司 Digital twin model construction method and device
CN117406684A (en) * 2023-12-14 2024-01-16 华侨大学 Flexible flow shop scheduling method based on Petri network and fully-connected neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study
CN110045608A (en) * 2019-04-02 2019-07-23 太原理工大学 Based on the twin mechanical equipment component structural dynamic state of parameters optimization method of number
US20190294975A1 (en) * 2018-03-21 2019-09-26 Swim.IT Inc Predicting using digital twins
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study
US20190294975A1 (en) * 2018-03-21 2019-09-26 Swim.IT Inc Predicting using digital twins
CN110045608A (en) * 2019-04-02 2019-07-23 太原理工大学 Based on the twin mechanical equipment component structural dynamic state of parameters optimization method of number
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIANG HU ET AL.: "Petri-net-based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network", 《JOURNAL OF MANUFACTURING SYSTEMS》 *
PENGFEI WU ET AL.: "Research on the Virtual Reality Synchronization of Workshop Digital Twin", 《2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC)》 *
XUE Han: "Research on the Synchronization and Control of S3PR Nets", China Master's Theses Full-text Database, Information Science and Technology Series *
TAO Fei et al.: "Digital twin and its potential application exploration", Computer Integrated Manufacturing Systems *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200493A (en) * 2020-11-02 2021-01-08 傲林科技有限公司 Digital twin model construction method and device
CN117406684A (en) * 2023-12-14 2024-01-16 华侨大学 Flexible flow shop scheduling method based on Petri network and fully-connected neural network
CN117406684B (en) * 2023-12-14 2024-02-27 华侨大学 Flexible flow shop scheduling method based on Petri network and fully-connected neural network

Similar Documents

Publication Publication Date Title
Hu et al. Petri-net-based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network
CN110119467B (en) Project recommendation method, device, equipment and storage medium based on session
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
Li et al. An effective hybrid genetic algorithm and tabu search for flexible job shop scheduling problem
CN104662526B (en) Apparatus and method for efficiently updating spiking neuron network
CN114915630B (en) Task allocation method, network training method and device based on Internet of Things equipment
CN112328914A (en) Task allocation method based on space-time crowdsourcing worker behavior prediction
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN112631717A (en) Network service function chain dynamic deployment system and method based on asynchronous reinforcement learning
CN111445081A (en) Digital twin virtual-real self-adaptive iterative optimization method for dynamic scheduling of product operation
Xia et al. Learning sparse relational transition models
CN115828831B (en) Multi-core-chip operator placement strategy generation method based on deep reinforcement learning
CN113469891A (en) Neural network architecture searching method, training method and image completion method
CN113537580A (en) Public transport passenger flow prediction method and system based on adaptive graph learning
Samsudin et al. A hybrid least squares support vector machines and GMDH approach for river flow forecasting
Jain et al. Queueing network modelling of flexible manufacturing system using mean value analysis
CN117195976A (en) Traffic flow prediction method and system based on layered attention
CN116975686A (en) Method for training student model, behavior prediction method and device
CN113537613B (en) Temporal network prediction method for die body perception
CN115544307A (en) Directed graph data feature extraction and expression method and system based on incidence matrix
Chang et al. A fuzzy neural network for the flow time estimation in a semiconductor manufacturing factory
CN116560731A (en) Data processing method and related device thereof
CN109978143B (en) Stack type self-encoder based on SIMD architecture and encoding method
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN116991564B (en) Operator internal parallel acceleration method for heterogeneous dual-core MCU

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200724