CN116485133A - Intelligent shift setting method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN116485133A CN116485133A CN202310429831.XA CN202310429831A CN116485133A CN 116485133 A CN116485133 A CN 116485133A CN 202310429831 A CN202310429831 A CN 202310429831A CN 116485133 A CN116485133 A CN 116485133A
- Authority
- CN
- China
- Prior art keywords
- target
- action
- node state
- modeling model
- next node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06Q10/06316—Sequencing of tasks or work (under G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations)
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting (under G06F18/00—Pattern recognition)
- G06N20/00—Machine learning
- G06Q10/103—Workflow collaboration or project management (under G06Q10/10—Office automation; Time management)
Abstract
The disclosure relates to the technical field of machine learning, and discloses an intelligent shift setting method, apparatus, computer device, and storage medium. The method comprises: acquiring a training sample set, wherein the training sample set contains object information of target objects and each target object carries a corresponding target shift label; training an initial modeling model according to the object information and the target shift labels to obtain a target modeling model; when the frequency of changes to the scheduling information is greater than a first threshold, or when indication information triggering a specific scene is received, acquiring the current node state of the target modeling model, a first action, and the next node state determined by the first action, wherein the first action is any action selected from all actions corresponding to the current node state; and, when the next node state is the target state, outputting the target action corresponding to that next node state in the return matrix. This addresses the error-proneness and low efficiency of current scheduling.
Description
Technical Field
The disclosure relates to the field of machine learning, and in particular relates to a setting method and device for intelligent scheduling, computer equipment and a storage medium.
Background
In conventional shift arrangement, staff working hours are scheduled manually by a manager or scheduler, which readily leads to the following problems:
(1) Manual scheduling is error-prone: human negligence or calculation mistakes by the scheduler or supervisor can make the schedule inaccurate or unfair.
(2) Scheduling is inflexible: the traditional approach tends to fix staff working hours, which is unfavorable to employees' personal needs and life arrangements.
(3) Manual scheduling is inefficient: it requires a significant amount of the scheduler's or supervisor's time and effort, which is wasteful for the enterprise.
(4) Scheduling can be unfair: a supervisor may favor certain staff, leaving other staff dissatisfied.
(5) Schedulers are prone to fatigue: long periods of manual scheduling may leave the scheduler tired and stressed.
Therefore, the existing scheduling approach is error-prone, inflexible in scheduling time, unfair, and so on.
Disclosure of Invention
In view of the above, the present disclosure provides an intelligent shift setting method, apparatus, computer device, and storage medium, so as to solve the problems that the existing scheduling approach is error-prone, inflexible in scheduling time, and unfair.
In a first aspect, the present disclosure provides a method for setting an intelligent shift, where the method includes:
acquiring a training sample set, wherein the training sample set contains object information of target objects, and each target object carries a corresponding target shift label;
training the initial modeling model according to the object information and the target shift label to obtain a target modeling model;
when the frequency of changes to the scheduling information is greater than a first threshold, or indication information triggering a specific scene is received, acquiring the current node state of the target modeling model, a first action, and the next node state determined by the first action, wherein the first action is any action selected from all actions corresponding to the current node state;
and outputting the target action corresponding to the next node state in the return matrix under the condition that the next node state is the target state, wherein the target action corresponds to the shift setting.
According to the present disclosure, a target modeling model is generated from the training sample set. Scheduling with the target modeling model avoids manual scheduling, improves working efficiency, and reduces the error rate. Moreover, when a special situation triggers a specific scene, or the scheduling information is changed with high frequency, an embedded reinforcement learning algorithm outputs the target action according to the current node state of the target modeling model, the first action, and the next node state determined by the first action. More demand constraints are thereby taken into account and a more humanized shift setting result is obtained, solving the problems of the existing scheduling approach (error-proneness, inflexible scheduling time, and unfairness).
In an alternative embodiment, training the initial modeling model according to the object information and the target shift label to obtain a target modeling model includes:
inputting the object information into an initial modeling model, and outputting probability values of each class-scheduling classification label corresponding to the object information;
and adjusting model parameters of the initial modeling model according to the probability values, and stopping the adjustment when a probability value is greater than or equal to a second threshold and the shift classification label of the object information corresponds to the target shift label, thereby obtaining the target modeling model.
In the embodiment of the disclosure, the model parameters are adjusted according to the probability value of each shift classification label output by the initial modeling model, so as to obtain the target modeling model; shift information can then be obtained from the target modeling model, avoiding the error-proneness and low efficiency of manual scheduling.
In an alternative embodiment, the method further comprises, prior to entering the object information into the initial modeling model:
preprocessing each object information to obtain target information with a unified data format;
target information is input into the initial modeling model.
In the embodiment of the disclosure, converting the object information into a unified data format facilitates the subsequent data modeling.
In an alternative embodiment, after stopping the adjustment of the model parameters, the method further comprises:
acquiring an intermediate modeling model and a test sample set;
inputting the test information in the test sample set into an intermediate modeling model to obtain a classification result;
verifying the classification result by using a target verification algorithm to obtain an evaluation value of the intermediate modeling model;
in the case where the evaluation value is greater than the third threshold value, the intermediate modeling model is set as the target modeling model.
In the embodiment of the disclosure, the test sample set is used to test the intermediate modeling model, and the classification results that the intermediate modeling model outputs for the test sample set are then verified with the target verification algorithm. The final target modeling model is determined in this way, so that the scheduling information output by the generated target modeling model is more accurate.
In an alternative embodiment, after obtaining the current node state of the target modeling model, the first action, and the next node state determined by the first action, the method further comprises:
setting the next node state as the current node state under the condition that the next node state is not the target state, and updating the return matrix by utilizing a matrix updating function;
and acquiring the current node state, a second action and a next node state determined by the second action, and judging whether the next node state is a target state, wherein the second action is the action with the largest numerical value selected from the updated return matrix.
In the embodiment of the disclosure, when the next node state determined by the first action of the target modeling model is not the target state, the return matrix is updated to obtain a second action and the next node state determined by the second action, and whether that next node state is the target state is then judged. This cycle repeats until a node state is the target state, at which point the corresponding target action is obtained, yielding scheduling information that meets personalized requirements.
In an alternative embodiment, before updating the return matrix with the matrix update function, the method further comprises:
acquiring the maximum value and the minimum value over all actions of the next node state in the return matrix;
and generating the matrix update function according to that maximum value and minimum value.
In the embodiment of the disclosure, a matrix update function is generated from the maximum and minimum values over all actions of the next node state in the return matrix, and the return values are updated with this matrix update function.
In an alternative embodiment, the method further comprises:
generating a callable interface;
the target modeling model is invoked based on the callable interface.
In the embodiment of the disclosure, a callable interface is generated for the model, which facilitates subsequent invocation and use.
In a second aspect, the present disclosure provides an intelligent shift setting apparatus, including:
the first acquisition module is used for acquiring a training sample set, wherein the training sample set contains object information of target objects, and each target object carries a corresponding target shift label;
the first obtaining module is used for training the initial modeling model according to the object information and the target shift label to obtain a target modeling model;
the second acquisition module is used for acquiring the current node state of the target modeling model, a first action and a next node state determined by the first action under the condition that the frequency of changing the scheduling information is larger than a first threshold value or the indication information triggering a specific scene is received, wherein the first action is any action selected from all actions corresponding to the current node state;
and the output module is used for outputting the target action corresponding to the next node state in the return matrix under the condition that the next node state is the target state, wherein the target action corresponds to the scheduling setting.
In a third aspect, the present disclosure provides a computer device, comprising a memory and a processor communicatively connected to each other; the memory stores computer instructions, and the processor executes the computer instructions to perform the method of the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required by the detailed description or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present disclosure, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of a method of setting an intelligent shift according to an embodiment of the present disclosure;
FIG. 2 is an overall flow diagram of a method of setting an intelligent shift according to an embodiment of the present disclosure;
FIG. 3 is a process flow diagram of model reinforcement learning in the setting of an intelligent shift according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a setup device for intelligent scheduling in accordance with an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of protection of this disclosure.
Currently, most workers are scheduled through scheduling systems. In the traditional scheduling approach, staff working hours are arranged manually by a supervisor or scheduler, which is error-prone and inefficient. To solve these problems, an embodiment of the present disclosure provides an intelligent shift setting method. Before the embodiments are set forth, and for ease of reading, the meanings of the "first threshold", "second threshold", and "third threshold" used below are explained first. The "first threshold" is the critical value for frequent changes to the scheduling information, such as 10 or 20 changes; exceeding it indicates that changes are very frequent. The "second threshold" is the value used when classifying a shift as the target shift (i.e., deciding that the current shift classification label is the target shift label), such as 0.9; if the obtained classification probability is greater than it, the current shift classification label is taken as the target shift label. The "third threshold" is the value above which a model is evaluated as a good model; if the evaluation value of the obtained modeling model is greater than it, the current modeling model is considered good.
Specifically, as shown in fig. 1, the method may be applied to a background server side, and the method includes:
step S101, a training sample set is obtained, wherein the training sample set contains object information of target objects, and each target object carries a corresponding target shift label.
Optionally, in the embodiment of the present disclosure, training sample sets may be obtained that contain object information (such as personal information, cleaning-area information, and attendance-requirement information) for a plurality of target objects (such as employee 1 and employee 2). The historical shift information corresponding to each target object is obtained at the same time, and the corresponding target shift label is generated from it. It will be appreciated that these target shift labels characterize the optimal shift information for each target object.
And step S102, training the initial modeling model according to the object information and the target shift label to obtain a target modeling model.
Optionally, the object information of each target object is input into an initial modeling model, which may be a histogram gradient boosting model (Histogram Gradient Boosting, HGB). The output classification result is then compared with the target shift label, the initial modeling model is trained accordingly, and the target modeling model is generated.
Step S103, under the condition that the frequency of changing the scheduling information is larger than a first threshold value or indication information triggering a specific scene is received, the current node state, a first action and a next node state determined by the first action of the target modeling model are obtained, wherein the first action is any action selected from all actions corresponding to the current node state.
Optionally, in some scenarios, for example when the shift information has been changed manually many times (the number of changes is greater than a first threshold, such as 10) or a special situation of an employee triggers a specific scene (such as resignation or leave), the embodiment of the disclosure introduces a reinforcement learning algorithm to obtain the current node state s_t of the target modeling model, a first action a, and the next node state s_{t+1} determined by the first action. The first action a is any action selected, using the policy, from the set of actions available in the current node state s_t.
Step S104, outputting the target action corresponding to the next node state in the return matrix when the next node state is the target state, wherein the target action corresponds to the scheduling setting.
Optionally, in the embodiment of the present disclosure, a shift result for the special case is provided in advance, and the shift state corresponding to this shift result is set as the target state.
According to the embodiment of the disclosure, the optimal Q value is obtained with the reinforcement learning algorithm, so that the node meeting the target state is found, the corresponding action is obtained, and the shift setting corresponding to that action is determined.
Further, when the next node state is the target state, the target action corresponding to the next node state is output from the return matrix, namely the Q matrix, so that scheduling information meeting personalized requirements and special scenes can be obtained.
In addition, as shown in fig. 2, fig. 2 is an overall flow diagram of a setting method of intelligent scheduling according to an embodiment of the disclosure, including the following steps:
step 1: acquiring input data;
step 2: preprocessing data;
step 3: constructing an input-output initial modeling model;
step 4: inputting the preprocessed data into an initial modeling model, and training the initial modeling model to obtain a target modeling model;
step 5: optimizing the target modeling model by using a reinforcement learning algorithm;
step 6: and outputting a prediction result.
According to the present disclosure, a target modeling model is generated from the training sample set. Scheduling with the target modeling model avoids manual scheduling, improves working efficiency, and reduces the error rate. Moreover, when a special situation triggers a specific scene, or the scheduling information is changed with high frequency, an embedded reinforcement learning algorithm outputs the target action according to the current node state of the target modeling model, the first action, and the next node state determined by the first action. More demand constraints are thereby taken into account and a more humanized shift setting result is obtained, solving the problems of the existing scheduling approach (error-proneness, inflexible scheduling time, and unfairness).
In some alternative embodiments, training the initial modeling model according to the object information and the target shift label to obtain a target modeling model includes:
inputting the object information into an initial modeling model, and outputting probability values of each class-scheduling classification label corresponding to the object information;
and adjusting model parameters of the initial modeling model according to the probability values, and stopping the adjustment when a probability value is greater than or equal to a second threshold and the shift classification label of the object information corresponds to the target shift label, thereby obtaining the target modeling model.
Optionally, in the embodiment of the present disclosure, the object information of each target object is input into the initial modeling model, which outputs, for each piece of object information, a probability value for each shift classification label.
Because the scheduling information corresponding to each piece of object information is set in advance, that is, each piece of object information carries a target shift label, the model parameters of the initial modeling model are adjusted according to the probability values output for that object information. When a probability value is greater than or equal to a second threshold (such as 90%) and the shift classification label of the object information corresponds to the target shift label, the adjustment of the model parameters is stopped and the target modeling model is obtained.
In the embodiment of the disclosure, the model parameters are adjusted according to the probability value of each shift classification label output by the initial modeling model, so as to obtain the target modeling model; shift information can then be obtained from the target modeling model, avoiding the error-proneness and low efficiency of manual scheduling.
In some alternative embodiments, prior to entering the object information into the initial modeling model, the method further comprises:
preprocessing each object information to obtain target information with a unified data format;
target information is input into the initial modeling model.
Optionally, in the embodiment of the present disclosure, the object information contains a large amount of text whose data formats are inconsistent. To ensure convenient data processing by the initial modeling model, each piece of object information (such as text information) is preprocessed before being input into the initial modeling model. The preprocessing may apply one-hot encoding to unify the data format, thereby obtaining target information in a unified format, which is then input into the initial modeling model.
In the embodiment of the disclosure, converting the object information into a unified data format facilitates the subsequent data modeling.
In some alternative embodiments, after stopping the adjustment of the model parameters, the method further comprises:
obtaining the intermediate modeling model that results from the model-parameter adjustment, together with a test sample set;
inputting the test information in the test sample set into an intermediate modeling model to obtain a classification result;
verifying the classification result by using a target verification algorithm to obtain an evaluation value of the intermediate modeling model;
in the case where the evaluation value is greater than the third threshold value, the intermediate modeling model is set as the target modeling model.
Optionally, in the embodiment of the present disclosure, to ensure the accuracy of model training, a test sample set and a verification algorithm are combined to judge the classification performance of the model.
Specifically, after the intermediate modeling model obtained by adjusting the model parameters of the initial modeling model is acquired, the obtained test sample sets can be input into the intermediate modeling model to obtain the output classification result, and the classification result is then verified with a target verification algorithm (such as 10-fold cross-validation) to obtain an evaluation value of the intermediate modeling model.
In the embodiment of the disclosure, a third threshold is set in advance as the standard for judging whether the model is good. The evaluation value is compared with the third threshold; when the evaluation value is greater than the third threshold, the intermediate modeling model has been fully trained and is set as the final target modeling model.
In the embodiment of the disclosure, the test sample set is used to test the intermediate modeling model, and the classification results that the intermediate modeling model outputs for the test sample set are then verified with the target verification algorithm. The final target modeling model is determined in this way, so that the scheduling information output by the generated target modeling model is more accurate.
In some alternative embodiments, after obtaining the current node state of the target modeling model, the first action, and the next node state determined by the first action, the method further comprises:
setting the next node state as the current node state under the condition that the next node state is not the target state, and updating the return matrix by utilizing a matrix updating function;
and acquiring the current node state, a second action and a next node state determined by the second action, and judging whether the next node state is a target state, wherein the second action is the action with the largest numerical value selected from the updated return matrix.
Optionally, in the embodiment of the present disclosure, when the next node state is not the target state, the next node state is set as the current node state and the return matrix is updated with the matrix update function. The initial value of the return matrix is a zero matrix whose number of rows equals the number of states and whose number of columns equals the number of actions. As shown in equation (1), for n states in the state set and m actions in the action set, the Q matrix is

Q = [ Q(s_1, a_1) … Q(s_1, a_m); … ; Q(s_n, a_1) … Q(s_n, a_m) ], with every entry initialized to zero, (1)
Wherein s is t Is a state node, a t Is the action corresponding to the state node.
And updating the Q matrix to obtain the Q value. Wherein the calculation of the Q value is a comprehensive consideration of the node status, the selected action, and the rewards value that the agent (which may be the target object here) obtains through the environment.
After the next node state is set as the current node state, the action with the maximum Q value is selected as the second action, the next node state is determined according to the second action, and whether that next node state is the target state is judged; this is executed cyclically.
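The cycle above can be sketched as follows. The transition function `step(state, action) -> next_state` is a hypothetical stand-in for the environment, which the source does not define:

```python
def best_action(q_row):
    """Index of the action with the largest Q value in a state's row."""
    return max(range(len(q_row)), key=lambda a: q_row[a])

def run_to_target(Q, step, state, target_state, max_steps=100):
    """Illustrative cycle: select the max-Q action for the current node
    state, determine the next node state from it, and stop once the
    target state is reached; otherwise the next state becomes the
    current state and the loop repeats."""
    for _ in range(max_steps):
        action = best_action(Q[state])   # action with the largest value
        next_state = step(state, action)
        if next_state == target_state:
            return next_state, action    # target state reached
        state = next_state               # next node state becomes current
    return state, None                   # target not reached in max_steps
```

For example, on a toy three-state chain where every action moves one state forward, the loop terminates as soon as the transition lands on the target state.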
In the embodiment of the disclosure, when it is determined that the next node state determined by the first action of the target modeling model is not the target state, the return matrix is updated to obtain the second action and the next node state determined by the second action, and whether that next node state is the target state is then judged. This cycle continues until the node state is the target state, at which point the corresponding target action is obtained and scheduling information meeting the personalized requirement is produced.
In some alternative embodiments, before updating the reward matrix with the matrix update function, the method further comprises:
acquiring a maximum value and a minimum value of interval nodes of intervals in which all actions are performed in a next node state in the return matrix;
and generating a matrix updating function according to the maximum value and the minimum value of the interval nodes.
Optionally, before the return matrix is updated with the matrix update function, the matrix update function needs to be determined. At this point, the interval-node maximum value and the interval-node minimum value of the interval in which all actions lie in the next node state in the return matrix are obtained, and the matrix update function is generated according to equation (2) and equation (3) shown below.
Q(s_t, a_t) ← (1−α)Q(s_t, a_t) + α(r_{t+1} + γ min_{a_{t+1}} Q(s_{t+1}, a_{t+1})) (2)

Q(s_t, a_t) ← (1−α)Q(s_t, a_t) + α(r_{t+1} + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1})) (3)
Wherein γ is the discount rate, γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) is the influence term for obtaining the interval-node maximum value, γ min_{a_{t+1}} Q(s_{t+1}, a_{t+1}) is the influence term for obtaining the interval-node minimum value, s_t is the current state node, a_t is the action corresponding to the current state node, s_{t+1} is the next state node, a_{t+1} is the action corresponding to the next state node, α is the learning rate, and r_{t+1} is the return value.
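Equations (2) and (3) can be sketched as a single update function, where the flag selects the interval-node maximum (equation (3)) or minimum (equation (2)); the default values of α and γ are illustrative, not taken from the source:

```python
def update_q(Q, s_t, a_t, r_next, s_next, alpha=0.1, gamma=0.9, use_max=True):
    """Q(s_t,a_t) <- (1-alpha)*Q(s_t,a_t) + alpha*(r_{t+1} + gamma*extremum),
    where the extremum is max Q(s_{t+1}, .) for equation (3) (use_max=True)
    or min Q(s_{t+1}, .) for equation (2) (use_max=False)."""
    row = Q[s_next]
    extremum = max(row) if use_max else min(row)
    Q[s_t][a_t] = (1 - alpha) * Q[s_t][a_t] + alpha * (r_next + gamma * extremum)
    return Q[s_t][a_t]
```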
In the embodiment of the disclosure, the matrix update function is generated according to the maximum value or minimum value over all actions in the next node state in the return matrix, and the matrix update function is used to update the return value.
Based on the above embodiments, as an alternative embodiment, as shown in fig. 3, fig. 3 is a schematic process flow diagram of model reinforcement learning in setting of intelligent scheduling according to an embodiment of the present disclosure. The method comprises the following steps:
the agent obtains the state s_t(HGB) of time step t in the iteration process of the HGB model. Next, a corresponding action a_t(HGB) is performed according to the determined action selection policy; in the model, the splitting rules of the HGB are adjusted and updated by the actions the agent takes. Third, the genetic operation is performed again, the state of the HGB is converted into s_{t+1}(HGB), and feedback information is returned to the agent. Finally, the agent takes action a_{t+1}(HGB) on the HGB, and the learning process is recorded by the agent based on the state-action pairs experienced and the feedback received. At the same time, estimation is performed using the value function, and the Q value under the corresponding state-action pair in the Q-value table is updated. After k iterations, the reinforcement process is activated at the subsequent iteration time step t+k, and optimization is performed on the existing state according to past learning experience.
If the reward obtained is positive, the selection of action a_t(HGB) will be reinforced; if the reward obtained is negative, the selection of a_t(HGB) will be correspondingly weakened. This continuous process of acquiring the state, executing the action, obtaining the feedback, and adjusting the policy is the reinforcement process, and it adapts well to the dynamic iteration of the HGB.
In some alternative embodiments, the method further comprises:
generating a callable interface;
the target modeling model is invoked based on the callable interface.
Optionally, in the embodiment of the present disclosure, an API interface may be formed on the target modeling model for calling; alternatively, the target modeling model with reinforcement learning added may be used as a comprehensive model, and an API interface may be formed on the comprehensive model for calling.
In the embodiment of the disclosure, a callable interface can be generated for the model, facilitating subsequent calling and use.
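A minimal sketch of such a callable interface, using only the standard library; `TargetModel` and its `predict` method are hypothetical placeholders, since the source does not specify the model's programming interface:

```python
import json

class TargetModel:
    """Hypothetical trained target modeling model."""
    def predict(self, object_info):
        # Placeholder: a real model would map object info to a shift setting.
        return {"shift": "morning"}

def make_api(model):
    """Wrap the model in a callable interface: JSON request in, JSON out."""
    def handle(request_body: str) -> str:
        object_info = json.loads(request_body)
        return json.dumps(model.predict(object_info))
    return handle

api = make_api(TargetModel())
```

In practice the `handle` function would be bound to an HTTP route by a web framework, so that callers invoke the target modeling model without knowing its internals.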
In this embodiment, an intelligent scheduling setting device is further provided. The device is used to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The embodiment provides a setting device for intelligent scheduling, as shown in fig. 4, including:
a first acquisition module 401, configured to acquire a training sample set, where the training sample set contains object information of target objects, and each target object carries a corresponding target shift label;
a first obtaining module 402, configured to train the initial modeling model according to the object information and the target shift label, to obtain a target modeling model;
a second acquisition module 403, configured to acquire, when it is determined that the frequency of changes to the scheduling information is greater than a first threshold or indication information triggering a specific scene is received, a current node state of the target modeling model, a first action, and a next node state determined by the first action, where the first action is any action selected from all actions corresponding to the current node state;
and the output module 404 is configured to output, when the next node state is the target state, a target action corresponding to the next node state in the report matrix, where the target action corresponds to a shift setting.
In some alternative embodiments, the first obtaining module 402 includes:
the output unit is used for inputting the object information into the initial modeling model and outputting the probability value of each class-scheduling classification label corresponding to the object information;
and the obtaining unit is used for adjusting the model parameters of the initial modeling model according to the probability value, and stopping the adjustment when the probability value is greater than or equal to the second threshold and the shift classification label of the object information corresponds to the target shift label, so as to obtain the target modeling model.
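The stop rule applied by this unit (adjust parameters until the probability value reaches the second threshold and the predicted shift label corresponds to the target shift label) can be sketched as follows; `model_step` and the threshold value are illustrative assumptions, not interfaces defined by the source:

```python
def train_until_threshold(model_step, second_threshold=0.9, max_iters=1000):
    """Hedged sketch of the adjustment loop: model_step() performs one
    round of parameter adjustment and returns
    (probability, predicted_shift_label, target_shift_label).
    Adjustment stops once the probability is greater than or equal to
    the second threshold AND the predicted label matches the target."""
    prob = 0.0
    for _ in range(max_iters):
        prob, predicted, target = model_step()
        if prob >= second_threshold and predicted == target:
            break  # stop adjusting: target modeling model obtained
    return prob
```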
In some alternative embodiments, the apparatus further comprises:
the second obtaining module is used for preprocessing each piece of object information before inputting the object information into the initial modeling model to obtain target information with a unified data format;
and the input module is used for inputting the target information into the initial modeling model.
In some alternative embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring an intermediate modeling model and a test sample set which are obtained after the model parameters are adjusted after the adjustment of the model parameters is stopped;
the third obtaining module is used for inputting the test information in the test sample set into the intermediate modeling model to obtain a classification result;
the fourth obtaining module is used for verifying the classification result by utilizing a target verification algorithm to obtain an evaluation value of the intermediate modeling model;
and the setting module is used for setting the intermediate modeling model as the target modeling model in the case that the evaluation value is larger than the third threshold value.
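The verification and setting steps handled by the two modules above can be sketched jointly. Accuracy stands in here for the evaluation value, since the source does not specify the target verification algorithm; `classify`, the sample format, and the threshold value are assumptions:

```python
def verify_intermediate_model(classify, test_samples, third_threshold=0.8):
    """Hedged sketch: classify is the intermediate modeling model's
    classification function and test_samples is a list of
    (test_info, true_label) pairs. The evaluation value (accuracy here)
    is compared with the third threshold; the model is set as the target
    modeling model only when the evaluation value exceeds it."""
    correct = sum(1 for info, label in test_samples if classify(info) == label)
    evaluation = correct / len(test_samples)
    return evaluation, evaluation > third_threshold
```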
In some alternative embodiments, the apparatus further comprises:
the updating module is used for setting the next node state as the current node state and updating the return matrix by utilizing a matrix updating function under the condition that the next node state is not the target state after the current node state, the first action and the next node state determined by the first action of the target modeling model are acquired;
and a fourth obtaining module, configured to obtain the current node state, a second action, and a next node state determined by the second action, and determine whether the next node state is the target state, where the second action is the action with the largest value selected from the updated return matrix.
In some alternative embodiments, the apparatus further comprises:
the fifth acquisition module is used for acquiring the interval node maximum value and the interval node minimum value of the interval where all actions are located in the next node state in the return matrix before updating the return matrix by using the matrix updating function;
the first generation module is used for generating a matrix updating function according to the maximum value of the interval node and the minimum value of the interval node.
In some alternative embodiments, the apparatus further comprises:
the second generation module is used for generating a callable interface;
and the calling module is used for calling the target modeling model based on the callable interface.
The intelligent scheduling setting device of this embodiment is presented in the form of functional units, where a unit refers to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the functionality described above.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the disclosure also provides a computer device, which is provided with the intelligent scheduling setting device shown in the figure 4.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an alternative embodiment of the disclosure. As shown in fig. 5, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 5.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown to implement the above embodiments.
The memory 20 may include a storage program area and a storage data area; the storage program area may store an operating system and at least one application program required for a function, and the storage data area may store data created from the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The embodiments of the present disclosure also provide a computer-readable storage medium. The methods described above according to the embodiments of the present disclosure may be implemented in hardware or firmware, as computer code recorded on a storage medium, or as computer code originally stored in a remote or non-transitory machine-readable storage medium and downloaded over a network to be stored in a local storage medium, so that the methods described herein may be executed from software stored on a storage medium by a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may also comprise a combination of the above kinds of memories. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present disclosure have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations are within the scope defined by the appended claims.
Claims (10)
1. The setting method of the intelligent scheduling is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set contains object information of target objects, and each target object carries a corresponding target shift label;
training an initial modeling model according to the object information and the target shift label to obtain a target modeling model;
under the condition that the frequency of changing scheduling information is larger than a first threshold value or indication information triggering a specific scene is received, acquiring a current node state of the target modeling model, a first action and a next node state determined by the first action, wherein the first action is any action selected from all actions corresponding to the current node state;
and outputting the target action corresponding to the next node state in the return matrix under the condition that the next node state is the target state, wherein the target action corresponds to the scheduling setting.
2. The method of claim 1, wherein training the initial modeling model according to the object information and the target shift tag to obtain a target modeling model comprises:
inputting the object information into an initial modeling model, and outputting probability values of each class-scheduling classification label corresponding to the object information;
and adjusting model parameters of the initial modeling model according to the probability value, and stopping adjusting the model parameters until the probability value is greater than or equal to a second threshold value and the shift classification label of the object information corresponds to the target shift label, so as to obtain the target modeling model.
3. The method of claim 1, wherein prior to said entering said object information into an initial modeling model, said method further comprises:
preprocessing each piece of object information to obtain target information with a unified data format;
inputting the target information into the initial modeling model.
4. The method of claim 2, wherein after the ceasing the adjustment of the model parameter, the method further comprises:
obtaining an intermediate modeling model and a test sample set which are obtained after model parameter adjustment;
inputting the test information in the test sample set into the intermediate modeling model to obtain a classification result;
verifying the classification result by using a target verification algorithm to obtain an evaluation value of the intermediate modeling model;
and setting the intermediate modeling model as the target modeling model in the case that the evaluation value is greater than a third threshold value.
5. The method of claim 1, wherein after the obtaining the current node state of the target modeling model, a first action, and a next node state determined by the first action, the method further comprises:
setting the next node state as the current node state under the condition that the next node state is not the target state, and updating the return matrix by using a matrix updating function;
and acquiring the current node state, a second action and a next node state determined by the second action, and judging whether the next node state is the target state, wherein the second action is the action with the largest numerical value selected from the updated return matrix.
6. The method of claim 5, wherein prior to updating the reporting matrix with a matrix update function, the method further comprises:
acquiring a maximum value and a minimum value of interval nodes of an interval in which all actions are located in the next node state in the return matrix;
and generating the matrix updating function according to the interval node maximum value and the interval node minimum value.
7. The method according to any one of claims 1 to 6, further comprising:
generating a callable interface;
and calling the target modeling model based on the callable interface.
8. An intelligent shift setting device, which is characterized in that the device comprises:
the first acquisition module is used for acquiring a training sample set, wherein the training sample set contains object information of target objects, and each target object carries a corresponding target shift label;
the first obtaining module is used for training the initial modeling model according to the object information and the target shift label to obtain a target modeling model;
the second acquisition module is used for acquiring the current node state of the target modeling model, a first action and a next node state determined by the first action under the condition that the frequency of changing scheduling information is larger than a first threshold value or indication information triggering a specific scene is received, wherein the first action is any action selected from all actions corresponding to the current node state;
and the output module is used for outputting the target action corresponding to the next node state in the return matrix under the condition that the next node state is the target state, wherein the target action corresponds to the scheduling setting.
9. A computer device, comprising:
a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, cause the processor to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310429831.XA CN116485133A (en) | 2023-04-20 | 2023-04-20 | Intelligent shift setting method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310429831.XA CN116485133A (en) | 2023-04-20 | 2023-04-20 | Intelligent shift setting method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116485133A true CN116485133A (en) | 2023-07-25 |
Family
ID=87211367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310429831.XA Pending CN116485133A (en) | 2023-04-20 | 2023-04-20 | Intelligent shift setting method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485133A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10565442B2 (en) | Picture recognition method and apparatus, computer device and computer- readable medium | |
US11176418B2 (en) | Model test methods and apparatuses | |
Kell et al. | Evaluation of the prediction skill of stock assessment using hindcasting | |
CN111401722B (en) | Intelligent decision method and intelligent decision system | |
AU2016304571A1 (en) | Model integration tool | |
EP2273431A1 (en) | Model determination system | |
CN108846419A (en) | Single page high load image-recognizing method, device, computer equipment and storage medium | |
CN111008707A (en) | Automatic modeling method and device and electronic equipment | |
CN112215293A (en) | Plant disease and insect pest identification method and device and computer equipment | |
CN113407327A (en) | Modeling task and data analysis method, device, electronic equipment and system | |
CN114355793A (en) | Training method and device of automatic driving planning model for vehicle simulation evaluation | |
CN111949502A (en) | Database early warning method and device, computing equipment and medium | |
CN113254153B (en) | Method and device for processing flow task, computer equipment and storage medium | |
Kell et al. | The quantification and presentation of risk | |
CN116485133A (en) | Intelligent shift setting method and device, computer equipment and storage medium | |
CN113505895A (en) | Machine learning engine service system, model training method and configuration method | |
CN109725785A (en) | Task execution situation method for tracing, device, equipment and readable storage medium storing program for executing | |
US20210271925A1 (en) | Contact Center Call Volume Prediction | |
CN113361380B (en) | Human body key point detection model training method, detection method and device | |
CN113361381B (en) | Human body key point detection model training method, detection method and device | |
CN112380204B (en) | Data quality evaluation method and device | |
CN111949271B (en) | Method, transaction system, device and storage medium for customizing transaction policy | |
JP2011198300A (en) | Process improvement measure evaluation device and method | |
CN109409427A (en) | A kind of key detecting method and device | |
CN116483983B (en) | Method and related equipment for generating emotion change quantity of virtual character |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |