CN116679615A - Optimization method and device of numerical control machining process, terminal equipment and storage medium - Google Patents

Publication number
CN116679615A
Authority
CN
China
Prior art keywords
expert database
tolerance threshold
processing
similarity
optimized
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310968969.7A
Other languages
Chinese (zh)
Other versions
CN116679615B (en
Inventor
谭勇
杨之乐
肖溱鸽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Hangmai CNC Software Shenzhen Co Ltd
Original Assignee
Zhongke Hangmai CNC Software Shenzhen Co Ltd
Application filed by Zhongke Hangmai CNC Software Shenzhen Co Ltd
Priority to CN202310968969.7A
Publication of CN116679615A
Application granted
Publication of CN116679615B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/18 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/4093 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form characterised by part programming, e.g. entry of geometrical information as taken from a technical drawing, combining this with machining and material information to obtain control information, named part programme, for the NC machine
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32306 Rules to make scheduling decisions

Abstract

The invention discloses a method, a device, a terminal device and a storage medium for optimizing a numerical control machining process, which are applied to the technical field of intelligent manufacturing and comprise the following steps: constructing a first expert database according to the acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database; determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold; if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task; and if the optimized process strategy is confirmed to be matched with the machining process strategy, determining the optimized numerical control machining process according to the optimized process strategy. The problem of low reliability of the process model is solved.

Description

Optimization method and device of numerical control machining process, terminal equipment and storage medium
Technical Field
The present invention relates to the field of intelligent manufacturing technologies, and in particular, to a method and apparatus for optimizing a numerical control machining process, a terminal device, and a storage medium.
Background
With the development of intelligent manufacturing, numerical control machining has come into wide use. It is a process for machining parts on a numerical control machine tool, in which digital information controls the displacement of the part and the cutter.
Conventional process optimization methods generally train on expert behavior data using mathematical models and specified rules. However, an expert's understanding may deviate from the actual application scenario, that is, the expert's cognition may be inaccurate. The performance of the model in actual scenarios then degrades, and the reliability of the machining process model is very low.
Disclosure of Invention
The invention mainly aims to provide an optimization method, device, terminal equipment and computer storage medium of a numerical control machining process, and aims to solve the problem that the reliability of a machining process model is very low.
In order to achieve the above purpose, the present invention provides an optimization method of a numerical control machining process, the method comprising: constructing a first expert database according to acquired machining behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on machining parameters of the first expert database after behavior cloning to obtain a second expert database;
determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold;
if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task;
and if the optimized process strategy is confirmed to be matched with the machining process strategy, determining an optimized numerical control machining process according to the optimized process strategy.
Optionally, before the step of detecting whether the similarity is greater than or equal to a preset tolerance threshold, the method further includes:
normalizing the processing behavior data and the state environment data in the first expert database, and determining a tolerance threshold range according to a normalization processing result;
and selecting one value within the tolerance threshold range as the tolerance threshold.
Optionally, after the step of confirming whether the optimized process strategy matches a process strategy of an intended process task, the method further comprises:
if the optimized process strategy is not matched with the machining process strategy, confirming disturbance change parameters of disturbance change of machining parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of performing disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters to obtain a second expert database;
if the number of times of mismatch is detected to be greater than a preset number of times threshold, adjusting the tolerance threshold, and re-executing the step of detecting whether the similarity is greater than the preset tolerance threshold according to the adjusted tolerance threshold.
Optionally, the step of adjusting the tolerance threshold includes:
adjusting the tolerance threshold within the tolerance threshold range through a preset adjustment rule; or
in response to an operation that modifies the tolerance threshold within the tolerance threshold range, adjusting the tolerance threshold based on the operation.
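By way of illustration only, the mismatch counting and threshold adjustment described above might be sketched as follows. The names `ToleranceState` and `record_mismatch`, the fixed additive `step`, and the resetting of the counter after an adjustment are all assumptions made for this sketch; the text does not specify the preset adjustment rule.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToleranceState:
    """Hypothetical holder for the current threshold and the mismatch count."""
    threshold: float
    mismatches: int = 0


def record_mismatch(state: ToleranceState, max_mismatches: int, step: float,
                    bounds: Tuple[float, float] = (0.0, 1.0)) -> ToleranceState:
    """Count one mismatch; once the count exceeds max_mismatches, shift the
    threshold by `step`, clamped to the tolerance threshold range."""
    state.mismatches += 1
    if state.mismatches > max_mismatches:
        low, high = bounds
        state.threshold = min(high, max(low, state.threshold + step))
        state.mismatches = 0  # assumed: start counting afresh after adjusting
    return state


state = ToleranceState(threshold=0.5)
for _ in range(3):  # three mismatches against a limit of two
    record_mismatch(state, max_mismatches=2, step=-0.1)
```

Clamping keeps the adjusted value inside the tolerance threshold range, which is the only constraint the text places on the adjustment.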
Optionally, the step of determining the similarity between the second expert database and the first expert database includes:
constructing a first data distribution based on the processing behavior data and the state environment data of the first expert database, and constructing a second data distribution based on the processing behavior data and the state environment data of the second expert database;
and calculating the KL divergence between the first data distribution and the second data distribution, and determining the KL divergence as the similarity.
Optionally, the reinforcement learning includes: reverse reinforcement learning and forward reinforcement learning, wherein the step of performing reinforcement learning on the second expert database to obtain an optimized process strategy comprises the following steps:
performing reverse reinforcement learning on the second expert database to obtain a return function of decision-making behavior of the expert in the processing process of the state environment data;
and carrying out optimization decision on the return function based on forward reinforcement learning to obtain an optimized process strategy.
Optionally, after the step of detecting whether the similarity is greater than or equal to a preset tolerance threshold, the method includes:
if the similarity is smaller than the tolerance threshold, confirming disturbance change parameters for carrying out disturbance change on the processing parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of obtaining a second expert database by carrying out disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters.
In addition, in order to achieve the above object, the present invention also provides an optimizing apparatus of a numerical control machining process, the optimizing apparatus of a numerical control machining process comprising:
the acquisition module is used for constructing a first expert database according to acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database;
the detection module is used for determining the similarity between the second expert database and the first expert database and detecting whether the similarity is larger than or equal to a preset tolerance threshold;
the matching module is used for performing reinforcement learning on the second expert database if the similarity is greater than or equal to the tolerance threshold value to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task;
and the determining module is used for determining the optimized numerical control machining process according to the optimized process strategy if the optimized process strategy is confirmed to be matched with the machining process strategy.
In addition, to achieve the above object, the present invention also provides a terminal device including: the system comprises a memory, a processor and a numerical control machining process optimizing program stored in the memory and capable of running on the processor, wherein the numerical control machining process optimizing program realizes the steps of the numerical control machining process optimizing method when being executed by the processor.
In addition, in order to achieve the above object, the present invention also provides a computer storage medium having stored thereon an optimization program of a numerical control machining process, which when executed by a processor, implements the steps of the optimization method of a numerical control machining process as described above.
Compared with the traditional optimization mode based on mathematical models or rules, the method comprises the steps of constructing a first expert database according to the acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database; determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold; if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task; and if the optimized process strategy is confirmed to be matched with the machining process strategy, determining an optimized numerical control machining process according to the optimized process strategy. 
Therefore, the invention obtains the first expert database from the state environment data and the behavior data of the expert during machining, performs behavior cloning on the first expert database followed by disturbance change to obtain the second expert database, performs reinforcement learning on the second expert database when the similarity between the second expert database and the first expert database is greater than or equal to the tolerance threshold, and, when the optimized process strategy obtained by reinforcement learning matches the machining process strategy, determines the optimized numerical control machining process according to the optimized process strategy. This improves the tolerance of the expert database, alleviates the expert's cognitive deviation with respect to the actual application scenario, and improves the generalization capability of the machining model.
Drawings
Fig. 1 is a schematic structural diagram of hardware operation of a terminal device according to an embodiment of the present invention;
FIG. 2 is a flow chart of an embodiment of a method for optimizing a numerical control process according to the present invention;
FIG. 3 is a schematic flow chart of a second embodiment of an optimizing method of a numerical control machining process according to the present invention;
FIG. 4 is a flowchart illustrating an embodiment of the refinement step of step S110 in FIG. 3;
FIG. 5 is a schematic diagram of the structural relationship of an optimizing system of the numerical control machining process of the invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a hardware running environment related to a terminal device according to an embodiment of the present invention.
It should be noted that fig. 1 may be a schematic structural diagram of a hardware operating environment of a terminal device. The terminal equipment provided by the embodiment of the invention can be equipment for integrally guiding the machine tool to be assembled.
As shown in fig. 1, the terminal device may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory, such as flash memory or a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the terminal device structure shown in fig. 1 is not limiting of the terminal device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an optimization program of the numerical control machining process. The operating system is a program that manages and controls the hardware and software resources of the terminal device and supports the running of the optimization program of the numerical control machining process and other software or programs.
In the terminal device shown in fig. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used for connecting a background server and carrying out data communication with the background server; and the processor 1001 may be used to invoke an optimization program for the numerical control machining process stored in the memory 1005.
Based on the above terminal device, embodiments of the optimization method of the numerical control machining process of the present invention are provided.
Referring to fig. 2, fig. 2 is a flow chart of a first embodiment of the optimizing method of the numerical control machining process according to the present invention. In a first embodiment of the method of the present invention, the method for optimizing a numerical control machining process is applied to a terminal device, and the method for optimizing a numerical control machining process of the present invention includes:
step S10: constructing a first expert database according to the acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database;
Traditional process optimization methods incur substantial computational cost and time through complex mathematical modeling and search. In addition, their models perform worse in actual scenarios because the training data differ from the actual application scenario.
In this embodiment, the behavior data of an expert are recorded, and state environment data are acquired while the expert performs an actual machining process. The state environment data X include the temperature proprietary dry data, the angle of the tool, the wear condition of the tool, the spindle speed and the like at the time the expert operates. The behavior data a include information such as the sequence of actions the expert takes during numerical control machining on the machine tool, for example the decided process sequence, machining resources and process parameters. Specifically, the expert may divide a machining process into three cutting steps, for example cutting 2 mm in the first step, 3 mm in the second step and 1 mm in the third step; the size of each cut and the number of steps are the expert's decision behaviors. Because machining data in an actual numerical control machining process deviate from theoretical machining data, the expert's actual machining data fit the actual machining process more closely.
For example, if a numerical control machining process includes N steps, one record of the expert is N(X, a), that is, N pairs of state environment data and behavior data. The first expert database may include multiple records of the expert performing the same process: if M records of the process are obtained, the first expert database is MN(X, a), that is, the first data distribution. The first expert database is a precise database composed of the expert's decision information during actual machining and the state information of the machine tool during actual machining.
After the first expert database is obtained, if the records contain target records whose difference is larger than a preset difference value, the target records are screened out and the remaining records are used as the first expert database. For example, behavior data records containing expert errors or state environment records produced during machine tool system faults are screened out, and the remaining records are used as the first expert database.
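By way of illustration only, the M x N structure of (X, a) pairs and the screening of target records might be sketched as follows. The type names, the reduction of the behavior data a to a single cutting depth, the placeholder state values, and the use of distance from the median total depth as the "difference" are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Step:
    """One (X, a) pair: state environment data X and the expert's action a."""
    state: List[float]   # X, e.g. temperature, tool angle, tool wear, spindle speed
    cut_depth_mm: float  # a, reduced here to one cutting depth (assumption)


Record = List[Step]      # one run of the N-step process
Database = List[Record]  # M records, i.e. M x N pairs (X, a)


def total_depth(record: Record) -> float:
    return sum(step.cut_depth_mm for step in record)


def screen_records(records: Database, max_diff_mm: float) -> Database:
    """Keep records whose total depth lies within max_diff_mm of the median."""
    centre = median(total_depth(r) for r in records)
    return [r for r in records if abs(total_depth(r) - centre) <= max_diff_mm]


def make_record(depths: List[float]) -> Record:
    # Placeholder state vector; real records would carry measured values.
    return [Step(state=[25.0, 90.0, 0.1, 3000.0], cut_depth_mm=d) for d in depths]


raw_records = [
    make_record([2.0, 3.0, 1.0]),
    make_record([2.5, 2.5, 1.0]),
    make_record([2.0, 2.0, 2.0]),
    make_record([6.0, 5.0, 4.0]),  # stands in for an expert error
]
first_expert_db = screen_records(raw_records, max_diff_mm=1.0)
```

The median is used as the reference point because a single faulty record shifts it far less than it would shift a mean; any other robust criterion would serve equally well here.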
In this embodiment, after the expert's actual machining data are obtained, the expert's machining operations are simulated during training, which alleviates the sample deviation problem to some extent. Behavior cloning is performed on the first expert database and disturbance change is then applied, that is, the precise data of the expert's machining are varied up and down, to obtain a second expert database within the tolerable tolerance threshold range. The second expert database is thus a database with tolerance, obtained by disturbance change of the precise first expert database. The disturbance change is applied to the first expert database by a trigger mechanism designed in the system, and the second expert database after the disturbance change must meet the preset tolerance threshold condition.
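By way of illustration only, a disturbance change of the cutting depths might be sketched as follows. Cloning is reduced here to copying each record, which is a simplification. The function names, the uniform random draw, and the rescaling of later steps so that the total depth is preserved are assumptions; the rescaling is chosen to be consistent with the remark in this embodiment that a deep cut in an early step requires shallower cuts in later steps.

```python
import random
from typing import List


def perturb_depths(depths: List[float], position: int, degree: float) -> List[float]:
    """Change the depth at `position` by `degree` (mm) and rescale the later
    steps so that the total depth of the process is unchanged."""
    if not 0 <= position < len(depths) - 1:
        raise ValueError("position must leave at least one later step")
    total = sum(depths)
    changed = depths[:position] + [depths[position] + degree]
    remaining = total - sum(changed)
    tail = depths[position + 1:]
    tail_sum = sum(tail)
    if changed[-1] <= 0 or remaining <= 0 or tail_sum <= 0:
        raise ValueError("perturbation leaves no material for later steps")
    scale = remaining / tail_sum
    return changed + [d * scale for d in tail]


def perturb_database(db: List[List[float]], max_degree: float,
                     rng: random.Random) -> List[List[float]]:
    """Copy every record and apply one random (position, degree) change to it."""
    perturbed = []
    for depths in db:
        position = rng.randrange(len(depths) - 1)
        degree = rng.uniform(-max_degree, max_degree)
        perturbed.append(perturb_depths(list(depths), position, degree))
    return perturbed


first_db = [[2.0, 3.0, 1.0], [2.5, 2.5, 1.0]]
second_db = perturb_database(first_db, max_degree=0.5, rng=random.Random(0))
```

For instance, `perturb_depths([2.0, 3.0, 1.0], 0, 0.5)` deepens the first cut to 2.5 mm and shrinks the two later cuts proportionally, so the three cuts still sum to 6 mm.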
Step S20: determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold;
In this embodiment, the change position and the change degree applied to the distribution MN(X, a) of the first expert database are confirmed, where the change position is the machining step of the process and the change degree is the machining dimension of the process. Note that the distribution MN(X, a) includes N steps, and if the change position is an early step, the remaining allowance of the later steps changes accordingly. For example, when the machining dimension of the first step is large, the machining dimension of the later steps must be reduced, that is, a deep cut in an early step requires shallower cuts in the later steps. The change position and the change degree therefore affect the matching tolerance between the first expert database after disturbance change and the first expert database, that is, the similarity between the two. After the similarity between the second expert database and the first expert database is calculated, it is further detected whether the similarity is greater than or equal to the preset tolerance threshold.
If the similarity is greater than or equal to the preset tolerance threshold, the second expert database and the first expert database are successfully matched; if the similarity is less than the preset tolerance threshold, the matching fails.
Optionally, in some possible embodiments, the step of determining the similarity between the second expert database and the first expert database in step S20 may further include the steps of:
step S201: constructing a first data distribution based on the processing behavior data and the state environment data of the first expert database, and constructing a second data distribution based on the processing behavior data and the state environment data of the second expert database;
step S202: and calculating the KL divergence between the first data distribution and the second data distribution, and determining the KL divergence as the similarity.
In this embodiment, the first data distribution of the first expert database is loosely matched against the second data distribution of the second expert database. The matching is computed through the KL divergence, which measures the degree of difference between the two databases: the larger the KL divergence, the larger the difference between the first expert database and the second expert database, and the smaller the KL divergence, the smaller the difference.
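By way of illustration only, the KL divergence between two data distributions might be computed as follows. Representing each distribution as a smoothed histogram of cutting depths, the bin width, the smoothing constant `eps`, and the direction D_KL(first || second) are assumptions made for this sketch; the text does not specify how the distributions are constructed.

```python
import math
from typing import List


def to_distribution(values: List[float], bin_width: float, n_bins: int,
                    eps: float = 1e-6) -> List[float]:
    """Histogram non-negative values into n_bins bins, smoothed by eps so
    that no bin has zero probability."""
    counts = [eps] * n_bins
    for v in values:
        index = min(max(int(v / bin_width), 0), n_bins - 1)
        counts[index] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Cutting depths (mm) pooled over all records of each database (example values).
first_depths = [2.0, 3.0, 1.0, 2.5, 2.5, 1.0]
second_depths = [2.5, 2.625, 0.875, 2.2, 2.8, 1.0]

p = to_distribution(first_depths, bin_width=0.5, n_bins=8)
q = to_distribution(second_depths, bin_width=0.5, n_bins=8)
divergence = kl_divergence(p, q)
```

The divergence is zero for identical distributions and grows as they differ, so how the value is compared against the tolerance threshold depends on a convention that the text does not spell out.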
Optionally, in some possible embodiments, before the step of detecting whether the similarity is greater than or equal to the preset tolerance threshold in step S20, the optimizing method of the numerical control machining process of the present invention may further include the following steps:
step S50: normalizing the processing behavior data and the state environment data in the first expert database, and obtaining tolerance range data according to a normalization processing result;
In this embodiment, the expert databases of the machining processes of different numerical control machine tools differ, so the obtained first expert database is normalized to yield tolerance range data from 0 to 1, which places the data of the expert database of the machining process in the range from 0 to 1.
Step S60: selecting one value within the tolerance threshold range as the tolerance threshold.
In this embodiment, the preset tolerance threshold is randomly selected within the tolerance range data of 0 to 1. Illustratively, the preset tolerance threshold is set in advance to a number, for example 0.5, obtained through multiple experiments and tests; this number is the initial value of the tolerance threshold.
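By way of illustration only, steps S50 and S60 might be sketched as follows. Min-max scaling is an assumed choice of normalization (the text only states that the result lies between 0 and 1), and the names `min_max_normalize` and `choose_threshold` are hypothetical.

```python
from typing import List, Tuple

TOLERANCE_RANGE: Tuple[float, float] = (0.0, 1.0)


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale one column of the expert database linearly onto [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(v - low) / (high - low) for v in column]


def choose_threshold(value: float,
                     bounds: Tuple[float, float] = TOLERANCE_RANGE) -> float:
    """Accept a tolerance threshold only if it lies in the tolerance range."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError("tolerance threshold outside the tolerance range")
    return value


spindle_speeds = [2800.0, 3000.0, 3200.0]  # example state environment column
normalized = min_max_normalize(spindle_speeds)
tolerance_threshold = choose_threshold(0.5)  # initial value given in the text
```

Normalizing each column separately lets quantities with different units, such as spindle speed and cutting depth, share the same 0 to 1 scale.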
Step S30: if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task;
In this embodiment, after the second expert database with tolerance is obtained, reinforcement learning is performed on it to obtain the optimized process strategy. It is then confirmed whether the optimized process strategy matches the machining process strategy of the expected process task, that is, whether the machining data obtained by machining according to the optimized process strategy satisfy the expected process task. For example, if a machining process actually needs to cut 4 mm, it is checked whether the machining data obtained with the optimized process strategy equal 4 mm or fall within the acceptable range of machining data; if so, the optimized process strategy satisfies the expected process task, and if not, it does not. The check is performed by a program set in the system: if the optimized process strategy matches the machining process strategy, the output result is correct, and if it does not match, the output result is wrong.
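By way of illustration only, the matching check against the expected process task might be sketched as follows. The function name and the use of a symmetric acceptable deviation in millimetres are assumptions made for this sketch; the 4 mm target is taken from the example above.

```python
from typing import List


def matches_expected_task(cut_depths_mm: List[float], expected_total_mm: float,
                          acceptable_dev_mm: float) -> bool:
    """Return True when the total depth removed by the optimized strategy is
    within the acceptable deviation of the expected total depth."""
    return abs(sum(cut_depths_mm) - expected_total_mm) <= acceptable_dev_mm


# The expected process task needs a 4 mm cut.
ok = matches_expected_task([1.5, 1.5, 1.0], expected_total_mm=4.0,
                           acceptable_dev_mm=0.1)
too_deep = matches_expected_task([2.0, 2.0, 1.0], expected_total_mm=4.0,
                                 acceptable_dev_mm=0.1)
```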
Optionally, in some possible embodiments, the reinforcement learning includes reverse reinforcement learning and forward reinforcement learning, and the step of performing reinforcement learning on the second expert database to obtain an optimized process strategy in step S30 may include the following steps:
step S301: performing reverse reinforcement learning on the second expert database to obtain a return function of decision-making behavior of the expert in the processing process of the state environment data;
In this embodiment, reverse reinforcement learning is performed on the second expert database to obtain track samples of the machining process. That is, the reverse reinforcement learning algorithm learns the optimization strategy of the machining process from the database with tolerance, infers the targets and preferences of the experts during machining, and generates a return function that can explain the experts' behavior; the track samples are derived back from the second expert database through the return function.
Step S302: and carrying out forward reinforcement learning on the track sample to obtain an optimization process.
In this embodiment, the forward reinforcement learning algorithm learns from the return function to obtain the optimization process. The forward learning algorithm may be a Markov decision process. The reverse learning algorithm and the forward learning algorithm are both conventional techniques and are not described in detail here.
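By way of illustration only, the forward stage might be sketched as value iteration on a small deterministic Markov decision process whose state is the remaining depth in whole millimetres. The action set, the integer state space and especially `placeholder_return` are assumptions: the placeholder is a hand-written stand-in and is not the output of any reverse reinforcement learning algorithm.

```python
from typing import Callable, Dict, List

ACTIONS_MM = (1, 2, 3)  # assumed set of allowed cutting depths per step


def placeholder_return(remaining_mm: int, cut_mm: int) -> float:
    """Hypothetical return function: a fixed cost per pass plus a cost that
    grows with the square of the cutting depth."""
    return -1.0 - 0.2 * cut_mm ** 2


def value_iteration(total_mm: int, reward: Callable[[int, int], float],
                    gamma: float = 1.0, sweeps: int = 50) -> Dict[int, int]:
    """Return the greedy policy {remaining depth -> cut} of the MDP."""
    values = [0.0] * (total_mm + 1)  # values[0] is the terminal state
    for _ in range(sweeps):
        for s in range(1, total_mm + 1):
            values[s] = max(reward(s, a) + gamma * values[s - a]
                            for a in ACTIONS_MM if a <= s)
    return {s: max((a for a in ACTIONS_MM if a <= s),
                   key=lambda a: reward(s, a) + gamma * values[s - a])
            for s in range(1, total_mm + 1)}


def rollout(policy: Dict[int, int], total_mm: int) -> List[int]:
    """Follow the policy from the full depth down to zero remaining depth."""
    plan, remaining = [], total_mm
    while remaining > 0:
        cut = policy[remaining]
        plan.append(cut)
        remaining -= cut
    return plan


policy = value_iteration(total_mm=6, reward=placeholder_return)
plan = rollout(policy, total_mm=6)
```

With this placeholder the per-pass cost discourages many shallow cuts while the quadratic term discourages deep ones, so the resulting plan balances the two; a return function inferred from expert data would shift that balance toward the expert's preferences.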
Optionally, in some possible embodiments, after the step of detecting whether the similarity is greater than or equal to the preset tolerance threshold in step S20, the optimizing method of the numerical control machining process of the present invention further includes the following steps:
step S70: if the similarity is smaller than the tolerance threshold, confirming disturbance change parameters for carrying out disturbance change on the processing parameters of the first expert database after behavior cloning;
In this embodiment, the trigger mechanism designed in the system randomly determines the change position and the change degree, and the randomly determined change position and change degree applied to the first expert database after behavior cloning are confirmed as the disturbance change parameters.
Step S80: updating the disturbance change parameters, and executing the step of obtaining a second expert database by carrying out disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters.
In this embodiment, the change position and the change degree are updated, and disturbance change is performed on the machining parameters of the cloned first expert database according to the updated change position and change degree to obtain the second expert database. This is repeated until the similarity between the updated second expert database and the first expert database is greater than or equal to the tolerance threshold.
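By way of illustration only, the loop formed by steps S70 and S80 might be sketched as follows. The `similarity` score, the halving of the change degree on each retry and the retry limit are assumptions made for this sketch; the score is a hypothetical stand-in and is not the KL-based measure of step S202.

```python
import random
from typing import List, Tuple


def perturb(depths: List[float], position: int, degree: float) -> List[float]:
    """Copy the record and change one cutting depth by `degree` (mm)."""
    out = list(depths)
    out[position] += degree
    return out


def similarity(first: List[float], second: List[float]) -> float:
    """Hypothetical score in (0, 1]; 1 means identical cutting depths."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(first, second)) / len(first)
    return 1.0 / (1.0 + mean_abs_diff)


def perturb_until_similar(first: List[float], threshold: float, degree: float,
                          rng: random.Random,
                          max_tries: int = 20) -> Tuple[List[float], float]:
    """Redraw the disturbance parameters until the similarity reaches the
    tolerance threshold, then return the record and the degree used."""
    for _ in range(max_tries):
        position = rng.randrange(len(first))  # randomly chosen change position
        second = perturb(first, position, degree)
        if similarity(first, second) >= threshold:
            return second, degree
        degree *= 0.5  # assumed update rule: shrink the change and retry
    raise RuntimeError("no perturbation met the tolerance threshold")


second_record, final_degree = perturb_until_similar(
    [2.0, 3.0, 1.0], threshold=0.8, degree=2.0, rng=random.Random(0))
```

Bounding the number of retries keeps the loop from running indefinitely when the threshold cannot be met, which is where an adjustment of the tolerance threshold would come into play.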
Step S40: if the optimized process strategy is confirmed to match the machining process strategy, determine the optimized numerical control machining process according to the optimized process strategy.
In this embodiment, if the optimized process strategy matches the machining process strategy, the optimized numerical control machining process is determined according to the optimized process strategy. Because the optimized numerical control machining process imitates expert behavior during training, the problem of sample deviation is alleviated and a process optimization scheme similar to that of an expert is formed, which makes the method feasible and practical.
Compared with the traditional optimization mode based on mathematical models or rules, the method comprises the steps of constructing a first expert database according to the acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database; determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold; if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task; and if the optimized process strategy is confirmed to be matched with the machining process strategy, determining an optimized numerical control machining process according to the optimized process strategy. 
Therefore, the present invention obtains the first expert database from the state environment data and the behavior data of the expert during processing, and performs disturbance change on the first expert database after behavior cloning to obtain the second expert database. When the similarity between the second expert database and the first expert database is greater than or equal to the tolerance threshold, reinforcement learning is performed on the second expert database. When the optimized process strategy obtained by reinforcement learning matches the machining process strategy, the optimized numerical control machining process is determined according to the optimized process strategy. This improves the tolerance of the expert database, alleviates the expert's cognitive deviation regarding the actual application scenario, and thereby improves the generalization capability of the processing model.
Optionally, based on the first embodiment of the optimization method of the numerical control machining process of the present invention, a second embodiment of the optimization method of the numerical control machining process of the present invention is proposed.
In some possible embodiments, referring to fig. 3, fig. 3 is a schematic flow chart of a second embodiment of an optimizing method of a numerical control machining process according to the present invention, after the step of confirming whether the optimizing process strategy matches the machining process strategy of the expected process task in step S30, the optimizing method of a numerical control machining process according to the present invention may further include the following steps:
Step S90: if the optimized process strategy does not match the machining process strategy, confirm the disturbance change parameters used to perform disturbance change on the machining parameters of the first expert database after behavior cloning.
In this embodiment, if the optimized process strategy learned from the second expert database does not match the processing process strategy, the change position and the change degree of the disturbance applied to the first expert database after behavior cloning are confirmed.
Step S100: update the disturbance change parameters and, according to the updated disturbance change parameters, execute again the step of performing disturbance change on the processing parameters of the first expert database after behavior cloning to obtain a second expert database.
In this embodiment, the change position and the change degree are updated, and disturbance change is performed on the machining parameters of the cloned first expert database according to the updated change position and change degree to obtain the second expert database.
It should be noted that disturbance change is first performed on the processing parameters of the first expert database after behavior cloning to obtain a second expert database whose similarity to the first expert database is greater than or equal to the preset tolerance threshold. When the similarity is smaller than the tolerance threshold, the change position and the change degree are updated cyclically until the similarity between the second expert database and the first expert database is greater than or equal to the preset tolerance threshold. An optimized process strategy is then obtained from the second expert database. When the optimized process strategy is confirmed not to match the processing process strategy of the expected process task, the change position and the change degree are again updated cyclically until the optimized process strategy learned from the second expert database matches the processing process strategy of the expected process task.
Step S110: if the number of mismatches is detected to be greater than a preset count threshold, adjust the tolerance threshold, and re-execute the step of detecting whether the similarity is greater than or equal to the preset tolerance threshold according to the adjusted tolerance threshold.
In this embodiment, the number of times that the optimized process strategy fails to match the processing process strategy of the expected process task is counted. If this number is greater than the preset count threshold, the tolerance threshold is adjusted. After the adjustment, the change position and the change degree continue to be updated until the similarity between the second expert database and the first expert database is greater than or equal to the adjusted tolerance threshold. An optimized process strategy is then obtained from the second expert database, and whether it matches the processing process strategy of the expected process task is confirmed again.
After the tolerance threshold is adjusted, if the number of times of mismatch is still greater than the preset number of times threshold, the tolerance threshold is continuously adjusted until the optimized process strategy is matched with the processing process strategy of the expected process task.
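The interaction between the similarity check, the strategy match check and the mismatch counter can be sketched as one control loop. This is a simplified outline under stated assumptions: every callable passed in is a hypothetical stand-in, the counter is reset after each threshold adjustment (the text does not say whether it is), and the direction of the adjustment in the demo is arbitrary.

```python
from itertools import count


def optimize_with_retries(make_second_db, similarity, learn_policy, matches_expected,
                          threshold, adjust_threshold, max_mismatches, max_rounds=100):
    """Perturb, check similarity, learn, check the match, and adjust the threshold."""
    mismatches = 0
    for _ in range(max_rounds):
        second = make_second_db()
        if similarity(second) < threshold:
            continue                      # S70/S80: update the perturbation and retry
        policy = learn_policy(second)     # S30: reinforcement learning
        if matches_expected(policy):
            return policy, threshold      # S40: accept the optimized strategy
        mismatches += 1                   # S90/S100: record the mismatch and retry
        if mismatches > max_mismatches:   # S110: adjust the tolerance threshold
            threshold = adjust_threshold(threshold)
            mismatches = 0                # assumption: counter restarts after adjusting
    raise RuntimeError("no matching strategy found")


# Hypothetical stand-ins so the loop can be run end to end.
_draws = count(1)
POLICY, THRESHOLD = optimize_with_retries(
    make_second_db=lambda: next(_draws),           # "database" is just a counter value
    similarity=lambda db: 0.95,                    # always passes the check here
    learn_policy=lambda db: db,                    # "policy" is the database itself
    matches_expected=lambda policy: policy >= 4,   # fourth attempt is the first match
    threshold=0.9,
    adjust_threshold=lambda t: round(t - 0.1, 2),  # direction is an assumption
    max_mismatches=2,
)
```

With these stand-ins the first three attempts fail the match check, the third failure exceeds the count threshold and triggers one adjustment, and the fourth attempt is accepted.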
Optionally, in some possible embodiments, referring to fig. 4, fig. 4 is a schematic flow chart of an embodiment of the refinement step of step S90 in fig. 3, and the step of "adjusting the tolerance threshold" in step S110 may include the following steps:
Step A: adjust the tolerance threshold within the tolerance threshold range through a preset adjustment rule; or,
In this embodiment, if the number of mismatches is detected to be greater than the preset count threshold, the tolerance threshold is adjusted within the tolerance threshold range based on the adjustment rule set by the system. The preset adjustment rule may be to change the threshold by a prescribed value, such as 0.1 or 0.01, at each adjustment.
Step B: in response to an operation of modifying the preset tolerance threshold within the tolerance threshold range, adjust the preset tolerance threshold based on the operation.
In this embodiment, if the number of mismatches is detected to be greater than the preset count threshold, a prompt is issued. The operator modifies the tolerance threshold based on the prompt, and the terminal device responds to the operation of modifying the preset tolerance threshold within the tolerance threshold range and adjusts the preset tolerance threshold. The user modifies the preset tolerance threshold within the tolerance threshold range according to how well the optimized process strategy matches the processing process strategy of the expected process task.
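The two adjustment paths, step A and step B, can be illustrated with two small helpers. This is a sketch only: the function names, the range bounds and the sign of the step are assumptions, since the text specifies a fixed step size but not its direction.

```python
def adjust_by_rule(threshold, step, lower, upper):
    """Step A: shift the threshold by a fixed step, clamped to [lower, upper]."""
    return min(upper, max(lower, threshold + step))


def adjust_by_operator(requested, lower, upper):
    """Step B: accept an operator-entered value only if it lies in the range."""
    if not lower <= requested <= upper:
        raise ValueError(f"threshold {requested} outside [{lower}, {upper}]")
    return requested


# Hypothetical tolerance threshold range.
LOWER, UPPER = 0.8, 1.0
RULE_RESULT = adjust_by_rule(0.85, step=-0.1, lower=LOWER, upper=UPPER)
OPERATOR_RESULT = adjust_by_operator(0.92, LOWER, UPPER)
```

Clamping in step A and rejecting out-of-range input in step B both keep the adjusted value inside the tolerance threshold range, which is the constraint the text places on either path.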
In addition, referring to fig. 5, an embodiment of the present invention further provides an optimizing apparatus for a numerical control machining process, where the optimizing apparatus for a numerical control machining process of the present invention includes:
the acquisition module 10 is configured to construct a first expert database according to the acquired processing behavior data and state environment data, perform behavior cloning on the first expert database, and perform disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database;
a detection module 20, configured to determine a similarity between the second expert database and the first expert database, and detect whether the similarity is greater than or equal to a preset tolerance threshold;
the matching module 30 is configured to reinforcement learn the second expert database if the similarity is greater than or equal to the tolerance threshold, obtain an optimized process policy, and confirm whether the optimized process policy matches a processing process policy of an expected process task;
and the determining module 40 is configured to determine an optimized numerical control machining process according to the optimized process policy if the optimized process policy is determined to match the machining process policy.
Optionally, the detection module 20 further includes:
normalizing the processing behavior data and the state environment data in the first expert database, and determining a tolerance threshold range according to a normalization processing result;
and selecting one value within the tolerance threshold range as the tolerance threshold.
Optionally, the matching module 30 further includes:
if the optimized process strategy is not matched with the machining process strategy, confirming disturbance change parameters of disturbance change of machining parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of performing disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters to obtain a second expert database;
if the number of times of mismatch is detected to be greater than a preset number of times threshold, adjusting the tolerance threshold, and re-executing the step of detecting whether the similarity is greater than the preset tolerance threshold according to the adjusted tolerance threshold.
Optionally, the matching module 30 further includes:
adjusting the tolerance threshold within the tolerance threshold range through a preset adjustment rule; or,
responsive to an operation that modifies the tolerance threshold within the tolerance threshold range, the tolerance threshold is adjusted based on the operation.
Optionally, the detection module 20 further includes:
constructing a first data distribution based on the processing behavior data and the state environment data of the first expert database, and constructing a second data distribution based on the processing behavior data and the state environment data of the second expert database;
and calculating the KL divergence between the first data distribution and the second data distribution, and determining the KL divergence as the similarity.
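As an illustration of the KL divergence computation, the sketch below bins two sets of normalized samples into discrete distributions and evaluates KL(P || Q). The binning scheme, the smoothing constant and the sample values are assumptions made for the example. KL divergence is 0 for identical distributions and grows as they diverge; how that value is compared against the tolerance threshold is not spelled out in this passage.

```python
import math
from collections import Counter


def histogram(samples, bins, low, high):
    """Return a smoothed discrete distribution over `bins` equal-width bins."""
    width = (high - low) / bins
    counts = Counter(
        min(bins - 1, max(0, int((x - low) / width))) for x in samples
    )
    smoothed = [counts.get(i, 0) + 1e-9 for i in range(bins)]  # avoid log(0)
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical normalized samples from the first and second expert databases.
FIRST_SAMPLES = [0.10, 0.15, 0.40, 0.45, 0.80]
SECOND_SAMPLES = [0.12, 0.18, 0.42, 0.70, 0.85]

P = histogram(FIRST_SAMPLES, bins=4, low=0.0, high=1.0)
Q = histogram(SECOND_SAMPLES, bins=4, low=0.0, high=1.0)
KL_SAME = kl_divergence(P, P)
KL_DIFF = kl_divergence(P, Q)
```

The small smoothing term keeps empty bins from producing a division by zero or a logarithm of zero, at the cost of a negligible bias in the result.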
Optionally, the reinforcement learning includes: reverse reinforcement learning and forward reinforcement learning, the matching module 30 further includes:
performing reverse reinforcement learning on the second expert database to obtain a return function of decision-making behavior of the expert in the processing process of the state environment data;
and carrying out optimization decision on the return function based on forward reinforcement learning to obtain an optimized process strategy.
Optionally, the detection module 20 further includes:
if the similarity is smaller than the tolerance threshold, confirming disturbance change parameters for carrying out disturbance change on the processing parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of obtaining a second expert database by carrying out disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters.
In addition, the embodiment of the invention also provides a terminal device, which comprises: the system comprises a memory, a processor and a numerical control machining process optimizing program stored in the memory and capable of running on the processor, wherein the numerical control machining process optimizing program realizes the steps of the numerical control machining process optimizing method when being executed by the processor.
The steps implemented when the optimization program of the numerical control machining process running on the processor is executed may refer to various embodiments of the optimization method of the numerical control machining process of the present invention, which are not described herein again.
In addition, the embodiment of the invention also provides a computer storage medium which is applied to a computer, wherein the computer storage medium can be a nonvolatile computer readable computer storage medium, and an optimization program of the numerical control machining process is stored on the computer storage medium, and the optimization program of the numerical control machining process realizes the steps of the optimization method of the numerical control machining process when being executed by a processor.
The steps implemented when the optimization program of the numerical control machining process running on the processor is executed may refer to various embodiments of the optimization method of the numerical control machining process of the present invention, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a computer storage medium (such as a Flash memory, a ROM/RAM, a magnetic disk, or an optical disk) and comprises several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) or a controller to perform the method according to the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the content disclosed herein, whether employed directly or indirectly in other related technical fields, is likewise covered by the scope of the invention.

Claims (10)

1. The optimizing method of the numerical control machining process is characterized by comprising the following steps of:
constructing a first expert database according to the acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database;
determining the similarity between the second expert database and the first expert database, and detecting whether the similarity is greater than or equal to a preset tolerance threshold;
if the similarity is greater than or equal to the tolerance threshold, performing reinforcement learning on the second expert database to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task;
and if the optimized process strategy is confirmed to be matched with the machining process strategy, determining an optimized numerical control machining process according to the optimized process strategy.
2. The method of optimizing a numerically controlled process according to claim 1, wherein prior to said step of detecting whether said similarity is greater than or equal to a preset tolerance threshold, said method further comprises:
normalizing the processing behavior data and the state environment data in the first expert database, and determining a tolerance threshold range according to a normalization processing result;
and selecting one value within the tolerance threshold range as the tolerance threshold.
3. The method of optimizing a numerically controlled process of claim 1, wherein after said step of determining whether said optimized process strategy matches a process strategy for an intended process task, said method further comprises:
if the optimized process strategy is not matched with the machining process strategy, confirming disturbance change parameters of disturbance change of machining parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of performing disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters to obtain a second expert database;
if the number of times of mismatch is detected to be greater than a preset number of times threshold, adjusting the tolerance threshold, and re-executing the step of detecting whether the similarity is greater than the preset tolerance threshold according to the adjusted tolerance threshold.
4. A method of optimizing a numerically controlled process as in claim 3, wherein said step of adjusting said tolerance threshold comprises:
adjusting the tolerance threshold within the tolerance threshold range through a preset adjustment rule; or,
responsive to an operation that modifies the tolerance threshold within the tolerance threshold range, the tolerance threshold is adjusted based on the operation.
5. The method of optimizing a numerically controlled process according to claim 1, wherein said step of determining a similarity between said second expert database and said first expert database comprises:
constructing a first data distribution based on the processing behavior data and the state environment data of the first expert database, and constructing a second data distribution based on the processing behavior data and the state environment data of the second expert database;
and calculating the KL divergence between the first data distribution and the second data distribution, and determining the KL divergence as the similarity.
6. The method of optimizing a numerically controlled process as in claim 1, wherein the reinforcement learning comprises: reverse reinforcement learning and forward reinforcement learning, wherein the step of performing reinforcement learning on the second expert database to obtain an optimized process strategy comprises the following steps:
performing reverse reinforcement learning on the second expert database to obtain a return function of decision-making behavior of the expert in the processing process of the state environment data;
and carrying out optimization decision on the return function based on forward reinforcement learning to obtain an optimized process strategy.
7. The method of optimizing a numerically controlled process according to claim 1, wherein after the step of detecting whether the similarity is greater than or equal to a preset tolerance threshold, the method further comprises:
if the similarity is smaller than the tolerance threshold, confirming disturbance change parameters for carrying out disturbance change on the processing parameters of the first expert database after behavior cloning;
updating the disturbance change parameters, and executing the step of obtaining a second expert database by carrying out disturbance change on the processing parameters of the first expert database after behavior cloning according to the updated disturbance change parameters.
8. An optimizing device of a numerical control machining process is characterized in that the optimizing device of the numerical control machining process comprises:
the acquisition module is used for constructing a first expert database according to acquired processing behavior data and state environment data, performing behavior cloning on the first expert database, and performing disturbance change on processing parameters of the first expert database after behavior cloning to obtain a second expert database;
the detection module is used for determining the similarity between the second expert database and the first expert database and detecting whether the similarity is larger than or equal to a preset tolerance threshold;
the matching module is used for performing reinforcement learning on the second expert database if the similarity is greater than or equal to the tolerance threshold value to obtain an optimized process strategy, and confirming whether the optimized process strategy is matched with a processing process strategy of an expected process task;
and the determining module is used for determining the optimized numerical control machining process according to the optimized process strategy if the optimized process strategy is confirmed to be matched with the machining process strategy.
9. A terminal device, characterized in that the terminal device comprises: a memory, a processor and an optimization program for a numerical control machining process stored on the memory and operable on the processor, which when executed by the processor, implements the steps of the optimization method for a numerical control machining process according to any one of claims 1 to 7.
10. A computer storage medium, wherein an optimization program of a numerical control machining process is stored on the computer storage medium, and the optimization program of the numerical control machining process realizes the steps of the optimization method of the numerical control machining process according to any one of claims 1 to 7 when being executed by a processor.
CN202310968969.7A 2023-08-03 2023-08-03 Optimization method and device of numerical control machining process, terminal equipment and storage medium Active CN116679615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310968969.7A CN116679615B (en) 2023-08-03 2023-08-03 Optimization method and device of numerical control machining process, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116679615A true CN116679615A (en) 2023-09-01
CN116679615B CN116679615B (en) 2023-10-20

Family

ID=87781344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310968969.7A Active CN116679615B (en) 2023-08-03 2023-08-03 Optimization method and device of numerical control machining process, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116679615B (en)

Also Published As

Publication number Publication date
CN116679615B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant