CN110824954A - Intelligent agent training method and system, computer equipment and readable storage medium - Google Patents

Intelligent agent training method and system, computer equipment and readable storage medium

Info

Publication number
CN110824954A
Authority
CN
China
Prior art keywords
environment
training
fidelity
constructing
simplified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911016946.6A
Other languages
Chinese (zh)
Inventor
贾政轩
林廷宇
肖莹莹
施国强
李伯虎
张迎曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simulation Center
Original Assignee
Beijing Simulation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simulation Center filed Critical Beijing Simulation Center
Priority to CN201911016946.6A priority Critical patent/CN110824954A/en
Publication of CN110824954A publication Critical patent/CN110824954A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B17/00Systems involving the use of models or simulators of said systems
    • G05B17/02Systems involving the use of models or simulators of said systems electric

Abstract

The invention discloses an agent training method comprising the following steps: S1, constructing a simplified environment and performing preliminary training of the agent in the simplified environment; S2, constructing a fidelity environment and performing supplementary training of the agent in the fidelity environment; and S3, constructing a semi-physical simulation environment and verifying the performance of the agent in the semi-physical simulation environment. The invention realizes a smooth transition from model training to model application in physical space, trains an agent with good reliability for a practical system within an acceptable time, further expands the application field of existing data-driven computational intelligence methods, and gives them the capability of migrating to real physical systems.

Description

Intelligent agent training method and system, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and more particularly to an agent training method and system, a computer device, and a readable storage medium.
Background
In recent years, with the rapid development of artificial intelligence technology, data-driven methods have gradually demonstrated their power. Through data-driven computational intelligence, computers have achieved dramatic performance gains in many domains, even far surpassing human levels in some. With the aid of deep learning techniques, computers trained on massive data have reached very high levels in image recognition, target detection, machine translation and word and sentence prediction, and even in creative fields such as poetry composition, painting and cover design. In particular, the technologies in the fields of image recognition, target detection and machine translation have already been successfully productized and commercialized. Meanwhile, based on deep reinforcement learning, massive interactive training between computers and given environments or rule programs has also beaten top human players in simple interactive environments such as Atari games, MuJoCo and Gym, in board and card games such as Go, chess, shogi and Texas hold'em, and in complex real-time strategy games such as Dota 2 and StarCraft II.
However, as the application field of learning technology keeps expanding, practical engineering problems have begun to appear that restrict its application in some fields. Specifically, the impressive achievements of learning technology to date, whether on recognition/prediction tasks or on decision tasks, depend on training agents with massive amounts of task-specific data. For image, voice and computer-game tasks, acquiring massive data is relatively easy: on the one hand, image and voice data are cheap and convenient to collect; on the other hand, game data are even easier to obtain, since they can be generated directly by a computer.
However, for design tasks in the development of large complex projects, such as complex product design, the amount of obtainable data is quite limited. With an extremely limited data set, training with the original methods causes severe overfitting of the agent model and can hardly produce an agent with good fitting and generalization capability. To address this, a simulation system with complete model verification may be used to generate data to supplement the training. However, because such a simulation system must approximate the real physical system closely, it is very complex, and generating massive data with it takes an unacceptably long time.
Disclosure of Invention
In order to solve the technical problems mentioned in the background, a first aspect of the present invention provides an agent training method, including the following steps:
s1, constructing a simplified environment, and performing primary training of the agent in the simplified environment;
s2, constructing a fidelity environment, and performing supplementary training of the intelligent agent in the fidelity environment;
and S3, constructing a semi-physical simulation environment, and verifying the performance of the intelligent agent in the semi-physical simulation environment.
Optionally, the S1 includes:
s11, constructing a plurality of simplified environment models;
s12, verifying the correctness of the principles of the plurality of simplified environment models;
s13, constructing the simplified environment according to the plurality of simplified environment models after correctness verification;
s14, performing preliminary training on the agent in the simplified environment;
and S15, storing the preliminarily trained intelligent agent.
Optionally, the S11 includes:
acquiring core principles and corresponding mechanism behaviors of a plurality of real physical systems;
constructing a plurality of said simplified environmental models according to a plurality of core principles and corresponding mechanistic behaviour.
Optionally, the S12 includes:
and respectively comparing the principles of the plurality of simplified environment models with the core principles of the plurality of real physical systems, and verifying the correctness of the principles of the plurality of simplified environment models according to the comparison result.
Optionally, the S2 includes:
s21, constructing a plurality of fidelity environment models;
s22, verifying the plurality of fidelity environment models;
s23, constructing the fidelity environment according to the verified plurality of fidelity environment models;
s24, performing supplementary training on the preliminarily trained intelligent agent in the fidelity environment;
s25, evaluating the performance of the intelligent agent after the supplementary training in the fidelity environment, if the performance meets the requirement, entering S3, and if the performance does not meet the requirement, returning to S23 for iteration;
the S3 includes:
s31, constructing a semi-physical simulation environment;
and S32, evaluating the performance of the intelligent agent in the semi-physical simulation environment, and returning to S23 for iteration if the performance of the intelligent agent does not meet the requirement.
Optionally, the S21 includes:
acquiring the composition and corresponding detailed behaviors of a plurality of real physical systems;
and respectively constructing a plurality of fidelity environment models according to the constitution of the real physical systems and the corresponding detailed behaviors.
Optionally, the S31 includes:
replacing a part of fidelity environment model modules in the fidelity environment with a real physical system;
and constructing the semi-physical simulation environment according to another part of environment modules in the fidelity environment and the real physical system.
A second aspect of the invention provides an agent training system comprising:
the simplified environment module is used for constructing a simplified environment and performing primary training of the intelligent agent in the simplified environment;
the fidelity environment module is used for constructing a fidelity environment and performing supplementary training of the intelligent agent in the fidelity environment;
and the semi-physical simulation module is used for constructing a semi-physical simulation environment and verifying the performance of the intelligent agent in the semi-physical simulation environment.
A third aspect of the invention provides a computer apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect of the invention when executing the program.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored thereon, which, when run on a computer, cause the computer to perform the method of the first aspect of the present invention.
The invention has the following beneficial effects:
the method has the advantages of clear principle and simple design, realizes smooth transition from model training to model application in a physical space by means of intelligent agent initial training based on a simplified environment, intelligent agent supplementary training based on a fidelity environment and intelligent agent performance verification based on a semi-physical simulation environment, realizes training of an intelligent agent with good reliability in an acceptable time range in an actual system, further expands the application field of the existing data-driven intelligent computing method, and has the capability of transferring application to the actual physical system.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flowchart of an agent training method according to an embodiment of the present invention.
Fig. 2 shows a block diagram of an intelligent agent training system according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to still another embodiment of the present invention.
Detailed Description
In order to make the technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
In view of the problems raised in the background art, an embodiment of the present invention provides an agent training method, as shown in fig. 1, including the following steps:
s1, constructing a simplified environment, and performing primary training of the agent in the simplified environment;
s2, constructing a fidelity environment, and performing supplementary training of the intelligent agent in the fidelity environment;
and S3, constructing a semi-physical simulation environment, and verifying the performance of the intelligent agent in the semi-physical simulation environment.
In some optional implementations of this embodiment, the S1 includes:
s11, constructing a plurality of simplified environment models;
s12, verifying the correctness of the principles of the plurality of simplified environment models;
s13, constructing the simplified environment according to the plurality of simplified environment models after correctness verification;
s14, performing preliminary training on the agent in the simplified environment;
and S15, storing the preliminarily trained intelligent agent.
Specifically, in S11, the simplified environment models are constructed mainly from the core principles and corresponding mechanism behaviors of the real physical systems: the core principles and corresponding mechanism behaviors of a plurality of real physical systems are acquired, and a plurality of simplified environment models are then constructed from them. It should be noted that in this step only a simplified object model is constructed, and non-core factors such as model uncertainty and noise are ignored.
Next, taking automobile model construction as an example, the details of S11 are briefly described. The motion of an automobile can as a whole be regarded as rigid-body motion, and the motion process can be described by a 6-degree-of-freedom model that gives the dynamic equations of the vehicle's translational motion and attitude motion. The variables in these equations, such as the vehicle's position and attitude, are generally obtained by measurement in a real system and therefore contain noise; this part is not considered in the construction of the simplified environment model.
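A minimal sketch of such a simplified, noise-free vehicle model might look as follows; it uses a planar kinematic reduction rather than the full 6-degree-of-freedom model, and the wheelbase and time step are hypothetical values.

```python
import numpy as np

class SimplifiedVehicleModel:
    """Planar kinematic reduction of the rigid-body vehicle model;
    noise and disturbances are deliberately ignored at this stage."""

    def __init__(self, wheelbase_m=2.7, dt_s=0.05):
        self.wheelbase_m = wheelbase_m   # hypothetical wheelbase
        self.dt_s = dt_s                 # simulation time step
        self.state = np.zeros(4)         # [x, y, heading, speed]

    def step(self, steering_angle_rad, acceleration_mps2):
        x, y, psi, v = self.state
        x += v * np.cos(psi) * self.dt_s
        y += v * np.sin(psi) * self.dt_s
        psi += v / self.wheelbase_m * np.tan(steering_angle_rad) * self.dt_s
        v += acceleration_mps2 * self.dt_s
        self.state = np.array([x, y, psi, v])
        return self.state                # noise-free state, per the simplified model
```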
Further, the S12 includes:
and respectively comparing the principles of the plurality of simplified environment models with the core principles of the plurality of real physical systems, and verifying the correctness of the principles of the plurality of simplified environment models according to the comparison result.
Specifically, the correctness verification of the simplified environment model principles mainly concerns verifying and validating the model principles established in S11. The verification method is mainly to monitor the variables of the simplified environment models during simulation and to compare the principles of the simplified environment models with the core principles of the real physical systems, and on this basis to judge the rationality and correctness of the constructed simplified environment models.
Further, in S13, the simplified environment is constructed from the plurality of simplified environment models that have passed correctness verification; after the construction is completed, in S14, the agent is preliminarily trained in the simplified environment.
Specifically, the preliminary training of the agent in the simplified environment aims to generate data based on the simplified environment and to train the agent preliminarily through supervised, semi-supervised, reinforcement and other learning means, so that the agent meets the performance requirements in the simplified environment. The specific implementation process mainly comprises the following aspects:
(1) Design and implement the agent model: based on a chosen agent framework, the agent model program is written in a specific programming language. In this process, the later extension of the agent structure under the fidelity simulation environment model must also be considered.
(2) Design and implement the interface between the agent model and the simplified simulation environment: in a specific programming language, the decisions output by the agent model are passed to the simplified simulation environment model through interface calls, interface communication and the like, and the simplified simulation environment model passes its current state back to the agent model.
(3) Train the agent model: in a specific programming language, design and implement the performance-index evaluation of the strategies generated by the agent, and optimize and adjust the structure, parameters and so on of the agent model based on the agent model, the state of the simplified simulation environment model and the performance indexes, so that a better strategy is formed. A minimal interaction-loop sketch is given after this list.
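A minimal sketch of the agent/simplified-environment interaction loop described in (1)-(3) might look as follows; the environment interface (reset/step returning state, reward and a done flag), the placeholder random policy and the episode counts are all assumptions made for illustration.

```python
import numpy as np

class RandomAgent:
    """Placeholder agent; a real implementation would hold a trainable policy."""
    def __init__(self, action_dim=2):
        self.action_dim = action_dim

    def act(self, state):
        return np.random.uniform(-1.0, 1.0, self.action_dim)

    def update(self, transition):
        pass  # supervised / semi-supervised / reinforcement update would go here

def train_in_simplified_env(env, agent, episodes=100, max_steps=200):
    """Agent decisions go to the environment, the environment state comes back
    to the agent, and the episode return serves as a crude performance index."""
    returns = []
    for _ in range(episodes):
        state = env.reset()
        ep_return = 0.0
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = env.step(action)   # assumed env interface
            agent.update((state, action, reward, next_state, done))
            ep_return += reward
            state = next_state
            if done:
                break
        returns.append(ep_return)
    return returns
```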
In S15, the agent after the initial training is saved.
Specifically, the converged agent model is stored mainly in the form of data files, database records and the like: the structure, parameters and so on of the preliminarily trained agent with better performance are persistently stored so that they can be used directly later.
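For example, if the agent is implemented as a PyTorch module, persistence might be sketched as follows; the file name and the stored metadata field are assumptions.

```python
import torch

def save_agent(agent_net, path="agent_simplified_env.pt"):
    """Persist the parameters of the converged agent together with a stage tag."""
    torch.save({"state_dict": agent_net.state_dict(),
                "stage": "preliminary_training"}, path)

def load_agent(agent_net, path="agent_simplified_env.pt"):
    """Restore the preliminarily trained parameters for direct later use."""
    checkpoint = torch.load(path)
    agent_net.load_state_dict(checkpoint["state_dict"])
    return agent_net
```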
The simplified environment and model, and the training performed in them, aim to let the agent quickly converge its exploration from the whole feasible solution space to the neighbourhood of a better solution, providing a better initial state and a smaller search range for the subsequent parameter fine-tuning and supplementary training in the fidelity environment. In this process, the simplified environment and model capture the core mechanism of the system in the real physical environment, and the mechanism model must reflect the core characteristics of the physical system. At the same time, because the search space is huge and agent training needs the support of massive data, the simplified environment and model must be comparatively simple in structure and lightweight, so that they can be combined with multi-instance parallel simulation technology based on cloud simulation to achieve fast, parallel simulation runs and thus converge the agent training against the virtual environment within a shorter time.
In some optional implementations of this embodiment, the S2 includes:
s21, constructing a plurality of fidelity environment models;
s22, verifying the plurality of fidelity environment models;
s23, constructing the fidelity environment according to the verified plurality of fidelity environment models;
s24, performing supplementary training on the preliminarily trained intelligent agent in the fidelity environment;
and S25, evaluating the performance of the intelligent agent after the supplementary training in the fidelity environment, if the performance meets the requirement, entering S3, and if the performance does not meet the requirement, returning to S23 for iteration.
Specifically, in S21, the fidelity environment models are constructed mainly from basic principles so as to capture the composition and detailed behaviors of the real physical systems, specifically including: acquiring the composition and corresponding detailed behaviors of a plurality of real physical systems, and constructing a plurality of fidelity environment models accordingly. At the same time, the disturbances, noise and other factors that were not yet considered when constructing the simplified environment models must be considered in this step and modeled separately.
In the following, still taking automobile model construction as an example, the specific content of this step is briefly described. Automobiles consist of many systems, such as the engine, transmission, control system, cooling system and so on. For the construction of the automobile's fidelity environment model, each subsystem's principle model is built from disciplines such as control science, thermodynamics, heat transfer and general mechanics, and is expressed with mathematical formulations such as differential equations, partial differential equations and discrete-event scheduling; the system-level integration model mainly involves modeling the variable relationships and data-exchange interfaces among the modules. The modeling in this process yields a model of the whole automobile system. To guarantee the fidelity and reliability of the model, specific experimental data must still be used to further correct particular structures and parameters in the model.
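As one illustration of such a subsystem principle model expressed as a differential equation, a first-order coolant-temperature model for the cooling system might be sketched as follows; the thermal coefficients are hypothetical values that, as noted above, would need correction against experimental data.

```python
def coolant_temperature_step(temp_c, heat_in_w, dt_s=0.1,
                             thermal_mass_j_per_k=40_000.0,
                             dissipation_w_per_k=350.0, ambient_c=25.0):
    """One explicit-Euler step of dT/dt = (Q_in - k * (T - T_ambient)) / C."""
    dT_dt = (heat_in_w - dissipation_w_per_k * (temp_c - ambient_c)) / thermal_mass_j_per_k
    return temp_c + dT_dt * dt_s
```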
Meanwhile, regarding disturbance and noise, take the vehicle's heading-following system for a steering command as an example: following is realized by eliminating the deviation between the vehicle's current actual heading and the commanded heading. The actual heading of the vehicle has to be obtained from sensor measurements, and in a practical system these measurements contain disturbances and noise; this noise and disturbance also need to be modeled.
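A sketch of such measurement-noise modelling for the heading sensor, together with the heading-following correction, might look as follows; the additive Gaussian noise model, bias, noise level and feedback gain are assumptions.

```python
import numpy as np

_rng = np.random.default_rng()

def measured_heading(true_heading_rad, bias_rad=0.005, noise_std_rad=0.01):
    """Noisy heading measurement used by the fidelity environment model."""
    return true_heading_rad + bias_rad + _rng.normal(0.0, noise_std_rad)

def heading_follow_command(command_rad, measured_rad, gain=1.5):
    """Proportional steering correction driving the heading error toward zero."""
    error = command_rad - measured_rad
    return gain * error   # steering command fed back to the vehicle model
```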
Further, in S22, the fidelity environment models are verified mainly by VV&A (verification, validation and accreditation) to ensure that, on the required indexes, the behavior of the fidelity environment model is consistent with the behavior of the real physical system within the acceptable error range. The data used for fidelity-model verification come mainly from two sources: (a) actual measurement data of the real physical system, obtained by adding sensors to the physical system, measuring, and collecting and collating the results; and (b) measurement data of the semi-physical simulation system, obtained by measuring the relevant variables of the physical subsystems with sensors during the semi-physical simulation verification in the next step.
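A minimal sketch of one such consistency check might compare a simulated trace against time-aligned measured data with an RMSE criterion, as below; the metric and tolerance are illustrative stand-ins for whatever VV&A indexes are actually required.

```python
import numpy as np

def verify_against_measurements(simulated, measured, tolerance):
    """Return (error, passed): RMSE between simulated and measured traces and
    whether it lies within the acceptable error range."""
    simulated, measured = np.asarray(simulated), np.asarray(measured)
    rmse = float(np.sqrt(np.mean((simulated - measured) ** 2)))
    return rmse, rmse <= tolerance
```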
Further, in S23, a fidelity environment is constructed from the verified fidelity environment models; after the construction is completed, in S24, the preliminarily trained agent undergoes supplementary training in the fidelity environment.
Specifically, the agent fine-tuning training in the fidelity environment is mainly based on the fidelity environment constructed above: within an acceptable time range, the agent model preliminarily trained in the previous step is fine-tuned so that the agent can capture system detail characteristics beyond the core principles and form decisions on that basis, making it better suited to the real physical environment. Specifically, this step includes the following processes:
(1) Transform the input and output of the agent: because the models in the fidelity environment differ from those in the simplified environment, the input and output quantities of the agent need to be adapted accordingly.
(2) Extend the agent model design and implementation: on the basis of the agent model from the previous step, a supplementary, extended agent structure is formed for each part of the system (taking a neural network as an example, this involves the number of modules, layers, neurons and so on).
(3) Fine-tune the extended agent model: this process mainly involves training the extended agent in the fidelity environment. As with training in the simplified environment, the fidelity environment model can generate data to support supervised, semi-supervised and other forms of learning, or generate data through simulated interaction with the agent, to drive the agent's training. A minimal fine-tuning sketch is given after this list.
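A minimal PyTorch sketch of the extension and fine-tuning in (2)-(3) might look as follows; the layer sizes, the choice to freeze the previously trained base network, and the learning rate are assumptions rather than prescriptions from the patent.

```python
import torch
import torch.nn as nn

class ExtendedAgentNet(nn.Module):
    """Base network from the simplified environment plus an extension branch
    that captures the detail features added in the fidelity environment."""

    def __init__(self, base_net, extra_in, hidden=64, action_dim=2):
        super().__init__()
        self.base = base_net
        self.extension = nn.Sequential(
            nn.Linear(extra_in, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, core_state, detail_state):
        return self.base(core_state) + self.extension(detail_state)

def fine_tune(extended_net, loader, epochs=5, lr=1e-4):
    """Fine-tune only the extension with a small amount of fidelity-environment data."""
    for p in extended_net.base.parameters():
        p.requires_grad = False           # keep the preliminary training result
    opt = torch.optim.Adam(extended_net.extension.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for core_state, detail_state, target_action in loader:
            opt.zero_grad()
            loss = loss_fn(extended_net(core_state, detail_state), target_action)
            loss.backward()
            opt.step()
```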
Further, in S25, after the training converges, whether the performance of the agent in the current fidelity environment meets the requirements is evaluated; if not, the process returns to the previous step, and the agent model is adjusted by adjusting the simplified model, adjusting the training algorithm, supplementing the training, and so on.
It should be noted that the structure, parameters and so on of the supplementarily trained agent with better performance can likewise be persistently stored in the form of data files, database records and the like, so that they can be used directly later.
The fine-tuning supplementary training in the fidelity environment aims to construct a fidelity model and environment that meet the fidelity requirements, and under these model and environment conditions to further train and fine-tune the agent that converged under the simplified environment and model, so that it approaches the best decision fit achievable under real physical system conditions. In this process, the fidelity environment is built on the basis of the simplified environment; it must fully reflect each mechanism characteristic of the system in the real physical environment, and it must also pass VV&A simulation model verification to confirm that its agreement with the real physical system meets the given requirements. At the same time, this stage of training requires the agent, starting from the model obtained in the simplified environment, to adapt to the newly added complex features and characteristics in the fidelity environment. Because the model and environment are considerably more complex and detailed than in the simplified case, training in this stage uses only a small amount of fidelity-environment data to fine-tune the agent and adapt it to the real physical environment.
In some optional implementations of this embodiment, the S3 includes:
s31, constructing a semi-physical simulation environment;
and S32, evaluating the performance of the intelligent agent in the semi-physical simulation environment, and returning to S23 for iteration if the performance of the intelligent agent does not meet the requirement.
Specifically, the S31 includes: replacing a part of the fidelity environment model modules in the fidelity environment with the real physical system; and constructing the semi-physical simulation environment from the remaining environment modules in the fidelity environment together with the real physical system.
The construction of the semi-physical simulation environment is realized mainly in a hardware-in-the-loop form: part of the fidelity environment models in the fidelity environment system are accessed through ports and replaced by the real physical system, so that the fidelity environment approaches the real environment more closely and the usability and performance of the training-converged agent in the real physical system can be verified. Specifically, this comprises:
(1) constructing the computer-based fidelity environment model and the data communication and interaction between it and the real physical system;
(2) actually measuring and collecting the data required for the agent's decision-making;
(3) supplementing the collection of measured data used to verify the fidelity environment model.
in the following, still taking the automobile model construction as an example, the specific content of this step will be briefly described. As described above, an automobile is composed of a plurality of systems, such as an engine, a transmission system, a control system, a cooling system, and the like. In the process of establishing the fidelity environment, the system and part of the fidelity environment model are respectively established and integrated. In the step, the engine system or the transmission system model and the like can be replaced by the actual system, integration and interaction with the original fidelity environment are realized through technologies such as port communication and the like, joint simulation of the digital simulation part and the actual system part of the automobile simulation model is realized, and the construction of the semi-physical simulation environment is completed.
Further, in S32, evaluation indexes are designed to judge whether the agent's performance in the current semi-physical simulation environment meets the requirements; the design needs to be carried out in combination with the specific application case.
Still taking the automobile as an example, if the vehicle's steering control system is controlled by the agent model and intelligent steering control performance is required, indexes such as steering maneuverability and stability, steering ease, minimum turning radius and steering-wheel return performance can be considered for evaluation.
Having designed evaluation indexes for whether the agent's performance in the current semi-physical simulation environment meets the requirements, this part mainly designs the specific evaluation calculations for those indexes and judges whether the performance meets the requirements. If not, the process returns to S23 for iteration, and the agent is trained and adjusted by adjusting the fidelity environment model, adjusting the training algorithm, supplementing training and so on. A minimal evaluation sketch follows.
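In the sketch below, the two indexes (heading-tracking RMSE and minimum turning radius) and their thresholds are illustrative assumptions, not the application-specific indexes themselves.

```python
import numpy as np

def evaluate_steering(headings_rad, commands_rad, turn_radii_m,
                      max_heading_rmse=0.05, max_allowed_min_turn_radius=6.0):
    """Evaluate logged semi-physical simulation runs against simple thresholds."""
    heading_rmse = float(np.sqrt(np.mean((np.asarray(headings_rad)
                                          - np.asarray(commands_rad)) ** 2)))
    min_turn_radius = float(np.min(turn_radii_m))
    meets_requirements = (heading_rmse <= max_heading_rmse
                          and min_turn_radius <= max_allowed_min_turn_radius)
    return {"heading_rmse": heading_rmse,
            "min_turn_radius": min_turn_radius,
            "pass": meets_requirements}   # if False, return to S23 and iterate
```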
After the two training stages above are completed, the simulation environment is further physicalized: several parts of the real physical system are introduced into the virtual environment, and the effectiveness of the converged agent model is verified in the semi-physical system. Because the complexity and time consumption of simulation increase further under semi-physical conditions, it is difficult to train the agent with this data alone; however, it can be fused with data from the fidelity environment, and the fused data used to train the agent.
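A sketch of such data fusion, assuming both data sets are available as PyTorch datasets, might up-weight the scarce semi-physical samples as below; the weighting factor is an assumption.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def fused_loader(fidelity_ds, semi_physical_ds, batch_size=64, sp_weight=4.0):
    """Mix fidelity-environment data with semi-physical data for agent training."""
    fused = ConcatDataset([fidelity_ds, semi_physical_ds])
    # up-weight the scarce semi-physical samples so they are not drowned out
    weights = [1.0] * len(fidelity_ds) + [sp_weight] * len(semi_physical_ds)
    sampler = WeightedRandomSampler(weights, num_samples=len(fused))
    return DataLoader(fused, batch_size=batch_size, sampler=sampler)
```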
In summary, this embodiment has a clear principle and a simple design. By combining preliminary agent training in a simplified environment, supplementary agent training in a fidelity environment and agent performance verification in a semi-physical simulation environment, it realizes a smooth transition from model training to model application in physical space and trains an agent with good reliability for an actual system within an acceptable time, thereby further expanding the application field of existing data-driven computational intelligence methods and giving them the capability of migrating to real physical systems.
Another embodiment of the present invention provides an agent training system, as shown in fig. 2, including:
the simplified environment module is used for constructing a simplified environment and performing primary training of the intelligent agent in the simplified environment;
the fidelity environment module is used for constructing a fidelity environment and performing supplementary training of the intelligent agent in the fidelity environment;
and the semi-physical simulation module is used for constructing a semi-physical simulation environment and verifying the performance of the intelligent agent in the semi-physical simulation environment.
Yet another embodiment of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the agent training method when executing the program. As shown in Fig. 3, a computer system suitable for implementing the server provided in this embodiment includes a central processing unit (CPU) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage section into a random access memory (RAM). The RAM also stores the various programs and data needed for the operation of the computer system. The CPU, ROM and RAM are connected to one another via a bus, to which an input/output (I/O) interface is also connected.
The I/O interface is connected to an input section including a keyboard, a mouse and the like; an output section including a liquid crystal display (LCD), a speaker and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed, and a removable medium such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive as necessary, so that a computer program read from it can be installed into the storage section as needed.
In particular, according to this embodiment, the processes described in the flowchart above can be implemented as computer software programs. For example, this embodiment includes a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from the removable medium.
The flowchart and schematic diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to the present embodiments. In this regard, each block in the flowchart or schematic diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the schematic and/or flowchart illustration, and combinations of blocks in the schematic and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in this embodiment may be implemented by software or by hardware. The described units may also be provided in a processor, for example described as: a processor comprising an acquisition module, a calculation module, a detection module and the like, where in some cases the names of these units do not constitute a limitation on the units themselves. For example, the simplified environment module may also be described as a "simplified environment construction module".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiments; or it may be a separate computer-readable storage medium not incorporated in the terminal. The computer readable storage medium stores one or more programs for use by one or more processors in performing the agent training methods described in the present disclosure.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention, and it will be obvious to those skilled in the art that other variations or modifications may be made on the basis of the above description, and all embodiments may not be exhaustive, and all obvious variations or modifications may be included within the scope of the present invention.

Claims (10)

1. An agent training method, comprising the steps of:
s1, constructing a simplified environment, and performing primary training of the agent in the simplified environment;
s2, constructing a fidelity environment, and performing supplementary training of the intelligent agent in the fidelity environment;
and S3, constructing a semi-physical simulation environment, and verifying the performance of the intelligent agent in the semi-physical simulation environment.
2. The training method according to claim 1, wherein the S1 includes:
s11, constructing a plurality of simplified environment models;
s12, verifying the correctness of the principles of the plurality of simplified environment models;
s13, constructing the simplified environment according to the plurality of simplified environment models after correctness verification;
s14, performing preliminary training on the agent in the simplified environment;
and S15, storing the preliminarily trained intelligent agent.
3. The training method according to claim 2, wherein the S11 includes:
acquiring core principles and corresponding mechanism behaviors of a plurality of real physical systems;
constructing a plurality of said simplified environmental models according to a plurality of core principles and corresponding mechanistic behaviour.
4. The training method according to claim 2, wherein the S12 includes:
and respectively comparing the principles of the plurality of simplified environment models with the core principles of the plurality of real physical systems, and verifying the correctness of the principles of the plurality of simplified environment models according to the comparison result.
5. Training method according to claim 1,
the S2 includes:
s21, constructing a plurality of fidelity environment models;
s22, verifying the plurality of fidelity environment models;
s23, constructing the fidelity environment according to the verified plurality of fidelity environment models;
s24, performing supplementary training on the preliminarily trained intelligent agent in the fidelity environment;
s25, evaluating the performance of the intelligent agent after the supplementary training in the fidelity environment, if the performance meets the requirement, entering S3, and if the performance does not meet the requirement, returning to S23 for iteration;
the S3 includes:
s31, constructing a semi-physical simulation environment;
and S32, evaluating the performance of the intelligent agent in the semi-physical simulation environment, and returning to S23 for iteration if the performance of the intelligent agent does not meet the requirement.
6. The training method according to claim 5, wherein the S21 includes:
acquiring the composition and corresponding detailed behaviors of a plurality of real physical systems;
and respectively constructing a plurality of fidelity environment models according to the constitution of the real physical systems and the corresponding detailed behaviors.
7. The training method according to claim 5, wherein the S31 includes:
replacing a part of fidelity environment model modules in the fidelity environment with a real physical system;
and constructing the semi-physical simulation environment according to another part of environment modules in the fidelity environment and the real physical system.
8. An agent training system, comprising:
the simplified environment module is used for constructing a simplified environment and performing primary training of the intelligent agent in the simplified environment;
the fidelity environment module is used for constructing a fidelity environment and performing supplementary training of the intelligent agent in the fidelity environment;
and the semi-physical simulation module is used for constructing a semi-physical simulation environment and verifying the performance of the intelligent agent in the semi-physical simulation environment.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
10. A computer-readable storage medium having instructions stored thereon, which when run on a computer, cause the computer to perform the method of any one of claims 1-7.
CN201911016946.6A 2019-10-24 2019-10-24 Intelligent agent training method and system, computer equipment and readable storage medium Pending CN110824954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911016946.6A CN110824954A (en) 2019-10-24 2019-10-24 Intelligent agent training method and system, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911016946.6A CN110824954A (en) 2019-10-24 2019-10-24 Intelligent agent training method and system, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN110824954A true CN110824954A (en) 2020-02-21

Family

ID=69550336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911016946.6A Pending CN110824954A (en) 2019-10-24 2019-10-24 Intelligent agent training method and system, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110824954A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461338A (en) * 2020-03-06 2020-07-28 北京仿真中心 Intelligent system updating method and device based on digital twin
CN112069662A (en) * 2020-08-20 2020-12-11 北京仿真中心 Complex product autonomous construction method and module based on man-machine hybrid enhancement
CN112138396A (en) * 2020-09-23 2020-12-29 中国电子科技集团公司第十五研究所 Intelligent training method and system for unmanned system simulation confrontation
CN112364500A (en) * 2020-11-09 2021-02-12 中国科学院自动化研究所 Multi-concurrency real-time countermeasure system oriented to reinforcement learning training and evaluation
CN112488320A (en) * 2020-09-25 2021-03-12 中国人民解放军军事科学院国防科技创新研究院 Training method and system for multiple intelligent agents under complex conditions
CN112836721A (en) * 2020-12-17 2021-05-25 北京仿真中心 Image identification method and device, computer equipment and readable storage medium
CN115098998A (en) * 2022-05-25 2022-09-23 上海锡鼎智能科技有限公司 Model training method and system based on simulation data
CN116449874A (en) * 2023-06-13 2023-07-18 北京瀚科智翔科技发展有限公司 Modularized unmanned control refitting kit of piloted plane and construction method
CN117097627A (en) * 2023-10-19 2023-11-21 中国人民解放军国防科技大学 Permeation test agent training and verification environment construction method and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2236426A1 (en) * 1997-04-30 1998-10-30 Frank Reine Method and apparatus for generating a model of an industrial production
CN102495552A (en) * 2011-12-08 2012-06-13 哈尔滨工业大学 Real-time simulation system oriented to space-borne electronic system
CN103336442A (en) * 2013-07-04 2013-10-02 南昌航空大学 Semi-physical simulation method of aircraft electric system based on AGENT modeling technique
DE202017106697U1 (en) * 2016-11-04 2018-03-09 Deepmind Technologies Limited Ambient navigation using reinforcement learning
CN108984275A (en) * 2018-08-27 2018-12-11 洛阳中科龙网创新科技有限公司 The agricultural driver training method of Intelligent unattended based on Unity3D and depth enhancing study
CN110032088A (en) * 2019-05-08 2019-07-19 渤海大学 The three-phase alternating-current motor Virtual simulation lab system and experimental method of hardware in loop

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2236426A1 (en) * 1997-04-30 1998-10-30 Frank Reine Method and apparatus for generating a model of an industrial production
CN102495552A (en) * 2011-12-08 2012-06-13 哈尔滨工业大学 Real-time simulation system oriented to space-borne electronic system
CN103336442A (en) * 2013-07-04 2013-10-02 南昌航空大学 Semi-physical simulation method of aircraft electric system based on AGENT modeling technique
DE202017106697U1 (en) * 2016-11-04 2018-03-09 Deepmind Technologies Limited Ambient navigation using reinforcement learning
CN108984275A (en) * 2018-08-27 2018-12-11 洛阳中科龙网创新科技有限公司 The agricultural driver training method of Intelligent unattended based on Unity3D and depth enhancing study
CN110032088A (en) * 2019-05-08 2019-07-19 渤海大学 The three-phase alternating-current motor Virtual simulation lab system and experimental method of hardware in loop

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韩联进: "电动客车热泵空调系统仿真与改进", 《浙江大学学报(工学版)》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461338A (en) * 2020-03-06 2020-07-28 北京仿真中心 Intelligent system updating method and device based on digital twin
CN112069662A (en) * 2020-08-20 2020-12-11 北京仿真中心 Complex product autonomous construction method and module based on man-machine hybrid enhancement
CN112138396A (en) * 2020-09-23 2020-12-29 中国电子科技集团公司第十五研究所 Intelligent training method and system for unmanned system simulation confrontation
CN112138396B (en) * 2020-09-23 2024-04-12 中国电子科技集团公司第十五研究所 Unmanned system simulation countermeasure-oriented intelligent body training method and system
CN112488320A (en) * 2020-09-25 2021-03-12 中国人民解放军军事科学院国防科技创新研究院 Training method and system for multiple intelligent agents under complex conditions
CN112364500A (en) * 2020-11-09 2021-02-12 中国科学院自动化研究所 Multi-concurrency real-time countermeasure system oriented to reinforcement learning training and evaluation
CN112836721B (en) * 2020-12-17 2024-03-22 北京仿真中心 Image recognition method and device, computer equipment and readable storage medium
CN112836721A (en) * 2020-12-17 2021-05-25 北京仿真中心 Image identification method and device, computer equipment and readable storage medium
CN115098998A (en) * 2022-05-25 2022-09-23 上海锡鼎智能科技有限公司 Model training method and system based on simulation data
CN115098998B (en) * 2022-05-25 2023-05-12 上海锡鼎智能科技有限公司 Model training method and system based on simulation data
CN116449874B (en) * 2023-06-13 2023-08-18 北京瀚科智翔科技发展有限公司 Modularized unmanned control refitting kit of piloted plane and construction method
CN116449874A (en) * 2023-06-13 2023-07-18 北京瀚科智翔科技发展有限公司 Modularized unmanned control refitting kit of piloted plane and construction method
CN117097627A (en) * 2023-10-19 2023-11-21 中国人民解放军国防科技大学 Permeation test agent training and verification environment construction method and electronic equipment
CN117097627B (en) * 2023-10-19 2023-12-22 中国人民解放军国防科技大学 Permeation test agent training and verification environment construction method and electronic equipment

Similar Documents

Publication Publication Date Title
CN110824954A (en) Intelligent agent training method and system, computer equipment and readable storage medium
Li et al. Deep neural network for unsteady aerodynamic and aeroelastic modeling across multiple Mach numbers
Dawson et al. Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control
Belletti et al. Expert level control of ramp metering based on multi-task deep reinforcement learning
CN111722531B (en) Online model-free optimal control method for switching linear system
KR20160062052A (en) Automated method for modifying neural dynamics
CN114139637B (en) Multi-agent information fusion method and device, electronic equipment and readable storage medium
CN112329948A (en) Multi-agent strategy prediction method and device
CN115293227A (en) Model training method and related equipment
Moya et al. Physics perception in sloshing scenes with guaranteed thermodynamic consistency
Landers et al. Deep reinforcement learning verification: A survey
Wu et al. An adaptive space preselection method for the multi-fidelity global optimization
Chen et al. Robust multi-agent reinforcement learning for noisy environments
CN116341924A (en) Multitasking intelligent decision computing method and system based on meta learning and game theory
CN115922706A (en) Flexible space manipulator control method, equipment and medium based on evaluation network
Liu et al. Mobility prediction of off-road ground vehicles using a dynamic ensemble of NARX models
CN114611990A (en) Method and device for evaluating contribution rate of element system of network information system
CN114290339A (en) Robot reality migration system and method based on reinforcement learning and residual modeling
CN114083543A (en) Active fault diagnosis method for space manipulator
Jaeger et al. Timescales: the choreography of classical and unconventional computing
CN111882124A (en) Homogeneous platform development effect prediction method based on generation confrontation simulation learning
Liao Construction and application of music teaching resources based on recurrent neural network
Zanichelli Shape optimization of airfoils by machine learning-based surrogate models
Patwardhan et al. Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction
CN114872040B (en) Musculoskeletal robot control method and device based on cerebellum prediction and correction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221

RJ01 Rejection of invention patent application after publication