CN109255442A - Training method, device, and readable medium for a control decision module based on artificial intelligence

Info

Publication number: CN109255442A
Application number: CN201811132192.6A
Authority: CN
Inventors: 王凡; 周波; 陈科; 来杰; 周古月
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2019-01-22
Anticipated expiration: 2038-09-27
Also published as: CN109255442B

Abstract

The present invention provides a training method, device, and readable medium for a control decision module based on artificial intelligence. The method includes: collecting intervention data of a smart device in a field test scenario; and training the control decision module in the smart device according to the intervention data of the smart device. The training method of the present invention is an intervention learning process. Through intervention learning, the control decision module can be trained more efficiently, which improves the control and decision-making capability of the control decision module in the smart device and makes the module more intelligent.

Description

Training method, device, and readable medium for a control decision module based on artificial intelligence

[technical field]

The present invention relates to the field of computer application technology, and in particular to a training method, device, and readable medium for a control decision module based on artificial intelligence.

[background technique]

Artificial intelligence (Artificial Intelligence; AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science. It attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.

With the development of artificial intelligence, many smart devices need a control decision module, and the control decision module is trained so that it learns to perform hardware control and decision-making for the smart device. For example, smart devices such as unmanned aerial vehicles (UAVs), unmanned vehicles, and robots are all provided with a control decision module that has learned decision-making and control. Prior-art control decision modules implement hardware control and decision-making through two broad classes of schemes. One is classical control, which obtains control signals through physical modeling and precise calculation, or through a mathematical model. The other is intelligent control, which either learns from human operation or is reinforced directly from feedback signals. Learning from human operation usually corresponds to supervised learning (Supervised Learning), and reinforcement from feedback signals usually corresponds to reinforcement learning (Reinforcement Learning). Supervised learning has a major drawback in practice: it relies on expert data that is very costly to obtain. Besides the acquisition cost, expert data usually cannot cover the entire required state space, so once a state appears that is not in the training data, control may fail and become highly unstable. Reinforcement learning is more effective in practical applications because it can learn autonomously and is more stable. However, in the control of some hardware, such as unmanned vehicles and UAVs, there is a major obstacle: the cost of training. In general, reinforcement learning has to fail repeatedly and learn from those experiences. Taking a UAV that learns to avoid obstacles as an example, during training the UAV has to learn from failure by colliding, and this cost is usually unacceptable.

Based on the above, training a control decision module by prior-art reinforcement learning cannot be carried out in real-world practice, while training by supervised learning generalizes poorly. A control decision module trained in this way cannot cope with states outside its training data, so control fails, and the control decision module of the smart device is not very intelligent.

[summary of the invention]

The present invention provides a training method, device, and readable medium for a control decision module based on artificial intelligence, for improving the intelligence of the control decision module of a smart device.

The present invention provides a training method for a control decision module based on artificial intelligence, the control decision module being arranged in a smart device, the method comprising:

collecting intervention data of the smart device in a field test scenario; and

training the control decision module in the smart device according to the intervention data of the smart device.

Further optionally, in the method described above, collecting the intervention data of the smart device in the field test scenario specifically includes:

in the field test scenario, collecting the intervention data of the smart device that corresponds to an operator performing an intervention operation on the smart device.

Further optionally, in the method described above, the intervention data includes the state data at the time the smart device is intervened in, and the output signal and/or state data of the smart device in response to the intervention operation.

Further optionally, in the method described above, collecting the intervention data of the smart device in the field test scenario specifically includes:

in the field test scenario, collecting the intervention data that the smart device generates according to a preset guarantee rule.

Further optionally, in the method described above, the intervention data includes the intervention condition in the guarantee rule, and the output signal and/or state data of the smart device in response to the intervention condition.

Further optionally, in the method described above, training the control decision module in the smart device according to the intervention data of the smart device specifically includes:

training the control decision module in the smart device according to the intervention data of the smart device, using a reinforcement learning training method or a supervised learning training method.

The present invention provides a training apparatus for a control decision module based on artificial intelligence, the control decision module being arranged in a smart device, the apparatus comprising:

an acquisition module, configured to collect intervention data of the smart device in a field test scenario; and

a training module, configured to train the control decision module in the smart device according to the intervention data of the smart device.

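Further optionally, in the apparatus described above, the acquisition module is specifically configured to:

in the field test scenario, collect the intervention data of the smart device that corresponds to an operator performing an intervention operation on the smart device.

Further optionally, in the apparatus described above, the intervention data includes the state data at the time the smart device is intervened in, and the output signal and/or state data of the smart device in response to the intervention operation.

Further optionally, in the apparatus described above, the acquisition module is specifically configured to:

in the field test scenario, collect the intervention data that the smart device generates according to a preset guarantee rule.

Further optionally, in the apparatus described above, the intervention data includes the intervention condition in the guarantee rule, and the output signal and/or state data of the smart device in response to the intervention condition.

Further optionally, in the apparatus described above, the training module is specifically configured to:

train the control decision module in the smart device according to the intervention data of the smart device, using a reinforcement learning training method or a supervised learning training method.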

The present invention also provides a computer device, the device comprising:

one or more processors; and

a memory, configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the training method for a control decision module based on artificial intelligence described above.

The present invention also provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the training method for a control decision module based on artificial intelligence described above.

With the training method, device, and readable medium for a control decision module based on artificial intelligence of the present invention, intervention data of a smart device is collected in a field test scenario, and the control decision module in the smart device is trained according to the intervention data of the smart device. The training method of the present invention is an intervention learning process. Through the intervention learning of the present invention, the control decision module can be trained more efficiently, which improves the control and decision-making capability of the control decision module in the smart device and makes the module more intelligent.

[description of the drawings]

Fig. 1 is a flowchart of an embodiment of the training method for a control decision module based on artificial intelligence of the present invention.

Fig. 2 is a schematic diagram of a smart device being controlled by the control decision module based on artificial intelligence of this embodiment.

Fig. 3 is an example diagram of an expert intervention situation of the present invention.

Fig. 4 is a structural diagram of an embodiment of the training apparatus for a control decision module based on artificial intelligence of the present invention.

Fig. 5 is a structural diagram of an embodiment of the computer device of the present invention.

Fig. 6 is an example diagram of a computer device provided by the present invention.

[specific embodiment]

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.

In field test scenarios for smart devices such as unmanned vehicles or UAVs, there is another important kind of data, namely test data. This test data is usually collected under human monitoring and intervention, for example the road test data of unmanned vehicles. Its characteristic is that the smart device runs autonomously under its control decision module most of the time, but an operator monitors its state at all times and intervenes (intervention) as soon as a problem occurs. Such test data usually exists in large quantities in real projects but is seldom put to effective use. Based on the test data available in these real scenarios, the present invention provides a training scheme for a control decision module based on artificial intelligence. In this scheme, a special kind of reinforcement learning is carried out on this data, which may be called "intervention learning" (Learning From Intervention, LFI). The goal of this learning is to make effective use of the test data to strengthen the control decision module of the smart device, so that the module is progressively improved during testing.

Fig. 1 is a flowchart of an embodiment of the training method for a control decision module based on artificial intelligence of the present invention. As shown in Fig. 1, the training method of this embodiment may specifically include the following steps:

100. Collect intervention data of the smart device in a field test scenario.

101. Train the control decision module in the smart device according to the intervention data of the smart device.

The training method of this embodiment may be executed by a training apparatus for the control decision module based on artificial intelligence, the control decision module being arranged in a smart device. The training apparatus may also be arranged in the smart device, so as to train the control decision module in the smart device.

Specifically, in practical applications, testing a smart device effectively requires placing it in the field and running a field test on it. Before the field test, the control decision module in the smart device may already have received some training by supervised learning, or some training by reinforcement learning in a simulation environment. In that case, the control decision module in the smart device is warm-started. In the field test scenario, the smart device then runs mainly under the control of its control decision module while an expert monitors the entire test process. When the expert finds that the smart device is behaving undesirably, the expert can intervene in the control exercised by the control decision module and so influence the operation of the smart device.

Alternatively, in a field test scenario such as that of an unmanned vehicle, a tester may ride in the unmanned vehicle. During the test, for example when the unmanned vehicle fails to avoid an obstacle or violates a traffic rule, the tester can intervene in the control decision module of the unmanned vehicle so that the vehicle avoids the obstacle or obeys the traffic rule.

As a further alternative, the smart device may store preset guarantee rules that ensure its normal operation. For an unmanned vehicle, for example, a preset guarantee rule may be that the speed must not exceed a preset speed threshold when there is an obstacle within a preset distance, or that the vehicle must decelerate when an obstacle is detected within 50 meters ahead. In this way, while the smart device runs under its control decision module, if the current state is detected to meet the intervention condition in a guarantee rule, the operation of the smart device is intervened in according to that rule.
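The rule-based case can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `VehicleState` fields, the 5 m/s speed threshold, and the convention that a throttle command of -1.0 means full braking are all assumptions introduced here; only the 50-meter range comes from the example in the text.

```python
from typing import NamedTuple, Optional, Tuple


class VehicleState(NamedTuple):
    """Illustrative state of an unmanned vehicle (field names are assumed)."""
    speed_mps: float                      # current speed in meters per second
    obstacle_distance_m: Optional[float]  # distance to obstacle ahead, None if none


OBSTACLE_RANGE_M = 50.0            # range taken from the 50-meter example
MAX_SPEED_NEAR_OBSTACLE_MPS = 5.0  # hypothetical preset speed threshold


def rule_triggered(state: VehicleState) -> bool:
    """Return True when the state meets the rule's intervention condition."""
    if state.obstacle_distance_m is None:
        return False
    return (state.obstacle_distance_m <= OBSTACLE_RANGE_M
            and state.speed_mps > MAX_SPEED_NEAR_OBSTACLE_MPS)


def apply_rule(state: VehicleState, module_throttle: float) -> Tuple[float, bool]:
    """Return the throttle command to execute and whether the rule intervened."""
    if rule_triggered(state):
        return -1.0, True  # assumed convention: -1.0 is full braking
    return module_throttle, False
```

Each time `apply_rule` returns `True` as its second element, the state and the overriding command form one piece of intervention data.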

Accordingly, step 100 may specifically cover the following two situations.

In the first situation, in the field test scenario, the intervention data of the smart device that corresponds to an operator performing an intervention operation on the smart device is collected.

The operator in the first situation may be the expert mentioned above, or a tester of the smart device. In this situation, the intervention data of the smart device may include the state data at the time the smart device is intervened in, and the output signal and/or state data of the smart device in response to the intervention operation.

For example, the state data at the time of intervention may be data such as the speed and position of the smart device when the intervention occurs. This data can be obtained from the speedometer, camera, radar, and other sensors installed on the smart device. The output signal of the smart device in response to the intervention operation may be the control signal output when the smart device is intervened in, such as a brake signal or a steering signal. The state data of the smart device in response to the intervention operation identifies the state of the smart device after it responds to the intervention, for example that its speed drops to 0 or to some other value after the intervention, or that it changes lanes or enters some other state after the intervention.
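One way to hold such intervention data is a small record type, sketched below in Python. The class names, field names, and the `"operator"` / `"rule"` source labels are hypothetical and chosen only for illustration; the text specifies what the data contains, not how it is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InterventionRecord:
    """One intervention sample; all field names are illustrative."""
    time_step: int
    source: str                               # "operator" or "rule" (assumed labels)
    state_at_intervention: Dict[str, float]   # e.g. speed, position
    output_signal: Dict[str, float]           # e.g. brake or steering signal
    state_after_response: Dict[str, float]    # state once the device has responded


@dataclass
class InterventionLog:
    """Accumulates the intervention data collected during a field test."""
    records: List[InterventionRecord] = field(default_factory=list)

    def add(self, record: InterventionRecord) -> None:
        if record.source not in ("operator", "rule"):
            raise ValueError(f"unknown intervention source: {record.source}")
        self.records.append(record)

    def by_source(self, source: str) -> List[InterventionRecord]:
        return [r for r in self.records if r.source == source]


log = InterventionLog()
log.add(InterventionRecord(
    time_step=120,
    source="operator",
    state_at_intervention={"speed_mps": 12.0, "obstacle_distance_m": 18.0},
    output_signal={"brake": 1.0, "steering": 0.0},
    state_after_response={"speed_mps": 0.0, "obstacle_distance_m": 9.5},
))
```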

Specifically, after the intervention data of the smart device is collected, the process of training the control decision module in the smart device according to that data may be called intervention learning of the control decision module. Intervention learning may be carried out on the basis of supervised learning or on the basis of reinforcement learning.

For example, the state data at the time the smart device is intervened in and the output signal of the smart device in response to the intervention operation can serve as supervised learning data, and intervention learning can be performed on the control decision module using a supervised learning training method. Alternatively, the state data at the time of intervention and the state data of the smart device in response to the intervention operation can serve as reinforcement learning data, and intervention learning can be performed using a reinforcement learning training method. In practice, intervention learning may include both the supervised learning training method and the reinforcement learning training method at the same time, so that the control decision module is trained more efficiently, which improves the control and decision-making capability of the control decision module in the smart device and makes the module more intelligent.
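The two uses of the same intervention data can be sketched as follows. The `Intervention` tuple and both helper functions are hypothetical names, and the fixed negative reward attached to each transition is an assumption made here for illustration; the text does not prescribe a value.

```python
from typing import List, NamedTuple, Tuple

Vector = Tuple[float, ...]


class Intervention(NamedTuple):
    """Minimal intervention sample (illustrative field names)."""
    state: Vector          # state data at the time of intervention
    output_signal: Vector  # control signal output in response to the intervention
    next_state: Vector     # state data after responding to the intervention


def to_supervised_pairs(records: List[Intervention]) -> List[Tuple[Vector, Vector]]:
    """(state, output signal) pairs used as supervised learning data."""
    return [(r.state, r.output_signal) for r in records]


def to_rl_transitions(records: List[Intervention], penalty: float = -1.0):
    """(state, action, next state, reward) tuples used as reinforcement learning data.

    The constant negative reward is an assumed placeholder for the feedback
    associated with an intervention.
    """
    return [(r.state, r.output_signal, r.next_state, penalty) for r in records]


records = [Intervention((12.0, 18.0), (1.0, 0.0), (0.0, 9.5))]
```

Running both converters over the same records gives the data for a combined supervised and reinforcement learning pass.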

In the second situation, in the field test scenario, the intervention data that the smart device generates according to a preset guarantee rule is collected. Specifically, when the state data of the smart device meets the intervention condition in a guarantee rule, the smart device is intervened in, and the intervention data of the smart device can then be collected.

The corresponding intervention data may include the intervention condition in the guarantee rule, and the output signal and/or state data of the smart device in response to the intervention condition. The intervention condition in the guarantee rule may specifically be the state data that, once met, causes the smart device to be intervened in.

Similarly, after the intervention data of the smart device is collected, the intervention condition in the guarantee rule and the output signal of the smart device in response to the intervention condition can serve as supervised learning data, and intervention learning can be performed on the control decision module using a supervised learning training method. Alternatively, the intervention condition in the guarantee rule and the state data of the smart device in response to the intervention condition can serve as reinforcement learning data, and intervention learning can be performed using a reinforcement learning training method. Likewise, in practice, intervention learning may include both the supervised learning training method and the reinforcement learning training method at the same time, to make the control decision module more intelligent.

In addition, the smart device may also enter the field test scenario without any prior training, in which case the control decision module in the smart device is cold-started and is trained mainly through the intervention learning process described above. Compared with the warm start described above, this approach needs more intervention data for learning.

Based on the above, while running, a smart device can be subject to three sources of control: the control decision module, the preset guarantee rules, and the operator. In ordinary normal operation, neither the preset guarantee rules nor the operator is triggered, and the smart device is controlled mainly by the control decision module. If either the preset guarantee rules or the operator is triggered, the control of the control decision module is taken over by the control of the preset guarantee rules or of the operator. If the preset guarantee rules and the operator are triggered at the same time, the control of the operator has priority over the control of the preset guarantee rules.
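The priority order among the three sources of control can be sketched as a small arbitration function. The function name and the use of `None` to mean "not triggered" are assumptions for illustration.

```python
from typing import Optional, Tuple


def arbitrate(module_cmd: float,
              rule_cmd: Optional[float] = None,
              operator_cmd: Optional[float] = None) -> Tuple[float, str]:
    """Pick the command to execute and report which source supplied it.

    A value of None means that source was not triggered at this time step.
    Priority: operator, then preset guarantee rule, then control decision module.
    """
    if operator_cmd is not None:
        return operator_cmd, "operator"
    if rule_cmd is not None:
        return rule_cmd, "rule"
    return module_cmd, "module"
```

For instance, `arbitrate(0.4, rule_cmd=-0.5, operator_cmd=-1.0)` returns the operator's command, because the operator outranks the rule when both are triggered.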

Let $s_t$ denote the current state of the smart device. $s_t$ may include data indicators from the smart device's current system sensors (such as speed and position) and the raw data obtained by cameras, radar, ultrasound, and the like. Let $t$ denote the time step, that is, the period of observation sampling or control. The control signal output by the control system of the smart device can be expressed as $u_t = f(s_t; \theta)$, where $u_t$ may include the rotor speed and attitude of an aircraft, or the steering wheel angle and throttle/brake magnitude of an unmanned vehicle. $\theta$ denotes the adjustable parameters of the control system of the smart device, and $f$ is a manually specified functional form.

Typically, in classical control, $f$ is based on a classical mathematical model or on manual experience and rules, for example PID control. In intelligent control, supervised learning and reinforcement learning proceed as described below. In supervised learning, an expert first performs the control, for example by driving a vehicle or operating an aircraft through a stretch with obstacles. During this process, pairs of state and expert operation $(s_t, u_t^*)$ can be collected, where the * in $u_t^*$ marks data that corresponds to the expert's operation. $f$ is fitted to the collected data so that $u_t$ is as close as possible to $u_t^*$. The objective function is therefore set as

$$L(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left\| f(s_t; \theta) - u_t^* \right\|^2$$

where $N$ is the number of collected pairs. The objective function $L(\theta)$ is commonly called the mean squared error. Other error measures also exist and are not enumerated here.
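As a minimal sketch of fitting $f$ to expert data under a mean squared error objective, the following Python code assumes a scalar state, a scalar control signal, and a linear form $f(s; \theta) = \theta s$. These simplifications, the function names, the learning rate, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def mse_loss(theta: float, data: List[Tuple[float, float]]) -> float:
    """Mean squared error between f(s; theta) = theta * s and the expert signal."""
    return sum((theta * s - u_star) ** 2 for s, u_star in data) / len(data)


def fit(data: List[Tuple[float, float]], lr: float = 0.1, steps: int = 200) -> float:
    """Fit theta by plain gradient descent on the mean squared error."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to theta.
        grad = sum(2.0 * (theta * s - u_star) * s for s, u_star in data) / len(data)
        theta -= lr * grad
    return theta


# Toy (state, expert signal) pairs generated by an expert who applies u* = 2 * s.
expert_pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta_hat = fit(expert_pairs)
```

On this toy data the fitted parameter approaches 2, the gain the hypothetical expert used.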

In reinforcement learning, expert operation is generally not needed; instead, the system explores fully autonomously. During autonomous operation, a feedback (reward) usually has to be determined from the currently observed state and certain rules. The feedback is usually defined by the target to be achieved. For example, a collision yields negative feedback, reaching the target point yields positive feedback, and high energy consumption (such as hard acceleration or hard braking) can also be given negative feedback. During autonomous operation, the smart device records $(s_t, u_t, s_{t+1}, r_t)$ at each time step, where $r_t$ is the feedback at the current time step, and then learns by a reinforcement learning training method. There are many kinds of reinforcement learning techniques. In continuous control problems (where $u_t$ lies in a continuous space), the policy gradient (Policy Gradient) family of methods can be used. In the simplest case, the policy is generated by the following formula:

$$u_t = f(s_t; \theta) + \epsilon$$

where $\epsilon$ is additional noise used for exploration, and the parameters $\theta$ are optimized by the following formula:

where $T$ denotes the time length of the whole episode. In addition, policy gradient can also be replaced by methods such as DDPG and TRPO, which are not enumerated here.
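For orientation, one standard policy gradient estimator that fits these definitions is the REINFORCE form. The sketch below is not taken from the patent: it assumes Gaussian exploration noise with standard deviation `sigma`, a scalar linear policy mean $f(s; \theta) = \theta s$, and an undiscounted return-to-go. With Gaussian noise, the gradient of the log-probability of $u_t$ with respect to $\theta$ is $(u_t - f(s_t; \theta)) / \sigma^2$ times the gradient of $f$, which is what the code accumulates.

```python
from typing import List, Tuple


def reinforce_gradient(theta: float,
                       episode: List[Tuple[float, float, float]],
                       sigma: float) -> float:
    """REINFORCE gradient estimate for one episode.

    episode holds (s_t, u_t, r_t) for t = 0 .. T-1, and the policy mean is
    assumed to be f(s; theta) = theta * s with Gaussian noise of std sigma.
    """
    grad = 0.0
    for t, (s, u, _) in enumerate(episode):
        # Undiscounted return from step t to the end of the episode.
        return_to_go = sum(r for _, _, r in episode[t:])
        # d/dtheta log N(u | theta * s, sigma^2) = (u - theta * s) / sigma^2 * s
        grad += (u - theta * s) / sigma ** 2 * s * return_to_go
    return grad


def ascent_step(theta: float, grad: float, lr: float = 0.01) -> float:
    """Move theta in the direction that increases expected return."""
    return theta + lr * grad
```

Actions that were followed by high return pull the policy mean toward themselves, and actions followed by low or negative return push it away.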

Fig. 2 is a schematic diagram of a smart device being controlled by the control decision module based on artificial intelligence of this embodiment. As shown in Fig. 2, the control decision module based on artificial intelligence is denoted Agent in this embodiment. $s_t$ is the information produced by Observation, which generally refers to the sensor input and which starts the control decision module. $f(s_t; \theta)$ is the action the Agent outputs when there is no intervention. After the sensor input $s_t$ starts the control decision module of the smart device, this embodiment can also introduce an intervention signal (Interference) to intervene in the output of the control decision module. Specifically, Merge combines the action output by the Agent without intervention with the control signal of the expert's intervention, yielding the control signal $u_t$ that the smart device actually receives, which then acts on the corresponding controller (Controller) of the smart device.

As shown in Fig. 2, intervention learning adds a symbol $I_t$ that indicates whether there is an expert intervention at time $t$: $I_t = 1$ if there is an intervention and $I_t = 0$ otherwise. The control signal of the expert's intervention is denoted $u_t^*$. The control signal actually received by the smart device can then be expressed as:

$$u_t = I_t \, u_t^* + (1 - I_t) \, f(s_t; \theta)$$

That is, when there is an intervention, the expert's signal is used; when there is no intervention, the output of the control decision module based on artificial intelligence is used. Fig. 3 is an example diagram of an expert intervention situation of the present invention. As shown in Fig. 3, for a complete episode, expert interventions may cut the episode into multiple segments. The time points of each segment include a start time point (or the end time point of the previous round of intervention) and an intervention start time point, for example $T(k)$ and $T'(k)$, $T(k+1)$ and $T'(k+1)$, $T(k+2)$ and $T'(k+2)$, and so on.
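The merge step and the cutting of an episode into segments can be sketched as follows. Both function names are hypothetical. The segmentation assumes that a segment is reported as a pair of time-step indices (segment start, intervention start), corresponding to $T(k)$ and $T'(k)$, and that a trailing stretch without any intervention is not reported.

```python
from typing import List, Sequence, Tuple


def merge(agent_action: float, expert_action: float, intervened: bool) -> float:
    """Control signal the device actually receives at one time step."""
    return expert_action if intervened else agent_action


def split_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Cut an episode into (segment start, intervention start) index pairs.

    flags[t] is True when an intervention is active at time step t.
    """
    segments = []
    start = 0      # start of the episode, or end of the previous intervention
    prev = False
    for t, flag in enumerate(flags):
        if flag and not prev:
            segments.append((start, t))  # autonomous run ends, intervention begins
        elif prev and not flag:
            start = t                    # intervention ended, next segment begins
        prev = flag
    return segments


flags = [False, False, True, True, False, False, False, True, False]
segments = split_segments(flags)
```

With the flags above, the episode is cut at the two points where an intervention begins, giving one segment per round of intervention.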

On this basis, the intervention learning process can be adjusted as follows:

Based on the above, the technical solution of this embodiment on the one hand applies a reward penalty to the intervention process, and on the other hand performs imitation learning on the intervention process.

That is, with the intervention learning of this embodiment, once a manual or rule-based intervention occurs, the smart device is kept out of a very dangerous situation. The intervention can then be learned into the network model of the control decision module in two ways. First, the intervention produces negative feedback, and this feedback is learned into the network model so that later interventions are avoided. In other words, the intervention informs the control decision module of certain dangers in advance, so that the control decision module avoids outputting actions that approach any dangerous state. Second, the signal in the intervention is learned into the control decision module as a supervisory signal, which accelerates the convergence of the network model of the control decision module.
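The two aspects can be sketched as a reward adjustment plus an imitation term. This is an illustrative reading rather than the patent's formulas: the function names, the size of the penalty, and the scalar linear form $f(s; \theta) = \theta s$ are assumptions introduced here.

```python
from typing import List, Tuple


def shaped_reward(reward: float, intervened: bool, penalty: float = 1.0) -> float:
    """First aspect: an intervention lowers the reward of that time step."""
    return reward - penalty if intervened else reward


def imitation_loss(theta: float, steps: List[Tuple[float, float, bool]]) -> float:
    """Second aspect: mean squared error against the intervention signal.

    steps holds (state, intervention signal, intervened) per time step; only
    intervened steps contribute, and f(s; theta) = theta * s is assumed.
    """
    pairs = [(s, u_star) for s, u_star, intervened in steps if intervened]
    if not pairs:
        return 0.0
    return sum((theta * s - u_star) ** 2 for s, u_star in pairs) / len(pairs)
```

A training loop could feed the shaped rewards to the reinforcement learning update and add a weighted `imitation_loss` to the objective, so that the same intervention contributes to both aspects.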

In the training method for a control decision module based on artificial intelligence of this embodiment, intervention data of the smart device is collected in a field test scenario, and the control decision module in the smart device is trained according to the intervention data of the smart device. The training method of this embodiment is an intervention learning process. Through the intervention learning of this embodiment, the control decision module can be trained more efficiently, which improves the control and decision-making capability of the control decision module in the smart device and makes the module more intelligent.

Fig. 4 is a structural diagram of an embodiment of the training apparatus for a control decision module based on artificial intelligence of the present invention. As shown in Fig. 4, the training apparatus of this embodiment, whose control decision module is arranged in a smart device, may specifically include:

an acquisition module 10, configured to collect intervention data of the smart device in a field test scenario; and

a training module 11, configured to train the control decision module in the smart device according to the intervention data of the smart device collected by the acquisition module 10.

The training apparatus of this embodiment uses the above modules to implement the training of the control decision module based on artificial intelligence. Its implementation principle and technical effect are the same as those of the related method embodiment above. For details, refer to the description of that method embodiment, which is not repeated here.

Further optionally, in the training apparatus of the embodiment shown in Fig. 4, the acquisition module 10 is specifically configured to:

in the field test scenario, collect the intervention data of the smart device that corresponds to an operator performing an intervention operation on the smart device.

Further optionally, in the training apparatus of the above embodiment, the intervention data includes the state data at the time the smart device is intervened in, and the output signal and/or state data of the smart device in response to the intervention operation.

Further optionally, in the training apparatus of the embodiment shown in Fig. 4, the acquisition module 10 is specifically configured to:

in the field test scenario, collect the intervention data that the smart device generates according to a preset guarantee rule.

Further optionally, in the training apparatus of the above embodiment, the intervention data includes the intervention condition in the guarantee rule, and the output signal and/or state data of the smart device in response to the intervention condition.

Further optionally, in the training apparatus of the embodiment shown in Fig. 4, the training module 11 is specifically configured to:

train the control decision module in the smart device according to the intervention data of the smart device collected by the acquisition module 10, using a reinforcement learning training method or a supervised learning training method.

The training apparatus of the above embodiment uses the above modules to implement the training of the control decision module based on artificial intelligence. Its implementation principle and technical effect are the same as those of the related method embodiment above. For details, refer to the description of that method embodiment, which is not repeated here.

Fig. 5 is a structural diagram of an embodiment of the computer device of the present invention. As shown in Fig. 5, the computer device of this embodiment includes one or more processors 30 and a memory 40. The memory 40 is configured to store one or more programs. When the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the training method for a control decision module based on artificial intelligence of the embodiment shown in Fig. 1 above. The embodiment shown in Fig. 5 takes multiple processors 30 as an example. The computer device of this embodiment may be a smart device.

For example, Fig. 6 is an example diagram of a computer device provided by the present invention. Fig. 6 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in fig. 6, computer equipment 12a is showed in the form of universal computing device.The component of computer equipment 12a can To include but is not limited to: one or more processor 16a, system storage 28a connect different system components (including system Memory 28a and processor 16a) bus 18a.

Bus 18a indicates one of a few class bus structures or a variety of, including memory bus or Memory Controller, Peripheral bus, graphics acceleration port, processor or the local bus using any bus structures in a variety of bus structures.It lifts For example, these architectures include but is not limited to industry standard architecture (ISA) bus, microchannel architecture (MAC) Bus, enhanced isa bus, Video Electronics Standards Association (VESA) local bus and peripheral component interconnection (PCI) bus.

The computer device 12a typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12a, including volatile and non-volatile media, and removable and non-removable media.

The system memory 28a may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30a and/or cache memory 32a. The computer device 12a may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34a may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical medium). In these cases, each drive may be connected to the bus 18a through one or more data media interfaces. The system memory 28a may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of Fig. 1 to Fig. 4 of the present invention described above.

A program/utility 40a having a set of (at least one) program modules 42a may be stored in, for example, the system memory 28a. Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of Fig. 1 to Fig. 4 described in the present invention.

The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device, a display 24a, etc.), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12a to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22a. The computer device 12a may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20a. As shown, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 16a runs the programs stored in the system memory 28a to execute various functional applications and data processing, for example to implement the training method of the control decision module based on artificial intelligence shown in the above embodiments.
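As an illustration only, a program of the kind the processor might run can be sketched as follows. The sketch assumes the supervised-learning variant of the training method and substitutes a toy linear model for the control decision module; the names `InterventionRecord`, `LinearPolicy`, and `train_supervised`, the record layout, and the learning rate and epoch count are all hypothetical and are not specified by the present invention.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class InterventionRecord:
    """One intervention sample (hypothetical layout, not defined by the patent)."""
    state: List[float]  # status data of the smart device when the intervention occurred
    action: float       # output signal with which the device responded to the intervention


class LinearPolicy:
    """Toy stand-in for the control decision module: action = w . state + b."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def act(self, state: Sequence[float]) -> float:
        return sum(wi * si for wi, si in zip(self.w, state)) + self.b


def train_supervised(policy: LinearPolicy,
                     records: Sequence[InterventionRecord],
                     lr: float = 0.1,
                     epochs: int = 500) -> LinearPolicy:
    """Fit the policy to intervention data by per-sample gradient descent on squared error."""
    for _ in range(epochs):
        for r in records:
            err = policy.act(r.state) - r.action
            policy.w = [wi - lr * err * si for wi, si in zip(policy.w, r.state)]
            policy.b -= lr * err
    return policy


# Made-up intervention data standing in for samples collected in a field test scenario.
records = [
    InterventionRecord(state=[0.0, 1.0], action=1.0),
    InterventionRecord(state=[1.0, 0.0], action=-1.0),
    InterventionRecord(state=[1.0, 1.0], action=0.0),
    InterventionRecord(state=[0.0, 0.0], action=0.0),
]
policy = train_supervised(LinearPolicy(2), records)
```

After training, `policy.act(state)` approximates the recorded response for each intervention state. A reinforcement-learning variant would instead need a reward signal derived from the intervention data, which this sketch does not attempt.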

The present invention also provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the training method of the control decision module based on artificial intelligence shown in the above embodiments.

The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Fig. 6 above.

With the development of technology, the distribution of computer programs is no longer limited to tangible media; a program may also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in this embodiment may include not only tangible media but also intangible media.

The computer-readable medium of this embodiment may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.

The program code contained on a computer-readable medium may be transmitted using any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as actually needed to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A training method for a control decision module based on artificial intelligence, the control decision module being arranged in a smart device, characterized in that the method comprises:

acquiring intervention data of the smart device in a field test scenario;

2. The method according to claim 1, characterized in that acquiring the intervention data of the smart device in the field test scenario specifically comprises:

in the field test scenario, acquiring the corresponding intervention data of the smart device when an operator performs an intervention operation on the smart device.

3. The method according to claim 2, characterized in that the intervention data include status data of the smart device at the time of the intervention, and the output signal and/or status data with which the smart device responds to the intervention operation.

4. The method according to claim 1, characterized in that acquiring the intervention data of the smart device in the field test scenario specifically comprises:

5. The method according to claim 4, characterized in that the intervention data include the intervention condition in the safeguard rule, and the output signal and/or status data with which the smart device responds to the intervention condition.

6. The method according to any one of claims 1-5, characterized in that training the control decision module in the smart device according to the intervention data of the smart device specifically comprises:

training the control decision module in the smart device according to the intervention data of the smart device, using a reinforcement learning training method or a supervised learning training method.

7. A training device for a control decision module based on artificial intelligence, the control decision module being arranged in a smart device, characterized in that the device comprises:

a training module, configured to train the control decision module in the smart device according to the intervention data of the smart device.

8. The device according to claim 7, characterized in that the acquisition module is specifically configured to:

9. The device according to claim 8, characterized in that the intervention data include status data of the smart device at the time of the intervention, and the output signal and/or status data with which the smart device responds to the intervention operation.

10. The device according to claim 7, characterized in that the acquisition module is specifically configured to:

11. The device according to claim 10, characterized in that the intervention data include the intervention condition in the safeguard rule, and the output signal and/or status data with which the smart device responds to the intervention condition.

12. The device according to any one of claims 7-11, characterized in that the training module is specifically configured to:

13. A computer device, characterized in that the device comprises:

one or more processors;

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.

14. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.