CN109581874B - Method and apparatus for generating information - Google Patents


Info

Publication number
CN109581874B
Authority
CN
China
Prior art keywords
signal
target
compensation
forward road
road image
Prior art date
Legal status
Active
Application number
CN201811641376.5A
Other languages
Chinese (zh)
Other versions
CN109581874A (en)
Inventor
张连川
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority to CN201811641376.5A
Publication of CN109581874A
Application granted
Publication of CN109581874B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The embodiment of the disclosure discloses a method and an apparatus for generating information. One embodiment of the method comprises: acquiring a target signal, wherein the target signal is used for indicating the motion of a controlled device; inputting the acquired target signal into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between target signals and compensation signals; acquiring a control signal generated by a target proportional-integral-derivative (PID) controller for the acquired target signal; compensating the control signal with the obtained compensation signal to generate a compensated signal; and generating, based on the compensated signal, a signal for instructing the controlled device to move. The embodiment realizes feedforward compensation of the PID controller through reinforcement learning, so that the motion of the controlled device is more robust and the approach applies to a wider range of scenarios.

Description

Method and apparatus for generating information
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for generating information.
Background
Today's closed-loop automatic control techniques are generally based on feedback, which reduces uncertainty. Feedback theory has three elements: measurement, comparison, and execution. The quantity measured is the actual value of the controlled variable; it is compared with the desired value, and the resulting deviation is used to correct the system's response and perform regulation control.
In engineering practice, a proportional-integral-derivative (PID) controller is generally used to realize such regulation control. A feedforward control system built around a PID controller applies compensation, which reduces the deviation of the system.
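The measure-compare-execute loop described above can be sketched as a minimal discrete PID controller. This is an illustrative implementation, not the patent's; the gains and sample time are assumptions.

```python
class PID:
    """Minimal discrete PID controller; gains and sample time are illustrative."""

    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, desired, measured):
        # Measure and compare: deviation between the desired and actual value.
        error = desired - measured
        # Integral accumulates the deviation; derivative estimates its trend.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Execute: control output that corrects the system response.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A feedforward system adds a compensation term to this output instead of relying on feedback alone.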
Existing feedforward control systems for PID controllers are usually implemented from human experience or based on an Inertial Measurement Unit (IMU).
Disclosure of Invention
The present disclosure presents methods and apparatus for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring a target signal, wherein the target signal is used for indicating the motion of a controlled device; inputting the acquired target signal into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between target signals and compensation signals; acquiring a control signal generated by a target proportional-integral-derivative (PID) controller for the acquired target signal; compensating the control signal with the obtained compensation signal to generate a compensated signal; and generating, based on the compensated signal, a signal for instructing the controlled device to move.
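The first-aspect pipeline can be sketched end to end as follows. All function names and the choice of additive (positive) compensation are illustrative assumptions, not the patent's interfaces.

```python
def generate_motion_signal(target_signal, motion_model, pid_controller, feedback):
    """Sketch of the first-aspect pipeline (all names are illustrative)."""
    # The pre-trained motion model maps the target signal to a compensation signal.
    compensation = motion_model(target_signal)
    # The target PID controller generates the control signal for the target signal.
    control = pid_controller(target_signal, feedback)
    # Compensate the control signal (positive compensation shown as addition).
    compensated = control + compensation
    # The compensated signal is the basis of the signal that drives the device.
    return compensated
```

For example, with a proportional-only controller and a fixed learned offset, a target of 30 with feedback 20 yields a control amount of 10 plus the compensation.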
In some embodiments, the target signal indicates a speed of the controlled device, the controlled device is a vehicle, and the compensation signal indicates an acceleration of the controlled device and a duration for which the controlled device moves at the indicated acceleration.
In some embodiments, inputting the acquired target signal into the pre-trained motion model to obtain a signal for compensation includes: acquiring a forward road image of the controlled device, wherein the forward road image is an image of the road in the moving direction of the controlled device; and inputting the acquired target signal and the acquired forward road image into the pre-trained motion model to obtain the signal for compensation, wherein the motion model characterizes the correspondence among the target signal, the forward road image, and the compensation signal.
In some embodiments, the motion model is trained by: acquiring a target signal set, a target forward road image set, and a target compensation signal set; and, with a reinforcement learning algorithm, performing the following training steps to learn the generation behavior of the compensated signal: selecting a target signal from the target signal set; selecting a forward road image from the target forward road image set; selecting a compensation signal from the target compensation signal set; compensating, with the selected compensation signal, the control signal corresponding to it to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by the target proportional-integral-derivative controller for the selected target signal; determining whether a target vehicle, moving at the acceleration indicated by the selected compensated signal for the indicated duration, satisfies a predetermined motion smoothing condition; in response to determining that the motion smoothing condition is satisfied, establishing a correspondence between the selected target signal, the selected forward road image, and the selected compensation signal; determining whether a preset training end condition is met; and, in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence.
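The training steps above can be sketched as a tabular loop. `pid`, `simulate`, and `is_smooth` stand in for the target PID controller, the target-vehicle rollout, and the motion smoothing check; all names, and the use of random selection, are illustrative assumptions.

```python
import random

def train_motion_model(target_signals, road_images, compensations,
                       pid, simulate, is_smooth, max_iters=1000):
    """Tabular sketch of the training steps (all names are illustrative)."""
    model = {}  # (target signal, forward road image) -> compensation signal
    for _ in range(max_iters):                      # training end: iteration cap
        target = random.choice(target_signals)      # select a target signal
        image = random.choice(road_images)          # select a forward road image
        comp = random.choice(compensations)         # select a compensation signal
        compensated = pid(target) + comp            # compensate the PID control signal
        # Establish the correspondence only if the simulated motion is smooth.
        if is_smooth(simulate(compensated)):
            model[(target, image)] = comp
    return model
```

The returned mapping plays the role of the motion model's characterized correspondences.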
In some embodiments, the method further comprises: in response to determining that the training end condition is not satisfied, continuing to perform the training step.
In some embodiments, the forward road image in the target forward road image set is an image of a road having a slope; inputting the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, wherein the signal for compensation comprises: in response to determining that the selected forward road image is an image of a forward road having a slope, a target signal and the forward road image are input to a pre-trained motion model, resulting in a signal for compensation.
In some embodiments, the motion smoothing condition comprises at least one of: the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold; the variance of the speed of the target vehicle is less than or equal to a preset variance threshold; the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
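The motion smoothing condition listed above can be checked as follows. The patent allows any subset of the three conditions; here all three are checked together for illustration, and the threshold values are assumptions.

```python
def motion_is_smooth(speeds, accelerations, dt,
                     max_speed=30.0, max_variance=4.0, max_jerk=2.0):
    """Check the three smoothing conditions; threshold values are illustrative."""
    mean = sum(speeds) / len(speeds)
    variance = sum((v - mean) ** 2 for v in speeds) / len(speeds)
    # Acceleration change rate between consecutive samples.
    jerks = [abs(accelerations[i + 1] - accelerations[i]) / dt
             for i in range(len(accelerations) - 1)]
    return (max(speeds) <= max_speed
            and variance <= max_variance
            and all(j <= max_jerk for j in jerks))
```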
In some embodiments, the positive feedback of the target PID controller includes a forward road image of a road having a slope, and the negative feedback of the target PID controller includes a forward road image of a flat road.
In some embodiments, the forward road image is an image of the forward road of the controlled device on a spiral ramp.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: a first acquisition unit configured to acquire a target signal, wherein the target signal is indicative of the motion of a controlled device; an input unit configured to input the acquired target signal into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between target signals and compensation signals; a second acquisition unit configured to acquire a control signal generated by the target proportional-integral-derivative (PID) controller for the acquired target signal; a first generating unit configured to compensate the control signal using the obtained compensation signal to generate a compensated signal; and a second generating unit configured to generate, based on the compensated signal, a signal for instructing the controlled device to move.
In some embodiments, the target signal indicates a speed of the controlled device, the controlled device is a vehicle, and the compensation signal indicates an acceleration of the controlled device and a duration for which the controlled device moves at the indicated acceleration.
In some embodiments, the input unit includes: the device comprises an acquisition module, a display module and a control module, wherein the acquisition module is configured to acquire a forward road image of the controlled device, and the forward road image is an image of a road in the moving direction of the controlled device; and the input module is configured to input the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is used for representing the corresponding relation among the target signal, the forward road image and the signal for compensation.
In some embodiments, the motion model is trained by: acquiring a target signal set, a target forward road image set, and a target compensation signal set; and, with a reinforcement learning algorithm, performing the following training steps to learn the generation behavior of the compensated signal: selecting a target signal from the target signal set; selecting a forward road image from the target forward road image set; selecting a compensation signal from the target compensation signal set; compensating, with the selected compensation signal, the control signal corresponding to it to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by the target proportional-integral-derivative controller for the selected target signal; determining whether a target vehicle, moving at the acceleration indicated by the selected compensated signal for the indicated duration, satisfies a predetermined motion smoothing condition; in response to determining that the motion smoothing condition is satisfied, establishing a correspondence between the selected target signal, the selected forward road image, and the selected compensation signal; determining whether a preset training end condition is met; and, in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence.
In some embodiments, the apparatus further comprises: a training continuation unit configured to continue to perform the training step in response to determining that the training end condition is not satisfied.
In some embodiments, the forward road image in the target forward road image set is an image of a road having a slope; and the input module includes: an input sub-module configured to input the target signal and the forward road image to a pre-trained motion model in response to determining that the selected forward road image is an image of a forward road having a slope, resulting in a signal for compensation.
In some embodiments, the motion smoothing condition comprises at least one of: the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold; the variance of the speed of the target vehicle is less than or equal to a preset variance threshold; the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
In some embodiments, the positive feedback of the target PID controller includes a forward road image of a road having a slope, and the negative feedback of the target PID controller includes a forward road image of a flat road.
In some embodiments, the forward road image is an image of the forward road of the controlled device on a spiral ramp.
In a third aspect, an embodiment of the present disclosure provides an electronic device for generating information, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for generating information as described above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium for generating information, on which a computer program is stored, which when executed by a processor, implements the method of any of the embodiments of the method for generating information as described above.
In the method and apparatus for generating information provided by embodiments of the present disclosure, a target signal indicating the motion of a controlled device is acquired, and the acquired target signal is input to a pre-trained motion model to obtain a compensation signal, where the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between target signals and compensation signals. A control signal generated by a target PID controller for the acquired target signal is then acquired, the control signal is compensated with the obtained compensation signal to generate a compensated signal, and finally a signal instructing the controlled device to move is generated based on the compensated signal. Feedforward compensation of the PID controller is thus realized through reinforcement learning, which makes the motion of the controlled device more robust, widens the range of applicable scenarios, improves the precision of the generated signal for instructing the controlled device to move, and helps control the controlled device more accurately.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for generating information, according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for generating information according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating information according to the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for generating information or an apparatus for generating information to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, a server 103, a network 104, and a controlled device 105. The network 104 is used to provide a medium of communication links between the terminal devices 101, 102, the server 103, and the controlled device 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, the server 103, the controlled device 105 may interact through the network 104 to receive or transmit data (e.g., signals indicating the movement of the controlled device), and the like. The terminal devices 101 and 102 may have various communication client applications installed thereon, such as a device control application, an image processing application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal apparatuses 101 and 102 may be hardware or software. When the terminal devices 101, 102 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101 and 102 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 103 may be a server that provides various services, such as a background server that controls the movement of the controlled device 105. The backend server may perform processing such as calculation on the received data (e.g., target signal) and feed back the processing result (e.g., control signal for instructing the controlled device to perform motion) to the controlled device 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
The controlled device 105 is a device subject to control. It may be controlled by control instructions sent by the terminal devices 101 and 102 or the server 103, or by a controller or software installed in the controlled device 105 itself. By way of example, the controlled device 105 may include, but is not limited to, any of the following: a vehicle, a temperature control device, a pressure control device, a flow control device, a fluid level control device, and the like. After acquiring a control signal, the controlled device 105 may move as instructed by the signal. It should be noted that motion here refers not only to displacement but also to operation. For example, a temperature control device may operate to reach a desired temperature, realizing temperature regulation; a vehicle may move to reach a desired speed, realizing speed control. This process may be referred to as the controlled device moving as instructed by the control signal to achieve the desired control amount.
It should be noted that the method for generating information provided by the embodiment of the present disclosure may be executed by the server 103, or may be executed by the terminal devices 101 and 102, or may be executed by the controlled device 105; accordingly, the means for generating information may be provided in the server 103, in the terminal devices 101 and 102, or in the controlled device 105.
It should be understood that the number of terminal devices, networks, servers, and controlled devices in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, and controlled devices, as desired for implementation. For example, the system architecture may only include the electronic device on which the method for generating information operates, when the electronic device on which the method for generating information operates does not require data transfer with other electronic devices.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present disclosure is shown. The method for generating information comprises the following steps:
step 201, a target signal is acquired.
In this embodiment, an execution body of the method for generating information (for example, the server, a terminal device, or the controlled device shown in fig. 1) may obtain the target signal from other electronic devices, or locally, through a wired or wireless connection. The target signal is indicative of the motion of the controlled device, i.e., the device subject to control. By way of example, the controlled device may include, but is not limited to, any of the following: a vehicle, a temperature control device, a pressure control device, a flow control device, a fluid level control device, and the like. It should be noted that motion here refers not only to displacement but also to operation. For example, a temperature control device may operate to reach a desired temperature, realizing temperature regulation; a vehicle may move to reach a desired speed, realizing speed control. This process may be referred to as the controlled device moving as instructed by the control signal to achieve the desired control amount.
It will be appreciated that when the executing subject is a controlled device, the target signal acquired by the executing subject can be used to indicate the motion of the executing subject.
Specifically, the above target signal may be a signal for instructing the controlled device to change speed (e.g., increase speed or decrease speed), may be a signal for instructing the controlled device to change temperature (e.g., increase temperature or decrease temperature), may be a signal for instructing the controlled device to change flow (e.g., increase flow or decrease flow), may be a signal for instructing the controlled device to change pressure (e.g., increase pressure or decrease pressure), or the like. It will be appreciated that the target signal may be any signal for controlling a controlled device.
Step 202, inputting the acquired target signal to a pre-trained motion model to obtain a signal for compensation.
In this embodiment, the executing entity may input the target signal obtained in step 201 to a pre-trained motion model to obtain a signal for compensation. The motion model is obtained by training through a reinforcement learning algorithm and is used for representing the corresponding relation between the target signal and the signal for compensation.
In this embodiment, the compensation signal can be used to reduce the motion deviation of the controlled device. As an example, assume that the controlled device is a vehicle whose current travel speed is 20 km/h and the target speed of the vehicle indicated by the target signal is 30 km/h. Wherein the target speed is the speed to be reached by the vehicle. In this case, the actual speed of the vehicle is often deviated from the target speed due to the influence of factors such as the resistance of the vehicle. And the above-mentioned deviation can be reduced to some extent by the compensation signal.
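The vehicle example above can be made concrete with a toy plant model. The 10% "drag" factor and the feedforward arithmetic are illustrative assumptions, not the patent's numbers.

```python
def actual_speed(commanded_speed, drag=0.1):
    """Toy plant: resistance makes the vehicle reach only part of the
    commanded speed (the 10% drag factor is an illustrative assumption)."""
    return commanded_speed * (1 - drag)

target = 30.0                                     # km/h, per the target signal
uncompensated = actual_speed(target)              # 27.0 km/h: deviates from 30
feedforward = target / (1 - 0.1) - target         # compensation term, about 3.33
compensated = actual_speed(target + feedforward)  # 30.0 km/h: deviation removed
```

The compensation signal plays the role of `feedforward` here: it shifts the command so that the achieved value matches the target despite the plant's losses.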
Here, the motion model may be obtained by the execution body, or an electronic device communicatively connected to it, through training with a Monte Carlo reinforcement learning algorithm, a temporal-difference (TD) reinforcement learning algorithm, or a Q-learning algorithm.
As an example, the motion model may be obtained by training the execution subject, or an electronic device communicatively connected to the execution subject, through the following steps:
first, a target signal set and a compensation signal set are acquired. Here, the above-described target signal may be used to indicate a target control amount (e.g., a control amount desired to be achieved by the controlled device, such as a speed, a temperature, etc. desired to be achieved by the controlled device). The compensation signals in the compensation signal set may be all signals that can be used to reduce the deviation resulting from the operation of the controlled device.
Then, with the reinforcement learning algorithm, the following training steps (including the first step to the seventh step) are performed to learn the generation behavior of the compensated signal:
in the first step, a target signal is selected from the acquired target signal set.
Here, the target signal may be selected from the acquired target signal set in various ways. For example, randomly, or in a particular order.
And a second step of selecting a compensation signal from the compensation signal set.
And thirdly, compensating the control signal corresponding to the selected compensation signal with the selected compensation signal to generate a compensated signal. The control signal corresponding to the selected compensation signal is generated by the target PID controller for the selected target signal. The target PID controller may be a proportional-integral-derivative controller used to generate control signals for controlling the controlled device.
For example, if the selected compensation signal indicates positive compensation, the execution body may add the control amount indicated by the selected compensation signal to the control amount indicated by the corresponding control signal to obtain the compensated signal; the control amount indicated by the compensated signal is the sum of the two. If the selected compensation signal indicates negative compensation, the execution body may obtain the compensated signal by taking the difference between the control amount indicated by the selected compensation signal and the control amount indicated by the corresponding control signal.
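The compensation rule above amounts to simple arithmetic on control amounts. The translated description's operand order for negative compensation is ambiguous; as an assumption, this sketch subtracts the compensation amount from the control amount.

```python
def compensate(control_amount, compensation_amount, positive=True):
    """Apply the compensation rule: a sum for positive compensation, a
    difference for negative compensation. The operand order in the negative
    case is an assumption (the translated description is ambiguous)."""
    if positive:
        return control_amount + compensation_amount
    return control_amount - compensation_amount
```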
And fourthly, determining whether the target vehicle meets a preset condition or not in a state that the target vehicle moves according to the target control quantity indicated by the selected compensated signal. Wherein, the preset condition may include but is not limited to at least one of the following: the movement speed is less than a preset speed threshold; a frequency of motion less than a preset frequency threshold, and so on.
And a fifth step of establishing a correspondence between the selected target signal and the selected signal for compensation in response to determining that the preset condition is satisfied.
Here, when the preset condition is satisfied, a reward value for the established correspondence may be determined (for example, a reward value at each time step may be calculated from a variable), and the motion model may be trained with the objective of maximizing the total reward obtained. During training, the correspondence between target signals and compensation signals may be established by determining the transition probability between each target signal in the target signal set and each compensation signal in the compensation signal set.
And sixthly, determining whether a preset training ending condition is met. Wherein, the training end condition may include, but is not limited to, at least one of the following: the training times reach or exceed the preset times; the training time reaches or exceeds the preset time length; the function value of the predetermined loss function is smaller than a preset threshold value, and so on.
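The training end condition can be expressed as a single test over the listed criteria, any one of which suffices. The threshold values below are illustrative assumptions.

```python
def training_should_end(iterations, elapsed_seconds, loss,
                        max_iters=10000, max_seconds=3600.0,
                        loss_threshold=1e-3):
    """End-of-training test: any single listed condition suffices
    (threshold values are illustrative assumptions)."""
    return (iterations >= max_iters          # training count reached the cap
            or elapsed_seconds >= max_seconds  # training time reached the cap
            or loss < loss_threshold)        # loss fell below the threshold
```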
And a seventh step of generating a motion model characterizing the established at least one correspondence in response to determining that the training end condition is satisfied.
Here, the motion model may be characterized by a probability between each target signal in the set of target signals and a transition of a respective compensation signal in the set of compensation signals, whereby the respective established correspondence may be characterized.
And responding to the determination that the training end condition is not met, and continuing to execute the training steps.
It is understood that the process of performing the training step is the process of adjusting the probabilities in the Q-table. When the second step is executed for the first time or the first few times, a greedy algorithm can be adopted to select a compensation signal from the compensation signal set; as the number of times the second step is performed increases, the compensation signal having the maximum probability corresponding to the selected target signal may be selected from the compensation signal set.
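The Q-table selection strategy described above can be sketched as follows; the table layout, the signal encoding, and the ε-greedy exploration schedule are illustrative assumptions rather than details from the embodiment:

```python
import random

def select_compensation(q_table, target_signal, compensation_set, epsilon):
    """Epsilon-greedy selection over a Q-table.

    Early in training epsilon is high, so compensation signals are mostly
    explored at random; as training proceeds epsilon decays and the entry
    with the maximum value for the selected target signal is chosen.
    q_table maps (target_signal, compensation_signal) -> estimated value.
    """
    if random.random() < epsilon:
        # Exploration: pick any compensation signal from the set.
        return random.choice(compensation_set)
    # Exploitation: pick the compensation signal with the maximum Q-value.
    return max(compensation_set,
               key=lambda c: q_table.get((target_signal, c), 0.0))

q = {("v=5", "+0.2"): 0.8, ("v=5", "-0.2"): 0.1}
best = select_compensation(q, "v=5", ["+0.2", "-0.2"], epsilon=0.0)
```

With `epsilon=0.0` the call always exploits, returning the highest-valued compensation signal for the given target signal.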
In step 203, a control signal generated by the target pid controller for the acquired target signal is acquired.
In this embodiment, the execution body may acquire a control signal generated by the target pid controller with respect to the acquired target signal.
Here, the control signal may be the signal that the target pid controller outputs after the target signal is input to it. Specifically, a difference operation is first performed between the control amount indicated by the target signal and the control amount indicated by the signal on the feedback loop of the target pid controller, and a signal indicating the result of the difference operation is then input to the target pid controller.
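The difference-then-control loop described here can be sketched as a minimal discrete PID controller; the gains, time step, and speed values below are assumptions for illustration only:

```python
class PID:
    """Minimal discrete PID controller sketch (gains are illustrative)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # error = control amount of the target signal minus the
        # control amount of the feedback-loop signal.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.2, ki=0.1, kd=0.05, dt=0.1)
target_speed, measured_speed = 5.0, 4.0      # m/s, assumed values
control_signal = pid.step(target_speed - measured_speed)
```

The `step` input is exactly the difference operation described above; its return value plays the role of the control signal.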
And 204, compensating the control signal by using the obtained compensation signal to generate a compensated signal.
In this embodiment, the execution body may compensate the control signal using the obtained compensation signal, and generate a compensated signal. Wherein the control signal corresponding to the selected compensation signal is generated for the selected target signal by the target pid controller. The target pid controller may be a pid controller for generating a control signal for controlling the controlled device.
Here, if the selected compensation signal indicates positive compensation, the execution body may add the control amount indicated by the selected compensation signal to the control amount indicated by the corresponding control signal to obtain the compensated signal; the control quantity indicated by the compensated signal is then the sum of the two. If the selected compensation signal indicates negative compensation, the execution body may subtract the magnitude of the control amount indicated by the selected compensation signal from the control amount indicated by the corresponding control signal to obtain the compensated signal; the control quantity indicated by the compensated signal is then the difference between the control quantity indicated by the corresponding control signal and that indicated by the compensation signal.
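A minimal sketch of this positive/negative compensation arithmetic, assuming scalar control amounts and taking the magnitude of the compensation in the negative case:

```python
def apply_compensation(control_amount, compensation_amount, positive=True):
    """Combine the PID control amount with the compensation amount.

    Positive compensation adds the compensation amount; negative
    compensation subtracts its magnitude (an interpretation of the
    scheme above, not a definitive implementation).
    """
    if positive:
        return control_amount + compensation_amount
    return control_amount - abs(compensation_amount)

compensated = apply_compensation(2.0, 0.5, positive=True)   # 2.5
```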
In some alternative implementations of this embodiment, the target signal is indicative of a speed of the controlled device, the controlled device is a vehicle, and the compensation signal is indicative of an acceleration of the controlled device and a time at which the controlled device is moving in accordance with the indicated acceleration.
In some optional implementations of this embodiment, in a case that the controlled device is a vehicle, the executing body may further execute the step 202 by:
first, a forward road image of a vehicle (i.e., a controlled device) is acquired. Wherein the forward road image is an image of a road located in the moving direction of the controlled apparatus. For example, the forward road image may be an image of a road in the moving direction of the controlled device when the controlled device is traveling on a straight slope.
Here, the forward road image may be captured by the vehicle or may be captured by a road camera.
Then, the acquired target signal and the acquired forward road image are input to a pre-trained motion model to obtain a signal for compensation. The motion model is used for representing the corresponding relation among the target signal, the forward road image and the compensation signal.
Here, the motion model may be obtained by the execution body, or by an electronic device communicatively connected to the execution body, through training with a reinforcement learning algorithm such as Monte Carlo learning, Temporal-Difference (TD) learning, or Q-learning.
In some optional implementations of this embodiment, the motion model may be obtained by training the execution subject or an electronic device communicatively connected to the execution subject by:
first, a target signal set, a target forward road image set, and a target compensation signal set are acquired.
Here, the target signal may be used to indicate a target speed (i.e., a desired speed) of the vehicle. The target forward road image may be an image of a flat forward road or an image of a forward road with a slope. The compensation signals in the compensation signal set may be any signals usable to reduce the deviation arising during the operation of the controlled device described above.
Then, with the reinforcement learning algorithm, the following training steps (including the first step to the eighth step) are performed to learn the generation behavior of the compensated signal:
in the first step, a target signal is selected from a set of target signals.
Here, the target signal may be selected from the acquired target signal set in various ways. For example, randomly, or in a particular order.
And secondly, selecting a forward road image from the target forward road image set.
And thirdly, selecting a compensation signal from the target compensation signal set.
And fourthly, compensating the control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal. Wherein the control signal corresponding to the selected compensation signal is generated for the selected target signal by the target pid controller.
For example, if the selected compensation signal indicates positive compensation, the execution body may add the acceleration indicated by the selected compensation signal to the acceleration indicated by the corresponding control signal to obtain the compensated signal; the acceleration indicated by the compensated signal is then the sum of the two. If the selected compensation signal indicates negative compensation, the execution body may subtract the magnitude of the acceleration indicated by the selected compensation signal from the acceleration indicated by the corresponding control signal to obtain the compensated signal; the acceleration indicated by the compensated signal is then the difference between the acceleration indicated by the corresponding control signal and that indicated by the compensation signal.
And fifthly, determining whether the target vehicle satisfies a predetermined motion smoothing condition in a state where the target vehicle moves according to the acceleration and the motion time indicated by the selected compensated signal. The motion smoothing condition may include, but is not limited to, at least one of the following: the movement speed is less than a preset speed threshold; the motion frequency is less than a preset frequency threshold; and so on.
And sixthly, establishing a corresponding relation among the selected target signal, the selected forward road image and the selected compensation signal in response to the fact that the motion smoothing condition is met.
Here, in the case where the above-described motion smoothing condition is satisfied, a reward value of the established correspondence may be determined (for example, a reward value at each time step may be calculated using a variable), and the motion model may be trained with the goal of maximizing the obtained total reward. During training, the correspondence between target signals and compensation signals may be established by determining the transition probability between each target signal in the above-described target signal set and each compensation signal in the above-described compensation signal set.
And step seven, determining whether a preset training end condition is met. Wherein, the training end condition may include, but is not limited to, at least one of the following: the training times reach or exceed the preset times; the training time reaches or exceeds the preset time length; the function value of the predetermined loss function is smaller than a preset threshold value, and so on.
And an eighth step of generating a motion model characterizing the established at least one correspondence in response to determining that the training end condition is satisfied.
Here, the motion model may be characterized by the transition probabilities between each target signal in the target signal set and each compensation signal in the compensation signal set, thereby characterizing each established correspondence.
In response to determining that the training end condition is not satisfied, continuing to perform the training step.
It is understood that the process of performing the training step is the process of adjusting the probabilities in the Q-table. When the second step is executed for the first time or the first few times, a greedy algorithm can be adopted to select a compensation signal from the compensation signal set; as the number of times the second step is performed increases, the compensation signal having the maximum probability corresponding to the selected target signal may be selected from the compensation signal set.
And step 205, generating a signal for indicating the controlled equipment to move based on the compensated signal.
In this embodiment, based on the compensated signal obtained in step 204, the executing body may generate a signal for instructing the controlled device to perform a motion.
Here, the execution body may directly determine the compensated signal as the signal it generates for instructing the controlled device to perform a motion; alternatively, the compensated signal may be used as the input of a predetermined transfer function, and the resulting output of the transfer function may be determined as the signal for instructing the controlled device to perform a motion. For example, the transfer function may characterize a correspondence between the acceleration of the vehicle (i.e., the controlled device) and the vehicle's accelerator and brake. Thus, in the case where the compensated signal indicates the acceleration of the vehicle, a signal indicating a control amount for the vehicle's accelerator and brake may be obtained by using the compensated signal as the input of the transfer function.
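Such a transfer function might look like the following toy sketch; the linear form, the gain, and the clamping range are assumptions, not details of the embodiment:

```python
def acceleration_to_actuation(acceleration, gain=20.0):
    """Toy transfer function mapping a desired acceleration (m/s^2) to an
    accelerator (positive) or brake (negative) command in percent.

    The linear mapping and gain of 20 %/(m/s^2) are purely illustrative.
    """
    command = max(-100.0, min(100.0, gain * acceleration))
    return {"throttle": max(command, 0.0), "brake": max(-command, 0.0)}

out = acceleration_to_actuation(1.0)    # accelerate
stop = acceleration_to_actuation(-2.0)  # brake
```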
In some optional implementations of the embodiment, the forward road image in the target forward road image set is an image of a road having a slope. Thus, for inputting the acquired target signal and the acquired forward road image to a pre-trained motion model, obtaining a signal for compensation may include: in a case where it is determined that the selected forward road image is an image of a forward road having a slope, the executing body may input the target signal and the forward road image to a motion model trained in advance, to obtain a signal for compensation.
It can be understood that the training of the motion model in this alternative implementation is substantially similar to the training manner described above, and is not repeated here. It should be understood that training the motion model with images of roads having slopes can improve the motion precision of the controlled device, make the motion of the controlled device more robust, and widen the application range.
In some optional implementations of the embodiment, the motion smoothing condition includes at least one of: the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold; the variance of the speed of the target vehicle is less than or equal to a preset variance threshold; the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
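The motion smoothing conditions listed above can be checked as in the following sketch, with all thresholds and sampling assumptions chosen for illustration:

```python
def motion_is_smooth(speeds, accelerations, dt,
                     max_speed=10.0, max_speed_variance=0.5, max_jerk=1.0):
    """Check the three smoothing conditions (thresholds are assumed):
    maximum speed, speed variance, and acceleration change rate.

    speeds and accelerations are per-timestep samples of the target
    vehicle; dt is the sampling interval in seconds.
    """
    mean = sum(speeds) / len(speeds)
    variance = sum((v - mean) ** 2 for v in speeds) / len(speeds)
    # Acceleration change rate (jerk) between consecutive samples.
    jerks = [abs(accelerations[i + 1] - accelerations[i]) / dt
             for i in range(len(accelerations) - 1)]
    return (max(speeds) <= max_speed
            and variance <= max_speed_variance
            and (not jerks or max(jerks) <= max_jerk))
```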
It can be understood that by setting the above-described motion smoothing condition, the motion of the controlled device can be made smoother, the hardware loss of the controlled device can be reduced, and when the controlled device is a vehicle and the vehicle is running on a forward road with a slope, the comfort of the passengers in the vehicle can be improved.
In some optional implementations of the embodiment, the positive feedback of the target pid controller includes a forward road image of a road having a slope, and the negative feedback of the target pid controller includes a forward road image of a flat road.
In this case, the parameters of the target proportional-integral-derivative controller may first be tuned for a flat road, so as to achieve low-speed, constant-speed driving on flat roads. Then, the motion model obtained by the above training is used as the feedforward compensation of the target proportional-integral-derivative controller. During the running of the vehicle, if the forward road image is an image of a road with a slope (for example, a road whose slope is greater than or equal to a preset slope threshold), the image is taken as positive feedback of the target proportional-integral-derivative controller; if the forward road image is an image of a flat road (e.g., a road whose slope is less than the preset slope threshold), the image is taken as negative feedback of the target proportional-integral-derivative controller.
In some optional implementations of this embodiment, the forward road image is an image of a forward road of the controlled device on a spiral ramp.
It can be understood that when the forward road image is an image of the forward road of the controlled device on a spiral ramp and the controlled device is a vehicle, the robustness of the vehicle traveling on the spiral ramp can be improved, so that the method can be applied to the low-speed (generally, less than 10 kilometers per hour) driving scenario of valet parking.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 3, a vehicle arithmetic unit (i.e., the execution subject) provided on the vehicle acquires a target signal 3001, wherein the target signal 3001 is used for indicating the movement of the vehicle movement unit 303 (i.e., the controlled device). Then, the vehicle arithmetic unit inputs the acquired target signal 3001 to the previously trained motion model 301 to obtain the compensation signal 3002. The motion model 301 is a model obtained by training with a reinforcement learning algorithm, and the motion model 301 is used for representing the corresponding relationship between the target signal and the compensation signal. Then, the vehicle operation unit acquires a control signal 3003 generated by the proportional-integral-derivative controller 302 with respect to the acquired target signal 3001. Subsequently, the vehicle arithmetic unit compensates the control signal 3003 using the obtained compensation signal 3002, and generates a compensated signal. Finally, the vehicle arithmetic unit generates a signal 3004 for instructing the vehicle moving unit 303 to move, based on the compensated signal. The compensated signal is input to a predetermined transfer function, for example, resulting in a signal 3004 indicating that the vehicle movement unit 303 is moving.
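The data flow of the fig. 3 scenario can be sketched end to end as follows; every component below is a hypothetical stub standing in for the trained motion model, the proportional-integral-derivative controller, and the transfer function, with all numbers assumed:

```python
# Hypothetical end-to-end flow of fig. 3: model -> compensation signal,
# PID -> control signal, compensate, then map to an actuation signal.

def motion_model(target_signal):
    """Stub for the trained reinforcement learning motion model."""
    return 0.3                            # compensation: extra acceleration

def pid_control(target_signal, feedback):
    """Stub for the target PID controller (proportional term only)."""
    return 1.5 * (target_signal - feedback)

target, feedback = 5.0, 4.0                       # target signal 3001
compensation = motion_model(target)               # compensation signal 3002
control = pid_control(target, feedback)           # control signal 3003
compensated = control + compensation              # positive compensation
actuation = min(100.0, 20.0 * compensated)        # signal 3004 via transfer fn
```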
The method provided by the above embodiment of the present disclosure acquires a target signal, where the target signal is used to indicate the motion of a controlled device; inputs the acquired target signal to a pre-trained motion model to obtain a compensation signal, where the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between target signals and compensation signals; acquires the control signal generated by the target pid controller for the acquired target signal; compensates the control signal using the obtained compensation signal to generate a compensated signal; and finally generates, based on the compensated signal, a signal for instructing the controlled device to move. The feedforward compensation of the pid controller is thus obtained by reinforcement learning, which makes the motion of the controlled device more robust and the application range wider, improves the precision of the generated signal for instructing the controlled device to move, and enables more accurate control of the controlled device.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
step 401, a target signal is acquired.
In this embodiment, an execution subject of the method for generating information (for example, a server, a terminal device, or a controlled device shown in fig. 1) may obtain the target signal from other electronic devices, or locally, by a wired connection manner or a wireless connection manner. Wherein the target signal is indicative of a speed of the vehicle.
Step 402, a forward road image of a vehicle is acquired.
In this embodiment, the execution subject may acquire a forward road image of the vehicle. The forward road image may be an image of a road located in a moving direction of the vehicle while the vehicle is traveling on a spiral curve. Here, the forward road image may be captured by the vehicle or may be captured by a road camera.
Step 403, in response to determining that the acquired forward road image is an image of a forward road with a slope, inputting the target signal and the forward road image into a pre-trained motion model to obtain a signal for compensation.
In this embodiment, in a case where it is determined that the acquired forward road image is an image of a forward road having a slope, the executing body may input the target signal acquired in step 401 and the forward road image acquired in step 402 to a pre-trained motion model to obtain a signal for compensation. The motion model is used for representing the corresponding relation among the target signal, the forward road image and the compensation signal. And the motion model is obtained by adopting a reinforcement learning algorithm for training. The signal for compensation may be used to indicate the acceleration of the vehicle and the time at which the vehicle is moving according to the indicated acceleration.
As an example, the motion model may be obtained by the execution body, or by an electronic device communicatively connected to the execution body, through training with the Q-learning reinforcement learning algorithm. Specifically, when constructing the Q-function for reinforcement learning, the forward road image may be used as an input, so that the slope of the current road is learned from the forward road image. The resistance experienced by the vehicle is then determined based on the slope, so as to determine the acceleration of the vehicle and the time for which the vehicle moves at the indicated acceleration, thereby obtaining the compensation signal. It can be understood that after an image of a flat road is input to the trained motion model described above, the resulting slope of the current road may be 0, and thus the acceleration indicated by the resulting compensation signal may be 0.
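The slope-to-compensation step might be sketched as follows, assuming the slope angle has already been estimated from the forward road image (the perception step is omitted) and ignoring rolling and air resistance:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def slope_compensation(slope_deg, motion_time=1.0):
    """Given an assumed road slope in degrees, compute the extra
    acceleration needed to counter the gravity component along the road,
    and the time for which to apply it.

    A simplified physics sketch: resistance other than gravity is ignored.
    """
    extra_acceleration = G * math.sin(math.radians(slope_deg))
    return extra_acceleration, motion_time

flat_acc, _ = slope_compensation(0.0)   # flat road -> zero compensation
```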
In step 404, a control signal generated by the target pid controller for the acquired target signal is acquired.
In this embodiment, the execution body may acquire a control signal generated by the target pid controller with respect to the acquired target signal.
Here, it may be necessary to acquire forward road images captured on a flat road as negative feedback, and forward road images captured while the vehicle travels uphill and downhill as positive feedback. The parameters of the target proportional-integral-derivative controller are tuned for flat roads, so as to achieve low-speed, constant-speed driving on flat roads.
In step 405, the control signal is compensated using the obtained compensation signal, and a compensated signal is generated.
In this embodiment, the execution body may further compensate the control signal using the obtained compensation signal to generate a compensated signal.
Based on the compensated signal, a signal indicative of the vehicle moving is generated, step 406.
In this embodiment, the executing body may further generate a signal for instructing the vehicle to move based on the compensated signal generated in step 405.
In this embodiment, the steps 404 to 406 are substantially the same as the steps 203 to 205 in the corresponding embodiment of fig. 2, and are not described herein again.
It can be understood that the motion model of the embodiments of the present disclosure can learn to drive the vehicle through a spiral ramp by using the forward image of a flat road as normal negative feedback and the forward image taken while going uphill or downhill as normal positive feedback. It should be noted that, for the feedback of a downhill slope, a fixed larger constant value may be added. This constant may differ from the value indicated by the uphill feedback by orders of magnitude: for example, if the value indicated by the uphill feedback is a single-digit number (e.g., 5), then the fixed larger constant may be a 3-digit or 4-digit number (e.g., 1000). It is readily appreciated that the positive feedback of uphill and downhill slopes can thus be distinguished by this fixed, relatively large constant. The Q-function is trained through this strategy, and the strategy is continuously improved, thereby realizing an optimized control strategy.
Further, the control signal may be compensated using the obtained compensation signal as follows. Because the compensation output by reinforcement learning on a flat road is normal negative compensation, it may be left unprocessed; when the compensation output by the motion model is normal positive compensation, the compensated signal may be the sum of the output of the target pid controller and the feedforward compensation (i.e., the compensation signal output by the motion model); when the compensation output by the motion model is a large negative compensation, the output of the system is the difference between the output of the target pid controller and the absolute value of the compensation (i.e., the result obtained by subtracting the fixed large constant value from the downhill feedback).
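One possible reading of this three-case combination, assuming the downhill compensation is encoded by adding the fixed large constant (e.g., 1000) to its magnitude:

```python
DOWNHILL_OFFSET = 1000.0  # fixed large constant marking downhill feedback

def combine(pid_output, compensation):
    """Three-case combination sketched from the description above.

    ~0 compensation -> flat road, output unchanged; a small positive
    value -> uphill, add it; a value at or above the offset -> downhill,
    recover the magnitude by subtracting the offset, then subtract it.
    All encodings here are interpretive assumptions.
    """
    if compensation >= DOWNHILL_OFFSET:       # downhill: recover magnitude
        return pid_output - abs(compensation - DOWNHILL_OFFSET)
    if compensation > 0:                      # uphill: normal positive comp.
        return pid_output + compensation
    return pid_output                         # flat road: leave unchanged
```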
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating information in the present embodiment highlights the step of determining the feedforward compensation of the pid controller by using the motion model obtained with the reinforcement learning algorithm, so that the vehicle can travel on a spiral ramp. Therefore, the scheme described in this embodiment can achieve more accurate vehicle control through the strategy adopted in training the motion model. For example, the running speed of the vehicle may be controlled to be less than a preset threshold, or the acceleration change rate of the vehicle may be controlled to be less than a preset change-rate threshold, so that the running of the vehicle is smoother (e.g., closer to a uniform speed), thereby improving the riding comfort of the vehicle passengers. In addition, the motion model of the embodiment of the present disclosure can learn to drive the vehicle through a spiral ramp, and can therefore be applied to the application scenario of valet parking.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides an embodiment of an apparatus for generating information, the apparatus embodiment corresponding to the method embodiment illustrated in fig. 2, which may include the same or corresponding features as the method embodiment illustrated in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for generating information of the present embodiment includes: a first acquisition unit 501, an input unit 502, a second acquisition unit 503, a first generation unit 504, and a second generation unit 505. Wherein the first acquiring unit 501 is configured to acquire a target signal, wherein the target signal is used for indicating the motion of the controlled device; the input unit 502 is configured to input the acquired target signal to a pre-trained motion model, which is a model trained by a reinforcement learning algorithm, to obtain a signal for compensation, where the motion model is used to represent a corresponding relationship between the target signal and the signal for compensation; the second acquisition unit 503 is configured to acquire a control signal generated by the target pid controller for the acquired target signal; the first generating unit 504 is configured to compensate the control signal using the obtained compensation signal, and generate a compensated signal; the second generating unit 505 is configured to generate a signal for instructing the controlled device to perform a motion based on the compensated signal.
In this embodiment, the first obtaining unit 501 of the apparatus 500 for generating information may obtain the target signal from other electronic devices, or locally, through a wired or wireless connection. Wherein the target signal is used to indicate the motion of the controlled device. By way of example, the controlled device may include, but is not limited to, any of the following: a vehicle, a temperature control device, a pressure control device, a flow control device, a fluid level control device, and the like.
In this embodiment, the input unit 502 may input the target signal acquired by the first acquiring unit 501 to a pre-trained motion model to obtain a signal for compensation. The motion model is obtained by training through a reinforcement learning algorithm and is used for representing the corresponding relation between the target signal and the signal for compensation.
In this embodiment, the second acquiring unit 503 may acquire a control signal generated by the target pid controller with respect to the acquired target signal.
In this embodiment, the first generating unit 504 may compensate the control signal using the obtained compensation signal, and generate a compensated signal. Wherein the control signal corresponding to the selected compensation signal is generated for the selected target signal by the target pid controller. The target pid controller may be a pid controller for generating a control signal for controlling the controlled device.
In this embodiment, the second generating unit 505 may generate a signal for instructing the controlled apparatus to perform a motion based on the compensated signal obtained by the first generating unit 504.
In some alternative implementations of this embodiment, the target signal is indicative of a speed of the controlled device, the controlled device is a vehicle, and the compensation signal is indicative of an acceleration of the controlled device and a time at which the controlled device is moving in accordance with the indicated acceleration.
In some optional implementations of the present embodiment, the input unit 502 includes: an acquisition module (not shown in the figure) is configured to acquire a forward road image of the controlled device, wherein the forward road image is an image of a road located in a moving direction of the controlled device; the input module (not shown in the figure) is configured to input the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is used for representing the corresponding relationship among the target signal, the forward road image and the signal for compensation.
In some optional implementations of this embodiment, the motion model is obtained by training through the following steps: acquiring a target signal set, a target forward road image set and a signal set for target compensation; and adopting a reinforcement learning algorithm to execute the following training steps to learn the generation behavior of the compensated signal: selecting a target signal from a target signal set; selecting a forward road image from a target forward road image set; selecting a compensation signal from a target compensation signal set; compensating a control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by a target proportional-integral-derivative controller aiming at the selected target signal; determining whether the target vehicle satisfies a predetermined motion smoothing condition in a state where the target vehicle moves according to the acceleration indicated by the selected compensated signal and the indicated time of the movement; establishing a correspondence between the selected target signal, the selected forward road image and the selected signal for compensation in response to determining that the motion smoothing condition is satisfied; determining whether a preset training end condition is met; in response to determining that the training end condition is satisfied, a motion model characterizing the established at least one correspondence is generated, whereby the established respective correspondence may be characterized.
In some optional implementations of this embodiment, the apparatus 500 further includes: a training continuation unit configured to continue to perform the training step in response to determining that the training end condition is not satisfied.
In some optional implementations of the embodiment, the forward road image in the target forward road image set is an image of a road having a slope. The input module may include: an input sub-module (not shown) configured to input the target signal and the forward road image to a pre-trained motion model to obtain a compensation signal, in response to determining that the selected forward road image is an image of a forward road having a slope.
In some optional implementations of the embodiment, the motion smoothing condition includes at least one of: the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold; the variance of the speed of the target vehicle is less than or equal to a preset variance threshold; the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
In some optional implementations of the embodiment, the positive feedback of the target pid controller includes a forward road image of a road having a slope, and the negative feedback of the target pid controller includes a forward road image of a flat road.
In some optional implementations of this embodiment, the forward road image is an image of a forward road of the controlled device on a spiral ramp.
In the apparatus provided by the above embodiment of the present disclosure, the first acquisition unit 501 acquires a target signal, where the target signal is used to indicate a motion of a controlled device; the input unit 502 inputs the acquired target signal to a pre-trained motion model to obtain a compensation signal, where the motion model is trained with a reinforcement learning algorithm and characterizes the correspondence between the target signal and the compensation signal; the second acquisition unit 503 acquires a control signal generated by a target proportional-integral-derivative controller for the acquired target signal; the first generation unit 504 compensates the control signal with the obtained compensation signal to generate a compensated signal; and the second generation unit 505 generates, based on the compensated signal, a signal for instructing the controlled device to move. The feedforward of the proportional-integral-derivative controller is thereby realized through reinforcement learning, which makes the motion of the controlled device more robust, widens the range of application, improves the precision of the generated signal for instructing the controlled device to move, and helps control the controlled device more accurately.
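The signal flow through units 501-505 might be sketched as follows. The names (`PIDController`, `MotionModel`, `generate_motion_signal`) and the reduction of the motion model to a plain lookup table of learned correspondences are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch of the 501-505 signal flow; class and function
# names are hypothetical and the motion model is a simple lookup table.

class PIDController:
    """Discrete PID producing a control signal for a target speed."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target, measured, dt=0.1):
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


class MotionModel:
    """Learned (target signal -> compensation signal) correspondences."""
    def __init__(self, correspondences):
        self.correspondences = correspondences

    def compensate(self, target_speed):
        # Fall back to zero compensation for unseen target signals.
        return self.correspondences.get(target_speed, 0.0)


def generate_motion_signal(target_speed, measured_speed, model, pid):
    control = pid.step(target_speed, measured_speed)  # unit 503: PID control signal
    compensation = model.compensate(target_speed)     # unit 502: learned feedforward
    compensated = control + compensation              # unit 504: compensated signal
    return compensated                                # unit 505: basis of motion signal
```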
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for implementing an electronic device of embodiments of the present disclosure is shown. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium, other than a computer readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Python, Java, Smalltalk or C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first acquisition unit, an input unit, a second acquisition unit, a first generation unit, and a second generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the first acquisition unit may also be described as "a unit that acquires a target signal".
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target signal, wherein the target signal is used for indicating the movement of a controlled device; inputting the obtained target signal into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is a model obtained by training by adopting a reinforcement learning algorithm and is used for representing the corresponding relation between the target signal and the signal for compensation; acquiring a control signal generated by a target proportional integral derivative controller aiming at the acquired target signal; compensating the control signal by using the obtained compensation signal to generate a compensated signal; based on the compensated signal, a signal is generated for instructing the controlled device to perform a motion.
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (20)

1. A method for generating information, comprising:
acquiring a target signal, wherein the target signal is used for indicating the movement of a controlled device;
inputting the obtained target signal into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is a model obtained by training by adopting a reinforcement learning algorithm and is used for representing the corresponding relation between the target signal and the signal for compensation;
acquiring a control signal generated by a target proportional integral derivative controller aiming at the acquired target signal;
compensating the control signal by using the obtained compensation signal to generate a compensated signal;
generating a signal for instructing the controlled device to perform motion based on the compensated signal;
the motion model is obtained by training the following steps:
acquiring a target signal set and a signal set for compensation;
adopting a reinforcement learning algorithm to execute the following training steps: selecting a target signal from the acquired target signal set; selecting a compensation signal from the compensation signal set; compensating a control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by a target proportional-integral-derivative controller aiming at the selected target signal; determining whether the target vehicle meets a preset condition in a state that the target vehicle moves according to a target control quantity indicated by the selected compensated signal; establishing a corresponding relationship between the selected target signal and the selected signal for compensation in response to determining that the preset condition is satisfied; determining whether a preset training end condition is met; in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence.
2. The method of claim 1, wherein the target signal is indicative of a velocity of the controlled device, the controlled device is a vehicle, and the compensation signal is indicative of an acceleration of the controlled device and a time for which the controlled device moves according to the indicated acceleration.
3. The method of claim 2, wherein inputting the acquired target signal to a pre-trained motion model to obtain a signal for compensation comprises:
acquiring a forward road image of the controlled equipment, wherein the forward road image is an image of a road in the moving direction of the controlled equipment;
and inputting the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is used for representing the corresponding relation among the target signal, the forward road image and the signal for compensation.
4. The method of claim 3, wherein the motion model is trained by:
acquiring a target signal set, a target forward road image set and a signal set for target compensation;
and adopting a reinforcement learning algorithm to execute the following training steps to learn the generation behavior of the compensated signal: selecting a target signal from a target signal set; selecting a forward road image from a target forward road image set; selecting a compensation signal from a target compensation signal set; compensating a control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by a target proportional-integral-derivative controller aiming at the selected target signal; determining whether a target vehicle satisfies a predetermined motion smoothing condition in a state where the target vehicle moves in accordance with the acceleration indicated by the selected compensated signal and the indicated time of the movement; establishing a correspondence between the selected target signal, the selected forward road image and the selected signal for compensation in response to determining that the motion smoothing condition is satisfied; determining whether a preset training end condition is met; in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence.
5. The method of claim 4, wherein the method further comprises:
continuing to perform the training step in response to determining that the training end condition is not satisfied.
6. The method of claim 4, wherein a forward road image in the set of target forward road images is an image of a road having a slope; and
inputting the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, comprising:
and in response to determining that the selected forward road image is an image of a forward road with a slope, inputting the target signal and the forward road image to a pre-trained motion model to obtain a signal for compensation.
7. The method of claim 4, wherein the motion smoothing condition comprises at least one of:
the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold;
the variance of the speed of the target vehicle is less than or equal to a preset variance threshold;
and the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
8. The method according to one of claims 1 to 7, wherein the positive feedback of the target PID controller comprises a forward road image of a road with a slope and the negative feedback of the target PID controller comprises a forward road image of a flat road.
9. The method according to any one of claims 3-7, wherein the forward road image is an image of a forward road of the controlled device on a spiral ramp.
10. An apparatus for generating information, comprising:
a first acquisition unit configured to acquire a target signal, wherein the target signal is indicative of a motion of a controlled device;
an input unit, configured to input the acquired target signal to a pre-trained motion model to obtain a signal for compensation, where the motion model is a model trained by using a reinforcement learning algorithm, the motion model is used to represent a corresponding relationship between the target signal and the signal for compensation, and the motion model is obtained by training through the following steps: acquiring a target signal set and a signal set for compensation; adopting a reinforcement learning algorithm to execute the following training steps: selecting a target signal from the acquired target signal set; selecting a compensation signal from the compensation signal set; compensating a control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by a target proportional-integral-derivative controller aiming at the selected target signal; determining whether the target vehicle meets a preset condition in a state that the target vehicle moves according to a target control quantity indicated by the selected compensated signal; establishing a corresponding relationship between the selected target signal and the selected signal for compensation in response to determining that the preset condition is satisfied; determining whether a preset training end condition is met; in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence;
a second acquisition unit configured to acquire a control signal generated by the target proportional-integral-derivative controller for the acquired target signal;
a first generating unit configured to compensate the control signal using the obtained compensation signal, and generate a compensated signal;
a second generating unit configured to generate a signal for instructing the controlled device to perform a motion based on the compensated signal.
11. The apparatus of claim 10, wherein the target signal is indicative of a velocity of the controlled device, the controlled device is a vehicle, and the compensation signal is indicative of an acceleration of the controlled device and a time for which the controlled device moves according to the indicated acceleration.
12. The apparatus of claim 11, wherein the input unit comprises:
an acquisition module configured to acquire a forward road image of the controlled device, wherein the forward road image is an image of a road located in a moving direction of the controlled device;
and the input module is configured to input the acquired target signal and the acquired forward road image into a pre-trained motion model to obtain a signal for compensation, wherein the motion model is used for representing the corresponding relation among the target signal, the forward road image and the signal for compensation.
13. The apparatus of claim 12, wherein the motion model is trained by:
acquiring a target signal set, a target forward road image set and a signal set for target compensation;
and adopting a reinforcement learning algorithm to execute the following training steps to learn the generation behavior of the compensated signal: selecting a target signal from a target signal set; selecting a forward road image from a target forward road image set; selecting a compensation signal from a target compensation signal set; compensating a control signal corresponding to the selected compensation signal by using the selected compensation signal to generate a compensated signal, wherein the control signal corresponding to the selected compensation signal is generated by a target proportional-integral-derivative controller aiming at the selected target signal; determining whether a target vehicle satisfies a predetermined motion smoothing condition in a state where the target vehicle moves in accordance with the acceleration indicated by the selected compensated signal and the indicated time of the movement; establishing a correspondence between the selected target signal, the selected forward road image and the selected signal for compensation in response to determining that the motion smoothing condition is satisfied; determining whether a preset training end condition is met; in response to determining that the training end condition is satisfied, generating a motion model characterizing the established at least one correspondence.
14. The apparatus of claim 13, wherein the apparatus further comprises:
a training continuation unit configured to continue to perform the training step in response to determining that the training end condition is not satisfied.
15. The apparatus according to claim 13, wherein a forward road image in the target set of forward road images is an image of a road having a slope; and
the input module includes:
an input sub-module configured to input the target signal and the forward road image to a pre-trained motion model, resulting in a signal for compensation, in response to determining that the selected forward road image is an image of a forward road having a slope.
16. The apparatus of claim 13, wherein the motion smoothing condition comprises at least one of:
the maximum movement speed of the target vehicle is less than or equal to a preset speed threshold;
the variance of the speed of the target vehicle is less than or equal to a preset variance threshold;
and the acceleration change rate of the target vehicle is less than or equal to a preset acceleration change rate threshold value.
17. The apparatus according to one of claims 10 to 16, wherein the positive feedback of the target proportional-integral-derivative controller includes a forward road image of a road having a slope, and the negative feedback of the target proportional-integral-derivative controller includes a forward road image of a flat road.
18. The apparatus according to any one of claims 12-16, wherein the forward road image is an image of a forward road of the controlled device on a spiral ramp.
19. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
20. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-9.
CN201811641376.5A 2018-12-29 2018-12-29 Method and apparatus for generating information Active CN109581874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811641376.5A CN109581874B (en) 2018-12-29 2018-12-29 Method and apparatus for generating information


Publications (2)

Publication Number Publication Date
CN109581874A CN109581874A (en) 2019-04-05
CN109581874B true CN109581874B (en) 2022-04-05

Family

ID=65932825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811641376.5A Active CN109581874B (en) 2018-12-29 2018-12-29 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN109581874B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117813561A (en) * 2021-09-26 2024-04-02 西门子股份公司 Motion control method and device
CN116048160B (en) * 2023-02-23 2024-06-14 苏州浪潮智能科技有限公司 Control method and control device of heat dissipation system of power supply and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339406A (en) * 2007-07-04 2009-01-07 中国科学院自动化研究所 Self-adaptive controllers and method
CN105740793A (en) * 2016-01-26 2016-07-06 哈尔滨工业大学深圳研究生院 Road bump condition and road type identification based automatic speed adjustment method and system
CN106877746A (en) * 2017-03-21 2017-06-20 北京京东尚科信息技术有限公司 Method for control speed and speed control unit
CN106888419A (en) * 2015-12-16 2017-06-23 华为终端(东莞)有限公司 The method and apparatus for adjusting earpiece volume
CN107479368A (en) * 2017-06-30 2017-12-15 北京百度网讯科技有限公司 A kind of method and system of the training unmanned aerial vehicle (UAV) control model based on artificial intelligence
CN108099897A (en) * 2017-11-17 2018-06-01 浙江吉利汽车研究院有限公司 Cruise control method, apparatus and system
CN108639048A (en) * 2018-05-15 2018-10-12 智车优行科技(北京)有限公司 Automobile lane change householder method, system and automobile


Also Published As

Publication number Publication date
CN109581874A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
US20230153617A1 (en) Distributed training using actor-critic reinforcement learning with off-policy correction factors
CN109606383B (en) Method and apparatus for generating a model
CN110968087B (en) Calibration method and device for vehicle control parameters, vehicle-mounted controller and unmanned vehicle
CN114291098B (en) Parking method and device for automatically driving vehicle
US11868866B2 (en) Controlling agents using amortized Q learning
CN109581874B (en) Method and apparatus for generating information
CN109606365A (en) Method for controlling a vehicle and device
CN113635892B (en) Vehicle control method, device, electronic equipment and computer readable medium
CN113183975A (en) Control method, device, equipment and storage medium for automatic driving vehicle
CN113008258B (en) Path planning method, device, equipment and storage medium
CN114834467A (en) Control operation method and device for automatic driving vehicle and unmanned vehicle
KR20130017403A (en) Apparatus and method for control of actuator
JPWO2020152977A1 (en) Vehicle control device, vehicle control method, and vehicle control system
CN109606366B (en) Method and device for controlling a vehicle
EP3757712A1 (en) Method for controlling mobile robot, apparatus, and control system
CN111399489B (en) Method and device for generating information
CN114419758B (en) Vehicle following distance calculation method and device, vehicle and storage medium
CN113682298B (en) Vehicle speed limiting method and device
CN111976703B (en) Unmanned control method and device
CN112461239B (en) Method, device, equipment and storage medium for planning mobile body path
CN114889848A (en) Control method and device for satellite attitude, computer equipment and medium
CN113778075A (en) Control method and device for automatic guided vehicle
CN115577318B (en) Semi-physical-based data fusion evaluation method, system, equipment and storage medium
CN115534950B (en) Vehicle control method, device, equipment and computer readable medium
CN115114976B (en) Training method, device, equipment and storage medium of pretightening distance prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant