WO2020029580A1 - Method and apparatus for training a control strategy model for generating an autonomous driving strategy - Google Patents

Method and apparatus for training a control strategy model for generating an autonomous driving strategy

Info

Publication number
WO2020029580A1
Authority
WO
WIPO (PCT)
Prior art keywords: dimensional, data, encoder, low, feature space
Prior art date
Application number
PCT/CN2019/078072
Other languages
English (en)
French (fr)
Inventor
闫洁
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020029580A1

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Definitions

  • the present application relates to the field of autonomous driving, and in particular, to a method and device for training a control strategy model for generating an autonomous driving strategy.
  • Autonomous driving is a technology in which a computer system replaces a human to drive a motor vehicle, and includes functional modules such as environmental perception, location positioning, path planning, decision control, and power systems.
  • there are two main ways to realize the environment perception function: with high-precision, low-dimensional sensors such as lidar and millimeter-wave radar, or with high-dimensional, low-precision sensors such as monocular or multi-lens HD cameras.
  • high-precision, low-dimensional sensors such as lidar are expensive, and their accuracy is easily affected by weather conditions, under which it drops sharply.
  • low-precision, high-dimensional sensors such as high-definition cameras are cheaper and more resistant to interference, and high-dimensional data (i.e., data obtained through high-dimensional sensors) contains more information than low-dimensional data (i.e., data obtained through low-dimensional sensors) and can better reflect a complex traffic environment. Therefore, using high-dimensional data to determine autonomous driving strategies has broader application prospects.
  • although high-dimensional data contains a large amount of information, it usually also contains some redundant information. Therefore, it is difficult to directly obtain a usable autonomous driving strategy by processing high-dimensional data with an artificial neural network.
  • This application provides a method and device for training a control strategy model for generating an autonomous driving strategy.
  • low-dimensional training data is used to determine a hidden feature space and a policy function defined on the hidden feature space.
  • the training of an encoder that maps the high-dimensional training data to the hidden feature space is supervised, and the encoder and the above policy function are then applied to the real traffic environment: high-dimensional data obtained from the real traffic environment is input, so that usable autonomous driving strategies are obtained directly from the high-dimensional data.
  • the present application also provides a method and device for generating an automatic driving strategy, an automatic driving strategy generating system, and a control method for the automatic driving strategy generating system.
  • a method for training a control strategy model for generating an autonomous driving strategy including: acquiring a hidden feature space of low-dimensional training data, wherein the low-dimensional training data is collected from a first traffic scene
  • the second encoder is trained through the hidden feature space of the high-dimensional training data and the low-dimensional training data.
  • the high-dimensional training data is data collected from the first traffic scene
  • the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is part of a control strategy model used to generate an autonomous driving strategy.
  • the hidden feature space obtained through the low-dimensional training data must also be obtainable from the high-dimensional training data. Based on this principle, this application first obtains the hidden feature space of the low-dimensional training data. Because the low-dimensional training data contains a small amount of information and little redundant information, it is relatively easy to obtain a usable policy function on the hidden feature space derived from it. The hidden feature space of the low-dimensional data is then used to supervise the training of the second encoder, finally yielding a second encoder capable of mapping the high-dimensional training data to that hidden feature space. After the training of the second encoder is completed, high-dimensional data in the real environment (that is, high-dimensional real data) can be processed directly with the second encoder and the policy function obtained in advance to produce a usable autonomous driving strategy.
  • training the second encoder by using the hidden feature spaces of the high-dimensional and low-dimensional training data includes: inputting the high-dimensional training data into the second encoder to obtain the hidden feature space of the high-dimensional training data.
  • the hidden feature space of the low-dimensional training data is used to supervise the output result of the second encoder, so that the hidden feature space of the high-dimensional training data is the same as the hidden feature space of the low-dimensional training data.
  • the supervised learning method is a machine learning method.
  • the machine uses the hidden feature space of the low-dimensional training data to supervise the output of the second encoder, and finally obtains a second encoder that maps the high-dimensional training data to the hidden feature space of the low-dimensional training data.
  • acquiring the hidden feature space of the low-dimensional training data includes: inputting the low-dimensional training data into a first encoder to obtain the hidden feature space of the low-dimensional training data. The first encoder is obtained by training on multiple low-dimensional data samples, each of which is data collected from any traffic scene and of the same type as the low-dimensional training data; the first encoder is part of the control strategy model.
  • the type of the low-dimensional training data is the same as the type of the low-dimensional data samples. In this way, the first encoder obtained through the low-dimensional data samples can be applied to the low-dimensional training data, so that the hidden feature space of the low-dimensional training data can be obtained.
  • the method further includes: training the control strategy model according to the multiple low-dimensional data samples and the state parameters of multiple vehicles to obtain the first encoder and the policy function, where the multiple low-dimensional data samples correspond one-to-one to the state parameters of the multiple vehicles.
  • the method further includes: determining the updated parameters according to θ_{f1} = θ_{f'1} − η · ∇_{s^(1)} L_RL · ∇_{θ_{f'1}} s^(1), where f'1 represents the first encoder before the update, θ_{f'1} represents the parameters of f'1 other than its independent variables, s^(1) represents the hidden feature space, ∇_{θ_{f'1}} s^(1) represents the gradient of s^(1) with respect to θ_{f'1}, ∇_{s^(1)} L_RL represents the gradient of L_RL with respect to s^(1), and L_RL represents the loss function of the reinforcement learning model. The updated θ_{f1} is positively correlated with θ_{f'1} and negatively correlated with the product of the two gradients; f'1 is updated accordingly to obtain f1, the updated first encoder.
  • the above solution provides a training method of the first encoder when the gradient descent algorithm is used, which can continuously optimize the first encoder, so that the hidden feature space obtained from the low-dimensional training data can more accurately reflect the first traffic environment.
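The chained update above can be sketched numerically. This is an illustrative toy only: it assumes (hypothetically) a linear first encoder s^(1) = W1·x^(1) and uses a quadratic stand-in for the reinforcement-learning loss, since the real L_RL comes from the RL model; all shapes, names, and the target are invented for the sketch.

```python
import numpy as np

# Hypothetical linear first encoder f'1: s1 = W1 @ x1, with a quadratic
# surrogate loss L_RL(s) = 0.5 * ||s - s_target||^2 standing in for the
# reinforcement-learning loss.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))     # parameters of f'1 (before the update)
x1 = rng.normal(size=4)          # one low-dimensional training sample
s_target = np.zeros(2)           # stand-in optimum of the surrogate loss
eta = 0.1                        # learning rate

s1 = W1 @ x1                     # hidden feature s(1) = f'1(x1)
grad_L_wrt_s = s1 - s_target     # gradient of L_RL with respect to s(1)
# Chain rule: grad of L_RL w.r.t. W1 = outer(dL/ds(1), ds(1)/dW1's input x1)
grad_L_wrt_W = np.outer(grad_L_wrt_s, x1)

W1_new = W1 - eta * grad_L_wrt_W  # positively correlated with old params,
                                  # negatively correlated with the gradient

loss_before = 0.5 * np.sum((W1 @ x1 - s_target) ** 2)
loss_after = 0.5 * np.sum((W1_new @ x1 - s_target) ** 2)
```

A single such step reduces the surrogate loss, which is the sense in which the update "continuously optimizes" the first encoder.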
  • training the second encoder through the hidden feature spaces of the high-dimensional and low-dimensional training data includes: determining the updated parameters according to θ_{f2} = θ_{f'2} − η · ∇_{θ_{f'2}} l, where f'2 represents the second encoder before the update, θ_{f'2} represents the parameters of f'2 other than its independent variables, ∇_{θ_{f'2}} l represents the gradient of l with respect to θ_{f'2}, and l = ‖f'2(x^(2)) − s^(1)‖² is the variance (squared norm) of the difference between the output of f'2 and the hidden feature space s^(1), with x^(2) representing the high-dimensional training data. The updated θ_{f2} is positively correlated with θ_{f'2} and negatively correlated with the gradient; f'2 is updated accordingly to obtain f2, the updated second encoder.
  • the above solution provides a training method of the second encoder when the gradient descent algorithm is used, which can continuously optimize the second encoder so that the high-dimensional training data is more accurately mapped to the hidden feature space of the low-dimensional training data.
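The second-encoder training reduces to a regression of f'2(x^(2)) onto the fixed target s^(1). The sketch below assumes, purely for illustration, linear encoders and a single time-aligned sample pair; the dimensions and learning rate are invented.

```python
import numpy as np

# Target s1 comes from the trained first encoder on low-dimensional data;
# the second encoder W2 is regressed onto it with l = ||W2 @ x2 - s1||^2.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 3))     # trained first encoder (held fixed)
W2 = rng.normal(size=(2, 8))     # second encoder f'2, to be trained
x1 = rng.normal(size=3)          # low-dimensional training sample
x2 = rng.normal(size=8)          # time-aligned high-dimensional sample
s1 = W1 @ x1                     # supervision target: hidden feature space
eta = 0.02

losses = []
for _ in range(500):
    diff = W2 @ x2 - s1                  # f'2(x2) - s1
    losses.append(float(diff @ diff))    # l = squared norm of the difference
    W2 -= eta * 2 * np.outer(diff, x2)   # gradient of l w.r.t. W2

final_loss = float((W2 @ x2 - s1) @ (W2 @ x2 - s1))
```

After training, W2 maps the high-dimensional sample into (approximately) the same hidden feature space the first encoder produced, which is exactly the property the policy function relies on.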
  • the method further includes: aligning the timestamps of x^(1) and x^(2).
  • Aligning the timestamps of the low-dimensional training data and the high-dimensional training data can more accurately map the high-dimensional training data to the hidden feature space of the low-dimensional training data.
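One simple way to pair the two streams is nearest-timestamp matching. The sketch below is a hypothetical illustration; the sensor rates and timestamps are made up (e.g. lidar at 10 Hz, camera at 30 Hz).

```python
import bisect

def align_timestamps(low_ts, high_ts):
    """For each timestamp in low_ts, return the index of the nearest
    timestamp in high_ts (high_ts must be sorted ascending)."""
    pairs = []
    for t in low_ts:
        i = bisect.bisect_left(high_ts, t)
        # compare the neighbour on each side of the insertion point
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(high_ts)),
            key=lambda j: abs(high_ts[j] - t),
        )
        pairs.append(best)
    return pairs

# illustrative timestamps: lidar sweeps (x(1)) vs camera frames (x(2))
lidar_ts = [0.00, 0.10, 0.20]
camera_ts = [0.00, 0.033, 0.066, 0.099, 0.132, 0.165, 0.198]
matches = align_timestamps(lidar_ts, camera_ts)  # one camera index per sweep
```

Each (x^(1), x^(2)) pair produced this way describes the same instant of the traffic scene, so the low-dimensional hidden feature space is a valid supervision target for the high-dimensional sample.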
  • the method further includes: acquiring high-dimensional real data, the high-dimensional real data is data collected by the vehicle from the second traffic scene, and the type of the high-dimensional real data is the same as the type of the high-dimensional training data;
  • the state parameters of the vehicle and high-dimensional real data are input into the control strategy model to generate an automatic driving strategy suitable for the second traffic scenario, and the automatic driving strategy is used to control the vehicle to drive in the second traffic scenario.
  • the high-dimensional real data and the high-dimensional training data are both image data. Since the type of the high-dimensional real data is the same as the type of the high-dimensional training data, the second encoder obtained from the high-dimensional training data is also applicable to the high-dimensional real data.
  • the control strategy model further includes a policy function; inputting the state parameters of the vehicle and the high-dimensional real data into the control strategy model to generate an automatic driving strategy suitable for the second traffic scenario includes: inputting the high-dimensional real data into the second encoder to obtain the hidden feature space of the high-dimensional real data, and obtaining the autonomous driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and the policy function.
  • the present application provides a method for generating an autonomous driving strategy, including: inputting high-dimensional real data into a second encoder to obtain a hidden feature space of the high-dimensional real data, where the high-dimensional real data is data collected by the vehicle from the current traffic scene; and generating an automatic driving strategy based on the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and a policy function, where the automatic driving strategy is used to control the vehicle to drive in the current traffic scene;
  • the second encoder is obtained by training as follows: low-dimensional training data is input into the first encoder to obtain a hidden feature space of the low-dimensional training data, where the low-dimensional training data is data collected from the first traffic scene; the second encoder is then trained through the hidden feature spaces of the high-dimensional and low-dimensional training data.
  • the high-dimensional training data is data collected from the first traffic scene, and the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data.
  • the second encoder obtained through the above method can directly extract a usable hidden feature space from the high-dimensional real data, so that the high-dimensional real data can be used to obtain an automatic driving strategy suitable for the current traffic scene, retaining the advantages of high-dimensional sensors: low cost and strong anti-interference ability.
  • training the second encoder by using the hidden feature spaces of the high-dimensional and low-dimensional training data includes: inputting the high-dimensional training data into the second encoder to obtain the hidden feature space of the high-dimensional training data.
  • the hidden feature space of the low-dimensional training data is used to supervise the output result of the second encoder, so that the hidden feature space of the high-dimensional training data is the same as the hidden feature space of the low-dimensional training data.
  • the first encoder and the policy function are obtained by training a control strategy model according to multiple low-dimensional data samples and the state parameters of multiple vehicles.
  • the control strategy model includes the first encoder and the policy function, and each of the multiple low-dimensional data samples is data collected from any traffic scene and of the same type as the low-dimensional training data; the multiple low-dimensional data samples correspond one-to-one to the state parameters of the multiple vehicles.
  • the present application provides an automatic driving strategy generation system.
  • the automatic driving strategy generation system includes a control strategy model, a first switch, and a second switch.
  • the control strategy model includes a first encoder, a second encoder, and a policy function module.
  • the first switch is used to control the state of the path between the first encoder and the policy function module
  • the second switch is used to control the state of the path between the second encoder and the policy function module
  • the first encoder is used to receive the low-dimensional real data collected by the vehicle from the traffic scene and to output the hidden feature space of the low-dimensional real data.
  • the second encoder is configured to receive the high-dimensional real data collected by the vehicle from the traffic scene and to output the hidden feature space of the high-dimensional real data.
  • the policy function module is used to generate an automatic driving strategy according to the received state parameters of the vehicle and the hidden feature space.
  • the automatic driving strategy is used to control the vehicle in a traffic scenario.
  • the above-mentioned system can select different strategy-generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, the working state of the first switch is controlled to be closed, and an automatic driving strategy is obtained based on the low-dimensional real data; when the collected data is high-dimensional real data, the working state of the second switch is controlled to be closed, and an automatic driving strategy is obtained based on the high-dimensional real data. The system therefore has strong flexibility and robustness.
  • the working states of the first switch and the second switch are opposite, so that the policy function module receives the hidden feature space output by either the first encoder or the second encoder.
  • the working states of the first switch and the second switch are opposite, so that the policy function module can only receive the hidden feature space of one type of data at the same time, which can prevent the system from being caused by the hidden feature space of the policy function module receiving multiple types of data at the same time. Operation error.
  • the path between the first encoder and the policy function module is connected and the path between the second encoder and the policy function module is cut off, so that the first encoder inputs the hidden feature space of the low-dimensional real data to the policy function module; or the path between the second encoder and the policy function module is connected and the path between the first encoder and the policy function module is cut off, so that the second encoder inputs the hidden feature space of the high-dimensional real data to the policy function module.
  • the automatic driving strategy generation system further includes a data valve for controlling whether low-dimensional real data is input to the first encoder, and controlling whether high-dimensional real data is input to the second encoder.
  • the above scheme can control the input of the low-dimensional real data and the high-dimensional real data through the data valve, so that the policy function module receives the hidden feature space output by either the first encoder or the second encoder.
  • compared with the switch-based scheme for selecting which encoder's hidden feature space the policy function module receives, the data-valve scheme can prevent the first encoder or the second encoder from doing unnecessary work, since data that will not be used is never input to its encoder.
  • the present application provides a control method for an automatic driving strategy generation system.
  • the automatic driving strategy generation system includes a control strategy model, a first switch, and a second switch.
  • the control strategy model includes a first encoder, a second encoder, and a policy function module. The first switch is used to control the state of the path between the first encoder and the policy function module, and the second switch is used to control the state of the path between the second encoder and the policy function module. The first encoder is used to receive the low-dimensional real data collected by the vehicle from the traffic scene and output the hidden feature space of the low-dimensional real data; the second encoder is used to receive the high-dimensional real data collected by the vehicle from the traffic scene and output the hidden feature space of the high-dimensional real data; and the policy function module is configured to generate an automatic driving strategy according to the received state parameters of the vehicle and the hidden feature space;
  • the control method includes:
  • a hidden feature space of low-dimensional real data or a hidden feature space of high-dimensional real data is input to the policy function module.
  • the above-mentioned system can select different strategy-generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, the working state of the first switch is controlled to be closed, and an automatic driving strategy is obtained based on the low-dimensional real data; when the collected data is high-dimensional real data, the working state of the second switch is controlled to be closed, and an automatic driving strategy is obtained based on the high-dimensional real data. The above control method therefore has strong flexibility and robustness.
  • inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data to the policy function module includes: controlling the working state of the first switch to be closed and the working state of the second switch to be open, so that the first encoder inputs the hidden feature space of the low-dimensional real data to the policy function module.
  • the reliability of the low-dimensional real data is higher than the reliability of the high-dimensional real data.
  • the working state of the first switch can be controlled to be closed and the working state of the second switch to be open, so that a highly reliable autonomous driving strategy is obtained from the more reliable data.
  • inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data to the policy function module includes: controlling the working state of the second switch to be closed and the working state of the first switch to be open, so that the second encoder inputs the hidden feature space of the high-dimensional real data to the policy function module.
  • the reliability of the high-dimensional real data is higher than the reliability of the low-dimensional real data.
  • the working state of the first switch can be controlled to be open and the working state of the second switch to be closed, so that a highly reliable autonomous driving strategy is obtained from the more reliable data.
  • the low-dimensional real data is radar data collected by a vehicle from a traffic scene through radar
  • the high-dimensional real data is image data collected by a vehicle from a traffic scene through a camera.
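The switch logic above can be sketched as follows. The class, method names, and toy encoders/policy are hypothetical stand-ins; the point is that the two switches are kept in opposite working states so the policy function module receives exactly one hidden feature space per step.

```python
class StrategyGenerationSystem:
    """Illustrative sketch of the switch-based generation system."""

    def __init__(self, encoder_low, encoder_high, policy_fn):
        self.encoder_low = encoder_low    # first encoder (e.g. radar data)
        self.encoder_high = encoder_high  # second encoder (e.g. image data)
        self.policy_fn = policy_fn        # policy function module
        self.switch_low_closed = False    # first switch
        self.switch_high_closed = False   # second switch

    def select_source(self, use_low_dimensional):
        # Opposite working states: closing one switch opens the other.
        self.switch_low_closed = use_low_dimensional
        self.switch_high_closed = not use_low_dimensional

    def generate(self, vehicle_state, low_data=None, high_data=None):
        if self.switch_low_closed:
            hidden = self.encoder_low(low_data)
        else:
            hidden = self.encoder_high(high_data)
        return self.policy_fn(vehicle_state, hidden)

# Toy stand-ins: encoders scale the input, the policy sums its inputs.
system = StrategyGenerationSystem(
    encoder_low=lambda d: d / 2.0,
    encoder_high=lambda d: d / 4.0,
    policy_fn=lambda state, hidden: state + hidden,
)
system.select_source(use_low_dimensional=True)   # radar judged more reliable
action_low = system.generate(vehicle_state=1.0, low_data=4.0)
system.select_source(use_low_dimensional=False)  # camera judged more reliable
action_high = system.generate(vehicle_state=1.0, high_data=4.0)
```

Because `select_source` always sets the two flags to opposite values, the policy function module can never receive two hidden feature spaces at once, mirroring the operation-error prevention described above.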
  • a device for training a control strategy model for generating an autonomous driving strategy can implement the functions corresponding to each step of the method according to the first aspect; the functions can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units or modules corresponding to the functions described above.
  • the apparatus includes a processor configured to support the apparatus to perform a corresponding function in the method according to the first aspect.
  • the device may also include a memory for coupling to the processor, which stores program instructions and data necessary for the device.
  • the device further includes a communication interface, which is used to support communication between the device and other network elements.
  • a computer-readable storage medium stores computer program code that, when executed by a processing unit or processor, causes the processing unit or processor to execute the method according to the first aspect.
  • a computer program product includes computer program code that, when the computer program code is executed by a processing unit or processor, causes the processing unit or processor to execute the method of the first aspect.
  • an apparatus for generating an autonomous driving strategy can implement the functions corresponding to each step of the method according to the second aspect; the functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the functions described above.
  • the device includes a processor configured to support the device to perform a corresponding function in the method according to the second aspect.
  • the device may also include a memory for coupling to the processor, which stores program instructions and data necessary for the device.
  • the device further includes a communication interface, which is used to support communication between the device and other network elements.
  • a computer-readable storage medium stores computer program code that, when executed by a processing unit or processor, causes the processing unit or processor to execute the method according to the second aspect.
  • a computer program product includes computer program code that, when executed by a processing unit or processor, causes the processing unit or processor to execute the method according to the second aspect.
  • an apparatus for controlling an automatic driving strategy generation system can implement the functions corresponding to each step of the method according to the fourth aspect; the functions can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units or modules corresponding to the functions described above.
  • the device includes a processor configured to support the device to perform a corresponding function in the method according to the fourth aspect.
  • the device may also include a memory for coupling to the processor, which stores program instructions and data necessary for the device.
  • the device further includes a communication interface, which is used to support communication between the device and other network elements.
  • a computer-readable storage medium stores computer program code that, when executed by a processing unit or processor, causes the processing unit or processor to execute the method according to the fourth aspect.
  • a computer program product includes computer program code that, when run by a processing unit or processor, causes the processing unit or processor to execute the method according to the fourth aspect.
  • FIG. 1 is a schematic diagram of a system for training a control strategy model applicable to the present application
  • FIG. 2 is a schematic flowchart of a method for training a control strategy model for generating an autonomous driving strategy provided by the present application
  • FIG. 3 is a schematic flowchart of a method for training a first encoder and a policy function by using a reinforcement learning model provided in the present application;
  • FIG. 4 is a schematic diagram of an automobile physical device provided by the present application.
  • FIG. 5 is a schematic flowchart of a method for generating an autonomous driving strategy provided by the present application
  • FIG. 6 is a schematic structural diagram of an autonomous driving vehicle provided by the present application.
  • FIG. 7 is a schematic flowchart of controlling an autonomous vehicle to achieve automatic driving provided by the present application.
  • FIG. 8 is a schematic diagram of an automatic driving strategy generation system provided by the present application.
  • FIG. 9 is a schematic diagram of a method for controlling an automatic driving strategy generation system provided by the present application.
  • FIG. 10 is a schematic structural diagram of a device for training a control strategy model for generating an automatic driving strategy provided by the present application.
  • FIG. 11 is a schematic structural diagram of another apparatus for training a control strategy model for generating an automatic driving strategy provided by the present application;
  • FIG. 12 is a schematic structural diagram of a device for generating an automatic driving strategy provided by the present application.
  • FIG. 13 is a schematic structural diagram of another apparatus for generating an automatic driving strategy provided by the present application.
  • FIG. 14 is a schematic structural diagram of a device for controlling an automatic driving strategy generation system provided by the present application.
  • FIG. 15 is a schematic structural diagram of another apparatus for controlling an automatic driving strategy generation system provided by the present application.
  • FIG. 1 illustrates a system for training a control strategy model suitable for the present application.
  • the system is used to train a control strategy model for generating autonomous driving strategies in a simulated environment.
  • the system includes:
  • the simulator 110 includes an environment module 111, a car module 112, and a simulator engine 113.
  • the environment module 111 is used to set a traffic environment (such as a city, a village, a highway, etc.), and the car module 112 is used to simulate an electronic system of a vehicle.
  • the simulator engine 113 can also be called a task logic module, which is used to design driving tasks, plan routes, design rewards and punishment rules, etc., and gradually advance the entire simulation process in chronological order.
  • the autonomous driving agent 120 includes a reinforcement learning module 121.
  • the autonomous driving agent 120 may be a software program that receives the vehicle state parameter x^(0), the low-dimensional training data x^(1), the high-dimensional training data x^(2), and the instant reward r from the simulator 110, makes a decision (i.e., a control action) based on the above data, and sends control action information to the simulator 110.
  • the reinforcement learning module 121 is configured to train a first encoder described below through a reinforcement learning model.
  • x (0) is, for example, vehicle speed, acceleration, body offset angle, position, etc.
  • x (1) is, for example, lidar data
  • x (2) is, for example, image data taken by a front camera
  • the control actions determined by the autonomous driving agent 120 based on the above data are, for example, acceleration, braking, and steering wheel angle. If the simulator 110 completes the driving task after executing a control action, the instant reward r sent to the autonomous driving agent 120 may be positive feedback; if the driving task is not completed after the simulator 110 executes the control action, the instant reward r sent to the autonomous driving agent 120 may be negative feedback.
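The simulator/agent loop can be illustrated with a toy example. Everything here is invented for the sketch: the "driving task" is reduced to steering a scalar state toward zero, and the reward rule is a stand-in for the simulator engine's reward-and-punishment rules.

```python
def simulator_step(state, action):
    """Stand-in for simulator 110: applies the control action and returns
    an instant reward r — positive feedback when the action advances the
    (toy) driving task of reaching state 0, negative feedback otherwise."""
    next_state = state + action
    r = 1.0 if abs(next_state) < abs(state) else -1.0
    return next_state, r

def agent_policy(state):
    """Stand-in for the autonomous driving agent 120's decision."""
    return -0.5 * state  # steer back toward the task target

state = 4.0
rewards = []
for _ in range(5):               # the simulator engine advances step by step
    action = agent_policy(state)
    state, r = simulator_step(state, action)
    rewards.append(r)
```

In the real system the reinforcement learning module 121 would use these instant rewards to update the first encoder and the policy function; the toy policy here is fixed only to keep the sketch short.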
  • the above system can finally output a set of models that can make decisions based on high-dimensional data directly in a real environment.
  • the driver visually obtains the distance between the vehicle (ie, the vehicle being driven by the driver) and the obstacle, thereby making a control strategy to avoid the obstacle.
  • the above control strategy is made by the driver based on the distance between the vehicle and the obstacle.
  • the complete information obtained by the driver through vision also includes information such as the shape and type of the obstacle; thus the data indicating the distance between the vehicle and the obstacle can be called low-dimensional data, and the data containing the complete information can be called high-dimensional data. Since the information contained in the low-dimensional data is a subset of the information contained in the high-dimensional data, if an automatic driving control strategy can be determined based on the low-dimensional data, the high-dimensional data can also be used to determine the automatic driving strategy.
  • in other words, the autonomous driving strategy obtained by the low-dimensional-data control strategy model from processing the low-dimensional data and the autonomous driving strategy obtained by the high-dimensional-data control strategy model from processing the high-dimensional data must be the same.
  • because the low-dimensional data contains less information, it is easier to obtain a control strategy model that meets the requirements of safe driving by training the model with low-dimensional data. Therefore, a control strategy model that meets the requirements of safe driving can first be trained based on the low-dimensional data; that control strategy model is then used to supervise the training of the control strategy model for high-dimensional data.
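  • The two-stage scheme can be outlined as below; every function body is a toy placeholder standing in for the real training procedures, so the sketch shows only the data flow:

```python
# Stage 1: train (f1, g) on low-dimensional data; Stage 2: train f2 so its
# latent matches f1's. The "training" here is hard-coded for illustration.

def train_low_dim_policy(samples):
    """Stage 1 (toy): returns a first encoder f1 and a policy function g."""
    f1 = lambda x1: [min(x1)]  # toy latent: distance to nearest obstacle
    g = lambda x0, s: "brake" if s[0] < 5.0 else "cruise"
    return f1, g

def train_second_encoder(f1, paired_samples):
    """Stage 2 (toy): f2 reproduces f1's latent from high-dimensional data."""
    # A real f2 would be learned from paired, timestamp-aligned samples.
    return lambda x2: f1(x2["ranges"])

f1, g = train_low_dim_policy(samples=None)
f2 = train_second_encoder(f1, paired_samples=None)
x2 = {"ranges": [3.0, 9.0], "image": "..."}   # high-dimensional sample
strategy = g({"speed": 10.0}, f2(x2))
# the same decision (f1, g) would make from the ranges alone
```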
  • the method for training a control strategy model of low-dimensional data includes:
  • a control strategy model for low-dimensional data is trained according to multiple low-dimensional data samples and state parameters of multiple vehicles, and a first encoder and a policy function are obtained.
  • the multiple low-dimensional data samples and the state parameters of the multiple vehicles are in one-to-one correspondence.
  • the plurality of low-dimensional data samples and the state parameters of the plurality of vehicles are, for example, data generated by the simulator 110.
  • the aforementioned transportation means may be a vehicle, or other equipment such as an aircraft, a submersible, a ship, an industrial robot, and the like.
  • the first encoder is used to extract the hidden feature space from the low-dimensional data samples, and the policy function is used to output the automatic driving strategy based on the own vehicle parameters (for example, the own vehicle speed) and the hidden feature space of the low-dimensional data samples.
  • A hidden feature space is a set of features extracted from raw data (for example, low-dimensional data samples) by a machine learning algorithm. Features are abstract representations of the raw data. Since features extracted from raw data are usually intermediate parameters of a machine learning algorithm rather than output results, such features are also called latent features.
  • the training system may train in the following manner when training the first encoder: update the parameters of the first encoder by gradient descent, θ 1 = θ' 1 − α · ∂L RL / ∂θ' 1 (writing θ' 1 for the parameters of f ′ 1 other than its arguments and α for the learning rate), where f ′ 1 represents the first encoder before the update, s (1) represents the hidden feature space, and L RL represents the loss function related to the reinforcement learning model. The updated θ 1 is positively correlated with θ' 1 and negatively correlated with the gradient ∂L RL / ∂θ' 1. Updating f ′ 1 yields f 1, the updated first encoder.
  • positive correlation means that when the independent variable increases, the dependent variable also increases, and when the independent variable decreases, the dependent variable also decreases.
  • for example, for the function y = 2x, when x increases, y also increases, and when x decreases, y also decreases; thus y is said to be positively correlated with x.
  • negative correlation means that when the independent variable increases, the dependent variable decreases, and when the independent variable decreases, the dependent variable increases.
  • the above solution provides a training method of the first encoder when using the gradient descent algorithm, which can continuously optimize the first encoder, so that the hidden feature space obtained from the low-dimensional training data can more accurately reflect the first traffic environment.
  • the learning rate takes a value greater than or equal to 0 and less than or equal to 1.
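  • The update rule above can be illustrated with a minimal sketch, substituting a toy quadratic loss for L RL; the parameter values and the loss itself are assumptions made for the example:

```python
# Gradient-descent update: new parameters = old parameters minus learning
# rate times the gradient of the loss, so the updated parameters are
# positively correlated with the old ones and negatively correlated with
# the gradient, as stated above.

def update_params(theta, grads, alpha=0.1):
    return [t - alpha * g for t, g in zip(theta, grads)]

# Toy loss L(theta) = sum(theta_i ** 2), whose gradient is 2 * theta_i.
theta = [1.0, -2.0]
grads = [2.0 * t for t in theta]
theta = update_params(theta, grads, alpha=0.1)
# one step moves every parameter toward the loss minimum at zero
```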
  • the training system obtains the low-dimensional training data, inputs the low-dimensional training data to the first encoder, obtains the hidden feature space of the low-dimensional training data, and uses the hidden feature space of the low-dimensional training data to monitor the control of the high-dimensional training data. Training of strategy models.
  • A method for training a control strategy model of high-dimensional data is shown in FIG. 2. The method can be executed by a training system (that is, training the second encoder offline) or by a vehicle (that is, training the second encoder online). The method 200 includes:
  • S210: Obtain a hidden feature space of low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene.
  • the low-dimensional training data refers to the low-dimensional data used in the model training phase.
  • the hidden feature space of the low-dimensional training data may be the hidden feature space of one type of low-dimensional training data (for example, ranging radar data), or the hidden feature space of multiple types of low-dimensional training data (for example, ranging radar data and velocity radar data).
  • S220: Train the second encoder through the high-dimensional training data and the hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene, the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is part of a control strategy model used to generate an autonomous driving strategy.
  • the second encoder is used to obtain the hidden feature space from the high-dimensional training data.
  • the hidden feature space of the high-dimensional training data includes part or all of the hidden feature space of the low-dimensional training data.
  • the hidden feature space that the trained second encoder obtains from the high-dimensional training data is the same as the hidden feature space that the first encoder obtains from the low-dimensional training data; that is, the trained second encoder can map the high-dimensional training data to the hidden feature space of the low-dimensional training data. In this way, the second encoder and the policy function in the control strategy model of the low-dimensional data can be combined into a control strategy model of the high-dimensional data to generate an autonomous driving strategy.
  • the high-dimensional training data may be data collected in synchronization with the low-dimensional training data.
  • the high-dimensional sensor and the low-dimensional sensor work simultaneously to collect data from the first traffic scene.
  • the second encoder may be a function, an artificial neural network, or other algorithms or models, which are used to process the input high-dimensional training data and obtain the hidden feature space of the high-dimensional training data.
  • this application first obtains the hidden feature space of the low-dimensional training data. Because the low-dimensional training data contains a small amount of information and little redundant information, it is easier to obtain a usable policy function from its hidden feature space. Subsequently, that hidden feature space is used to supervise the training process of the second encoder, that is, to train a second encoder capable of mapping the high-dimensional training data to the hidden feature space of the low-dimensional training data. After the training of the second encoder is completed, high-dimensional data in the real environment (that is, high-dimensional real data) can be processed directly by the second encoder and the previously obtained policy function to obtain a usable autonomous driving strategy.
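  • Once both stages are complete, generating a strategy from high-dimensional real data reduces to the composition g(x (0), f 2 (x (2))). A minimal sketch, with placeholder encoder and policy bodies:

```python
# Inference path after training: second encoder -> policy function.
# The mean-latent encoder and the simple throttle policy are stand-ins.

def f2(x2):
    """Stand-in second encoder: map high-dimensional data to a short latent."""
    return [sum(x2) / len(x2)]

def g(vehicle_state, latent):
    """Stand-in policy function over vehicle state and latent features."""
    return {"steer": 0.0, "throttle": max(0.0, 1.0 - latent[0])}

x0 = {"speed": 8.0}      # own-vehicle parameters x(0)
x2 = [0.2, 0.4, 0.6]     # high-dimensional real data x(2) (toy values)
strategy = g(x0, f2(x2))
# throttle is about 0.6 for this toy latent
```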
  • training the second encoder by using the hidden feature space of the high-dimensional training data and the low-dimensional training data includes:
  • the high-dimensional training data is used as an input to the second encoder to obtain the hidden feature space of the high-dimensional training data.
  • the hidden feature space of the low-dimensional training data is used to supervise the output of the second encoder, so that the hidden features of the high-dimensional training data The space is the same as the hidden feature space of the low-dimensional training data.
  • the supervised learning method is a machine learning method.
  • the machine uses the hidden feature space of the low-dimensional training data to supervise the output of the second encoder, and finally obtains a second encoder that maps the high-dimensional training data to the hidden feature space of the low-dimensional training data.
  • because the hidden feature space of the high-dimensional training data obtained by the second encoder processing the high-dimensional training data is the same as the hidden feature space of the low-dimensional training data, the second encoder can also be considered to map the high-dimensional training data to the hidden feature space of the low-dimensional training data.
  • training the second encoder through the high-dimensional training data and the hidden feature space of the low-dimensional training data includes: computing l = ‖f ′ 2 (x (2)) − s (1)‖ 2, the variance between the hidden feature space output by the second encoder and the hidden feature space of the low-dimensional training data, and updating θ 2 = θ' 2 − β · ∂l / ∂θ' 2 (writing θ' 2 for the parameters of f ′ 2 other than its arguments and β for the learning rate), where f ′ 2 represents the second encoder before the update, ∂l / ∂θ' 2 represents the gradient of l with respect to θ' 2, x (2) represents the high-dimensional training data, and ‖·‖ represents a norm. The updated θ 2 is positively correlated with θ' 2 and negatively correlated with the gradient. Updating f ′ 2 yields f 2, the updated second encoder.
  • the above solution provides a training method of the second encoder when the gradient descent algorithm is used, which can continuously optimize the second encoder so that the high-dimensional training data is more accurately mapped to the hidden feature space of the low-dimensional training data.
  • the learning rate takes a value greater than or equal to 0 and less than or equal to 1.
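  • A toy version of this supervision step, assuming a one-parameter linear stand-in for f ′ 2 and a fixed target latent s (1) taken from the first encoder:

```python
# Minimize l = ||f2(x2) - s1||^2 by gradient descent on the encoder
# parameter w; a real second encoder would be a neural network.

def f2(x2, w):
    return [w * v for v in x2]

def train_step(w, x2, s1, lr=0.01):
    # l = sum((w*x - s)^2); dl/dw = sum(2 * (w*x - s) * x)
    grad = sum(2.0 * (w * x - s) * x for x, s in zip(x2, s1))
    return w - lr * grad

w = 0.0
x2 = [1.0, 2.0]   # toy high-dimensional training sample
s1 = [0.5, 1.0]   # target hidden features from the low-dimensional encoder
for _ in range(200):
    w = train_step(w, x2, s1)
# w converges toward 0.5, so f2(x2, w) approaches s1
```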
  • the method 200 further includes:
  • aligning the timestamps of the low-dimensional training data and the high-dimensional training data, so that the high-dimensional training data can be mapped to the hidden feature space of the low-dimensional training data more accurately.
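  • One possible way to align the two streams is nearest-timestamp matching; the sensor names, timestamps, and tolerance below are illustrative assumptions:

```python
# Pair each low-dimensional sample with the closest-in-time
# high-dimensional sample, discarding pairs outside a tolerance.

def align(low_stream, high_stream, tol=0.05):
    pairs = []
    for t_low, low in low_stream:
        t_high, high = min(high_stream, key=lambda s: abs(s[0] - t_low))
        if abs(t_high - t_low) <= tol:
            pairs.append((low, high))
    return pairs

lidar = [(0.00, "scan0"), (0.10, "scan1")]                  # low-dim stream
camera = [(0.01, "img0"), (0.09, "img1"), (0.30, "img2")]   # high-dim stream
pairs = align(lidar, camera)
# -> [("scan0", "img0"), ("scan1", "img1")]
```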
  • FIG. 3 shows a process for training a model (f 1 , g) using a reinforcement learning model provided by the present application.
  • the training process includes:
  • A0 The autonomous driving agent 120 is initialized, and the current time t is set to 0.
  • A1 Receive and read x (0) , x (1), and r of the simulated vehicle at the current moment (as shown by the dashed arrows in FIG. 3).
  • A3 Send a to the simulator 110.
  • A4 Continue training the model (f 1 , g).
  • the simulator 110 is initialized, and the traffic environment such as map and route is set.
  • B4 Obtain the simulation result of implementing a, determine r according to the simulation result, and return to B1.
  • the replay buffer is a fixed-length memory container data structure in which stored records can be replaced.
  • the record (x t (0), x t (1), a t, r t, x t+1 (0), x t+1 (1)) is saved to the replay buffer.
  • in step A4, a batch of data is randomly extracted from the replay buffer to train the model (f 1, g).
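  • The replay-buffer mechanics described above can be sketched as a fixed-capacity container with random batch sampling; the records are abbreviated to integers here:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-length container: once full, the oldest record is replaced."""

    def __init__(self, capacity):
        self.records = deque(maxlen=capacity)

    def add(self, record):
        self.records.append(record)

    def sample(self, batch_size):
        return random.sample(list(self.records), batch_size)

buf = ReplayBuffer(capacity=3)
for t in range(5):   # stand-ins for records (x_t, a_t, r_t, x_{t+1})
    buf.add(t)
batch = buf.sample(2)
# only the three most recent records (2, 3, 4) can appear in a batch
```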
  • the above reinforcement learning model may be off-policy or on-policy.
  • an off-policy normalized advantage function (NAF) algorithm is adopted as the reinforcement learning model.
  • the NAF algorithm is a Q-learning method that supports continuous control actions.
  • the Q estimation function is shown in formula (1), which can be further decomposed into an advantage function A and a state value function V.
  • an important technique is to represent the function A as a quadratic form, as shown in formula (2), where the matrix P is a positive definite matrix.
  • the training process of the NAF algorithm is a standard Q-learning method, that is, the goal of maximizing future returns (cumulative rewards) is achieved by minimizing the variance of the Bellman function, as shown in formula (4).
  • the target network technique, that is, using two Q-value function networks with the same structure, namely Q and Q′: the former (on-policy) is used to explore the problem space, and the latter (the off-policy target network) is used for value estimation.
  • the entire learning process is realized by formula (4).
  • the batch data of size N is used to update the model in step A4 above, and the update formulas are shown in formulas (5) and (6).
  • Equations (5) and (6) are the update formulas of θ Q and θ Q′; that is, the θ Q and θ Q′ to the left of the equal sign are the updated parameters, and the θ Q and θ Q′ to the right of the equal sign are the parameters before the update.
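  • The two-network bookkeeping can be illustrated with a soft (Polyak-style) target update, a common choice in NAF-style Q-learning. This does not reproduce formulas (5) and (6) themselves; tau and the parameter values are assumptions:

```python
# theta_Q is updated by gradient descent on the Bellman error elsewhere;
# the target network theta_Q' then tracks it slowly.

def soft_update(theta_q, theta_q_prime, tau=0.1):
    """theta_Q' <- tau * theta_Q + (1 - tau) * theta_Q'."""
    return [tau * q + (1.0 - tau) * p
            for q, p in zip(theta_q, theta_q_prime)]

theta_q = [1.0, 2.0]         # evaluation-network parameters after a step
theta_q_prime = [0.0, 0.0]   # target-network parameters
theta_q_prime = soft_update(theta_q, theta_q_prime)
# the target network moves a small step toward Q
```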
  • a vehicle equipped with a lidar and a camera is used to collect low-dimensional radar scan data (ie, low-dimensional training data) and high-dimensional image data (ie, high-dimensional training data); a car physical device 400 equipped with a lidar and a camera is shown in FIG. 4.
  • data acquisition must meet the following requirements: the installation positions of the lidar 402 and the camera 403 are fixed, and the timestamps of the low-dimensional training data and the high-dimensional training data are aligned.
  • this application also provides a method for generating an autonomous driving strategy. As shown in FIG. 5, the method 500 includes:
  • the high-dimensional real data is input to a second encoder to obtain a hidden feature space of the high-dimensional real data.
  • the high-dimensional real data is data collected by a vehicle from a current traffic scene.
  • S520 Generate an automatic driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and a policy function.
  • the automatic driving strategy is used to control the vehicle to drive in the current traffic scene.
  • a device for performing method 500 is, for example, a car, and a control strategy model including a second encoder is deployed on the car, and high-dimensional real data collected by a high-dimensional sensor can be used to generate an automatic driving strategy suitable for the current traffic scene.
  • the above-mentioned high-dimensional sensor is, for example, a high-definition camera. Since a high-definition camera is cheaper than a low-dimensional sensor such as a lidar and has strong anti-interference ability, the method 500 for generating an automatic driving strategy does not need to use an expensive low-dimensional sensor, can realize autonomous driving at a lower cost, and can adapt to more traffic environments.
  • the model (f 2 , g) is applied to a vehicle 600 shown in FIG. 6.
  • the vehicle 600 includes a car physical device 400, which includes a power control system 401 (for example, an accelerator, a steering wheel, and a braking device), a camera 403 for collecting high-dimensional real data, and a sensor 404 for collecting the vehicle state parameter x (0).
  • the vehicle 600 also includes an automatic driving system 601.
  • the automatic driving system 601 includes a route planning module 602 and a control decision module 603.
  • the route planning module 602 is used to plan routes based on driving tasks, maps, and positioning information.
  • the model (f 2, g) is called, the vehicle control action a is calculated based on the obtained x (0) and x (2), and a is then adapted to the power control system 401 through the control adapter 604. For example, if the control action a is braking, the control adapter 604 sends a command to the braking device in the power control system 401 to execute the braking action, thereby completing the automatic driving.
  • the automatic driving system 601 may be a functional module implemented by software, or a functional module implemented by hardware.
  • FIG. 7 shows a schematic diagram of an automatic driving process provided by the present application.
  • the driving process includes:
  • C1 Plan routes based on driving tasks, maps, and positioning information.
  • C4 Send a command to the power control system 401 (as shown by the dotted arrow in FIG. 7), and return to C1.
  • step D3 Perform control action a, and return to step D1.
  • the system 800 includes:
  • a control strategy model 810, a first switch K1, and a second switch K2, where the control strategy model 810 includes a first encoder f 1, a second encoder f 2, and a policy function module g.
  • the first switch K1 is used to control the state of the path between f 1 and g
  • the second switch K2 is used to control the state of the path between f 2 and g
  • f 1 is used to receive the low-dimensional reality collected by the vehicle from the traffic scene Data and output the hidden feature space of the low-dimensional real data
  • f 2 is used to receive the high-dimensional real data collected by the vehicle from the traffic scene and output the hidden feature space of the high-dimensional real data
  • g is used to generate an automatic driving strategy according to the received state parameters of the vehicle and the received hidden feature space; the automatic driving strategy is used to control the vehicle driving in a traffic scene.
  • the above system can select different strategy-generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, the working state of K1 is controlled to be closed so as to obtain the hidden feature space of the low-dimensional real data; that hidden feature space is then input to g, and an automatic driving strategy is obtained based on the low-dimensional real data. When the collected data is high-dimensional real data, the working state of K2 is controlled to be closed so as to obtain the hidden feature space of the high-dimensional real data; that hidden feature space is then input to g, and an automatic driving strategy is obtained based on the high-dimensional real data. In this way, even if the low-dimensional or high-dimensional sensor of a vehicle fails, as long as one of the sensors works normally, the above system can generate an automatic driving strategy suitable for the current traffic environment. Therefore, the above system has strong flexibility and robustness.
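  • The K1/K2 routing can be modeled as below; the encoder and policy bodies are placeholders rather than the trained models:

```python
# Exactly one encoder path is connected at a time; g consumes whichever
# hidden feature space arrives.

def f1(low_dim):    # first encoder: low-dimensional data -> latent
    return ("latent_low", low_dim)

def f2(high_dim):   # second encoder: high-dimensional data -> latent
    return ("latent_high", high_dim)

def g(vehicle_state, latent):
    return {"state": vehicle_state, "from": latent[0]}

def drive(vehicle_state, data, data_kind):
    k1_closed = (data_kind == "low")   # K1 and K2 are always opposite
    latent = f1(data) if k1_closed else f2(data)
    return g(vehicle_state, latent)

strategy = drive({"speed": 10.0}, [3.2, 7.5], data_kind="low")
# -> strategy["from"] == "latent_low"
```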
  • optionally, the working states of K1 and K2 are opposite, so that g receives the hidden feature space output by either f 1 or f 2.
  • when the working state of K1 is closed and the working state of K2 is open, the path between f 1 and g is connected and the path between f 2 and g is cut off, so that f 1 inputs the hidden feature space of the low-dimensional real data to g; when the working state of K1 is open and the working state of K2 is closed, the path between f 2 and g is connected and the path between f 1 and g is cut off, so that f 2 inputs the hidden feature space of the high-dimensional real data to g.
  • because g receives the hidden feature space of only one type of data at a time, the system 800 is prevented from running incorrectly due to g receiving the hidden feature spaces of multiple types of data at the same time.
  • system 800 further includes:
  • a data valve is used to control whether low-dimensional real data is input to the first encoder, and to control whether high-dimensional real data is input to the second encoder.
  • the above scheme controls the input of the low-dimensional real data and the high-dimensional real data through the data valve, so that the policy function module receives the hidden feature space output by either the first encoder or the second encoder. Compared with the switch-based scheme, the data-valve scheme can prevent the first encoder or the second encoder from doing unnecessary work.
  • control method 900 includes:
  • S910: By controlling the working states of the first switch and the second switch, input the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data to the policy function module.
  • the execution device of the method 900 may be a vehicle-mounted processor or a vehicle containing the vehicle-mounted processor.
  • the vehicle-mounted processor may select different strategy-generation paths according to the type of collected data. For example, when the collected data is low-dimensional real data, the working state of the first switch is controlled to be closed, and an automatic driving strategy is obtained based on the low-dimensional real data; when the collected data is high-dimensional real data, the working state of the second switch is controlled to be closed, and an automatic driving strategy is obtained based on the high-dimensional real data. Therefore, the method 900 has strong flexibility and robustness.
  • the on-board processor can determine whether the collected data is low-dimensional real data or high-dimensional real data according to the type of information contained in the data collected by the sensor.
  • for example, the following principle can be used to determine whether the collected data is low-dimensional real data or high-dimensional real data: when the number of types of information contained in the collected data is less than or equal to a number threshold, the data is determined to be low-dimensional real data; when the number of types of information contained in the collected data is greater than the number threshold, the data is determined to be high-dimensional real data.
  • the above number threshold is 2.
  • if the collected data contains only "distance" information, the data is determined to be low-dimensional real data; if the collected data contains "distance", "speed", and "obstacle type" information, the data is determined to be high-dimensional real data.
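  • The threshold rule can be written directly as a small function; the threshold value 2 comes from the example above:

```python
# Data with at most `threshold` kinds of information is classified as
# low-dimensional, otherwise high-dimensional.

def classify(info_types, threshold=2):
    return "low" if len(info_types) <= threshold else "high"

kind_a = classify({"distance"})                            # -> "low"
kind_b = classify({"distance", "speed", "obstacle type"})  # -> "high"
```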
  • S910 includes: controlling the working state of the first switch to be closed and the working state of the second switch to be open; the first encoder inputs the hidden feature space of the low-dimensional real data to the policy function module.
  • for example, when the reliability of the low-dimensional real data is higher than that of the high-dimensional real data, the working state of the first switch can be controlled to be closed and the working state of the second switch to be open, so that a highly reliable automatic driving strategy is obtained from the highly reliable data.
  • optionally, inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data to the policy function module includes: controlling the working state of the second switch to be closed and the working state of the first switch to be open; the second encoder inputs the hidden feature space of the high-dimensional real data to the policy function module.
  • for example, when the reliability of the high-dimensional real data is higher than that of the low-dimensional real data, the working state of the first switch can be controlled to be open and the working state of the second switch to be closed, so that a highly reliable automatic driving strategy is obtained from the highly reliable data.
  • the above-mentioned low-dimensional real data is radar data collected by a car from a traffic scene through a laser radar
  • the above-mentioned high-dimensional real data is image data collected by a car from a traffic scene through a high-definition camera.
  • the interference of rain and snow with the lidar (for example, rain and snow refract and/or reflect the radar wave so that the radar receiver cannot receive obstacle echoes) is greater than the interference with the high-definition camera. Therefore, the working state of the first switch can be controlled to be open and the working state of the second switch to be closed, so that the highly reliable high-dimensional real data can be used to obtain a highly reliable automatic driving strategy.
  • when the current traffic scene is a high-intensity lighting scene, the interference of high-intensity light with the lidar is less than that with the high-definition camera (for example, high-intensity reflected light prevents the high-definition camera from obtaining a clear image). Therefore, the working state of the first switch is controlled to be closed and the working state of the second switch to be open, so that the highly reliable low-dimensional real data can be used to obtain a highly reliable automatic driving strategy.
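  • The scene-based selection logic of the last two paragraphs can be condensed into one sketch; the scene labels are assumptions made for the example:

```python
# Choose which switch to close based on which sensor the scene disturbs
# less: rain/snow disturbs the lidar, intense light disturbs the camera.

def switch_states(scene):
    """Return (K1_closed, K2_closed); the two states are always opposite."""
    if scene in ("rain", "snow"):        # lidar disturbed -> camera path
        return (False, True)
    if scene == "high_intensity_light":  # camera disturbed -> lidar path
        return (True, False)
    return (True, False)                 # assumed default: lidar path

k1, k2 = switch_states("rain")
# -> K1 open, K2 closed: the high-dimensional (camera) path is used
```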
  • a device for training a control strategy model for generating an autonomous driving strategy includes a hardware structure and / or a software module corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer-software-driven hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
  • This application may divide the functional units of the device for determining an autonomous driving strategy according to the above method examples.
  • for example, each functional unit may be divided according to a corresponding function, or two or more functions may be integrated into one processing unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit. It should be noted that the division of the units in this application is schematic, and it is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 10 illustrates a possible structure diagram of a device for training a control strategy model for generating an automatic driving strategy provided by the present application.
  • the device 1000 includes a processing unit 1001 and a communication unit 1002.
  • the processing unit 1001 is configured to control the apparatus 1000 to execute the steps of training the second encoder shown in FIG. 2.
  • the processing unit 1001 may also be used to perform other processes for the techniques described herein.
  • the device 1000 may further include a storage unit 1003 for storing program code and data of the device 1000.
  • the communication unit 1002 is configured to perform: acquiring a hidden feature space of low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene;
  • the processing unit 1001 is configured to execute: training a second encoder through high-dimensional training data and a hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene, and The information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is a component of a control strategy model for generating an automatic driving strategy.
  • the processing unit 1001 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication unit 1002 is, for example, a communication interface, and the storage unit 1003 may be a memory. When the processing unit 1001 is a processor, the communication unit 1002 is a communication interface, and the storage unit 1003 is a memory,
  • the device for determining an automatic driving strategy involved in this application may be the device shown in FIG. 11.
  • the device 1100 includes a processor 1101, a communication interface 1102, and a memory 1103 (optional).
  • the processor 1101, the communication interface 1102, and the memory 1103 can communicate with each other through an internal connection path, and transfer control and / or data signals.
  • the training device for generating a control strategy model for an autonomous driving strategy first obtains a hidden feature space of low-dimensional training data, and then uses the hidden feature space of low-dimensional training data to supervise the training of the second encoder, and obtains The high-dimensional training data is mapped to the encoder of the hidden feature space of the low-dimensional training data, thereby obtaining a control strategy model that uses high-dimensional real data to directly generate a usable autonomous driving strategy.
  • FIG. 12 shows a possible structural schematic diagram of an apparatus for generating an automatic driving strategy provided by the present application.
  • the device 1200 includes a processing unit 1201 and a communication unit 1202.
  • the processing unit 1201 is configured to control the device 1200 to execute the steps of generating an automatic driving strategy shown in FIG. 5.
  • the processing unit 1201 may also be used to perform other processes for the techniques described herein.
  • the device 1200 may further include a storage unit 1203 for storing program code and data of the device 1200.
  • the communication unit 1202 is configured to obtain high-dimensional real data
  • the processing unit 1201 is configured to execute: inputting high-dimensional real data into a second encoder to obtain a hidden feature space of the high-dimensional real data, where the high-dimensional real data is data collected by a vehicle from a current traffic scene; according to The hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and a policy function generate an automatic driving strategy, and the automatic driving strategy is used to control the vehicle to drive in the traffic scenario.
  • the processing unit 1201 may be a processor or a controller.
  • the processing unit 1201 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication unit 1202 may be a communication interface, and the storage unit 1203 may be a memory. When the processing unit 1201 is a processor, the communication unit 1202 is a communication interface, and the storage unit 1203 is a memory,
  • the device for generating an automatic driving strategy involved in this application may be the device shown in FIG. 13.
  • the device 1300 includes: a processor 1301, a communication interface 1302, and a memory 1303 (optional).
  • the processor 1301, the communication interface 1302, and the memory 1303 can communicate with each other through an internal connection path, and transfer control and / or data signals.
  • the device for generating an automatic driving strategy deploys a control strategy model including a second encoder, and can use high-dimensional real data collected by a high-dimensional sensor to generate an automatic driving strategy suitable for a current traffic scene.
  • the above-mentioned high-dimensional sensor is, for example, a high-definition camera. Because a high-definition camera is cheaper than a low-dimensional sensor such as lidar and has stronger anti-interference capability, the device for generating an automatic driving strategy does not need an expensive low-dimensional sensor; it can implement automatic driving at lower cost and can adapt to more traffic environments.
  • FIG. 14 shows a possible schematic structural diagram of an apparatus for controlling an automatic driving strategy generation system provided in the present application.
  • the device 1400 includes a processing unit 1401 and a storage unit 1403.
  • the processing unit 1401 is configured to control the device 1400 to execute the steps of controlling the automatic driving system shown in FIG. 9.
  • the processing unit 1401 may also be used to perform other processes for the techniques described herein.
  • the storage unit 1403 is configured to store program codes and data of the device 1400.
  • the apparatus 1400 may further include a communication unit 1402 for communicating with other devices.
  • the processing unit 1401 is configured to execute: by controlling the working states of the first switch and the second switch, input a hidden feature space of low-dimensional real data or a hidden feature space of high-dimensional real data to the policy function module.
  • the processing unit 1401 may be a processor or a controller.
  • the processing unit 1401 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication unit 1402 may be a communication interface
  • the storage unit 1403 may be a memory.
  • the processing unit 1401 is a processor
  • the communication unit 1402 is, for example, a communication interface
  • the storage unit 1403 is a memory
  • the device for controlling an automatic driving strategy generation system involved in this application may be the device shown in FIG. 15.
  • the device 1500 includes a processor 1501, a communication interface 1502 (optional), and a memory 1503.
  • the processor 1501, the communication interface 1502, and the memory 1503 can communicate with each other through an internal connection path, and transfer control and / or data signals.
  • the device for controlling an automatic driving system selects different strategy generation paths according to the type of the collected data. For example, when the collected data is low-dimensional real data, the first switch is controlled to be closed and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, the second switch is controlled to be closed and the automatic driving strategy is obtained from the high-dimensional real data. The method 900 therefore has strong flexibility and robustness.
  • the device embodiments correspond to the method embodiments. For example, the communication unit performs the obtaining step in the method embodiments; all steps other than the obtaining and sending steps may be performed by a processing unit or a processor.
  • for the function of a specific unit, reference may be made to the corresponding method embodiment, which is not described in detail again.
  • the sequence numbers of the processes do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of this application.
  • the steps of the method or algorithm described in combination with the disclosure of this application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may reside in an ASIC.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)

Abstract

This application provides a method and apparatus for training a control policy model used to generate an automatic driving strategy. A hidden feature space, and a policy function defined on that hidden feature space, are first determined from low-dimensional training data; the hidden feature space is then used as the target to supervise the training of an encoder that maps high-dimensional training data into that hidden feature space; finally, the encoder and the policy function are applied to a real traffic environment, i.e., high-dimensional data collected from the real traffic environment is input, so that a usable automatic driving strategy can be obtained directly from high-dimensional data.

Description

Method and apparatus for training a control policy model used to generate an automatic driving strategy

This application claims priority to Chinese Patent Application No. 201810898344.7, filed with the Chinese Patent Office on August 8, 2018 and entitled "Method and apparatus for training a control policy model used to generate an automatic driving strategy", which is incorporated herein by reference in its entirety.

Technical field

This application relates to the field of automatic driving, and in particular to a method and apparatus for training a control policy model used to generate an automatic driving strategy.
Background

Automatic driving is a technology in which a computer system drives a motor vehicle in place of a human. It includes functional modules such as environment perception, positioning, route planning, decision control, and the power system. The environment perception function can be implemented in two ways: with high-precision, low-dimensional sensors such as lidar and millimeter-wave radar, or with low-precision, high-dimensional sensors such as monocular or multi-camera high-definition cameras.

Usually, high-precision low-dimensional sensors such as lidar are expensive and their precision degrades sharply under adverse weather conditions, whereas low-precision high-dimensional sensors such as high-definition cameras are cheap and more resistant to interference. Moreover, high-dimensional data (i.e., data obtained by a high-dimensional sensor) contains more information than low-dimensional data (i.e., data obtained by a low-dimensional sensor) and can reflect complex traffic environments. Determining automatic driving strategies from high-dimensional data therefore has great application prospects.

However, because high-dimensional data carries a large amount of information and usually also contains redundant information, it is difficult for an artificial neural network to obtain a usable automatic driving strategy directly from high-dimensional data.

Summary

This application provides a method and apparatus for training a control policy model used to generate an automatic driving strategy. A hidden feature space, and a policy function defined on that hidden feature space, are first determined from low-dimensional training data; the hidden feature space is then used as the target to supervise the training of an encoder that maps high-dimensional training data into that hidden feature space; finally, the encoder and the policy function are applied to a real traffic environment, i.e., high-dimensional data collected from the real traffic environment is input, so that a usable automatic driving strategy can be obtained directly from high-dimensional data. Based on this control policy model, this application further provides a method and apparatus for generating an automatic driving strategy, an automatic driving strategy generation system, and a control method for that system.

In a first aspect, a method for training a control policy model used to generate an automatic driving strategy is provided, including: obtaining a hidden feature space of low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene; and training a second encoder with high-dimensional training data and the hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene, the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is a component of the control policy model used to generate the automatic driving strategy.

Because the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, the hidden feature space obtained from the low-dimensional training data can necessarily also be obtained from the high-dimensional training data. Based on this principle, this application first obtains the hidden feature space of the low-dimensional training data; since low-dimensional training data carries little information and little redundancy, a usable policy function is relatively easy to obtain from its hidden feature space. The hidden feature space of the low-dimensional data then supervises the training process of the second encoder, finally yielding a second encoder able to map the high-dimensional training data into that hidden feature space. Once the second encoder is trained, the second encoder and the previously obtained policy function can directly process high-dimensional data from the real environment (i.e., high-dimensional real data) to obtain a usable automatic driving strategy.
Optionally, training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data includes: inputting the high-dimensional training data into the second encoder to obtain a hidden feature space of the high-dimensional training data, where the hidden feature space of the low-dimensional training data is used to supervise the output of the second encoder so that the hidden feature space of the high-dimensional training data becomes identical to the hidden feature space of the low-dimensional training data.

Supervised learning is a machine learning method: the machine uses the hidden feature space of the low-dimensional training data to supervise the output of the second encoder, finally obtaining a second encoder that maps the high-dimensional training data into the hidden feature space of the low-dimensional training data.

Optionally, obtaining the hidden feature space of the low-dimensional training data includes: inputting the low-dimensional training data into a first encoder to obtain the hidden feature space of the low-dimensional training data, where the first encoder is trained on multiple low-dimensional data samples, each of which is data collected from any traffic scene and of the same type as the low-dimensional training data, and the first encoder is a component of the control policy model.

The low-dimensional training data is of the same type as the low-dimensional data samples, so the first encoder obtained from the low-dimensional data samples is applicable to the low-dimensional training data, and the hidden feature space of the low-dimensional training data can thereby be obtained.

Optionally, before inputting the low-dimensional training data into the first encoder to obtain the hidden feature space of the low-dimensional training data, the method further includes: training the control policy model according to the multiple low-dimensional data samples and state parameters of multiple vehicles to obtain the first encoder and a policy function, where the multiple low-dimensional data samples correspond one-to-one to the state parameters of the multiple vehicles.
Optionally, before inputting the low-dimensional training data into the first encoder to obtain the hidden feature space of the low-dimensional training data, the method further includes: determining $\theta_{f_1}$ according to $\nabla_{\theta_{f_1'}} s^{(1)}$ and $\nabla_{s^{(1)}} L_{RL}$, where $f_1'$ denotes the first encoder before updating, $\theta_{f_1'}$ denotes the parameters of $f_1'$ other than its input variables, $s^{(1)}$ denotes the hidden feature space, $\nabla_{\theta_{f_1'}} s^{(1)}$ denotes the gradient of $s^{(1)}$ with respect to $\theta_{f_1'}$, $\nabla_{s^{(1)}} L_{RL}$ denotes the gradient of $L_{RL}$ with respect to $s^{(1)}$, $L_{RL}$ denotes the loss function associated with the reinforcement learning model, and $\theta_{f_1}$ denotes the updated $\theta_{f_1'}$; $\theta_{f_1}$ is positively correlated with $\theta_{f_1'}$, and $\theta_{f_1}$ is negatively correlated with $\nabla_{s^{(1)}} L_{RL} \cdot \nabla_{\theta_{f_1'}} s^{(1)}$; and updating $f_1'$ according to $\theta_{f_1}$ to obtain $f_1$, where $f_1$ denotes the updated first encoder.

This scheme provides a method of training the first encoder with a gradient descent algorithm, so that the first encoder can be continuously optimized and the hidden feature space obtained from the low-dimensional training data can reflect the first traffic environment more accurately.
Optionally, training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data includes: determining $\theta_{f_2}$ according to $\nabla_{\theta_{f_2'}} l$, where $f_2'$ denotes the second encoder before updating, $\theta_{f_2'}$ denotes the parameters of $f_2'$ other than its input variables, $\nabla_{\theta_{f_2'}} l$ denotes the gradient of $l$ with respect to $\theta_{f_2'}$, $l$ denotes the variance between $f_2'(x^{(2)})$ and $s^{(1)}$, and $\theta_{f_2}$ denotes the updated $\theta_{f_2'}$; $\theta_{f_2}$ is positively correlated with $\theta_{f_2'}$, and $\theta_{f_2}$ is negatively correlated with $\nabla_{\theta_{f_2'}} l$, where $l = \lVert f_2'(x^{(2)}) - s^{(1)} \rVert^2$, $s^{(1)} = f_1(x^{(1)})$, $x^{(2)}$ denotes the high-dimensional training data, and $\lVert \cdot \rVert$ denotes the norm; and updating $f_2'$ according to $\theta_{f_2}$ to obtain $f_2$, where $f_2$ denotes the updated second encoder.

This scheme provides a method of training the second encoder with a gradient descent algorithm, so that the second encoder can be continuously optimized and the high-dimensional training data can be mapped more accurately into the hidden feature space of the low-dimensional training data.
Optionally, before determining $\theta_{f_2}$ according to $\nabla_{\theta_{f_2'}} l$, the method further includes: aligning the timestamps of $x^{(1)}$ and $x^{(2)}$.

Aligning the timestamps of the low-dimensional training data and the high-dimensional training data enables the high-dimensional training data to be mapped more accurately into the hidden feature space of the low-dimensional training data.
Optionally, the method further includes: obtaining high-dimensional real data, where the high-dimensional real data is data collected by a vehicle from a second traffic scene and is of the same type as the high-dimensional training data; and inputting the state parameters of the vehicle and the high-dimensional real data into the control policy model to generate an automatic driving strategy applicable to the second traffic scene, where the automatic driving strategy is used to control the vehicle to drive in the second traffic scene.

For example, the high-dimensional real data and the high-dimensional training data are both image data. Since the high-dimensional real data is of the same type as the high-dimensional training data, the second encoder obtained from the high-dimensional training data is equally applicable to the high-dimensional real data; inputting the high-dimensional real data into the control policy model containing the second encoder yields an automatic driving strategy applicable to the second traffic scene.

Optionally, the control policy model further includes a policy function, and inputting the state parameters of the vehicle and the high-dimensional real data into the control policy model to generate the automatic driving strategy applicable to the second traffic scene includes: inputting the high-dimensional real data into the second encoder to obtain the hidden feature space of the high-dimensional real data; and obtaining the automatic driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and the policy function.

In a second aspect, this application provides a method for generating an automatic driving strategy, including: inputting high-dimensional real data into a second encoder to obtain a hidden feature space of the high-dimensional real data, where the high-dimensional real data is data collected by a vehicle from the current traffic scene; and generating an automatic driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and a policy function, where the automatic driving strategy is used to control the vehicle to drive in the current traffic scene;

where the second encoder is trained by the following method: inputting low-dimensional training data into a first encoder to obtain a hidden feature space of the low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene; and training the second encoder with high-dimensional training data and the hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene and the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data.

The second encoder obtained by the above method can obtain a usable hidden feature space directly from high-dimensional real data, so that an automatic driving strategy applicable to the current traffic scene can be obtained from the high-dimensional real data, with the advantages of low cost and stronger anti-interference capability.

Optionally, training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data includes: inputting the high-dimensional training data into the second encoder to obtain a hidden feature space of the high-dimensional training data, where the hidden feature space of the low-dimensional training data is used to supervise the output of the second encoder so that the hidden feature space of the high-dimensional training data becomes identical to the hidden feature space of the low-dimensional training data.

Optionally, the first encoder and the policy function are trained by the following method: training a control policy model according to multiple low-dimensional data samples and state parameters of multiple vehicles to obtain the first encoder and the policy function, where the control policy model includes the first encoder and the policy function, each of the multiple low-dimensional data samples is data collected from any traffic scene and of the same type as the low-dimensional training data, and the multiple low-dimensional data samples correspond one-to-one to the state parameters of the multiple vehicles.
In a third aspect, this application provides an automatic driving strategy generation system, including a control policy model, a first switch, and a second switch, where the control policy model includes a first encoder, a second encoder, and a policy function module;

where the first switch is used to control the path state between the first encoder and the policy function module, the second switch is used to control the path state between the second encoder and the policy function module, the first encoder is used to receive low-dimensional real data collected by a vehicle from a traffic scene and output a hidden feature space of the low-dimensional real data, the second encoder is used to receive high-dimensional real data collected by the vehicle from the traffic scene and output a hidden feature space of the high-dimensional real data, and the policy function module is used to generate an automatic driving strategy according to the received vehicle state parameters and hidden feature space, the automatic driving strategy being used to control the vehicle to drive in the traffic scene.

The above system can select different strategy generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, the first switch is controlled to be closed and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, the second switch is controlled to be closed and the automatic driving strategy is obtained from the high-dimensional real data. The system therefore has strong flexibility and robustness.

Optionally, the working states of the first switch and the second switch are opposite, so that the policy function module receives the hidden feature space output by the first encoder or the second encoder.

With the working states of the two switches opposite, the policy function module can receive the hidden feature space of only one kind of data at any moment, which avoids system running errors caused by the policy function module receiving hidden feature spaces of multiple kinds of data simultaneously.

Optionally, when the first switch is closed and the second switch is open, the path between the first encoder and the policy function module is connected and the path between the second encoder and the policy function module is cut off, so that the first encoder inputs the hidden feature space of the low-dimensional real data into the policy function module.

Optionally, when the first switch is open and the second switch is closed, the path between the second encoder and the policy function module is connected and the path between the first encoder and the policy function module is cut off, so that the second encoder inputs the hidden feature space of the high-dimensional real data into the policy function module.

Optionally, the automatic driving strategy generation system further includes: a data valve, used to control whether the low-dimensional real data is input into the first encoder and whether the high-dimensional real data is input into the second encoder.

By controlling the input of the low-dimensional real data and the high-dimensional real data through the data valve, the policy function module can receive the hidden feature space output by the first encoder or the second encoder; compared with controlling the closing of the first and second switches to achieve the same effect, the data-valve scheme prevents the first encoder or the second encoder from doing useless work.
In a fourth aspect, this application provides a control method for an automatic driving strategy generation system, where the automatic driving strategy generation system includes a control policy model, a first switch, and a second switch, and the control policy model includes a first encoder, a second encoder, and a policy function module; where the first switch is used to control the path state between the first encoder and the policy function module, the second switch is used to control the path state between the second encoder and the policy function module, the first encoder is used to receive low-dimensional real data collected by a vehicle from a traffic scene and output a hidden feature space of the low-dimensional real data, the second encoder is used to receive high-dimensional real data collected by the vehicle from the traffic scene and output a hidden feature space of the high-dimensional real data, and the policy function module is used to generate an automatic driving strategy according to the received state parameters of the vehicle and hidden feature space;

the control method includes:

by controlling the working states of the first switch and the second switch, inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data into the policy function module.

The above system can select different strategy generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, the first switch is controlled to be closed and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, the second switch is controlled to be closed and the automatic driving strategy is obtained from the high-dimensional real data. The control method therefore has strong flexibility and robustness.

Optionally, inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch includes: controlling the first switch to be closed and the second switch to be open; the first encoder inputs the hidden feature space of the low-dimensional real data into the policy function module.

Optionally, the reliability of the low-dimensional real data is higher than the reliability of the high-dimensional real data.

When the reliability of the low-dimensional real data is higher than that of the high-dimensional real data, the first switch can be controlled to be closed and the second switch to be open, so that a highly reliable automatic driving strategy can be obtained from highly reliable data.

Optionally, inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch includes: controlling the second switch to be closed and the first switch to be open; the second encoder inputs the hidden feature space of the high-dimensional real data into the policy function module.

Optionally, the reliability of the high-dimensional real data is higher than the reliability of the low-dimensional real data.

When the reliability of the high-dimensional real data is higher than that of the low-dimensional real data, the first switch can be controlled to be open and the second switch to be closed, so that a highly reliable automatic driving strategy can be obtained from highly reliable data.

Optionally, the low-dimensional real data is radar data collected by the vehicle from the traffic scene through a radar, and the high-dimensional real data is image data collected by the vehicle from the traffic scene through a camera.
In a fifth aspect, an apparatus for training a control policy model used to generate an automatic driving strategy is provided. The apparatus can implement the functions corresponding to the steps of the method in the first aspect; the functions can be implemented in hardware or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.

In a possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method in the first aspect. The apparatus may further include a memory coupled to the processor, storing the program instructions and data necessary for the apparatus. Optionally, the apparatus further includes a communication interface for supporting communication between the apparatus and other network elements.

In a sixth aspect, a computer-readable storage medium is provided, storing computer program code which, when executed by a processing unit or processor, causes the processing unit or processor to perform the method of the first aspect.

In a seventh aspect, a computer program product is provided, including computer program code which, when run by a processing unit or processor, causes the processing unit or processor to perform the method of the first aspect.

In an eighth aspect, an apparatus for generating an automatic driving strategy is provided. The apparatus can implement the functions corresponding to the steps of the method in the second aspect; the functions can be implemented in hardware or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.

In a possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method in the second aspect. The apparatus may further include a memory coupled to the processor, storing the program instructions and data necessary for the apparatus. Optionally, the apparatus further includes a communication interface for supporting communication between the apparatus and other network elements.

In a ninth aspect, a computer-readable storage medium is provided, storing computer program code which, when executed by a processing unit or processor, causes the processing unit or processor to perform the method of the second aspect.

In a tenth aspect, a computer program product is provided, including computer program code which, when run by a processing unit or processor, causes the processing unit or processor to perform the method of the second aspect.

In an eleventh aspect, an apparatus for controlling an automatic driving strategy generation system is provided. The apparatus can implement the functions corresponding to the steps of the method in the fourth aspect; the functions can be implemented in hardware or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.

In a possible design, the apparatus includes a processor configured to support the apparatus in performing the corresponding functions of the method in the fourth aspect. The apparatus may further include a memory coupled to the processor, storing the program instructions and data necessary for the apparatus. Optionally, the apparatus further includes a communication interface for supporting communication between the apparatus and other network elements.

In a twelfth aspect, a computer-readable storage medium is provided, storing computer program code which, when executed by a processing unit or processor, causes the processing unit or processor to perform the method of the fourth aspect.

In a thirteenth aspect, a computer program product is provided, including computer program code which, when run by a processing unit or processor, causes the processing unit or processor to perform the method of the fourth aspect.
Brief description of the drawings

FIG. 1 is a schematic diagram of a system for training a control policy model applicable to this application;

FIG. 2 is a schematic flowchart of a method, provided in this application, for training a control policy model used to generate an automatic driving strategy;

FIG. 3 is a schematic flowchart of the method, provided in this application, for training the first encoder and the policy function with a reinforcement learning model;

FIG. 4 is a schematic diagram of a car physical device provided in this application;

FIG. 5 is a schematic flowchart of a method for generating an automatic driving strategy provided in this application;

FIG. 6 is a schematic structural diagram of an automatic driving vehicle provided in this application;

FIG. 7 is a schematic flowchart of controlling an automatic driving vehicle to implement automatic driving provided in this application;

FIG. 8 is a schematic diagram of an automatic driving strategy generation system provided in this application;

FIG. 9 is a schematic diagram of a method for controlling an automatic driving strategy generation system provided in this application;

FIG. 10 is a schematic structural diagram of an apparatus, provided in this application, for training a control policy model used to generate an automatic driving strategy;

FIG. 11 is a schematic structural diagram of another apparatus, provided in this application, for training a control policy model used to generate an automatic driving strategy;

FIG. 12 is a schematic structural diagram of an apparatus for generating an automatic driving strategy provided in this application;

FIG. 13 is a schematic structural diagram of another apparatus for generating an automatic driving strategy provided in this application;

FIG. 14 is a schematic structural diagram of an apparatus for controlling an automatic driving strategy generation system provided in this application;

FIG. 15 is a schematic structural diagram of another apparatus for controlling an automatic driving strategy generation system provided in this application.
Detailed description

FIG. 1 shows a system for training a control policy model applicable to this application. The system is used to train, in a simulated environment, a control policy model for generating automatic driving strategies, and includes:

a simulator 110, including an environment module 111, a car module 112, and a simulator engine 113, where the environment module 111 is used to set the traffic environment (for example, city, countryside, highway), the car module 112 is used to simulate the ego vehicle's electronic system, power system, physical profile, and so on, and the simulator engine 113, which may also be called a task logic module, is used to design driving tasks, plan routes, design reward and penalty rules, and advance the whole simulation process step by step in time order; and

an automatic driving agent 120, including a reinforcement learning module 121. The automatic driving agent 120 may be a software program used to receive, from the simulator 110, the ego-vehicle state parameters x^(0), low-dimensional training data x^(1), high-dimensional training data x^(2), and an immediate reward r, make a decision (i.e., a control action) according to the above data, and send control action information to the simulator 110. The reinforcement learning module 121 is used to train, through a reinforcement learning model, the first encoder described below.

x^(0) is, for example, data such as the vehicle's speed, acceleration, body deflection angle, and position; x^(1) is, for example, lidar data; x^(2) is, for example, image data captured by a front camera. The control actions determined by the automatic driving agent 120 based on the above data are, for example, acceleration, braking, and steering-wheel angle. If the simulator 110 completes the driving task after executing the control action, the immediate reward r sent to the automatic driving agent 120 may be positive feedback; if the driving task is not completed after the control action is executed, r may be negative feedback.

Using the method for determining an automatic driving strategy provided in this application, the above system can finally output a model that can make decisions directly from high-dimensional data in a real environment.

The method for determining an automatic driving strategy provided in this application is described in detail below.
While driving a car, the driver visually obtains the distance between the ego vehicle (i.e., the vehicle the driver is driving) and an obstacle, and thereby forms a control strategy to avoid the obstacle. This control strategy is made on the basis of the distance between the ego vehicle and the obstacle; in fact, the complete information the driver obtains visually also includes the shape, type, and other attributes of the obstacle. Therefore, data indicating the distance between the ego vehicle and the obstacle may be called low-dimensional data, and data containing the above complete information may be called high-dimensional data. Since the information contained in low-dimensional data is a subset of the information contained in high-dimensional data, if an automatic driving control strategy can be determined from the low-dimensional data, an automatic driving strategy can also be determined from the high-dimensional data.

Based on this principle, if the high-dimensional data and the low-dimensional data are collected in the same traffic environment, the automatic driving strategy obtained by processing the low-dimensional data with a control policy model for low-dimensional data must be the same as the automatic driving strategy obtained by processing the high-dimensional data with a control policy model for high-dimensional data.

Since low-dimensional data carries less information, it is relatively easy to train a control policy model that meets safe-driving requirements from low-dimensional data. A control policy model meeting safe-driving requirements can therefore be trained from low-dimensional data first, and then used to supervise the training of the control policy model for high-dimensional data.

The method provided in this application for training the control policy model for low-dimensional data includes:

training the control policy model for low-dimensional data according to multiple low-dimensional data samples and state parameters of multiple vehicles to obtain a first encoder and a policy function, where the multiple low-dimensional data samples correspond one-to-one to the state parameters of the multiple vehicles.

The multiple low-dimensional data samples and the state parameters of the multiple vehicles are, for example, data generated by the simulator 110. The vehicle may be a car, or another device such as an aircraft, a submersible, a ship, or an industrial robot.

The first encoder is used to extract a hidden feature space from the low-dimensional data samples, and the policy function is used to output an automatic driving strategy based on the ego-vehicle parameters (for example, ego speed) and the hidden feature space of the low-dimensional data samples. A hidden feature space is a set of features extracted from raw data (for example, the low-dimensional data samples) by a machine learning algorithm; a feature is an abstract representation of the raw data, and since the features extracted from raw data usually serve as intermediate parameters of a machine learning algorithm rather than its output, they are also called latent features.
As an optional implementation, the training system may train the first encoder as follows:

determine $\theta_{f_1}$ according to $\nabla_{\theta_{f_1'}} s^{(1)}$ and $\nabla_{s^{(1)}} L_{RL}$, where $f_1'$ denotes the first encoder before updating, $\theta_{f_1'}$ denotes the parameters of $f_1'$ other than its input variables, $s^{(1)}$ denotes the hidden feature space, $\nabla_{\theta_{f_1'}} s^{(1)}$ denotes the gradient of $s^{(1)}$ with respect to $\theta_{f_1'}$, $\nabla_{s^{(1)}} L_{RL}$ denotes the gradient of $L_{RL}$ with respect to $s^{(1)}$, $L_{RL}$ denotes the loss function associated with the reinforcement learning model, and $\theta_{f_1}$ denotes the updated $\theta_{f_1'}$; $\theta_{f_1}$ is positively correlated with $\theta_{f_1'}$, and $\theta_{f_1}$ is negatively correlated with $\nabla_{s^{(1)}} L_{RL} \cdot \nabla_{\theta_{f_1'}} s^{(1)}$; then update $f_1'$ according to $\theta_{f_1}$ to obtain $f_1$, where $f_1$ denotes the updated first encoder.

In this application, positive correlation means that the dependent variable increases as the independent variable increases and decreases as it decreases. For example, for the function y = 2x, y increases as x increases and decreases as x decreases, so y is said to be positively correlated with x. As another example, for the function y = x² (with x > 0), when x increases y also increases and when x decreases y also decreases, so y is said to be positively correlated with x.

Similarly, negative correlation means that the dependent variable decreases as the independent variable increases, and increases as the independent variable decreases.

This scheme provides a method of training the first encoder with a gradient descent algorithm, so that the first encoder can be continuously optimized and the hidden feature space obtained from the low-dimensional training data can reflect the first traffic environment more accurately. For example,

$$\theta_{f_1} = \theta_{f_1'} - \eta\, \nabla_{s^{(1)}} L_{RL} \cdot \nabla_{\theta_{f_1'}} s^{(1)},$$

where η denotes the learning rate and takes a value greater than or equal to 0 and less than or equal to 1.
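The chain-rule update above (new encoder parameters move against the product of the loss gradient with respect to the features and the feature gradient with respect to the parameters) can be sketched numerically. The linear encoder, scalar parameter, and quadratic stand-in for the reinforcement learning loss below are illustrative assumptions, not the actual network:

```python
# Sketch of the first-encoder gradient update: theta_f1 = theta_f1' - eta * dL/ds * ds/dtheta.
# Assumptions (illustrative only): a linear encoder s = theta * x and a quadratic
# stand-in L_RL = (s - s_target)^2 for the reinforcement-learning loss.

def update_first_encoder(theta, x, s_target, eta=0.1):
    s = theta * x                 # hidden feature s^(1) from low-dim input x^(1)
    dL_ds = 2.0 * (s - s_target)  # gradient of the toy loss w.r.t. s^(1)
    ds_dtheta = x                 # gradient of s^(1) w.r.t. the encoder parameter
    return theta - eta * dL_ds * ds_dtheta

theta = 0.0
for _ in range(200):
    theta = update_first_encoder(theta, x=1.0, s_target=3.0)
print(round(theta, 3))  # converges toward the target feature value 3.0
```

Repeated application of the update drives the encoder output toward the feature value that minimizes the toy loss, which is the role the real update plays for $L_{RL}$.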
Subsequently, the training system obtains low-dimensional training data, inputs it into the above first encoder to obtain the hidden feature space of the low-dimensional training data, and uses that hidden feature space to supervise the training of the control policy model for high-dimensional data.

The method of training the control policy model for high-dimensional data is shown in FIG. 2. The method may be performed by the training system, i.e., the second encoder is trained offline, or by a vehicle, i.e., the second encoder is trained online. The method 200 includes:

S210: obtain a hidden feature space of low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene.

In this application, low-dimensional training data refers to the low-dimensional data used in the model training phase. The hidden feature space of the low-dimensional training data may be the hidden feature space of one kind of low-dimensional training data (for example, ranging radar data), or of several kinds of low-dimensional training data (for example, ranging radar data and speed-measuring radar data).

S220: train a second encoder with high-dimensional training data and the hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene, the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is a component of the control policy model used to generate automatic driving strategies.

The second encoder is used to obtain a hidden feature space from the high-dimensional training data. The hidden feature space of the high-dimensional training data includes part or all of the hidden feature space of the low-dimensional training data. The second encoder obtained through training extracts from the high-dimensional training data the same hidden feature space as the first encoder obtains from the low-dimensional training data, i.e., the trained second encoder can map the high-dimensional training data onto the hidden feature space of the low-dimensional training data. The second encoder and the policy function of the control policy model for low-dimensional data can then be combined into a control policy model for high-dimensional data to generate automatic driving strategies.

The high-dimensional training data may be data collected synchronously with the low-dimensional training data, for example with the high-dimensional sensor and the low-dimensional sensor working simultaneously to collect data from the first traffic scene.

In this application, words such as "first" and "second" are merely used to distinguish different entities and do not limit the nouns they qualify. The second encoder may be a function, an artificial neural network, or another algorithm or model, used to process the input high-dimensional training data and obtain the hidden feature space of the high-dimensional training data.

In summary, this application first obtains the hidden feature space of the low-dimensional training data. Since low-dimensional training data carries little information and little redundancy, a usable policy function is relatively easy to obtain based on the hidden feature space of the low-dimensional training data. That hidden feature space then supervises the training process of the second encoder, i.e., a second encoder is trained that can map high-dimensional training data into the hidden feature space of the low-dimensional training data. Once the second encoder is trained, the second encoder and the previously obtained policy function can directly process high-dimensional data from the real environment (i.e., high-dimensional real data) to obtain a usable automatic driving strategy.
Optionally, training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data includes:

inputting the high-dimensional training data into the second encoder to obtain a hidden feature space of the high-dimensional training data, where the hidden feature space of the low-dimensional training data is used to supervise the output of the second encoder so that the hidden feature space of the high-dimensional training data becomes identical to the hidden feature space of the low-dimensional training data.

Supervised learning is a machine learning method: the machine uses the hidden feature space of the low-dimensional training data to supervise the output of the second encoder, finally obtaining a second encoder that maps the high-dimensional training data into the hidden feature space of the low-dimensional training data.

Within an acceptable error range, if the hidden feature space obtained by the second encoder from the high-dimensional training data is partially identical to the hidden feature space of the low-dimensional training data, the second encoder may also be considered to map the high-dimensional training data into the hidden feature space of the low-dimensional training data.
Optionally, in method 200, training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data includes: determining $\theta_{f_2}$ according to $\nabla_{\theta_{f_2'}} l$, where $f_2'$ denotes the second encoder before updating, $\theta_{f_2'}$ denotes the parameters of $f_2'$ other than its input variables, $\nabla_{\theta_{f_2'}} l$ denotes the gradient of $l$ with respect to $\theta_{f_2'}$, $l$ denotes the variance between $f_2'(x^{(2)})$ and $s^{(1)}$, and $\theta_{f_2}$ denotes the updated $\theta_{f_2'}$; $\theta_{f_2}$ is positively correlated with $\theta_{f_2'}$, and $\theta_{f_2}$ is negatively correlated with $\nabla_{\theta_{f_2'}} l$, where $l = \lVert f_2'(x^{(2)}) - s^{(1)} \rVert^2$, $s^{(1)} = f_1(x^{(1)})$, $x^{(2)}$ denotes the high-dimensional training data, and $\lVert \cdot \rVert$ denotes the norm; then update $f_2'$ according to $\theta_{f_2}$ to obtain $f_2$, where $f_2$ denotes the updated second encoder.

This scheme provides a method of training the second encoder with a gradient descent algorithm, so that the second encoder can be continuously optimized and the high-dimensional training data can be mapped more accurately into the hidden feature space of the low-dimensional training data. For example,

$$\theta_{f_2} = \theta_{f_2'} - \eta\, \nabla_{\theta_{f_2'}} l,$$

where η denotes the learning rate and takes a value greater than or equal to 0 and less than or equal to 1.
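This supervised step can be sketched numerically: a second encoder is fitted by gradient descent on $l = \lVert f_2(x^{(2)}) - s^{(1)} \rVert^2$ so that its output on high-dimensional data matches the hidden features produced by a frozen first encoder on the aligned low-dimensional data. The linear encoders and the projection relating $x^{(1)}$ to $x^{(2)}$ below are illustrative assumptions:

```python
import numpy as np

# Sketch of the second-encoder training: W2 (standing in for f2) is trained so that
# its features on high-dim data x^(2) match the target features s^(1) = W1 x^(1)
# of the frozen first encoder W1. Assumptions (illustrative only): linear encoders
# and a fixed projection A making x^(1) a lower-dimensional view of x^(2).

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))      # frozen first encoder: low-dim -> features
W2 = np.zeros((2, 8))             # second encoder to be trained: high-dim -> features
A = rng.normal(size=(4, 8))       # relates x^(1) to x^(2)

X2 = rng.normal(size=(8, 64))     # 64 high-dimensional samples (columns)
X1 = A @ X2                       # time-aligned low-dimensional samples
S1 = W1 @ X1                      # supervision target: hidden feature space

eta = 0.02
for _ in range(2000):
    S2 = W2 @ X2
    grad = 2.0 * (S2 - S1) @ X2.T / X2.shape[1]  # gradient of mean ||W2 x2 - s1||^2
    W2 -= eta * grad                             # theta_f2 = theta_f2' - eta * grad

loss = float(np.mean((W2 @ X2 - S1) ** 2))
print(loss)  # close to zero: high-dim data now maps into the target feature space
```

After training, the fitted encoder reproduces the low-dimensional hidden features from high-dimensional inputs alone, which is exactly the property the trained $f_2$ needs at deployment time.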
Optionally, before determining $\theta_{f_2}$ according to $\nabla_{\theta_{f_2'}} l$, the method 200 further includes:

aligning the timestamps of $x^{(1)}$ and $x^{(2)}$.

Aligning the timestamps of the low-dimensional training data and the high-dimensional training data enables the high-dimensional training data to be mapped more accurately into the hidden feature space of the low-dimensional training data.
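One simple way to perform such alignment, assuming each sample carries a timestamp and adopting a nearest-in-time pairing policy (an illustrative choice; the text only requires that the two streams be aligned), is:

```python
import bisect

# Sketch of timestamp alignment between the low-dimensional stream (e.g. lidar
# scans) and the high-dimensional stream (e.g. camera frames): each high-dim
# sample is paired with the closest-in-time low-dim sample.

def align_timestamps(low_ts, high_ts):
    """Return (low_index, high_index) pairs; both timestamp lists must be
    sorted in ascending order."""
    pairs = []
    for j, t in enumerate(high_ts):
        i = bisect.bisect_left(low_ts, t)
        candidates = [k for k in (i - 1, i) if 0 <= k < len(low_ts)]
        best = min(candidates, key=lambda k: abs(low_ts[k] - t))
        pairs.append((best, j))
    return pairs

pairs = align_timestamps([0.00, 0.10, 0.20, 0.30], [0.02, 0.19, 0.31])
print(pairs)  # [(0, 0), (2, 1), (3, 2)]
```

The resulting index pairs give the $(x^{(1)}, x^{(2)})$ couples that feed the supervised loss $l$.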
The training processes of f₁, f₂, and g are described in detail below.

FIG. 3 shows the flow, provided in this application, of training the model (f₁, g) with a reinforcement learning model. The training flow includes:

A0: The automatic driving agent 120 initializes and sets the current time t to 0.

A1: Receive and read x^(0), x^(1), and r of the simulated vehicle at the current time (as shown by the dashed arrows in FIG. 3).

A2: Compute a through the policy model g obtained by the reinforcement learning model: a = g(x^(0), f₁(x^(1))).

A3: Send a to the simulator 110.

A4: Continue training the model (f₁, g).

B0: The simulator 110 initializes and sets the traffic environment such as the map and route.

B1: Send or update x^(0), x^(1), and r of the simulated vehicle at the current time.

B2: Receive and read a (as shown by the dashed arrows in FIG. 3).

B3: Apply a to the simulated vehicle.

B4: Obtain the simulation result of applying a, determine r according to the simulation result, and return to B1.

A replay buffer technique is applied in the above training flow. A replay buffer is a fixed-length in-memory container data structure in which the stored records are replaceable. For example, in step A1, the record (x_t^(0), x_t^(1), a_t, r_t, x_{t+1}^(0), x_{t+1}^(1)) is saved into the replay buffer.

Then, in step A4, a batch of data is randomly sampled from the replay buffer to train the model (f₁, g). The reinforcement learning model may be off-policy or on-policy. In this embodiment, the off-policy normalized advantage function (NAF) algorithm is used as the reinforcement learning model.
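The fixed-length replay buffer described above can be sketched as follows; the record layout mirrors the tuple saved in step A1, while the capacity and batch size are illustrative choices:

```python
import random
from collections import deque

# Sketch of a fixed-length replay buffer: records are appended, the oldest
# records are replaced once capacity is reached, and training draws a random
# batch for step A4.

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries dropped automatically

    def add(self, record):
        self.buffer.append(record)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add((t, t, 0.0, 1.0, t + 1, t + 1))  # (x0, x1, a, r, x0_next, x1_next)
print(len(buf.buffer))  # 3: only the newest records remain
```

Sampling uniformly from such a buffer decorrelates consecutive transitions, which is what makes the off-policy updates in step A4 stable.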
The NAF algorithm is a Q-learning method that supports continuous control actions, where the Q-value function is shown in formula (1); it can further be expressed as an advantage function A and a state-value function V. An important technique here is to express the function A as a quadratic form, as shown in formula (2), where the matrix P is positive definite. The matrix P can further be expressed as the product of the transpose of a lower-triangular matrix L and L, as shown in formula (3). The control action a can therefore be computed directly through the sub-network a = g(s) of the Q-value network, where s is obtained by concatenating s^(0) and s^(1).

$$Q(s,a\mid\theta^Q) = A(s,a\mid\theta^A) + V(s\mid\theta^V) \tag{1}$$

$$A(s,a\mid\theta^A) = -\tfrac{1}{2}\,\bigl(a - g(s)\bigr)^T\, P(s\mid\theta^P)\,\bigl(a - g(s)\bigr) \tag{2}$$

$$P(s\mid\theta^P) = L(s\mid\theta^L)^T\, L(s\mid\theta^L) \tag{3}$$

The training process of the NAF algorithm is standard Q-learning, i.e., the goal of maximizing the future return (cumulative reward) is achieved by minimizing the variance of the Bellman function, as shown in formula (4). To keep the Q-value function stable, the target network technique can be used, i.e., two Q-value networks of identical structure, Q and Q′, where the former (on-policy) is used to explore the problem space and the latter (off-policy) is used for value estimation. The whole learning process is implemented through formula (4); the model in step A4 above is updated with a batch of size N, using the update formulas (5) and (6).

$$L = \frac{1}{N}\sum_{i=1}^{N}\Bigl(y_i - Q(s_i, a_i\mid\theta^Q)\Bigr)^2,\qquad y_i = r_i + \gamma\, V'(s_{i+1}\mid\theta^{Q'}) \tag{4}$$

$$\theta^Q = \theta^Q - \eta\, \nabla_{\theta^Q} L \tag{5}$$

$$\theta^{Q'} = \tau\,\theta^Q + (1-\tau)\,\theta^{Q'} \tag{6}$$

What we finally need are the two sub-networks of Q, namely the first encoder f₁ and the policy function g. In formula (4), γ denotes the discount factor and takes a value from 0 to 1. Formulas (5) and (6) are the update formulas of θ^Q and θ^Q′ respectively, i.e., the θ^Q and θ^Q′ on the left of the equals sign are the updated parameters, and the θ^Q and θ^Q′ on the right are the parameters before updating.
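The quadratic advantage of formulas (2) and (3) can be checked numerically: building P as L^T L makes it positive semi-definite, so A(s, a) is maximal (zero) exactly at the greedy action a = g(s) and negative elsewhere. The concrete matrices below are illustrative; in NAF, both L and g(s) are outputs of the Q network:

```python
import numpy as np

# Numerical check of the NAF quadratic advantage:
# A(s, a) = -0.5 * (a - g(s))^T P(s) (a - g(s)),  with  P(s) = L(s)^T L(s).

def advantage(a, mu, L):
    P = L.T @ L                     # eq. (3): positive semi-definite by construction
    d = a - mu
    return -0.5 * float(d @ P @ d)  # eq. (2)

L = np.array([[1.0, 0.0],
              [0.5, 2.0]])          # lower-triangular network output (assumed values)
mu = np.array([0.3, -0.1])          # g(s): the greedy action (assumed values)

print(advantage(mu, mu, L) == 0.0)                        # True: peak at a = g(s)
print(advantage(mu + np.array([0.1, 0.0]), mu, L) < 0.0)  # True elsewhere
```

This is the design choice that lets the action maximizing Q be read off directly as a = g(s), without a separate inner optimization over actions.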
After the model (f₁, g) is trained, a vehicle equipped with lidar and a camera is used to simultaneously collect low-dimensional radar scan data (i.e., low-dimensional training data) and high-dimensional image data (i.e., high-dimensional training data) in a real traffic environment. The car physical device 400 equipped with lidar and a camera is shown in FIG. 4.

Data collection has the following requirements: fix the mounting positions of the lidar 402 and the camera 403, and align the timestamps of the low-dimensional training data and the high-dimensional training data.

After the low-dimensional training data and the high-dimensional training data are aligned, f₂ is trained with the method shown in FIG. 2 and optimized with the gradient descent algorithm in method 200, i.e., f₂ is optimized by minimizing the loss function l, finally obtaining a second encoder able to map the high-dimensional training data into the hidden feature space of the low-dimensional training data.
Based on the second encoder obtained by method 200, this application further provides a method for generating an automatic driving strategy. As shown in FIG. 5, the method 500 includes:

S510: input high-dimensional real data into the second encoder to obtain a hidden feature space of the high-dimensional real data, where the high-dimensional real data is data collected by a vehicle from the current traffic scene.

S520: generate an automatic driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and the policy function, where the automatic driving strategy is used to control the vehicle to drive in the current traffic scene.

The device performing method 500 is, for example, a car. With the control policy model containing the second encoder deployed on the car, an automatic driving strategy suitable for the current traffic scene can be generated from the high-dimensional real data collected by a high-dimensional sensor. The high-dimensional sensor is, for example, a high-definition camera; since a high-definition camera is cheap compared with low-dimensional sensors such as lidar and has strong anti-interference capability, generating an automatic driving strategy through method 500 requires no expensive low-dimensional sensor, can implement automatic driving at lower cost, and can adapt to more traffic environments.

The method for generating an automatic driving strategy provided in this application is described in detail below.

Through the training process described with reference to FIG. 3, we finally obtain the control policy model (f₂, g) containing the second encoder; it is the composite function of f₂ and g. The model (f₂, g) is applied to the vehicle 600 shown in FIG. 6.

The vehicle 600 includes the car physical device 400, which includes a power control system 401 (for example, accelerator, steering wheel, and brake), a camera 403 for collecting high-dimensional real data, and a sensor 404 for collecting the ego-vehicle state parameters x^(0).

The vehicle 600 further includes an automatic driving system 601, which includes a route planning module 602 and a control decision module 603. The route planning module 602 is used to plan routes based on the driving task, map, and positioning information; the control decision module 603 is used to invoke the model (f₂, g) and compute the vehicle control action a from the obtained x^(0) and x^(2), after which a control adapter 604 matches a with the power control system 401. For example, if the control action a is a braking action, the control adapter 604 sends a command to the brake device in the power control system 401 to execute the braking action, thereby completing automatic driving.

The automatic driving system 601 may be a functional module implemented in software or a functional module implemented in hardware.
FIG. 7 is a schematic diagram of an automatic driving flow provided in this application. The driving flow includes:

C1: Plan the route based on the driving task, map, and positioning information.

C2: Receive x^(0) and x^(2) at the current time (as shown by the dashed arrows in FIG. 7).

C3: Compute the control action a, a = g(x^(0), f₂(x^(2))), and match the control action to the corresponding device in the power control system.

C4: Send the command to the power control system (as shown by the dashed arrows in FIG. 7), and return to C1.

D0: Initialize and start.

D1: Send the ego-vehicle state parameters x^(0) and the high-dimensional real data x^(2) at the current time.

D2: Receive the control command.

D3: Execute the control action a, and return to step D1.
This application further provides an automatic driving strategy generation system. As shown in FIG. 8, the system 800 includes:

a control policy model 810, a first switch K1, and a second switch K2, where the control policy model 810 includes a first encoder f₁, a second encoder f₂, and a policy function module g.

K1 is used to control the path state between f₁ and g, and K2 is used to control the path state between f₂ and g. f₁ is used to receive low-dimensional real data collected by a vehicle from the traffic scene and output a hidden feature space of the low-dimensional real data; f₂ is used to receive high-dimensional real data collected by the vehicle from the traffic scene and output a hidden feature space of the high-dimensional real data; g is used to generate an automatic driving strategy according to the received vehicle state parameters and hidden feature space, the automatic driving strategy being used to control the vehicle to drive in the traffic scene.

The above system can select different strategy generation paths according to the type of data collected by the vehicle. For example, when the collected data is low-dimensional real data, K1 is controlled to be closed, the hidden feature space of the low-dimensional real data is obtained and then input into g, and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, K2 is controlled to be closed, the hidden feature space of the high-dimensional real data is obtained and then input into g, and the automatic driving strategy is obtained from the high-dimensional real data. In this way, even if the vehicle's low-dimensional sensor or high-dimensional sensor fails, the system can still generate an automatic driving strategy suitable for the current traffic environment as long as one sensor works normally; the system therefore has strong flexibility and robustness.

Optionally, the working states of K1 and K2 are opposite, so that g receives the hidden feature space output by either f₁ or f₂.

In this scheme, the working states of K1 and K2 are opposite: when K1 is closed and K2 is open, the path between f₁ and g is connected and the path between f₂ and g is cut off, so that f₁ inputs the hidden feature space of the low-dimensional real data into g; when K1 is open and K2 is closed, the path between f₂ and g is connected and the path between f₁ and g is cut off, so that f₂ inputs the hidden feature space of the high-dimensional real data into g.

Therefore, g can receive the hidden feature space of only one kind of data at any moment, which avoids running errors in system 800 caused by g receiving hidden feature spaces of multiple kinds of data simultaneously.

Optionally, the system 800 further includes:

a data valve, used to control whether the low-dimensional real data is input into the first encoder and whether the high-dimensional real data is input into the second encoder.

By controlling the input of the low-dimensional real data and the high-dimensional real data through the data valve, the policy function module can receive the hidden feature space output by the first encoder or the second encoder; compared with controlling the closing of the first and second switches to achieve the same effect, the data-valve scheme prevents the first encoder or the second encoder from doing useless work.
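The switch logic of system 800 can be sketched as a simple selector: exactly one path is connected, so the policy function module g receives either f₁'s or f₂'s hidden feature space. The lambda encoders and policy below are placeholders, not trained models:

```python
# Sketch of the K1/K2 switch logic: exactly one switch is closed, so g
# receives the hidden feature space from exactly one encoder.

def generate_strategy(x0, data, use_low_dim, f1, f2, g):
    if use_low_dim:        # K1 closed, K2 open
        s = f1(data)
    else:                  # K2 closed, K1 open
        s = f2(data)
    return g(x0, s)

f1 = lambda x: ("s_low", x)          # stand-in first encoder
f2 = lambda x: ("s_high", x)         # stand-in second encoder
g = lambda x0, s: ("action", x0, s)  # stand-in policy function module

print(generate_strategy("state", "lidar_scan", True, f1, f2, g))
print(generate_strategy("state", "camera_frame", False, f1, f2, g))
```

Because the selection is exclusive, g is never handed two hidden feature spaces at once, which is the error case the opposite-switch-state design rules out.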
This application further provides a control method based on the automatic driving strategy generation system 800. As shown in FIG. 9, the control method 900 includes:

S910: by controlling the working states of the first switch and the second switch, input the hidden feature space of low-dimensional real data or the hidden feature space of high-dimensional real data into the policy function module.

The device performing method 900 may be an on-board processor or a car containing the on-board processor. The on-board processor can select different strategy generation paths according to the type of the collected data. For example, when the collected data is low-dimensional real data, the first switch is controlled to be closed and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, the second switch is controlled to be closed and the automatic driving strategy is obtained from the high-dimensional real data. Method 900 therefore has strong flexibility and robustness.

The on-board processor may determine whether the collected data is low-dimensional real data or high-dimensional real data according to the types of information the data contains, for example by the following rule:

when the number of information types contained in the collected data is less than or equal to a quantity threshold, the data is determined to be low-dimensional real data; when the number of information types contained in the collected data is greater than the quantity threshold, the data is determined to be high-dimensional real data.

For example, with a quantity threshold of 2, data containing only "distance" information is determined to be low-dimensional real data, and data containing the three kinds of information "distance", "speed", and "obstacle type" is determined to be high-dimensional real data.

The above method is merely an example; this application does not limit how the on-board processor determines the type of the collected data.
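The threshold rule above can be sketched directly; the information-type names and the threshold of 2 follow the example in the text:

```python
# Sketch of the quantity-threshold rule: data carrying at most `threshold`
# information types is treated as low-dimensional, otherwise high-dimensional.

def classify(info_types, threshold=2):
    return "low" if len(info_types) <= threshold else "high"

print(classify({"distance"}))                            # low
print(classify({"distance", "speed", "obstacle type"}))  # high
```

The classification result then decides which of the two switches the on-board processor closes.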
Optionally, S910 includes:

controlling the first switch to be closed and the second switch to be open; the first encoder inputs the hidden feature space of the low-dimensional real data into the policy function module.

Optionally, the reliability of the low-dimensional real data is higher than the reliability of the high-dimensional real data.

When the reliability of the low-dimensional real data is higher than that of the high-dimensional real data, the first switch can be controlled to be closed and the second switch to be open, so that a highly reliable automatic driving strategy can be obtained from highly reliable data.

Optionally, inputting the hidden feature space of the low-dimensional real data or the hidden feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch includes: controlling the second switch to be closed and the first switch to be open; the second encoder inputs the hidden feature space of the high-dimensional real data into the policy function module.

Optionally, the reliability of the high-dimensional real data is higher than the reliability of the low-dimensional real data.

When the reliability of the high-dimensional real data is higher than that of the low-dimensional real data, the first switch can be controlled to be open and the second switch to be closed, so that a highly reliable automatic driving strategy can be obtained from highly reliable data.

For example, the low-dimensional real data is radar data collected by the car from the traffic scene through lidar, and the high-dimensional real data is image data collected by the car from the traffic scene through a high-definition camera.

If the current traffic scene is a rain or snow scene, the interference of rain and snow with lidar (for example, the refraction and/or reflection of radar waves by rain and snow prevents the radar receiver from receiving obstacle echoes) is greater than the interference with the high-definition camera, so the first switch can be controlled to be open and the second switch to be closed, and a highly reliable automatic driving strategy can be obtained from the highly reliable high-dimensional real data.

If the current traffic scene is a high-intensity illumination scene, the interference of strong light with lidar is smaller than the interference with the high-definition camera (for example, strongly reflected light prevents the high-definition camera from obtaining clear images), so the first switch can be controlled to be closed and the second switch to be open, and a highly reliable automatic driving strategy can be obtained from the highly reliable low-dimensional real data.
The foregoing describes in detail examples of the method, provided in this application, for training a control policy model used to generate automatic driving strategies. It can be understood that, to implement the above functions, the apparatus for training the control policy model contains hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

This application may divide the apparatus for determining an automatic driving strategy into functional units according to the above method examples; for example, each function may be divided into a functional unit, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in this application is schematic and is merely a logical functional division; other division manners are possible in actual implementation.
When integrated units are used, FIG. 10 shows a possible schematic structural diagram of the apparatus, provided in this application, for training a control policy model used to generate automatic driving strategies. The apparatus 1000 includes a processing unit 1001 and a communication unit 1002. The processing unit 1001 is used to control the apparatus 1000 to perform the steps of training the second encoder shown in FIG. 2. The processing unit 1001 may also be used to perform other processes of the techniques described herein. The apparatus 1000 may further include a storage unit 1003 for storing program code and data of the apparatus 1000.

For example, the communication unit 1002 is used to perform: obtaining a hidden feature space of low-dimensional training data, where the low-dimensional training data is data collected from a first traffic scene;

the processing unit 1001 is used to perform: training a second encoder with high-dimensional training data and the hidden feature space of the low-dimensional training data, where the high-dimensional training data is data collected from the first traffic scene, the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is a component of the control policy model used to generate automatic driving strategies.

The processing unit 1001 may be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 1002 is, for example, a communication interface, and the storage unit 1003 may be a memory.

When the processing unit 1001 is a processor, the communication unit 1002 is, for example, a communication interface, and the storage unit 1003 is a memory, the apparatus for determining an automatic driving strategy involved in this application may be the apparatus shown in FIG. 11.

Referring to FIG. 11, the apparatus 1100 includes a processor 1101, a communication interface 1102, and a memory 1103 (optional). The processor 1101, the communication interface 1102, and the memory 1103 can communicate with one another through an internal connection path, transferring control and/or data signals.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The apparatus provided in this application for training a control policy model used to generate automatic driving strategies first obtains the hidden feature space of low-dimensional training data, then uses that hidden feature space to supervise the training of the second encoder, obtaining an encoder able to map high-dimensional training data into the hidden feature space of the low-dimensional training data, and thus a control policy model that directly generates usable automatic driving strategies from high-dimensional real data.
When integrated units are used, FIG. 12 shows a possible schematic structural diagram of the apparatus for generating an automatic driving strategy provided in this application. The apparatus 1200 includes a processing unit 1201 and a communication unit 1202. The processing unit 1201 is used to control the apparatus 1200 to perform the steps of generating an automatic driving strategy shown in FIG. 5. The processing unit 1201 may also be used to perform other processes of the techniques described herein. The apparatus 1200 may further include a storage unit 1203 for storing program code and data of the apparatus 1200.

For example, the communication unit 1202 is used to obtain high-dimensional real data;

the processing unit 1201 is used to perform: inputting the high-dimensional real data into a second encoder to obtain a hidden feature space of the high-dimensional real data, where the high-dimensional real data is data collected by a vehicle from the current traffic scene; and generating an automatic driving strategy according to the hidden feature space of the high-dimensional real data, the state parameters of the vehicle, and a policy function, where the automatic driving strategy is used to control the vehicle to drive in the traffic scene.

The processing unit 1201 may be a processor or a controller, for example a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 1202 may be a communication interface, and the storage unit 1203 may be a memory.

When the processing unit 1201 is a processor, the communication unit 1202 is, for example, a communication interface, and the storage unit 1203 is a memory, the apparatus for generating an automatic driving strategy involved in this application may be the apparatus shown in FIG. 13.

Referring to FIG. 13, the apparatus 1300 includes a processor 1301, a communication interface 1302, and a memory 1303 (optional). The processor 1301, the communication interface 1302, and the memory 1303 can communicate with one another through an internal connection path, transferring control and/or data signals.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The apparatus for generating an automatic driving strategy provided in this application deploys a control policy model containing the second encoder and can generate, from high-dimensional real data collected by a high-dimensional sensor, an automatic driving strategy applicable to the current traffic scene. The high-dimensional sensor is, for example, a high-definition camera; since a high-definition camera is cheap compared with low-dimensional sensors such as lidar and has strong anti-interference capability, the apparatus for generating an automatic driving strategy requires no expensive low-dimensional sensor, can implement automatic driving at lower cost, and can adapt to more traffic environments.
When integrated units are used, FIG. 14 shows a possible schematic structural diagram of the apparatus for controlling an automatic driving strategy generation system provided in this application. The apparatus 1400 includes a processing unit 1401 and a storage unit 1403. The processing unit 1401 is used to control the apparatus 1400 to perform the steps of controlling the automatic driving system shown in FIG. 9. The processing unit 1401 may also be used to perform other processes of the techniques described herein. The storage unit 1403 is used to store program code and data of the apparatus 1400. The apparatus 1400 may further include a communication unit 1402 for communicating with other devices.

For example, the processing unit 1401 is used to perform: by controlling the working states of the first switch and the second switch, inputting the hidden feature space of low-dimensional real data or the hidden feature space of high-dimensional real data into the policy function module.

The processing unit 1401 may be a processor or a controller, for example a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 1402 may be a communication interface, and the storage unit 1403 may be a memory.

When the processing unit 1401 is a processor, the communication unit 1402 is, for example, a communication interface, and the storage unit 1403 is a memory, the apparatus for controlling an automatic driving strategy generation system involved in this application may be the apparatus shown in FIG. 15.

Referring to FIG. 15, the apparatus 1500 includes a processor 1501, a communication interface 1502 (optional), and a memory 1503. The processor 1501, the communication interface 1502, and the memory 1503 can communicate with one another through an internal connection path, transferring control and/or data signals.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The apparatus for controlling an automatic driving system provided in this application selects different strategy generation paths according to the type of the collected data. For example, when the collected data is low-dimensional real data, the first switch is controlled to be closed and the automatic driving strategy is obtained from the low-dimensional real data; when the collected data is high-dimensional real data, the second switch is controlled to be closed and the automatic driving strategy is obtained from the high-dimensional real data. The method 900 therefore has strong flexibility and robustness.

The apparatus embodiments correspond to the method embodiments. For example, the communication unit performs the obtaining step in the method embodiments, and all steps other than the obtaining and sending steps may be performed by a processing unit or a processor. For the function of a specific unit, reference may be made to the corresponding method embodiment, which is not detailed again.
In the embodiments of this application, the sequence numbers of the processes do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of this application.

In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the associated objects.

The steps of the method or algorithm described in connection with the disclosure of this application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC.

In the above embodiments, implementation may be entirely or partly by software, hardware, firmware, or any combination thereof. When software is used, implementation may be entirely or partly in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the flows or functions described in this application are entirely or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium, for example from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).

The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of this application in detail. It should be understood that the above are merely specific implementations of this application and are not intended to limit its protection scope; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of this application shall fall within the protection scope of this application.

Claims (29)

  1. A method for training a control policy model used to generate an automatic driving strategy, comprising:
    obtaining a hidden feature space of low-dimensional training data, wherein the low-dimensional training data is data collected from a first traffic scene; and
    training a second encoder with high-dimensional training data and the hidden feature space of the low-dimensional training data, wherein the high-dimensional training data is data collected from the first traffic scene, the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data, and the second encoder is a component of the control policy model used to generate the automatic driving strategy.
  2. The method according to claim 1, wherein training the second encoder with the high-dimensional training data and the hidden feature space of the low-dimensional training data comprises:
    inputting the high-dimensional training data into the second encoder to obtain a hidden feature space of the high-dimensional training data, wherein the hidden feature space of the low-dimensional training data is used to supervise the output of the second encoder so that the hidden feature space of the high-dimensional training data becomes identical to the hidden feature space of the low-dimensional training data.
  3. The method according to claim 1 or 2, wherein obtaining the hidden feature space of the low-dimensional training data comprises:
    inputting the low-dimensional training data into a first encoder to obtain the hidden feature space of the low-dimensional training data, wherein the first encoder is trained on a plurality of low-dimensional data samples, each of the plurality of low-dimensional data samples is data collected from any traffic scene and of the same type as the low-dimensional training data, and the first encoder is a component of the control policy model.
  4. The method according to claim 3, wherein before inputting the low-dimensional training data into the first encoder to obtain the hidden feature space of the low-dimensional training data, the method further comprises:
    training the control policy model according to the plurality of low-dimensional data samples and state parameters of a plurality of vehicles to obtain the first encoder and a policy function, wherein the plurality of low-dimensional data samples correspond one-to-one to the state parameters of the plurality of vehicles.
  5. The method according to claim 3, wherein before the inputting the low-dimensional training data into a first encoder to obtain the latent feature space of the low-dimensional training data, the method further comprises:
    determining θ₁ according to ∂s^(1)/∂θ₁′ and ∂L_RL/∂s^(1), wherein f₁′ denotes the first encoder before the update, θ₁′ denotes the parameters of f₁′ other than its independent variable, s^(1) denotes the latent feature space, ∂s^(1)/∂θ₁′ denotes the gradient of s^(1) with respect to θ₁′, ∂L_RL/∂s^(1) denotes the gradient of L_RL with respect to s^(1), L_RL denotes the loss function associated with the reinforcement learning model, θ₁ denotes the updated θ₁′, θ₁ is positively correlated with θ₁′, and θ₁ is negatively correlated with (∂L_RL/∂s^(1))·(∂s^(1)/∂θ₁′);
    updating f₁′ according to θ₁ to obtain f₁, wherein f₁ denotes the updated first encoder.
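Read as a gradient-descent step, the update in claim 5 composes the gradient of the reinforcement-learning loss with the encoder's Jacobian by the chain rule. A minimal numerical sketch of this kind of update (the linear encoder, quadratic stand-in loss, and learning rate η are illustrative assumptions, not part of the claim):

```python
import numpy as np

# Hypothetical linear first encoder: s1 = f1(x1; theta1) = theta1 @ x1
theta1 = np.array([[0.5, -0.2],
                   [0.1,  0.3]])       # theta1' (parameters before the update)
x1 = np.array([1.0, 2.0])              # low-dimensional training data x^(1)
s_target = np.array([0.0, 0.0])        # illustrative target for s^(1)
eta = 0.1                              # learning rate (assumed)

s1 = theta1 @ x1                       # latent feature space s^(1)

# dL_RL/ds^(1) for the illustrative loss L_RL = 0.5 * ||s1 - s_target||^2
dL_ds1 = s1 - s_target
# ds^(1)/dtheta1' for a linear encoder is x1, so the chain rule gives
# dL_RL/dtheta1' = outer(dL_RL/ds^(1), x1)
grad_theta1 = np.outer(dL_ds1, x1)

# theta1 is positively correlated with theta1' and negatively
# correlated with the gradient term, i.e. a descent step:
theta1_new = theta1 - eta * grad_theta1

loss_before = 0.5 * np.sum((theta1 @ x1 - s_target) ** 2)
loss_after = 0.5 * np.sum((theta1_new @ x1 - s_target) ** 2)
print(loss_after < loss_before)  # → True: the step reduces the loss
```

In the claimed method, L_RL would be the reinforcement-learning loss rather than the quadratic used here; only the chain-rule structure of the update is being illustrated.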
  6. The method according to claim 5, wherein the training a second encoder by using high-dimensional training data and the latent feature space of the low-dimensional training data comprises:
    determining θ₂ according to θ₂′ and ∂l/∂θ₂′, wherein f₂′ denotes the second encoder before the update, θ₂′ denotes the parameters of f₂′ other than its independent variable, ∂l/∂θ₂′ denotes the gradient of l with respect to θ₂′, l denotes the variance between s^(1) and s^(2), θ₂ denotes the updated θ₂′, θ₂ is positively correlated with θ₂′, and θ₂ is negatively correlated with ∂l/∂θ₂′, where s^(2) = f₂′(x^(2)), l = ‖s^(1) − s^(2)‖, x^(2) denotes the high-dimensional training data, and ‖s^(1) − s^(2)‖ denotes the norm of s^(1) − s^(2);
    updating f₂′ according to θ₂ to obtain f₂, wherein f₂ denotes the updated second encoder.
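Claim 6 amounts to regressing the second encoder's output onto the latent feature space already produced from the low-dimensional data, by descending the gradient of the mismatch l between s^(1) and s^(2). A sketch under assumed linear encoders (the shapes, data values, learning rate, and iteration count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

s1 = np.array([0.2, -0.1])              # latent feature space s^(1) from the first encoder
x2 = np.array([1.0, 0.5, -0.5, 2.0])    # high-dimensional training data x^(2)
theta2 = rng.normal(size=(2, 4)) * 0.1  # theta2' of an assumed linear second encoder
eta = 0.05                              # learning rate (assumed)

for _ in range(200):
    s2 = theta2 @ x2                    # s^(2) = f2'(x^(2))
    # Illustrative mismatch l = ||s1 - s2||^2; its gradient w.r.t. theta2'
    # follows from the chain rule for a linear encoder:
    grad = np.outer(2 * (s2 - s1), x2)
    # theta2 is positively correlated with theta2' and negatively
    # correlated with the gradient, i.e. a descent step:
    theta2 = theta2 - eta * grad

l_final = np.sum((s1 - theta2 @ x2) ** 2)
print(l_final)  # close to 0: the two latent feature spaces coincide
```

After training, the second encoder maps the high-dimensional sample to (nearly) the same latent point as the first encoder maps the paired low-dimensional sample, which is the supervision relationship stated in claims 2 and 11.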
  7. The method according to claim 6, wherein before the determining θ₂ according to θ₂′ and ∂l/∂θ₂′, the method further comprises:
    aligning the timestamps of x^(1) and x^(2).
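Before computing l, claim 7 requires that each low-dimensional sample x^(1) be paired with the high-dimensional sample x^(2) captured at the same moment. One common way to do this is nearest-timestamp matching within a tolerance; the function name, sensor rates, and tolerance below are illustrative assumptions, not part of the claim:

```python
from bisect import bisect_left

def align_timestamps(ts_low, ts_high, tol=0.05):
    """Pair each low-dimensional timestamp with the nearest
    high-dimensional timestamp within `tol` seconds.
    Returns (index_low, index_high) pairs. Both inputs must be sorted."""
    pairs = []
    for i, t in enumerate(ts_low):
        j = bisect_left(ts_high, t)
        # Candidates: the neighbor on each side of the insertion point.
        best = None
        for k in (j - 1, j):
            if 0 <= k < len(ts_high):
                if best is None or abs(ts_high[k] - t) < abs(ts_high[best] - t):
                    best = k
        if best is not None and abs(ts_high[best] - t) <= tol:
            pairs.append((i, best))
    return pairs

# Radar at 10 Hz vs. camera at 30 Hz (illustrative rates)
radar_ts = [0.00, 0.10, 0.20, 0.30]
camera_ts = [0.00, 0.033, 0.066, 0.100, 0.133, 0.166,
             0.200, 0.233, 0.266, 0.301]
print(align_timestamps(radar_ts, camera_ts))  # → [(0, 0), (1, 3), (2, 6), (3, 9)]
```

Only the aligned pairs would then be used when evaluating l = ‖s^(1) − s^(2)‖ during training of the second encoder.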
  8. The method according to any one of claims 1 to 7, further comprising:
    obtaining high-dimensional real data, wherein the high-dimensional real data is data collected by a vehicle from a second traffic scenario, and the type of the high-dimensional real data is the same as the type of the high-dimensional training data;
    inputting state parameters of the vehicle and the high-dimensional real data into the control policy model to generate an autonomous driving policy applicable to the second traffic scenario, wherein the autonomous driving policy is used to control the vehicle to drive in the second traffic scenario.
  9. The method according to claim 8, wherein the control policy model further comprises a policy function;
    and the inputting state parameters of the vehicle and the high-dimensional real data into the control policy model to generate an autonomous driving policy applicable to the second traffic scenario comprises:
    inputting the high-dimensional real data into the second encoder to obtain a latent feature space of the high-dimensional real data;
    obtaining the autonomous driving policy according to the latent feature space of the high-dimensional real data, the state parameters of the vehicle, and the policy function.
  10. A method for generating an autonomous driving policy, comprising:
    inputting high-dimensional real data into a second encoder to obtain a latent feature space of the high-dimensional real data, wherein the high-dimensional real data is data collected by a vehicle from a current traffic scenario;
    generating an autonomous driving policy according to the latent feature space of the high-dimensional real data, state parameters of the vehicle, and a policy function, wherein the autonomous driving policy is used to control the vehicle to drive in the current traffic scenario;
    wherein the second encoder is trained as follows:
    inputting low-dimensional training data into a first encoder to obtain a latent feature space of the low-dimensional training data, wherein the low-dimensional training data is data collected from a first traffic scenario;
    training the second encoder by using high-dimensional training data and the latent feature space of the low-dimensional training data, wherein the high-dimensional training data is data collected from the first traffic scenario, and the information contained in the low-dimensional training data is a subset of the information contained in the high-dimensional training data.
  11. The method according to claim 10, wherein the training the second encoder by using high-dimensional training data and the latent feature space of the low-dimensional training data comprises:
    inputting the high-dimensional training data into the second encoder to obtain a latent feature space of the high-dimensional training data, wherein the latent feature space of the low-dimensional training data is used to supervise the output of the second encoder, so that the latent feature space of the high-dimensional training data becomes the same as the latent feature space of the low-dimensional training data.
  12. The method according to claim 10 or 11, wherein the first encoder and the policy function are trained as follows:
    training a control policy model according to a plurality of low-dimensional data samples and state parameters of a plurality of vehicles to obtain the first encoder and the policy function, wherein the control policy model comprises the first encoder and the policy function, each of the plurality of low-dimensional data samples is data collected from any traffic scenario and of the same type as the low-dimensional training data, and the plurality of low-dimensional data samples are in one-to-one correspondence with the state parameters of the plurality of vehicles.
  13. An autonomous driving policy generation system, comprising a control policy model, a first switch, and a second switch, wherein the control policy model comprises a first encoder, a second encoder, and a policy function module;
    the first switch is configured to control the state of the path between the first encoder and the policy function module, and the second switch is configured to control the state of the path between the second encoder and the policy function module; the first encoder is configured to receive low-dimensional real data collected by a vehicle from a traffic scenario and output a latent feature space of the low-dimensional real data; the second encoder is configured to receive high-dimensional real data collected by the vehicle from the traffic scenario and output a latent feature space of the high-dimensional real data; and the policy function module is configured to generate an autonomous driving policy according to the received state parameters of the vehicle and the received latent feature space, wherein the autonomous driving policy is used to control the vehicle to drive in the traffic scenario.
  14. The system according to claim 13, wherein the working states of the first switch and the second switch are opposite, so that the policy function module receives the latent feature space output by either the first encoder or the second encoder.
  15. The system according to claim 14, wherein when the first switch is closed and the second switch is open, the path between the first encoder and the policy function module is connected and the path between the second encoder and the policy function module is cut off, so that the first encoder inputs the latent feature space of the low-dimensional real data into the policy function module.
  16. The system according to claim 14, wherein when the first switch is open and the second switch is closed, the path between the second encoder and the policy function module is connected and the path between the first encoder and the policy function module is cut off, so that the second encoder inputs the latent feature space of the high-dimensional real data into the policy function module.
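The mutually exclusive switching described in claims 13–16 can be modeled as simple routing logic: exactly one encoder's latent feature space reaches the policy function module at a time. A schematic sketch (the class, method names, and encoder stand-ins are invented for illustration):

```python
class PolicyGenerationSystem:
    """Toy model of the switch-routed policy generation system.
    `encode_low` / `encode_high` stand in for the first and second
    encoders; the policy function just records what it received."""

    def __init__(self, encode_low, encode_high):
        self.encode_low = encode_low
        self.encode_high = encode_high
        self.switch1_closed = True   # path: first encoder -> policy function
        self.switch2_closed = False  # path: second encoder -> policy function

    def select_encoder(self, use_low_dim):
        # Claim 14: the two switches are always in opposite working states.
        self.switch1_closed = use_low_dim
        self.switch2_closed = not use_low_dim

    def policy(self, state_params, low_data=None, high_data=None):
        # Only the encoder whose switch is closed feeds the policy function.
        if self.switch1_closed:
            latent = self.encode_low(low_data)
        else:
            latent = self.encode_high(high_data)
        # Policy function module: combines state parameters and latent space.
        return ("drive", state_params, latent)

sys_ = PolicyGenerationSystem(lambda d: ("latent_low", d),
                              lambda d: ("latent_high", d))
sys_.select_encoder(use_low_dim=False)  # route the high-dimensional path
print(sys_.policy({"speed": 10}, high_data="image"))
# → ('drive', {'speed': 10}, ('latent_high', 'image'))
```

Claims 21–23 suggest the selection criterion: route whichever sensor stream (radar or camera) is currently more reliable.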
  17. The system according to any one of claims 13 to 16, further comprising: a data valve configured to control whether the low-dimensional real data is input into the first encoder, and to control whether the high-dimensional real data is input into the second encoder.
  18. A control method for an autonomous driving policy generation system, wherein the autonomous driving policy generation system comprises a control policy model, a first switch, and a second switch, and the control policy model comprises a first encoder, a second encoder, and a policy function module; the first switch is configured to control the state of the path between the first encoder and the policy function module, and the second switch is configured to control the state of the path between the second encoder and the policy function module; the first encoder is configured to receive low-dimensional real data collected by a vehicle from a traffic scenario and output a latent feature space of the low-dimensional real data; the second encoder is configured to receive high-dimensional real data collected by the vehicle from the traffic scenario and output a latent feature space of the high-dimensional real data; and the policy function module is configured to generate an autonomous driving policy according to the received state parameters of the vehicle and the received latent feature space;
    the method comprising:
    inputting the latent feature space of the low-dimensional real data or the latent feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch.
  19. The method according to claim 18, wherein the inputting the latent feature space of the low-dimensional real data or the latent feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch comprises:
    controlling the first switch to be closed and the second switch to be open; and
    inputting, by the first encoder, the latent feature space of the low-dimensional real data into the policy function module.
  20. The method according to claim 18, wherein the inputting the latent feature space of the low-dimensional real data or the latent feature space of the high-dimensional real data into the policy function module by controlling the working states of the first switch and the second switch comprises:
    controlling the second switch to be closed and the first switch to be open; and
    inputting, by the second encoder, the latent feature space of the high-dimensional real data into the policy function module.
  21. The method according to claim 19, wherein the reliability of the low-dimensional real data is higher than the reliability of the high-dimensional real data.
  22. The method according to claim 20, wherein the reliability of the high-dimensional real data is higher than the reliability of the low-dimensional real data.
  23. The method according to any one of claims 18 to 22, wherein the low-dimensional real data is radar data collected by the vehicle from the traffic scenario through a radar, and the high-dimensional real data is image data collected by the vehicle from the traffic scenario through a camera.
  24. An apparatus for training a control policy model used to generate an autonomous driving policy, comprising a processing unit and a storage unit, wherein the storage unit stores instructions that, when run by the processing unit, cause the processing unit to perform the method according to any one of claims 1 to 9.
  25. An apparatus for generating an autonomous driving policy, comprising a processing unit and a storage unit, wherein the storage unit stores instructions that, when run by the processing unit, cause the processing unit to perform the method according to any one of claims 10 to 12.
  26. An apparatus for controlling an autonomous driving policy generation system, comprising a processing unit and a storage unit, wherein the storage unit stores instructions that, when run by the processing unit, cause the processing unit to perform the method according to any one of claims 18 to 23.
  27. A computer-readable storage medium, storing a computer program that, when invoked by a processor, causes the processor to perform the method according to any one of claims 1 to 9.
  28. A computer-readable storage medium, storing a computer program that, when invoked by a processor, causes the processor to perform the method according to any one of claims 10 to 12.
  29. A computer-readable storage medium, storing a computer program that, when invoked by a processor, causes the processor to perform the method according to any one of claims 18 to 23.
PCT/CN2019/078072 2018-08-08 2019-03-14 Method and apparatus for training a control policy model for generating an autonomous driving policy WO2020029580A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810898344.7A CN110824912B (zh) 2018-08-08 2018-08-08 Method and apparatus for training a control policy model for generating an autonomous driving policy
CN201810898344.7 2018-08-08

Publications (1)

Publication Number Publication Date
WO2020029580A1 true WO2020029580A1 (zh) 2020-02-13

Family

ID=69413256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078072 WO2020029580A1 (zh) 2018-08-08 2019-03-14 Method and apparatus for training a control policy model for generating an autonomous driving policy

Country Status (2)

Country Link
CN (1) CN110824912B (zh)
WO (1) WO2020029580A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666762A (zh) * 2020-03-31 2022-06-24 Driving data collection method and apparatus
CN111625948B (zh) * 2020-05-20 2023-09-29 Replay-type simulation method, apparatus, device, and medium for ultra-long scenarios
CN112666833B (zh) * 2020-12-25 2022-03-15 Adaptive robust speed-following control method for electric autonomous vehicles
CN114358128A (zh) * 2021-12-06 2022-04-15 Method for training an end-to-end autonomous driving policy

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102779280A (zh) * 2012-06-19 2012-11-14 武汉大学 Traffic information extraction method based on a laser sensor
CN105608444A (zh) * 2016-01-27 2016-05-25 大连楼兰科技股份有限公司 Wild animal image recognition method for autonomous driving
CN106203346A (zh) * 2016-07-13 2016-12-07 吉林大学 Road environment image classification method for intelligent-vehicle driving mode switching
CN108196535A (zh) * 2017-12-12 2018-06-22 清华大学苏州汽车研究院(吴江) Autonomous driving system based on reinforcement learning and multi-sensor fusion

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US5179905A (en) * 1991-11-19 1993-01-19 Raytheon Company Adaptive autopilot
CN103996056B (zh) * 2014-04-08 2017-05-24 浙江工业大学 Tattoo image classification method based on deep learning
CN104391504B (zh) * 2014-11-25 2017-05-31 浙江吉利汽车研究院有限公司 Method and apparatus for generating an autonomous driving control policy based on the Internet of Vehicles
CN106525063A (zh) * 2017-01-11 2017-03-22 奇瑞汽车股份有限公司 Autonomous refueling method for a self-driving car, and intelligent vehicle
CN107169567B (zh) * 2017-03-30 2020-04-07 深圳先进技术研究院 Method and apparatus for generating a decision network model for autonomous vehicle driving
CN107563426B (zh) * 2017-08-25 2020-05-22 清华大学 Method for learning temporal features of locomotive operation
CN107697070B (zh) * 2017-09-05 2020-04-07 百度在线网络技术(北京)有限公司 Driving behavior prediction method and apparatus, and driverless vehicle
CN107862346B (zh) * 2017-12-01 2020-06-30 驭势科技(北京)有限公司 Method and device for training a driving policy model
CN107977629A (zh) * 2017-12-04 2018-05-01 电子科技大学 Face image aging synthesis method based on a feature-separation adversarial network
CN108062569B (zh) * 2017-12-21 2020-10-27 东华大学 Driving decision method for a driverless vehicle based on infrared and radar
CN108830308B (zh) * 2018-05-31 2021-12-14 西安电子科技大学 Modulation recognition method fusing traditional signal features with deep features
CN110633725B (zh) * 2018-06-25 2023-08-04 富士通株式会社 Method and apparatus for training a classification model, and classification method and apparatus
CN109934295B (zh) * 2019-03-18 2022-04-22 重庆邮电大学 Image classification and reconstruction method based on an extreme-learning latent feature model


Also Published As

Publication number Publication date
CN110824912B (zh) 2021-05-18
CN110824912A (zh) 2020-02-21

Similar Documents

Publication Publication Date Title
WO2020029580A1 (zh) Method and apparatus for training a control policy model for generating an autonomous driving policy
CN111123933B (zh) Vehicle trajectory planning method and apparatus, intelligent driving domain controller, and intelligent vehicle
WO2021238303A1 (zh) Motion planning method and apparatus
EP3835908B1 (en) Automatic driving method, training method and related apparatuses
Chen et al. Autonomous vehicle testing and validation platform: Integrated simulation system with hardware in the loop
CN110244701B (zh) Method and apparatus for reinforcement learning of autonomous vehicles based on automatically generated curriculum sequences
CN109109863B (zh) Intelligent device and control method and apparatus therefor
US20200139989A1 Vehicle Control Method, Apparatus, and Device
CN111923928A (zh) Decision-making method and system for automated vehicles
CN111923927B (zh) Method and apparatus for interaction-aware traffic scenario prediction
CN111273655A (zh) Motion planning method and system for autonomous vehicles
US11962664B1 Context-based data valuation and transmission
JP2022506404A (ja) Method and apparatus for determining vehicle speed
CN112382165B (zh) Driving policy generation method, apparatus, medium, device, and simulation system
US11586865B2 Apparatus, system and method for fusing sensor data to do sensor translation
KR20220054755A (ko) Method, apparatus, and device for determining driving behavior habits and controlling vehicle driving
CN114358128A (zh) Method for training an end-to-end autonomous driving policy
US10836405B2 Continual planning and metareasoning for controlling an autonomous vehicle
CN111208814B (zh) Memory-based optimal motion planning for automated vehicles using dynamic models
EP4119412A1 Vehicle-based data processing method and apparatus, computer, and storage medium
US20230347979A1 Methods and processors for controlling steering of self-driving car
CN113591518A (zh) Image processing method, network training method, and related devices
CN113119999B (zh) Method, apparatus, device, medium, and program product for determining autonomous driving features
CN116403174A (zh) End-to-end autonomous driving method, system, simulation system, and storage medium
CN113066124A (zh) Neural network training method and related devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19846054

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19846054

Country of ref document: EP

Kind code of ref document: A1