WO2020233495A1 - Automatic lane-changing method, device, and storage medium - Google Patents

Automatic lane-changing method, device, and storage medium

Info

Publication number
WO2020233495A1
Authority
WO
WIPO (PCT)
Prior art keywords
moment
information
current
autonomous vehicle
historical
Prior art date
Application number
PCT/CN2020/090234
Other languages
English (en)
French (fr)
Inventor
陈晨
钱俊
刘武龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20809674.3A (EP3965004A4)
Publication of WO2020233495A1
Priority to US17/532,640 (US20220080972A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W 30/10 Path keeping
    • B60W 30/12 Lane keeping
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W 30/14 Adaptive cruise control
    • B60W 30/16 Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W 30/18 Propelling the vehicle
    • B60W 30/18009 Propelling the vehicle related to particular drive situations
    • B60W 30/18163 Lane change; Overtaking manoeuvres
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W 40/02 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to ambient conditions
    • B60W 40/04 Traffic conditions
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W 60/001 Planning or execution of driving tasks
    • B60W 60/0011 Planning or execution of driving tasks involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W 2050/0062 Adapting control system settings
    • B60W 2050/0075 Automatic parameter input, automatic initialising or calibrating means
    • B60W 2050/0083 Setting, resetting, calibration
    • B60W 2050/0088 Adaptive recalibration
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2554/00 Input parameters relating to objects
    • B60W 2554/40 Dynamic objects, e.g. animals, windblown objects
    • B60W 2554/406 Traffic density
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2554/00 Input parameters relating to objects
    • B60W 2554/80 Spatial relation or speed relative to objects
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2556/00 Input parameters relating to data
    • B60W 2556/10 Historical data

Definitions

  • This application relates to the field of automatic driving technology, and in particular to an automatic lane changing method, device and storage medium.
  • Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
  • Autonomous driving is a mainstream application in the field of artificial intelligence.
  • Autonomous driving technology relies on the collaboration of computer vision, radar, monitoring devices, and global positioning systems to allow motor vehicles to drive autonomously without active human operation.
  • Self-driving vehicles use various computing systems to help transport passengers from one location to another. Some autonomous vehicles may require some initial input or continuous input from an operator (such as a navigator, driver, or passenger). The self-driving vehicle allows the operator to switch from the manual mode to the self-driving mode or a mode in between. Since autonomous driving technology does not require humans to drive motor vehicles, it can theoretically effectively avoid human driving errors, reduce traffic accidents, and improve highway transportation efficiency. Therefore, autonomous driving technology has received more and more attention. In the field of autonomous driving technology, the design of intelligent lane-changing decisions for autonomous vehicles still faces huge challenges.
  • In the related art, a deep network is used to approximate the action value function Q corresponding to the current state and action. The input of the action value function Q is a local relative semantic grid centered on the autonomous vehicle, which takes into account the speed and distance information of the neighboring vehicles closest to the autonomous vehicle, as well as some road semantic information (for example, whether the lane of the autonomous vehicle is an acceleration lane or a left-turn lane). The action with the highest Q value of the action value function is then selected as the current decision action.
  • However, because only local neighbor information is considered, the generated strategy action is not a globally optimal strategy action.
  • In view of this, the embodiments of the present application provide an automatic lane-changing method, device, and storage medium, which solve the problem that the strategy action generated in the related art is not the globally optimal strategy action.
  • an embodiment of the present application provides an automatic lane changing method, including:
  • According to the driving information of the autonomous vehicle at the current moment and the motion information of the obstacles in each lane within the perception range of the autonomous vehicle, the local neighbor features and global statistical features of the autonomous vehicle at the current moment are calculated; the local neighbor features are used to represent the motion state information of specific neighbor obstacles of the autonomous vehicle relative to the autonomous vehicle; the global statistical features are used to indicate the sparseness and density of obstacles in each lane within the perception range;
  • the target action instruction is used to instruct the autonomous vehicle to perform a target action.
  • the target action includes at least two types: changing lanes or staying straight;
  • In this technical solution, the local neighbor features and global statistical features of the autonomous vehicle at the current moment are calculated based on the driving information of the autonomous vehicle at the current moment and the motion information of the obstacles in each lane within the autonomous vehicle's perception range. Further, the target action instruction is obtained according to the local neighbor features, the global statistical features, and the current control strategy, and the target action is executed according to the target action instruction. It can be seen that, by further introducing global statistical features on the basis of the local neighbor features and inputting them into the current control strategy to obtain the target action instruction, not only the information of local neighbor obstacles (such as nearby vehicles) is considered, but also the global statistical features (such as the overall sparseness and density of each lane) are considered. Therefore, the target action obtained by integrating the local and global road obstacle information is a globally optimal strategy action.
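  • To make the decision step concrete, the following is a minimal sketch (not the claimed implementation) of how the local neighbor features and global statistical features might be concatenated and fed into the current control strategy to select the target action. The action set, the function names, and the use of a Q-style value network are illustrative assumptions.

```python
import numpy as np

ACTIONS = ["keep_straight", "change_left", "change_right"]  # assumed discrete action set

def select_target_action(local_neighbor_features, global_statistical_features, control_strategy):
    """Select the target action under the current control strategy.

    control_strategy is assumed to map a state feature vector to one value per action
    (for example, a learned Q network)."""
    state = np.concatenate([local_neighbor_features, global_statistical_features])
    action_values = control_strategy(state)          # value of each candidate action
    return ACTIONS[int(np.argmax(action_values))]    # target action instruction

# Example usage with a dummy strategy:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_strategy = lambda state: rng.normal(size=len(ACTIONS))
    print(select_target_action(np.zeros(12), np.zeros(6), dummy_strategy))
```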
  • the method further includes:
  • The feedback information is obtained by executing the target action, and the feedback information is used to update the current control strategy. The feedback information includes the driving information of the autonomous vehicle after the target action is executed, the driving information of the autonomous vehicle at the next moment, and the motion information of the obstacles in each lane within the autonomous vehicle's perception range at the next moment. When the target action is a lane change, the feedback information further includes: the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change. The historical average time includes the average time taken by the autonomous vehicle to perform similar actions within a preset historical time period;
  • In this way, the feedback information is obtained by executing the target action, and the current control strategy is updated according to the feedback information to obtain the control strategy at the next moment, so that the target action at the next moment can be accurately determined according to the control strategy at the next moment.
  • It is worth noting that, at any later time, the control strategy at time t can likewise be continuously updated based on the feedback information at time t to obtain the control strategy at time t+1. In this way, the control strategy used to generate the target action is continuously and adaptively updated and optimized, which ensures that each moment has its corresponding optimal control strategy and provides a guarantee for the accurate generation of the target action at each moment.
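  • Put together, this online flow can be sketched as a per-time-step loop: act under the current strategy, collect the feedback information, form the quadruple for time t, and update the strategy used at time t+1. In the sketch below, the environment interface and all callables (select_action, compute_reward, update_strategy) are assumed interfaces, not the claimed implementation.

```python
def online_step(theta, env, select_action, compute_reward, update_strategy):
    """One decision cycle: act under the control strategy at time t and return the
    control strategy to be used at time t+1. All helpers are assumed interfaces."""
    features_t = env.observe()                      # local neighbor + global statistical features at t
    action_t = select_action(theta, features_t)     # target action under the current control strategy
    feedback = env.execute(action_t)                # feedback information obtained by executing the action
    reward_t = compute_reward(action_t, feedback)   # return value corresponding to the target action
    features_t1 = env.observe()                     # features of the autonomous vehicle at the next moment
    quadruple_t = (features_t, action_t, reward_t, features_t1)   # quadruple information at time t
    return update_strategy(theta, [quadruple_t])    # control strategy at time t+1
```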
  • updating the current control strategy according to the feedback information to obtain the control strategy at the next moment includes:
  • The quadruple information at the current moment corresponds to the vehicle condition at the current moment and includes: the features at the current moment, the target action, the reward value corresponding to the target action, and the features at the next moment;
  • the features at the current moment include the local neighbor features and global statistical features of the autonomous vehicle at the current moment, and the features at the next moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment;
  • the current control strategy is updated according to the quadruple information at the current moment to obtain the control strategy at the next moment.
  • updating the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment includes:
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
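  • As a minimal illustration (not the claimed implementation), the quadruple information described above can be represented as a standard reinforcement-learning transition; the field names below are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Quadruple:
    """One transition corresponding to the vehicle condition at the current moment."""
    features_t: np.ndarray    # local neighbor + global statistical features at time t
    action_t: int             # target action taken at time t
    reward_t: float           # reward value corresponding to the target action
    features_t1: np.ndarray   # local neighbor + global statistical features at time t+1
```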
  • updating the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment includes:
  • The extended quadruple information at the current moment corresponds to the current extended vehicle condition, where the current extended vehicle condition is obtained by applying symmetric rule and monotonic rule processing to the current vehicle condition.
  • The symmetric rule refers to symmetrically exchanging the positions of the obstacles in all the lanes on the left and right sides of the lane where the autonomous vehicle is located, using the lane where the autonomous vehicle is located as the axis of symmetry;
  • the monotonic rule refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on a non-target lane by less than a preset distance range;
  • the current control strategy is updated to obtain the control strategy at the next moment.
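  • The symmetric and monotonic rules can be read as data-augmentation operators on a recorded vehicle condition. The sketch below assumes a vehicle condition stored as a dictionary of per-lane obstacle lists keyed by lane offset, with gaps stored as positive distances; this data layout and the mirror_action helper are illustrative assumptions, not the claimed representation.

```python
import copy

def mirror_action(action):
    """Hypothetical helper: swap left/right lane-change actions; keep-straight is unchanged."""
    return {"change_left": "change_right", "change_right": "change_left"}.get(action, action)

def apply_symmetric_rule(condition):
    """Symmetric rule: mirror the obstacles of all left/right lanes about the ego lane.
    condition["lanes"] is assumed to map lane offsets (-1 left, 0 ego, +1 right, ...)
    to lists of obstacle states; condition["action"] stores the recorded target action."""
    mirrored = copy.deepcopy(condition)
    mirrored["lanes"] = {-offset: obstacles for offset, obstacles in condition["lanes"].items()}
    mirrored["action"] = mirror_action(condition["action"])
    return mirrored

def apply_monotonic_rule(condition, extra_gap=2.0):
    """Monotonic rule: increase the front/rear gaps to the ego vehicle on the lane-change
    target lane (gaps assumed stored in meters), leaving non-target lanes unchanged."""
    widened = copy.deepcopy(condition)
    target = condition["target_lane_offset"]
    widened["lanes"][target] = [
        dict(obstacle, gap_to_ego=obstacle["gap_to_ego"] + extra_gap)
        for obstacle in condition["lanes"][target]
    ]
    return widened
```

  • The intuition behind the monotonic rule, as used in this sketch, is that a lane change that was appropriate with a given gap on the target lane remains appropriate when that gap is larger, so the extended vehicle condition can reuse the recorded action and reward.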
  • Updating the current control strategy according to the quadruple information at the current moment and the extended quadruple information at the current moment to obtain the control strategy at the next moment includes:
  • according to the i-th quadruple information in the quadruple information at the current moment and the extended quadruple information at the current moment, the target value corresponding to the i-th quadruple information is generated; where i is a positive integer not greater than n, and n is the total number of quadruple information included in the quadruple information at the current moment and the extended quadruple information at the current moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
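  • A hedged sketch of this update step, in the spirit of a standard temporal-difference (DQN-style) update: for each quadruple (original or extended) a target value is formed, the parameter θ is moved by gradient descent on a preset loss containing that target value, and the updated θ then replaces θ in the current control strategy. Here each quadruple is a plain (features_t, action_t, reward_t, features_t1) tuple; the discount factor, learning rate, and the grad_fn interface are assumptions, not values from the application.

```python
import numpy as np

def td_target(quadruple, target_q, gamma=0.99):
    """Target value for one quadruple: reward plus the discounted best value at the next moment."""
    _features_t, _action_t, reward_t, features_t1 = quadruple
    return reward_t + gamma * float(np.max(target_q(features_t1)))

def update_theta(theta, quadruples, target_q, grad_fn, lr=1e-3, gamma=0.99):
    """One gradient-descent pass over the n quadruples (original plus extended).
    grad_fn is assumed to return the gradient with respect to theta of a preset loss
    such as (Q_theta(features_t, action_t) - target_value)**2."""
    for quadruple in quadruples:
        features_t, action_t, _, _ = quadruple
        target_value = td_target(quadruple, target_q, gamma)
        theta = theta - lr * grad_fn(theta, features_t, action_t, target_value)
    return theta  # substituted for the parameter theta in the current control strategy
```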
  • updating the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment includes:
  • The quadruple information of the historical moment corresponds to the vehicle condition at the historical moment and includes: the features of the historical moment, the target action at the historical moment, the reward value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment;
  • the features of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment, and the features of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment;
  • the extended quadruple information of the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the historical moment.
  • Updating the current control strategy based on the quadruple information at the current moment, the quadruple information at the historical moment, and the extended quadruple information at the historical moment to obtain the control strategy at the next moment includes:
  • according to the j-th quadruple information in the quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment, the target value corresponding to the j-th quadruple information is generated;
  • where j is a positive integer not greater than m, and m is the total number of quadruple information included in the quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • updating the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment includes:
  • The extended quadruple information at the current moment corresponds to the current extended vehicle condition, and the current extended vehicle condition is obtained by applying symmetric rule and monotonic rule processing to the current vehicle condition;
  • the current control strategy is updated to obtain the control strategy at the next moment; where the quadruple information at the historical moment corresponds to the vehicle condition at the historical moment, the extended quadruple information at the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the historical moment.
  • Updating the current control strategy to obtain the control strategy at the next moment includes:
  • according to the k-th quadruple information in the quadruple information of the current moment, the extended quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment, generating the target value corresponding to the k-th quadruple information; where k is a positive integer not greater than p, and p is the total number of quadruple information included in the quadruple information of the current moment, the extended quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • In a possible implementation, calculating, according to the feedback information, the return value corresponding to the target action as well as the local neighbor features and global statistical features of the autonomous vehicle at the next moment includes:
  • the local neighbor features and global statistical features of the autonomous vehicle at the next moment are calculated.
  • In another possible implementation, calculating, according to the feedback information, the return value corresponding to the target action as well as the local neighbor features and global statistical features of the autonomous vehicle at the next moment includes:
  • the return value is calculated according to the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change;
  • the local neighbor features and global statistical features of the autonomous vehicle at the next moment are calculated.
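  • One possible reading of the return-value calculation, sketched under assumptions: for a keep-straight action the reward may be derived from the vehicle's driving information (for example its speed), while for a lane change it also uses the ratio of the execution time to the historical average time and the change in lane sparseness and density before and after the change. The dictionary keys, weights, and exact formula below are illustrative, not taken from the application.

```python
def compute_reward(action, driving_info, lane_change_feedback=None,
                   w_speed=1.0, w_time=0.5, w_density=0.5):
    """Hypothetical reward: driving-information term plus lane-change-specific terms."""
    reward = w_speed * driving_info["speed"] / max(driving_info["speed_limit"], 1e-6)
    if action != "keep_straight" and lane_change_feedback is not None:
        time_ratio = (lane_change_feedback["exec_time"]
                      / lane_change_feedback["historical_avg_time"])
        # positive when the lane after the change is sparser than the lane before it
        density_gain = (lane_change_feedback["density_before"]
                        - lane_change_feedback["density_after"])
        reward += -w_time * time_ratio + w_density * density_gain
    return reward
```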
  • The specific neighbor obstacles of the autonomous vehicle include at least one of the following: the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located, the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located, and the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located;
  • the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located takes a default value; and/or
  • the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located takes a default value.
  • the global traffic flow statistics feature of the autonomous vehicle at the current moment includes at least one of the following: the average traveling speed and the average interval of all obstacles in each lane within the sensing range.
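  • To illustrate the two feature groups, the sketch below builds (a) a local neighbor feature from up to six specific neighbors (front/rear in the ego lane and in the adjacent left and right lanes), filling missing neighbors with default values, and (b) a global statistical feature from the average traveling speed and average spacing of all obstacles in each lane within the perception range. The dictionary layout, field names, and default values are assumptions.

```python
import numpy as np

DEFAULT_RELATIVE_STATE = (200.0, 0.0)   # assumed default: neighbor far away, zero relative speed

def local_neighbor_features(neighbors):
    """neighbors maps ('ego' | 'left' | 'right', 'front' | 'rear') to a
    (relative_distance, relative_speed) pair, or to None when that neighbor is absent."""
    features = []
    for lane in ("ego", "left", "right"):
        for position in ("front", "rear"):
            relative_state = neighbors.get((lane, position)) or DEFAULT_RELATIVE_STATE
            features.extend(relative_state)
    return np.asarray(features, dtype=float)        # 6 neighbors x 2 values

def global_statistical_features(obstacles_per_lane):
    """obstacles_per_lane maps a lane id to a list of obstacles, each a dict with
    'speed' and 'position' (longitudinal position along the lane)."""
    features = []
    for lane_id in sorted(obstacles_per_lane):
        obstacles = obstacles_per_lane[lane_id]
        speeds = [obstacle["speed"] for obstacle in obstacles]
        positions = sorted(obstacle["position"] for obstacle in obstacles)
        features.append(float(np.mean(speeds)) if speeds else 0.0)   # average traveling speed
        if len(positions) > 1:
            features.append(float(np.mean(np.diff(positions))))      # average spacing between obstacles
        else:
            features.append(DEFAULT_RELATIVE_STATE[0])               # assumed default spacing
    return np.asarray(features, dtype=float)
```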
  • an embodiment of the present application provides an automatic lane changing device, including:
  • the calculation module is used to calculate the local neighbor characteristics and global statistical characteristics of the automatic driving vehicle at the current time according to the driving information of the automatic driving vehicle at the current time and the motion information of the obstacles in each lane within the sensing range of the automatic driving vehicle;
  • the local neighbor features are used to represent the motion state information of specific neighbor obstacles of the autonomous vehicle relative to the autonomous vehicle;
  • the global statistical features are used to represent the sparseness and denseness of obstacles in each lane within the perception range;
  • the acquisition module is used to acquire a target action instruction according to the local neighbor feature, the global statistical feature and the current control strategy.
  • the target action instruction is used to instruct the autonomous vehicle to perform a target action.
  • the target action includes at least two types: changing lanes or keeping straight;
  • the execution module is used to execute the target action according to the target action instruction.
  • the device further includes:
  • the feedback module is used to obtain feedback information by executing the target action, and the feedback information is used to update the current control strategy; the feedback information includes the driving information of the autonomous vehicle after the target action is executed, the driving information of the autonomous vehicle at the next moment, and the motion information of the obstacles in each lane within the perception range of the autonomous vehicle at the next moment; when the target action is a lane change, the feedback information further includes: the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change; the historical average time includes the average time taken by the autonomous vehicle to perform similar actions within a preset historical time period;
  • the update module is used to update the current control strategy according to the feedback information to obtain the control strategy at the next moment.
  • the update module includes:
  • the calculation unit is configured to calculate the return value corresponding to the target action according to the feedback information, and the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment;
  • the determining unit is used to determine the quadruple information at the current moment; where the quadruple information at the current moment corresponds to the current vehicle condition and includes: the features at the current moment, the target action, the reward value corresponding to the target action, and the features at the next moment; the features at the current moment include the local neighbor features and global statistical features of the autonomous vehicle at the current moment, and the features at the next moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment;
  • the update unit is used to update the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • The extended quadruple information at the current moment corresponds to the current extended vehicle condition, where the current extended vehicle condition is obtained by applying symmetric rule and monotonic rule processing to the current vehicle condition.
  • The symmetric rule refers to symmetrically exchanging the positions of the obstacles in all the lanes on the left and right sides of the lane where the autonomous vehicle is located, using the lane where the autonomous vehicle is located as the axis of symmetry;
  • the monotonic rule refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on a non-target lane by less than a preset distance range;
  • the current control strategy is updated to obtain the control strategy at the next moment.
  • the update unit is specifically used for:
  • according to the i-th quadruple information in the quadruple information at the current moment and the extended quadruple information at the current moment, the target value corresponding to the i-th quadruple information is generated; where i is a positive integer not greater than n, and n is the total number of quadruple information included in the quadruple information at the current moment and the extended quadruple information at the current moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • The quadruple information of the historical moment corresponds to the vehicle condition at the historical moment and includes: the features of the historical moment, the target action at the historical moment, the reward value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment;
  • the features of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment, and the features of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment;
  • the extended quadruple information of the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the historical moment.
  • the update unit is specifically used for:
  • according to the j-th quadruple information in the quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment, the target value corresponding to the j-th quadruple information is generated;
  • where j is a positive integer not greater than m, and m is the total number of quadruple information included in the quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • The extended quadruple information at the current moment corresponds to the current extended vehicle condition, and the current extended vehicle condition is obtained by applying symmetric rule and monotonic rule processing to the current vehicle condition;
  • the current control strategy is updated to obtain the control strategy at the next moment; where the quadruple information at the historical moment corresponds to the vehicle condition at the historical moment, the extended quadruple information at the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the historical moment.
  • The update unit is specifically used for:
  • according to the k-th quadruple information in the quadruple information of the current moment, the extended quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment, generating the target value corresponding to the k-th quadruple information; where k is a positive integer not greater than p, and p is the total number of quadruple information included in the quadruple information of the current moment, the extended quadruple information of the current moment, the quadruple information of the historical moment, and the extended quadruple information of the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the calculation unit is specifically configured to:
  • the local neighbor characteristics and global statistical features of the autonomous vehicle at the next moment are calculated.
  • the calculation unit is specifically configured to:
  • the return value is calculated according to the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change;
  • the local neighbor characteristics and global statistical features of the autonomous vehicle at the next moment are calculated.
  • The specific neighbor obstacles of the autonomous vehicle include at least one of the following: the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located, the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located, and the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located;
  • the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located takes a default value; and/or
  • the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located takes a default value.
  • the global traffic flow statistics feature of the autonomous vehicle at the current moment includes at least one of the following: the average traveling speed and the average interval of all obstacles in each lane within the sensing range.
  • an embodiment of the present application provides an automatic lane changing device, including: a processor and a memory;
  • the memory is used to store program instructions
  • the processor is configured to call and execute the program instructions stored in the memory.
  • the automatic lane changing device is configured to execute the method described in any implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores instructions; when the instructions are run on a computer, the computer is caused to execute the method described in any implementation manner of the first aspect.
  • an embodiment of the present application provides a program that is used to execute the method described in any implementation manner of the first aspect when the program is executed by a processor.
  • an embodiment of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method described in any implementation manner of the first aspect.
  • an embodiment of the present application provides a method for training a control strategy, including:
  • Step A: obtain the quadruple information of a preset number of historical moments, where the quadruple information of a historical moment corresponds to the vehicle condition at the historical moment and includes: the features of the historical moment, the target action of the autonomous vehicle at the historical moment, the return value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment;
  • the features of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment;
  • the features of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment;
  • Step B: according to the quadruple information of at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of at least one second historical moment, update the current control strategy to obtain the control strategy at the next moment;
  • step A and step B are executed repeatedly until the number of repetitions reaches a preset number of times, or until the updated control strategy meets a preset condition, and then the repetition stops;
  • the control strategy finally obtained is used for the automatic lane changing device to obtain the target action instruction when the automatic lane changing method is executed;
  • the quadruple information of the at least one first historical moment is the quadruple information, among the quadruple information of the preset number of historical moments, of the historical moments whose target action corresponds to a lane change;
  • the quadruple information of the at least one second historical moment is the quadruple information of the other historical moments, among the quadruple information of the preset number of historical moments, excluding the quadruple information of the at least one first historical moment;
  • the extended quadruple information of the first historical moment corresponds to the extended vehicle condition at the first historical moment, and the extended vehicle condition at the first historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the first historical moment.
  • In this technical solution, the quadruple information of a preset number of historical moments is acquired; further, based on the quadruple information of at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of at least one second historical moment, the current control strategy is updated to obtain the control strategy at the next moment. It can be seen that, on the basis of the quadruple information of the preset number of historical moments, the current control strategy is further updated based on the extended quadruple information of the first historical moments among that quadruple information, so that a more accurate control strategy can be obtained and the corresponding target action can be accurately determined.
  • Updating the current control strategy to obtain the control strategy at the next moment includes:
  • according to the l-th quadruple information in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment, generating the target value corresponding to the l-th quadruple information; where l is a positive integer not greater than q, and q is the total number of quadruple information included in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
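  • The training procedure above (repeating step A and step B) can be sketched as an experience-replay loop in which lane-change transitions (the first historical moments) are additionally augmented with the symmetric/monotonic rules before the update. In the sketch below, the replay_buffer.sample interface, the augment and update_strategy helpers, the batch size, and the stopping criteria are assumptions; quadruples are the plain tuples used in the earlier sketches.

```python
def train_control_strategy(theta, replay_buffer, augment, update_strategy,
                           batch_size=64, max_iterations=10_000,
                           lane_change_actions=("change_left", "change_right"),
                           converged=lambda theta: False):
    """Repeat step A (sample historical quadruples) and step B (update theta)."""
    for _ in range(max_iterations):
        # Step A: obtain the quadruple information of a preset number of historical moments.
        batch = replay_buffer.sample(batch_size)
        first = [q for q in batch if q[1] in lane_change_actions]       # first historical moments
        second = [q for q in batch if q[1] not in lane_change_actions]  # second historical moments
        extended = [augment(q) for q in first]                          # extended quadruple information
        # Step B: update the current control strategy with original + extended quadruples.
        theta = update_strategy(theta, first + extended + second)
        if converged(theta):                                            # preset stopping condition
            break
    return theta   # final control strategy used by the automatic lane-changing device
```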
  • Before obtaining the quadruple information of the preset number of historical moments, the method further includes:
  • the target action instruction is used to instruct the autonomous vehicle to perform a target action.
  • the target action includes at least two types: changing lanes or keeping straight;
  • the feedback information is obtained by executing the target action; the feedback information includes the driving information of the autonomous vehicle after the target action is executed, the driving information of the autonomous vehicle at the next moment, and the motion information of the obstacles in each lane within the autonomous vehicle's perception range at the next moment; when the target action is a lane change, the feedback information further includes: the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change; the historical average time includes the average time taken by the autonomous vehicle to perform similar actions within a preset historical time period;
  • calculating the reward value corresponding to the target action according to the feedback information includes:
  • the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action.
  • calculating the reward value corresponding to the target action according to the feedback information includes:
  • the return value is calculated according to the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change.
  • an embodiment of the present application provides a training device for a control strategy, including:
  • the first acquisition module is used to perform step A: obtain the quadruple information of a preset number of historical moments, where the quadruple information of a historical moment corresponds to the vehicle condition at the historical moment and includes: the features of the historical moment, the target action of the autonomous vehicle at the historical moment, the return value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment;
  • the features of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment;
  • the features of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment;
  • the update module is configured to perform step B: According to the quadruple information of at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of at least one second historical moment, The current control strategy is updated to obtain the control strategy at the next moment;
  • step A and step B are executed repeatedly until the number of repetitions reaches a preset number of times, or until the updated control strategy meets a preset condition, and then the repetition stops;
  • the control strategy finally obtained is used for the automatic lane changing device to obtain the target action instruction when the automatic lane changing method is executed;
  • the quadruple information of the at least one first historical moment is the quadruple information, among the quadruple information of the preset number of historical moments, of the historical moments whose target action corresponds to a lane change;
  • the quadruple information of the at least one second historical moment is the quadruple information of the other historical moments, among the quadruple information of the preset number of historical moments, excluding the quadruple information of the at least one first historical moment;
  • the extended quadruple information of the first historical moment corresponds to the extended vehicle condition at the first historical moment, and the extended vehicle condition at the first historical moment is obtained by applying symmetric rule and monotonic rule processing to the vehicle condition at the first historical moment.
  • the update module includes:
  • a generating unit, configured to generate, according to the l-th quadruple information in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment, the target value corresponding to the l-th quadruple information; where l is a positive integer not greater than q, and q is the total number of quadruple information included in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment;
  • the update unit is used to iteratively update, by using the gradient descent method, the parameter θ in a preset function containing the target value corresponding to the l-th quadruple information;
  • the replacement unit is used to replace the parameter ⁇ after iterative update with the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the device further includes:
  • the first calculation module is used to calculate, for each historical moment, the local neighbor features and global statistical features of the autonomous vehicle at the historical moment based on the driving information of the autonomous vehicle and the motion information of the obstacles in each lane within the autonomous vehicle's perception range;
  • the second acquisition module is configured to acquire the target action instruction at the historical moment according to the local neighbor characteristics, global statistical features and the control strategy of the historical moment, where the target action instruction is used to instruct the autonomous vehicle to perform a target action,
  • the target action includes at least two types: changing lanes or keeping straight;
  • the feedback module is used to obtain feedback information by executing the target action; the feedback information includes the driving information of the autonomous vehicle after the target action is executed, the driving information of the autonomous vehicle at the next moment, and the motion information of the obstacles in each lane within the perception range of the autonomous vehicle at the next moment; when the target action is a lane change, the feedback information further includes: the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change; the historical average time includes the average time taken by the autonomous vehicle to perform similar actions within a preset historical time period;
  • the second calculation module is configured to calculate the return value corresponding to the target action according to the feedback information, and the local neighbor characteristics and global traffic statistics characteristics of the autonomous vehicle at the next moment in the historical moment;
  • the storage module is used to store the quadruple information of the historical moment.
  • the second calculation module is specifically configured to:
  • the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action.
  • the second calculation module is specifically configured to:
  • the return value is calculated according to the ratio of the time taken to execute the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change.
  • an embodiment of the present application provides a training device for a control strategy, including: a processor and a memory;
  • the memory is used to store program instructions
  • the processor is used to call and execute the program instructions stored in the memory.
  • the training device for the control strategy is configured to execute the method described in any implementation manner of the seventh aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores instructions in the computer-readable storage medium.
  • when the instructions are run on a computer, the computer is caused to execute the method described in any implementation manner of the seventh aspect.
  • an embodiment of the present application provides a program, which is used to execute the method described in any implementation manner of the seventh aspect when the program is executed by a processor.
  • embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in any implementation manner of the seventh aspect.
  • an embodiment of the present application provides a chip that includes a processor and a data interface.
  • the processor reads the instructions stored in a memory through the data interface, and executes the method described in any implementation manner of the first aspect or the seventh aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method described in any implementation manner of the foregoing first aspect or seventh aspect.
  • an embodiment of the present application provides an electronic device that includes the automatic lane changing device described in any implementation manner of the second aspect or the third aspect.
  • an embodiment of the present application provides an electronic device that includes the control strategy training device described in any implementation manner of the second aspect or the third aspect.
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • Fig. 2 is a functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the structure of the computer system in FIG. 2;
  • FIG. 4 is a schematic diagram of a chip hardware structure provided by an embodiment of the application.
  • Figure 5 is a schematic diagram of an operating environment provided by an embodiment of the application.
  • FIG. 6 is a first schematic diagram of the symmetry principle provided by an embodiment of the application.
  • FIG. 7 is a second schematic diagram of the symmetry principle provided by an embodiment of this application.
  • Figure 8 is a schematic diagram of the monotonic principle provided by an embodiment of the application.
  • FIG. 9 is a schematic flowchart of a training method of a control strategy provided by an embodiment of the application.
  • FIG. 10 is a schematic flowchart of a method for training a control strategy provided by another embodiment of the application.
  • FIG. 11 is a schematic flowchart of an automatic lane changing method provided by an embodiment of this application.
  • FIG. 12 is a schematic flowchart of an automatic lane changing method provided by another embodiment of this application.
  • FIG. 13 is a schematic diagram of training data provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of an automatic lane changing device provided by an embodiment of this application.
  • FIG. 15 is a schematic structural diagram of an automatic lane changing device provided by another embodiment of the application.
  • Figure 16 is a conceptual partial view of a computer program product provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a training device for a control strategy provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of a training device for a control strategy provided by another embodiment of the application.
  • the automatic lane changing method, device, and storage medium provided in the embodiments of the present application can be applied to a lane changing scene of an automatic driving vehicle.
  • the automatic lane changing method, device, and storage medium provided in the embodiments of the present application can be applied to scenario A and scenario B. The following briefly introduces scenario A and scenario B respectively.
  • the autonomous vehicle needs to change to the corresponding target lane before reaching a ramp or an intersection in order to complete its driving task. For example, if an autonomous vehicle is driving in the leftmost lane of the road and there is an intersection 500 meters ahead where it needs to turn right in order to reach its destination, it needs to change to the rightmost lane before reaching the intersection.
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • the system architecture 1000 provided by the embodiment of the present application may include: a training device 1001 and an execution device 1002.
  • the training device 1001 is used to train the control strategy according to the training method of the control strategy provided in the embodiment of the application;
  • the execution device 1002 is used to determine the target action, according to the automatic lane changing method provided in the embodiments of the application, using the control strategy trained by the training device 1001;
  • the execution device 1002 can also be used to train the control strategy in real time, or to train the control strategy every preset duration.
  • the execution subject of the training method for executing the control strategy may be the training device 1001 described above, or the training device of the control strategy in the training device 1001 described above.
  • the training device for the control strategy provided in the embodiment of the present application may be implemented by software and/or hardware.
  • the execution subject of the automatic lane changing method may be the above-mentioned execution device 1002, or may be the automatic lane-changing device in the above-mentioned execution device 1002.
  • the automatic lane changing device provided in the embodiment of the present application may be implemented by software and/or hardware.
  • the training device 1001 provided in the embodiment of the present application may include, but is not limited to: a model training platform device.
  • the execution device 1002 provided in the embodiment of the present application may include, but is not limited to: an automatic driving vehicle, or a control device in an automatic driving vehicle.
  • Fig. 2 is a functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • the vehicle 100 is configured in a fully or partially autonomous driving mode.
  • while in the automatic driving mode, the vehicle 100 can also determine the current state of the vehicle and its surrounding environment through human operation, determine the possible behavior of at least one other vehicle in the surrounding environment, determine the confidence level corresponding to the possibility of the other vehicle performing the possible behavior, and control the vehicle 100 based on the determined information.
  • the vehicle 100 can be configured to operate without human interaction.
  • the vehicle 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108 and a power supply 110, a computer system 112, and a user interface 116.
  • the vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements.
  • each of the subsystems and elements of the vehicle 100 may be wired or wirelessly interconnected.
  • the travel system 102 may include components that provide powered motion for the vehicle 100.
  • the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121.
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, or a hybrid engine composed of an internal combustion engine and an air compression engine.
  • the engine 118 converts the energy source 119 into mechanical energy.
  • Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
  • the energy source 119 may also provide energy for other systems of the vehicle 100.
  • the transmission device 120 can transmit mechanical power from the engine 118 to the wheels 121.
  • the transmission device 120 may include a gearbox, a differential, and a drive shaft.
  • the transmission device 120 may also include other devices, such as a clutch.
  • the drive shaft may include one or more shafts that can be coupled to one or more wheels 121.
  • the sensor system 104 may include several sensors that sense information about the environment around the vehicle 100.
  • the sensor system 104 may include a positioning system 122 (the positioning system may be a GPS system, a Beidou system or other positioning systems), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130.
  • the sensor system 104 may also include sensors of the internal system of the monitored vehicle 100 (for example, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, direction, speed, etc.). Such detection and identification are key functions for the safe operation of the autonomous vehicle 100.
  • the positioning system 122 can be used to estimate the geographic location of the vehicle 100.
  • the IMU 124 is used to sense changes in the position and orientation of the vehicle 100 based on inertial acceleration.
  • the IMU 124 may be a combination of an accelerometer and a gyroscope.
  • the radar 126 may use radio signals to sense objects in the surrounding environment of the vehicle 100. In some embodiments, in addition to sensing the object, the radar 126 may also be used to sense the speed and/or direction of the object.
  • the laser rangefinder 128 can use laser light to sense objects in the environment where the vehicle 100 is located.
  • the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, as well as other system components.
  • the camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 100.
  • the camera 130 may be a still camera or a video camera.
  • the control system 106 controls the operation of the vehicle 100 and its components.
  • the control system 106 may include various components, including a steering system 132, a throttle 134, a braking unit 136, a sensor fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
  • the steering system 132 is operable to adjust the forward direction of the vehicle 100.
  • it may be a steering wheel system in one embodiment.
  • the throttle 134 is used to control the operating speed of the engine 118 and thereby control the speed of the vehicle 100.
  • the braking unit 136 is used to control the vehicle 100 to decelerate.
  • the braking unit 136 may use friction to slow down the wheels 121.
  • the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current.
  • the braking unit 136 may also take other forms to slow down the rotation speed of the wheels 121 to control the speed of the vehicle 100.
  • the computer vision system 140 may be operable to process and analyze the images captured by the camera 130 in order to identify objects and/or features in the surrounding environment of the vehicle 100.
  • the objects and/or features may include traffic signals, road boundaries and obstacles.
  • the computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision technologies.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and so on.
  • the route control system 142 is used to determine the travel route of the vehicle 100.
  • the route control system 142 may combine data from the sensor 138, the global positioning system (GPS) 122, and one or more predetermined maps to determine the driving route for the vehicle 100.
  • the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise surpass potential obstacles in the environment of the vehicle 100.
  • the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
  • the vehicle 100 interacts with external sensors, other vehicles, other computer systems, or users through peripheral devices 108.
  • the peripheral device 108 may include a wireless communication system 146, an onboard computer 148, a microphone 150 and/or a speaker 152.
  • the peripheral device 108 provides a means for the user of the vehicle 100 to interact with the user interface 116.
  • the onboard computer 148 may provide information to the user of the vehicle 100.
  • the user interface 116 can also operate the onboard computer 148 to receive user input.
  • the on-board computer 148 can be operated through a touch screen.
  • the peripheral device 108 may provide a means for the vehicle 100 to communicate with other devices located in the vehicle.
  • the microphone 150 may receive audio (eg, voice commands or other audio input) from a user of the vehicle 100.
  • the speaker 152 may output audio to the user of the vehicle 100.
  • the wireless communication system 146 may wirelessly communicate with one or more devices directly or via a communication network.
  • the wireless communication system 146 may use 3G cellular communication, such as code division multiple access (CDMA), EVDO, or global system for mobile communications (GSM)/general packet radio service (GPRS); 4G cellular communication, such as LTE; or 5G cellular communication.
  • the wireless communication system 146 may use wireless-fidelity (wireless-fidelity, WiFi) to communicate with a wireless local area network (WLAN).
  • the wireless communication system 146 may communicate directly with a device using an infrared link, Bluetooth, or the ZigBee protocol, or using other wireless protocols, such as various vehicle communication systems.
  • the wireless communication system 146 may include one or more dedicated short-range communications (DSRC) devices, which may include public and/or private data communications between vehicles and/or roadside stations.
  • the power supply 110 may provide power to various components of the vehicle 100.
  • the power source 110 may be a rechargeable lithium ion or lead-acid battery.
  • One or more battery packs of such batteries may be configured as a power source to provide power to various components of the vehicle 100.
  • the power source 110 and the energy source 119 may be implemented together, such as in some all-electric vehicles.
  • the computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer readable medium such as a data storage device 114.
  • the computer system 112 may also be multiple computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner.
  • the processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor.
  • although FIG. 1B functionally illustrates the processor, memory, and other elements of the computer system 112 in the same block, those of ordinary skill in the art should understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that are not stored in the same physical enclosure.
  • the memory may be a hard disk drive or another storage medium located in a housing other than that of the computer. Therefore, a reference to a processor or computer will be understood to include a reference to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described here, some components, such as the steering component and the deceleration component, may each have their own processor that performs only the calculations related to that component's specific function.
  • the processor may be located away from the vehicle and wirelessly communicate with the vehicle.
  • some of the processes described herein are executed on a processor disposed in the vehicle and others are executed by a remote processor, including taking the necessary steps to perform a single manipulation.
  • the data storage device 114 may include instructions 115 (eg, program logic), which may be executed by the processor 113 to perform various functions of the vehicle 100, including those functions described above.
  • the data storage device 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripheral device 108.
  • the data storage device 114 may also store data, such as road maps, route information, the location, direction, and speed of the vehicle, and other such vehicle data, as well as other information. Such information may be used by the vehicle 100 and the computer system 112 during the operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
  • the user interface 116 is used to provide information to or receive information from a user of the vehicle 100.
  • the user interface 116 may include one or more input/output devices in the set of peripheral devices 108, such as a wireless communication system 146, an in-vehicle computer 148, a microphone 150, and a speaker 152.
  • the computer system 112 may control the functions of the vehicle 100 based on inputs received from various subsystems (eg, travel system 102, sensor system 104, and control system 106) and from the user interface 116. For example, the computer system 112 may utilize input from the control system 106 in order to control the steering unit 132 to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control of many aspects of the vehicle 100 and its subsystems.
  • one or more of these components described above may be installed or associated with the vehicle 100 separately.
  • the data storage device 114 may exist partially or completely separately from the vehicle 100.
  • the aforementioned components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 2 should not be construed as a limitation to the embodiments of the present application.
  • An autonomous vehicle traveling on a road can recognize objects in its surrounding environment to determine its own adjustment to the current speed.
  • the object may be other vehicles, traffic control equipment, or other types of objects.
  • each identified obstacle can be considered independently, and the respective characteristics of each obstacle, such as its current speed, acceleration, and distance from the vehicle, can be used to determine the speed adjustment of the self-driving car.
  • the self-driving vehicle 100, or the computing equipment associated with the self-driving vehicle 100, may predict the behavior of the identified obstacle based on the characteristics of the identified obstacle and the state of the surrounding environment (for example, traffic, rain, ice on the road, etc.).
  • the identified obstacles may depend on each other's behavior, so all of the identified obstacles can also be considered together to predict the behavior of a single identified obstacle.
  • the vehicle 100 can adjust its speed based on the predicted behavior of the identified obstacle.
  • the self-driving car can determine what state the vehicle will need to adjust to (for example, accelerate, decelerate, or stop) based on the predicted behavior of the obstacle.
  • other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 on the road on which it is traveling, the curvature of the road, the proximity of static and dynamic objects, and so on.
  • the computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the autonomous vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances from obstacles near the autonomous vehicle (for example, vehicles in adjacent lanes on the road).
  • the above-mentioned vehicle 100 can be a car, truck, motorcycle, bus, boat, airplane, helicopter, lawn mower, recreational vehicle, playground vehicle, construction equipment, tram, golf cart, train, and trolley, etc.
  • the embodiments of the invention are not particularly limited in this respect.
  • FIG. 3 is a schematic diagram of the structure of the computer system 112 in FIG. 2.
  • the computer system 112 includes a processor 113, and the processor 113 is coupled to a system bus 105.
  • the processor 113 may be one or more processors, where each processor may include one or more processor cores.
  • a display adapter (video adapter) 107 can drive the display 109, and the display 109 is coupled to the system bus 105.
  • the system bus 105 is coupled to an input/output (I/O) bus through a bus bridge 111.
  • the I/O interface 115 is coupled to the I/O bus.
  • the I/O interface 115 communicates with a variety of I/O devices, such as an input device 117 (for example, a keyboard, a mouse, or a touch screen), a media tray 121 (for example, a CD-ROM or a multimedia interface), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and dynamic digital video images), and an external USB interface 125.
  • the interface connected to the I/O interface 115 may be a universal serial bus (USB) interface.
  • the processor 113 may be any conventional processor, including a reduced instruction set computing (“RISC”) processor, a complex instruction set computing (“CISC”) processor, or a combination of the foregoing.
  • the processor may be a dedicated device such as an application specific integrated circuit (“ASIC").
  • the processor 113 may be a neural network processor or a combination of a neural network processor and the foregoing traditional processors.
  • the computer system may be located far away from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle.
  • some of the processes described herein are executed on a processor provided in an autonomous vehicle, and others are executed by a remote processor, including taking actions required to perform a single manipulation.
  • the computer system 112 can communicate with the software deployment server 149 through the network interface 129.
  • the network interface 129 is a hardware network interface, such as a network card.
  • the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet or a virtual private network (VPN).
  • the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and so on.
  • the hard disk drive interface 131 is coupled to the system bus 105.
  • the hard disk drive interface 131 and the hard disk drive 133 are connected.
  • the system memory 135 is coupled to the system bus 105.
  • the software running in the system memory 135 may include an operating system (OS) 137 and application programs 143 of the computer system 112.
  • the operating system includes Shell 139 and kernel 141.
  • Shell 139 is an interface between the user and the kernel of the operating system.
  • the shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: waiting for the user's input, interpreting the user's input to the operating system, and processing the various outputs of the operating system.
  • the kernel 141 is composed of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the kernel 141 usually runs processes, provides inter-process communication, and provides CPU time-slice management, interrupt handling, memory management, I/O management, and so on.
  • Application programs 141 include programs that control autonomous driving, such as programs that manage the interaction between autonomous vehicles and obstacles on the road, programs that control the route or speed of autonomous vehicles, and programs that control the interaction between autonomous vehicles and other autonomous vehicles on the road.
  • the application program 141 also exists on the system of a software deployment server (deploying server) 149. In one embodiment, when the application program 141 needs to be executed, the computer system may download the application program 143 from the deploying server 149.
  • the sensor 153 is associated with the computer system.
  • the sensor 153 is used to detect the environment around the computer system 112.
  • the sensor 153 can detect animals, cars, obstacles, and crosswalks.
  • the sensor can also detect the environment around objects such as animals, cars, obstacles, and crosswalks, for example, the environment around an animal, such as other animals that appear around it, the weather conditions, and the brightness of the surrounding environment.
  • the sensor may be a camera, an infrared sensor, a chemical detector, a microphone, etc.
  • FIG. 4 is a schematic diagram of a chip hardware structure provided by an embodiment of the application.
  • the chip may include a neural network processor 30.
  • the chip can be set in the execution device 1002 shown in FIG. 1 to complete the automatic lane changing method provided in the application embodiment.
  • the chip can also be set in the training device 1001 shown in FIG. 1 to complete the training method of the control strategy provided in the application embodiment.
  • the neural network processor 30 may be any processor suitable for large-scale XOR operation processing such as NPU, TPU, or GPU.
  • NPU can be mounted on a host CPU (host CPU) as a coprocessor, and the host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 303.
  • the arithmetic circuit 303 is controlled by the controller 304 to extract matrix data in the memory (301 and 302) and perform multiplication and addition operations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the weight data of matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 fetches the input data of matrix A from the input memory 301, and performs matrix operations according to the input data of matrix A and the weight data of matrix B, and the partial result or final result of the obtained matrix is stored in an accumulator 308 .
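  • As a rough software analogy of this accumulation (not a description of the hardware itself), the following Python sketch multiplies an input matrix A by a weight matrix B tile by tile and adds each partial result into a buffer that plays the role of the accumulator 308; the tile size and matrix shapes are illustrative assumptions.

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Multiply A (m x k) by B (k x n), accumulating partial results tile by tile,
    the way partial matrix results are added into an accumulator."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n))                        # plays the role of the accumulator
    for start in range(0, k, tile):               # walk the shared dimension in tiles
        end = min(start + tile, k)
        acc += A[:, start:end] @ B[start:end, :]  # partial result accumulated
    return acc

A = np.arange(6, dtype=float).reshape(2, 3)       # example input matrix A
B = np.arange(12, dtype=float).reshape(3, 4)      # example weight matrix B
assert np.allclose(tiled_matmul(A, B), A @ B)     # matches a plain matrix product
```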
  • the unified memory 306 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 302 through the storage unit access controller (direct memory access controller, DMAC) 305.
  • the input data is also transferred to the unified memory 306 through the DMAC.
  • the bus interface unit (BIU) 310 is used for the interaction between the DMAC and the instruction fetch buffer 309; the bus interface unit 310 is also used by the instruction fetch buffer 309 to obtain instructions from the external memory, and by the storage unit access controller 305 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 306, or to transfer the weight data to the weight memory 302, or to transfer the input data to the input memory 301.
  • the vector calculation unit 307 has multiple arithmetic processing units and, if necessary, further processes the output of the arithmetic circuit 303, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on.
  • the vector calculation unit 307 is mainly used for the calculation of non-convolutional layers or fully connected (FC) layers in the neural network, and can specifically process calculations such as pooling and normalization.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the vector calculation unit 307 stores the processed vector to the unified memory 306. In some implementations, the vector processed by the vector calculation unit 307 can be used as the activation input of the arithmetic circuit 303.
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304;
  • the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch buffer 309 are all on-chip memories.
  • the external memory is independent of the NPU hardware architecture.
  • FIG. 5 is a schematic diagram of an operating environment provided by an embodiment of the application.
  • the cloud service center may receive information (such as data collected by vehicle sensors or other information) from the autonomous vehicles 510 and 512 in its operating environment 500 via a network 502 (such as a wireless communication network).
  • the cloud service center 520 may, at any time, receive from the autonomous driving vehicle 510 via the network 502 (such as a wireless communication network) the driving information (such as driving speed and/or driving position) of the autonomous driving vehicle 510 and the driving information of other vehicles within the sensing range of the autonomous driving vehicle 510.
  • the cloud service center 520 can run its stored programs related to controlling the auto-driving of the car, so as to realize the control of the auto-driving vehicles 510 and 512.
  • Programs related to controlling auto-driving can be programs that manage the interaction between autonomous vehicles and obstacles on the road, programs that control the route or speed of autonomous vehicles, and programs that control interaction between autonomous vehicles and other autonomous vehicles on the road.
  • the network 502 provides parts of the map to the autonomous vehicles 510 and 512.
  • multiple cloud service centers can receive, confirm, combine, and/or send information reports.
  • information reports and/or sensor data can also be sent between autonomous vehicles.
  • the cloud service center 520 may send suggested solutions to the autonomous vehicle based on possible driving conditions in the environment (for example, informing it of an obstacle ahead and telling it how to avoid the obstacle). For example, the cloud service center 520 may assist the vehicle in determining how to proceed when facing a specific obstacle in the environment.
  • the cloud service center 520 may send a response to the autonomous vehicle indicating how the vehicle should travel in a given scene. For example, based on the collected sensor data, the cloud service center can confirm the presence of a temporary stop sign in front of the road, and can also determine that a lane is closed based on the "lane closed" sign and sensor data indicating construction vehicles on that lane.
  • the cloud service center 520 may send a suggested operation mode for the automatic driving vehicle to pass the obstacle (for example, instructing the vehicle to change lanes on another road).
  • the operation steps used for the autonomous driving vehicle can be added to the driving information map.
  • this information can be sent to other vehicles in the area that may encounter the same obstacle, so as to assist the other vehicles not only in recognizing the closed lane but also in knowing how to pass it.
  • the autonomous driving vehicles 510 and/or 512 may autonomously control their driving during operation and may not need the control of the cloud service center 520.
  • the local neighbor features at any time involved in the embodiments of the present application are used to represent the motion state information (such as relative distance and relative speed) of a specific neighbor obstacle of the autonomous driving vehicle relative to the autonomous driving vehicle.
  • the specific neighbor obstacles may include, but are not limited to, at least one of the following: the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located, the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of that lane, and the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of that lane.
  • when the autonomous vehicle is located in the leftmost lane, the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane may take a default value; and/or, when the autonomous vehicle is located in the rightmost lane, the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane may take a default value.
  • the obstacles involved in the embodiments of the present application may be dynamically moving obstacles or static obstacles.
  • obstacles may include, but are not limited to, at least one of the following: self-driving vehicles, non-self-driving motor vehicles, people, and objects.
  • when the specific neighbor obstacle is a static obstacle, the relative distance of the specific neighbor obstacle with respect to the autonomous vehicle may be the distance between the neighbor obstacle and the autonomous vehicle, and the relative speed of the specific neighbor obstacle with respect to the autonomous vehicle may be the moving speed of the autonomous vehicle.
  • the local neighbor features of the autonomous vehicle at any time may include, but are not limited to: the relative speed and relative distance, with respect to the autonomous vehicle, of the front obstacle adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in that lane; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent left lane; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent left lane; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent right lane; and the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent right lane.
  • the local neighbor feature of the autonomous vehicle at any time may also include: a position information flag between the navigation target lane and the lane where the autonomous vehicle is located, and the distance dist2goal between the autonomous vehicle and the next intersection in the driving direction; where flag ∈ {0, -1, 1}: flag equal to 0 means that the autonomous vehicle is in the navigation target lane, flag equal to -1 means that the navigation target lane is on the left side of the lane where the autonomous vehicle is located, and flag equal to 1 means that the navigation target lane is on the right side of the lane where the autonomous vehicle is located.
  • the global statistical features at any time involved in the embodiments of the present application are used to indicate the degree of sparseness or density of obstacles in each lane within the perception range (that is, the range that can be detected by the sensors of the autonomous driving vehicle, for example the range within a preset interval of the autonomous driving vehicle).
  • the global statistical characteristics at any time may include, but are not limited to, at least one of the following: the average travel speed and the average interval of all obstacles in each lane within the sensing range. For example, if the average interval of all obstacles in a certain lane is less than the preset interval, the obstacles in that lane are dense; if the average interval of all obstacles in a certain lane is greater than or equal to the preset interval, the obstacles in that lane are sparse.
  • the global statistical characteristics at any time may include, but are not limited to: the average interval gap_L between two adjacent obstacles in all lanes on the left side of the lane where the autonomous vehicle is located, the average interval gap_M between two adjacent obstacles in the lane where the autonomous vehicle is located, the average interval gap_R between two adjacent obstacles in all lanes on the right side of the lane where the autonomous vehicle is located, the average driving speed V_L of obstacles in all lanes on the left side of the lane where the autonomous vehicle is located, the average driving speed V_M of obstacles in the lane where the autonomous vehicle is located, and the average driving speed V_R of obstacles in all lanes on the right side of the lane where the autonomous vehicle is located.
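  • To make these definitions concrete, a minimal Python sketch is given below; it assumes each obstacle is described by a (longitudinal position, speed) pair, that lanes are given as lists of such pairs, and that the per-lane averages on each side are simply averaged again over the lanes of that side. The data layout and function names are illustrative assumptions, not part of the embodiment.

```python
from statistics import mean

def lane_gaps_and_speed(obstacles):
    """obstacles: list of (position, speed) for one lane.
    Returns (average gap between adjacent obstacles, average speed)."""
    if not obstacles:
        return float("inf"), 0.0
    obstacles = sorted(obstacles)
    avg_speed = mean(s for _, s in obstacles)
    if len(obstacles) < 2:
        return float("inf"), avg_speed
    positions = [p for p, _ in obstacles]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return mean(gaps), avg_speed

def global_features(left_lanes, ego_lane, right_lanes):
    """Returns s_g = (gap_L, gap_M, gap_R, V_L, V_M, V_R)."""
    def side(lanes):
        stats = [lane_gaps_and_speed(lane) for lane in lanes if lane]
        if not stats:
            return float("inf"), 0.0
        return mean(g for g, _ in stats), mean(v for _, v in stats)
    gap_L, V_L = side(left_lanes)
    gap_M, V_M = lane_gaps_and_speed(ego_lane)
    gap_R, V_R = side(right_lanes)
    return (gap_L, gap_M, gap_R, V_L, V_M, V_R)

# Dense ego lane, sparse right lane: gap_M is small, gap_R is large.
print(global_features(
    left_lanes=[[(0.0, 10.0), (30.0, 11.0)]],
    ego_lane=[(5.0, 8.0), (12.0, 7.5), (20.0, 8.2)],
    right_lanes=[[(0.0, 14.0), (80.0, 15.0)]],
))
```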
  • the local neighbor features and global statistical features of the autonomous vehicle at any time involved in the embodiments of the present application may be discretized features, which can meet the requirements of low-speed dense scenes with small discrete granularity and high-speed sparse scenes with large discrete granularity.
  • the local relative distance characteristic accuracy is 0.01, and the local relative speed characteristic accuracy is 0.05. For example, if a certain local relative distance feature is 0.1123, it will be 0.11 after discretization; if a certain local relative speed feature is 0.276, it will be discretized to 0.25.
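  • A minimal sketch of this discretization, assuming the value is simply floored to the stated precision (the floor-style behaviour is inferred from the 0.276 → 0.25 example above):

```python
import math

def discretize(value, precision):
    """Quantize a feature to the given precision by flooring,
    e.g. discretize(0.1123, 0.01) -> 0.11 and discretize(0.276, 0.05) -> 0.25."""
    return math.floor(value / precision) * precision

print(discretize(0.1123, 0.01))  # 0.11, local relative distance precision
print(discretize(0.276, 0.05))   # 0.25, local relative speed precision
```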
  • the target action instruction involved in the embodiment of the present application is used to instruct the autonomous driving vehicle to perform the target action.
  • the target action may include at least, but is not limited to, the following two categories: changing lanes or keeping straight, where changing lanes may include: changing lanes to the left adjacent lane or changing lanes to the right adjacent lane.
  • the four-tuple information (s, a, r, s') at any time involved in the embodiments of the present application corresponds to the vehicle condition at that time, and may include: the feature s at that time, the target action a of the autonomous vehicle at that time, the reward value r corresponding to the target action at that time, and the feature s' at the next time; where the feature s at that time may include the local neighbor feature s_l and the global statistical feature s_g of the autonomous vehicle at that time, and the feature s' at the next time may include the local neighbor feature s_l' and the global statistical feature s_g' of the autonomous vehicle at the next time.
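  • For illustration, such a quadruple can be held in a simple record like the one below; the field names and the NamedTuple layout are assumptions made only for the sketch.

```python
from typing import NamedTuple, Tuple

class Quadruple(NamedTuple):
    """One stored transition (s, a, r, s') used to train the control strategy."""
    s_local: Tuple[float, ...]        # local neighbor features s_l at the moment
    s_global: Tuple[float, ...]       # global statistical features s_g at the moment
    action: int                       # 0 keep straight, 1 change left, 2 change right
    reward: float                     # reward value r for the target action
    s_local_next: Tuple[float, ...]   # s_l' at the next moment
    s_global_next: Tuple[float, ...]  # s_g' at the next moment

example = Quadruple(
    s_local=(0.11, 0.25) * 6,                     # (distance, speed) for six neighbors
    s_global=(30.0, 7.5, 80.0, 10.5, 7.9, 14.5),  # gap_L, gap_M, gap_R, V_L, V_M, V_R
    action=1,
    reward=0.8,
    s_local_next=(0.12, 0.20) * 6,
    s_global_next=(30.0, 7.5, 80.0, 10.5, 7.9, 14.5),
)
print(example.action, example.reward)
```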
  • FIG. 6 and FIG. 7 are schematic diagrams of the symmetry principle provided by embodiments of the application.
  • the symmetry rule involved in the embodiment of the application refers to symmetrically transforming, with the lane where the autonomous driving vehicle is located as the axis, the positions of the obstacles in all lanes on the left and right sides of that lane.
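  • As an illustration of this symmetry rule, the sketch below mirrors the left/right parts of the local and global features and swaps the two lane-change actions; the exact feature ordering is an assumption carried over from the feature definitions above.

```python
def symmetric_action(a):
    """Keeping straight (0) is unchanged; left (1) and right (2) lane changes swap."""
    return {0: 0, 1: 2, 2: 1}[a]

def symmetric_global(s_g):
    """s_g = (gap_L, gap_M, gap_R, V_L, V_M, V_R) with left/right components swapped."""
    gap_L, gap_M, gap_R, V_L, V_M, V_R = s_g
    return (gap_R, gap_M, gap_L, V_R, V_M, V_L)

def symmetric_local(s_l):
    """Assumed ordering: own-lane front, own-lane rear, left front, left rear,
    right front, right rear, each entry being a (rel_speed, rel_dist) pair."""
    own_f, own_r, left_f, left_r, right_f, right_r = s_l
    return (own_f, own_r, right_f, right_r, left_f, left_r)

s_l = ((0.2, 15.0), (0.1, -12.0),   # own-lane front and rear neighbors
       (0.3, 20.0), (-0.1, -18.0),  # left-lane front and rear neighbors
       (0.5, 25.0), (0.0, -30.0))   # right-lane front and rear neighbors
print(symmetric_local(s_l))
print(symmetric_global((30.0, 7.5, 80.0, 10.5, 7.9, 14.5)))
print(symmetric_action(1))          # a left lane change becomes a right lane change
```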
  • Fig. 8 is a schematic diagram of the monotonic principle provided by an embodiment of the application.
  • the monotonic principle involved in the embodiment of the application refers to an increase in the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or a change of less than a preset distance range in the distance between the front and rear neighbor obstacles of the autonomous vehicle on a non-target lane (see the sketch after the operation list below). For example, suppose the two neighbor cars of the autonomous vehicle in the target lane of the lane change are A and D, and the two neighbor cars in the non-target lane are B and C.
  • the monotonic principle can include but is not limited to the following operations:
  • Operation 1 Vehicle A moves backward by a preset distance 1 or its speed decreases by a preset value 1.
  • Operation 3 Vehicle B moves forward or backward by a preset distance 3, or its speed increases or decreases by a preset value 3;
  • Operation 4 Vehicle C moves forward or backward by a preset distance 4, or its speed increases or decreases by a preset value 4.
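  • A minimal sketch of this kind of monotonic augmentation, assuming the relevant local features are stored as signed relative distances (positive ahead, negative behind) that can be nudged by small preset amounts while the original action and reward are reused; the concrete magnitudes are placeholders.

```python
import random

def monotonic_variant(rel_dists, widen=2.0, jitter=0.5):
    """rel_dists: relative distances (metres) to the four neighbors relevant to a
    lane change.  The gap between the target-lane front and rear neighbors is
    widened, and the non-target-lane neighbors move only within a small preset
    range, so the lane-change decision of the original sample stays valid."""
    out = dict(rel_dists)
    out["target_front"] += widen / 2                       # front neighbor moves farther ahead
    out["target_rear"] -= widen / 2                        # rear neighbor moves farther behind
    out["non_target_front"] += random.uniform(-jitter, jitter)
    out["non_target_rear"] += random.uniform(-jitter, jitter)
    return out

sample = {"target_front": 15.0, "target_rear": -12.0,
          "non_target_front": 20.0, "non_target_rear": -18.0}
print(monotonic_variant(sample))
```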
  • the extended quadruple information at any time involved in the embodiment of the present application corresponds to the extended vehicle condition at that time, and the extended vehicle condition at that time is obtained by processing the vehicle condition at that time by symmetric rules and monotonic rules.
  • the extended quadruple information at any time may include: symmetric quadruple information (s_e, a_e, r, s_e') and monotonic quadruple information (s_m, a_m, r, s_m');
  • the symmetric quadruple information (s_e, a_e, r, s_e') at this moment is constructed from the quadruple information (s, a, r, s') at this moment according to the symmetry rule, where s_e is the symmetric feature of s, a_e is the symmetric action of a, and s_e' is the symmetric feature of s';
  • the monotonic quadruple information (s_m, a_m, r, s_m') at this moment is constructed from the quadruple information (s, a, r, s') at this moment according to the monotonic rule, where s_m is the monotonic feature of s;
  • s_g = (gap_L, gap_M, gap_R, V_L, V_M, V_R);
  • s_e is constructed according to s_el and s_eg.
  • a equal to 0 means keeping going straight
  • a equal to 1 means changing lanes to the left adjacent lane
  • a equal to 2 means changing lanes to the right adjacent lane.
  • s_g in the quadruple information (s, a, r, s') at any time is equal to s_mg in the monotonic quadruple information (s_m, a_m, r, s_m') at that time, and s_m is constructed based on s_ml and s_mg.
  • when the training device 1001 executes the training method of the control strategy provided in the embodiment of the present application, the involved autonomous vehicle, obstacle, lane, and other information may be simulated road information in the training device 1001, or historical data that occurred on actual road information.
  • the execution device 1002 executes the automatic lane changing method provided in the embodiment of the present application, the involved autonomous vehicles, obstacles, lanes and other information are actual real-time road information.
  • the following describes the training side of the control strategy (or control strategy model) and the application side of the control strategy:
  • the training method of the control strategy provided in the embodiment of this application involves computer processing, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning.
  • the quaternion information of the present application is used to symbolize and formalize intelligent information modeling, extraction, preprocessing, training, and so on, and finally obtain a well-trained control strategy; and the automatic lane changing method provided in the embodiments of this application can use the above-trained control strategy to obtain output data (such as the target action indication in the embodiment of this application) from input data (such as the local neighbor features and global statistical features in the embodiment of this application).
  • it should be noted that the control strategy in the automatic lane changing method can also be updated in real time, or can be updated every preset duration.
  • the training method and the application method of the control strategy provided in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of a system or two stages of an overall process, such as a control strategy training phase and a control strategy application phase.
  • FIG. 9 is a schematic flowchart of a method for training a control strategy provided by an embodiment of the application.
  • the method in this embodiment may be specifically executed by the training device 1001 shown in FIG. 1.
  • the method provided by the embodiment of the present application may include:
  • Step S901 Obtain the four-tuple information of a preset number of historical moments.
  • the quaternion information of a preset number of historical moments is obtained from the database, where the quaternion information of any historical moment corresponds to the vehicle condition at that historical moment and may include, but is not limited to: the features at the historical moment, the target action of the autonomous vehicle at the historical moment (that is, the target action determined according to the corresponding control strategy at the historical moment), the reward value corresponding to the target action at the historical moment, and the features at the next moment after the historical moment.
  • the characteristics of the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the historical moment.
  • the characteristics of the next moment in the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment.
  • the local neighbor feature of the autonomous vehicle at any time involved in the embodiments of the present application is used to indicate the motion state information (such as relative distance and relative speed) of a specific neighbor obstacle of the autonomous vehicle relative to the autonomous vehicle at that time.
  • the specific neighbor obstacles may include, but are not limited to: the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located at that moment, the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of that lane at that moment, and the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of that lane at that moment.
  • the global statistical features of the autonomous vehicle at any time involved in the embodiments of this application are used to indicate the degree of sparseness or density of obstacles in each lane within the perception range of the autonomous vehicle at that time, for example, the average driving speed and average interval of all obstacles in each lane at that time.
  • the global statistical characteristics at any time may include, but are not limited to: the average interval between two adjacent obstacles in all lanes on the left side of the lane where the autonomous vehicle is located at that time, the average interval between two adjacent obstacles in the lane where the autonomous vehicle is located at that time, the average interval between two adjacent obstacles in all lanes on the right side of the lane where the autonomous vehicle is located at that time, the average travel speed of obstacles in all lanes on the left side of the lane where the autonomous vehicle is located at that time, the average travel speed of obstacles in the lane where the autonomous vehicle is located at that time, and the average travel speed of obstacles in all lanes on the right side of the lane where the autonomous vehicle is located at that time.
  • Step S902 According to the quaternion information of at least one first historical moment, the extended quaternion information of the at least one first historical moment, and the quaternion information of at least one second historical moment, update the current control strategy to obtain the control strategy at the next moment.
  • the quaternion information of the at least one first historical moment is the quaternion information, among the quaternion information of the preset number of historical moments, of the historical moments whose target action is a lane change.
  • the quaternion information of the at least one second historical moment is the quaternion information of the other historical moments, among the quaternion information of the preset number of historical moments, excluding the quaternion information of the at least one first historical moment; that is, the quaternion information of the historical moments whose target action of the autonomous vehicle is keeping straight.
  • For example, the quaternion information of the preset number of historical moments may include: the quaternion information of historical moment 1 (where the target action of the autonomous vehicle at historical moment 1 is a lane change), the quaternion information of historical moment 2 (where the target action of the autonomous vehicle at historical moment 2 is keeping straight), the quaternion information of historical moment 3 (where the target action of the autonomous vehicle at historical moment 3 is a lane change), and the quaternion information of historical moment 4 (where the target action of the autonomous vehicle at historical moment 4 is keeping straight). Then the quaternion information of the at least one first historical moment may include the quaternion information of historical moment 1 and of historical moment 3, and the quaternion information of the at least one second historical moment may include the quaternion information of historical moment 2 and of historical moment 4.
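  • As shown below, the sampled batch can be split by the stored target action, and the lane-change samples can additionally be expanded by their symmetric and monotonic variants before the update; the helper names are illustrative assumptions.

```python
KEEP_STRAIGHT, CHANGE_LEFT, CHANGE_RIGHT = 0, 1, 2

def split_and_expand(batch, make_symmetric, make_monotonic):
    """batch: list of (s, a, r, s_next) quadruples sampled from the database.
    Returns the training set: lane-change quadruples, their extended variants,
    and the keep-straight quadruples."""
    first = [t for t in batch if t[1] in (CHANGE_LEFT, CHANGE_RIGHT)]  # first historical moments
    second = [t for t in batch if t[1] == KEEP_STRAIGHT]               # second historical moments
    extended = [make_symmetric(t) for t in first] + [make_monotonic(t) for t in first]
    return first + extended + second

# Identity transforms are used here only to show the call shape.
batch = [("s1", CHANGE_LEFT, 0.5, "s1_next"), ("s2", KEEP_STRAIGHT, 0.1, "s2_next")]
print(len(split_and_expand(batch, lambda t: t, lambda t: t)))  # 1 + 2 + 1 = 4
```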
  • the extended quad information of any first historical moment involved in the embodiment of the present application corresponds to the extended vehicle condition at the first historical moment, and is obtained by processing the vehicle condition at the first historical moment on the symmetric rule and the monotonic rule.
  • the symmetry rule involved in the embodiments of the present application refers to symmetrically transforming the positions of obstacles in all lanes on the left and right sides of the lane where the autonomous vehicle is located, taking the lane where the autonomous vehicle is located as the axis.
  • the monotonic principle involved in the embodiments of the present application refers to an increase in the distance between the obstacles in the front and rear neighbors of the autonomous vehicle on the target lane where the autonomous vehicle changes lanes, and/or the increase in the distance between the autonomous vehicle on the non-target lane The distance between the front and rear neighbor obstacles changes less than the preset distance range.
  • the extended quadruple information of any first historical moment may include: symmetric quadruple information and monotonic quadruple information of the first historical moment.
  • the symmetric quadruple information of the first historical moment may be obtained by processing the quadruple information of the first historical moment according to the symmetry rule, and the monotonic quadruple information of the first historical moment may be obtained by processing the quadruple information of the first historical moment according to the monotonic rule.
  • the parameters in the current control strategy are updated to obtain the control strategy at the next moment (used to determine the target action at the next moment).
  • In a possible implementation, for the l-th quadruple information among the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment, a target value corresponding to the l-th quadruple information is generated; further, the gradient descent method is used to iteratively update the parameter θ in a preset function containing the target value corresponding to the l-th quadruple information; further, the iteratively updated parameter θ is used in place of the parameter θ in the current control strategy to obtain the control strategy at the next moment.
  • For the l-th quadruple information (s_l, a_l, r_l, s_l'), the following formula can be used to generate the target value y_l corresponding to the l-th quadruple information, where l is a positive integer not greater than q, and q is the total number of quadruples included in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment:
  • y_l = r_l, if s_l' is an end state; otherwise y_l = r_l + γ · max_a Q(s_l', a, θ);
  • where the end state refers to the autonomous vehicle having completed automatic driving for the preset maximum distance, or human intervention in the driving of the autonomous vehicle; γ represents the preset forgetting factor, with γ ∈ (0, 1); Q(s_l', a, θ) represents the action value function, and max_a Q(s_l', a, θ) means traversing a so that Q(s_l', a, θ) takes the maximum value; s_l' represents the feature at the next moment in the l-th quadruple information.
  • the target value corresponding to the l-th quaternion information can also be generated by other variations of the above formula or equivalent formula, which is not limited in the embodiment of the present application.
  • the gradient descent method is used to iteratively update the parameter θ in the preset function containing the target value, for example a squared error of the form (y_l - Q(s_l, a_l, θ))^2; where Q(s_l, a_l, θ) is the action value function corresponding to the l-th quadruple information, s_l represents the feature at the previous moment in the l-th quadruple information, and a_l represents the target action at the previous moment in the l-th quadruple information.
  • the parameter θ in the current control strategy is replaced with the iteratively updated parameter θ, so as to obtain the control strategy at the next moment, which is then used to determine the target action at the next moment.
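  • Taken together, these steps have the form of a DQN-style update. The sketch below shows one way this could look with a small PyTorch network; the architecture, optimizer, learning rate, and forgetting factor value are assumptions, and only the target formula y_l = r_l + γ·max_a Q(s_l', a, θ) (with y_l = r_l at an end state) and the squared-error gradient step come from the description above.

```python
import torch
import torch.nn as nn

GAMMA = 0.9        # preset forgetting factor, assumed value in (0, 1)
N_ACTIONS = 3      # keep straight, change left, change right
FEATURE_DIM = 20   # assumed size of the concatenated local + global features

# Assumed action value function Q(s, a, theta): a small multilayer perceptron.
q_net = nn.Sequential(nn.Linear(FEATURE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def update(s, a, r, s_next, done):
    """One gradient-descent step on (y_l - Q(s_l, a_l, theta))^2 for a batch of quadruples."""
    with torch.no_grad():
        # y_l = r_l at an end state, otherwise r_l + gamma * max_a Q(s_l', a, theta)
        y = r + GAMMA * q_net(s_next).max(dim=1).values * (1.0 - done)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s_l, a_l, theta)
    loss = ((y - q_sa) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch of four quadruples; done = 1.0 marks an end state.
s = torch.randn(4, FEATURE_DIM)
a = torch.randint(0, N_ACTIONS, (4,))
r = torch.rand(4)
s_next = torch.randn(4, FEATURE_DIM)
done = torch.zeros(4)
print(update(s, a, r, s_next, done))
```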
  • the parameters in the current control strategy can also be updated in other ways to obtain the control strategy at the next moment, which is not limited in the embodiment of the present application.
  • the above-mentioned training device 1001 may execute step S901 to step S902 in a loop a preset number of times, or may execute step S901 to step S902 in a loop repeatedly until the updated control strategy meets a preset condition, and then stop.
  • the control strategy finally obtained by the above training device 1001 can be used when the execution device 1002 executes the automatic lane changing method.
  • when the training device 1001 performs step S902 for the first time, the current control strategy in the embodiment of the present application may be a preset initial control strategy; when the training device 1001 does not perform step S902 for the first time, the current control strategy in the embodiment of the present application may be the control strategy obtained the last time the training device 1001 executed step S902.
  • in the embodiment of the present application, the quaternion information of a preset number of historical moments is obtained; further, the current control strategy is updated based on the quaternion information of at least one first historical moment, the extended quaternion information of the at least one first historical moment, and the quaternion information of at least one second historical moment, to obtain the control strategy at the next moment.
  • the current control strategy is updated based on the extended quaternion information of the first historical moment in the quaternion information of the preset number of historical moments.
  • FIG. 10 is a schematic flowchart of a method for training a control strategy provided by another embodiment of the application.
  • the embodiment of the present application introduces the method of generating "quadruple information of historical moments". As shown in FIG. 10, before the above step S901, the method further includes:
  • S1001 For each historical moment, calculate the local neighbor features and global statistical features of the autonomous vehicle at the historical moment according to the driving information of the autonomous vehicle (such as driving speed and/or driving position) and the motion information (for example, the speed and/or position of vehicles, people, animals, or stationary objects) of the obstacles in each lane within the perception range of the autonomous vehicle (that is, the range that the sensors of the autonomous vehicle can detect, such as the range within a preset interval of the autonomous vehicle).
  • when the obstacle is a vehicle, the motion information of the obstacle is its driving information; when the obstacle is a person, an animal, or a stationary object, the motion information of the obstacle may include the speed and position of its motion and other related information.
  • the local neighbor features of the autonomous vehicle at any historical moment involved in the embodiments of the present application are used to indicate the motion state information (such as relative distance and relative speed), relative to the autonomous vehicle, of the specific neighbor obstacles of the autonomous vehicle at that historical moment (for example, the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located).
  • the local neighbor feature s_l of the autonomous vehicle at any time may include, but is not limited to: the relative speed and relative distance, with respect to the autonomous vehicle, of the front obstacle adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in that lane; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent left lane; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent left lane; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent right lane; and the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent right lane.
  • the global statistical features of the autonomous vehicle at any historical moment involved in the embodiments of this application are used to indicate the degree of sparseness or density of obstacles in each lane within the perception range of the autonomous vehicle, for example, the average driving speed and average interval of the obstacles in each lane at the historical moment.
  • the global traffic flow statistical feature s_g of the autonomous vehicle at any time may include, but is not limited to: the average interval gap_L between front and rear adjacent obstacles in all lanes on the left side of the lane where the autonomous vehicle is located, the average interval gap_M between two adjacent obstacles in the lane where the autonomous vehicle is located, the average interval gap_R between two adjacent obstacles in all lanes on the right side of the lane where the autonomous vehicle is located, the average driving speed V_L of obstacles in all lanes on the left side of the lane where the autonomous vehicle is located, the average driving speed V_M of obstacles in the lane where the autonomous vehicle is located, and the average driving speed V_R of obstacles in all lanes on the right side of the lane where the autonomous vehicle is located.
  • S1002 Acquire a target action instruction at the historical moment according to the local neighbor characteristics, global statistical characteristics, and the control strategy of the historical moment.
  • the target action instruction used to instruct the autonomous vehicle to perform the target action at the historical moment can be obtained.
  • the control strategy at any moment (for example, the control strategy at the historical moment) can be expressed as: a = argmax_{a'} Q(s, a', θ);
  • where s represents the local neighbor features and global statistical features at that moment; a' ∈ {0, 1, 2}, with a' equal to 0 meaning keeping straight, a' equal to 1 meaning changing lanes to the left adjacent lane, and a' equal to 2 meaning changing lanes to the right adjacent lane;
  • that is, the action a' that maximizes the action value function Q(s, a', θ) is selected as the target action a at the historical moment.
  • control strategy at this historical moment may also adopt other variants or equivalent formulas of the above formula, which is not limited in the embodiment of the present application.
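  • A sketch of this action selection, reusing the q_net style from the training sketch above; the feature dimensionality and the stand-in network are assumptions.

```python
import torch
import torch.nn as nn

def select_target_action(q_net, s_local, s_global):
    """Concatenate the local neighbor features and global statistical features into s,
    then pick a = argmax over a' of Q(s, a', theta)."""
    s = torch.tensor(list(s_local) + list(s_global), dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        q_values = q_net(s)               # shape (1, 3): keep straight / left / right
    return int(q_values.argmax(dim=1))    # 0, 1 or 2

# q_net would be the trained control strategy; a random network is used only as a stand-in.
q_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
print(select_target_action(q_net,
                           s_local=[0.1] * 14,   # 12 neighbor features + flag + dist2goal
                           s_global=[30.0, 7.5, 80.0, 10.5, 7.9, 14.5]))
```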
  • the target action includes at least the following two categories: changing lanes or keeping straight, where changing lanes may include: changing lanes to the left adjacent lane or changing lanes to the right adjacent lane.
  • the feedback information may include, but is not limited to: the driving information (such as driving speed or driving position) of the autonomous vehicle after executing the target action, the driving information of the autonomous vehicle at the next moment, and the motion information, at the next moment, of the obstacles in each lane within the perception range of the autonomous vehicle.
  • when the target action is a lane change, the feedback information may also include: the ratio of the time taken to perform the target action to the historical average time, and the change in the degree of sparseness or density between the lane where the autonomous vehicle is located after the lane change and the lane where it was located before the lane change; where the historical average time is the average time taken by the autonomous vehicle to perform similar actions (such as lane changing actions) within a preset historical time period (for example, a time window of 500).
  • the change in the degree of sparseness between the lane of the autonomous vehicle after the lane change and the lane before the lane change may be determined based on the movement information of the autonomous vehicle and of the other obstacles within the autonomous vehicle's perception range before and after the lane change (for example, the average spacing gap_cur between two adjacent obstacles in the lane before the autonomous vehicle changes lanes, the average travel speed V_cur of the obstacles in the lane before the autonomous vehicle changes lanes, the average spacing gap_goal between two adjacent obstacles in the lane after the autonomous vehicle changes lanes, and the average travel speed V_goal of the obstacles in the lane after the autonomous vehicle changes lanes) and the preset global classification model F_0.
  • S1004: Calculate, according to the feedback information, the reward value corresponding to the target action and the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment.
  • specifically, according to the feedback information, the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment can be calculated; the specific calculation method may refer to the method of obtaining the local neighbor features and global statistical features of the autonomous vehicle at the historical moment in step S401, and is not repeated in this embodiment of the application.
  • a possible implementation is that when the target action is to keep going straight, the reward value is calculated according to the driving information of the autonomous vehicle after the target action is executed.
  • the reward value is calculated according to the preset function R(s") and the driving information s” (such as the driving speed or the driving position, etc.) after the autonomous vehicle performs the target action.
  • for example, the preset function R(s") = V_ego', where V_ego' represents the driving speed of the autonomous vehicle after executing the target action; of course, the preset function R(s") can also be equal to another function of the driving information of the autonomous vehicle after executing the target action, which is not limited in the embodiment of the present application.
  • another possible implementation is that, when the target action is a lane change, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness between the lane of the autonomous vehicle after the lane change and the lane before the lane change.
  • specifically, the local reward coefficient K_l is determined according to the ratio of the time T_e for executing the target action to the historical average time T.
  • the global reward coefficient K_g is determined according to the change in the degree of sparseness between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change: when the lane of the autonomous vehicle after the lane change is sparser than the lane of the autonomous vehicle before the lane change, K_g > 1; when the lane of the autonomous vehicle after the lane change is denser than the lane of the autonomous vehicle before the lane change, K_g < 1. Further, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the local reward coefficient K_l and the global reward coefficient K_g.
  • for example, the reward value is calculated according to the formula c*K_l*K_g*R(s"); where c represents a preset discount factor (for example, 0.3), and R(s") represents the preset function of the driving information of the autonomous vehicle after executing the target action. Of course, the reward value can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiment of the present application.
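  • a minimal sketch of this reward calculation follows; the exact mappings from T_e/T and from the sparseness change to K_l and K_g are not specified in the text, so the concrete coefficient rules below are illustrative assumptions only.

```python
def lane_change_reward(v_ego_after: float,
                       exec_time: float,
                       hist_avg_time: float,
                       sparser_after: bool,
                       c: float = 0.3) -> float:
    """Reward = c * K_l * K_g * R(s''), with R(s'') taken as the post-action speed.
    K_l and K_g are illustrative assumptions consistent with the description: a
    faster-than-average lane change and a sparser target lane increase the reward."""
    k_l = hist_avg_time / max(exec_time, 1e-6)   # assumed: >1 if faster than average, <1 if slower
    k_g = 1.2 if sparser_after else 0.8          # assumed: K_g > 1 when the new lane is sparser, else < 1
    r_s = v_ego_after                            # R(s'') = V_ego' as in the example above
    return c * k_l * k_g * r_s
```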
  • in another possible implementation, the local neighbor feature of the autonomous vehicle at the historical moment may also include: the position information flag between the navigation target lane and the lane where the autonomous vehicle is located, and the distance dist2goal between the autonomous vehicle and the next intersection in the driving direction; where flag ∈ {0, -1, 1}: flag equal to 0 means that the autonomous vehicle is on the navigation target lane, flag equal to -1 means that the navigation target lane is on the left side of the lane where the autonomous vehicle is located, and flag equal to 1 means that the navigation target lane is on the right side of the lane where the autonomous vehicle is located. In this implementation, the local reward coefficient K_l is determined according to the ratio of the time T_e for executing the target action to the historical average time T. Further, the first global reward coefficient is determined according to the change in the degree of sparseness between the lane where the autonomous vehicle is located after the lane change and the lane where it is located before the lane change,
  • and the second global reward coefficient is determined according to the position information flag between the navigation target lane and the lane where the autonomous vehicle is located and the target action.
  • for example, the second global reward coefficient may be determined according to a preset formula of gap_cur, gap_goal and the target action a, where gap_cur represents the average spacing between two adjacent obstacles in the lane before the autonomous vehicle changes lanes, gap_goal represents the average spacing between two adjacent obstacles in the lane after the autonomous vehicle changes lanes, and a represents the target action; of course, the second global reward coefficient can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiments of the application.
  • further, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the local reward coefficient K_l, the first global reward coefficient and the second global reward coefficient.
  • for example, the reward value may be calculated according to a preset formula in which flag' represents the position information between the navigation target lane and the lane where the autonomous vehicle is located after executing the target action: flag' equal to 0 represents that the autonomous vehicle is on the navigation target lane, flag' equal to -1 represents that the navigation target lane is on the left side of the lane where the autonomous vehicle is located after executing the target action, and flag' equal to 1 represents that the navigation target lane is on the right side of the lane where the autonomous vehicle is located after executing the target action; dist2goal' represents the distance between the autonomous vehicle and the next intersection in the driving direction after executing the target action. The preset function R(s") can also be equal to another function of the driving information of the autonomous vehicle after executing the target action, and the reward value can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiment of the present application.
  • the reward value corresponding to the target action can also be calculated in other ways according to the feedback information, which is not limited in the embodiment of the present application.
  • the quadruple information of the historical moment can be stored in the database in order to facilitate subsequent training of the control strategy.
  • the quadruple information of the historical moment corresponds to the vehicle condition at the historical moment, and may include: the features of the historical moment, the target action of the autonomous vehicle at the historical moment, the reward value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment.
  • the features of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment.
  • the features of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment.
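  • a minimal sketch of storing such quadruples is given below; the in-memory buffer and the class and method names are assumptions standing in for whatever database the embodiment actually uses.

```python
import random
from collections import deque, namedtuple
from typing import List

# (s, a, r, s'): features at the moment, target action, reward value, features at the next moment
Quadruple = namedtuple("Quadruple", ["s", "a", "r", "s_next"])

class QuadrupleDatabase:
    """Assumed in-memory store for quadruple information used to train the control strategy."""
    def __init__(self, capacity: int = 100_000):
        self._buf = deque(maxlen=capacity)

    def store(self, s: List[float], a: int, r: float, s_next: List[float]) -> None:
        self._buf.append(Quadruple(s, a, r, s_next))

    def sample(self, n: int):
        return random.sample(self._buf, min(n, len(self._buf)))
```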
  • in the embodiment of the present application, the target action instruction at the historical moment is obtained according to the local neighbor features and global statistical features of the autonomous vehicle at the historical moment and the control strategy at the historical moment; further, feedback information is obtained by executing the target action, the reward value corresponding to the target action and the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment are calculated according to the feedback information, and the quadruple information of the historical moment is stored. It can be seen that, by further introducing information such as the global statistical features and the reward value corresponding to the target action on the basis of the local neighbor features, the training data used to train the control strategy is more complete, which is conducive to training a more accurate control strategy.
  • in a possible implementation, the global classification model features are obtained first; the global classification model features may include, but are not limited to, the motion information of the autonomous vehicle and of the obstacles in each lane within the perception range of the autonomous vehicle before and after each lane change of the autonomous vehicle within a preset time period (for example, 2,000,000 time windows), for example: the average spacing gap_cur between two adjacent obstacles in the lane before the autonomous vehicle changes lanes, the average driving speed V_cur of the obstacles in the lane before the autonomous vehicle changes lanes, the average spacing gap_goal between two adjacent obstacles in the lane after the autonomous vehicle changes lanes, and the average driving speed V_goal of the obstacles in the lane after the autonomous vehicle changes lanes. Then a logistic regression algorithm is used, according to the global classification model features, to generate the preset global classification model F_0.
  • road scenes with different sparseness and density levels and/or different speeds can be preset in the simulator, for example, a map containing three lanes with a preset length (for example, 4km) can be constructed as a training map
  • the layout of social cars covers car-free scenes, sparse medium-speed scenes, sparse high-speed scenes, sparse low-speed scenes, uniform medium-speed scenes, uniform high-speed scenes, uniform low-speed scenes, dense Medium-speed scenes, dense high-speed scenes, dense low-speed scenes, dense ultra-low-speed scenes, etc.
  • for example, the average vehicle density of the sparse scenes, uniform scenes and dense scenes is 15 vehicles/4000 meters, 40 vehicles/4000 meters and 100 vehicles/4000 meters, respectively, and the average speed of the social cars is set to 5 km/h, 10 km/h, 20 km/h, 30 km/h, 40 km/h, 50 km/h, 60 km/h, etc.
  • the so-called random strategy means that the autonomous vehicle randomly selects a target action in the decision space A (such as {0, 1, 2}) and executes it. It is assumed that each time the autonomous vehicle drives to the end of the training map, it randomly switches to a new training map and a new social car configuration scene, and data collection terminates after a preset time period (for example, 2,000,000 time windows).
  • then, the global classification model features are obtained; the global classification model features may include, but are not limited to, the driving information of the autonomous vehicle and of the other vehicles within the perception range of the autonomous vehicle before and after each lane change within the preset time period (for example, 2,000,000 time windows), for example: the average spacing gap_cur between two adjacent vehicles in the lane before the autonomous vehicle changes lanes, the average speed V_cur of the vehicles in the lane before each lane change, the average spacing gap_goal between two adjacent vehicles in the lane after each lane change, and the average speed V_goal of the vehicles in the lane after each lane change.
  • when the lane of the autonomous vehicle after a lane change is sparser than the lane of the autonomous vehicle before the lane change, the label corresponding to the driving information before and after that lane change is 1 (representing that the lane of the autonomous vehicle after the lane change is sparser than the lane of the autonomous vehicle before the lane change); otherwise the label is 0 (representing that the lane of the autonomous vehicle after the lane change is denser than the lane of the autonomous vehicle before the lane change).
  • the driving information before and after each lane change and the corresponding labels are added as sample data to the training set D, and the sample data in the training set D and the logistic regression algorithm are used to learn and generate the preset global classification model F_0 (the output of the model is the probability that the lane is sparser after the lane change).
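  • a minimal sketch of this training step, assuming scikit-learn is available; the feature layout [gap_cur, V_cur, gap_goal, V_goal] and the 0/1 labels are as described above, while the function name is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_global_classifier(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """X rows: [gap_cur, V_cur, gap_goal, V_goal] collected before/after a lane change;
    y: 1 if the lane after the change was sparser, else 0."""
    f0 = LogisticRegression()
    f0.fit(X, y)
    return f0

# Usage: F0 outputs the probability that the lane is sparser after the lane change, e.g.
#   p_sparser = f0.predict_proba(features)[:, 1]
```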
  • the preset global classification model F 0 can also be generated in other ways, which is not limited in the embodiment of the present application.
  • FIG. 11 is a schematic flowchart of an automatic lane changing method provided by an embodiment of this application.
  • the method in this embodiment may be specifically executed by the execution device 1002 shown in FIG. 1.
  • the method provided by the embodiment of the present application may include:
  • when the obstacle is a vehicle, the movement information is driving information; when the obstacle is a person, an animal or a stationary object, the movement information may include related information such as movement speed.
  • according to the current driving information of the autonomous vehicle (such as driving speed and/or driving position) and the motion information of the obstacles in each lane within the perception range of the autonomous vehicle (that is, the range that the sensors of the autonomous vehicle can detect, such as each lane within a preset distance interval from the autonomous vehicle), for example the speed and/or driving position of vehicles and the speed and/or position of people, animals or stationary objects, the local neighbor features and global statistical features of the autonomous vehicle at the current moment are calculated.
  • the local neighbor features of the autonomous vehicle at the current moment involved in the embodiments of this application are used to indicate the motion state information (such as relative distance and relative speed), relative to the autonomous vehicle, of the specific neighbor obstacles of the autonomous vehicle at the current moment, for example: the front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located at the current moment, the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located at the current moment, and the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located at the current moment.
  • the local neighbor feature s_l of the autonomous vehicle at any moment may include, but is not limited to: the relative speed and relative distance, relative to the autonomous vehicle, of the front obstacle adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located at that moment; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located at that moment; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located at that moment; the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located at that moment; the relative speed and relative distance of the front obstacle adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located at that moment; and the relative speed and relative distance of the rear obstacle adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located at that moment.
  • the global statistical features of the autonomous vehicle at the current moment are used to indicate the degree of sparseness of the obstacles in each lane within the perception range of the autonomous vehicle at the current moment, for example, the average driving speed and the average spacing of all obstacles in each lane at the current moment.
  • the global statistical feature s_g of the autonomous vehicle at any moment may include, but is not limited to: the average spacing gap_L between two adjacent obstacles in all lanes on the left side of the lane where the autonomous vehicle is located at that moment, the average spacing gap_M between two adjacent obstacles in the lane where the autonomous vehicle is located at that moment, the average spacing gap_R between two adjacent obstacles in all lanes on the right side of the lane where the autonomous vehicle is located at that moment, the average driving speed V_L of obstacles in all lanes on the left side of the lane where the autonomous vehicle is located at that moment, the average driving speed V_M of obstacles in the lane where the autonomous vehicle is located at that moment, and the average driving speed V_R of obstacles in all lanes on the right side of the lane where the autonomous vehicle is located at that moment.
  • the current control strategy in this embodiment of the present application may be a preset control strategy, for example, the control strategy finally obtained by the above-mentioned training device 1001 by executing the above-mentioned training method of the control strategy; alternatively, the current control strategy may be the control strategy obtained after the execution device 1002 performed the update at the previous moment.
  • the target action instruction at the current moment is used to instruct the autonomous vehicle to perform the target action.
  • the control strategy at any moment (such as the current control strategy) can be expressed as: a = argmax_{a'} Q(s, a', θ);
  • where s represents the local neighbor features and global statistical features of the autonomous vehicle at that moment; a' ∈ {0, 1, 2}, a' equal to 0 means keeping going straight, a' equal to 1 means changing lanes to the left adjacent lane, and a' equal to 2 means changing lanes to the right adjacent lane;
  • that is, the action a' that maximizes Q(s, a', θ) is selected as the target action a at the current moment.
  • the target action includes at least two types: changing lanes or keeping going straight, where changing lanes includes: changing lanes to the left adjacent lane or changing lanes to the right adjacent lane.
  • control strategy at the current moment can also adopt other variants or equivalent formulas of the above formula, which is not limited in the embodiment of the present application.
  • S1103 Execute the target action according to the target action instruction.
  • if the target action instruction instructs a lane change, the autonomous vehicle executes the lane-changing action; if the target action instruction instructs to keep going straight, the autonomous vehicle executes the keep-straight action.
  • the manner in which the target action is executed according to the target action instruction can be referred to the content in the related technology, which is not limited in the embodiment of the present application.
  • in the embodiment of the present application, the local neighbor features and global statistical features of the autonomous vehicle at the current moment are calculated; further, the target action instruction is obtained according to the local neighbor features, the global statistical features and the current control strategy, and the target action is executed according to the target action instruction. It can be seen that, by further introducing the global statistical features on the basis of the local neighbor features as input to the current control strategy to obtain the target action instruction, not only the information of the local neighbor obstacles (such as other cars) but also the global statistical features (such as the overall traffic flow) are considered. Therefore, the target action obtained by integrating the local and global road obstacle information is a globally optimal strategic action.
  • further, feedback information can also be obtained by executing the target action, and the current control strategy can be updated according to the feedback information to obtain the control strategy at the next moment, so that the target action at the next moment can be accurately determined at the next moment based on the control strategy at the next moment.
  • for example, the control strategy at time t can be updated according to the feedback information at time t to obtain the control strategy at time t+1, so that the control strategy used to generate the target action is always being adaptively updated and optimized; this ensures that each moment has its corresponding optimal control strategy and provides a guarantee for accurately generating the target action at each moment.
  • specifically, the feedback information (used to update the current control strategy) is obtained by executing the target action, so as to determine the reward value corresponding to the target action and the local neighbor features and global statistical features of the autonomous vehicle at the next moment, thereby updating the current control strategy.
  • the feedback information may include, but is not limited to: the driving information (such as driving speed or driving position) of the autonomous vehicle after executing the target action, the driving information of the autonomous vehicle at the next moment, and the movement information, at the next moment, of the obstacles in each lane within the autonomous vehicle's perception range. When the target action is a lane change, the feedback information may also include: the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness between the lane of the autonomous vehicle after the lane change and the lane before the lane change; here, the historical average time is the average time taken by the autonomous vehicle to perform similar actions (such as lane-changing actions) within a preset historical time period (for example, 500 time windows).
  • the change in the degree of sparseness between the lane of the autonomous vehicle after the lane change and the lane before the lane change may be determined based on the movement information of the autonomous vehicle and of the obstacles in each lane within the autonomous vehicle's perception range before and after the lane change (for example, the average spacing gap_cur between two adjacent obstacles in the lane before the autonomous vehicle changes lanes, the average travel speed V_cur of the obstacles in the lane before the autonomous vehicle changes lanes, the average spacing gap_goal between two adjacent obstacles in the lane after the autonomous vehicle changes lanes, and the average travel speed V_goal of the obstacles in the lane after the autonomous vehicle changes lanes) and the preset global classification model F_0.
  • FIG. 12 is a schematic flowchart of an automatic lane changing method provided by another embodiment of this application.
  • the embodiment of the present application introduces the achievable manner of "update the current control strategy according to the feedback information to obtain the control strategy at the next moment".
  • the method of the embodiment of the present application may include:
  • based on the driving information of the autonomous vehicle at the next moment in the feedback information and the movement information of the obstacles in each lane within the perception range of the autonomous vehicle at the next moment, the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the current moment are calculated. The specific calculation method can refer to the method of obtaining the local neighbor features and global statistical features of the autonomous vehicle at the historical moment in step S1001, which is not repeated in this embodiment of the application.
  • a possible implementation manner is to calculate the reward value based on the driving information of the autonomous vehicle after the target action is executed when the target action is to stay straight.
  • the reward value is calculated according to the preset function R(s") and the driving information s” (such as driving speed or distance, etc.) of the autonomous vehicle after executing the target action.
  • for example, the preset function R(s") = V_ego', where V_ego' represents the driving speed of the autonomous vehicle after executing the target action; of course, the preset function R(s") can also be equal to another function of the driving information of the autonomous vehicle after executing the target action, which is not limited in the embodiment of the present application.
  • another possible implementation is that, when the target action is a lane change, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness between the lane of the autonomous vehicle after the lane change and the lane before the lane change.
  • specifically, the local reward coefficient K_l is determined according to the ratio of the time T_e for executing the target action to the historical average time T.
  • the global reward coefficient K_g is determined according to the change in the degree of sparseness between the lane where the autonomous vehicle is located after the lane change and the lane before the lane change: when the lane of the autonomous vehicle after the lane change is sparser than the lane of the autonomous vehicle before the lane change, K_g > 1; when the lane of the autonomous vehicle after the lane change is denser than the lane of the autonomous vehicle before the lane change, K_g < 1. Further, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the local reward coefficient K_l and the global reward coefficient K_g.
  • for example, the reward value is calculated according to the formula c*K_l*K_g*R(s"); where c represents a preset discount factor (for example, 0.3), and R(s") represents the preset function of the driving information of the autonomous vehicle after executing the target action. Of course, the reward value can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiment of the present application.
  • in another possible implementation, the local neighbor feature of the autonomous vehicle at the current moment may also include: the position information flag between the navigation target lane and the lane where the autonomous vehicle is located, and the distance dist2goal between the autonomous vehicle and the next intersection in the driving direction; where flag ∈ {0, -1, 1}: flag equal to 0 means that the autonomous vehicle is on the navigation target lane, flag equal to -1 means that the navigation target lane is on the left side of the lane where the autonomous vehicle is located, and flag equal to 1 means that the navigation target lane is on the right side of the lane where the autonomous vehicle is located. In this implementation, the local reward coefficient K_l is determined according to the ratio of the time T_e for executing the target action to the historical average time T. Further, the first global reward coefficient is determined according to the change in the degree of sparseness between the lane where the autonomous vehicle is located after the lane change and the lane where it is located before the lane change,
  • and the second global reward coefficient is determined according to the position information flag between the navigation target lane and the lane where the autonomous vehicle is located and the target action.
  • for example, the second global reward coefficient may be determined according to a preset formula of gap_cur, gap_goal and the target action a, where gap_cur represents the average spacing between two adjacent obstacles in the lane before the autonomous vehicle changes lanes, gap_goal represents the average spacing between two adjacent obstacles in the lane after the autonomous vehicle changes lanes, and a represents the target action; of course, the second global reward coefficient can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiments of the application.
  • further, the reward value is calculated according to the driving information of the autonomous vehicle after executing the target action, the local reward coefficient K_l, the first global reward coefficient and the second global reward coefficient.
  • for example, the reward value may be calculated according to a preset formula in which flag' represents the position information between the navigation target lane and the lane where the autonomous vehicle is located after executing the target action: flag' equal to 0 represents that the autonomous vehicle is on the navigation target lane, flag' equal to -1 represents that the navigation target lane is on the left side of the lane where the autonomous vehicle is located after executing the target action, and flag' equal to 1 represents that the navigation target lane is on the right side of the lane where the autonomous vehicle is located after executing the target action; dist2goal' represents the distance between the autonomous vehicle and the next intersection in the driving direction after executing the target action. The preset function R(s") can also be equal to another function of the driving information of the autonomous vehicle after executing the target action, and the reward value can also be calculated by other equivalent or modified formulas of the above formula, which is not limited in the embodiment of the present application.
  • the reward value corresponding to the target action can also be calculated in other ways according to the feedback information, which is not limited in the embodiment of the present application.
  • Step S1202 Determine the quadruple information at the current moment.
  • specifically, the quadruple information at the current moment is determined according to the local neighbor features and global statistical features of the autonomous vehicle at the current moment, the target action at the current moment, the reward value corresponding to the target action calculated in step S1201, and the local neighbor features and global statistical features of the autonomous vehicle at the next moment.
  • the four-tuple information at the current time corresponds to the vehicle condition at the current time, and may include: the characteristics of the current time, the target action of the autonomous vehicle at the current moment, the reward value corresponding to the target action, and the characteristics of the next moment at the current moment.
  • the features at the current moment include the local neighbor features and global statistical features of the autonomous vehicle at the current moment
  • the next moment features at the current moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment.
  • S1203 Update the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment.
  • a possible implementation is, when the target action is to keep going straight, to generate the target value corresponding to the quadruple information according to the quadruple information at the current moment; further, the gradient descent method is used to iteratively update the parameter θ in the first preset function containing the target value; further, the iteratively updated parameter θ is substituted for the parameter θ in the current control strategy to obtain the control strategy at the next moment.
  • for example, according to the quadruple information (s, a, r, s') at the current moment, the following formula may be used to generate the target value y corresponding to the quadruple information: y = r + γ*max_{a} Q(s', a, θ); where γ represents the preset forgetting factor, γ ∈ (0,1); Q(s', a, θ) represents the action value function; max_{a} Q(s', a, θ) represents traversing a so that Q(s', a, θ) takes the maximum value; and s' represents the features of the next moment of the current moment. Of course, the target value corresponding to the quadruple information can also be generated by other variations or equivalent formulas of the above formula, which is not limited in the embodiment of the present application.
  • further, the gradient descent method is used to iteratively update the parameter θ in the first preset function (y - Q(s, a, θ))^2 containing the target value y; where Q(s, a, θ) is the action value function corresponding to the quadruple information at the current moment, s represents the local neighbor features and global statistical features of the current moment in the quadruple information at the current moment, and a represents the target action at the current moment in the quadruple information at the current moment.
  • further, the iteratively updated parameter θ is substituted for the parameter θ in the current control strategy, so as to obtain the control strategy at the next moment, which is used to determine the target action at the next moment.
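  • a minimal sketch of this update step follows, using the linear stand-in Q-function from the earlier sketch: the target value y = r + γ*max_a Q(s', a, θ) is formed from one quadruple and θ is moved down the gradient of (y - Q(s, a, θ))^2; the learning rate and γ are illustrative assumptions.

```python
import numpy as np

def update_theta(theta: np.ndarray, s, a: int, r: float, s_next,
                 gamma: float = 0.9, lr: float = 1e-3) -> np.ndarray:
    """One gradient-descent step on (y - Q(s, a, theta))^2 for an assumed linear
    Q(s, a, theta) = theta[a] @ s; (s, a, r, s_next) is one quadruple."""
    s, s_next = np.asarray(s, dtype=float), np.asarray(s_next, dtype=float)
    y = r + gamma * max(float(theta[b] @ s_next) for b in range(theta.shape[0]))  # target value y
    td_error = y - float(theta[a] @ s)
    theta = theta.copy()
    theta[a] += lr * td_error * s  # step against the gradient of (y - Q)^2; the factor 2 is absorbed into lr
    return theta
```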
  • another possible implementation is, when the target action is a lane change, to obtain the extended quadruple information at the current moment; further, the current control strategy is updated according to the quadruple information at the current moment and the extended quadruple information at the current moment to obtain the control strategy at the next moment.
  • the extended quadruple information at the current moment involved in the embodiments of the present application corresponds to the extended vehicle condition at the current moment, and is obtained by processing the vehicle condition at the current moment with the symmetry rule and the monotonicity rule.
  • the symmetry rule involved in the embodiments of the present application refers to symmetrically transforming the positions of obstacles in all lanes on the left and right sides of the lane where the autonomous vehicle is located, taking the lane where the autonomous vehicle is located as the axis.
  • the monotonicity rule involved in the embodiments of the present application refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the non-target lanes by less than a preset distance range.
  • the extended quadruple information at the current moment may include: symmetric quadruple information and monotonic quadruple information at the current moment.
  • the symmetric quadruple information at the current moment can be obtained by processing the quadruple information at the current moment according to the symmetry rule, and the monotonic quadruple information at the current moment can be obtained by processing the quadruple information at the current moment according to the monotonicity rule.
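  • a minimal sketch of the symmetric augmentation only is given below (a monotonic augmentation would instead perturb the gaps on the target lane); the 18-value feature layout is the assumed one from the earlier sketches and is not prescribed by the embodiment, and left/right lane-change actions are swapped after mirroring.

```python
import numpy as np

# Assumed feature layout from the earlier sketches (18 values):
#  0-3   ego-lane front/rear neighbor (rel_speed, rel_dist) pairs
#  4-7   adjacent-left-lane front/rear neighbor pairs
#  8-11  adjacent-right-lane front/rear neighbor pairs
#  12-14 gap_L, gap_M, gap_R      15-17 V_L, V_M, V_R

def mirror_state(s):
    """Reflect the scene about the ego lane: left and right features trade places."""
    s = np.asarray(s, dtype=float).copy()
    s[4:8], s[8:12] = s[8:12].copy(), s[4:8].copy()   # swap left/right neighbor blocks
    s[12], s[14] = s[14], s[12]                       # swap gap_L and gap_R
    s[15], s[17] = s[17], s[15]                       # swap V_L and V_R
    return s

def mirror_action(a: int) -> int:
    """0 stays straight; left (1) and right (2) lane changes swap under the reflection."""
    return {0: 0, 1: 2, 2: 1}[a]

def symmetric_quadruple(s, a, r, s_next):
    """Symmetric quadruple derived from (s, a, r, s'); the reward value is unchanged."""
    return mirror_state(s), mirror_action(a), r, mirror_state(s_next)
```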
  • for the i-th quadruple information (s_i, a_i, r_i, s_i') in the quadruple information at the current moment and the extended quadruple information at the current moment, the following formula can be used to generate the target value y_i corresponding to the i-th quadruple information: y_i = r_i + γ*max_{a_i} Q(s_i', a_i, θ); where i is a positive integer not greater than n, and n is the total number of pieces of quadruple information included in the quadruple information at the current moment and the extended quadruple information at the current moment; γ represents the preset forgetting factor, γ ∈ (0,1); Q(s_i', a_i, θ) represents the action value function; max_{a_i} Q(s_i', a_i, θ) represents traversing a_i so that Q(s_i', a_i, θ) takes the maximum value; and s_i' represents the feature of the later moment in the i-th quadruple information.
  • of course, the target value corresponding to the i-th quadruple information can also be generated through other variations or equivalent formulas of the above formula, which is not limited in the embodiment of this application.
  • further, the gradient descent method is used to iteratively update the parameter θ in the second preset function (y_i - Q(s_i, a_i, θ))^2 containing the target value y_i corresponding to the i-th quadruple information; where Q(s_i, a_i, θ) is the action value function corresponding to the i-th quadruple information, s_i represents the feature of the previous moment in the i-th quadruple information, and a_i represents the target action at the previous moment in the i-th quadruple information.
  • further, the iteratively updated parameter θ is substituted for the parameter θ in the current control strategy, so as to obtain the control strategy at the next moment, which is used to determine the target action at the next moment.
  • another possible implementation is, when the target action is to keep going straight, to update the current control strategy according to the quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments, to obtain the control strategy at the next moment.
  • the quadruple information of the historical moment in the embodiment of the present application corresponds to the vehicle condition at the historical moment, and may include, but is not limited to: the features of the historical moment, the target action of the autonomous vehicle at the historical moment (that is, the target action determined at the historical moment according to the corresponding control strategy), the reward value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment.
  • the characteristics of the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the historical moment.
  • the characteristics of the next moment in the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment.
  • the extended quadruple information of the historical time involved in the embodiment of the present application corresponds to the extended vehicle conditions at the historical time, and is obtained by processing the vehicle conditions at the historical time on symmetric rules and monotonic rules.
  • the symmetry rule involved in the embodiments of the present application refers to symmetrically transforming the positions of obstacles in all lanes on the left and right sides of the lane where the autonomous vehicle is located, taking the lane where the autonomous vehicle is located as the axis.
  • the monotonicity rule involved in the embodiments of the present application refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the non-target lanes by less than a preset distance range.
  • the extended quadruple information of the historical moment may include: symmetric quadruple information and monotonic quadruple information of the historical moment.
  • the symmetric quadruple information of the historical moment can be obtained by processing the quadruple information of the historical moment according to the symmetry rule, and the monotonic quadruple information of the historical moment can be obtained by processing the quadruple information of the historical moment according to the monotonicity rule.
  • for the j-th quadruple information (s_j, a_j, r_j, s_j') in the quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments, the following formula can be used to generate the target value y_j corresponding to the j-th quadruple information: y_j = r_j + γ*max_{a_j} Q(s_j', a_j, θ); where j is a positive integer not greater than m, and m is the total number of pieces of quadruple information included in the quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments; γ represents the preset forgetting factor, γ ∈ (0,1); Q(s_j', a_j, θ) represents the action value function; max_{a_j} Q(s_j', a_j, θ) represents traversing a_j so that Q(s_j', a_j, θ) takes the maximum value; and s_j' represents the feature of the later moment in the j-th quadruple information.
  • of course, the target value corresponding to the j-th quadruple information in the quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments can also be generated by other variations or equivalent formulas of the above formula, which is not limited in the embodiment of the present application.
  • further, the gradient descent method is used to iteratively update the parameter θ in the third preset function (y_j - Q(s_j, a_j, θ))^2 containing the target value y_j corresponding to the j-th quadruple information; where Q(s_j, a_j, θ) is the action value function corresponding to the j-th quadruple information, s_j represents the feature of the previous moment in the j-th quadruple information, and a_j represents the target action at the previous moment in the j-th quadruple information.
  • further, the iteratively updated parameter θ is substituted for the parameter θ in the current control strategy, so as to obtain the control strategy at the next moment, which is used to determine the target action at the next moment.
  • another possible implementation is, when the target action is a lane change, to obtain the extended quadruple information at the current moment; further, the current control strategy is updated according to the quadruple information at the current moment, the extended quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments, to obtain the control strategy at the next moment.
  • the extended quadruple information at the current moment involved in the embodiments of the present application corresponds to the extended vehicle condition at the current moment, and is obtained by processing the vehicle condition at the current moment with the symmetry rule and the monotonicity rule.
  • the symmetry rule involved in the embodiments of the present application refers to symmetrically transforming the positions of obstacles in all lanes on the left and right sides of the lane where the autonomous vehicle is located, taking the lane where the autonomous vehicle is located as the axis.
  • the monotonicity rule involved in the embodiments of the present application refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the non-target lanes by less than a preset distance range.
  • the extended quadruple information at the current moment may include: symmetric quadruple information and monotonic quadruple information at the current moment.
  • the symmetric quadruple information at the current moment can be obtained by processing the quadruple information at the current moment according to the symmetry rule, and the monotonic quadruple information at the current moment can be obtained by processing the quadruple information at the current moment according to the monotonicity rule.
  • the quadruple information of the historical moment in the embodiment of the present application corresponds to the vehicle condition at the historical moment, and may include, but is not limited to: the features of the historical moment, the target action of the autonomous vehicle at the historical moment (that is, the target action determined at the historical moment according to the corresponding control strategy), the reward value corresponding to the target action at the historical moment, and the features of the next moment of the historical moment.
  • the characteristics of the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the historical moment.
  • the characteristics of the next moment in the historical moment may include, but are not limited to: local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment.
  • the extended quadruple information of the historical time involved in the embodiment of the present application corresponds to the extended vehicle conditions at the historical time, and is obtained by processing the vehicle conditions at the historical time on symmetric rules and monotonic rules.
  • the extended quadruple information of the historical moment may include: symmetric quadruple information and monotonic quadruple information of the historical moment.
  • the symmetric quadruple information of the historical moment can be obtained by processing the quadruple information of the historical moment according to the symmetry rule, and the monotonic quadruple information of the historical moment can be obtained by processing the quadruple information of the historical moment according to the monotonicity rule.
  • for the k-th quadruple information (s_k, a_k, r_k, s_k') in the quadruple information at the current moment, the extended quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments, the following formula can be used to generate the target value y_k corresponding to the k-th quadruple information: y_k = r_k + γ*max_{a_k} Q(s_k', a_k, θ); where k is a positive integer not greater than p, and p is the total number of pieces of quadruple information included in the quadruple information at the current moment, the extended quadruple information at the current moment, the quadruple information at the historical moments and the extended quadruple information at the historical moments; γ represents the preset forgetting factor, γ ∈ (0,1); Q(s_k', a_k, θ) represents the action value function; max_{a_k} Q(s_k', a_k, θ) represents traversing a_k so that Q(s_k', a_k, θ) takes the maximum value; and s_k' represents the feature of the later moment in the k-th quadruple information.
  • further, the gradient descent method is used to iteratively update the parameter θ in the fourth preset function (y_k - Q(s_k, a_k, θ))^2 containing the target value y_k corresponding to the k-th quadruple information; where Q(s_k, a_k, θ) is the action value function corresponding to the k-th quadruple information, s_k represents the feature of the previous moment in the k-th quadruple information, and a_k represents the target action at the previous moment in the k-th quadruple information.
  • further, the iteratively updated parameter θ is substituted for the parameter θ in the current control strategy, so as to obtain the control strategy at the next moment, which is used to determine the target action at the next moment.
  • the current control strategy can also be updated in other ways to obtain the control strategy at the next moment, which is not limited in the embodiment of the present application.
  • FIG. 13 is a schematic diagram of training data provided by an embodiment of the application.
  • FIG. 13 shows the performance trend of the control strategies obtained at different stages of the training process of the control strategy training method provided by the embodiment of the present application, tested in four different traffic flows (for example, a sparse scene, a normal scene, a congested scene and a very congested scene).
  • the abscissa in FIG. 13 represents the number of iterations of the entire training process (unit: 10,000 times), and the ordinate represents the time (unit: seconds) that the autonomous vehicle takes to complete the journey on a fixed-length lane.
  • the red curve represents the convergence trend of using only local neighbor features as input training (Scheme 1)
  • the blue curve represents the trend of training (Scheme 2) after further adding global statistical features based on local neighbor features
  • the remaining curve represents the convergence trend of training after further introducing the local reward coefficient and the global reward coefficient when calculating the reward value (Scheme 3).
  • Table 1 is a schematic table of training data provided by the embodiments of the application. As shown in Table 1, the present scheme is compared with related schemes in the sparse scene, normal scene, congested scene and very congested scene. It can be seen that this scheme has advantages over the related schemes in terms of average speed and average number of lane changes. At the same time, lane-changing behaviors that seem unreasonable in the short term but are reasonable in the long term, called "soft lane changing", were also counted, and it was found that this scheme has a certain proportion of "soft lane changing", indicating that the model exhibits more long-term intelligence.
  • FIG. 14 is a schematic structural diagram of an automatic lane changing device provided by an embodiment of the application.
  • the automatic lane changing device 140 provided in this embodiment may include: a calculation module 1401, an acquisition module 1402, and an execution module 1403.
  • the calculation module 1401 is used to calculate the local neighbor characteristics and global statistics of the autonomous vehicle at the current moment based on the driving information of the autonomous vehicle at the current moment and the motion information of the obstacles in each lane within the autonomous vehicle's perception range Feature;
  • the local neighbor feature is used to represent the motion state information of a specific neighbor obstacle of the autonomous vehicle relative to the autonomous vehicle;
  • the global statistical feature is used to represent obstacles in each lane within the sensing range The degree of sparseness and density of objects;
  • the obtaining module 1402 is configured to obtain a target action indication according to the local neighbor characteristics, the global statistical characteristics and the current control strategy, where the target action indication is used to instruct the autonomous vehicle to perform a target action, and the target action includes at least two types: changing lanes or keeping going straight;
  • the execution module 1403 is configured to execute the target action according to the target action instruction.
  • the device further includes:
  • the feedback module is used to obtain feedback information by executing the target action, and the feedback information is used to update the current control strategy; wherein, the feedback information includes driving information of the autonomous vehicle after executing the target action, The driving information of the autonomous vehicle at the next moment and the motion information of the obstacles in each lane within the sensing range of the autonomous vehicle at the next moment; when the target action is a lane change, the feedback information further includes: execution The ratio of the time of the target action to the historical average time, and the change in the degree of sparseness and density between the lane of the autonomous vehicle after the lane change and the lane before the lane change; wherein, the historical average time includes the The average time for an autonomous vehicle to perform similar actions in a preset historical time period;
  • the update module is used to update the current control strategy according to the feedback information to obtain the next control strategy.
  • the update module includes:
  • a calculation unit configured to calculate the reward value corresponding to the target action according to the feedback information, and the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment;
  • the determining unit is used to determine the quadruple information at the current moment; wherein, the quadruple information at the current moment corresponds to the vehicle condition at the current moment, and includes: the characteristics of the current moment, the target action, the reward value corresponding to the target action, and the characteristics of the next moment; the characteristics of the current moment include the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the current moment, and the characteristics of the next moment include the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment;
  • the update unit is configured to update the current control strategy according to the quadruple information at the current moment to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the extended quadruple information at the current moment corresponds to the extended vehicle condition at the current moment, wherein the extended vehicle condition at the current moment is obtained by performing symmetry rule and monotonicity rule processing on the vehicle condition at the current moment;
  • the symmetry rule refers to symmetrically transforming the positions of obstacles in all lanes on the left and right sides of the lane where the autonomous vehicle is located, taking the lane where the autonomous vehicle is located as the axis; the monotonicity rule refers to increasing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the target lane of the lane change, and/or changing the distance between the front and rear neighbor obstacles of the autonomous vehicle on the non-target lanes by less than a preset distance range;
  • the current control strategy is updated according to the quadruple information at the current moment and the extended quadruple information at the current moment to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the target value corresponding to the i-th quadruple information is generated; wherein, the i is a positive integer not greater than n, and n is the total number of quadruple information included in the quadruple information at the current moment and the extended quadruple information at the current moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the quadruple information of the historical moment corresponds to the vehicle condition at the historical moment, and includes: the characteristics of the historical moment, the target action at the historical moment, the reward value corresponding to the target action at the historical moment, and the characteristics of the next moment of the historical moment; the characteristics of the historical moment include the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the historical moment, and the characteristics of the next moment of the historical moment include the local neighbor characteristics and global statistical characteristics of the autonomous vehicle at the next moment of the historical moment;
  • the extended quadruple information of the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by processing the vehicle condition at the historical moment with the symmetry rule and the monotonicity rule.
  • the update unit is specifically configured to:
  • generate, according to the j-th quadruple information in the quadruple information at the current moment, the quadruple information at the historical moment and the extended quadruple information at the historical moment, the target value corresponding to the j-th quadruple information; wherein, j is a positive integer not greater than m, and m is the total number of pieces of quadruple information included in the quadruple information at the current moment, the quadruple information at the historical moment and the extended quadruple information at the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the update unit is specifically configured to:
  • the extended quadruple information at the current moment corresponds to the extended vehicle condition at the current moment, and the extended vehicle condition at the current moment is obtained by processing the current vehicle condition according to the symmetry rule and the monotonic rule;
  • the current control strategy is updated according to the quadruple information of the current moment, the extended quadruple information of the current moment, the quadruple information of the historical moment and the extended quadruple information of the historical moment, to obtain the control strategy at the next moment; wherein the quadruple information at the historical moment corresponds to the vehicle condition at the historical moment, the extended quadruple information at the historical moment corresponds to the extended vehicle condition at the historical moment, and the extended vehicle condition at the historical moment is obtained by processing the vehicle condition at the historical moment according to the symmetry rule and the monotonic rule.
  • the update unit is specifically configured to:
  • the target value corresponding to the k-th quadruple information is generated; wherein k is a positive integer not greater than p, and p is the total number of pieces of quadruple information included in the quadruple information at the current moment, the extended quadruple information at the current moment, the quadruple information at the historical moment, and the extended quadruple information at the historical moment;
  • the iteratively updated parameter ⁇ is substituted for the parameter ⁇ in the current control strategy to obtain the control strategy at the next moment.
  • the calculation unit is specifically configured to:
  • the reward value is calculated according to the driving information of the autonomous vehicle after the target action is executed;
  • the local neighbor features and global statistical features of the autonomous vehicle at the next moment are calculated according to the driving information of the autonomous vehicle at the next moment and the motion information of obstacles in each lane within the autonomous vehicle's sensing range at the next moment.
  • the calculation unit is specifically configured to:
  • the reward value is calculated according to the driving information of the autonomous vehicle after the target action is executed, the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane where it was located before the lane change;
  • the local neighbor features and global statistical features of the autonomous vehicle at the next moment are calculated according to the driving information of the autonomous vehicle at the next moment and the motion information of obstacles in each lane within the autonomous vehicle's sensing range at the next moment.
  • the specific neighbor obstacle of the autonomous vehicle includes at least one of the following: front and rear obstacles adjacent to the autonomous vehicle in the lane where the autonomous vehicle is located, and the autonomous vehicle Front and rear obstacles adjacent to the autonomous vehicle on the adjacent left lane of the lane where the driving vehicle is located, and front and rear obstacles adjacent to the autonomous vehicle on the adjacent right lane of the lane where the autonomous vehicle is located;
  • when the autonomous vehicle is located in the leftmost lane, the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent left lane of the lane where the autonomous vehicle is located is a default value; and/or,
  • when the autonomous vehicle is located in the rightmost lane, the motion state information, relative to the autonomous vehicle, of the front and rear obstacles adjacent to the autonomous vehicle in the adjacent right lane of the lane where the autonomous vehicle is located is a default value.
  • the global traffic flow statistics feature of the autonomous vehicle at the current moment includes at least one of the following: the average driving speed and the average interval of all obstacles in each lane within the sensing range.
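As an illustrative aid only, the following Python sketch shows one way the local neighbor features and global traffic-flow statistics described above could be assembled from perception output; the data class, field names, lane indexing, default values and lane grouping are assumptions introduced here for illustration, not part of the claimed method.

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Obstacle:
        lane: int          # lane index: ego lane = 0, lanes to the left < 0, to the right > 0
        s: float           # longitudinal position along the road (m)
        v: float           # longitudinal speed (m/s)

    def local_neighbor_features(ego_s, ego_v, obstacles, default=(999.0, 0.0)):
        """Relative distance/speed of the front and rear neighbors in the ego lane and the
        adjacent left/right lanes; missing neighbors (e.g. in an edge lane) get defaults."""
        feats = []
        for lane in (0, -1, 1):                      # ego lane, adjacent left, adjacent right
            same = [o for o in obstacles if o.lane == lane]
            front = min((o for o in same if o.s >= ego_s), key=lambda o: o.s, default=None)
            rear = max((o for o in same if o.s < ego_s), key=lambda o: o.s, default=None)
            for nb in (front, rear):
                if nb is None:
                    feats.extend(default)            # default motion-state value
                else:
                    feats.extend((abs(nb.s - ego_s), nb.v - ego_v))
        return feats

    def global_statistics(obstacles):
        """Average longitudinal gap and average speed of all obstacles, grouped into
        lanes left of the ego lane, the ego lane, and lanes right of the ego lane."""
        stats = []
        for predicate in (lambda l: l < 0, lambda l: l == 0, lambda l: l > 0):
            group = sorted((o for o in obstacles if predicate(o.lane)), key=lambda o: o.s)
            gaps = [b.s - a.s for a, b in zip(group, group[1:])]
            stats.append(mean(gaps) if gaps else 999.0)                   # average gap
            stats.append(mean(o.v for o in group) if group else 0.0)      # average speed
        return stats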
  • the automatic lane changing device 140 provided in the embodiment of the present application may be used to implement the technical solutions in the above-mentioned automatic lane changing method embodiments of the present application, and its implementation principles and technical effects are similar, and will not be repeated here.
  • the automatic lane changing device 150 provided in this embodiment may include a processor 1501 and a memory 1502.
  • the memory 1502 is used to store program instructions
  • the processor 1501 is configured to call and execute the program instructions stored in the memory 1502.
  • when the processor 1501 executes the program instructions stored in the memory 1502, the automatic lane changing device is configured to execute the technical solutions in the above-mentioned automatic lane changing method embodiments of the present application;
  • the implementation principles and technical effects are similar and will not be repeated here.
  • Fig. 15 only shows the simplified design of the automatic lane changing device.
  • the automatic lane changing device may also include any number of processors, memories, and/or communication units, which are not limited in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores instructions, and when the instructions run on a computer, the computer executes the technical solutions in the above-mentioned automatic lane changing method embodiments of the present application;
  • the implementation principles and technical effects are similar and will not be repeated here.
  • the embodiment of the present application also provides a program, which is used to execute the technical solution in the above embodiment of the automatic lane changing method of the present application when the program is executed by the processor.
  • the implementation principle and technical effect are similar, and will not be repeated here.
  • the embodiment of the present application also provides a computer program product containing instructions, which when running on a computer, causes the computer to execute the technical solutions in the above embodiments of the automatic lane changing method of the present application.
  • the implementation principles and technical effects are similar and will not be repeated here.
  • FIG. 16 is a conceptual partial view of a computer program product provided by an embodiment of the present application; it schematically illustrates an example computer program product arranged according to at least some of the embodiments shown herein, and the example computer program product includes a computer program for executing a computer process on a computing device.
  • the example computer program product 600 is provided using a signal bearing medium 601.
  • the signal-bearing medium 601 may include one or more program instructions 602 which, when run by one or more processors, can implement the technical solutions in the above-mentioned automatic lane changing method embodiments of this application; the implementation principles and technical effects are similar and will not be repeated here.
  • the signal-bearing medium 601 may include a computer-readable medium 603, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, memory, read-only memory (ROM), or random access memory (RAM), etc.
  • the signal bearing medium 601 may include a computer recordable medium 604, such as but not limited to memory, read/write (R/W) CD, R/W DVD, and so on.
  • the signal-bearing medium 601 may include a communication medium 605, such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • the signal-bearing medium 601 may be conveyed by the communication medium 605 in a wireless form (for example, a wireless communication medium that complies with the IEEE 802.11 standard or another transmission protocol).
  • the one or more program instructions 602 may be, for example, computer-executable instructions or logic-implemented instructions.
  • the computing device may be configured to provide various operations, functions, or actions in response to the program instructions 602 conveyed to the computing device through one or more of the computer-readable medium 603, the computer-recordable medium 604, and/or the communication medium 605; it should be understood that the arrangement described here is for illustrative purposes only.
  • FIG. 17 is a schematic structural diagram of a training device for a control strategy provided by an embodiment of the application.
  • the training device 170 for the control strategy provided in this embodiment may include: a first acquisition module 1701 and an update module 1702.
  • the first obtaining module 1701 is configured to perform step A: obtaining the quadruple information of a preset number of historical moments, wherein the quadruple information of a historical moment corresponds to the vehicle condition at that historical moment and includes: the characteristics of the historical moment, the target action of the autonomous vehicle at the historical moment, the reward value corresponding to the target action at the historical moment, and the characteristics of the next moment of the historical moment; the characteristics of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the historical moment, and the characteristics of the next moment of the historical moment include the local neighbor features and global statistical features of the autonomous vehicle at the next moment of the historical moment;
  • the update module 1702 is configured to perform step B: updating the current control strategy according to the quadruple information of at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of at least one second historical moment, to obtain the control strategy at the next moment;
  • steps A and B are executed cyclically until the number of executions reaches a preset number, or steps A and B are executed cyclically multiple times until the updated control strategy meets a preset condition; the control strategy finally obtained after steps A and B have been executed repeatedly is used by the automatic lane changing device to obtain the target action instruction when executing the automatic lane changing method;
  • the quadruple information of the at least one first historical moment is the quadruple information, among the quadruple information of the preset number of historical moments, whose target action at the historical moment is a lane change;
  • the quadruple information of the at least one second historical moment is the quadruple information of the other historical moments, among the quadruple information of the preset number of historical moments, other than the quadruple information of the at least one first historical moment;
  • the extended quad information of any of the first historical moments corresponds to the extended vehicle condition at the first historical moment
  • the extended vehicle condition at the first historical moment is obtained by subjecting the vehicle condition at the first historical moment to symmetric and monotonic rules.
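As a minimal sketch of how the first and second historical moments could be separated when assembling training data, the Python fragment below treats lane-change quadruples as the only ones that receive extended (symmetric/monotonic) variants; the action encoding and the augment callback are assumptions for illustration, not the patented implementation.

    def build_training_batch(stored_quadruples, augment):
        """Separate lane-change quadruples (first historical moments) from keep-straight
        quadruples (second historical moments); only the former get extended variants."""
        LANE_CHANGE_ACTIONS = (1, 2)         # assumed encoding: 1 = change left, 2 = change right
        batch = []
        for quad in stored_quadruples:       # quad = (s, a, r, s_next)
            batch.append(quad)
            if quad[1] in LANE_CHANGE_ACTIONS:
                batch.extend(augment(quad))  # symmetric and monotonic extended quadruples
        return batch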
  • the update module 1702 includes:
  • a generating unit, configured to generate the target value corresponding to the l-th quadruple information according to the l-th quadruple information among the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment; wherein l is a positive integer not greater than q, and q is the total number of pieces of quadruple information included in the quadruple information of the at least one first historical moment, the extended quadruple information of the at least one first historical moment, and the quadruple information of the at least one second historical moment;
  • an update unit, configured to iteratively update the parameter θ in the preset function containing the target value corresponding to the l-th quadruple information by using the gradient descent method;
  • a replacement unit, configured to substitute the iteratively updated parameter θ for the parameter θ in the current control strategy to obtain the control strategy at the next moment.
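For illustration, the generating, update and replacement units described above follow a DQN-style temporal-difference update. The Python/PyTorch sketch below is a minimal, non-authoritative reading of that procedure; the network object, optimizer, discount value and the squared-error form of the preset function are assumptions, not the patented implementation.

    import torch

    GAMMA = 0.9   # preset forgetting (discount) factor, assumed value in (0, 1)

    def update_policy(q_net, optimizer, quadruples):
        """One update over the combined quadruple set (original plus extended).
        Each quadruple is (s, a, r, s_next, done), with s and s_next as tensors."""
        losses = []
        for s, a, r, s_next, done in quadruples:
            with torch.no_grad():
                # target value: r if the episode ended, else r + gamma * max_a Q(s', a, theta)
                y = r if done else r + GAMMA * q_net(s_next).max().item()
            q_sa = q_net(s)[a]                          # Q(s, a, theta) for the taken action
            losses.append((torch.tensor(float(y)) - q_sa) ** 2)
        loss = torch.stack(losses).mean()               # preset function containing the target values
        optimizer.zero_grad()
        loss.backward()                                 # gradient-descent step on theta
        optimizer.step()                                # updated theta replaces the old parameters
        return loss.item()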
  • the device further includes:
  • the first calculation module is configured to, for each historical moment, calculate the local neighbor features and global statistical features of the autonomous vehicle at that historical moment according to the driving information of the autonomous vehicle and the motion information of obstacles in each lane within the sensing range of the autonomous vehicle;
  • the second acquisition module is configured to acquire a target action instruction at the historical moment according to the local neighbor features, the global statistical features and the control strategy of the historical moment, where the target action instruction is used to instruct the autonomous vehicle to perform a target action, and the target action includes at least two types: changing lanes or keeping straight;
  • the feedback module is configured to obtain feedback information by executing the target action; wherein the feedback information includes the driving information of the autonomous vehicle after executing the target action, the driving information of the autonomous vehicle at the next moment and the motion information of obstacles in each lane within the sensing range of the autonomous vehicle at the next moment; when the target action is a lane change, the feedback information further includes: the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane where it was located before the lane change; wherein the historical average time includes the average time for the autonomous vehicle to perform similar actions in a preset historical time period;
  • the second calculation module is configured to calculate the reward value corresponding to the target action according to the feedback information, and the local neighbor characteristics and global traffic statistics characteristics of the autonomous vehicle at the next moment in the historical moment;
  • the storage module is used to store the quadruple information of the historical moment.
  • the second calculation module is specifically configured to:
  • the reward value is calculated according to the driving information of the autonomous vehicle after the target action is executed.
  • the second calculation module is specifically configured to:
  • the reward value is calculated according to the driving information of the autonomous vehicle after the target action is executed, the ratio of the time for executing the target action to the historical average time, and the change in the degree of sparseness and density between the lane where the autonomous vehicle is located after the lane change and the lane where it was located before the lane change.
  • the control strategy training device 170 provided by the embodiment of the present application can be used to implement the technical solutions in the above-mentioned control strategy training method embodiments of the present application.
  • the implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 18 is a schematic structural diagram of a training device for a control strategy provided by another embodiment of the application.
  • the training device 180 for the control strategy provided in this embodiment may include: a processor 1801 and a memory 1802;
  • the memory 1802 is used to store program instructions
  • the processor 1801 is configured to call and execute the program instructions stored in the memory 1802.
  • when the processor 1801 executes the program instructions stored in the memory 1802, the control strategy training device is configured to execute the technical solutions in the above-mentioned control strategy training method embodiments of the present application.
  • FIG. 18 only shows a simplified design of the training device of the control strategy.
  • the training device for the control strategy may also include any number of processors, memories, and/or communication units, which are not limited in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium that stores instructions;
  • when the instructions are run on a computer, the computer executes the technical solutions in the above-mentioned control strategy training method embodiments of the present application;
  • the implementation principles and technical effects are similar and will not be repeated here.
  • the embodiment of the present application also provides a program, when the program is executed by a processor, it is used to execute the technical solution in the above-mentioned training method embodiment of the control strategy of the present application.
  • the implementation principles and technical effects are similar and will not be repeated here.
  • the embodiment of the present application also provides a computer program product containing instructions, which when running on a computer, causes the computer to execute the technical solution in the embodiment of the training method for the above-mentioned control strategy of the present application.
  • the implementation principles and technical effects are similar and will not be repeated here.
  • FIG. 16 Exemplarily, the conceptual partial view of the computer program product provided by the embodiment of the present application can be referred to as shown in FIG. 16, which will not be repeated here.
  • An embodiment of the present application also provides a chip, which includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the technical solutions in the above-mentioned control strategy training method embodiments or the above-mentioned automatic lane changing method embodiments; the implementation principles and technical effects are similar and will not be repeated here.
  • Optionally, the chip may also include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the technical solutions in the aforementioned control strategy training method embodiments or the aforementioned automatic lane changing method embodiments; the implementation principles and technical effects are similar and will not be repeated here.
  • An embodiment of the present application also provides an electronic device, which includes the automatic lane changing device provided in the foregoing automatic lane changing device embodiment.
  • An embodiment of the present application also provides an electronic device, which includes the control strategy training device provided in the foregoing control strategy training device embodiment.
  • the processor involved in the embodiments of the present application may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and may implement or Perform the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the memory involved in the embodiments of this application may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • the memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the size of the sequence number of each process does not imply its order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • all or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof;
  • when implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

An automatic lane changing method, device and storage medium. The method includes: calculating local neighbor features and global statistical features of an autonomous vehicle at the current moment according to the driving information of the autonomous vehicle at the current moment and the motion information of obstacles in each lane within the sensing range of the autonomous vehicle (S1101); obtaining a target action instruction according to the local neighbor features, the global statistical features and the current control strategy (S1102); and executing the target action according to the target action instruction (S1103). By feeding the global statistical features, in addition to the local neighbor features, into the current control strategy to obtain the target action instruction, not only the information of local neighbor obstacles but also the macroscopic situation reflected by the global statistical features is taken into account; therefore, the target action obtained by combining local and overall road obstacle information is a globally optimal policy action.

Description

自动换道方法、装置及存储介质
本申请要求于2019年5月21日提交中国专利局、申请号为2019104262487、申请名称为“自动换道方法、装置及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及自动驾驶技术领域,尤其涉及一种自动换道方法、装置及存储介质。
背景技术
人工智能(artificial intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。
自动驾驶是人工智能领域的一种主流应用,自动驾驶技术依靠计算机视觉、雷达、监控装置和全球定位系统等协同合作,让机动车辆可以在不需要人类主动操作下,实现自动驾驶。自动驾驶的车辆使用各种计算系统来帮助将乘客从一个位置运输到另一位置。一些自动驾驶车辆可能要求来自操作者(诸如,领航员、驾驶员、或者乘客)的一些初始输入或者连续输入。自动驾驶车辆准许操作者从手动模操作式切换到自东驾驶模式或者介于两者之间的模式。由于自动驾驶技术无需人类来驾驶机动车辆,所以理论上能够有效避免人类的驾驶失误,减少交通事故的发生,且能够提高公路的运输效率。因此,自动驾驶技术越来越受到重视。在自动驾驶技术领域,自动驾驶车辆智能换道决策的设计还面临巨大的挑战。
相关技术中,通过深度网络来模拟当前状态和动作对应的动作价值函数Q;其中,动作价值函数Q的输入为以自动驾驶车辆为中心的一个局部相对语义网格,通过考虑距离自动驾驶车辆最近的邻居车的速度与距离信息,以及一些道路语义信息(例如自动驾驶车辆所在车道是加速车道或者是左转车道等),从而选择使动作价值函数Q值最高的动作作为当前决策动作。
但相关技术中,仅考虑了自车的局部邻居车信息,没有考虑整体车流的宏观情况,因此,生成的策略动作并非全局最优的策略动作。
发明内容
本申请实施例提供一种自动换道方法、装置及存储介质,解决了相关技术中生成的策 略动作并非全局最优策略动作的问题。
第一方面,本申请实施例提供一种自动换道方法,包括:
根据自动驾驶车辆当前时刻的行驶信息以及该自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算该自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征;该局部邻居特征用于表示该自动驾驶车辆的特定的邻居障碍物相对于该自动驾驶车辆的运动状态信息;该全局统计特征用于表示该感知范围内各个车道的障碍物的稀疏与稠密程度;
根据该局部邻居特征、该全局统计特征和当前控制策略获取目标动作指示,该目标动作指示用于指示该自动驾驶车辆执行目标动作,该目标动作至少包括两类:换道或保持直行;
根据该目标动作指示执行该目标动作。
第一方面提供的自动换道方法中,通过根据自动驾驶车辆当前时刻的行驶信息以及自动驾驶车辆感知范围内各个车道障碍物的运动信息,计算自动驾驶车辆在当前时刻的局部邻居特征以及全局统计特征;进一步地,根据局部邻居特征、全局统计特征和当前控制策略获取目标动作指示,并根据目标动作指示执行目标动作。可见,通过在局部邻居特征的基础上,进一步引入全局统计特征输入当前控制策略获取目标动作指示,不仅考虑了局部的邻居障碍物(如他车)的信息,还考虑了全局统计特征(如整体车流)的宏观情况,因此,综合了局部和全部路面障碍物信息得到的目标动作是全局最优的策略动作。
在一种可能的实现方式中,该方法还包括:
通过执行该目标动作得到反馈信息,该反馈信息用于更新该当前控制策略;其中,该反馈信息包括该自动驾驶车辆执行该目标动作后的行驶信息,该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当该目标动作为换道时,该反馈信息还包括:执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,该历史平均时间包括该自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
根据该反馈信息更新该当前控制策略得到下一时刻的控制策略。
本实现方式中,通过执行目标动作得到反馈信息,并根据反馈信息更新当前控制策略得到下一时刻的控制策略,从而使得下一时刻可以根据该下一时刻的控制策略准确地确定下一时刻的目标动作。值得说明的是,在之后的时刻也可以持续的根据t时刻的反馈信息对t时刻的控制策略进行更新,得到t+1时刻的控制策略,使得生成目标动作的控制策略一直在自适应的持续更新优化中,从而保证每一个时刻都有其对应的最佳控制策略,为每一个时刻的目标动作的准确生成提供了保障。
在一种可能的实现方式中,根据该反馈信息更新该当前控制策略得到下一时刻的控制策略包括:
根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
确定当前时刻的四元组信息;其中,该当前时刻的四元组信息对应当前时刻车况, 包括:该当前时刻的特征、该目标动作、该目标动作对应的回报值以及该下一时刻的特征,该当前时刻的特征包括该自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,该下一时刻的特征包括该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
根据该当前时刻的四元组信息,生成该四元组信息对应的目标值;
利用梯度下降法对包含该目标值的第一预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为换道时,该根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
获取该当前时刻的延伸四元组信息,该当前时刻的延伸四元组信息对应当前时刻延伸车况,其中该当前时刻延伸车况是对该当前时刻车况进行对称规则和单调规则处理得到的,该对称规则是指以该自动驾驶车辆所在车道为轴,将该自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换;该单调规则是指将该换道的目标车道上的该自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的该自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围;
根据该当前时刻的四元组信息和该当前时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略。
在一种可能的实现方式中,该根据该当前时刻的四元组信息和该当前时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
根据该当前时刻的四元组信息和该当前时刻的延伸四元组信息中的第i个四元组信息,生成该第i个四元组信息对应的目标值;其中,该i为取遍不大于n的正整数,n为该当前时刻的四元组信息和该当前时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第i个四元组信息对应的目标值的第二预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
根据该当前时刻的四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略;
其中,该历史时刻的四元组信息对应历史时刻车况,包括:该历史时刻的特征、该历史时刻的目标动作、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征,该历史时刻的特征包括该自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,该历史时刻的下一时刻的特征包括该自动驾驶车辆在历史时刻的下一时刻的局部邻居 特征和全局统计特征;该历史时刻的延伸四元组信息对应历史时刻延伸车况,该历史时刻延伸车况是对该历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,该根据该当前时刻的四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
根据该当前时刻的四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中的第j个四元组信息,生成该第j个四元组信息对应的目标值;其中,该j为取遍不大于m的正整数,m为该当前时刻的四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第j个四元组信息对应的目标值的第三预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为换道时,该根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
获取该当前时刻的延伸四元组信息;其中,该当前时刻的延伸四元组信息对应当前时刻延伸车况,该当前时刻延伸车况是对该当前时刻车况进行对称规则和单调规则处理得到的;
根据该当前时刻的四元组信息、该当前时刻的延伸四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略;其中,该历史时刻的四元组信息对应历史时刻车况,该历史时刻的延伸四元组信息对应历史时刻延伸车况,该历史时刻延伸车况是对该历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,该根据该当前时刻的四元组信息、该当前时刻的延伸四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略,包括:
根据该当前时刻的四元组信息、该当前时刻的延伸四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中的第k个四元组信息,生成该第k个四元组信息对应的目标值;其中,该k为取遍不大于p的正整数,p为该当前时刻的四元组信息、该当前时刻的延伸四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第k个四元组信息对应的目标值的第四预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征,包括:
根据该自动驾驶车辆执行该目标动作后的行驶信息计算该回报值;
根据该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,当该目标动作为换道时,该根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征,包括:
根据该自动驾驶车辆执行该目标动作后的行驶信息、该执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算该回报值;
根据该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,该自动驾驶车辆的特定的邻居障碍物包括以下至少一项:该自动驾驶车辆所在车道上与该自动驾驶车辆相邻的前后障碍物、该自动驾驶车辆所在车道的相邻左车道上与该自动驾驶车辆相邻的前后障碍物、该自动驾驶车辆所在车道的相邻右车道上与该自动驾驶车辆相邻的前后障碍物;
其中,当该自动驾驶车辆位于左边道时,该自动驾驶车辆所在车道的相邻左车道上与该自动驾驶车辆相邻的前后障碍物,相对于该自动驾驶车辆的运动状态信息为默认值;和/或,
当该自动驾驶车辆位于右边道时,该自动驾驶车辆所在车道的相邻右车道上与该自动驾驶车辆相邻的前后障碍物,相对于该自动驾驶车辆的运动状态信息为默认值。
在一种可能的实现方式中,该自动驾驶车辆当前时刻的全局车流统计特征包括以下至少一项:该感知范围内各个车道所有障碍物的平均行驶速度以及平均间隔。
第二方面,本申请实施例提供一种自动换道装置,包括:
计算模块,用于根据自动驾驶车辆当前时刻的行驶信息以及该自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算该自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征;该局部邻居特征用于表示该自动驾驶车辆的特定的邻居障碍物相对于该自动驾驶车辆的运动状态信息;该全局统计特征用于表示该感知范围内各个车道的障碍物的稀疏与稠密程度;
获取模块,用于根据该局部邻居特征、该全局统计特征和当前控制策略获取目标动作指示,该目标动作指示用于指示该自动驾驶车辆执行目标动作,该目标动作至少包括两类:换道或保持直行;
执行模块,用于根据该目标动作指示执行该目标动作。
在一种可能的实现方式中,该装置还包括:
反馈模块,用于通过执行该目标动作得到反馈信息,该反馈信息用于更新该当前控制策略;其中,该反馈信息包括该自动驾驶车辆执行该目标动作后的行驶信息,该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当该目标动作为换道时,该反馈信息还包括:执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道 在稀疏与稠密程度上的变化情况;其中,该历史平均时间包括该自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
更新模块,用于根据该反馈信息更新该当前控制策略得到下一时刻的控制策略。
在一种可能的实现方式中,该更新模块包括:
计算单元,用于根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
确定单元,用于确定当前时刻的四元组信息;其中,该当前时刻的四元组信息对应当前时刻车况,包括:该当前时刻的特征、该目标动作、该目标动作对应的回报值以及该下一时刻的特征,该当前时刻的特征包括该自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,该下一时刻的特征包括该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
更新单元,用于根据该当前时刻的四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该更新单元具体用于:
根据该当前时刻的四元组信息,生成该四元组信息对应的目标值;
利用梯度下降法对包含该目标值的第一预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为换道时,该更新单元具体用于:
获取该当前时刻的延伸四元组信息,该当前时刻的延伸四元组信息对应当前时刻延伸车况,其中该当前时刻延伸车况是对该当前时刻车况进行对称规则和单调规则处理得到的,该对称规则是指以该自动驾驶车辆所在车道为轴,将该自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换;该单调规则是指将该换道的目标车道上的该自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的该自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围;
根据该当前时刻的四元组信息和该当前时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略。
在一种可能的实现方式中,该更新单元具体用于:
根据该当前时刻的四元组信息和该当前时刻的延伸四元组信息中的第i个四元组信息,生成该第i个四元组信息对应的目标值;其中,该i为取遍不大于n的正整数,n为该当前时刻的四元组信息和该当前时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第i个四元组信息对应的目标值的第二预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该更新单元具体用于:
根据该当前时刻的四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息对该当前控制策略进行更新得到该下一时刻的控制策略;
其中,该历史时刻的四元组信息对应历史时刻车况,包括:该历史时刻的特征、该 历史时刻的目标动作、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征,该历史时刻的特征包括该自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,该历史时刻的下一时刻的特征包括该自动驾驶车辆在历史时刻的下一时刻的局部邻居特征和全局统计特征;该历史时刻的延伸四元组信息对应历史时刻延伸车况,该历史时刻延伸车况是对该历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,该更新单元具体用于:
根据该当前时刻的四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中的第j个四元组信息,生成该第j个四元组信息对应的目标值;其中,该j为取遍不大于m的正整数,m为该当前时刻的四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第j个四元组信息对应的目标值的第三预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为换道时,该更新单元具体用于:
获取该当前时刻的延伸四元组信息;其中,该当前时刻的延伸四元组信息对应当前时刻延伸车况,该当前时刻延伸车况是对该当前时刻车况进行对称规则和单调规则处理得到的;
根据该当前时刻的四元组信息、该当前时刻的延伸四元组信息、历史时刻的四元组信息和该历史时刻的延伸四元组信息,对该当前控制策略进行更新得到该下一时刻的控制策略;其中,该历史时刻的四元组信息对应历史时刻车况,该历史时刻的延伸四元组信息对应历史时刻延伸车况,该历史时刻延伸车况是对该历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,该根据更新单元具体用于:
根据该当前时刻的四元组信息、该当前时刻的延伸四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中的第k个四元组信息,生成该第k个四元组信息对应的目标值;其中,该k为取遍不大于p的正整数,p为该当前时刻的四元组信息、该当前时刻的延伸四元组信息、该历史时刻的四元组信息和该历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第k个四元组信息对应的目标值的第四预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,当该目标动作为保持直行时,该计算单元具体用于:
根据该自动驾驶车辆执行该目标动作后的行驶信息计算该回报值;
根据该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,当该目标动作为换道时,该计算单元具体用于:
根据该自动驾驶车辆执行该目标动作后的行驶信息、该执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算该回报值;
根据该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算该自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,该自动驾驶车辆的特定的邻居障碍物包括以下至少一项:该自动驾驶车辆所在车道上与该自动驾驶车辆相邻的前后障碍物、该自动驾驶车辆所在车道的相邻左车道上与该自动驾驶车辆相邻的前后障碍物、该自动驾驶车辆所在车道的相邻右车道上与该自动驾驶车辆相邻的前后障碍物;
其中,当该自动驾驶车辆位于左边道时,该自动驾驶车辆所在车道的相邻左车道上与该自动驾驶车辆相邻的前后障碍物,相对于该自动驾驶车辆的运动状态信息为默认值;和/或,
当该自动驾驶车辆位于右边道时,该自动驾驶车辆所在车道的相邻右车道上与该自动驾驶车辆相邻的前后障碍物,相对于该自动驾驶车辆的运动状态信息为默认值。
在一种可能的实现方式中,该自动驾驶车辆当前时刻的全局车流统计特征包括以下至少一项:该感知范围内各个车道所有障碍物的平均行驶速度以及平均间隔。
第三方面,本申请实施例提供一种自动换道装置,包括:处理器和存储器;
其中,该存储器,用于存储程序指令;
该处理器,用于调用并执行该存储器中存储的程序指令,当该处理器执行该存储器存储的程序指令时,该自动换道装置用于执行上述第一方面的任意实现方式所述的方法。
第四方面,本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该指令在计算机上运行时,使得计算机执行上述第一方面的任意实现方式所述的方法。
第五方面,本申请实施例提供一种程序,该程序在被处理器执行时用于执行上述第一方面的任意实现方式所述的方法。
第六方面,本申请实施例提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面的任意实现方式所述的方法。
第七方面,本申请实施例提供一种控制策略的训练方法,包括:
步骤A:获取预设数量个历史时刻的四元组信息,其中,该历史时刻的四元组信息对应历史时刻车况,包括:该历史时刻的特征、该历史时刻的自动驾驶车辆的目标动作、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征,该历史时刻的特征包括该自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,该历史时刻的下一时刻的特征包括该自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征和全局统计特征;
步骤B:根据至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略;
其中,该步骤A和步骤B的循环执行次数达到预设次数,或者该步骤A和步骤B循环执 行多次直至更新后的控制策略满足预设条件时停止;该步骤A和步骤B循环执行多次最终得到的控制策略用于自动换道装置在执行自动换道方法时获取目标动作指示;
其中,该至少一个第一历史时刻的四元组信息为该预设数量个历史时刻的四元组信息中历史时刻的目标动作为换道所对应的历史时刻的四元组信息;该至少一个第二历史时刻的四元组信息为该预设数量个历史时刻的四元组信息中除该至少一个第一历史时刻的四元组信息之外的其它历史时刻的四元组信息;任意该第一历史时刻的延伸四元组信息对应第一历史时刻延伸车况,该第一历史时刻延伸车况是对第一历史时刻车况进行对称规则和单调规则处理得到的。
第七方面提供的控制策略的训练方法中,通过获取预设数量个历史时刻的四元组信息;进一步地,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略。可见,通过在预设数量个历史时刻的四元组信息的基础上,进一步根据预设数量个历史时刻的四元组信息中的第一历史时刻的延伸四元组信息对当前控制策略进行更新,从而可以获得更加准确的控制策略,以便于可以准确地确定出对应的目标动作。
在一种可能的实现方式中,该根据至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略,包括:
根据该至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及该至少一个第二历史时刻的四元组信息中的第l个四元组信息,生成该第l个四元组信息对应的目标值;其中,该l为取遍不大于q的正整数,q为该至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及该至少一个第二历史时刻的四元组信息中包括的四元组信息总数;
利用梯度下降法对包含该第l个四元组信息对应的目标值的预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,该获取预设数量个历史时刻的四元组信息之前,还包括:
对于每个历史时刻,根据自动驾驶车辆的行驶信息以及该自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算该自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征;
根据该历史时刻的局部邻居特征、全局统计特征和该历史时刻的控制策略获取该历史时刻的目标动作指示,该目标动作指示用于指示该自动驾驶车辆执行目标动作,该目标动作至少包括两类:换道或保持直行;
通过执行该目标动作得到反馈信息;其中,该反馈信息包括该自动驾驶车辆执行该目标动作后的行驶信息,该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当该目标动作为换道时,该反馈信息还包括:执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,该历史平均时间 包括该自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征以及全局车流统计特征;
存储该历史时刻的四元组信息。
在一种可能的实现方式中,当该目标动作为保持直行时,该根据该反馈信息计算该目标动作对应的回报值,包括:
根据该自动驾驶车辆执行该目标动作后的行驶信息计算该回报值。
在一种可能的实现方式中,当该目标动作为换道时,该根据该反馈信息计算该目标动作对应的回报值,包括:
根据该自动驾驶车辆执行该目标动作后的行驶信息、该执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算该回报值。
第八方面,本申请实施例提供一种控制策略的训练装置,包括:
第一获取模块,用于执行步骤A:获取预设数量个历史时刻的四元组信息,其中,该历史时刻的四元组信息对应历史时刻车况,包括:该历史时刻的特征、该历史时刻的自动驾驶车辆的目标动作、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征,该历史时刻的特征包括该该自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,该历史时刻的下一时刻的特征包括该自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征和全局统计特征;
更新模块,用于执行步骤B:根据至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略;
其中,该步骤A和步骤B的循环执行次数达到预设次数,或者该步骤A和步骤B循环执行多次直至更新后的控制策略满足预设条件时停止;该步骤A和步骤B循环执行多次最终得到的控制策略用于自动换道装置在执行自动换道方法时获取目标动作指示;
其中,该至少一个第一历史时刻的四元组信息为该预设数量个历史时刻的四元组信息中历史时刻的目标动作为换道所对应的历史时刻的四元组信息;该至少一个第二历史时刻的四元组信息为该预设数量个历史时刻的四元组信息中除该至少一个第一历史时刻的四元组信息之外的其它历史时刻的四元组信息;任意该第一历史时刻的延伸四元组信息对应第一历史时刻延伸车况,该第一历史时刻延伸车况是对第一历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,该更新模块,包括:
生成单元,用于根据该至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及该至少一个第二历史时刻的四元组信息中的第l个四元组信息,生成该第l个四元组信息对应的目标值;其中,该l为取遍不大于q的正整数,q为该至少一个第一历史时刻的四元组信息、该至少一个第一历史时刻的延伸四元组信息,以及该至少一个第二历史时刻的四元组信息中包括的四元组信息总数;
更新单元,用于利用梯度下降法对包含该第l个四元组信息对应的目标值的预设函数中的参数θ进行迭代更新;
替换单元,用于将迭代更新后的参数θ替换该当前控制策略中的参数θ,得到该下一时刻的控制策略。
在一种可能的实现方式中,该装置还包括:
第一计算模块,用于对于每个历史时刻,根据自动驾驶车辆的行驶信息以及该自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算该自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征;
第二获取模块,用于根据该历史时刻的局部邻居特征、全局统计特征和该历史时刻的控制策略获取该历史时刻的目标动作指示,该目标动作指示用于指示该自动驾驶车辆执行目标动作,该目标动作至少包括两类:换道或保持直行;
反馈模块,用于通过执行该目标动作得到反馈信息;其中,该反馈信息包括该自动驾驶车辆执行该目标动作后的行驶信息,该自动驾驶车辆在下一时刻的行驶信息和该自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当该目标动作为换道时,该反馈信息还包括:执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,该历史平均时间包括该自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
第二计算模块,用于根据该反馈信息计算该目标动作对应的回报值,以及该自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征以及全局车流统计特征;
存储模块,用于存储该历史时刻的四元组信息。
在一种可能的实现方式中,当该目标动作为保持直行时,该第二计算模块具体用于:
根据该自动驾驶车辆执行该目标动作后的行驶信息计算该回报值。
在一种可能的实现方式中,当该目标动作为换道时,该第二计算模块具体用于:
根据该自动驾驶车辆执行该目标动作后的行驶信息、该执行该目标动作的时间与历史平均时间的比值,以及该自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算该回报值。
第九方面,本申请实施例提供一种控制策略的训练装置,包括:处理器和存储器;
其中,该存储器,用于存储程序指令;
该处理器,用于调用并执行该存储器中存储的程序指令,当该处理器执行该存储器存储的程序指令时,该控制策略的训练装置用于执行上述第七方面的任意实现方式所述的方法。
第十方面,本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该指令在计算机上运行时,使得计算机执行上述第七方面的任意实现方式所述的方法。
第十一方面,本申请实施例提供一种程序,该程序在被处理器执行时用于执行上述第七方面的任意实现方式所述的方法。
第十二方面,本申请实施例提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第七方面的任意实现方式所述的方法。
第十三方面,本申请实施例提供一种芯片,该芯片包括处理器与数据接口,该处理器通过该数据接口读取存储器上存储的指令,执行上述第一方面或第七方面的任意实现 方式所述的方法。
可选地,作为一种实现方式,该芯片还可以包括存储器,该存储器中存储有指令,该处理器用于执行该存储器上存储的指令,当该指令被执行时,该处理器用于执行上述第一方面或第七方面的任意实现方式所述的方法。
第十四方面,本申请实施例提供一种电子设备,该电子设备包括上述第二方面或第三方面的任意实现方式所述的自动换道装置。
第十五方面,本申请实施例提供一种电子设备,该电子设备包括上述第二方面或第三方面的任意实现方式所述的控制策略的训练装置。
附图说明
图1为本申请实施例提供的系统架构示意图;
图2是本申请实施例提供的车辆100的功能框图;
图3为图2中的计算机系统的结构示意图;
图4为本申请实施例提供的一种芯片硬件结构的示意图;
图5为本申请实施例提供的操作环境示意图;
图6为本申请实施例提供的对称原则示意图一;
图7为本申请实施例提供的对称原则示意图二;
图8为本申请实施例提供的单调原则示意图;
图9为本申请一实施例提供的控制策略的训练方法的流程示意图;
图10为本申请另一实施例提供的控制策略的训练方法的流程示意图;
图11为本申请一实施例提供的自动换道方法的流程示意图;
图12为本申请另一实施例提供的自动换道方法的流程示意图;
图13为本申请实施例提供的训练数据示意图;
图14为本申请一实施例提供的自动换道装置的结构示意图;
图15为本申请另一实施例提供的自动换道装置的结构示意图;
图16为本申请实施例提供的计算机程序产品的概念性局部视图;
图17为本申请一实施例提供的控制策略的训练装置的结构示意图;
图18为本申请另一实施例提供的控制策略的训练装置的结构示意图。
具体实施方式
首先,对本申请实施例所涉及的应用场景和部分术语进行解释说明。
本申请实施例提供的自动换道方法、装置及存储介质能够应用在自动驾驶车辆的换道场景。示例性地,本申请实施例提供的自动换道方法、装置及存储介质能够应用在A场景和B场景中,下面分别对A场景和B场景进行简单的介绍。
A场景:
自动驾驶车辆在多车道的行驶过程中,为了提高行驶速度,需要在适当的时机发送“直行”或者“换道”的命令。例如自动驾驶车辆所在车道的前方有车辆低速行驶,自动驾驶车辆需要绕过前方的所述车辆行驶。
B场景:
自动驾驶车辆在多车道的行驶过程中,如果自动驾驶车辆所在车道的前方有匝道或路口等道路结构,自动驾驶车辆需要在达到匝道或路口之前换到相应的目标车道,以便完成行驶任务。例如自动驾驶车辆在道路的最左边的车道行驶,前方500米是个十字路口,则自动驾驶车辆为了到达目的地需要在十字路口处右拐,需要在到达所述十字路口之前换到最右边的车道。
当然,本申请实施例提供的自动换道方法、装置及存储介质还可应用在其它场景,本申请实施例中对此并不作限制。
图1为本申请实施例提供的系统架构示意图。如图1所示,本申请实施例提供的系统架构1000可以包括:训练设备1001和执行设备1002。其中,训练设备1001用于根据本申请实施例提供的控制策略的训练方法训练控制策略;执行设备1002用于根据本申请实施例提供的自动换道方法使用训练设备1001所训练的控制策略确定目标动作;当然,执行设备1002也可以用于实时训练控制策略,或者每隔预设时长训练控制策略。
本申请实施例中,执行控制策略的训练方法的执行主体可以是上述训练设备1001,也可以是上述训练设备1001中的控制策略的训练装置。示例性地,本申请实施例提供的控制策略的训练装置可以通过软件和/或硬件实现。
本申请实施例中,执行自动换道方法的执行主体可以是上述执行设备1002,也可以是上述执行设备1002中的自动换道装置。示例性地,本申请实施例提供的自动换道装置可以通过软件和/或硬件实现。
示例性地,本申请实施例中提供的训练设备1001可以包括但不限于:模型训练平台设备。
示例性地,本申请实施例中提供的执行设备1002可以包括但不限于:自动驾驶车辆,或者自动驾驶车辆中的控制设备。
图2是本申请实施例提供的车辆100的功能框图。在一个实施例中,将车辆100配置为完全或部分地自动驾驶模式。例如,当车辆100配置为部分地自动驾驶模式时,车辆100在处于自动驾驶模式时还可通过人为操作来确定车辆及其周边环境的当前状态,确定周边环境中的至少一个其他车辆的可能行为,并确定该其他车辆执行可能行为的可能性相对应的置信水平,基于所确定的信息来控制车辆100。在车辆100处于自动驾驶模式中时,可以将车辆100置为在没有和人交互的情况下操作。
车辆100可包括各种子系统,例如行进系统102、传感器系统104、控制系统106、一个或多个外围设备108以及电源110、计算机系统112和用户接口116。可选地,车辆100可包括更多或更少的子系统,并且每个子系统可包括多个元件。另外,车辆100的每个子系统和元件可以通过有线或者无线互连。
行进系统102可包括为车辆100提供动力运动的组件。在一个实施例中,行进系统102可包括引擎118、能量源119、传动装置120和车轮/轮胎121。引擎118可以是内燃引擎、电动机、空气压缩引擎或其他类型的引擎组合,例如汽油发动机和电动机组成的混动引擎,内燃引擎和空气压缩引擎组成的混动引擎。引擎118将能量源119转换成机械能量。
能量源119的示例包括汽油、柴油、其他基于石油的燃料、丙烷、其他基于压缩气体的燃料、乙醇、太阳能电池板、电池和其他电力来源。能量源119也可以为车辆100的其 他系统提供能量。
传动装置120可以将来自引擎118的机械动力传送到车轮121。传动装置120可包括变速箱、差速器和驱动轴。在一个实施例中,传动装置120还可以包括其他器件,比如离合器。其中,驱动轴可包括可耦合到一个或多个车轮121的一个或多个轴。
传感器系统104可包括感测关于车辆100周边的环境的信息的若干个传感器。例如,传感器系统104可包括定位系统122(定位系统可以是GPS系统,也可以是北斗系统或者其他定位系统)、惯性测量单元(inertial measurement unit,IMU)124、雷达126、激光测距仪128以及相机130。传感器系统104还可包括被监视车辆100的内部系统的传感器(例如,车内空气质量监测器、燃油量表、机油温度表等)。来自这些传感器中的一个或多个的传感器数据可用于检测对象及其相应特性(位置、形状、方向、速度等)。这种检测和识别是自主车辆100的安全操作的关键功能。
定位系统122可用于估计车辆100的地理位置。IMU 124用于基于惯性加速度来感测车辆100的位置和朝向变化。在一个实施例中,IMU 124可以是加速度计和陀螺仪的组合。
雷达126可利用无线电信号来感测车辆100的周边环境内的物体。在一些实施例中,除了感测物体以外,雷达126还可用于感测物体的速度和/或前进方向。
激光测距仪128可利用激光来感测车辆100所位于的环境中的物体。在一些实施例中,激光测距仪128可包括一个或多个激光源、激光扫描器以及一个或多个检测器,以及其他系统组件。
相机130可用于捕捉车辆100的周边环境的多个图像。相机130可以是静态相机或视频相机。
控制系统106为控制车辆100及其组件的操作。控制系统106可包括各种元件,其中包括转向系统132、油门134、制动单元136、传感器融合算法138、计算机视觉系统140、路线控制系统142以及障碍物避免系统144。
转向系统132可操作来调整车辆100的前进方向。例如在一个实施例中可以为方向盘系统。
油门134用于控制引擎118的操作速度并进而控制车辆100的速度。
制动单元136用于控制车辆100减速。制动单元136可使用摩擦力来减慢车轮121。在其他实施例中,制动单元136可将车轮121的动能转换为电流。制动单元136也可采取其他形式来减慢车轮121转速从而控制车辆100的速度。
计算机视觉系统140可以操作来处理和分析由相机130捕捉的图像以便识别车辆100周边环境中的物体和/或特征。所述物体和/或特征可包括交通信号、道路边界和障碍物。计算机视觉系统140可使用物体识别算法、运动中恢复结构(structure from motion,SFM)算法、视频跟踪和其他计算机视觉技术。在一些实施例中,计算机视觉系统140可以用于为环境绘制地图、跟踪物体、估计物体的速度等等。
路线控制系统142用于确定车辆100的行驶路线。在一些实施例中,路线控制系统142可结合来自传感器138、全球定位系统(global positioning system,GPS)122和一个或多个预定地图的数据以为车辆100确定行驶路线。
障碍物规避系统144用于识别、评估和避开或者以其他方式越过车辆100的环境中的潜在障碍物。
当然,在一个实例中,控制系统106可以增加或替换地包括除了所示出和描述的那些以外的组件。或者也可以减少一部分上述示出的组件。
车辆100通过外围设备108与外部传感器、其他车辆、其他计算机系统或用户之间进行交互。外围设备108可包括无线通信系统146、车载电脑148、麦克风150和/或扬声器152。
在一些实施例中,外围设备108提供车辆100的用户与用户接口116交互的手段。例如,车载电脑148可向车辆100的用户提供信息。用户接口116还可操作车载电脑148来接收用户的输入。车载电脑148可以通过触摸屏进行操作。在其他情况中,外围设备108可提供用于车辆100与位于车内的其它设备通信的手段。例如,麦克风150可从车辆100的用户接收音频(例如,语音命令或其他音频输入)。类似地,扬声器152可向车辆100的用户输出音频。
无线通信系统146可以直接地或者经由通信网络来与一个或多个设备无线通信。例如,无线通信系统146可使用3G蜂窝通信,例如码分多址(code division multiple access,CDMA)、EVD0、全球移动通信系统(global system for mobile communications,GSM)/通用分组无线服务(general packet radio service,GPRS),或者4G蜂窝通信,例如LTE。或者5G蜂窝通信。无线通信系统146可利用无线保真(wireless-fidelity,WiFi)与无线局域网(wireless local area network,WLAN)通信。在一些实施例中,无线通信系统146可利用红外链路、蓝牙或紫蜂协议(ZigBee)与设备直接通信。其他无线协议,例如各种车辆通信系统,例如,无线通信系统146可包括一个或多个专用短程通信(dedicated short range communications,DSRC)设备,这些设备可包括车辆和/或路边台站之间的公共和/或私有数据通信。
电源110可向车辆100的各种组件提供电力。在一个实施例中,电源110可以为可再充电锂离子或铅酸电池。这种电池的一个或多个电池组可被配置为电源为车辆100的各种组件提供电力。在一些实施例中,电源110和能量源119可一起实现,例如一些全电动车中那样。
车辆100的部分或所有功能受计算机系统112控制。计算机系统112可包括至少一个处理器113,处理器113执行存储在例如数据存储装置114这样的非暂态计算机可读介质中的指令115。计算机系统112还可以是采用分布式方式控制车辆100的个体组件或子系统的多个计算设备。
处理器113可以是任何常规的处理器,诸如商业可获得的中央处理器(central processing unit,CPU)。替选地,该处理器可以是诸如用于供专门应用的集成电路(application specific integrated circuit,ASIC)或其它基于硬件的处理器的专用设备。尽管图1B功能性地图示了处理器、存储器、和在相同块中的计算机系统112的其它元件,但是本领域的普通技术人员应该理解该处理器、计算机、或存储器实际上可以包括可以或者可以不存储在相同的物理外壳内的多个处理器、计算机、或存储器。例如,存储器可以是硬盘驱动器或位于不同于计算机的外壳内的其它存储介质。因此,对处理器或计算机的引用将被理解为包括对可以或者可以不并行操作的处理器或计算机或存储器的集合的引用。不同于使用单一的处理器来执行此处所描述的步骤,诸如转向组件和减速组件的一些组件每个都可以具有其自己的处理器,所述处理器只执行与特定于组件的功能相关的计算。
在此处所描述的各个方面中,处理器可以位于远离该车辆并且与该车辆进行无线通信。在其它方面中,此处所描述的过程中的一些在布置于车辆内的处理器上执行而其它则由远程处理器执行,包括采取执行单一操纵的必要步骤。
在一些实施例中,数据存储装置114可包含指令115(例如,程序逻辑),指令115可被处理器113执行来执行车辆100的各种功能,包括以上描述的那些功能。数据存储装置114也可包含额外的指令,包括向推进系统102、传感器系统104、控制系统106和外围设备108中的一个或多个发送数据、从其接收数据、与其交互和/或对其进行控制的指令。
除了指令115以外,数据存储装置114还可存储数据,例如道路地图、路线信息,车辆的位置、方向、速度以及其它这样的车辆数据,以及其他信息。这种信息可在车辆100在自主、半自主和/或手动模式中操作期间被车辆100和计算机系统112使用。
用户接口116,用于向车辆100的用户提供信息或从其接收信息。可选地,用户接口116可包括在外围设备108的集合内的一个或多个输入/输出设备,例如无线通信系统146、车车在电脑148、麦克风150和扬声器152。
计算机系统112可基于从各种子系统(例如,行进系统102、传感器系统104和控制系统106)以及从用户接口116接收的输入来控制车辆100的功能。例如,计算机系统112可利用来自控制系统106的输入以便控制转向单元132来避免由传感器系统104和障碍物避免系统144检测到的障碍物。在一些实施例中,计算机系统112可操作来对车辆100及其子系统的许多方面提供控制。
可选地,上述这些组件中的一个或多个可与车辆100分开安装或关联。例如,数据存储装置114可以部分或完全地与车辆100分开存在。上述组件可以按有线和/或无线方式来通信地耦合在一起。
可选地,上述组件只是一个示例,实际应用中,上述各个模块中的组件有可能根据实际需要增添或者删除,图2不应理解为对本申请实施例的限制。
在道路行进的自动驾驶汽车,如上面的车辆100,可以识别其周围环境内的物体以确定自身对当前速度的调整。所述物体可以是其它车辆、交通控制设备、或者其它类型的物体。在一些示例中,可以独立地考虑每个识别的障碍物,并且基于各个障碍物各自的特性,诸如它的当前速度、加速度、与车辆的间距等,来确定自动驾驶汽车(自车)所要调整的速度。
可选地,自动驾驶汽车车辆100或者与自动驾驶汽车车辆100相关联的计算设备(如图2的计算机系统112、计算机视觉系统140、数据存储装置114)可以基于所识别的障碍物的特性和周围环境的状态(例如,交通、雨、道路上的冰、等等)来预测所述识别的障碍物的行为。可选地,每一个所识别的障碍物都依赖于彼此的行为,因此还可以将所识别的所有障碍物全部一起考虑来预测单个识别的障碍物的行为。车辆100能够基于预测的所述识别的障碍物的行为来调整它的速度。换句话说,自动驾驶汽车能够基于所预测的障碍物的行为来确定车辆将需要调整到(例如,加速、减速、或者停止)什么状态。在这个过程中,也可以考虑其它因素来确定车辆100的速度,诸如,车辆100在行驶的道路中的横向位置、道路的曲率、静态和动态物体的接近度等等。
除了提供调整自动驾驶汽车的速度的指令之外,计算设备还可以提供修改车辆100的转向角的指令,以使得自动驾驶汽车遵循给定的轨迹和/或维持与自动驾驶汽车附近的障碍 物(例如,道路上的相邻车道中的车辆)的安全横向和纵向距离。
上述车辆100可以为轿车、卡车、摩托车、公共汽车、船、飞机、直升飞机、割草机、娱乐车、游乐场车辆、施工设备、电车、高尔夫球车、火车、和手推车等,本发明实施例不做特别的限定。
图3为图2中的计算机系统112的结构示意图。如图3所示,计算机系统112包括处理器113,处理器113和系统总线105耦合。处理器113可以是一个或者多个处理器,其中每个处理器都可以包括一个或多个处理器核。显示适配器(video adapter)107,显示适配器107可以驱动显示器109,显示器109和系统总线105耦合。系统总线105通过总线桥111和输入输出(I/O)总线耦合。I/O接口115和I/O总线耦合。I/O接口115和多种I/O设备进行通信,比如输入设备117(如:键盘,鼠标,触摸屏等),多媒体盘(media tray)121,(例如,CD-ROM,多媒体接口等)。收发器123(可以发送和/或接受无线电通信信号),摄像头155(可以捕捉静态和动态数字视频图像)和外部USB接口125。其中,可选地,和I/O接口115相连接的接口可以是通用串行总线(universal serial bus,USB)接口。
其中,处理器113可以是任何传统处理器,包括精简指令集计算(“RISC”)处理器、复杂指令集计算(“CISC”)处理器或上述的组合。可选地,处理器可以是诸如专用集成电路(“ASIC”)的专用装置。可选地,处理器113可以是神经网络处理器或者是神经网络处理器和上述传统处理器的组合。
可选地,在本文所述的各种实施例中,计算机系统可位于远离自动驾驶车辆的地方,并且可与自动驾驶车辆无线通信。在其它方面,本文所述的一些过程在设置在自动驾驶车辆内的处理器上执行,其它由远程处理器执行,包括采取执行单个操纵所需的动作。
计算机系统112可以通过网络接口129和软件部署服务器149通信。网络接口129是硬件网络接口,比如,网卡。网络127可以是外部网络,比如因特网,也可以是内部网络,比如以太网或者虚拟私人网络(VPN)。可选地,网络127还可以是无线网络,比如WiFi网络,蜂窝网络等。
硬盘驱动接口131和系统总线105耦合。硬盘驱动接口131和硬盘驱动器133相连接。系统内存135和系统总线105耦合。运行在系统内存135的软件可以包括计算机系统112的操作系统(operating system,OS)137和应用程序143。
操作系统包括Shell 139和内核(kernel)141。Shell 139是介于使用者和操作系统之内核(kernel)间的一个接口。shell是操作系统最外面的一层。shell管理使用者与操作系统之间的交互:等待使用者的输入,向操作系统解释使用者的输入,并且处理各种各样的操作系统的输出结果。
内核141由操作系统中用于管理存储器、文件、外设和系统资源的那些部分组成。直接与硬件交互,操作系统的内核141通常运行进程,并提供进程间的通信,提供CPU时间片管理、中断、内存管理、IO管理等等。
应用程序141包括控制汽车自动驾驶相关的程序,比如,管理自动驾驶的汽车和路上障碍物交互的程序,控制自动驾驶汽车路线或者速度的程序,控制自动驾驶汽车和路上其他自动驾驶汽车交互的程序。应用程序141也存在于软件部署服务器(deploying server)149的系统上。在一个实施例中,在需要执行应用程序141时,计算机系统可以从deploying server149下载应用程序143。
传感器153和计算机系统关联。传感器153用于探测计算机系统112周围的环境。举例来说,传感器153可以探测动物,汽车,障碍物和人行横道等,进一步传感器还可以探测上述动物,汽车,障碍物和人行横道等物体周围的环境,比如:动物周围的环境,例如,动物周围出现的其他动物,天气条件,周围环境的光亮度等。可选地,如果计算机系统112位于自动驾驶的汽车上,传感器可以是摄像头,红外线感应器,化学检测器,麦克风等。
图4为本申请实施例提供的一种芯片硬件结构的示意图。如图4所示,该芯片可以包括神经网络处理器30。该芯片可以被设置在如图1所示的执行设备1002中,用以完成申请实施例提供的自动换道方法。该芯片也可以被设置在如图1所示的训练设备1001中,用以完成申请实施例提供的控制策略的训练方法。
神经网络处理器30可以是NPU,TPU,或者GPU等一切适合用于大规模异或运算处理的处理器。以NPU为例:NPU可以作为协处理器挂载到主CPU(host CPU)上,由主CPU为其分配任务。NPU的核心部分为运算电路303,通过控制器304控制运算电路303提取存储器(301和302)中的矩阵数据并进行乘加运算。
在一些实现中,运算电路303内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路303是二维脉动阵列。运算电路303还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路303是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路303从权重存储器302中取矩阵B的权重数据,并缓存在运算电路303中的每一个PE上。运算电路303从输入存储器301中取矩阵A的输入数据,根据矩阵A的输入数据与矩阵B的权重数据进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)308中。
统一存储器306用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)305,被搬运到权重存储器302中。输入数据也通过DMAC被搬运到统一存储器306中。
总线接口单元(bus interface unit,BIU)310,用于DMAC和取指存储器(instruction fetch buffer)309的交互;总线接口单元301还用于取指存储器309从外部存储器获取指令;总线接口单元301还用于存储单元访问控制器305从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器306中,或将权重数据搬运到权重存储器302中,或将输入数据搬运到输入存储器301中。
向量计算单元307多个运算处理单元,在需要的情况下,对运算电路303的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。向量计算单元307主要用于神经网络中非卷积层,或全连接层(fully connected layers,FC)的计算,具体可以处理:Pooling(池化),Normalization(归一化)等的计算。例如,向量计算单元307可以将非线性函数应用到运算电路303的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元307生成归一化的值、合并值,或二者均有。
在一些实现中,向量计算单元307将经处理的向量存储到统一存储器306。在一些实现中,经向量计算单元307处理过的向量能够用作运算电路303的激活输入。
控制器304连接的取指存储器(instruction fetch buffer)309,用于存储控制器304使 用的指令;
统一存储器306,输入存储器301,权重存储器302以及取指存储器309均为On-Chip存储器。外部存储器独立于该NPU硬件架构。
图5为本申请实施例提供的操作环境示意图。如图5所示,云服务中心可以经网络502(如无线通信网络),从其操作环境500内的自动驾驶车辆510和512接收信息(诸如车辆传感器收集到数据或者其它信息)。
示例性地,云服务中心520可以经网络502(如无线通信网络)从自动驾驶车辆510接收自动驾驶车辆510在任意时刻的行驶信息(例如行驶速度和/或行驶位置等信息)以及自动驾驶车辆510感知范围内其他车辆的行驶信息等。
云服务中心520根据接收到的信息,可以运行其存储的控制汽车自动驾驶相关的程序,从而实现对自动驾驶车辆510和512的控制。控制汽车自动驾驶相关的程序可以为,管理自动驾驶的汽车和路上障碍物交互的程序,控制自动驾驶汽车路线或者速度的程序,控制自动驾驶汽车和路上其他自动驾驶汽车交互的程序。
网络502将地图的部分提供给自动驾驶车辆510和512。
例如,多个云服务中心可以接收、证实、组合和/或发送信息报告。在一些示例中还可以在自动驾驶车辆之间发送信息报告和/传感器数据。
在一些示例中,云服务中心520可以向自动驾驶车辆(或自动驾驶汽车)发送对于基于环境内可能的驾驶情况所建议的解决方案(如,告知前方障碍物,并告知如何绕开它)。例如,云服务中心520可以辅助车辆确定当面对环境内的特定障碍时如何行进。云服务中心520可以向自动驾驶车辆发送指示该车辆应当在给定场景中如何行进的响应。例如,云服务中心基于收集到的传感器数据,可以确认道路前方具有临时停车标志的存在,并还该车道上基于“车道封闭”标志和施工车辆的传感器数据,确定该车道由于施上而被封闭。相应地,云服务中心520可以发送用于自动驾驶车辆通过障碍的建议操作模式(例如:指示车辆变道另一条道路上)。云服务中心520可以观察其操作环境内的视频流并且已确认自动驾驶车辆能安全并成功地穿过障碍时,对该自动驾驶车辆所使用操作步骤可以被添加到驾驶信息地图中。相应地,这一信息可以发送到该区域内可能遇到相同障碍的其它车辆,以便辅助其它车辆不仅识别出封闭的车道还知道如何通过。
需要说明的是,自动驾驶车辆510和/或512在运行过程中可以自主控制行驶,也可以不需要云服务中心520的控制。
本申请实施例中涉及的任意时刻的局部邻居特征用于表示自动驾驶车辆的特定的邻居障碍物相对于自动驾驶车辆的运动状态信息(例如相对距离和相对速度)。
示例性地,特定的邻居障碍物可以包括但不限于以下至少一项:自动驾驶车辆所在车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆所在车道的相邻左车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆所在车道的相邻右车道上与自动驾驶车辆相邻的前后障碍物。
示例性地,当自动驾驶车辆位于左边道时,自动驾驶车辆所在车道的相邻左车道上与自动驾驶车辆相邻的前后障碍物,相对于自动驾驶车辆的运动状态信息可以为默认值;和/或,当自动驾驶车辆位于右边道时,自动驾驶车辆所在车道的相邻右车道上与 自动驾驶车辆相邻的前后障碍物,相对于自动驾驶车辆的运动状态信息可以为默认值。
本申请实施例中涉及的障碍物可以是动态移动的障碍物也可以是静态的障碍物。例如,障碍物可以包括但不限于以下至少一项:自动驾驶车辆、非自动驾驶的机动车辆、人、物体。示例性地,当特定的邻居障碍物为静态的障碍物时,则特定的邻居障碍物相对于自动驾驶车辆的相对距离可以为该邻居障碍物与自动驾驶车辆之间的距离,特定的邻居障碍物相对于自动驾驶车辆的相对速度可以为自动驾驶车辆的移动速度。
示例性地，任意时刻的自动驾驶车辆的局部邻居特征可以包括但不限于：自动驾驶车辆所在车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_mid^front 和相对距离 d_mid^front；自动驾驶车辆所在车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_mid^rear 和相对距离 d_mid^rear；自动驾驶车辆所在车道的相邻左车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_left^front 和相对距离 d_left^front；自动驾驶车辆所在车道的相邻左车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_left^rear 和相对距离 d_left^rear；自动驾驶车辆所在车道的相邻右车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_right^front 和相对距离 d_right^front；自动驾驶车辆所在车道的相邻右车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_right^rear 和相对距离 d_right^rear。
可选地,任意时刻的自动驾驶车辆的局部邻居特征还可以包括:导航目标车道与自动驾驶车辆所在车道之间的位置信息flag,以及自动驾驶车辆距离沿行驶方向的下一个路口之间的距离dist2goal;其中,flag∈{0,-1,1},其中,flag等于0代表自动驾驶车辆在导航目标车道上,flag等于-1代表导航目标车道在自动驾驶车辆所在车道的左侧,flag等于1代表导航目标车道在自动驾驶车辆所在车道的右侧。
本申请实施例中涉及的任意时刻的全局统计特征用于表示感知范围(即自动驾驶车辆的传感器可检测的范围,例如距离自动驾驶车辆的预设间隔内的范围)内各个车道的障碍物的稀疏与稠密程度。
示例性地,任意时刻的全局统计特征可以包括但不限于以下至少一项:感知范围内各个车道所有障碍物的平均行驶速度以及平均间隔。例如,若某个车道所有障碍物的平均间隔小于预设间隔,则表示该车道的障碍物比较稠密;若某个车道所有障碍物的平均间隔大于或等于预设间隔,则表示该车道的障碍物比较稀疏。例如,任意时刻的全局统计特征可以包括但不限于:自动驾驶车辆所在车道的左侧所有车道上前后相邻的两障碍物之间的平均间隔gap L、自动驾驶车辆所在车道上前后相邻的两障碍物之间的平均间隔gap M、自动驾驶车辆所在车道的右侧所有车道上前后相邻的两障碍物之间的平均间隔gap R、自动驾驶车辆所在车道的左侧所有车道上障碍物的平均行驶速度V L、自动驾驶车辆所在车道上障碍物的平均行驶速度V M和自动驾驶车辆所在车道的右侧所有车道上障碍物的平均行驶速度V R
可选地,本申请实施例中涉及的任意时刻的自动驾驶车辆的局部邻居特征和全局统计特征可以为离散化处理后的特征,可以满足低速稠密场景离散粒度小,高速稀疏场景离散粒度大,例如:
1)当自动驾驶车辆的车速V ego≤V threshold(如20公里/小时),则局部相对距离特征精度 为0.01,局部相对速度特征精度为0.05。例如,如果某个局部相对距离特征为0.1123,则离散化后为0.11;如果某个局部相对速度特征为0.276,则离散化为0.25。
2)当自动驾驶车辆的车速V ego>V threshold,则局部相对距离特征精度为0.05,局部相对速度特征精度为0.1。
3)平均间隔特征精度统一为0.01,平均速度特征精度统一为0.01。
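A minimal Python sketch of the speed-dependent discretization rules 1)-3) above follows; the threshold value expressed in m/s and the function names are illustrative assumptions introduced here, not part of the original disclosure.

    def quantize(x, step):
        """Round x down to the nearest multiple of the given precision step."""
        return round((x // step) * step, 6)

    def discretize_features(rel_dists, rel_speeds, avg_gaps, avg_speeds, v_ego, v_threshold=20 / 3.6):
        """Finer grid at low speed (dense traffic), coarser grid at high speed (sparse traffic)."""
        if v_ego <= v_threshold:                     # e.g. 20 km/h expressed in m/s
            d_step, v_step = 0.01, 0.05
        else:
            d_step, v_step = 0.05, 0.1
        return (
            [quantize(d, d_step) for d in rel_dists],    # e.g. 0.1123 -> 0.11 at low speed
            [quantize(v, v_step) for v in rel_speeds],   # e.g. 0.276  -> 0.25 at low speed
            [quantize(g, 0.01) for g in avg_gaps],       # average-gap precision fixed at 0.01
            [quantize(v, 0.01) for v in avg_speeds],     # average-speed precision fixed at 0.01
        )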
本申请实施例中涉及的目标动作指示用于指示自动驾驶车辆执行目标动作。示例性地,目标动作可以至少包括但不限于以下两类:换道或保持直行,其中,换道可以包括:向左相邻车道换道或向右相邻车道换道。
本申请实施例中涉及的任意时刻的四元组信息(s,a,r,s')对应该时刻车况,可以包括:该时刻的特征s、该时刻的自动驾驶车辆的目标动作a、该时刻的目标动作对应的回报值r以及该时刻的下一时刻的特征s';其中,该时刻的特征s可以包括:自动驾驶车辆在该时刻的局部邻居特征s l和全局统计特征s g,该时刻的下一时刻的特征s'可以包括:自动驾驶车辆在该下一时刻的局部邻居特征s l'和全局统计特征s g'。
图6为本申请实施例提供的对称原则示意图一,图7为本申请实施例提供的对称原则示意图二,如图6和图7所示,本申请实施例中涉及的对称规则是指以自动驾驶车辆所在车道为轴,将自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换。
图8为本申请实施例提供的单调原则示意图,如图8所示,本申请实施例中涉及的单调原则是指将自动驾驶车辆换道的目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围。例如,自动驾驶车辆换道的目标车道上的后邻车为A、目标车道上的前邻车为D、非目标车道上的前邻车为B、非目标车道上的后邻车为C,则单调原则可以包括但不限于以下操作:
操作一:车辆A向后挪动预设距离1或者速度减小预设数值1;
操作二:车辆D向前挪动预设距离2或者速度增大预设数值2;
操作三:车辆B向前或者向后挪动预设距离3,或者速度增大或减小预设数值3;
操作四:车辆C向前或者向后挪动预设距离4,或者速度增大或减小预设数值4。
本申请实施例中涉及的任意时刻的延伸四元组信对应该时刻延伸车况,该时刻延伸车况是对该时刻车况进行对称规则和单调规则处理得到的。
示例性地,任意时刻的延伸四元组信息可以包括:该时刻的对称四元组信息(s e,a e,r,s e')和单调四元组信息(s m,a m,r,s m');其中,该时刻的对称四元组信息(s e,a e,r,s e')是根据对称规则对该时刻的四元组信息(s,a,r,s')进行构造得到的,s e是s的对称特征,a e是a的对称动作,s e'是s'的对称特征;该时刻的单调四元组信息(s m,a m,r,s m')是根据单调规则对该时刻的四元组信息(s,a,r,s')进行构造得到的,s m是s的单调特征,a m是a的单调动作(示例性地,a m可以等于a),s m'是s'的单调特征。
本申请实施例中的下述部分分别对任意时刻的对称四元组信息(s e,a e,r,s e')和单调四元组信息(s m,a m,r,s m')的构造方式进行介绍。
例如,1)假设任意时刻的四元组信息(s,a,r,s')中的s l等于如下式子:
s_l = (v_mid^front, d_mid^front, v_mid^rear, d_mid^rear, v_left^front, d_left^front, v_left^rear, d_left^rear, v_right^front, d_right^front, v_right^rear, d_right^rear)
则根据对称原则确定该时刻的对称四元组信息(s e,a e,r,s e')中的s el等于如下式子:
s_el = (v_mid^front, d_mid^front, v_mid^rear, d_mid^rear, v_right^front, d_right^front, v_right^rear, d_right^rear, v_left^front, d_left^front, v_left^rear, d_left^rear)
2)假设任意时刻的四元组信息(s,a,r,s')中的s g等于如下式子:
s g=(gap L,gap M,gap R,V L,V M,V R)
则根据对称原则确定该时刻的对称四元组信息(s e,a e,r,s e')中的s eg等于如下式子:
s eg=(gap R,gap M,gap L,V R,V M,V L)
因此,根据s el和s eg构造s e
3)根据对称原则确定该时刻的对称四元组信息(s e,a e,r,s e')中的a e等于如下式子:
a_e = 0（若 a = 0）；a_e = 2（若 a = 1）；a_e = 1（若 a = 2）
其中,a等于0代表保持直行,a等于1代表向左相邻车道换道,a等于2代表向右相邻车道换道。
需要说明的是,该时刻的对称四元组信息(s e,a e,r,s e')中的s e'的构造方式与s e的构造方式类似,此处不再赘述。
例如，假设车辆A,B,C,D的相对距离特征归一化后分别为d_A,d_B,d_C,d_D，相对速度特征归一化后分别为v_A,v_B,v_C,v_D，且Δ_d为当前速度下对应的相对距离精度，Δ_v为当前车速下对应的相对速度精度，则(d_A,v_A,d_B,v_B,d_C,v_C,d_D,v_D)的值将通过单调原则被改变为以下包含2*2*3*3*3*3*2*2个元素的集合中的值：
s_ml ∈ {d_A+Δ_d, d_A} × {v_A−Δ_v, v_A} × {d_B−Δ_d, d_B, d_B+Δ_d} × {v_B−Δ_v, v_B, v_B+Δ_v} × {d_C−Δ_d, d_C, d_C+Δ_d} × {v_C−Δ_v, v_C, v_C+Δ_v} × {d_D+Δ_d, d_D} × {v_D+Δ_v, v_D}
因此，从上述集合中随机选取预设数量（例如10）组，从而构成该时刻的单调四元组信息(s_m,a_m,r,s_m')中的s_ml。
可选地,假设任意时刻的四元组信息(s,a,r,s')中的s g等于该时刻的单调四元组信息(s m,a m,r,s m')中的s mg,则根据s ml和s mg构造s m
需要说明的是,该时刻的单调四元组信息(s m,a m,r,s m')中的s m'的构造方式与s m的构造方式类似,此处不再赘述。
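A minimal Python sketch of how the symmetric quadruple (s_e, a_e, r, s_e') could be constructed from (s, a, r, s') under the symmetry rule above follows; the exact ordering of the features inside s (12 local values followed by 6 global values) is an assumption made here for illustration. The monotonic quadruples would be built analogously by perturbing the target-lane neighbor gaps within the sets described above.

    def mirror_state(s):
        """s = local neighbor block (ego lane, left lane, right lane; 4 values per lane for
        the front/rear neighbors) followed by the global block (gap_L, gap_M, gap_R, V_L, V_M, V_R)."""
        mid, left, right = s[0:4], s[4:8], s[8:12]
        gap_l, gap_m, gap_r, v_l, v_m, v_r = s[12:18]
        # swap the left and right sides around the ego lane (symmetry rule)
        return mid + right + left + [gap_r, gap_m, gap_l, v_r, v_m, v_l]

    def mirror_action(a):
        """0: keep straight, 1: change to the left lane, 2: change to the right lane."""
        return {0: 0, 1: 2, 2: 1}[a]

    def symmetric_quadruple(s, a, r, s_next):
        return mirror_state(s), mirror_action(a), r, mirror_state(s_next)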
需要说明的是,当训练设备1001执行本申请实施例提供的控制策略的训练方法时,所涉及的自动驾驶车辆、障碍物、车道等信息可以为训练设备1001中的模拟道路信息,或实际道路信息上发生的历史数据。当执行设备1002执行本申请实施例提供的自动换道方法时,所涉及的自动驾驶车辆、障碍物、车道等信息为实际的实时道路信息。
下面从控制策略(或控制策略模型)训练侧和控制策略应用侧对本申请提供的方法进行描述:
本申请实施例提供的控制策略的训练方法,涉及计算机处理,具体可以应用于数据训练、机器学习、深度学习等数据处理方法,对训练数据(如本申请实施例中的预设数量个历史时刻的四元组信息)进行符号化和形式化的智能信息建模、抽取、预处理、训练等, 最终得到训练好的控制策略;并且,本申请实施例提供的自动换道方法可以运用上述训练好的控制策略,将输入数据(如本申请实施例中的局部邻居特征和全局统计特征)输入到训练好的控制策略中,得到输出数据(如本申请实施例中的目标动作指示)。当然,自动换道方法中也可实时更新控制策略,或者每隔预设时长更新控制策略。需要说明的是,本申请实施例提供的控制策略的训练方法和应用方法是基于同一个构思产生的发明,也可以理解为一个系统中的两个部分,或一个整体流程的两个阶段:如控制策略训练阶段和控制策略应用阶段。
下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。
下面对本申请实施例提供的控制策略的训练方法进行详细说明。
图9为本申请一实施例提供的控制策略的训练方法的流程示意图。本实施例的方法具体可以由如图1所示的训练设备1001执行。如图9所示,本申请实施例提供的方法可以包括:
步骤S901、获取预设数量个历史时刻的四元组信息。
本步骤中,从数据库获取预设数量个历史时刻的四元组信息,其中,任意历史时刻的四元组信息对应该历史时刻车况,可以包括但不限于:该历史时刻的特征、该历史时刻的自动驾驶车辆的目标动作(即在该历史时刻根据对应的控制策略所确定的目标动作)、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征。
示例性地,该历史时刻的特征可以包括但不限于:自动驾驶车辆在该历史时刻的局部邻居特征和全局统计特征。
示例性地,该历史时刻的下一时刻的特征可以包括但不限于:自动驾驶车辆在该下一时刻的局部邻居特征和全局统计特征。
本申请实施例中涉及的自动驾驶车辆的任意时刻的局部邻居特征用于表示自动驾驶车辆的特定的邻居障碍物在该时刻相对于自动驾驶车辆的运动状态信息(例如相对距离和相对速度。
示例性地,特定的邻居障碍物可以包括但不限于:自动驾驶车辆在该时刻所在车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆在该时刻所在车道的相邻左车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆在该时刻所在车道的相邻右车道上与自动驾驶车辆相邻的前后障碍物。
本申请实施例中涉及的自动驾驶车辆的任意时刻的全局统计特征用于表示自动驾驶车辆的感知范围内各个车道的障碍物在该时刻的稀疏与稠密程度,例如,各个车道所有障碍物在该时刻的平均行驶速度以及平均间隔。
示例性地,任意时刻的全局统计特征可以包括但不限于:自动驾驶车辆在该时刻所在车道的左侧所有车道上前后相邻的两障碍物之间的平均间隔、自动驾驶车辆在该时刻所在车道上前后相邻的两障碍物之间的平均间隔、自动驾驶车辆在该时刻所在车道的右侧所有车道上前后相邻的两障碍物之间的平均间隔、自动驾驶车辆在该时刻所在车道的左侧所有车道上障碍物的平均行驶速度、自动驾驶车辆在该时刻所在车道上障碍物的平均行驶速度和自动驾驶车辆在该时刻所在车道的右侧所有车道上障碍物的平均行驶速度。
步骤S902、根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略。
示例性地,至少一个第一历史时刻的四元组信息为上述预设数量个历史时刻的四元组信息中历史时刻的自动驾驶车辆的目标动作为换道所对应的历史时刻的四元组信息。
示例性地,至少一个第二历史时刻的四元组信息为上述预设数量个历史时刻的四元组信息中除至少一个第一历史时刻的四元组信息之外的其它历史时刻的四元组信息,即预设数量个历史时刻的四元组信息中历史时刻的自动驾驶车辆的目标动作为保持执行所对应的历史时刻的四元组信息。
例如,假设预设数量个历史时刻的四元组信息可以包括:历史时刻①的四元组信息(其中,历史时刻①的自动驾驶车辆的目标动作为换道)、历史时刻②的四元组信息(其中,历史时刻②的自动驾驶车辆的目标动作为保持直行)、历史时刻③的四元组信息(其中,历史时刻③的自动驾驶车辆的目标动作为换道)和历史时刻④的四元组信息(其中,历史时刻④的自动驾驶车辆的目标动作为保持直行),则至少一个第一历史时刻的四元组信息可以包括:历史时刻①的四元组信息和历史时刻③的四元组信息,至少一个第二历史时刻的四元组信息可以包括:历史时刻②的四元组信息和历史时刻④的四元组信息。
本申请实施例中涉及的任意第一历史时刻的延伸四元组信息对应第一历史时刻延伸车况,通过对第一历史时刻车况进行对称规则和单调规则处理得到的。
本申请实施例中涉及的对称规则是指以自动驾驶车辆所在车道为轴,将自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换。
本申请实施例中涉及的单调原则是指将自动驾驶车辆换道的目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围。
示例性地,任意第一历史时刻的延伸四元组信息可以包括:该第一历史时刻的对称四元组信息和单调四元组信息。例如,该第一历史时刻的对称四元组信息可以是对该第一历史时刻的四元组信息进行对称原则处理得到的,该第一历史时刻的单调四元组信息可以是对该第一历史时刻的四元组信息进行单调原则处理得到的。
具体的,该第一历史时刻的对称四元组信息和单调四元组信息的构造方式,可以参考本申请上述关于“任意时刻的对称四元组信息和单调四元组信息”的构造方式,此处不再赘述。
本步骤中,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略中的参数进行更新得到下一时刻的控制策略(用于确定下一时刻的目标动作)。
示例性地,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息中的第l个四元组信息,生成第l个四元组信息对应的目标值;进一步地,利用梯度下降法对包含第l个四元组信息对应的目标值的预设函数中的参数θ进行迭代更新;进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,得到下一时刻的控制策略。
本实施例中,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的 延伸四元组信息,以及至少一个第二历史时刻的四元组信息中的第l个四元组信息(s l,a l,r l,s l')可以采用如下公式,生成第l个四元组信息对应的目标值y l;其中,l为取遍不大于q的正整数,q为至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息中所包括的四元组信息总数。
示例性地,
y_l = r_l，若 s_l' 为结束状态；y_l = r_l + γ·max_{a_l} Q(s_l', a_l, θ)，否则
其中,结束状态是指自动驾驶车辆自动行驶完成了预设最大距离或者人为干预自动驾驶车辆的行驶;γ代表预设遗忘因子,γ∈(0,1);Q(s l',a l,θ)代表动作价值函数;
max_{a_l} Q(s_l', a_l, θ)
代表遍历a l使Q(s l',a l,θ)取最大值;s l'代表第l个四元组信息中的后一个时刻的特征。
当然,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息中的第l个四元组信息,还可通过上述公式的其它变形或者等效公式生成第l个四元组信息对应的目标值,本申请实施例中对此并不作限制。
进一步地,利用梯度下降法对包含第l个四元组信息对应的目标值y l的预设函数
(y_l − Q(s_l, a_l, θ))^2
中的参数θ进行迭代更新;其中,Q(s l,a l,θ)为第l个四元组信息对应的动作价值函数,s l代表第l个四元组信息中的前一个时刻的特征,a l代表第l个四元组信息中前一个时刻的目标动作。
进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,从而得到下一时刻的控制策略,以便于用于确定下一时刻的目标动作。
当然,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,还可通过其它方式对当前控制策略中的参数进行更新得到下一时刻的控制策略,本申请实施例中对此并不作限制。
本申请实施例中,上述训练设备1001可以将步骤S901-步骤S902循环执行预设次数,或者可以将上述步骤S901-步骤S902循环执行多次直至更新后的控制策略满足预设条件时停止。上述训练设备1001最终得到的控制策略可以用于执行设备1002执行自动换道方法时使用。
示例性地,当上述训练设备1001首次执行上述步骤S902时,本申请实施例的当前控制策略可以是预设的初始控制策略;当上述训练设备1001不是首次执行上述步骤S902时,本申请实施例的当前控制策略可以是训练设备1001上次执行步骤S902后所得到的控制策略。
本申请实施例中,通过获取预设数量个历史时刻的四元组信息;进一步地,根据至少一个第一历史时刻的四元组信息、至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略。可见,通过在预设数量个历史时刻的四元组信息的基础上,进一步根据预设数量个历史时刻的四元组信息中的第一历史时刻的延伸四元组信息对当前控制策略进行更新,从而可以获得更加准确的控制策略,以便于可以准确地确定出对应的目标动作。
图10为本申请另一实施例提供的控制策略的训练方法的流程示意图。在上述实施例的基础上,本申请实施例中对“历史时刻的四元组信息”的生成方式进行介绍。如图10所示,在上述步骤S901之前,还包括:
S1001、对于每个历史时刻,根据自动驾驶车辆的行驶信息以及自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征。
需要说明的是,当障碍物为车辆或其他移动终端时,该障碍物的运动信息为行驶信息;当障碍物为人、动物或静止物体时,该障碍物的运动信息可以包括运动速度、运动位置等相关信息。
本步骤中,对于每个历史时刻,根据自动驾驶车辆的行驶信息(例如行驶速度和/或行驶位置等信息)以及自动驾驶车辆感知范围(即自动驾驶车辆的传感器可检测的范围,例如距离自动驾驶车辆的预设间隔内的范围)内各个车道的障碍物的运动信息(例如车辆的行驶速度和/或行驶位置等信息,人物、动物或静止物体的运动速度和/或运动位置等),计算自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征。
本申请实施例中涉及的自动驾驶车辆在任意历史时刻的局部邻居特征用于表示自动驾驶车辆在该历史时刻的特定的邻居车(例如自动驾驶车辆所在车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆所在车道的相邻左侧车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆所在车道的相邻右侧车道上与自动驾驶车辆相邻的前后障碍物)相对于自动驾驶车辆的运动状态信息(例如相对距离和相对速度)。
示例性地，任意时刻的自动驾驶车辆的局部邻居特征s l可以包括但不限于：自动驾驶车辆所在车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_mid^front 和相对距离 d_mid^front；自动驾驶车辆所在车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_mid^rear 和相对距离 d_mid^rear；自动驾驶车辆所在车道的相邻左侧车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_left^front 和相对距离 d_left^front；自动驾驶车辆所在车道的相邻左侧车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_left^rear 和相对距离 d_left^rear；自动驾驶车辆所在车道的相邻右侧车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度 v_right^front 和相对距离 d_right^front；自动驾驶车辆所在车道的相邻右侧车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度 v_right^rear 和相对距离 d_right^rear。
本申请实施例中涉及的自动驾驶车辆在任意历史时刻的全局统计特征用于表示自动驾驶车辆的感知范围内各个车道的障碍物的稀疏与稠密程度,例如各个车道所有障碍物在该历史时刻的平均行驶速度以及平均间隔。
示例性地，任意时刻的自动驾驶车辆的全局车流统计特征s_g可以包括但不限于：自动驾驶车辆所在车道的左侧所有车道上前后相邻的两障碍物之间的平均间隔gap_L、自动驾驶车辆所在车道上前后相邻的两障碍物之间的平均间隔gap_M、自动驾驶车辆所在车道的右侧所有车道上前后相邻的两障碍物之间的平均间隔gap_R、自动驾驶车辆所在车道的左侧所有车道上障碍物的平均行驶速度V_L、自动驾驶车辆所在车道上障碍物的平均行驶速度V_M和自动驾驶车辆所在车道的右侧所有车道上障碍物的平均行驶速度V_R。
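示例性地，上述全局统计特征的计算可以参考如下Python代码草图（其中障碍物的数据结构、空车道的处理方式等均为本示例的假设）：

```python
# 示意性草图：由各车道障碍物的纵向位置x与速度v计算全局统计特征（数据结构为假设）
from statistics import mean

def side_stats(lanes):
    """lanes: 若干条车道，每条车道为其上障碍物的列表，每个障碍物含纵向位置x与速度v。"""
    gaps, speeds = [], []
    for lane in lanes:
        xs = sorted(o["x"] for o in lane)
        gaps += [b - a for a, b in zip(xs, xs[1:])]     # 同一车道上前后相邻两障碍物之间的间隔
        speeds += [o["v"] for o in lane]
    avg_gap = mean(gaps) if gaps else float("inf")      # 无相邻障碍物时按无穷大处理（假设）
    avg_v = mean(speeds) if speeds else 0.0
    return avg_gap, avg_v

def global_features(left_lanes, ego_lane, right_lanes):
    gap_L, V_L = side_stats(left_lanes)                 # 左侧所有车道
    gap_M, V_M = side_stats([ego_lane])                 # 自车所在车道
    gap_R, V_R = side_stats(right_lanes)                # 右侧所有车道
    return {"gap_L": gap_L, "gap_M": gap_M, "gap_R": gap_R,
            "V_L": V_L, "V_M": V_M, "V_R": V_R}

if __name__ == "__main__":
    left = [[{"x": 0, "v": 8}, {"x": 30, "v": 9}]]
    ego = [{"x": 5, "v": 10}, {"x": 45, "v": 11}]
    right = [[{"x": 10, "v": 12}, {"x": 70, "v": 13}]]
    print(global_features(left, ego, right))
```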
S1002、根据该历史时刻的局部邻居特征、全局统计特征和该历史时刻的控制策略获取该历史时刻的目标动作指示。
本步骤中,通过将该历史时刻的局部邻居特征和全局统计特征输入到该历史时刻的控制策略,便可获取该历史时刻的目标动作指示(用于指示自动驾驶车辆执行目标动作)。
示例性地,任意时刻的控制策略(例如该历史时刻的控制策略)可以表示为:
a=argmax_{a'}Q(s,a',θ)
其中,s代表该时刻的局部邻居特征和全局统计特征;a'∈(0,1,2),a'等于0代表保持直行,a'等于1代表向左相邻车道换道,a'等于2代表向右相邻车道换道。
本实施例中针对任意历史时刻的局部邻居特征和全局统计特征,选择使动作价值函数Q(s,a',θ)取最大值的动作a'作为该历史时刻的目标动作a。
当然,该历史时刻的控制策略还可采用上述公式的其它变形或者等效公式,本申请实施例中对此并不作限制。
示例性地,目标动作至少包括两类:换道或保持直行,其中,换道可以包括:向左相邻车道换道或向右相邻车道换道。
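示例性地，按上述控制策略a=argmax_{a'}Q(s,a',θ)选取目标动作的过程可以用如下Python代码草图示意（其中线性动作价值函数、特征维度等均为本示例的假设）：

```python
# 示意性草图：遍历候选动作a'，选取使Q(s,a',θ)最大的动作作为目标动作（线性Q函数为假设）
import numpy as np

ACTIONS = {0: "保持直行", 1: "向左相邻车道换道", 2: "向右相邻车道换道"}

def select_action(theta, s):
    q_values = theta @ s                    # 每个候选动作 a' 对应的 Q(s, a', θ)
    return int(np.argmax(q_values))         # 取使动作价值函数最大的动作

if __name__ == "__main__":
    theta = np.random.default_rng(1).normal(size=(3, 8))
    s = np.random.default_rng(2).normal(size=8)   # 局部邻居特征与全局统计特征拼接后的向量（假设）
    a = select_action(theta, s)
    print(a, ACTIONS[a])
```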
S1003、通过执行目标动作得到反馈信息。
示例性地,反馈信息可以包括但不限于:自动驾驶车辆执行目标动作后的行驶信息(如行驶速度或行驶位置等),自动驾驶车辆在下一时刻的行驶信息和自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当目标动作为换道时,反馈信息还可以包括:执行目标动作的时间与历史平均时间的比值,以及自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,历史平均时间包括自动驾驶车辆在预设历史时间段内(例如500时间窗口)执行同类动作(如换道动作)的平均时间。
示例性地，自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况可以是根据自动驾驶车辆和自动驾驶车辆感知范围内其他障碍物在自动驾驶车辆换道前后的行驶信息（例如，自动驾驶车辆换道前所在车道上前后相邻的两障碍物之间的平均间隔gap_cur，自动驾驶车辆换道前所在车道上障碍物的平均行驶速度V_cur，自动驾驶车辆换道后所在车道上前后相邻的两障碍物之间的平均间隔gap_goal，自动驾驶车辆换道后所在车道上障碍物的平均行驶速度V_goal），以及预设全局分类模型F_0确定的。
S1004、根据反馈信息计算目标动作对应的回报值,以及自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征以及全局统计特征。
本步骤中，可以根据反馈信息中的自动驾驶车辆在下一时刻的行驶信息和自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息，计算自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征以及全局统计特征。具体的计算方式可以参考上述步骤S401中关于获取自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征的方式，本申请实施例中对此不再赘述。
本申请实施例的下述部分对“根据反馈信息计算目标动作对应的回报值”的可实现方式进行介绍。
一种可能的实现方式,当目标动作为保持直行时,根据自动驾驶车辆执行目标动 作后的行驶信息计算回报值。
示例性地，根据预设函数R(s”)和自动驾驶车辆执行目标动作后的行驶信息s”（如行驶速度或行驶位置等）计算回报值。例如，预设函数R(s”)=V_ego'，V_ego'代表自动驾驶车辆执行目标动作后的行驶速度；当然，预设函数R(s”)还可以等于包括自动驾驶车辆执行目标动作后的行驶信息的其它函数，本申请实施例中对此并不作限制。
另一种可能的实现方式,当目标动作为换道时,根据自动驾驶车辆执行目标动作后的行驶信息、执行目标动作的时间与历史平均时间的比值,以及自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算回报值。
本实现方式中，根据执行目标动作的时间T与历史平均时间T_e的比值确定局部回报系数K_l。进一步地，根据自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况确定全局回报系数K_g，其中，当自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稠密时，K_g>1；当自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稀疏时，K_g<1。进一步地，根据自动驾驶车辆执行目标动作后的行驶信息、局部回报系数K_l和全局回报系数K_g，计算回报值。
示例性地，根据公式c*K_l*K_g*R(s”)计算回报值；其中，c代表预设折扣因子（例如0.3），R(s”)代表包含自动驾驶车辆执行目标动作后的行驶信息的预设函数；当然，还可通过上述公式的其它等效或变形公式计算回报值，本申请实施例中对此并不作限制。
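示例性地，当目标动作为换道时按公式c*K_l*K_g*R(s”)计算回报值的过程，可以参考如下Python代码草图（其中K_l、K_g的具体取值方式为本示例的假设；p_sparser为由预设全局分类模型F_0给出的换道后所在车道更稀疏的概率）：

```python
# 示意性草图：换道动作的回报值计算 c*K_l*K_g*R(s")（系数映射与数值均为假设）
def reward_for_lane_change(v_after, T, T_e, p_sparser, c=0.3):
    K_l = T_e / max(T, 1e-6)                 # 假设：换道用时短于历史平均时间时局部回报系数更大
    K_g = 0.8 if p_sparser >= 0.5 else 1.2   # 换道后更稀疏时K_g<1，更稠密时K_g>1（数值为假设）
    R = v_after                              # 例如 R(s") = V_ego'，即执行目标动作后的行驶速度
    return c * K_l * K_g * R

def reward_for_keep_straight(v_after):
    return v_after                           # 目标动作为保持直行时，例如直接取 R(s") = V_ego'

if __name__ == "__main__":
    print(reward_for_lane_change(v_after=12.0, T=4.0, T_e=5.0, p_sparser=0.7))
    print(reward_for_keep_straight(v_after=12.0))
```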
若该历史时刻的自动驾驶车辆的局部邻居特征还包括：导航目标车道与自动驾驶车辆所在车道之间的位置信息flag，以及自动驾驶车辆距离沿行驶方向的下一个路口之间的距离dist2goal；其中，flag∈{0,-1,1}，flag等于0代表自动驾驶车辆在导航目标车道上，flag等于-1代表导航目标车道在自动驾驶车辆所在车道的左侧，flag等于1代表导航目标车道在自动驾驶车辆所在车道的右侧；则根据执行目标动作的时间T与历史平均时间T_e的比值确定局部回报系数K_l，并进一步根据自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况确定第一全局回报系数K_g1。
进一步地，根据导航目标车道与自动驾驶车辆所在车道之间的位置信息flag和目标动作确定第二全局回报系数K_g2。示例性地，可以根据flag、目标动作a以及自动驾驶车辆换道前后所在车道的平均间隔，按照预设公式确定第二全局回报系数K_g2；其中，gap_cur代表自动驾驶车辆换道前所在车道上前后相邻的两障碍物之间的平均间隔，gap_goal代表自动驾驶车辆换道后所在车道上前后相邻的两障碍物之间的平均间隔，a代表目标动作。当然，还可通过其它等效或变形公式计算第二全局回报系数K_g2，本申请实施例中对此并不作限制。
进一步地，根据自动驾驶车辆执行目标动作后的行驶信息、局部回报系数K_l、第一全局回报系数K_g1和第二全局回报系数K_g2，计算回报值。
示例性地，根据公式c*K_l*K_g1*K_g2*R(s”)计算回报值；其中，c代表预设折扣因子（例如0.3），R(s”)代表包含自动驾驶车辆执行目标动作后的行驶信息的预设函数。
例如，预设函数R(s”)可以是V_ego'、flag'和dist2goal'的函数；其中，V_ego'代表自动驾驶车辆执行目标动作后的行驶速度，flag'代表导航目标车道与自动驾驶车辆执行目标动作后所在车道之间的位置信息，flag'等于0代表自动驾驶车辆执行目标动作后在导航目标车道上，flag'等于-1代表导航目标车道在自动驾驶车辆执行目标动作后所在车道的左侧，flag'等于1代表导航目标车道在自动驾驶车辆执行目标动作后所在车道的右侧；dist2goal'代表自动驾驶车辆执行目标动作后距离沿行驶方向的下一个路口之间的距离；当然，预设函数R(s”)还可以等于包括自动驾驶车辆执行目标动作后的行驶信息的其它函数，本申请实施例中对此并不作限制。
当然,还可通过上述公式的其它等效或变形公式计算回报值,本申请实施例中对此并不作限制。
当然,根据反馈信息还可通过其它方式计算目标动作对应的回报值,本申请实施例中对此并不作限制。
S1005、存储该历史时刻的四元组信息。
本步骤中可以将该历史时刻的四元组信息存储在数据库中，以便于后续训练控制策略。示例性地，该历史时刻的四元组信息对应该历史时刻车况，可以包括：该历史时刻的特征、自动驾驶车辆在该历史时刻的目标动作、该历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征，该历史时刻的特征包括自动驾驶车辆在该历史时刻的局部邻居特征和全局统计特征，该历史时刻的下一时刻的特征包括自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征和全局统计特征。
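示例性地，该历史时刻四元组信息的存储可以用如下Python代码草图示意（其中以固定容量的经验池代替数据库，容量上限为本示例的假设）：

```python
# 示意性草图：将历史时刻的四元组信息存入经验池（容量上限为假设取值）
from collections import deque

replay_buffer = deque(maxlen=100000)

def store_quadruple(s, a, r, s_next):
    """存储四元组：（该历史时刻的特征, 目标动作, 回报值, 下一时刻的特征）。"""
    replay_buffer.append((s, a, r, s_next))

if __name__ == "__main__":
    store_quadruple([0.1, 0.2], 1, 0.6, [0.3, 0.4])
    print(len(replay_buffer))  # 1
```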
本申请实施例中,对于每个历史时刻,通过根据自动驾驶车辆在该历史时刻的局部邻居特征、全局统计特征和该历史时刻的控制策略获取该历史时刻的目标动作指示;进一步地,通过执行目标动作得到反馈信息,并根据反馈信息计算目标动作对应的回报值以及自动驾驶车辆在该历史时刻的下一时刻的局部邻居特征以及全局统计特征,并存储该历史时刻的四元组信息。可见,通过在局部邻居特征的基础上,进一步引入全局统计特征和目标动作对应的回报值等信息,使得用于训练控制策略的训练数据更加完善,从而有利于训练出更加准确的控制策略。
进一步地，在上述实施例的基础上，本申请实施例中对“预设全局分类模型F_0”的生成方式进行介绍。
示例性地，获取全局分类模型特征；其中，全局分类模型特征可以包括但不限于：在预设时间段（例如2000000个时间窗口）内自动驾驶车辆和自动驾驶车辆感知范围内各个车道的障碍物在自动驾驶车辆每次换道前后的运动信息（例如，自动驾驶车辆每次换道前所在车道上前后相邻的两障碍物之间的平均间隔gap_cur，自动驾驶车辆每次换道前所在车道上障碍物的平均行驶速度V_cur，自动驾驶车辆每次换道后所在车道上前后相邻的两障碍物之间的平均间隔gap_goal，自动驾驶车辆每次换道后所在车道上障碍物的平均行驶速度V_goal）。进一步地，根据全局分类模型特征采用逻辑回归算法，生成预设全局分类模型F_0。
本申请实施例中，首先可以在模拟器中预设不同稀疏与稠密程度和/或不同速度的道路场景。例如，构建长度为预设长度（例如4km）的包含三个车道的地图作为训练地图，并使社会车（即除自动驾驶车辆之外的其它车辆）的布置覆盖无车场景、稀疏中速场景、稀疏高速场景、稀疏低速场景、均匀中速场景、均匀高速场景、均匀低速场景、稠密中速场景、稠密高速场景、稠密低速场景、稠密超低速场景等场景（其中，稀疏场景、均匀场景、稠密场景的车辆密度平均分别为15辆/4000米、40辆/4000米、100辆/4000米，社会车的平均速度可取5公里/小时、10公里/小时、20公里/小时、30公里/小时、40公里/小时、50公里/小时、60公里/小时等）。
其次,在模拟器中随机加载一个训练地图,让自动驾驶车辆以随机策略在模拟环境中行驶;所谓随机策略,就是自动驾驶车辆随机的在决策空间A(如0,1,2)中选择目标动作进行执行。假设自动驾驶车辆每次行驶到训练地图终点时将随机切换新训练地图和新的社会车配置场景,直到行驶预设时间段(例如2000000个时间窗口)后终止。
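示例性地，上述以随机策略在模拟环境中采集数据的过程可以用如下Python代码草图示意（其中模拟环境env的reset/step接口及其返回值均为本示例的假设，并非对本申请实施例的限定）：

```python
# 示意性草图：以随机策略在模拟环境中行驶并采集换道前后的统计信息（环境接口为假设）
import random

ACTIONS = (0, 1, 2)            # 决策空间A：保持直行、向左换道、向右换道

def collect(env, total_steps=2000):
    samples = []
    s = env.reset()                              # 随机加载一个训练地图与社会车配置（假设由环境完成）
    for _ in range(total_steps):
        a = random.choice(ACTIONS)               # 随机策略：在决策空间中随机选择目标动作执行
        s_next, info, done = env.step(a)         # info中含换道前后的平均间隔、平均速度等反馈（假设）
        samples.append((s, a, info, s_next))
        s = env.reset() if done else s_next      # 行驶到训练地图终点时随机切换新训练地图
    return samples
```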
在模拟过程中，获取全局分类模型特征；其中，全局分类模型特征可以包括但不限于：在预设时间段（例如2000000个时间窗口）内自动驾驶车辆和自动驾驶车辆感知范围内其他车辆在自动驾驶车辆每次换道前后的行驶信息（例如，自动驾驶车辆每次换道前所在车道上前后相邻的两辆车之间的平均间隔gap_cur，自动驾驶车辆每次换道前所在车道上车辆的平均行驶速度V_cur，自动驾驶车辆每次换道后所在车道上前后相邻的两辆车之间的平均间隔gap_goal，自动驾驶车辆每次换道后所在车道上车辆的平均行驶速度V_goal）。
进一步地，如果任意换道前后的行驶信息中的gap_cur<gap_goal，则换道前后的行驶信息所对应的标签为1（代表自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稀疏），否则为0（代表自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稠密）。
进一步地，将每次换道前后的行驶信息和对应的标签作为样本数据添加到训练集D中，并利用训练集D中的样本数据和逻辑回归算法学习生成预设全局分类模型F_0（模型的输出是换道后所在车道更加稀疏的概率）。
当然，还可通过其它方式生成预设全局分类模型F_0，本申请实施例中对此并不作限制。
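示例性地，利用逻辑回归算法学习生成预设全局分类模型F_0的过程，可以参考如下Python代码草图（基于scikit-learn库；其中样本数据为随机模拟，特征取[gap_cur, V_cur, gap_goal, V_goal]，标签按gap_cur<gap_goal取1、否则取0，数值范围均为本示例的假设）：

```python
# 示意性草图：用逻辑回归学习预设全局分类模型F_0（样本数据为随机模拟，仅用于示意）
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 每条样本: [gap_cur, V_cur, gap_goal, V_goal]，对应自动驾驶车辆每次换道前后的统计信息
X = rng.uniform(low=[5, 1, 5, 1], high=[80, 20, 80, 20], size=(2000, 4))
y = (X[:, 0] < X[:, 2]).astype(int)   # gap_cur < gap_goal 时标签为1：换道后所在车道更稀疏

F0 = LogisticRegression(max_iter=1000).fit(X, y)
p_sparser = F0.predict_proba([[20.0, 8.0, 35.0, 10.0]])[0, 1]  # 输出换道后所在车道更稀疏的概率
print(round(p_sparser, 3))
```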
下面对本申请实施例提供的自动换道方法进行详细说明。
图11为本申请一实施例提供的自动换道方法的流程示意图。本实施例的方法具体可以由如图1所示的执行设备1002执行。如图11所示,本申请实施例提供的方法可以包括:
S1101、根据自动驾驶车辆当前时刻的行驶信息以及自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征。
需要说明的是,当障碍物为车辆或其他移动终端时,该运动信息为行驶信息;当障碍物为人、动物或静止物体时,该运动信息可以包括运动速度等相关信息。
本步骤中,根据自动驾驶车辆当前时刻的行驶信息(例如行驶速度和/或行驶位置等信息)以及自动驾驶车辆感知范围(即自动驾驶车辆的传感器可检测的范围,例如距离自动驾驶车辆的预设间隔内的范围)内各个车道的障碍物的运动信息(例如车辆的行驶速度和/或行驶位置等信息,人物、动物或静止物体的运动速度和/或运动位置等),计算自动驾驶车辆在当前时刻的局部邻居特征以及全局统计特征。
本申请实施例中涉及的自动驾驶车辆在当前时刻的局部邻居特征用于表示自动驾驶车辆在当前时刻的特定的邻居障碍物(例如自动驾驶车辆在当前时刻所在车道上与自 动驾驶车辆相邻的前后障碍物、自动驾驶车辆在当前时刻所在车道的相邻左侧车道上与自动驾驶车辆相邻的前后障碍物、自动驾驶车辆在当前时刻所在车道的相邻右侧车道上与自动驾驶车辆相邻的前后障碍物)相对于自动驾驶车辆的运动状态信息(例如相对距离和相对速度)。
示例性地，任意时刻的自动驾驶车辆的局部邻居特征s_l可以包括但不限于：自动驾驶车辆在该时刻所在车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度和相对距离、自动驾驶车辆在该时刻所在车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度和相对距离、自动驾驶车辆在该时刻所在车道的相邻左侧车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度和相对距离、自动驾驶车辆在该时刻所在车道的相邻左侧车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度和相对距离、自动驾驶车辆在该时刻所在车道的相邻右侧车道上与自动驾驶车辆相邻的前障碍物相对于自动驾驶车辆的相对速度和相对距离，以及自动驾驶车辆在该时刻所在车道的相邻右侧车道上与自动驾驶车辆相邻的后障碍物相对于自动驾驶车辆的相对速度和相对距离。
本申请实施例中涉及的自动驾驶车辆在当前时刻的全局统计特征用于表示自动驾驶车辆的感知范围内各个车道的障碍物在当前时刻的稀疏与稠密程度，例如，各个车道所有障碍物在当前时刻的平均行驶速度以及平均间隔。
示例性地，任意时刻的自动驾驶车辆的全局统计特征s_g可以包括但不限于：自动驾驶车辆在该时刻所在车道的左侧所有车道上前后相邻的两障碍物之间的平均间隔gap_L、自动驾驶车辆在该时刻所在车道上前后相邻的两障碍物之间的平均间隔gap_M、自动驾驶车辆在该时刻所在车道的右侧所有车道上前后相邻的两障碍物之间的平均间隔gap_R、自动驾驶车辆在该时刻所在车道的左侧所有车道上障碍物的平均行驶速度V_L、自动驾驶车辆在该时刻所在车道上障碍物的平均行驶速度V_M和自动驾驶车辆在该时刻所在车道的右侧所有车道上障碍物的平均行驶速度V_R。
S1102、根据局部邻居特征、全局统计特征和当前控制策略获取目标动作指示。
示例性地,当上述执行设备1002首次执行上述步骤S1101-S1102时,本申请实施例的当前控制策略可以是预设的控制策略,例如上述训练设备1001通过执行上述控制策略的训练方法最终所得到的控制策略;当上述执行设备1002不是首次执行上述步骤S1101-S1102时,本申请实施例的当前控制策略可以是执行设备1002在上一时刻更新后得到的控制策略。
本步骤中,通过将自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征输入到当前控制策略(即当前时刻的控制策略),便可获取当前时刻的目标动作指示(用于指示自动驾驶车辆执行目标动作)。
示例性地,任意时刻的控制策略(例如当前控制策略)可以表示为:
a=argmax_{a'}Q(s,a',θ)
其中,s代表该时刻的自动驾驶车辆的局部邻居特征和全局统计特征;a'∈(0,1,2),a'等于0代表保持直行,a'等于1代表向左相邻车道换道,a'等于2代表向右相邻车道换道。
本实施例中针对任意当前时刻的自动驾驶车辆的局部邻居特征和全局统计特征,选 择使Q(s,a',θ)取最大值的动作a'作为当前时刻的目标动作a。
示例性地,目标动作至少包括两类:换道或保持直行,其中,换道包括:向左相邻车道换道或向右相邻车道换道。
当然,当前时刻的控制策略还可采用上述公式的其它变形或者等效公式,本申请实施例中对此并不作限制。
S1103、根据目标动作指示执行目标动作。
示例性地，若目标动作指示换道，则自动驾驶车辆执行换道动作；若目标动作指示保持直行，则自动驾驶车辆执行保持直行动作。
具体地,根据目标动作指示执行目标动作的方式,可以参考相关技术中的内容,本申请实施例中对此并不作限制。
本申请实施例中，根据自动驾驶车辆当前时刻的行驶信息以及自动驾驶车辆感知范围内各个车道的障碍物的运动信息，计算自动驾驶车辆在当前时刻的局部邻居特征以及全局统计特征；进一步地，根据局部邻居特征、全局统计特征和当前控制策略获取目标动作指示，并根据目标动作指示执行目标动作。可见，通过在局部邻居特征的基础上，进一步引入全局统计特征输入当前控制策略获取目标动作指示，不仅考虑了局部的邻居障碍物（如他车）的信息，还考虑了全局统计特征（如整体车流）的宏观情况，因此，综合局部和全局路面障碍物信息得到的目标动作是全局最优的策略动作。
进一步地,在上述实施例的基础上,本申请实施例中还可以通过执行目标动作得到反馈信息,并根据反馈信息更新当前控制策略得到下一时刻的控制策略,从而使得下一时刻可以根据该下一时刻的控制策略准确地确定下一时刻的目标动作。
需要说明的是,在每个t时刻可以根据该t时刻的反馈信息对该t时刻的控制策略进行更新,得到t+1时刻的控制策略,使得生成目标动作的控制策略一直在自适应的持续更新优化中,从而保证每一个时刻都有其对应的最佳控制策略,为每一个时刻的目标动作的准确生成提供了保障。
本实施例中,通过执行目标动作得到反馈信息(用于更新当前控制策略),以便于确定目标动作对应的回报值以及自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征,从而更新当前控制策略。
示例性地,反馈信息可以包括但不限于:自动驾驶车辆执行目标动作后的行驶信息(如行驶速度或行驶位置等),自动驾驶车辆在下一时刻的行驶信息和自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当目标动作为换道时,反馈信息还可以包括:执行目标动作的时间与历史平均时间的比值,以及自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,历史平均时间包括自动驾驶车辆在预设历史时间段内(例如500时间窗口)执行同类动作(如换道动作)的平均时间。
示例性地，自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况可以是根据自动驾驶车辆和自动驾驶车辆感知范围内各个车道的障碍物在自动驾驶车辆换道前后的运动信息（例如，自动驾驶车辆换道前所在车道上前后相邻的两障碍物之间的平均间隔gap_cur，自动驾驶车辆换道前所在车道上障碍物的平均行驶速度V_cur，自动驾驶车辆换道后所在车道上前后相邻的两障碍物之间的平均间隔gap_goal，自动驾驶车辆换道后所在车道上障碍物的平均行驶速度V_goal），以及预设全局分类模型F_0确定的。
图12为本申请另一实施例提供的自动换道方法的流程示意图。在上述实施例的基础上,本申请实施例对“根据反馈信息更新当前控制策略得到下一时刻的控制策略”的可实现方式进行介绍。如图12所示,本申请实施例的方法可以包括:
S1201、根据反馈信息计算目标动作对应的回报值,以及自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
本步骤中，可以根据反馈信息中的自动驾驶车辆在下一时刻的行驶信息和自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息，计算自动驾驶车辆在当前时刻的下一时刻的局部邻居特征以及全局统计特征。具体的计算方式可以参考上述步骤S1001中关于获取自动驾驶车辆在该历史时刻的局部邻居特征以及全局统计特征的方式，本申请实施例中对此不再赘述。
本申请实施例的下述部分对“根据反馈信息计算目标动作对应的回报值”的可实现方式进行介绍。
一种可能的实现方式,当目标动作为保持直行时,根据自动驾驶车辆执行目标动作后的行驶信息计算回报值。
示例性地，根据预设函数R(s”)和自动驾驶车辆执行目标动作后的行驶信息s”（如行驶速度或距离等）计算回报值。例如，预设函数R(s”)=V_ego'，V_ego'代表自动驾驶车辆执行目标动作后的行驶速度；当然，预设函数R(s”)还可以等于包括自动驾驶车辆执行目标动作后的行驶信息的其它函数，本申请实施例中对此并不作限制。
另一种可能的实现方式,当目标动作为换道时,根据自动驾驶车辆执行目标动作后的行驶信息、执行目标动作的时间与历史平均时间的比值,以及自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算回报值。
本实现方式中，根据执行目标动作的时间T与历史平均时间T_e的比值确定局部回报系数K_l。进一步地，根据自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况确定全局回报系数K_g，其中，当自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稠密时，K_g>1；当自动驾驶车辆换道后所在车道比自动驾驶车辆换道前所在车道稀疏时，K_g<1。进一步地，根据自动驾驶车辆执行目标动作后的行驶信息、局部回报系数K_l和全局回报系数K_g，计算回报值。
示例性地，根据公式c*K_l*K_g*R(s”)计算回报值；其中，c代表预设折扣因子（例如0.3），R(s”)代表包含自动驾驶车辆执行目标动作后的行驶信息的预设函数；当然，还可通过上述公式的其它等效或变形公式计算回报值，本申请实施例中对此并不作限制。
若当前时刻的自动驾驶车辆的局部邻居特征还包括：导航目标车道与自动驾驶车辆所在车道之间的位置信息flag，以及自动驾驶车辆距离沿行驶方向的下一个路口之间的距离dist2goal；其中，flag∈{0,-1,1}，flag等于0代表自动驾驶车辆在导航目标车道上，flag等于-1代表导航目标车道在自动驾驶车辆所在车道的左侧，flag等于1代表导航目标车道在自动驾驶车辆所在车道的右侧；则根据执行目标动作的时间T与历史平均时间T_e的比值确定局部回报系数K_l，并进一步根据自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况确定第一全局回报系数K_g1。
进一步地，根据导航目标车道与自动驾驶车辆所在车道之间的位置信息flag和目标动作确定第二全局回报系数K_g2。示例性地，可以根据flag、目标动作a以及自动驾驶车辆换道前后所在车道的平均间隔，按照预设公式确定第二全局回报系数K_g2；其中，gap_cur代表自动驾驶车辆换道前所在车道上前后相邻的两障碍物之间的平均间隔，gap_goal代表自动驾驶车辆换道后所在车道上前后相邻的两障碍物之间的平均间隔，a代表目标动作。当然，还可通过其它等效或变形公式计算第二全局回报系数K_g2，本申请实施例中对此并不作限制。
进一步地，根据自动驾驶车辆执行目标动作后的行驶信息、局部回报系数K_l、第一全局回报系数K_g1和第二全局回报系数K_g2，计算回报值。
示例性地，根据公式c*K_l*K_g1*K_g2*R(s”)计算回报值；其中，c代表预设折扣因子（例如0.3），R(s”)代表包含自动驾驶车辆执行目标动作后的行驶信息的预设函数。
例如，预设函数R(s”)可以是V_ego'、flag'和dist2goal'的函数；其中，V_ego'代表自动驾驶车辆执行目标动作后的行驶速度，flag'代表导航目标车道与自动驾驶车辆执行目标动作后所在车道之间的位置信息，flag'等于0代表自动驾驶车辆执行目标动作后在导航目标车道上，flag'等于-1代表导航目标车道在自动驾驶车辆执行目标动作后所在车道的左侧，flag'等于1代表导航目标车道在自动驾驶车辆执行目标动作后所在车道的右侧；dist2goal'代表自动驾驶车辆执行目标动作后距离沿行驶方向的下一个路口之间的距离；当然，预设函数R(s”)还可以等于包括自动驾驶车辆执行目标动作后的行驶信息的其它函数，本申请实施例中对此并不作限制。
当然,还可通过上述公式的其它等效或变形公式计算回报值,本申请实施例中对此并不作限制。
当然,根据反馈信息还可通过其它方式计算目标动作对应的回报值,本申请实施例中对此并不作限制。
步骤S1202、确定当前时刻的四元组信息。
本步骤中,根据自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征、当前时刻的目标动作、上述步骤S1201中所计算得到的目标动作对应的回报值、自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征,确定出当前时刻的四元组信息。
示例性地,当前时刻的四元组信息对应当前时刻车况,可以包括:当前时刻的特征、自动驾驶车辆在当前时刻的目标动作、目标动作对应的回报值以及当前时刻的下一时刻的特征,当前时刻的特征包括自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,当前时刻的下一时刻的特征包括自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
S1203、根据当前时刻的四元组信息对当前控制策略进行更新得到下一时刻的控制策略。
一种可能的实现方式,当目标动作为直行时,根据当前时刻的四元组信息,生成四元组信息对应的目标值;进一步地,利用梯度下降法对包含目标值的第一预设函数中的参数θ进行迭代更新;进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,得到当前时刻的下一时刻的控制策略。
本实现方式中,当目标动作为直行时,根据当前时刻的四元组信息可以采用如下公式,生成四元组信息对应的目标值y。
示例性地，y=r+γ·max_{a}Q(s',a,θ)；其中，r代表当前时刻的四元组信息中目标动作对应的回报值；γ代表预设遗忘因子，γ∈(0,1)；Q(s',a,θ)代表动作价值函数；max_{a}Q(s',a,θ)代表遍历a使Q(s',a,θ)取最大值；s'代表当前时刻的下一时刻的特征。
当然,根据当前时刻的四元组信息,还可通过上述公式的其它变形或者等效公式生成四元组信息对应的目标值,本申请实施例中对此并不作限制。
进一步地，利用梯度下降法对包含目标值y的第一预设函数(y-Q(s,a,θ))^2中的参数θ进行迭代更新；其中，Q(s,a,θ)为当前时刻的四元组信息对应的动作价值函数，s代表当前时刻的四元组信息中的当前时刻的局部邻居特征和全局统计特征，a代表当前时刻的四元组信息中的当前时刻的目标动作。
进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,从而得到当前时刻的下一时刻的控制策略,以便于用于确定下一时刻的目标动作。
另一种可能的实现方式,当目标动作为换道时,获取当前时刻的延伸四元组信息;进一步地,根据当前时刻的四元组信息和当前时刻的延伸四元组信息对当前控制策略进行更新得到当前时刻的下一时刻的控制策略。
本申请实施例中涉及的当前时刻的延伸四元组信息对应当前时刻延伸车况，该当前时刻延伸车况是通过对当前时刻车况进行对称规则和单调规则处理得到的。
本申请实施例中涉及的对称规则是指以自动驾驶车辆所在车道为轴,将自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换。
本申请实施例中涉及的单调原则是指将自动驾驶车辆换道的目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围。
示例性地,当前时刻的延伸四元组信息可以包括:当前时刻的对称四元组信息和单调四元组信息。例如,当前时刻的对称四元组信息可以是对当前时刻的四元组信息进行对称原则处理得到的,当前时刻的单调四元组信息可以是对当前时刻的四元组信息进行单调原则处理得到的。
具体的,当前时刻的对称四元组信息和单调四元组信息的构造方式,可以参考本申请上述关于“任意时刻的对称四元组信息和单调四元组信息”的构造方式,此处不再赘述。
本实现方式中，当目标动作为换道时，通过获取当前时刻的延伸四元组信息，并根据当前时刻的四元组信息和当前时刻的延伸四元组信息中的第i个四元组信息(s_i, a_i, r_i, s_i')，可以采用如下公式生成第i个四元组信息对应的目标值y_i；其中，i为取遍不大于n的正整数，n为当前时刻的四元组信息和当前时刻的延伸四元组信息中包括的四元组信息总数。
示例性地，y_i=r_i+γ·max_{a_i}Q(s_i',a_i,θ)；其中，γ代表预设遗忘因子，γ∈(0,1)；Q(s_i',a_i,θ)代表动作价值函数；max_{a_i}Q(s_i',a_i,θ)代表遍历a_i使Q(s_i',a_i,θ)取最大值；s_i'代表第i个四元组信息中的后一个时刻的特征。
当然,根据当前时刻的四元组信息和当前时刻的延伸四元组信息中的第i个四元组信息,还可通过上述公式的其它变形或者等效公式生成第i个四元组信息对应的目标值,本申请实施例中对此并不作限制。
进一步地，利用梯度下降法对包含第i个四元组信息对应的目标值y_i的第二预设函数(y_i-Q(s_i,a_i,θ))^2中的参数θ进行迭代更新；其中，Q(s_i,a_i,θ)为第i个四元组信息对应的动作价值函数，s_i代表第i个四元组信息中的前一个时刻的特征，a_i代表第i个四元组信息中前一个时刻的目标动作。
进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,从而得到当前时刻的下一时刻的控制策略,以便于用于确定下一时刻的目标动作。
另一种可能的实现方式,当目标动作为保持直行时,根据当前时刻的四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息对当前控制策略进行更新得到当前时刻的下一时刻的控制策略。
本申请实施例中的历史时刻的四元组信息对应历史时刻车况,可以包括但不限于:历史时刻的特征、历史时刻的自动驾驶车辆的目标动作(即在历史时刻根据对应的控制策略所确定的目标动作)、历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征。
示例性地,该历史时刻的特征可以包括但不限于:自动驾驶车辆在该历史时刻的局部邻居特征和全局统计特征。
示例性地,该历史时刻的下一时刻的特征可以包括但不限于:自动驾驶车辆在该下一时刻的局部邻居特征和全局统计特征。
本申请实施例中涉及的历史时刻的延伸四元组信息对应历史时刻延伸车况，该历史时刻延伸车况是通过对历史时刻车况进行对称规则和单调规则处理得到的。
本申请实施例中涉及的对称规则是指以自动驾驶车辆所在车道为轴,将自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换。
本申请实施例中涉及的单调原则是指将自动驾驶车辆换道的目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围。
示例性地,历史时刻的延伸四元组信息可以包括:历史时刻的对称四元组信息和单调四元组信息。例如,历史时刻的对称四元组信息可以是对历史时刻的四元组信息进行对称原则处理得到的,历史时刻的单调四元组信息可以是对历史时刻的四元组信息进行单调原则处理得到的。
具体的,历史时刻的对称四元组信息和单调四元组信息的构造方式,可以参考本 申请上述关于“任意时刻的对称四元组信息和单调四元组信息”的构造方式,此处不再赘述。
本实现方式中，当目标动作为保持直行时，根据所述当前时刻的四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中的第j个四元组信息(s_j, a_j, r_j, s_j')，可以采用如下公式生成第j个四元组信息对应的目标值y_j；其中，j为取遍不大于m的正整数，m为当前时刻的四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中包括的四元组信息总数。
示例性地，y_j=r_j+γ·max_{a_j}Q(s_j',a_j,θ)；其中，γ代表预设遗忘因子，γ∈(0,1)；Q(s_j',a_j,θ)代表动作价值函数；max_{a_j}Q(s_j',a_j,θ)代表遍历a_j使Q(s_j',a_j,θ)取最大值；s_j'代表第j个四元组信息中的后一个时刻的特征。
当然,根据当前时刻的四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中的第j个四元组信息,还可通过上述公式的其它变形或者等效公式生成第j个四元组信息对应的目标值,本申请实施例中对此并不作限制。
进一步地，利用梯度下降法对包含第j个四元组信息对应的目标值y_j的第三预设函数(y_j-Q(s_j,a_j,θ))^2中的参数θ进行迭代更新；其中，Q(s_j,a_j,θ)为第j个四元组信息对应的动作价值函数，s_j代表第j个四元组信息中的前一个时刻的特征，a_j代表第j个四元组信息中前一个时刻的目标动作。
进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,从而得到下一时刻的控制策略,以便于用于确定下一时刻的目标动作。
另一种可能的实现方式,当目标动作为换道时,获取当前时刻的延伸四元组信息;进一步地,根据当前时刻的四元组信息、当前时刻的延伸四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息,对当前控制策略进行更新得到当前时刻的下一时刻的控制策略。
本申请实施例中涉及的当前时刻的延伸四元组信息对应当前时刻延伸车况，该当前时刻延伸车况是通过对当前时刻车况进行对称规则和单调规则处理得到的。
本申请实施例中涉及的对称规则是指以自动驾驶车辆所在车道为轴,将自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换。
本申请实施例中涉及的单调原则是指将自动驾驶车辆换道的目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围。
示例性地,当前时刻的延伸四元组信息可以包括:当前时刻的对称四元组信息和单调四元组信息。例如,当前时刻的对称四元组信息可以是对当前时刻的四元组信息进行对称原则处理得到的,当前时刻的单调四元组信息可以是对当前时刻的四元组信息进行单调原则处理得到的。
具体的,当前时刻的对称四元组信息和单调四元组信息的构造方式,可以参考本 申请上述关于“任意时刻的对称四元组信息和单调四元组信息”的构造方式,此处不再赘述。
本申请实施例中的历史时刻的四元组信息对应历史时刻车况,可以包括但不限于:历史时刻的特征、历史时刻的自动驾驶车辆的目标动作(即在历史时刻根据对应的控制策略所确定的目标动作)、历史时刻的目标动作对应的回报值以及该历史时刻的下一时刻的特征。
示例性地,该历史时刻的特征可以包括但不限于:自动驾驶车辆在该历史时刻的局部邻居特征和全局统计特征。
示例性地,该历史时刻的下一时刻的特征可以包括但不限于:自动驾驶车辆在该下一时刻的局部邻居特征和全局统计特征。
本申请实施例中涉及的历史时刻的延伸四元组信息对应历史时刻延伸车况，该历史时刻延伸车况是通过对历史时刻车况进行对称规则和单调规则处理得到的。
示例性地,历史时刻的延伸四元组信息可以包括:历史时刻的对称四元组信息和单调四元组信息。例如,历史时刻的对称四元组信息可以是对历史时刻的四元组信息进行对称原则处理得到的,历史时刻的单调四元组信息可以是对历史时刻的四元组信息进行单调原则处理得到的。
具体的,历史时刻的对称四元组信息和单调四元组信息的构造方式,可以参考本申请上述关于“任意时刻的对称四元组信息和单调四元组信息”的构造方式,此处不再赘述。
本实现方式中，当目标动作为换道时，通过获取当前时刻的延伸四元组信息，并根据当前时刻的四元组信息、当前时刻的延伸四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中的第k个四元组信息(s_k, a_k, r_k, s_k')，可以采用如下公式生成第k个四元组信息对应的目标值y_k；其中，k为取遍不大于p的正整数，p为当前时刻的四元组信息、当前时刻的延伸四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中包括的四元组信息总数。
示例性地，y_k=r_k+γ·max_{a_k}Q(s_k',a_k,θ)；其中，γ代表预设遗忘因子，γ∈(0,1)；Q(s_k',a_k,θ)代表动作价值函数；max_{a_k}Q(s_k',a_k,θ)代表遍历a_k使Q(s_k',a_k,θ)取最大值；s_k'代表第k个四元组信息中的后一个时刻的特征。
当然,根据当前时刻的四元组信息、当前时刻的延伸四元组信息、历史时刻的四元组信息和历史时刻的延伸四元组信息中的第k个四元组信息,还可通过上述公式的其它变形或者等效公式生成第k个四元组信息对应的目标值,本申请实施例中对此并不作限制。
进一步地，利用梯度下降法对包含第k个四元组信息对应的目标值y_k的第四预设函数(y_k-Q(s_k,a_k,θ))^2中的参数θ进行迭代更新；其中，Q(s_k,a_k,θ)为第k个四元组信息对应的动作价值函数，s_k代表第k个四元组信息中的前一个时刻的特征，a_k代表第k个四元组信息中前一个时刻的目标动作。
进一步地,将迭代更新后的参数θ替换当前控制策略中的参数θ,从而得到当前时刻的下一时刻的控制策略,以便于用于确定下一时刻的目标动作。
当然,根据当前时刻的四元组信息,还可通过其它方式对当前控制策略进行更新得到下一时刻的控制策略,本申请实施例中对此并不作限制。
图13为本申请实施例提供的训练数据示意图。图13示出了本申请实施例提供的控制策略的训练方法在训练过程中得到的不同阶段的控制策略在四种不同交通流（例如，稀疏场景、普通场景、拥塞场景以及非常拥塞的场景）中测试的表现趋势。图13中的横坐标代表整个训练过程的迭代次数（单位：10000次），纵坐标代表自动驾驶车辆在固定长度的车道上行驶完全程的所用时间（单位：秒）。其中，红色曲线代表仅用局部邻居特征作为输入训练（方案一）的收敛趋势，蓝色曲线代表在局部邻居特征的基础上进一步加入全局统计特征后进行训练（方案二）的趋势，绿色曲线代表计算回报值时引入局部回报系数和全局回报系数后的训练（方案三）收敛趋势。可以看出，采用全局统计特征后控制策略的表现明显提升；当计算回报值时引入局部回报系数和全局回报系数后，不仅可以加速收敛，同时能够增强控制策略的表现。
表1为本申请实施例提供的训练数据示意表。如表1所示,分别在稀疏场景,普通场景,拥塞场景以及非常拥塞的场景与相关方案进行比较。可见,本方案比相关方案在平均速度、平均换道次数这两项指标上均取得优势。同时,我们也统计了一些看似局部不合理,实际长程合理的换道行为,称之为“软换道”,发现我们的方案存在一定的“软换道”比例,说明我们的模型更具长远的智能性。
表1为本申请实施例提供的训练数据示意表
图14为本申请一实施例提供的自动换道装置的结构示意图。如图14所示,本实施例提供的自动换道装置140可以包括:计算模块1401、获取模块1402以及执行模块1403。
其中,计算模块1401,用于根据自动驾驶车辆当前时刻的行驶信息以及所述自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算所述自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征;所述局部邻居特征用于表示所述自动驾驶车辆的特定的邻居障碍物相对于所述自动驾驶车辆的运动状态信息;所述全局统计特征用于表示所述感知范围内各个车道的障碍物的稀疏与稠密程度;
获取模块1402,用于根据所述局部邻居特征、所述全局统计特征和当前控制策略 获取目标动作指示,所述目标动作指示用于指示所述自动驾驶车辆执行目标动作,所述目标动作至少包括两类:换道或保持直行;
执行模块1403,用于根据所述目标动作指示执行所述目标动作。
在一种可能的实现方式中,所述装置还包括:
反馈模块,用于通过执行所述目标动作得到反馈信息,所述反馈信息用于更新所述当前控制策略;其中,所述反馈信息包括所述自动驾驶车辆执行所述目标动作后的行驶信息,所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当所述目标动作为换道时,所述反馈信息还包括:执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,所述历史平均时间包括所述自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
更新模块,用于根据所述反馈信息更新所述当前控制策略得到下一时刻的控制策略。
在一种可能的实现方式中,所述更新模块包括:
计算单元,用于根据所述反馈信息计算所述目标动作对应的回报值,以及所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
确定单元,用于确定当前时刻的四元组信息;其中,所述当前时刻的四元组信息对应当前时刻车况,包括:所述当前时刻的特征、所述目标动作、所述目标动作对应的回报值以及所述下一时刻的特征,所述当前时刻的特征包括所述自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,所述下一时刻的特征包括所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
更新单元,用于根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略。
在一种可能的实现方式中,当所述目标动作为保持直行时,所述更新单元具体用于:
根据所述当前时刻的四元组信息,生成所述四元组信息对应的目标值;
利用梯度下降法对包含所述目标值的第一预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
在一种可能的实现方式中,当所述目标动作为换道时,所述更新单元具体用于:
获取所述当前时刻的延伸四元组信息,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,其中所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的,所述对称规则是指以所述自动驾驶车辆所在车道为轴,将所述自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换;所述单调规则是指将所述换道的目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围;
根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略。
在一种可能的实现方式中,所述更新单元具体用于:
根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中的第i个四元组信息,生成所述第i个四元组信息对应的目标值;其中,所述i为取遍不大于n的正整数,n为所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含所述第i个四元组信息对应的目标值的第二预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
在一种可能的实现方式中,当所述目标动作为保持直行时,所述更新单元具体用于:
根据所述当前时刻的四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略;
其中,所述历史时刻的四元组信息对应历史时刻车况,包括:所述历史时刻的特征、所述历史时刻的目标动作、所述历史时刻的目标动作对应的回报值以及所述历史时刻的下一时刻的特征,所述历史时刻的特征包括所述自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,所述历史时刻的下一时刻的特征包括所述自动驾驶车辆在历史时刻的下一时刻的局部邻居特征和全局统计特征;所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,所述更新单元具体用于:
根据所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中的第j个四元组信息,生成所述第j个四元组信息对应的目标值;其中,所述j为取遍不大于m的正整数,m为所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含所述第j个四元组信息对应的目标值的第三预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
在一种可能的实现方式中,当所述目标动作为换道时,所述更新单元具体用于:
获取所述当前时刻的延伸四元组信息;其中,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的;
根据所述当前时刻的四元组信息、所述当前时刻的延伸四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略;其中,所述历史时刻的四元组信息对应历史时刻车况,所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中，所述更新单元具体用于：
根据所述当前时刻的四元组信息、所述当前时刻的延伸四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中的第k个四元组信息,生成所述第k个四元组信息对应的目标值;其中,所述k为取遍不大于p的正整数,p为所述当前时刻的四元组信息、所述当前时刻的延伸四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中包括的四元组信息总数;
利用梯度下降法对包含所述第k个四元组信息对应的目标值的第四预设函数中的参数θ进行迭代更新;
将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
在一种可能的实现方式中,当所述目标动作为保持直行时,所述计算单元具体用于:
根据所述自动驾驶车辆执行所述目标动作后的行驶信息计算所述回报值;
根据所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,当所述目标动作为换道时,所述计算单元具体用于:
根据所述自动驾驶车辆执行所述目标动作后的行驶信息、所述执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算所述回报值;
根据所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息,计算所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征。
在一种可能的实现方式中,所述自动驾驶车辆的特定的邻居障碍物包括以下至少一项:所述自动驾驶车辆所在车道上与所述自动驾驶车辆相邻的前后障碍物、所述自动驾驶车辆所在车道的相邻左车道上与所述自动驾驶车辆相邻的前后障碍物、所述自动驾驶车辆所在车道的相邻右车道上与所述自动驾驶车辆相邻的前后障碍物;
其中,当所述自动驾驶车辆位于左边道时,所述自动驾驶车辆所在车道的相邻左车道上与所述自动驾驶车辆相邻的前后障碍物,相对于所述自动驾驶车辆的运动状态信息为默认值;和/或,
当所述自动驾驶车辆位于右边道时,所述自动驾驶车辆所在车道的相邻右车道上与所述自动驾驶车辆相邻的前后障碍物,相对于所述自动驾驶车辆的运动状态信息为默认值。
在一种可能的实现方式中,所述自动驾驶车辆当前时刻的全局车流统计特征包括以下至少一项:所述感知范围内各个车道所有障碍物的平均行驶速度以及平均间隔。
本申请实施例提供的自动换道装置140,可以用于执行本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
图15为本申请另一实施例提供的自动换道装置的结构示意图。如图15所示,本实施例提供的自动换道装置150可以包括:处理器1501和存储器1502。
其中,所述存储器1502,用于存储程序指令;
所述处理器1501,用于调用并执行所述存储器1502中存储的程序指令,当所述处理器1501执行所述存储器1502存储的程序指令时,所述自动换道装置用于执行本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
可以理解的是,图15仅仅示出了自动换道装置的简化设计。在其他的实施方式中,自动换道装置还可以包含任意数量的处理器、存储器和/或通信单元等,本申请实施例中对此并不作限制。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种程序,所述程序在被处理器执行时用于执行本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
在一些实施例中,所公开的方法可以实施为以机器可读格式被编码在计算机可读存储介质上的或者被编码在其它非瞬时性介质或者制品上的计算机程序指令。图16为本申请实施例提供的计算机程序产品的概念性局部视图,图16示意性地示出根据这里展示的至少一些实施例而布置的示例计算机程序产品的概念性局部视图,所述示例计算机程序产品包括用于在计算设备上执行计算机进程的计算机程序。在一个实施例中,示例计算机程序产品600是使用信号承载介质601来提供的。所述信号承载介质601可以包括一个或多个程序指令602,其当被一个或多个处理器运行时可以实现本申请上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
在一些示例中,信号承载介质601可以包含计算机可读介质603,诸如但不限于硬盘驱动器、紧密盘(CD)、数字视频光盘(DVD)、数字磁带、存储器、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等等。在一些实施方式中,信号承载介质601可以包含计算机可记录介质604,诸如但不限于存储器、读/写(R/W)CD、R/W DVD、等等。在一些实施方式中,信号承载介质601可以包含通信介质605,诸如但不限于数字和/或模拟通信介质(例如,光纤电缆、波导、有线通信链路、无线通信链路、等等)。因此,例如,信号承载介质601可以由无线形式的通信介质605(例如,遵守IEEE 802.11标准或者其它传输协议的无线通信介质)来传达。一个或多个程序指令602可以是,例如,计算机可执行指令或者逻辑实施指令。在一些示例中,计算设备可以被配置为响应于通过计算机可读介质603、计算机可记录介质604、和/或通信介质605中的一个或多个传达到计算设备的程序指令602,提供各种操作、功能、或者动作。应该理解,这里描述的布置仅仅是用于示例的目的。因而,本领域技术人员将理解,其它布置和其它元素(例如,机器、接口、功能、顺序、和功能组等等)能够被取而代之地使用,并且一些元素可以根据所期望的结果而一并省略。另外,所描述的元素中的许多是可以被实现为离散的或者分布式的组件的、或者以任何适当的组合和位置来结合其它组件实施的功能实体。
图17为本申请一实施例提供的控制策略的训练装置的结构示意图。如图17所示,本实施例提供的控制策略的训练装置170可以包括:第一获取模块1701以及更新模块1702。
其中,第一获取模块1701,用于执行步骤A:获取预设数量个历史时刻的四元组信息,其中,所述历史时刻的四元组信息对应历史时刻车况,包括:所述历史时刻的特征、所述历史时刻的自动驾驶车辆的目标动作、所述历史时刻的目标动作对应的回报值以及所述历史时刻的下一时刻的特征,所述历史时刻的特征包括所述自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,所述历史时刻的下一时刻的特征包括所述自动驾驶车辆在所述历史时刻的下一时刻的局部邻居特征和全局统计特征;
更新模块1702,用于执行步骤B:根据至少一个第一历史时刻的四元组信息、所述至少一个第一历史时刻的延伸四元组信息,以及至少一个第二历史时刻的四元组信息,对当前控制策略进行更新得到下一时刻的控制策略;
其中,所述步骤A和步骤B的循环执行次数达到预设次数,或者所述步骤A和步骤B循环执行多次直至更新后的控制策略满足预设条件时停止;所述步骤A和步骤B循环执行多次最终得到的控制策略用于自动换道装置在执行自动换道方法时获取目标动作指示;
其中,所述至少一个第一历史时刻的四元组信息为所述预设数量个历史时刻的四元组信息中历史时刻的目标动作为换道所对应的历史时刻的四元组信息;所述至少一个第二历史时刻的四元组信息为所述预设数量个历史时刻的四元组信息中除所述至少一个第一历史时刻的四元组信息之外的其它历史时刻的四元组信息;任意所述第一历史时刻的延伸四元组信息对应第一历史时刻延伸车况,所述第一历史时刻延伸车况是对第一历史时刻车况进行对称规则和单调规则处理得到的。
在一种可能的实现方式中,所述更新模块1702,包括:
生成单元,用于根据所述至少一个第一历史时刻的四元组信息、所述至少一个第一历史时刻的延伸四元组信息,以及所述至少一个第二历史时刻的四元组信息中的第l个四元组信息,生成所述第l个四元组信息对应的目标值;其中,所述l为取遍不大于q的正整数,q为所述至少一个第一历史时刻的四元组信息、所述至少一个第一历史时刻的延伸四元组信息,以及所述至少一个第二历史时刻的四元组信息中包括的四元组信息总数;
更新单元,用于利用梯度下降法对包含所述第l个四元组信息对应的目标值的预设函数中的参数θ进行迭代更新;
替换单元,用于将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
在一种可能的实现方式中,所述装置还包括:
第一计算模块,用于对于每个历史时刻,根据自动驾驶车辆的行驶信息以及所述自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算所述自动驾驶车辆在所述历史时刻的局部邻居特征以及全局统计特征;
第二获取模块,用于根据所述历史时刻的局部邻居特征、全局统计特征和所述历史时刻的控制策略获取所述历史时刻的目标动作指示,所述目标动作指示用于指示所述自动驾驶车辆执行目标动作,所述目标动作至少包括两类:换道或保持直行;
反馈模块,用于通过执行所述目标动作得到反馈信息;其中,所述反馈信息包括所述 自动驾驶车辆执行所述目标动作后的行驶信息,所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当所述目标动作为换道时,所述反馈信息还包括:执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,所述历史平均时间包括所述自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
第二计算模块,用于根据所述反馈信息计算所述目标动作对应的回报值,以及所述自动驾驶车辆在所述历史时刻的下一时刻的局部邻居特征以及全局车流统计特征;
存储模块,用于存储所述历史时刻的四元组信息。
在一种可能的实现方式中,当所述目标动作为保持直行时,所述第二计算模块具体用于:
根据所述自动驾驶车辆执行所述目标动作后的行驶信息计算所述回报值。
在一种可能的实现方式中,当所述目标动作为换道时,所述第二计算模块具体用于:
根据所述自动驾驶车辆执行所述目标动作后的行驶信息、所述执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况,计算所述回报值。
本申请实施例提供的控制策略的训练装置170,可以用于执行本申请上述控制策略的训练方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
图18为本申请另一实施例提供的控制策略的训练装置的结构示意图。如图18所示,本实施例提供的控制策略的训练装置180可以包括:处理器1801和存储器1802;
其中,所述存储器1802,用于存储程序指令;
所述处理器1801,用于调用并执行所述存储器1802中存储的程序指令,当所述处理器1801执行所述存储器1802存储的程序指令时,所述控制策略的训练装置用于执行本申请上述控制策略的训练方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
可以理解的是,图18仅仅示出了控制策略的训练装置的简化设计。在其他的实施方式中,控制策略的训练装置还可以包含任意数量的处理器、存储器和/或通信单元等,本申请实施例中对此并不作限制。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行本申请上述控制策略的训练方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种程序,所述程序在被处理器执行时用于执行本申请上述控制策略的训练方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行本申请上述控制策略的训练方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
示例性地,本申请实施例提供的计算机程序产品的概念性局部视图可以参考图16所示,此处不再赘述。
本申请实施例还提供一种芯片,该芯片包括处理器与数据接口,该处理器通过该数据接口读取存储器上存储的指令,执行上述控制策略的训练方法实施例或上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
可选地,作为一种实现方式,该芯片还可以包括存储器,该存储器中存储有指令,该处理器用于执行该存储器上存储的指令,当该指令被执行时,该处理器用于执行上述控制策略的训练方法实施例或上述自动换道方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种电子设备,该电子设备包括上述自动换道装置实施例中提供的自动换道装置。
本申请实施例还提供一种电子设备,该电子设备包括上述控制策略的训练装置实施例中提供的控制策略的训练装置。
本申请实施例中涉及的处理器可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
本申请实施例中涉及的存储器可以是非易失性存储器,比如硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)等,还可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM)。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在上述各实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产 品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。

Claims (20)

  1. 一种自动换道方法,其特征在于,包括:
    根据自动驾驶车辆当前时刻的行驶信息以及所述自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算所述自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征;所述局部邻居特征用于表示所述自动驾驶车辆的特定的邻居障碍物相对于所述自动驾驶车辆的运动状态信息;所述全局统计特征用于表示所述感知范围内各个车道的障碍物的稀疏与稠密程度;
    根据所述局部邻居特征、所述全局统计特征和当前控制策略获取目标动作指示,所述目标动作指示用于指示所述自动驾驶车辆执行目标动作,所述目标动作至少包括两类:换道或保持直行;
    根据所述目标动作指示执行所述目标动作。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    通过执行所述目标动作得到反馈信息,所述反馈信息用于更新所述当前控制策略;其中,所述反馈信息包括所述自动驾驶车辆执行所述目标动作后的行驶信息,所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当所述目标动作为换道时,所述反馈信息还包括:执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,所述历史平均时间包括所述自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
    根据所述反馈信息更新所述当前控制策略得到下一时刻的控制策略。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述反馈信息更新所述当前控制策略得到下一时刻的控制策略包括:
    根据所述反馈信息计算所述目标动作对应的回报值,以及所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
    确定当前时刻的四元组信息;其中,所述当前时刻的四元组信息对应当前时刻车况,包括:所述当前时刻的特征、所述目标动作、所述目标动作对应的回报值以及所述下一时刻的特征,所述当前时刻的特征包括所述自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,所述下一时刻的特征包括所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
    根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略。
  4. 根据权利要求3所述的方法,其特征在于,当所述目标动作为保持直行时,所述根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    根据所述当前时刻的四元组信息,生成所述四元组信息对应的目标值;
    利用梯度下降法对包含所述目标值的第一预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  5. 根据权利要求3所述的方法,其特征在于,当所述目标动作为换道时,所述根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    获取所述当前时刻的延伸四元组信息,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,其中所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的,所述对称规则是指以所述自动驾驶车辆所在车道为轴,将所述自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换;所述单调规则是指将所述换道的目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围;
    根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中的第i个四元组信息,生成所述第i个四元组信息对应的目标值;其中,所述i为取遍不大于n的正整数,n为所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中包括的四元组信息总数;
    利用梯度下降法对包含所述第i个四元组信息对应的目标值的第二预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  7. 根据权利要求3所述的方法,其特征在于,当所述目标动作为保持直行时,所述根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    根据所述当前时刻的四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略;
    其中,所述历史时刻的四元组信息对应历史时刻车况,包括:所述历史时刻的特征、所述历史时刻的目标动作、所述历史时刻的目标动作对应的回报值以及所述历史时刻的下一时刻的特征,所述历史时刻的特征包括所述自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,所述历史时刻的下一时刻的特征包括所述自动驾驶车辆在历史时刻的下一时刻的局部邻居特征和全局统计特征;所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述当前时刻的四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    根据所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的 延伸四元组信息中的第j个四元组信息,生成所述第j个四元组信息对应的目标值;其中,所述j为取遍不大于m的正整数,m为所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中包括的四元组信息总数;
    利用梯度下降法对包含所述第j个四元组信息对应的目标值的第三预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  9. 根据权利要求3所述的方法,其特征在于,当所述目标动作为换道时,所述根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略,包括:
    获取所述当前时刻的延伸四元组信息;其中,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的;
    根据所述当前时刻的四元组信息、所述当前时刻的延伸四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略;其中,所述历史时刻的四元组信息对应历史时刻车况,所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
  10. 一种自动换道装置,其特征在于,包括:
    计算模块,用于根据自动驾驶车辆当前时刻的行驶信息以及所述自动驾驶车辆感知范围内各个车道的障碍物的运动信息,计算所述自动驾驶车辆当前时刻的局部邻居特征以及全局统计特征;所述局部邻居特征用于表示所述自动驾驶车辆的特定的邻居障碍物相对于所述自动驾驶车辆的运动状态信息;所述全局统计特征用于表示所述感知范围内各个车道的障碍物的稀疏与稠密程度;
    获取模块,用于根据所述局部邻居特征、所述全局统计特征和当前控制策略获取目标动作指示,所述目标动作指示用于指示所述自动驾驶车辆执行目标动作,所述目标动作至少包括两类:换道或保持直行;
    执行模块,用于根据所述目标动作指示执行所述目标动作。
  11. 根据权利要求10所述的装置,其特征在于,所述装置还包括:
    反馈模块,用于通过执行所述目标动作得到反馈信息,所述反馈信息用于更新所述当前控制策略;其中,所述反馈信息包括所述自动驾驶车辆执行所述目标动作后的行驶信息,所述自动驾驶车辆在下一时刻的行驶信息和所述自动驾驶车辆感知范围内各个车道的障碍物在下一时刻的运动信息;当所述目标动作为换道时,所述反馈信息还包括:执行所述目标动作的时间与历史平均时间的比值,以及所述自动驾驶车辆换道后所在车道与换道前所在车道在稀疏与稠密程度上的变化情况;其中,所述历史平均时间包括所述自动驾驶车辆在预设历史时间段内执行同类动作的平均时间;
    更新模块,用于根据所述反馈信息更新所述当前控制策略得到下一时刻的控制策略。
  12. 根据权利要求11所述的装置,其特征在于,所述更新模块包括:
    计算单元,用于根据所述反馈信息计算所述目标动作对应的回报值,以及所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
    确定单元,用于确定当前时刻的四元组信息;其中,所述当前时刻的四元组信息对应当前时刻车况,包括:所述当前时刻的特征、所述目标动作、所述目标动作对应的回报值以及所述下一时刻的特征,所述当前时刻的特征包括所述自动驾驶车辆在当前时刻的局部邻居特征和全局统计特征,所述下一时刻的特征包括所述自动驾驶车辆在下一时刻的局部邻居特征和全局统计特征;
    更新单元,用于根据所述当前时刻的四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略。
  13. 根据权利要求12所述的装置,其特征在于,当所述目标动作为保持直行时,所述更新单元具体用于:
    根据所述当前时刻的四元组信息,生成所述四元组信息对应的目标值;
    利用梯度下降法对包含所述目标值的第一预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  14. 根据权利要求12所述的装置,其特征在于,当所述目标动作为换道时,所述更新单元具体用于:
    获取所述当前时刻的延伸四元组信息,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,其中所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的,所述对称规则是指以所述自动驾驶车辆所在车道为轴,将所述自动驾驶车辆所在车道的左右两侧所有车道上障碍物的位置进行对称变换;所述单调规则是指将所述换道的目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离增大,和/或,非目标车道上的所述自动驾驶车辆的前后邻居障碍物之间的距离改变小于预设距离范围;
    根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略。
  15. 根据权利要求14所述的装置,其特征在于,所述更新单元具体用于:
    根据所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中的第i个四元组信息,生成所述第i个四元组信息对应的目标值;其中,所述i为取遍不大于n的正整数,n为所述当前时刻的四元组信息和所述当前时刻的延伸四元组信息中包括的四元组信息总数;
    利用梯度下降法对包含所述第i个四元组信息对应的目标值的第二预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  16. 根据权利要求12所述的装置,其特征在于,当所述目标动作为保持直行时,所述更新单元具体用于:
    根据所述当前时刻的四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息对所述当前控制策略进行更新得到所述下一时刻的控制策略;
    其中,所述历史时刻的四元组信息对应历史时刻车况,包括:所述历史时刻的特征、所述历史时刻的目标动作、所述历史时刻的目标动作对应的回报值以及所述历史时刻的下一时刻的特征,所述历史时刻的特征包括所述自动驾驶车辆在历史时刻的局部邻居特征和全局统计特征,所述历史时刻的下一时刻的特征包括所述自动驾驶车辆在历史时刻的下一时刻的局部邻居特征和全局统计特征;所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
  17. 根据权利要求16所述的装置,其特征在于,所述更新单元具体用于:
    根据所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中的第j个四元组信息,生成所述第j个四元组信息对应的目标值;其中,所述j为取遍不大于m的正整数,m为所述当前时刻的四元组信息、所述历史时刻的四元组信息和所述历史时刻的延伸四元组信息中包括的四元组信息总数;
    利用梯度下降法对包含所述第j个四元组信息对应的目标值的第三预设函数中的参数θ进行迭代更新;
    将迭代更新后的参数θ替换所述当前控制策略中的参数θ,得到所述下一时刻的控制策略。
  18. 根据权利要求12所述的装置,其特征在于,当所述目标动作为换道时,所述更新单元具体用于:
    获取所述当前时刻的延伸四元组信息;其中,所述当前时刻的延伸四元组信息对应当前时刻延伸车况,所述当前时刻延伸车况是对所述当前时刻车况进行对称规则和单调规则处理得到的;
    根据所述当前时刻的四元组信息、所述当前时刻的延伸四元组信息、历史时刻的四元组信息和所述历史时刻的延伸四元组信息,对所述当前控制策略进行更新得到所述下一时刻的控制策略;其中,所述历史时刻的四元组信息对应历史时刻车况,所述历史时刻的延伸四元组信息对应历史时刻延伸车况,所述历史时刻延伸车况是对所述历史时刻车况进行对称规则和单调规则处理得到的。
  19. 一种自动换道装置,其特征在于,包括:处理器和存储器;
    其中,所述存储器,用于存储程序指令;
    所述处理器,用于调用并执行所述存储器中存储的程序指令,当所述处理器执行所述存储器存储的程序指令时,所述自动换道装置用于执行如权利要求1至9中任一项所述的方法。
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行如权利要求1至9中任一项所述的方法。
PCT/CN2020/090234 2019-05-21 2020-05-14 自动换道方法、装置及存储介质 WO2020233495A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20809674.3A EP3965004A4 (en) 2019-05-21 2020-05-14 AUTOMATIC LANE CHANGE METHOD AND DEVICE AND STORAGE MEDIA
US17/532,640 US20220080972A1 (en) 2019-05-21 2021-11-22 Autonomous lane change method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910426248.7 2019-05-21
CN201910426248.7A CN110532846B (zh) 2019-05-21 2019-05-21 自动换道方法、装置及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/532,640 Continuation US20220080972A1 (en) 2019-05-21 2021-11-22 Autonomous lane change method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2020233495A1 true WO2020233495A1 (zh) 2020-11-26

Family

ID=68659298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090234 WO2020233495A1 (zh) 2019-05-21 2020-05-14 自动换道方法、装置及存储介质

Country Status (4)

Country Link
US (1) US20220080972A1 (zh)
EP (1) EP3965004A4 (zh)
CN (2) CN110532846B (zh)
WO (1) WO2020233495A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018216364B4 (de) * 2018-09-25 2020-07-09 Volkswagen Aktiengesellschaft Verfahren und Vorrichtung zum Unterstützen eines Spurwechselvorgangs für ein Fahrzeug
CN110532846B (zh) * 2019-05-21 2022-09-16 华为技术有限公司 自动换道方法、装置及存储介质
CN113009539A (zh) * 2021-02-19 2021-06-22 恒大新能源汽车投资控股集团有限公司 一种车辆自动变道处理方法、车辆及设备
CN113257027B (zh) * 2021-07-16 2021-11-12 深圳知帮办信息技术开发有限公司 针对连续变道行为的导航控制系统
CN114013443B (zh) * 2021-11-12 2022-09-23 哈尔滨工业大学 一种基于分层强化学习的自动驾驶车辆换道决策控制方法
CN114701500B (zh) * 2022-03-30 2023-06-13 小米汽车科技有限公司 车辆变道方法、装置及介质
CN116653963B (zh) * 2023-07-31 2023-10-20 福思(杭州)智能科技有限公司 车辆变道控制方法、系统和智能驾驶域控制器

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391504A (zh) * 2014-11-25 2015-03-04 浙江吉利汽车研究院有限公司 基于车联网的自动驾驶控制策略的生成方法与生成装置
US20160082953A1 (en) * 2012-05-14 2016-03-24 Google Inc. Consideration of Risks in Active Sensing for an Autonomous Vehicle
CN107539313A (zh) * 2016-06-23 2018-01-05 本田技研工业株式会社 车辆通信网络以及其使用和制造方法
CN108227710A (zh) * 2017-12-29 2018-06-29 商汤集团有限公司 自动驾驶控制方法和装置、电子设备、程序和介质
CN108313054A (zh) * 2018-01-05 2018-07-24 北京智行者科技有限公司 自动驾驶自主换道决策方法和装置及自动驾驶车辆
CN108583578A (zh) * 2018-04-26 2018-09-28 北京领骏科技有限公司 用于自动驾驶车辆的基于多目标决策矩阵的车道决策方法
CN110532846A (zh) * 2019-05-21 2019-12-03 华为技术有限公司 自动换道方法、装置及存储介质

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4313568C1 (de) * 1993-04-26 1994-06-16 Daimler Benz Ag Verfahren zur Leithilfe für einen Fahrspurwechsel durch ein Kraftfahrzeug
US7940961B2 (en) * 2007-12-20 2011-05-10 The United States Of America As Represented By The Secretary Of The Navy Method for enhancing ground-based detection of a moving object
CN105329238B (zh) * 2015-12-04 2017-08-08 北京航空航天大学 一种基于单目视觉的自动驾驶汽车换道控制方法
CN108431549B (zh) * 2016-01-05 2020-09-04 御眼视觉技术有限公司 具有施加的约束的经训练的系统
US10788836B2 (en) * 2016-02-29 2020-09-29 AI Incorporated Obstacle recognition method for autonomous robots
US11016497B2 (en) * 2016-03-28 2021-05-25 Honda Motor Co., Ltd. Vehicle control system, vehicle control method, and vehicle control program
DE102016216135A1 (de) * 2016-08-29 2018-03-01 Bayerische Motoren Werke Aktiengesellschaft Spurwechselassistenzsystem und -verfahren zum automatisierten Durchführen mehrfacher Spurwechsel
KR102513185B1 (ko) * 2017-01-12 2023-03-23 모빌아이 비젼 테크놀로지스 엘티디. 규칙 기반 항법
US10671076B1 (en) * 2017-03-01 2020-06-02 Zoox, Inc. Trajectory prediction of third-party objects using temporal logic and tree search
US10133275B1 (en) * 2017-03-01 2018-11-20 Zoox, Inc. Trajectory generation using temporal logic and tree search
CN106991846B (zh) * 2017-05-15 2020-04-21 东南大学 一种车联网环境下的高速公路车辆强制换道控制方法
US10942256B2 (en) * 2017-06-05 2021-03-09 Metawave Corporation Intelligent metamaterial radar for target identification
EP3764060A1 (en) * 2017-06-14 2021-01-13 Mobileye Vision Technologies Ltd. Fusion framework of navigation information for autonomous navigation
US11093829B2 (en) * 2017-10-12 2021-08-17 Honda Motor Co., Ltd. Interaction-aware decision making
US10739776B2 (en) * 2017-10-12 2020-08-11 Honda Motor Co., Ltd. Autonomous vehicle policy generation
EP3642092A2 (en) * 2018-03-20 2020-04-29 Mobileye Vision Technologies Ltd. Systems and methods for navigating a vehicle
US20210312725A1 (en) * 2018-07-14 2021-10-07 Moove.Ai Vehicle-data analytics
WO2020018394A1 (en) * 2018-07-14 2020-01-23 Moove.Ai Vehicle-data analytics
CN109085837B (zh) * 2018-08-30 2023-03-24 阿波罗智能技术(北京)有限公司 车辆控制方法、装置、计算机设备及存储介质
JP2020091611A (ja) * 2018-12-04 2020-06-11 富士通株式会社 行動決定プログラム、行動決定方法、および行動決定装置
CN109737977A (zh) * 2018-12-10 2019-05-10 北京百度网讯科技有限公司 自动驾驶车辆定位方法、装置及存储介质
CN109582022B (zh) * 2018-12-20 2021-11-02 驭势科技(北京)有限公司 一种自动驾驶策略决策系统与方法
CN111382768B (zh) * 2018-12-29 2023-11-14 华为技术有限公司 多传感器数据融合方法和装置
CN109557928A (zh) * 2019-01-17 2019-04-02 湖北亿咖通科技有限公司 基于矢量地图和栅格地图的自动驾驶车辆路径规划方法
US11242050B2 (en) * 2019-01-31 2022-02-08 Honda Motor Co., Ltd. Reinforcement learning with scene decomposition for navigating complex environments
CN109739246B (zh) * 2019-02-19 2022-10-11 阿波罗智能技术(北京)有限公司 一种变换车道过程中的决策方法、装置、设备及存储介质
US11279361B2 (en) * 2019-07-03 2022-03-22 Toyota Motor Engineering & Manufacturing North America, Inc. Efficiency improvement for machine learning of vehicle control using traffic state estimation
US20210049465A1 (en) * 2019-08-12 2021-02-18 University Of Southern California Self-optimizing and self-programming computing systems: a combined compiler, complex networks, and machine learning approach
US11809977B2 (en) * 2019-11-14 2023-11-07 NEC Laboratories Europe GmbH Weakly supervised reinforcement learning
EP4268142A1 (en) * 2020-12-22 2023-11-01 Telefonaktiebolaget LM Ericsson (publ) Methods and apparatuses of determining for controlling a multi-agent reinforcement learning environment
KR20230085697A (ko) * 2021-12-07 2023-06-14 현대자동차주식회사 차량 제어 장치, 및 그를 이용한 차량 제어 방법

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160082953A1 (en) * 2012-05-14 2016-03-24 Google Inc. Consideration of Risks in Active Sensing for an Autonomous Vehicle
CN104391504A (zh) * 2014-11-25 2015-03-04 浙江吉利汽车研究院有限公司 基于车联网的自动驾驶控制策略的生成方法与生成装置
CN107539313A (zh) * 2016-06-23 2018-01-05 本田技研工业株式会社 车辆通信网络以及其使用和制造方法
CN108227710A (zh) * 2017-12-29 2018-06-29 商汤集团有限公司 自动驾驶控制方法和装置、电子设备、程序和介质
CN108313054A (zh) * 2018-01-05 2018-07-24 北京智行者科技有限公司 自动驾驶自主换道决策方法和装置及自动驾驶车辆
CN108583578A (zh) * 2018-04-26 2018-09-28 北京领骏科技有限公司 用于自动驾驶车辆的基于多目标决策矩阵的车道决策方法
CN110532846A (zh) * 2019-05-21 2019-12-03 华为技术有限公司 自动换道方法、装置及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3965004A4

Also Published As

Publication number Publication date
US20220080972A1 (en) 2022-03-17
EP3965004A1 (en) 2022-03-09
EP3965004A4 (en) 2023-01-25
CN110532846B (zh) 2022-09-16
CN110532846A (zh) 2019-12-03
CN115578711A (zh) 2023-01-06

Similar Documents

Publication Publication Date Title
WO2020233495A1 (zh) 自动换道方法、装置及存储介质
CN109901574B (zh) 自动驾驶方法及装置
EP3835908B1 (en) Automatic driving method, training method and related apparatuses
US20210262808A1 (en) Obstacle avoidance method and apparatus
CN110379193B (zh) 自动驾驶车辆的行为规划方法及行为规划装置
WO2021102955A1 (zh) 车辆的路径规划方法以及车辆的路径规划装置
WO2022027304A1 (zh) 一种自动驾驶车辆的测试方法及装置
WO2022001773A1 (zh) 轨迹预测方法及装置
WO2021000800A1 (zh) 道路可行驶区域推理方法及装置
WO2021244207A1 (zh) 训练驾驶行为决策模型的方法及装置
WO2021212379A1 (zh) 车道线检测方法及装置
CN110371132B (zh) 驾驶员接管评估方法及装置
WO2022017307A1 (zh) 自动驾驶场景生成方法、装置及系统
WO2021168669A1 (zh) 车辆控制方法及装置
WO2022142839A1 (zh) 一种图像处理方法、装置以及智能汽车
WO2022000127A1 (zh) 一种目标跟踪方法及其装置
US20230399023A1 (en) Vehicle Driving Intention Prediction Method, Apparatus, and Terminal, and Storage Medium
WO2022016901A1 (zh) 一种规划车辆行驶路线的方法以及智能汽车
US20230048680A1 (en) Method and apparatus for passing through barrier gate crossbar by vehicle
CN113552867A (zh) 一种运动轨迹的规划方法及轮式移动设备
US20230107033A1 (en) Method for optimizing decision-making regulation and control, method for controlling traveling of vehicle, and related apparatus
CN113741384A (zh) 检测自动驾驶系统的方法和装置
WO2021254000A1 (zh) 车辆纵向运动参数的规划方法和装置
US20230256970A1 (en) Lane Change Track Planning Method and Apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20809674

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020809674

Country of ref document: EP

Effective date: 20211202