CN114675655A

CN114675655A - Vehicle control method and device

Info

Publication number: CN114675655A
Application number: CN202210404156.0A
Authority: CN
Inventors: 徐鑫
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2022-04-18
Filing date: 2022-04-18
Publication date: 2022-06-28

Abstract

The invention discloses a vehicle control method and device, and relates to the technical field of intelligent driving. One embodiment of the method comprises: determining a plurality of lanes in an intersection control area, one or more conflict subareas corresponding to the lanes, and a plurality of vehicles to be controlled; determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree search algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle; the target vehicle is a vehicle in a lane closest to the conflict partition and/or a vehicle with the minimum arrival time aiming at the same conflict partition; determining the running speed of the vehicle to be controlled according to the passing sequence and the positions of one or more conflict subareas in the intersection control area; and controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed. The embodiment reduces the calculation time of the search tree, improves the efficiency of determining the passing sequence and further improves the efficiency of vehicle passing in the intersection control area.

Description

Vehicle control method and device

Technical Field

The invention relates to the technical field of intelligent driving, in particular to a vehicle control method and device.

Background

In existing right-of-way allocation for urban intersections, the optimal passing order is obtained by searching for the optimal node on a search tree.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

The existing right-of-way allocation method needs to exhaustively traverse all nodes in the search tree and calculate the objective function value corresponding to each node in order to find the optimal passing order corresponding to the optimal node. When intersection traffic is heavy and there are many lanes, the calculation time increases exponentially with the number of vehicles, so not all nodes can be traversed to find the optimal node within a limited time, and the real-time requirement cannot be met.

Disclosure of Invention

In view of this, embodiments of the present invention provide a vehicle control method and apparatus, which can use a target vehicle meeting a condition as a heuristic strategy of a Monte Carlo tree search algorithm, determine a passing order of a plurality of vehicles to be controlled in an intersection control area according to the Monte Carlo tree search algorithm, and determine a running speed of the vehicles to be controlled, so as to control the vehicles to be controlled to pass through the intersection control area. By combining the Monte Carlo tree search algorithm with the heuristic strategy, the traversal direction of the search tree can be guided quickly and accurately, the optimal passing order no longer has to be obtained by exhaustively traversing all nodes, and the tree search efficiency is improved. The problem of overlong calculation time caused by an increase in vehicles and lanes is also solved, the real-time requirement of vehicle passing is met, and the passing efficiency of the vehicles in the intersection control area is further improved.

Furthermore, the running speed of the vehicle to be controlled is determined according to the constructed time fitting model and the speed fitting model, and the vehicle to be controlled is controlled according to the running speed, so that the reasonable control of the vehicle is realized, and the vehicle passing efficiency of the intersection control area is improved.

To achieve the above object, according to a first aspect of an embodiment of the present invention, there is provided a vehicle control method including:

determining a plurality of lanes in an intersection control area, one or more conflict subareas corresponding to the lanes and a plurality of vehicles to be controlled;

determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree search algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle; the target vehicle is a vehicle in a lane closest to the conflict subarea and/or a vehicle with the minimum arrival time for the same conflict subarea;

determining the running speed of the vehicle to be controlled according to the passing sequence and the positions of the one or more conflict subareas in the intersection control area;

and controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed.

Optionally, the determining, by using a monte carlo tree search algorithm, a passing order of the plurality of vehicles to be controlled corresponding to the intersection control area includes:

constructing a root node and a first-level child node of a search tree according to the identifications of the vehicles to be controlled and the number of the vehicles; one of the primary child nodes corresponds to a vehicle identification;

expanding a root node of the search tree by using the Monte Carlo tree search algorithm according to the identification of the target vehicle to generate one or more leaf nodes; the leaf nodes indicate the corresponding passing sequence of a plurality of vehicles to be controlled;

and determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area according to the passing sequence indicated by the leaf nodes.

Optionally, the expanding a root node of the search tree to generate one or more leaf nodes includes:

circularly executing the following steps until reaching the preset time:

determining the first-level child node with the maximum UCB value as the current child node according to the UCB algorithm of Monte Carlo tree search;

circularly executing the following steps until the determined subordinate node is a leaf node:

and determining a target node as a lower node of the current child node according to the UCB value and/or the identification of the target vehicle by using the Monte Carlo tree search algorithm, and taking the lower node as the current child node.

Optionally, the determining, according to the UCB value and/or the identifier of the target vehicle, the target node as a lower node of the current child node includes:

determining whether a target node exists in a primary child node corresponding to the target vehicle, wherein the target node is not a superior node of the current child node;

and if so, determining the lower node of the current child node from the target node.

Optionally, in the case where there are multiple target nodes, the target node with the maximum UCB value is determined as the lower node of the current child node according to the UCB values of the target nodes.

Optionally, after the determined lower node is a leaf node, the method further includes:

and according to the objective function values corresponding to the leaf nodes, utilizing a back propagation algorithm of Monte Carlo tree search to reversely update the statistical information corresponding to the plurality of superior nodes of the leaf nodes, wherein the statistical information comprises the accumulated scores and the access times.

Optionally, the determining, according to the UCB algorithm of the monte carlo tree search, the first-level child node with the largest UCB value as the current child node includes:

for each level one child node: determining the UCB value of the primary child node according to the statistical information of the primary child node, the statistical information of the root node and the weight parameter;

and taking the primary child node with the maximum UCB value as the current child node.

Optionally, the method further comprises:

taking a vehicle which is located in one lane and is closest to a conflict partition as the target vehicle;

and/or,

taking, from among a plurality of vehicles to be controlled corresponding to the same conflict partition, the vehicle with the fastest arrival time as the target vehicle; the fastest arrival time indicates the time at which the vehicle arrives at the conflict partition at maximum vehicle speed and acceleration.

Optionally, the determining the passing order of the vehicles to be controlled corresponding to the intersection control area includes:

determining an optimal traversal path according to the statistical information of each node of the search tree;

and determining a target leaf node corresponding to the optimal traversal path, and taking a passing sequence corresponding to the target leaf node as a passing sequence of the intersection control area.

Optionally, the determining the running speed of the vehicle to be controlled includes:

aiming at a plurality of vehicles to be controlled corresponding to the same conflict subarea: determining the predicted arrival time of the vehicles to be controlled to arrive at the conflict subarea according to the positions of the conflict subareas in the intersection control areas and the passing sequence;

and determining the running speeds of the vehicles to be controlled according to the predicted arrival time, the vehicle state function and the double-integral dynamic model.

Optionally, the determining the predicted arrival times of the plurality of vehicles to be controlled to the conflict partition includes:

constructing a time fitting model and a plurality of first constraint conditions; the first constraint condition comprises a fastest time constraint, a collision constraint and a collision interval constraint;

and according to the first constraint condition, obtaining estimated arrival times respectively corresponding to the arrival of the vehicles to be controlled to the conflict subareas by utilizing the time fitting model for fitting.

Optionally, the determining the traveling speeds of the vehicles to be controlled according to the predicted arrival time, the vehicle state function and the double-integral dynamics model includes:

constructing a speed fitting model;

for each of the vehicles to be controlled: determining state parameters of the vehicle to be controlled; the state parameters include: vehicle travel time, vehicle travel location, vehicle maximum vehicle speed, vehicle minimum vehicle speed, vehicle constant speed, vehicle maximum acceleration, and vehicle minimum acceleration;

determining the vehicle state function according to the state parameters;

and fitting the running speed of the vehicle to be controlled according to the speed fitting model, the predicted arrival time, the vehicle state function and the double-integral dynamics model.

According to a second aspect of the embodiment of the present invention, there is provided a vehicle control apparatus including a determining module, a searching module and a control module; wherein,

the determining module is used for determining a plurality of lanes in an intersection control area, one or more conflict subareas corresponding to the lanes and a plurality of vehicles to be controlled;

the searching module is used for determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree searching algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle; the target vehicle is a vehicle in a lane closest to the conflict partition and/or a vehicle with the minimum arrival time aiming at the same conflict partition;

the control module is used for determining the running speed of the vehicle to be controlled according to the passing sequence and the positions of the one or more conflict subareas in the intersection control area; and controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed.

According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:

one or more processors;

a storage device for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the vehicle control methods provided in the first aspect above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements any one of the vehicle control methods provided by the first aspect above.

One embodiment of the above invention has the following advantages or benefits: the target vehicle meeting the conditions can be used as a heuristic strategy of a Monte Carlo tree search algorithm, the passing sequence of a plurality of vehicles to be controlled in the intersection control area is determined according to the Monte Carlo tree search algorithm, and the running speed of the vehicles to be controlled is determined, so that the vehicles to be controlled are controlled to pass through the intersection control area; therefore, by combining the Monte Carlo tree search algorithm and the heuristic strategy, the traversal direction of the search tree can be quickly and accurately guided, the condition that the optimal passing sequence can be obtained only by exhaustively traversing all the nodes is avoided, and the tree search efficiency is improved; the problem of overlong calculation time caused by the increase of vehicles and lanes is also solved, the real-time requirement of vehicle passing is met, and the passing efficiency of the vehicles in the intersection control area is further improved.

Furthermore, the running speed of the vehicle to be controlled is determined according to the constructed time fitting model and the speed fitting model, and the vehicle to be controlled is controlled according to the running speed, so that the reasonable control of the vehicle is realized, and the passing efficiency of the vehicle in the intersection control area is improved.

Further effects of the above non-conventional optional implementations will be described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic flow chart diagram of a vehicle control method provided by one embodiment of the present invention;

FIG. 2 is a schematic diagram of a multi-lane intersection control area provided by one embodiment of the present invention;

FIG. 3 is a schematic diagram of a single lane intersection control area provided by one embodiment of the present invention;

FIG. 4 is a diagram of a tree representation of a traffic order provided by one embodiment of the present invention;

FIG. 5 is a diagram illustrating an iterative process of a Monte Carlo tree search according to an embodiment of the present invention;

FIG. 6 is a schematic flow chart diagram of a vehicle control method provided by another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a vehicle control apparatus according to an embodiment of the present invention;

FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.

As shown in FIG. 1, an embodiment of the present invention provides a vehicle control method, which may include the following steps S101 to S104:

Step S101: determining a plurality of lanes in an intersection control area, one or more conflict subareas corresponding to the lanes, and a plurality of vehicles to be controlled.

Step S102: determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree search algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle; the target vehicle is a vehicle in a lane closest to the conflict subarea and/or a vehicle with the smallest arrival time for the same conflict subarea.

Step S103: determining the running speed of the vehicle to be controlled according to the passing sequence and the positions of the one or more conflict subareas in the intersection control area.

Step S104: controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed.

One application scenario of the invention is the cooperative passing of a plurality of autonomous vehicles at an urban intersection. At a complex multi-lane urban intersection, the autonomous vehicles can use wireless communication and sensing technologies to interact with surrounding vehicles and roadside equipment, share their state information (such as vehicle position and vehicle speed), and inform the system of their planned passing routes. An intelligent vehicle-road cooperative system calculates the passing order of the autonomous vehicles at the urban intersection to obtain the optimal passing order, and the autonomous vehicles are then controlled to pass through the urban intersection by controlling their running speeds according to the optimal passing order.

A schematic diagram of a multi-lane urban intersection is shown in FIG. 2. The intersection control area includes multiple lanes, and the vehicles running in the intersection control area are the vehicles to be controlled. Vehicles travelling in different directions may collide when passing through the crossing regions formed by the lanes in the figure; these crossing regions are the conflict subareas. In FIG. 2 there are four directions with three lanes each; the lanes are numbered lane 1, lane 2, lane 3 … lane 12, and the 36 conflict subareas formed by these 12 lanes are numbered conflict subarea 1, conflict subarea 2 … conflict subarea 36 according to their positions in the intersection control area.

In addition, the number of the vehicles to be controlled in the intersection control area can be determined in real time through a sensing technology, and meanwhile, the vehicles to be controlled can be numbered so as to obtain the vehicle identification of the vehicles in the system. Of course, the key information (such as the license plate number) of the vehicle obtained by using the wireless communication technology can also be used as the identification of the vehicle.

The passing efficiency of a plurality of vehicles in the urban intersection control area is mainly determined by the passing order of the vehicles to be controlled. The possible passing orders form the solution space of a search tree, and each node of the search tree represents a feasible passing order. The passing order represents the priority between vehicles passing through the same conflict subarea. For example, if two vehicles would pass through a conflict subarea at the same time, the vehicle that is earlier in the passing order passes first. Taking the single intersection scene shown in FIG. 3 as an example, the control area includes lanes 1, 2, 3, and 4 and five vehicles with vehicle identifications A, B, C, D, E. The passing order BD indicates that when vehicle D passes through conflict subarea 2, vehicle D needs to yield, and the time at which vehicle D reaches conflict subarea 2 is adjusted according to the time at which vehicle B reaches conflict subarea 2. However, when the passing order is BC, vehicle C can still enter conflict subarea 1 at its fastest arrival time because there is no conflict between vehicle C and vehicle B. In other words, the objective function values corresponding to the passing orders BC and CB are the same. Subject to the passing order, each vehicle reaches its conflict subarea as fast as possible. Thus two non-conflicting vehicles may pass through the intersection at the same time, and a vehicle later in the passing order may pass through the intersection before another vehicle with which it does not conflict. Further, if a passing order does not include all vehicles in the intersection control area, it is referred to as a partial passing order. For example, FIG. 3 contains 5 vehicles in total, so the passing order BC is a partial passing order.
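The yielding rule described above can be sketched as follows. This is only a minimal illustration and not the time fitting model of the invention: the fastest arrival times, the assignment of one conflict subarea per vehicle, and the fixed `gap` between successive vehicles in the same subarea are all hypothetical values chosen for the example.

```python
def arrival_times(order, fastest, subarea, gap=2.0):
    """Assign each vehicle an arrival time at its conflict subarea.

    order   -- passing order, e.g. "BD" (earlier means higher priority)
    fastest -- vehicle id -> fastest possible arrival time (hypothetical)
    subarea -- vehicle id -> conflict subarea it passes (hypothetical)
    gap     -- assumed minimum interval between vehicles in one subarea
    """
    last_in_subarea = {}  # subarea -> arrival time of the latest scheduled vehicle
    times = {}
    for vid in order:
        t = fastest[vid]
        zone = subarea[vid]
        if zone in last_in_subarea:
            # A higher-priority vehicle uses the same subarea: yield to it.
            t = max(t, last_in_subarea[zone] + gap)
        times[vid] = t
        last_in_subarea[zone] = t
    return times


# Hypothetical data: B and D share conflict subarea 2, C uses subarea 1.
fastest = {"B": 5.0, "C": 5.5, "D": 5.0}
subarea = {"B": 2, "C": 1, "D": 2}

print(arrival_times("BD", fastest, subarea))  # D is delayed behind B
print(arrival_times("BC", fastest, subarea))  # no conflict, both at fastest time
print(arrival_times("CB", fastest, subarea))  # same times as for BC
```

Under these assumptions the orders BC and CB yield identical arrival times, which is why their objective function values coincide, whereas in the order BD vehicle D is pushed back behind vehicle B.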

Taking the traffic scenario in FIG. 3 as an example, the tree representation of the passing order may be as shown in FIG. 4. First, the root node is set to Root, representing an empty passing order. Each first-level child node of the root node then consists of one character (A, B, C, D), indicating the first vehicle in the passing order. On this basis, each node of the third layer is a character string composed of two characters (e.g., CA, CB, CD, CE). Similarly, each child node extends its own lower nodes (e.g., CBA, CBD, CBE, CBDA, CBDE, etc.) until the deepest level of the search tree is reached, i.e., the leaf nodes (e.g., CBDAE, CBDEA, etc.). The character string of a leaf node (e.g., CBDAE) corresponds to a complete passing order. Thus, the leaf nodes of the search tree together represent all feasible passing orders.

It will be appreciated that the number of nodes of such a search tree will grow exponentially as the number of vehicles increases. In order to avoid traversing the search tree completely by means of exhaustive traversal to find the optimal passage order and improve the traversal efficiency, the search tree may be expanded in the following manner provided by an embodiment of the present invention: constructing a root node and a first-level child node of a search tree according to the identifications of the vehicles to be controlled and the number of the vehicles; one of the primary child nodes corresponds to a vehicle identification; expanding a root node of the search tree by using the Monte Carlo tree search algorithm according to the identification of the target vehicle to generate one or more leaf nodes; the leaf nodes indicate the corresponding passing sequence of a plurality of vehicles to be controlled; and determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area according to the passing sequence indicated by the leaf nodes.
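The growth in tree size noted above can be made concrete with a short count. The sketch below assumes an unpruned tree in which any vehicle may follow any other, so depth k holds n!/(n-k)! nodes; the function name `search_tree_size` is illustrative only.

```python
import math


def search_tree_size(num_vehicles):
    """Count the nodes of the unpruned passing-order tree, root included.

    Depth k holds every ordered selection of k distinct vehicles,
    i.e. n! / (n - k)! nodes.
    """
    return sum(math.perm(num_vehicles, k) for k in range(num_vehicles + 1))


for n in (5, 8, 12):
    # Columns: number of vehicles, number of leaf nodes, total nodes.
    print(n, math.factorial(n), search_tree_size(n))
```

For the five vehicles of FIG. 3 the unpruned tree has 120 leaf nodes and 326 nodes in total; the count rises steeply with each additional vehicle, which is what makes exhaustive traversal impractical in real time.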

The process of the Monte Carlo tree search algorithm may be as shown in FIG. 5. Each iteration of the algorithm can comprise four parts: selection, expansion, simulation and backpropagation. The specific implementation is explained in detail in the following embodiments.

It can be understood that, in order to improve the efficiency of the Monte Carlo tree search algorithm, the following method provided by an embodiment of the present invention may be adopted as the heuristic strategy of the Monte Carlo tree search algorithm: taking a vehicle which is located in one lane and is closest to a conflict partition as the target vehicle; and/or taking, from among a plurality of vehicles to be controlled corresponding to the same conflict partition, the vehicle with the fastest arrival time as the target vehicle; the fastest arrival time indicates the time at which the vehicle arrives at the conflict partition at maximum vehicle speed and acceleration.
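One way to read the fastest arrival time is as the travel time obtained by accelerating at the maximum acceleration until the maximum speed is reached and then cruising. The sketch below follows that reading; the kinematic model, the parameter values and the function names are assumptions for illustration and are not taken from the invention.

```python
import math


def fastest_arrival_time(distance, v0, v_max, a_max):
    """Time to cover `distance` (m) from speed v0 (m/s), accelerating at
    a_max (m/s^2) and never exceeding v_max. Assumes v0 <= v_max, a_max > 0."""
    # Distance needed to reach v_max from v0 under constant acceleration.
    d_acc = (v_max**2 - v0**2) / (2.0 * a_max)
    if distance <= d_acc:
        # Still accelerating on arrival: solve d = v0*t + a*t^2/2 for t.
        return (-v0 + math.sqrt(v0**2 + 2.0 * a_max * distance)) / a_max
    # Accelerate to v_max, then cruise over the remaining distance.
    return (v_max - v0) / a_max + (distance - d_acc) / v_max


def pick_target(vehicles, v_max=10.0, a_max=2.0):
    """Return the id of the vehicle with the smallest fastest arrival time.

    vehicles maps vehicle id -> (distance to conflict partition, current speed).
    """
    return min(
        vehicles,
        key=lambda vid: fastest_arrival_time(*vehicles[vid], v_max, a_max),
    )


# Hypothetical vehicles approaching the same conflict partition.
print(pick_target({"B": (25.0, 0.0), "D": (45.0, 0.0)}))
```

With these hypothetical numbers vehicle B needs 5 s and vehicle D needs 7 s, so B would be taken as the target vehicle for that conflict partition.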

The target vehicle may be the vehicle in a given lane that is closest to the conflict subarea. With this heuristic strategy, unreasonable passing orders can be pruned quickly. Since vehicles in the same lane pass through the conflict subarea in sequence, unreasonable passing orders in FIG. 3 (such as EC, ED, EB, EA, etc.) can be pruned, and therefore E does not appear as a first-level child node in FIG. 4.
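The same-lane pruning can be sketched as a filter on the children of a node. The example assumes that vehicle E drives directly behind vehicle C in the same lane; this is consistent with CE appearing among the third-layer nodes while E is excluded from the first level, but it is not stated explicitly, and the names `children`, `leaves` and `LANE_PREDECESSOR` are illustrative.

```python
def children(order, vehicles, lane_predecessor):
    """Return the child passing orders of `order` that respect lane order.

    lane_predecessor maps a vehicle to the vehicle directly ahead of it
    in the same lane (no entry if it is first in its lane).
    """
    result = []
    for vid in vehicles:
        if vid in order:
            continue  # already part of the passing order
        ahead = lane_predecessor.get(vid)
        if ahead is not None and ahead not in order:
            continue  # pruned: the vehicle ahead has not been scheduled yet
        result.append(order + vid)
    return result


def leaves(order, vehicles, lane_predecessor):
    """Enumerate all complete passing orders below `order`."""
    if len(order) == len(vehicles):
        return [order]
    found = []
    for child in children(order, vehicles, lane_predecessor):
        found.extend(leaves(child, vehicles, lane_predecessor))
    return found


VEHICLES = "ABCDE"
LANE_PREDECESSOR = {"E": "C"}  # assumption: E follows C in the same lane

print(children("", VEHICLES, LANE_PREDECESSOR))   # first-level child nodes
print(children("C", VEHICLES, LANE_PREDECESSOR))  # lower nodes of C
print(len(leaves("", VEHICLES, LANE_PREDECESSOR)))
```

Under this assumption the first level contains only A, B, C, D, and the number of complete passing orders drops from 120 to 60.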

In addition, the target vehicle may be the vehicle closest to the conflict subarea among the vehicles in a plurality of lanes, or the vehicle with the fastest arrival time among the plurality of vehicles to be controlled corresponding to the same conflict subarea. In an embodiment of the present invention, the heuristic strategy may serve as the simulation strategy in FIG. 5: the target vehicle is found more accurately and appended to the passing order during simulation. This avoids random sampling generating a large number of different leaf nodes and therefore an excessive number of passing orders, improves the efficiency of the Monte Carlo tree search, and further improves the real-time performance of passing through the intersection control area.

Specifically, the process of expanding the root node of the search tree to generate one or more leaf nodes may be as follows: circularly executing the following steps until reaching the preset time: determining the first-level child node with the maximum UCB value as the current child node according to the UCB algorithm of Monte Carlo tree search; circularly executing the following step until the determined lower node is a leaf node: determining a target node as the lower node of the current child node according to the UCB value and/or the identification of the target vehicle by using the Monte Carlo tree search algorithm, and taking the lower node as the current child node.

It is understood that in one embodiment of the present invention, the passing order represented by the lower node is a passing order character string formed by appending a vehicle identifier represented by a target node to the tail of the passing order indicated by the current child node. For example, the lower node string of C is CB, and the lower node string of CB is CBD.

In each iteration of the Monte Carlo tree search, the root node is used as the starting point, and the tree is expanded downward level by level until the bottom of the search tree is reached, i.e., a leaf node is generated. When determining the lower node of the root node, that is, determining from which first-level child node of the root node to expand downward, the first-level child node with the largest UCB value may be used as the lower node of the root node.

In the Monte Carlo tree search, when a search tree is constructed, each node has own statistical information, namely accumulated scores and access times. The accumulated score is an accumulated score formed after the current node is accessed for multiple times, and the access times are the times of selecting the node as the current child node.

Specifically, when determining the first-level child node with the maximum UCB value in the search tree, the following method provided by an embodiment of the present invention may be adopted: for each level one child node: determining the UCB value of the primary child node according to the statistical information of the primary child node, the statistical information of the root node and the weight parameter; and taking the primary child node with the maximum UCB value as the current child node.

The UCB value is calculated using the UCB formula:

wherein i is a child node, Q_i is the cumulative score in the statistical information of the child node, n_i is the number of accesses of the child node, n is the cumulative score of the parent node of the child node, and C is a weight parameter that may be set to a constant value, such as 2, or
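For reference, the standard UCB1 expression used in Monte Carlo tree search is consistent with the symbols defined above. It is given here as an assumed form only, since the formula itself is not reproduced in the text; in the standard expression n denotes the access count of the parent node, which also matches the later discussion of the root node's access count.

```latex
\mathrm{UCB}_i = \frac{Q_i}{n_i} + C \sqrt{\frac{\ln n}{n_i}}
```

The first term favors child nodes with a high average score, while the second term grows for child nodes that have been accessed rarely, and C weights the balance between the two.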

Taking the intersection traffic scenario of FIG. 3 as an example, the search starts from the root node Root, and the generated first-level child nodes may be A, B, C, D as shown in FIG. 4; E cannot be a first-level child node because vehicles in the same lane must pass in sequence. At this point, the UCB values of the four first-level child nodes A, B, C, D may be calculated. The node with the largest UCB value is taken as the lower node of the root node and becomes the node to be expanded next; this process can be regarded as the selection process in FIG. 5. For example, if the UCB value of C is the largest in this round of selection, C is taken as the selected first-level child node of the root node, and the expansion phase in FIG. 5 is entered.

For a new node that has never been visited (i.e., a node that is not currently in the search tree but is likely to be added to the search tree), the number of visits is 0, and the cumulative score is also 0. As the number of iterations increases, after a plurality of leaf nodes are generated, the statistical information of most of the nodes can be updated using the back propagation algorithm of the monte carlo tree search, and then the UCB value of the node itself can be calculated using the above-mentioned UCB formula.
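The statistics update can be sketched as a walk from a leaf node back to the root. The `Node` class and its field names are illustrative, the score values are hypothetical, and updating the leaf node itself along with its superior nodes is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    order: str                       # partial or complete passing order
    parent: Optional["Node"] = None
    score: float = 0.0               # cumulative score
    visits: int = 0                  # access count


def backpropagate(leaf, value):
    """Add `value` to every node on the path from `leaf` up to the root."""
    node = leaf
    while node is not None:
        node.score += value
        node.visits += 1
        node = node.parent


# Build the path Root -> C -> CB -> CBD -> CBDA -> CBDAE.
root = Node("")
leaf = root
for vid in "CBDAE":
    leaf = Node(leaf.order + vid, parent=leaf)

backpropagate(leaf, 0.8)  # hypothetical objective-derived score
backpropagate(leaf, 0.4)
print(root.visits, round(root.score, 2))
```

After the two updates every node on the path has an access count of 2 and a cumulative score of about 1.2, which are the quantities the UCB calculation then reads.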

It will be appreciated that each traversal of the search tree proceeds downward from the root node. In the first iteration, none of the first-level child nodes A, B, C, D has been accessed, so their access counts are all 0 and the access count of the root node is also 0; the UCB values of A, B, C, D calculated according to the UCB formula are therefore all ∞ (infinity). In this case, one vehicle identifier may be selected at random as the lower node; for example, C may be selected as the node to be expanded, and C is then expanded.
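The selection step, including the infinite value for nodes that have never been accessed and the random choice among equal values, can be sketched as follows. The sketch assumes the standard UCB1 form with the parent's access count under the logarithm; the function names and the statistics in the usage lines are hypothetical.

```python
import math
import random


def ucb(score, visits, parent_visits, c=2.0):
    """UCB value of a child node; nodes never accessed get +infinity."""
    if visits == 0:
        return math.inf
    return score / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select(children_stats, parent_visits, c=2.0, rng=random):
    """Pick the child id with the largest UCB value, ties broken at random.

    children_stats maps child id -> (cumulative score, access count).
    """
    values = {
        cid: ucb(score, visits, parent_visits, c)
        for cid, (score, visits) in children_stats.items()
    }
    best = max(values.values())
    return rng.choice([cid for cid, v in values.items() if v == best])


# First iteration: nothing accessed yet, so one of A, B, C, D is picked at random.
print(select({"A": (0.0, 0), "B": (0.0, 0), "C": (0.0, 0), "D": (0.0, 0)}, 0))

# Later iteration with hypothetical statistics.
print(select({"A": (1.0, 2), "B": (3.0, 2), "C": (0.5, 2), "D": (1.5, 2)}, 8))
```

In the second call all four nodes share the same access count, so the exploration term is identical and the node with the highest average score, B, is selected.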

When the first-level child node C is expanded, the following manner provided by an embodiment of the present invention may be adopted: determining whether a target node exists in a primary child node corresponding to the target vehicle, wherein the target node is not a superior node of the current child node; if yes, determining the lower node of the current child node from the target node. And under the condition that a plurality of target nodes exist, determining the target node with the maximum UCB value as a lower node of the current child node according to the UCB values of the plurality of target nodes.

A target node is determined from the first-level child nodes, and the corresponding lower-level node is formed by appending, to the end of the passing-order string s of the current child node, a first-level child node that is not yet contained in s, as shown in fig. 4. If the lower-level node is not a leaf node, one or more vehicle identifiers are still missing from its string; that is, a lower-level node other than a leaf node represents a partial passing order. It will be appreciated that these newly constructed lower-level nodes may be nodes of the search tree that have not yet been traversed. For example, when a lower-level node is selected for CB and CBA has already been formed, CBA may be selected to continue the downward traversal, or a lower-level node formed from another passable order, such as CBD or CBE, may be selected.

During expansion, the target vehicle (i.e., the heuristic strategy) can be used to continue the simulation until a leaf node is generated. Taking the traffic scenario of fig. 3 as an example, the expansion process shown in fig. 5 expands C. Here C is the current child node, and the target node is selected from the four vehicles A, B, D, E (first-level child nodes) according to the target vehicle of the heuristic strategy. For example, if B in fig. 5 is the vehicle closest to the conflict partition, the target node may be B, and the partial passing order CB corresponding to the selected B is used as the lower-level node of C. If several target nodes satisfy the heuristic strategy at this point, such as B and A, the UCB values of CB and CA may be calculated and the node with the largest UCB value taken as the lower-level node. If CB and CA are both new nodes that have not yet joined the search tree, their UCB values are ∞ (infinity) because they have never been accessed, and one of them can be chosen at random. More generally, whenever the UCB values are equal, one node may be selected at random.
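The expansion step can be illustrated with the sketch below. It rests on several assumptions: the lane layout `LANES` is hypothetical (chosen so that E becomes available only after C, as in the example), passing orders are strings of single-character vehicle identifiers, the UCB1 form is assumed, and the names `admissible_next` and `expand` are illustrative only.

```python
import math
import random

# Hypothetical lane layout, vehicles listed front-to-back within each lane.
LANES = [["A"], ["B"], ["C", "E"], ["D"]]

def admissible_next(order, lanes=LANES):
    """Vehicles that may be appended to `order` without overtaking a vehicle
    ahead of them in the same lane (the front-most unscheduled one per lane)."""
    fronts = []
    for lane in lanes:
        remaining = [v for v in lane if v not in order]
        if remaining:
            fronts.append(remaining[0])
    return fronts

def expand(order, target_vehicles, stats, n_parent, c=1.4, rng=random):
    """Choose the lower-level node of the partial passing order `order`.

    Candidates matching the heuristic (target vehicles) are preferred; among
    them the largest UCB value wins, with ties broken at random.
    """
    candidates = admissible_next(order)
    preferred = [v for v in candidates if v in target_vehicles] or candidates

    def ucb(vehicle):
        q, n = stats.get(order + vehicle, (0.0, 0))
        if n == 0:
            return math.inf  # never accessed
        return q / n + c * math.sqrt(math.log(n_parent) / n)

    best = max(ucb(v) for v in preferred)
    return order + rng.choice([v for v in preferred if ucb(v) == best])

# B is the only target vehicle, so C is expanded to the partial order CB.
print(expand("C", {"B"}, stats={}, n_parent=1))  # CB
```

When no candidate satisfies the heuristic, the sketch falls back to all admissible candidates, so unvisited nodes are still reached.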

After the simulation process of the Monte Carlo tree search is finished, that is, once the determined lower-level node is a leaf node, the statistical information of the nodes can be updated in the following manner provided by one embodiment of the invention: according to the objective function value corresponding to the leaf node, the back propagation algorithm of the Monte Carlo tree search is used to reversely update the statistical information of each of the upper-level nodes of the leaf node, where the statistical information comprises the cumulative score and the number of accesses.

The objective function value corresponding to the leaf node may be generated by an objective function, which is not limited herein. The objective function value is used to evaluate the potential of the upper-level nodes of the leaf node to become the optimal node. The back propagation in fig. 5 passes the objective function value backward along the traversal path and sequentially updates the statistical information of the nodes on the path, i.e., the cumulative score Q_i and the number of accesses n_i.

The plurality of upper nodes may be all upper nodes or part of upper nodes on the traversal path from the leaf node to the root node.

The process of using the Monte Carlo tree search algorithm may be a traversal search on a search tree that has already been constructed; that is, a complete search tree has been built, and the Monte Carlo tree search method is used to find an optimal traversal path in it. In that case, during back propagation, the statistical information of all nodes on the traversal path may be updated up to the root node.

And the plurality of upper nodes may also be part of the upper nodes in the traversal path. It is understood that the process of using the monte carlo tree search algorithm may be a process of finding an optimal traversal path while constructing a search tree. That is, each iteration may determine an expandable node, and a leaf node is generated by simulation based on the expandable node, the simulation process may be a virtual construction process, a plurality of lower nodes and leaf nodes generated in the simulation process may not be stored in the current search tree, and the possibility that the expandable node and a plurality of upper nodes thereof become the optimal nodes may be evaluated only according to the objective function values of the leaf nodes; therefore, the statistical information of a plurality of upper nodes can be updated from the expandable node to the root node in a reverse direction. Since the plurality of nodes between the lower node and the leaf node of the extended node are not stored in the current search tree, evaluation may not be needed, that is, statistical information of the plurality of nodes below the lower node may not be updated. After the evaluation is finished, the nodes generated in the simulation process can be deleted, and the storage space is saved. Therefore, the process of constructing a search tree and traversing the search is realized, the purpose of traversing is to obtain the statistical information of each node, and the current optimal traversal path can be dynamically and continuously found according to the statistical information of the existing nodes.
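The "virtual" simulation described above can be sketched as a function that completes a partial passing order without touching the search tree. The lane layout is hypothetical, and a uniformly random choice stands in for the heuristic target-vehicle rule; both are assumptions made for illustration.

```python
import random

# Hypothetical lane layout, vehicles listed front-to-back within each lane.
LANES = [["A"], ["B"], ["C", "E"], ["D"]]

def rollout(order, lanes=LANES, rng=random):
    """Complete a partial passing order into a full one (a leaf node).

    Only a local string is extended; nothing is stored in the search tree,
    so the simulated intermediate nodes vanish when the function returns.
    """
    total = sum(len(lane) for lane in lanes)
    while len(order) < total:
        fronts = [next(v for v in lane if v not in order)
                  for lane in lanes if any(v not in order for v in lane)]
        order += rng.choice(fronts)  # assumption: random instead of heuristic
    return order

print(rollout("CB"))  # a full order starting with CB
```

Only the objective function value of the returned leaf is needed afterwards, which is why the intermediate nodes need not be kept.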

For example, if the leaf node CBDAE is obtained by the simulation process of fig. 5 and its objective function value is 10, the cumulative score Q_i and the number of accesses n_i of the CB node selected in the expansion stage of fig. 5, of its upper-level node C, and of the root node may be updated: 10 is added to Q_i of each of these nodes, and the number of accesses of each of these nodes is increased by 1. For instance, Q_i of CB is updated from 20 to 30, and at the same time the number of accesses n_i of CB is increased by 1.
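The update in this example can be reproduced with a short sketch. The function name `backpropagate` and the starting statistics of Root and C, as well as the starting access count of CB, are assumptions chosen for illustration; only the change of Q_i for CB from 20 to 30 comes from the example.

```python
def backpropagate(stats, path, value):
    """Add `value` to Q_i and 1 to n_i for every node on `path`.

    stats maps node label -> [Q_i, n_i]; path lists the nodes from the
    expanded node back up to the root.
    """
    for label in path:
        q_n = stats.setdefault(label, [0.0, 0])
        q_n[0] += value
        q_n[1] += 1

# Hypothetical statistics before the update.
stats = {"Root": [50.0, 5], "C": [30.0, 3], "CB": [20.0, 2]}
backpropagate(stats, ["CB", "C", "Root"], 10.0)  # leaf CBDAE scored 10
print(stats["CB"])  # [30.0, 3]
```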

It can be understood that in the early iterations of the Monte Carlo tree search algorithm most nodes have not been accessed and their UCB values are ∞ (infinity). When a current child node is expanded, the selectable lower-level nodes can be determined preferentially according to the heuristic strategy; if no lower-level node satisfies the heuristic strategy, a node that has not been accessed is selected for expansion. As the number of iterations increases, most nodes are thus accessed, and their cumulative scores and access counts are updated by back propagation. In later iterations the UCB values of most nodes can be calculated, so the nodes worth searching can be selected by combining the UCB values with the heuristic strategy, the traversal direction of the search tree is guided better, and a better traversal path is obtained by the time the preset time is reached.

In one embodiment of the invention, when the passing order in the intersection control area is determined by Monte Carlo tree search, the current optimal passing order changes as the number of iterations increases. When the iteration of the Monte Carlo tree search is stopped, either after a plurality of leaf nodes have been determined or after the preset time has been reached, an optimal traversal path can be determined from the node states of the search tree constructed so far, and a reasonable and efficient passing order is thereby determined. The passing order of the vehicles to be controlled corresponding to the intersection control area can be determined in the following manner provided by an embodiment of the invention: determining an optimal traversal path according to the statistical information of each node of the search tree; and determining a target leaf node corresponding to the optimal traversal path, and taking the passing order corresponding to the target leaf node as the passing order of the intersection control area.

The optimal traversal path can be understood as the path obtained by selecting, at each level, the node with the largest number of accesses until a leaf node is reached. For example, in fig. 4, after the search tree has been constructed with the Monte Carlo tree search algorithm, C is selected if it has the most accesses among the first-level child nodes; CB is selected if it has the most accesses among the next-level nodes CA, CB, CD and CE of C; proceeding downward in the same way, CBDA is selected if it has the most accesses among the next-level nodes CBDA and CBDE, and finally the leaf node CBDAE is selected. The passing order corresponding to this leaf node can then be used as the passing order of the intersection control area.
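Extracting the optimal traversal path by access count can be sketched as follows. The tree fragment and all access counts are made-up illustration values, and the function name `best_order` is hypothetical.

```python
def best_order(children, visits, root="Root"):
    """Follow the most-accessed child at every level until a leaf is reached."""
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: visits.get(c, 0))
    return node

# Hypothetical fragment of the search tree of fig. 4 (only the explored part).
children = {
    "Root": ["A", "B", "C", "D"],
    "C": ["CA", "CB", "CD", "CE"],
    "CB": ["CBA", "CBD", "CBE"],
    "CBD": ["CBDA", "CBDE"],
    "CBDA": ["CBDAE"],
}
# Hypothetical access counts n_i.
visits = {"A": 3, "B": 5, "C": 12, "D": 2,
          "CA": 2, "CB": 7, "CD": 2, "CE": 1,
          "CBA": 2, "CBD": 4, "CBE": 1,
          "CBDA": 3, "CBDE": 1, "CBDAE": 3}
print(best_order(children, visits))  # CBDAE
```

Using the access count rather than the average score is a common choice because heavily visited nodes have the most reliable statistics.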

By combining the Monte Carlo tree search algorithm with the heuristic strategy, unreasonable passing orders (i.e., child nodes) can be eliminated quickly while the search tree is traversed, and the potential (UCB value) of the child nodes together with the heuristic strategy guides the traversal direction of the search tree. This shortens the time needed to search for the optimal passing order, meets the real-time requirement of intersection passing, and improves the passing efficiency of the intersection control area.

After determining the traffic order of the intersection control zones, in one embodiment of the invention, the predicted arrival times of a plurality of vehicles to be controlled to reach the corresponding conflict zones can be determined in the following manner: and determining the predicted arrival time of the vehicles to be controlled to arrive at the conflict subarea according to the positions of the conflict subareas in the intersection control areas and the passing sequence.

In one embodiment of the present invention, the predicted arrival time may be determined in the following manner: constructing a time fitting model and a plurality of first constraint conditions; the first constraint condition comprises a fastest time constraint, a collision constraint and a collision interval constraint; and according to the first constraint condition, obtaining estimated arrival times respectively corresponding to the arrival of the vehicles to be controlled to the conflict subareas by utilizing the time fitting model for fitting.

It will be appreciated that the goal of intersection traffic is to maximize traffic efficiency; accordingly, the goal can be set to minimizing the passing delay time of the vehicles to be controlled. In one embodiment of the invention, a time fitting model can therefore be constructed such that the sum of the passing delay times of the vehicles to be controlled is minimized.

When the time fitting model is constructed, each vehicle entering the intersection control area is assigned a unique number V_i, indicating that it is the i-th vehicle to enter the control area. At the same time, Z_i can be defined as the conflict partitions that vehicle V_i will pass through. For example, Z_i = {4, 1} indicates that vehicle V_i passes through conflict partition 4 and conflict partition 1 in sequence. The positions of the conflict partitions in the intersection control area can be determined from the division of the intersection control area into conflict partitions and from the road geometry along which vehicles in the lanes travel.

In one embodiment of the invention, for simplicity, the following conditions may be set for the time-fit model: each vehicle shares own state information (such as position, speed and the like) and route information with surrounding vehicles and road side equipment through a wireless communication technology, and meanwhile, the communication process is instantly finished, so that the problems of packet loss, time delay and the like are avoided; in order to ensure safety, vehicles in the control area are prohibited from changing lanes, namely, the vehicles are prohibited from changing lanes in the same direction when passing through the control area; the speed of the vehicle during the passing of the conflict zone remains unchanged.

Before building the time-fitting model, the following constraints may be built: a fastest time constraint, a collision constraint, and a collision interval constraint.

Due to physical limitations such as speed constraints and acceleration constraints, the time for the vehicle to reach the conflict zone must be greater than or equal to the fastest arrival time, and thus, the fastest time constraint can be expressed as follows:

t_a,i,z≥t_m,i,z

wherein t_a,i,z is the predicted arrival time of vehicle V_i at conflict zone Z, and t_m,i,z is the fastest arrival time of vehicle V_i at conflict zone Z, reached at the maximum vehicle speed and acceleration.
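One way to estimate the fastest arrival time t_m,i,z is plain constant-acceleration kinematics: accelerate at the maximum acceleration until the maximum speed is reached, then cruise. The sketch below assumes exactly that profile; the function name, parameter names and numeric values are hypothetical.

```python
import math

def fastest_arrival_time(t0, distance, v0, v_max, u_max):
    """Earliest time to cover `distance` starting at speed v0 at time t0,
    accelerating at u_max until v_max is reached and then cruising."""
    d_accel = (v_max**2 - v0**2) / (2 * u_max)  # distance needed to reach v_max
    if distance <= d_accel:
        # v_max is never reached: solve distance = v0*t + 0.5*u_max*t^2 for t
        t = (-v0 + math.sqrt(v0**2 + 2 * u_max * distance)) / u_max
    else:
        t = (v_max - v0) / u_max + (distance - d_accel) / v_max
    return t0 + t

# 100 m to the conflict zone: 2 s of acceleration (25 m), then 75 m at 15 m/s.
print(fastest_arrival_time(t0=0.0, distance=100.0, v0=10.0, v_max=15.0, u_max=2.5))  # 7.0
```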

In order to avoid collision between the front vehicle and the rear vehicle on the same lane, a certain safe headway is required to be kept between the front vehicle and the rear vehicle, so that in one embodiment of the invention, collision constraints can be constructed as follows:

t_a,i,z-t_a,j,z≥Δ_j,a

wherein V_i is the rear vehicle, V_j is the front vehicle in the same lane, and Δ_j,a is the minimum safe headway, which depends on the action of the front vehicle. The value of a represents the action of vehicle V_j; for example, 0 represents going straight, 1 represents a left turn, and 2 represents a right turn. Since a left turn usually takes more time, a larger safe headway needs to be set for the left-turn action.

It will be appreciated that the collision constraint requires that a vehicle enter a conflict zone only after the other conflicting vehicle has left that conflict zone.
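For a single conflict zone and a fixed passing order, the fastest time constraint and the headway requirement can be satisfied with a simple forward pass, as sketched below. This is a simplified illustration rather than the time fitting model itself: the headway values in `HEADWAY` and the function name `schedule_zone` are hypothetical.

```python
# Hypothetical minimum safe headways in seconds, indexed by the action a of
# the preceding vehicle (0 = straight, 1 = left turn, 2 = right turn).
HEADWAY = {0: 1.5, 1: 2.5, 2: 2.0}

def schedule_zone(order, fastest, action):
    """Earliest arrival times at one conflict zone for a fixed passing order.

    order   -- vehicle ids in the order they pass the zone
    fastest -- id -> fastest arrival time t_m
    action  -- id -> action code a of that vehicle
    """
    arrival = {}
    previous = None
    for vid in order:
        t = fastest[vid]                      # fastest time constraint
        if previous is not None:              # headway behind the predecessor
            t = max(t, arrival[previous] + HEADWAY[action[previous]])
        arrival[vid] = t
        previous = vid
    return arrival

times = schedule_zone(["C", "B", "A"],
                      fastest={"A": 4.0, "B": 3.0, "C": 2.0},
                      action={"A": 0, "B": 1, "C": 0})
print(times)  # {'C': 2.0, 'B': 3.5, 'A': 6.0}
```

In the example, A is delayed to 6.0 s because its predecessor B turns left and therefore requires the larger headway.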

Because the speed of a vehicle remains constant while it passes through the conflict zones, the time the vehicle takes to travel from one conflict zone to another can be determined simply from the road geometry. Defining Δt_i,z,z′ as the time interval for vehicle V_i to travel from conflict zone Z to conflict zone Z′, a conflict interval constraint may be constructed as follows:

t_a,i,z+Δt_i,z,z′＝t_a,i,z′
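The conflict interval constraint means that the arrival time at the first conflict zone fixes all later arrival times. A minimal sketch, with a hypothetical function name and a made-up travel time of 0.5 s between zones 4 and 1:

```python
def zone_arrival_times(first_arrival, zones, travel_time):
    """Arrival time at every conflict zone in Z_i, given the arrival time at
    the first zone and the fixed travel time between consecutive zones."""
    times = {zones[0]: first_arrival}
    for z, z_next in zip(zones, zones[1:]):
        times[z_next] = times[z] + travel_time[(z, z_next)]
    return times

# Z_i = {4, 1}: the vehicle passes conflict zone 4, then conflict zone 1.
print(zone_arrival_times(6.0, [4, 1], {(4, 1): 0.5}))  # {4: 6.0, 1: 6.5}
```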

After the fastest time constraint, the collision constraint and the collision interval constraint have been constructed, in one embodiment of the invention the time fitting model is constructed by introducing binary variables as follows:

subject to t_a,i,z≥t_m,i,z

t_a,i,z-t_a,j,z≥Δ_j,a

t_a,k,z-t_a,l,z+M·b_k,l≥Δ_l,a

t_a,l,z-t_a,k,z+M·(1-b_k,l)≥Δ_k,a

t_a,i,z+Δt_i,z,z′＝t_a,i,z′

b_k,l∈{0,1}

where n is the total number of vehicles in the control area, M is a large constant, and b_k,l is a binary variable. When b_k,l = 0, vehicle V_l arrives at the conflict zone first; otherwise V_k arrives first. After a passing order is obtained by the Monte Carlo tree search algorithm, the value of b_k,l for any two vehicles V_l and V_k can be determined from that passing order. Z_i(1) is the first element of the set Z_i, and t_a,i,Z_i(1) is the time at which vehicle V_i arrives at its first conflict zone. Because the vehicle passes at a constant speed after reaching its first conflict zone, the times at which it passes through the other conflict zones are then fixed as well. The time fitting model may minimize the sum of the passing delay times of the vehicles in reaching their respective first conflict zones. Fitting with the time fitting model yields the predicted arrival times of the vehicles to be controlled at their respective first conflict zones.
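One plausible way to write the minimized quantity, under the assumption that the passing delay of vehicle V_i is measured as the difference between its predicted and its fastest arrival time at its first conflict zone Z_i(1), is the following. This is an illustrative reading of the description, not a formula taken from it.

```latex
\min \sum_{i=1}^{n} \left( t_{a,i,Z_i(1)} - t_{m,i,Z_i(1)} \right)
```

Once the passing order fixes every b_k,l, one inequality of each big-M pair is always satisfied because of the large constant M, and only the other one restricts the arrival times.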

After the estimated arrival time is determined, the vehicle may be controlled to arrive at the corresponding conflict partition within the estimated arrival time according to a certain driving speed, and in one embodiment of the present invention, the following manner may be used: and determining the running speeds of the vehicles to be controlled according to the predicted arrival time, the vehicle state function and the double-integral dynamic model. Specifically, the determination of the travel speed may be as follows: constructing a speed fitting model; for each of the vehicles to be controlled: determining state parameters of the vehicle to be controlled; the state parameters include: vehicle travel time, vehicle travel location, vehicle maximum vehicle speed, vehicle minimum vehicle speed, vehicle constant speed, vehicle maximum acceleration, and vehicle minimum acceleration; determining the vehicle state function according to the state parameters; and fitting the running speed of the vehicle to be controlled according to the speed fitting model, the predicted arrival time, the vehicle state function and the double-integral dynamics model.

From the vehicle dynamics equation, the vehicle state function can be expressed as:

x_i(t)＝[p_i(t),v_i(t)]

wherein x_i(t) is the state function of vehicle V_i at time t, determined by the state parameters v_i(t) and p_i(t); v_i(t) is the speed of vehicle V_i at time t, and p_i(t) is the position of vehicle V_i at time t.

$\dot{x}_i(t)$ is the first derivative of x_i(t), and u_i(t) is the acceleration of vehicle V_i at time t.

The double-integral dynamics model used is as follows:

$\dot{p}_i(t) = v_i(t), \qquad \ddot{p}_i(t) = u_i(t)$

wherein $\dot{p}_i(t)$ is the first derivative of p_i(t), equal to v_i(t), and $\ddot{p}_i(t)$ is the second derivative of p_i(t), equal to u_i(t).
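The double-integral dynamics can be integrated exactly over a time step during which the acceleration is held constant. The sketch below assumes such a piecewise-constant acceleration; the function name `step` and the numeric values are illustrative.

```python
def step(p, v, u, dt):
    """Advance position p and speed v by dt under constant acceleration u."""
    return p + v * dt + 0.5 * u * dt * dt, v + u * dt

p, v = 0.0, 10.0
for _ in range(4):          # 4 steps of 0.5 s at u = 2 m/s^2
    p, v = step(p, v, u=2.0, dt=0.5)
print(p, v)  # 24.0 14.0
```

After 2 s the result matches the closed form p = 10·2 + 0.5·2·2² = 24 and v = 10 + 2·2 = 14.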

The velocity fit model constructed by one embodiment of the invention may be as follows:

subject to u_min≤u_i(t)≤u_max

v_min≤v_i(t)≤v_max

wherein the model additionally uses the initial time, the initial position and the initial speed of the vehicle; v_c is the constant speed within the conflict zones; v_max and v_min are the maximum and minimum vehicle speed; and u_max and u_min are the maximum and minimum vehicle acceleration.

With the speed fitting model, the acceleration of each vehicle to be controlled at each moment is obtained by fitting, which determines the travel speed of each vehicle to be controlled. With this travel speed, the vehicle to be controlled can be controlled to reach its first conflict zone Z_i(1) at the predicted arrival time, and then to pass through the other conflict zones in the set Z_i in sequence at the constant travel speed v_c.
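As a much simpler stand-in for the speed fitting model, a two-phase profile can be computed in closed form: a constant acceleration u for t1 seconds that takes the speed from v0 to v_c, followed by cruising at v_c until the predicted arrival time. The sketch assumes this profile shape, which is not taken from the description; all names and values are hypothetical. It uses d = (v0 + v_c)/2 · t1 + v_c · (T − t1), solved for t1.

```python
def two_phase_profile(d, T, v0, v_c, u_min, u_max):
    """Constant acceleration u for t1 seconds (v0 -> v_c), then cruise at v_c,
    so that distance d is covered in exactly T seconds.

    Returns (u, t1), or None if this simple profile cannot meet the arrival
    time within the acceleration bounds.
    """
    if v0 == v_c:
        return (0.0, 0.0) if abs(v_c * T - d) < 1e-9 else None
    t1 = 2.0 * (v_c * T - d) / (v_c - v0)
    if not 0.0 < t1 <= T:
        return None
    u = (v_c - v0) / t1
    return (u, t1) if u_min <= u <= u_max else None

# 100 m in 15 s, slowing from 10 m/s to the zone speed of 5 m/s.
print(two_phase_profile(d=100.0, T=15.0, v0=10.0, v_c=5.0, u_min=-3.0, u_max=2.5))
# (-0.5, 10.0)
```

Because the speed moves monotonically from v0 to v_c, the speed bounds hold whenever both v0 and v_c lie within [v_min, v_max].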

The time fitting model and the speed fitting model are used for realizing orderly passing according to the optimal passing sequence, so that the passing delay time of the vehicles to be controlled is reduced, and the passing efficiency of the vehicles in the intersection control area is improved.

Taking a traffic scene of an intersection control area as an example, a detailed description is given to a vehicle control method provided by the embodiment of the invention according to fig. 6, and the specific steps may include:

Step S601: determine a plurality of conflict subareas and a plurality of vehicles to be controlled in the intersection control area.

Step S602: construct a search tree according to the number and the identifiers of the vehicles to be controlled.

The root node of the search tree is Root, which corresponds to an empty node, and a plurality of first-level child nodes are generated from it.

Next, the root node may be expanded using the Monte Carlo tree search algorithm.

Step S603: determine the first-level child node with the largest UCB value in the search tree as the current child node.

Step S604: determine a target node as the lower-level node of the current child node according to the UCB value and/or the identifier of the target vehicle.

Step S605: judge whether the lower-level node is a leaf node; if yes, go to step S606; if not, return to step S604.

When returning, step S604 is executed with the lower-level node as the current child node.

Step S606: reversely update the statistical information of the plurality of upper-level nodes according to the objective function value of the leaf node.

Step S607: judge whether the preset time has been reached; if yes, go to step S608; if not, return to step S603.

The iterative process may also be stopped before the preset time is reached if it is determined that a plurality of leaf nodes have been generated. It can be understood that, as far as time permits, performing as many iterations of the Monte Carlo tree search as possible effectively improves the accuracy of the determined optimal passing order.

Step S608: determine the target leaf node corresponding to the optimal traversal path according to the statistical information of each node of the search tree.

Step S609: determine the predicted arrival time according to the passing order corresponding to the target leaf node.

For each vehicle to be controlled, the predicted arrival time at the first conflict zone that it will pass through is determined.

Step S610: determine the travel speed of each vehicle to be controlled according to the predicted arrival time.

Step S611: control the vehicles to be controlled to pass through the intersection control area according to the travel speed and the passing order.
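Steps S603 to S608 can be put together in a compact sketch. It is an illustration under stated assumptions, not the embodiment itself: the lane layout, the placeholder `objective` function, the UCB1 form with c = 1.4, and the choice to store every node on the traversed path are all hypothetical.

```python
import math
import random
import time

LANES = [["A"], ["B"], ["C", "E"], ["D"]]   # hypothetical, front-to-back
TOTAL = sum(len(lane) for lane in LANES)

def admissible_next(order):
    """Front-most unscheduled vehicle of every lane."""
    fronts = []
    for lane in LANES:
        rest = [v for v in lane if v not in order]
        if rest:
            fronts.append(rest[0])
    return fronts

def objective(order):
    """Placeholder score (assumption): reward orders that release C early."""
    return float(TOTAL - order.index("C"))

def ucb(stats, node, parent, c=1.4):
    q, n = stats.get(node, (0.0, 0))
    if n == 0:
        return math.inf
    return q / n + c * math.sqrt(math.log(stats[parent][1]) / n)

def search(budget_s=0.05, targets=frozenset(), rng=random):
    stats = {"": (0.0, 0)}                       # root = empty passing order
    deadline = time.monotonic() + budget_s
    while True:
        node, path = "", [""]
        while len(node) < TOTAL:                 # S603-S605: descend to a leaf
            options = admissible_next(node)
            preferred = [v for v in options if v in targets] or options
            scores = {v: ucb(stats, node + v, node) for v in preferred}
            best = max(scores.values())
            node += rng.choice([v for v in preferred if scores[v] == best])
            path.append(node)
        value = objective(node)                  # S606: back propagation
        for label in path:
            q, k = stats.get(label, (0.0, 0))
            stats[label] = (q + value, k + 1)
        if time.monotonic() >= deadline:         # S607: preset time reached
            break
    node = ""                                    # S608: most-accessed path
    while len(node) < TOTAL:
        node = max((node + v for v in admissible_next(node)),
                   key=lambda label: stats.get(label, (0.0, 0))[1])
    return node

print(search(budget_s=0.05))  # a full passing order such as one starting with C
```

The time budget plays the role of the preset time in step S607; a larger budget allows more iterations before the most-accessed path is read out.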

The embodiment of the invention provides a vehicle control method, which can take a target vehicle meeting conditions as a heuristic strategy of a Monte Carlo tree search algorithm, determine the passing sequence of a plurality of vehicles to be controlled in an intersection control area according to the Monte Carlo tree search algorithm, and determine the running speed of the vehicles to be controlled, so as to control the vehicles to be controlled to pass through the intersection control area; therefore, by combining the Monte Carlo tree search algorithm and the heuristic strategy, the traversal direction of the search tree can be quickly and accurately guided, the condition that the optimal passing sequence can be obtained only by exhaustively traversing all the nodes is avoided, and the tree search efficiency is improved; the problem of overlong calculation time caused by the increase of vehicles and lanes is also solved, the real-time requirement of vehicle passing is met, and the passing efficiency of the vehicles in the intersection control area is further improved.

As shown in fig. 7, an embodiment of the present invention provides a vehicle control apparatus 700 including: a determination module 701, a search module 702, and a control module 703; wherein,

the determining module 701 is configured to determine a plurality of lanes in an intersection control area, one or more conflict partitions corresponding to the plurality of lanes, and a plurality of vehicles to be controlled;

the search module 702 is configured to determine, according to the number of the vehicles to be controlled and the identifier of the target vehicle, a passing order of the vehicles to be controlled corresponding to the intersection control area by using a monte carlo tree search algorithm; the target vehicle is a vehicle in a lane closest to the collision zone and/or a vehicle with the minimum arrival time for the same collision zone;

the control module 703 is configured to determine the driving speed of the vehicle to be controlled according to the passing order and the positions of the one or more conflict sub-areas in the intersection control area; and controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed.
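The division into three modules can be pictured as a thin wrapper that chains three callables. The class and field names below are hypothetical, and the stub lambdas stand in for the real determination, search and control logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Scene:
    lanes: List[List[str]]            # vehicle ids per lane, front-to-back
    zones: Dict[str, List[int]]       # vehicle id -> conflict zones Z_i

class VehicleControlApparatus:
    """Wires together the determination, search and control modules."""

    def __init__(self,
                 determine: Callable[[], Scene],
                 search: Callable[[Scene], List[str]],
                 control: Callable[[Scene, List[str]], Dict[str, float]]):
        self.determine = determine    # module 701
        self.search = search          # module 702
        self.control = control        # module 703

    def run(self) -> Dict[str, float]:
        scene = self.determine()
        order = self.search(scene)
        return self.control(scene, order)   # vehicle id -> travel speed

# Stub modules, for illustration only.
apparatus = VehicleControlApparatus(
    determine=lambda: Scene(lanes=[["A"], ["B"]], zones={"A": [4, 1], "B": [1]}),
    search=lambda scene: [lane[0] for lane in scene.lanes],
    control=lambda scene, order: {vid: 8.0 for vid in order},
)
print(apparatus.run())  # {'A': 8.0, 'B': 8.0}
```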

In an embodiment of the present invention, the determining module 701 is configured to use a vehicle located in a lane and closest to a conflict partition as the target vehicle; and/or taking the corresponding vehicle with the shortest arrival time as the target vehicle from a plurality of vehicles to be controlled corresponding to the same conflict partition; the fastest arrival time indicates a time at which the vehicle arrives at the conflict zone at a maximum vehicle speed and acceleration.

In an embodiment of the present invention, the search module 702 is configured to construct a root node and a first-level child node of a search tree according to the identifiers of the vehicles to be controlled and the number of the vehicles; one of the primary child nodes corresponds to a vehicle identification; expanding a root node of the search tree by using the Monte Carlo tree search algorithm according to the identification of the target vehicle to generate one or more leaf nodes; the leaf nodes indicate the corresponding passing sequence of a plurality of vehicles to be controlled; and determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area according to the passing sequence indicated by the leaf nodes.

In an embodiment of the present invention, the searching module 702 is configured to execute the following steps in a loop until a preset time is reached: determining a primary child node with the maximum UCB value as a current child node according to a UCB algorithm searched by a Monte Carlo tree; circularly executing the following steps until the determined subordinate node is a leaf node: and determining a target node as a lower node of the current child node according to the UCB value and/or the identification of the target vehicle by using the Monte Carlo tree search algorithm, and taking the lower node as the current child node.

In an embodiment of the present invention, the searching module 702 is configured to determine whether a target node exists in a first-level child node corresponding to the target vehicle, where the target node is not a superior node of the current child node; and if so, determining the lower node of the current child node from the target node.

In an embodiment of the present invention, the searching module 702 is configured to, in a case that there are multiple target nodes, determine, according to the UCB values of the multiple target nodes, a target node with a largest UCB value as a subordinate node of the current child node.

In an embodiment of the present invention, the searching module 702 is configured to, after the determined lower level node is a leaf node, reversely update statistical information corresponding to each of a plurality of upper level nodes of the leaf node by using a back propagation algorithm of monte carlo tree search according to an objective function value corresponding to the leaf node, where the statistical information includes an accumulated score and an access time.

In an embodiment of the present invention, the searching module 702 is configured to, for each level-one child node: determining the UCB value of the primary child node according to the statistical information of the primary child node, the statistical information of the root node and the weight parameter; and taking the primary child node with the maximum UCB value as the current child node.

In an embodiment of the present invention, the control module 703 is configured to determine an optimal traversal path according to statistical information of each node of the search tree; and determining a target leaf node corresponding to the optimal traversal path, and taking a passing sequence corresponding to the target leaf node as a passing sequence of the intersection control area.

In an embodiment of the present invention, the control module 703 is configured to, for multiple vehicles to be controlled corresponding to the same conflict partition: determining the predicted arrival time of the vehicles to be controlled to arrive at the conflict subarea according to the positions of the conflict subareas in the intersection control areas and the passing sequence; and determining the running speeds of the vehicles to be controlled according to the predicted arrival time, the vehicle state function and the double-integral dynamic model.

In an embodiment of the present invention, the control module 703 is configured to construct a time fitting model and a plurality of first constraints; the first constraint condition comprises a fastest time constraint, a collision constraint and a collision interval constraint; and according to the first constraint condition, obtaining estimated arrival times respectively corresponding to the arrival of the vehicles to be controlled to the conflict subareas by utilizing the time fitting model for fitting.

In an embodiment of the present invention, the control module 703 is configured to construct a velocity fitting model; for each of the vehicles to be controlled: determining state parameters of the vehicle to be controlled; the state parameters include: vehicle travel time, vehicle travel location, vehicle maximum vehicle speed, vehicle minimum vehicle speed, vehicle constant speed, vehicle maximum acceleration, and vehicle minimum acceleration; determining the vehicle state function according to the state parameters; and fitting the running speed of the vehicle to be controlled according to the speed fitting model, the predicted arrival time, the vehicle state function and the double-integral dynamics model.

According to the vehicle control device provided by the embodiment of the invention, a target vehicle meeting the conditions can be used as a heuristic strategy of a Monte Carlo tree search algorithm, the passing sequence of a plurality of vehicles to be controlled in an intersection control area is determined according to the Monte Carlo tree search algorithm, and the running speed of the vehicles to be controlled is determined, so that the vehicles to be controlled are controlled to pass through the intersection control area; therefore, by combining the Monte Carlo tree search algorithm and the heuristic strategy, the traversal direction of the search tree can be quickly and accurately guided, the condition that the optimal passing sequence can be obtained only by exhaustively traversing all nodes is avoided, and the tree search efficiency is improved; the problem of overlong calculation time caused by the increase of vehicles and lanes is also solved, the real-time requirement of vehicle passing is met, and the passing efficiency of the vehicles in the intersection control area is further improved.

Fig. 8 shows an exemplary system architecture 800 of a vehicle control method or a vehicle control apparatus to which embodiments of the invention may be applied.

As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links, vehicle networking, or fiber optic cables, to name a few.

A user may use the terminal devices 801, 802, 803 to interact with the server 805 over the network 804 to receive or send messages or the like.

The terminal devices 801, 802, 803 may be vehicles having sensor devices, wireless communication devices, or equipped with automatic driving control systems.

The server 805 may be a server that provides various services, for example a server that performs data sharing and data interaction with the terminal devices 801, 802, 803 using wireless communication technology, sensing technology, or the like; the server may acquire the position, speed, and route data of the vehicles and use the data for big data processing and cooperative vehicle-road control. The server can analyze and process the vehicle data, the data of the intersection control area and the like, and control and interact with the vehicles according to the processing result.

It should be noted that the vehicle control method provided in the embodiment of the present invention is generally executed by the server 805, and accordingly, the vehicle control device is generally provided in the server 805.

It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 9, a block diagram of a computer system 900 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the system 900. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage section 908 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor that includes a determination module, a search module, and a control module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the determination module may also be described as a "module for determining a vehicle".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist separately without being incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: determining a plurality of lanes in an intersection control area, one or more conflict subareas corresponding to the lanes, and a plurality of vehicles to be controlled; determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree search algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle, where the target vehicle is a vehicle in a lane closest to the conflict partition and/or a vehicle with the minimum arrival time for the same conflict partition; determining the running speed of the vehicle to be controlled according to the passing sequence and the positions of the one or more conflict subareas in the intersection control area; and controlling the vehicle to be controlled to pass through the intersection control area according to the passing sequence and the running speed.
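As an illustration of the "vehicle with the minimum arrival time for the same conflict partition" criterion above, the following minimal Python sketch picks one target vehicle per conflict partition. The `Vehicle` fields, the constant-speed arrival estimate, and the function names are assumptions made for this example only; they are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    # All field names are hypothetical, chosen for this sketch.
    vehicle_id: str
    lane_id: str
    partition_id: str   # conflict partition the vehicle is heading for
    distance_m: float   # remaining distance to that conflict partition
    speed_mps: float    # current speed


def arrival_time(v: Vehicle) -> float:
    """Constant-speed arrival estimate (an assumption of this sketch)."""
    return v.distance_m / v.speed_mps if v.speed_mps > 0 else float("inf")


def select_target_vehicles(vehicles):
    """Map each conflict partition to the id of its earliest-arriving vehicle."""
    best = {}
    for v in vehicles:
        current = best.get(v.partition_id)
        if current is None or arrival_time(v) < arrival_time(current):
            best[v.partition_id] = v
    return {pid: v.vehicle_id for pid, v in best.items()}


vehicles = [
    Vehicle("a", "lane1", "p1", 40.0, 10.0),  # estimated arrival: 4 s
    Vehicle("b", "lane2", "p1", 30.0, 5.0),   # estimated arrival: 6 s
    Vehicle("c", "lane3", "p2", 20.0, 10.0),  # estimated arrival: 2 s
]
targets = select_target_vehicles(vehicles)
```

The identifications returned here are the kind of input that a tree search could use as its heuristic; the alternative criterion (the vehicle in the lane closest to the conflict partition) could be handled by replacing `arrival_time` with a lane-distance key.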

According to the technical scheme of the embodiment of the invention, a target vehicle that meets the conditions serves as the heuristic strategy of the Monte Carlo tree search algorithm. The passing sequence of the plurality of vehicles to be controlled in the intersection control area is determined with the Monte Carlo tree search algorithm, and the running speeds of the vehicles to be controlled are determined, so that the vehicles to be controlled are controlled to pass through the intersection control area. Combining the Monte Carlo tree search algorithm with the heuristic strategy guides the traversal direction of the search tree quickly and accurately, so that the optimal passing sequence no longer has to be obtained by exhaustively traversing all nodes, and the tree search efficiency is improved. This also addresses the excessive calculation time caused by an increasing number of vehicles and lanes, meets the real-time requirement of vehicle passing, and further improves the passing efficiency of the vehicles in the intersection control area.
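The combination of Monte Carlo tree search and a heuristic strategy can be sketched as follows. This is a simplified illustration, not the embodiment's implementation: it assumes a single conflict partition, a fixed occupancy time per vehicle, total waiting time as the objective, and "earliest arrival first" as the heuristic. The names `total_delay`, `Node`, and `search_order` are hypothetical.

```python
import math
import random


def total_delay(order, arrival, occupy=2.0):
    """Total waiting time when vehicles cross one conflict partition in `order`."""
    t_free, delay = 0.0, 0.0
    for v in order:
        start = max(arrival[v], t_free)  # wait until the partition is free
        delay += start - arrival[v]
        t_free = start + occupy
    return delay


class Node:
    def __init__(self, order, parent=None):
        self.order = order      # tuple of vehicle ids scheduled so far
        self.parent = parent
        self.children = []
        self.score = 0.0        # accumulated score
        self.visits = 0         # number of visits

    def ucb(self, c=1.4):
        """Standard UCB1 value; c is an untuned exploration constant."""
        if self.visits == 0:
            return float("inf")
        explore = math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.score / self.visits + c * explore


def search_order(arrival, iterations=200, seed=0):
    rng = random.Random(seed)
    vehicles = sorted(arrival, key=arrival.get)  # heuristic: earliest arrival first
    root = Node(())
    # Assumption of this sketch: the pure heuristic order is the initial incumbent.
    best_order, best_delay = tuple(vehicles), total_delay(vehicles, arrival)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB while the node is fully expanded.
        while node.children and len(node.children) == len(vehicles) - len(node.order):
            node = max(node.children, key=Node.ucb)
        # Expansion: add the untried vehicle that the heuristic ranks first.
        remaining = [v for v in vehicles if v not in node.order]
        if remaining:
            tried = {child.order[-1] for child in node.children}
            nxt = next(v for v in remaining if v not in tried)
            child = Node(node.order + (nxt,), node)
            node.children.append(child)
            node = child
        # Simulation: complete the order, mostly by heuristic, sometimes at random.
        rest = [v for v in vehicles if v not in node.order]
        rollout = list(node.order)
        while rest:
            pick = rest[0] if rng.random() < 0.8 else rng.choice(rest)
            rest.remove(pick)
            rollout.append(pick)
        delay = total_delay(rollout, arrival)
        if delay < best_delay:
            best_order, best_delay = tuple(rollout), delay
        # Back propagation: update score and visit count up to the root.
        while node is not None:
            node.visits += 1
            node.score -= delay  # reward is the negative delay
            node = node.parent
    return best_order, best_delay


arrival = {"a": 0.0, "b": 1.0, "c": 5.0}  # earliest arrival times in seconds
order, delay = search_order(arrival)
```

Because the heuristic decides which child is expanded first and biases the simulation step, good orders tend to be evaluated early, which is the effect the paragraph above attributes to the heuristic strategy. In this sketch rewards are not normalized, so the exploration constant would need tuning for other objectives.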

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A vehicle control method characterized by comprising:

2. The method of claim 1, wherein the determining the passing order of the plurality of vehicles to be controlled corresponding to the intersection control area using a Monte Carlo tree search algorithm comprises:

3. The method of claim 2, wherein expanding a root node of the search tree to generate one or more leaf nodes comprises:

executing the following steps in a loop until a preset time is reached:

4. The method of claim 3, wherein determining a target node as a subordinate node to the current child node based on the UCB value and/or an identification of the target vehicle comprises:

5. The method of claim 4, wherein, in the case where there are multiple target nodes,

6. The method of claim 3, wherein after determining that the subordinate node is a leaf node, further comprising:

and, according to the objective function values corresponding to the leaf nodes, using the back propagation algorithm of the Monte Carlo tree search to update, in reverse, the statistical information corresponding to the plurality of upper nodes of the leaf nodes, wherein the statistical information comprises accumulated scores and numbers of visits.

7. The method of claim 6, wherein determining the first-level child node with the largest UCB value as the current child node according to the UCB algorithm of the Monte Carlo tree search comprises:

8. The method of claim 1,

and/or,

9. The method of claim 2, wherein the determining the order of passage of the plurality of vehicles to be controlled corresponding to the intersection control zone comprises:

10. The method of claim 1, wherein the determining the travel speed of the vehicle to be controlled comprises:

for a plurality of vehicles to be controlled corresponding to the same conflict partition: determining the predicted arrival times at which the vehicles to be controlled arrive at the conflict partition according to the positions of the conflict partitions in the intersection control area and the passing sequence;

11. The method of claim 10, wherein the determining the predicted arrival times of the plurality of vehicles to be controlled to the conflict partition comprises:

12. The method of claim 11, wherein determining the travel speeds of the plurality of vehicles to be controlled from the predicted arrival time, a vehicle state function, and a double-integral dynamics model comprises:

constructing a speed fitting model;

determining the vehicle state function according to the state parameters;

13. A vehicle control apparatus characterized by comprising: the device comprises a determining module, a searching module and a control module; wherein,

the searching module is used for determining the passing sequence of the vehicles to be controlled corresponding to the intersection control area by utilizing a Monte Carlo tree search algorithm according to the number of the vehicles to be controlled and the identification of the target vehicle; the target vehicle is a vehicle in a lane closest to the conflict partition and/or a vehicle with the minimum arrival time for the same conflict partition;

14. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-12.

15. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-12.