WO2023079690A1

WO2023079690A1 - Control device, traffic rectification system, control method, and program

Info

Publication number: WO2023079690A1
Application number: PCT/JP2021/040810
Authority: WO
Inventors: 健太丹羽; 修功上田; 宏澤田; 昭典藤野
Original assignee: 日本電信電話株式会社
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2023-05-11
Also published as: JPWO2023079690A1

Abstract

A control device provided in each of a plurality of moving bodies that are provided in a traffic rectification system wherein the plurality of moving bodies autonomously perform traffic rectification to prevent collisions between these moving bodies, said control device being provided with: a state update unit that updates the states of the moving bodies under constraints for preventing collisions between the moving bodies, on the basis of state update dynamics including sub-dynamics for updating the states of the moving bodies, and sub-dynamics for message passing between each moving body and other moving bodies in proximity; and an output unit that outputs the states updated by the state update unit and a message.

Description

Control device, traffic rectification system, control method, and program

The present invention relates to technology for autonomously rectifying traffic (which may also be referred to as traffic control) by multiple vehicles.

In the conventional technology, traffic signals installed at intersections of roads rectify traffic so that vehicles can travel safely without colliding. However, the conventional technology has the problem that chronic traffic jams occur in urban areas due to frequent stops due to traffic lights, temporary stops due to merging and right/left turns, and the like.

In the future, as autonomous vehicles become more widespread, it is expected that the conventional centralized control system using traffic lights will be replaced with a distributed control system (signal-free traffic rectification), and that the above problems will be solved by the distributed control system.

The present invention has been made in view of the above points, and aims to provide technology for realizing traffic rectification without using traffic lights by autonomously controlling each vehicle.

According to the disclosed technique, there is provided a control device in a traffic rectification system that includes a plurality of moving bodies each equipped with the control device, the plurality of moving bodies autonomously rectifying traffic so as to prevent collisions between the moving bodies, the control device comprising:
a state update unit that updates the state of the moving body under constraints for preventing collisions between moving bodies, based on state update dynamics including sub-dynamics for updating the state of the moving body and sub-dynamics for message passing between the moving body and other moving bodies in its proximity; and
an output unit that outputs the state updated by the state update unit and a message.

According to the disclosed technology, a technology is provided for realizing traffic rectification without using traffic lights by autonomously controlling each vehicle.

FIG. 1 is a diagram showing a system configuration example in the embodiment.
FIG. 2 is a diagram for explaining the outline of the embodiment.
FIG. 3 is a diagram for explaining the outline of the embodiment.
FIG. 4 is a diagram showing a system configuration example.
FIG. 5 is a diagram showing a NODE-based DNN architecture.
FIG. 6 is a diagram showing an extended NODE-based DNN architecture following a state-space model.
FIG. 7 is a diagram showing a NOS-based DNN architecture.
FIG. 8 is a diagram showing Algorithm 1.
FIG. 9 is a diagram showing representative parameters.
FIGS. 10 to 13 are diagrams for explaining an experiment.
FIGS. 14 to 16 are diagrams showing experimental results.
FIG. 17 is a diagram showing an example of the functional configuration of the vehicle 1.
FIG. 18 is a diagram showing a functional configuration of the control device 100.
FIG. 19 is a diagram showing a functional configuration of the control server 200.
FIG. 20 is a diagram showing a hardware configuration example of an apparatus.

An embodiment (this embodiment) of the present invention will be described below with reference to the drawings. The embodiments described below are merely examples, and embodiments to which the present invention is applied are not limited to the following embodiments.

(Overview of Embodiment)
FIG. 1 shows a configuration example of a traffic rectification system according to this embodiment. As shown in FIG. 1, the traffic rectification system has a plurality of nodes, and each node wirelessly communicates with other nearby nodes. Connections between nodes (connections for wireless communication) are called edges. In this traffic rectification system, for example, vehicles can move without colliding.

A node is not limited to a specific object, but in this embodiment, it is assumed that a node is a vehicle traveling on a road. In the following, a node may also be referred to as a "vehicle". A node may also be called a "moving object".

FIG. 2 shows an image of multiple vehicles running on the road. Each vehicle travels while performing autonomous speed control, based on communication with nearby vehicles and state updates by a DNN (deep neural network), so as not to collide with other vehicles and so as to approach a target speed. As a result, travel and transportation times are expected to be shortened as far as possible while traffic accidents (collisions) are prevented. In this embodiment, only the speed is used as the state, but this is an example. It is also possible to use states other than speed, or both speed and other states. States other than speed include, for example, route, lane, and steering direction.

The neural network model installed in each vehicle is a model for solving the initial value problem of ordinary differential equations expressing state update dynamics. The ordinary differential equation can be expressed, for example, as follows.

dx/dt = M₁(x, t, θ, A, b) + M₂(x, t, A, b)
As shown above, the state update dynamics are decomposed into M₁ and M₂. M₁ corresponds to the in-vehicle state update, and M₂ corresponds to communication between nearby vehicles (message passing).

In each vehicle, the state (x) is updated according to the dynamics (M₁ + M₂) under the collision-avoidance constraint (Ax + b ≤ 0). Detailed examples of M₁ and M₂ will be described later.

In addition, in the present embodiment, an ordinary differential equation is discretized to form a control algorithm (neural network) that alternately repeats (1) vehicle internal state update and (2) vehicle-to-vehicle communication.

FIG. 3 shows an image of the above control algorithm when the number of vehicles is N. As shown in FIG. 3, after initialization, the in-vehicle state update in each vehicle and the communication between each vehicle and its neighboring vehicles are alternately repeated. Because the overall computation is distributed and parallelized, the computation on each vehicle is lightweight. In addition, inter-vehicle communication is sparse, short-range communication that does not strain network bandwidth.
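The alternation of in-vehicle updates and neighbor communication can be sketched in a few lines of Python. This is a toy illustration only: the update rules, the constants, and the names `local_update`, `pass_messages`, `simulate`, `GAIN`, and `MIN_GAP` are assumptions made for this sketch and are not the update rules of the embodiment, which are derived later from the DNN model.

```python
# Toy sketch: vehicles on one lane alternately (1) update their own speed and
# (2) exchange one scalar message with the vehicle directly ahead.
# All rules and constants are illustrative assumptions.

TARGET = 1.0     # normalized target speed
GAIN = 0.5       # hypothetical stand-in for a learned parameter theta
MIN_GAP = 5.0    # minimum distance to keep to the vehicle ahead [m]
S_MAX = 10.0     # maximum speed [m/s]
DT = 0.1         # time step [s]


def local_update(speed, brake_msg):
    """(1) In-vehicle update: move toward TARGET, minus the received brake request."""
    new_speed = speed + GAIN * (TARGET - speed) - brake_msg
    return min(1.0, max(0.0, new_speed))  # keep the normalized speed in [0, 1]


def pass_messages(positions, speeds):
    """(2) Message passing: vehicle i + 1 (ahead) asks vehicle i to brake if too close."""
    msgs = [0.0] * len(positions)
    for i in range(len(positions) - 1):
        ahead = positions[i + 1] + DT * S_MAX * speeds[i + 1]
        behind = positions[i] + DT * S_MAX * speeds[i]
        violation = MIN_GAP - (ahead - behind)  # > 0: predicted gap too small
        msgs[i] = max(0.0, violation) / (DT * S_MAX)
    return msgs


def simulate(positions, speeds, steps):
    """Alternate message passing and local updates; track the smallest gap seen."""
    min_gap_seen = float("inf")
    for _ in range(steps):
        msgs = pass_messages(positions, speeds)
        speeds = [local_update(s, m) for s, m in zip(speeds, msgs)]
        positions = [p + DT * S_MAX * s for p, s in zip(positions, speeds)]
        gaps = [positions[i + 1] - positions[i] for i in range(len(positions) - 1)]
        min_gap_seen = min(min_gap_seen, min(gaps))
    return positions, speeds, min_gap_seen
```

Each vehicle only needs its own state and one scalar from its neighbor, which is the property the text relies on when it calls the per-vehicle computation lightweight.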

The adjoint method is used for the neural network learning (learning the parameter θ) that updates the state as described above. That is, efficient differential calculation is performed by back propagation calculation of a neural network based on the adjoint method, and the parameter (θ) is learned so as to accelerate the average speed toward the target speed under the constraint of collision prevention. In this embodiment, the parameter (θ) is common to all vehicles, but it is also possible to learn so that each vehicle uses a different parameter θ.

FIG. 4 is a diagram showing the system more specifically for the case where the nodes are vehicles. As shown in FIG. 4, there are a plurality of vehicles 1, each equipped with a control device 100. The control device 100 has the DNN model described above, and performs state acquisition, state update, and state output. In the example of FIG. 4, a control server 200 is provided. Here, the control server 200 receives information such as the state from each vehicle (this may be the cost calculated by each vehicle), learns the parameter (θ), and transmits the learned parameter θ to each vehicle.

The provision of the control server 200 is an example. Each vehicle may learn the parameter θ by itself without the control server 200.

In the following, specific examples of techniques for solving the above-mentioned ordinary differential equations and their initial value problems will be explained in detail.

(Overview of NOS)
A Neural Ordinary Differential Equation (NODE) has been proposed as a DNN model for solving Initial Value Problems (IVPs) of ordinary differential equations.

In NODE, given a state variable x(t) with an initial state x(0), a learnable parameter θ, and a nonlinear dynamics M with time t, the IVP is formulated. FIG. 5 shows a NODE-based DNN architecture. As shown in FIG. 5, the states are updated sequentially and the cost is calculated based on each state.

On the other hand, in the present embodiment, an external control input o(t) is further introduced, which NODE does not have. In this case, IVP can be defined by the following equation (1).

∂x/∂t=M(x, o, θ, t) (1)
Note that the time indices of x and o in equation (1) above are omitted for simplicity of notation.

Since a NODE-based DNN architecture can be constructed as a discretization of the basic IVP, different discretization methods used in ODE solvers, such as higher-order Runge-Kutta solvers, can be used to construct different NODE-based DNN architectures.

Although NODE allows us to handle basic IVPs, its application to large-scale systems with complex dynamics M, which can be represented as a graph consisting of many subsystems (nodes) and their connections (edges), is difficult. An example of such a system is the system of vehicles that rectifies traffic as described above.

Also, in a centralized computation method such as NODE, it is difficult to execute forward propagation (x transition) and backward propagation (learning θ) in Equation (1) for a large-scale system.

In the above system consisting of nodes and edges, it is effective to execute processing in a distributed manner while repeating (i) state transitions at each node (e.g., the speed state update at each vehicle) and (ii) message passing between nodes (e.g., communication between nearby vehicles).

To address this problem, the present embodiment adopts federated dynamics learning using neural operator splitting (NOS) as an extension of NODE. The outline is as follows (details will be described later).

To manage large-scale systems in graph form efficiently, the overall dynamics M is decomposed into (i) M₁ for the state transitions of each node and (ii) M₂ for message passing between nodes, and is represented by the following equation (2).

∂x/∂t = M₁(x, o, θ, t) + M₂(x, o, t) (2)
Here, M₁ can be further decomposed into sub-dynamics at each node. M₂ is set to have no learnable parameters, as it relates to message passing.

In addition, by discretizing equation (2) using known operator decomposition techniques such as PRS (Peaceman-Rachford Splitting) and DRS (Douglas-Rachford Splitting), the state transition takes residual, recurrent, and alternating forms.

Also, in this embodiment, state domain relaxation is performed to impose physical constraints. That is, in equation (2), the state domains of M₁ and M₂ are both x, but in the NOS of this embodiment, the state domain constraint of equation (2) is relaxed in order to impose physical constraints on the state variables of many nodes.

Specifically, x is used for M₁, and, given {A, b}, the auxiliary state variable y = Ax + b is used for M₂, which is written M₂(y, o, t). y = Ax + b represents a physical constraint, for example y = Ax + b ≤ 0. That is, (i) each vehicle's velocity (the states of the N nodes {x_1, ..., x_N} ∈ x) is updated according to M₁, while (ii) M₂ is designed so that y = Ax + b ≤ 0, which preserves some distance between vehicles. As a result, it is expected that the parameter search area will be appropriately limited, that, for example, vehicle collisions will be prevented, and that θ will be learned quickly and stably.

As will be described later, in the present embodiment, NOS is associated with a constrained cost minimization problem derived by the ADMM method (Alternating Direction Method of Multipliers).

(extended NODE following the state-space model)
Before describing NOS in detail, extended NODE according to the state space model will be described here as a technology related to NOS. It is an extended NODE-based DNN architecture following a state-space model assuming a noisy, nonlinear, and indirect observation process for the state variable x in real applications.

Below, we first define a cost function that takes into account the observation process. Next, with 4D-Var, we formulate the adjoint method for learning x and the parameter θ in the above cost function, and derive an extended NODE-based DNN architecture that follows the state-space model.

<Cost function derivation>
Assume that the state variable x(t) is successively updated according to equation (1) over the time interval t ∈ (0, T). Given an initial state x(0) and an external control input o(t), equation (1) can be rewritten into equation (3) in integral form below.
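The equation image for (3) is not reproduced in this text. Under the definitions given for equation (1), its integral form would read as follows; this is a reconstruction from equation (1) and should not be taken as a copy of the original figure.

```latex
x(t) = x(0) + \int_{0}^{t} M\bigl(x(\tau), o(\tau), \theta, \tau\bigr)\, d\tau,
\qquad t \in (0, T) \tag{3}
```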

In keeping with the state-space model, we allow an indirect observation of x(t) through the observation system H. If the ideal behavior of the system (e.g., the ideal control state) is given by r(t), the cost function at time t can be represented by the following equation (4).

In equation (4), R(t) is the covariance matrix of the measurement noise ξ, and ^T represents the transpose of a matrix. Assuming that the measurement noise follows a Gaussian distribution ξ ~ Norm(0, ν²), a scaled identity matrix can be chosen as a particular R(t). Since the state transitions from x(0) to x(T) are constrained by equation (1), the final cost form J(x, θ) for learning θ is given by the constrained cost-integral minimization problem of equation (5) below.
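As a hedged illustration of a cost at time t of this kind, the sketch below assumes the common weighted least-squares form 0.5·(H(x) − r)^T R⁻¹ (H(x) − r) with R = ν²·I. The exact form of equation (4) is not reproduced in this text, so the formula and the names `observation_cost`, `H`, and `nu` are assumptions for this sketch only.

```python
def observation_cost(x, r, H=lambda v: v, nu=1.0):
    """Assumed form: 0.5 * (H(x) - r)^T R^{-1} (H(x) - r) with R = nu**2 * I."""
    residual = [hx - ri for hx, ri in zip(H(x), r)]
    return 0.5 * sum(e * e for e in residual) / nu ** 2


# With the default identity H (the trivial observation process), the cost is
# half the squared distance between the state and the ideal behavior r.
example = observation_cost([1.0, 0.5], [1.0, 1.0])
```

A smaller ν (less measurement noise) weights the same deviation more heavily.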

<Adjoint method for dynamics learning>
Next, we solve the problem of finding θ that makes the first-order variation of an appropriate functional zero. First, based on equation (5), we formulate a Lagrangian function L(x, λ, θ, t) with Lagrangian variables (adjoint variables) λ. The variational problem is to find θ such that the first-order variation δL of the functional is zero. The resulting adjoint equation is shown below in equation (6).

Here, {M_x(t), M_θ(t)} are the Jacobians of M, and ^* is the adjoint operator for the natural inner product <·,·>, which satisfies <Mx, λ> = <x, M^* λ>. We also use M^* = M^T, assuming a Euclidean domain. Due to restrictions on the types of characters that can be used in electronic application specifications, ^* is used in the specification as the symbol for the adjoint operator.

Also, from equation (4), the following holds.

Here, H_x(t) is the Jacobian of H with respect to x(t).

With the initialization λ(T) = 0, the adjoint variable λ(t) is updated backward in time, resulting in λ(0) = g = [g_{x0}^T, g_θ^T]^T, where g is the gradient of the cost function J at time zero. The gradient of the cost function can be obtained by integrating the adjoint equation backward in time, as shown in equation (7) below.

If the observation process is trivial, H in equation (4) is the identity operator, as assumed in NODE. In the adjoint method, θ for the optimal dynamics model M is learned by alternately repeating the continuous forward propagation (3) and the continuous backward propagation (7), using g.
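The forward/backward alternation can be illustrated on a scalar toy problem. The sketch below assumes the dynamics dx/dt = θ·(r − x), a forward Euler discretization, and a summed quadratic cost; these choices and the names `forward`, `cost`, and `adjoint_gradient` are assumptions for illustration, not the equations (6) and (7) of the specification.

```python
def forward(x0, theta, r, dt, steps):
    """Forward propagation: x_{k+1} = x_k + dt * theta * (r - x_k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * theta * (r - xs[-1]))
    return xs


def cost(xs, r, dt):
    """Summed quadratic cost over the visited states x_1 .. x_K."""
    return sum(0.5 * dt * (x - r) ** 2 for x in xs[1:])


def adjoint_gradient(xs, theta, r, dt):
    """Backward propagation of the adjoint variable, starting from zero."""
    lam = 0.0       # adjoint variable, initialized to zero at the final time
    g_theta = 0.0   # accumulated gradient of the cost with respect to theta
    for k in range(len(xs) - 1, 0, -1):
        lam += dt * (xs[k] - r)                # d(cost at step k) / d x_k
        g_theta += lam * dt * (r - xs[k - 1])  # times d x_k / d theta
        lam *= 1.0 - dt * theta                # times d x_k / d x_{k-1}
    return g_theta, lam  # lam is now the gradient with respect to x_0
```

The gradient from a single backward sweep can be checked against a finite-difference estimate; a learning loop would then repeat `forward`, `adjoint_gradient`, and a step such as θ ← θ − η·g_θ.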

<NODE-based DNN architecture>
Several NODE-based DNN architectures are obtained by discretizing equation (1). For simplicity of notation, let x_k be the estimated system state at discrete time t_k (k = 1, ..., K), and let D denote an approximation of the residual state update through M.

Given {θ, o}, the state variable is updated in residual and recursive form as x_{k+1} = D(x_k). D may consist of a composition of operators, D = D_q D_{q−1} ... D_1.

The forward Euler method (8), the backward Euler method (9), and the Crank-Nicolson method (10) (second-order Runge-Kutta) shown below can be used to approximate the differential equation with discrete state transition rules.

[Forward Euler method]

[Backward Euler method]

[Crank-Nicolson method]

Here, M_{o,θ} represents M given {θ, o_k}, Id represents the identity operator, and ^{-1} represents the inverse operator. In the Crank-Nicolson method, x_{k+1} = D(x_k) is decomposed as follows.

As shown in the drawings, a NODE-based DNN architecture can be constructed as a recursive stack of K iterations of the residual state update ((8), (9), or (10)). Note that the forward Euler method can be regarded as a ResNet.
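The equation images for (8) to (10) are not reproduced in this text. For a scalar linear model M(x) = a·x, the standard textbook forms of the three update rules can be sketched as below. The function names are chosen for this sketch, and the Crank-Nicolson step is written as an explicit half step followed by an implicit half step, which is one common way to decompose it.

```python
import math


def forward_euler(x, a, dt):
    return (1.0 + dt * a) * x                  # x_{k+1} = (Id + dt*M) x_k


def backward_euler(x, a, dt):
    return x / (1.0 - dt * a)                  # x_{k+1} = (Id - dt*M)^{-1} x_k


def crank_nicolson(x, a, dt):
    half = forward_euler(x, a, dt / 2.0)       # explicit half step
    return backward_euler(half, a, dt / 2.0)   # implicit half step


def integrate(step, x0, a, dt, steps):
    """Recursive stack of `steps` residual updates."""
    x = x0
    for _ in range(steps):
        x = step(x, a, dt)
    return x


exact = math.exp(-1.0)  # solution of dx/dt = -x at t = 1 for x(0) = 1
errors = {
    step.__name__: abs(integrate(step, 1.0, -1.0, 0.1, 10) - exact)
    for step in (forward_euler, backward_euler, crank_nicolson)
}
```

On this example the Crank-Nicolson error is markedly smaller than either Euler error, which reflects its second-order accuracy.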

The adjoint method can be used to calculate the gradient in learning θ. In this case, the NODE-based DNN architecture is extended to follow FIG. 6 by discretizing the continuous backward propagation (7), as shown in equation (11) below.

Here, an initial value λ_K = 0 is given. Also, D is a discretized representation of M. If M consists of a set of operators and the adjoint operator ^* is defined in the Euclidean domain, then the Jacobian is decomposed as follows.

The residual and recursive update (11) gives the gradient λ_0 = g, with which θ is updated.

Here, we introduced a NODE-based DNN architecture that follows the state-space model. However, when the dynamics of the overall system become complex, such as when multiple vehicles rectify traffic autonomously, this approach becomes computationally difficult.

Therefore, in the present embodiment, a NOS-based DNN architecture for federated dynamics learning is introduced, and the traffic rectification problem is solved by alternately performing (i) vehicle state updates and (ii) message passing between vehicles.

(Neural operator splitting (NOS))
In the following, the NOS-based DNN architecture for federated dynamics learning is described in detail. First, discretization of equation (2) using operator decomposition methods such as PRS and DRS will be described. Next, we show that NOS is related to the constrained minimum cost problem for domain relaxation of state variables in {M ₁ , M ₂ }. This is important for managing state variables in large distributed systems. Furthermore, we specify the form of {M ₁ , M ₂ } for federated dynamics learning that repeats node state transitions (i) and inter-node message passing (ii).

<Discretization by operator decomposition>
Discretization of equation (2) using the operator decomposition method will be described. Various operator decomposition methods such as PRS, DRS, and FBS (Forward-Backward Splitting) are available in NOS.

In the present embodiment, PRS and DRS are selected among these. Both are based on the Crank-Nicolson method (10), which has second-order accuracy, and are therefore expected to achieve accurate state transitions.

Discretization of equation (2) with PRS yields residual, recursive, and alternating DNN architectures, as shown in equation (12) below.
[PRS-Net]

In equation (12), two residual state updates C _Δt/2-Mi based on the Crank-Nicolson method are applied alternately. As another operator decomposition, applying DRS to the discretization of equation (2) yields residual, recursive, and alternating forms, as in equation (13) below.
[DRS-Net]

The DNN architectures of PRS-Net and DRS-Net are shown in FIG. 7. From equations (12) and (13), PRS-Net and DRS-Net differ in that DRS-Net uses an averaging operation with twice the step size. Learning of θ in NOS (PRS-Net, DRS-Net, etc.) can be performed according to the adjoint method described above.

Since the state transitions are performed by the set of residual operators D = D_q D_{q−1} ... D_1, the backward propagation can be obtained by replacing each operator D_q with its transpose D_q^T.
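To make the difference between the two splittings concrete, the sketch below uses scalar linear sub-dynamics M₁(x) = a1·x and M₂(x) = a2·x and the standard operator-theoretic forms: Peaceman-Rachford applies the two Crank-Nicolson-type (Cayley) updates in sequence, and Douglas-Rachford averages the input with that result. The exact scaling used in equations (12) and (13) is not reproduced in this text, so the step-size convention and the names `cayley`, `prs_step`, and `drs_step` are assumptions.

```python
import math


def cayley(x, a, dt):
    """Crank-Nicolson-type residual update for dx/dt = a*x with step dt."""
    return (1.0 + 0.5 * dt * a) / (1.0 - 0.5 * dt * a) * x


def prs_step(x, a1, a2, dt):
    """Peaceman-Rachford: apply the update for M1, then the update for M2."""
    return cayley(cayley(x, a1, dt), a2, dt)


def drs_step(x, a1, a2, dt):
    """Douglas-Rachford: average the input with the Peaceman-Rachford output."""
    return 0.5 * (x + prs_step(x, a1, a2, dt))


x_prs = prs_step(1.0, -1.0, -0.5, 0.1)
x_drs = drs_step(1.0, -1.0, -0.5, 0.1)
x_exact = math.exp((-1.0 - 0.5) * 0.1)  # exact flow of dx/dt = (a1 + a2) x
```

In this convention one Peaceman-Rachford step closely tracks the exact flow over Δt, while the averaged Douglas-Rachford step advances the state roughly half as far, so a doubled step size would be needed to cover the same interval.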

<Relaxation of state domain for flexible subdynamics cooperation>
Here we associate the NOS (PRS-Net and DRS-Net) with the constrained cost minimization problem for state-domain relaxation of two subdynamics {M ₁ , M ₂ }.

If the state variable of M₁ is x(t), then the state variable of M₂ is relaxed to y(t) = A(t)x(t) + b(t). Here, the time index of {A, b} is omitted to simplify the notation. Also, A is chosen such that the inverse matrix (A^T A)⁻¹ exists for any t.

The ADMM flow is a continuous ODE form of (discrete) ADMM obtained by taking the limit of the discrete time step. In the ADMM flow, {M₁, M₂} are specified by the (sub)differential of the cost function V to be minimized, together with an affine transformation using (A^T A)⁻¹.

Based on this, in the present embodiment, {M₁, M₂} are set by decomposing the function V into V = V₁ + V₂, as in the following equation (14). Here, V₁ is a smooth but possibly non-convex function, and V₂ is a convex function that may not be differentiable.

Also, ∂ denotes the subdifferential operator. Note that equation (14) differs from the ADMM flow in the following ways: (i) V₁ has a learnable parameter θ and may be non-convex, (ii) V₂ may not be differentiable, (iii) a bias b is added to generalize the transformation, and (iv) {A, b} can be time-varying.

To explore the physical meaning of equation (14), we relate it to the constrained cost minimization problem. For this purpose, we constrain {V ₁ , V ₂ } to be convex and make the constraint parameters {A, b} time-invariant. A linearly constrained convex minimization problem is given by the following equation (15).

Since {V ₁ ,V ₂ } is constrained to be convex, {∇V ₁ ,∂V ₂ } has its inverse {(∇V ₁ ) ⁻¹ ,(∂V ₂ ) ⁻¹ }. Then, the dual problem of Equation (15) is given by Equation (16) below.

Here, z denotes the Lagrangian coefficients for the constraints in Eq. (15), and V₁^* and V₂^* denote the convex conjugates of V₁ and V₂; for example, V₂^*(−z) = sup_y (−<z, y> − V₂(y)). In addition, ∇V₁^* = (∇V₁)⁻¹ and ∂V₂^* = (∂V₂)⁻¹.

Applying DRS to equation (16) yields the well-known ADMM algorithm. This updates {x, y, z} in a residual, recursive, alternating manner in discrete steps.
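A scalar example may help to show what such an alternating update looks like. The sketch below applies textbook scaled-form ADMM to a problem assumed for illustration: minimize 0.5·(x − x_free)² subject to y = a·x + b and y ≤ 0. The function name `admm_scalar`, the penalty `rho`, and the problem itself are assumptions; `z` here is the scaled dual variable.

```python
def admm_scalar(x_free, a, b, rho=1.0, iters=100):
    """Scaled-form ADMM for: min 0.5*(x - x_free)**2  s.t.  y = a*x + b, y <= 0."""
    x = y = z = 0.0
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic plus penalty term
        x = (x_free - rho * a * (b - y + z)) / (1.0 + rho * a * a)
        # y-update: projection onto the constraint set y <= 0
        y = min(0.0, a * x + b + z)
        # z-update: accumulate the remaining constraint residual
        z = z + a * x + b - y
    return x, y, z


# With a = 1 and b = -0.5 the constraint reads x <= 0.5.
x_active, _, _ = admm_scalar(1.0, 1.0, -0.5)    # unconstrained optimum 1.0 is cut back
x_inactive, _, _ = admm_scalar(0.3, 1.0, -0.5)  # unconstrained optimum 0.3 is feasible
```

Read with x as a normalized speed, the first call corresponds to a vehicle that would like to drive at full speed but is held at the largest speed the constraint allows.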

V₁ in equation (14) can be non-convex, but if Δt is small enough, it can be approximated by a local quadratic form through a Taylor expansion around the previous state x_{k−1}, as in equation (17) below.

Here, the Hessian matrix of V₁ is approximated by a scaled identity matrix using α > 0. PRS-Net (12) and DRS-Net (13) are consistent with solving equation (15) under the assumption that {V₁, V₂} are convex and {A, b} are time-invariant.

<Cost function specification for federated dynamics learning>
We now define {V ₁ , V ₂ } in equation (14) for federated dynamics learning. For this purpose, we introduce symbols of graphical notation. For the traffic control problem according to this embodiment, connections (edges) between vehicles (nodes) change over time. Therefore, we use the time-varying graph G(t).

G(t) consists of a set of nodes N(t) and a set of edges E(t); the same symbols N(t) and E(t) are also used for the numbers of nodes and edges. The set of indices of neighboring vehicles connected to the i-th node is represented by E_i(t) = {j ∈ N(t) | (i, j) ∈ E(t)}. Hereafter, the time index of {G(t), N(t), E(t)} is omitted.

At each node there is a dynamic subsystem with its own cost function V_{local,i}(x_i, o_i, θ_i, t). From a global point of view, the stacked input variables consist of the time-varying variables x = [x_1^T, ..., x_N^T]^T and o = [o_1^T, ..., o_N^T]^T and the time-invariant variables θ = [θ_1^T, ..., θ_N^T]^T.

To simplify the notation, the control input o_i includes the edge information E_i and the time-varying constraint parameters {A_{i|j}, b_{i|j}}. The stack of {A_{i|j}, b_{i|j}} is associated with {A, b} in equation (14) as follows.

For federated dynamics learning, a cost function for state transition is formulated as shown in Equation (18) below.

In the traffic rectification problem in this embodiment, (i) V_{local,i} is designed to accelerate the velocity state toward the target, and (ii) the inequality constraints are designed to maintain the inter-vehicle distance.

The inequality constraint can be expressed as (I + P)(Ax + b) ≤ 0, where P(o, t) is a permutation matrix that defines the exchange of variables between adjacent nodes following the time-varying graph G. If the constraint is represented by an indicator function, equation (18) can be expressed as equation (19) below.

The indicator function is expressed as Equation (20) below.
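The equation image for (20) is not reproduced in this text. An indicator function of an inequality constraint conventionally takes the following form; the symbol ι and the exact argument are assumptions made here to match the constraint (I + P)(Ax + b) ≤ 0 stated above.

```latex
\iota_{(I+P)(Ax+b)\le 0}(x) =
\begin{cases}
0, & (I+P)(Ax+b) \le 0,\\
+\infty, & \text{otherwise.}
\end{cases}
```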

Since the constrained optimization problem (15) is consistent with the ODE form (14), {V ₁ , V ₂ } can be defined by equation (21) below for federated dynamics learning.

As described above, a NOS-based DNN architecture for federated dynamics learning is defined by substituting the cost function (21), with the quadratic approximation (17), into the dynamics (14) in PRS-Net (12) and DRS-Net (13).

(Distributed traffic rectification for autonomous vehicles)
The operation of a distributed control system that implements NOS-based traffic rectification will now be described in detail. Each vehicle autonomously updates its state (e.g., speed) using external control inputs (e.g., inputs from cameras/LiDAR) and message passing between nearby vehicles.

The behavior of the distributed control system is determined by the state transition dynamics, whose inputs are the previous state x and the control input o. NOS-based federated dynamics learning is used to optimize the parameter θ of the state transition dynamics.

Below, first, an overview of related technologies related to distributed traffic rectification will be explained. Next, {J, V, A, b} in Eq. (21) for signal-free traffic rectification that accelerates the average vehicle speed to the target speed while avoiding collision by maintaining inter-vehicle distance will be described. Then, derivation of state transition DNN architecture based on PRS-Net and DRS-Net will be described.

<Related technology>
Related technologies related to traffic rectification include, for example, the reference "Wang, Z., Zheng, Y., Li, S. E., You, K., and Li, K. (2018). Parallel optimal control for cooperative automation of large-scale connected vehicles via admm."

This technology uses ADMM to bring each vehicle to a target speed while maintaining a certain distance between vehicles. However, it has the following problems: (i) extra link nodes are needed to connect vehicle nodes, and (ii) there are no learnable parameters in the state transition dynamics.

In contrast, the technology according to the present embodiment (i) realizes completely decentralized traffic rectification consisting only of vehicle nodes, and (ii) realizes federated dynamics learning for obtaining optimal autonomous state transition dynamics.

<Federated dynamics learning for distributed traffic rectification>
First, {J, V _{local, i} } in equations (4) and (21) are formulated for the traffic rectification problem.

For N autonomous vehicles (nodes), given their routes, initial velocity states, positions, and maximum velocity s_max [m/s], consider that each vehicle holds a common set of learnable parameters ^-θ (= θ_1 = ... = θ_N) for determining the normalized velocity state x = [x_1, ..., x_N]^T (x_i ∈ [0, 1]). Note that, due to restrictions on the types of characters that can be used in the specification, a bar over a character is written in front of the character, as in "^-θ".

The goal of federated dynamics learning for decentralized signal-free traffic rectification is to find ^-θ that keeps x_i(t) close to the target x_tar ∈ [0, 1] while keeping some distance between vehicles to avoid collisions.

Therefore, in the present embodiment, the following function is selected as the cost function in Equation (4).

As a simple example, assume the following scenario.

Each vehicle runs in the center of a single-lane road, and overtaking is prohibited. That is, each vehicle follows another vehicle through the intersection. The control inputs o_i of each vehicle are the normalized velocity o_{spd,i}, the 2D direction vector o_{dir,i}, the 2D position vector o_{pos,i}, the surrounding image o_{img,i} of the vehicle, the mapping vectors o_{map,i|j}, and the set of neighboring connections j ∈ E_i. The mapping vector o_{map,i|j} converts 2D positions into scalar values for measuring the distance from the i-th vehicle to the j-th vehicle.

The cost function ^-V_local (= V_{local,1} = ... = V_{local,N}) is designed so that the velocity state x_i(t) is estimated with a quadratic function, as shown in equation (17). The quadratic function includes a learnable acceleration term.

Using ρ ∈ (0, 1), this term brings x_i(t) toward a point between x_tar and the previous state x_{i,k−1}, as follows. The value range of x_i is limited to x_i ∈ [0, 1].

Before specifying {A, b}, we consider the physical meaning of these parameters to derive a NOS-based state update DNN architecture.

By substituting ^-V_local into equation (12) for PRS-Net and into equation (13) for DRS-Net, the residual, recursive, and alternating update rules shown in the drawings are obtained. They are summarized as the following equation (22).

In the above equation, π differs between PRS-Net and DRS-Net. Also, [·]_{∈[0,1]} indicates that each element value is clamped to the range between 0 and 1, and the operator

indicates that negative elements are set to 0.
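The two element-wise operators described here, and the point between the target and the previous state mentioned earlier, are simple to state in code. The helper names below are chosen for this sketch, and the weighting in `blended_target` (ρ on the target, 1 − ρ on the previous state) is an assumption, since the corresponding equation is not reproduced in this text.

```python
def clamp01(values):
    """[.]_{in [0, 1]}: clamp every element to the range between 0 and 1."""
    return [min(1.0, max(0.0, v)) for v in values]


def positive_part(values):
    """Set negative elements to 0 and leave the other elements unchanged."""
    return [max(0.0, v) for v in values]


def blended_target(x_prev, x_tar, rho):
    """Point between the previous state and the target (assumed weighting)."""
    return rho * x_tar + (1.0 - rho) * x_prev
```

For example, `clamp01` keeps a normalized speed inside its valid range after an update, and `positive_part` is the kind of operation that lets a constraint act only when it is violated.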

Next, the specification of {A, b} in equation (18) will be described. In this embodiment, a no-collision constraint is imposed on each state pair {x_i, x_j}. Under this constraint, {x_i, x_j} are constrained so that the Manhattan distance (l_{i|j} + l_{j|i}) is kept greater than or equal to l_min [m], as in equation (23) below.

Here,

can be calculated at the i-th vehicle. Also, the mapping vectors {o_{map,i|j}, o_{map,j|i}} are chosen so that the left-hand side of equation (23) is positive. However, {A, b} cannot be uniquely determined, because there is ambiguity when converting equation (23) into the form of the inequality constraint Ax + b ≤ 0 shown in equation (18).

From the update rule for x in equation (22), {A,b} is associated with each vehicle's acceleration/deceleration. For example, when A _i|j b _i|j is positive, the i th vehicle brakes to keep some distance to the j th vehicle.

To avoid the problem of many vehicles getting stuck because the vehicles behind cannot pass stuck vehicles ahead, in the present embodiment a front flag or a rear (back) flag is assigned within each vehicle pair so that the vehicle in front is kept from braking as much as possible. Specifically, the design of {A_{i|j}, b_{i|j}} in this embodiment is as shown in equation (24) below.

In the above equation, A _i|j is normalized as σ ² _max =σ ² _min =1.

The derivation of the above equation, i.e., the no-collision constraint in inequality form, will now be described in more detail. Given the current state x_i, the 2D direction o_{dir,i}, and the parameters {s_max, Δt}, the position of the i-th vehicle at the next time step can be estimated as p_i = o_{pos,i} + Δt·s_max·x_i·o_{dir,i}. Similarly, the position of the j-th vehicle can be estimated as p_j = o_{pos,j} + Δt·s_max·x_j·o_{dir,j}.
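The position estimate for the next time step is a one-line computation. The sketch below writes it out for 2D tuples; the function name `predict_position` and the tuple representation are choices made for this sketch.

```python
def predict_position(pos, direction, x, s_max, dt):
    """p = o_pos + dt * s_max * x * o_dir for one vehicle (2D tuples)."""
    step = dt * s_max * x  # distance covered in one time step [m]
    return (pos[0] + step * direction[0], pos[1] + step * direction[1])


# A vehicle at the origin heading along +x at half of s_max = 20 m/s
# is expected about 1 m further along after dt = 0.1 s.
p_next = predict_position((0.0, 0.0), (1.0, 0.0), 0.5, 20.0, 0.1)
```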

Choose a pair of mapping vectors {o _{map, i|j} , o _{map, j|i} } and calculate the Manhattan distance as

A collision-free constraint that keeps the Manhattan distance between {i,j} vehicles greater than or equal to l _min is given by equation (25) below.

Equation (25) above corresponds to equation (23).
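The collision-free check can be sketched as follows. Because the equation for the Manhattan distance is not reproduced in this text, the sketch assumes that each term is the inner product of a mapping vector with the predicted position, l_{i|j} = <o_{map,i|j}, p_i>; this form and the function names are assumptions for illustration.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def manhattan_gap(p_i, p_j, map_ij, map_ji):
    """l_{i|j} + l_{j|i}, assuming l_{i|j} = <o_map,i|j, p_i> (sketch only)."""
    return dot(map_ij, p_i) + dot(map_ji, p_j)


def is_collision_free(p_i, p_j, map_ij, map_ji, l_min):
    """True if the gap is at least the minimum distance l_min."""
    return manhattan_gap(p_i, p_j, map_ij, map_ji) >= l_min


# Follow-up case: i is ahead at x = 10, j is behind at x = 2, both heading +x.
gap_follow = manhattan_gap((10.0, 0.0), (2.0, 0.0), (1.0, 0.0), (-1.0, 0.0))

# Crossing case: i approaches the origin from below, j from the left.
gap_cross = manhattan_gap((0.0, -6.0), (-4.0, 0.0), (0.0, -1.0), (-1.0, 0.0))
```

In both cases the mapping vectors are picked so that the sum is positive, as the text requires; in the crossing case the sum equals the Manhattan distance between the two vehicles measured through the intersection center.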

Next, the conversion of equation (23) to equation (24) will be described. From the update rule for x in equation (22), it can be seen that vehicle i brakes when A_{i|j} b_{i|j} is positive. As mentioned above, the design policy for {A_{i|j}, b_{i|j}} is to assign a front/rear flag to each vehicle pair and then avoid collisions by adjusting the acceleration/deceleration of the vehicle behind.

In the follow-up case, the front/rear allocation is determined by the position and orientation of the vehicle pair. In the case of crossing an intersection, the current speed and position can be used to determine the assignment of the front/back flag with the estimated remaining time to the center of the intersection.
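The intersection case can be illustrated by the following sketch. The rules that the vehicle with the shorter estimated remaining time receives the front flag, that ties go to vehicle i, and that the distance to the intersection center is measured as a Manhattan distance are all assumptions of this sketch; the function names are hypothetical.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def remaining_time_to_center(pos: Vec2, center: Vec2, speed: float,
                             eps: float = 1e-6) -> float:
    """Estimated time [s] to reach the intersection center at the current speed."""
    distance = abs(center[0] - pos[0]) + abs(center[1] - pos[1])
    return distance / max(speed, eps)  # eps avoids division by zero when stopped


def assign_front_rear(pos_i: Vec2, speed_i: float,
                      pos_j: Vec2, speed_j: float,
                      center: Vec2) -> Tuple[str, str]:
    """Return the flags for (vehicle i, vehicle j)."""
    t_i = remaining_time_to_center(pos_i, center, speed_i)
    t_j = remaining_time_to_center(pos_j, center, speed_j)
    return ("front", "rear") if t_i <= t_j else ("rear", "front")


# Vehicle i is 20 m from the center, vehicle j is 40 m away, at equal speed.
flags = assign_front_rear((0.0, -20.0), 8.75, (-40.0, 0.0), 8.75, (0.0, 0.0))
```

In this sketch a stopped vehicle obtains a very large remaining time and therefore tends to receive the rear flag; whether that behavior is desirable depends on the embodiment.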

Assuming that the i-th vehicle is in front and the j-th vehicle is in the rear, equation (23) is transformed to fit the inequality constraint of equation (18) so that the trailing vehicle accelerates/decelerates, as shown in equation (24). This is done by normalizing A _{i |j} to 1 and transferring all remaining biases to b _{j |i} . Since {A _i|j , A _j|i } have been normalized, all eigenvalues of A ^T A become 1. That is, σ ² _max =σ ² _min =1.

An outline of the distributed traffic rectification procedure using PRS-Net and DRS-Net is shown in Figure 8 as Algorithm 1 (Alg.1). In Algorithm 1, the operation of each node/each edge is decomposed and shown.

In the procedure of FIG. 8, initialization is performed on the first line. The processing in the 3rd to 19th lines is repeated for each discrete time k from 1 to K. In lines 4 to 6, each node i performs control input, edge connection, and acquisition of parameters {A _i|j , b _i|j } according to equation (24).

On lines 8-11, each node i performs message passing (receiving message z _{j |i} ) for each edge j connected to it.

On lines 13-18, each node updates its internal state based on equation (22). This allows node i to compute the velocity (x _i ) and the messages (z _i|j for each j) at time k+1.

In the above process, only a one-dimensional pair {z _i|j , z _j|i } is exchanged as message passing between connected vehicles (i, j∈E _i ) at each discrete time, so the communication is lightweight.

Each vehicle (or control server 200) records data {x, o, A, b} in a fixed time window KΔt=T. Each vehicle (or control server 200) performs the backward propagation of equation (11) using the recorded data {x, o, A, b} to update ^−θ . Federated dynamics learning is performed by repeating I rounds of forward propagation/backward propagation.

The example shown in FIG. 7 illustrates the case in which the control server 200 records data {x, o, A, b} at each discrete time and performs parameter learning by backward propagation based on cost calculation.
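As one possible illustration of the fixed-window recording described above, the following sketch keeps the most recent K records of {x, o, A, b}. The class names `StepRecord` and `WindowRecorder`, the field types, and the use of a bounded deque are assumptions of this sketch; the backward propagation itself is not shown.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List


@dataclass
class StepRecord:
    """Data recorded by one vehicle at one discrete time."""
    x: float             # state (normalized speed)
    o: Any               # external control input (position, direction, image, ...)
    A: Dict[int, float]  # constraint parameter A_i|j per connected vehicle j
    b: Dict[int, float]  # constraint parameter b_i|j per connected vehicle j


class WindowRecorder:
    """Keeps at most K records, i.e. one time window T = K * dt."""

    def __init__(self, K: int = 300) -> None:
        self._buf: Deque[StepRecord] = deque(maxlen=K)

    def record(self, rec: StepRecord) -> None:
        self._buf.append(rec)  # the oldest record is dropped once K is exceeded

    def is_full(self) -> bool:
        return len(self._buf) == self._buf.maxlen

    def flush(self) -> List[StepRecord]:
        """Return the recorded window (e.g. for learning) and clear the buffer."""
        data = list(self._buf)
        self._buf.clear()
        return data
```

When learning is performed by the control server 200, the list returned by `flush()` would be what a vehicle transmits; when learning is performed on the vehicle, it would be passed to the local learning process instead.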

(Numerical experiment)
Numerical experiments were performed using the traffic simulator SUMO to demonstrate the effectiveness of NOS-based federated dynamics learning. We aimed to train ^- θ so that the average vehicle speed approaches the target x _tar =1.0 while avoiding vehicle collisions. The maximum speed was s _max =8.75 [m/s] (=31.5 km/h). Typical parameters are shown in FIG. 9.

<Experiment setting>
As shown in FIG. 10, N=30 vehicles were randomly placed in the traffic simulator. In addition, 10 road maps were prepared. To avoid overfitting to the training data, each road map has small random perturbations for straight road lengths and intersection locations.

The vehicle's initial state (position and speed) was set randomly. Each vehicle traveled on a randomly selected road pre-defined with right and left turns. For each mini-batch (with random road maps and road perturbations), we chose K=300 [times] and Δt=0.1 [s].

Therefore, forward propagation by the traffic simulation was performed for T = KΔt = 30.0 seconds, and then back propagation was performed using the recorded data {x, o, A, b} to update the dynamics parameter ^-θ . Parameters were updated with Adam using g _θ with a learning rate of 0.0025. One set of this forward propagation and backward propagation constitutes one round.

Dynamics parameter estimates were obtained over a total of I = 500 rounds. Evaluation sessions were conducted every 10 rounds. A total of 50 patterns of fixed initial settings (10 road maps x 5 road randomness) were prepared for a fair comparison between the proposed method and the reference method.
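The experimental settings stated above can be collected, purely for illustration, in a small configuration object that also reproduces the derived quantities. The class and field names are hypothetical, and the count of 50 evaluation sessions assumes that an evaluation is run at rounds 10, 20, ..., 500.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    num_vehicles: int = 30          # N
    steps_per_round: int = 300      # K
    dt: float = 0.1                 # discrete time interval [s]
    rounds: int = 500               # I
    eval_every: int = 10            # evaluation interval [rounds]
    road_maps: int = 10
    perturbations_per_map: int = 5
    learning_rate: float = 0.0025   # Adam learning rate
    s_max: float = 8.75             # maximum speed [m/s]

    @property
    def window_seconds(self) -> float:
        """T = K * dt, the simulated time per round [s]."""
        return self.steps_per_round * self.dt

    @property
    def eval_sessions(self) -> int:
        return self.rounds // self.eval_every

    @property
    def eval_patterns(self) -> int:
        return self.road_maps * self.perturbations_per_map

    @property
    def s_max_kmh(self) -> float:
        return self.s_max * 3.6     # 1 m/s = 3.6 km/h


cfg = ExperimentConfig()
```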

An image o _img,i of each vehicle and its surroundings was generated at each discrete time instant as the control input for determining the acceleration/deceleration of each vehicle.

The size of the image is defined as 64 (W) x 64 (H) x 5 (Ch). Each channel consists of (1) the surrounding roads, (2) the surrounding vehicle positions, (3) their normalized velocities, and (4,5) the 2D direction vectors of the surrounding vehicles. Each image is rotated so that each vehicle is facing the same direction. Examples of o _img for vehicle 1 and vehicle 30 in FIG. 10 are shown in FIGS. 11 and 12, respectively. FIG. 13 shows an image of repeating state update and learning for each round.
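A simplified sketch of how such a rotated, vehicle-centered image could be filled is shown below. The resolution of 1 m per pixel, the placement of the vehicle at the image center, the choice of "heading points up", and the omission of the road channel are all assumptions of this sketch, as are the function names.

```python
from typing import Iterable, List, Optional, Tuple

Vec2 = Tuple[float, float]
W = H = 64
CHANNELS = 5  # 0: roads (not drawn here), 1: positions, 2: speeds, 3-4: directions


def to_ego_frame(vec: Vec2, ego_dir: Vec2) -> Tuple[float, float]:
    """Express vec as (right, forward) components relative to the ego heading."""
    dx, dy = ego_dir
    right = vec[0] * dy - vec[1] * dx
    forward = vec[0] * dx + vec[1] * dy
    return right, forward


def to_pixel(rel_ego: Tuple[float, float],
             metres_per_pixel: float = 1.0) -> Optional[Tuple[int, int]]:
    """Map an ego-frame offset to (row, col); None if outside the image."""
    right, forward = rel_ego
    col = W // 2 + round(right / metres_per_pixel)
    row = H // 2 - round(forward / metres_per_pixel)  # forward is drawn upwards
    if 0 <= row < H and 0 <= col < W:
        return row, col
    return None


def rasterize(ego_pos: Vec2, ego_dir: Vec2,
              neighbours: Iterable[Tuple[Vec2, Vec2, float]]) -> List[List[List[float]]]:
    """neighbours: (position, unit direction, normalized speed) of nearby vehicles."""
    img = [[[0.0] * W for _ in range(H)] for _ in range(CHANNELS)]
    for pos, direction, x in neighbours:
        rel = (pos[0] - ego_pos[0], pos[1] - ego_pos[1])
        pixel = to_pixel(to_ego_frame(rel, ego_dir))
        if pixel is None:
            continue  # vehicle lies outside the 64 x 64 field of view
        r, c = pixel
        d_right, d_forward = to_ego_frame(direction, ego_dir)
        img[1][r][c] = 1.0        # vehicle position
        img[2][r][c] = x          # normalized speed
        img[3][r][c] = d_right    # direction vector, first component
        img[4][r][c] = d_forward  # direction vector, second component
    return img
```

Because every offset and direction is expressed relative to the ego heading, two vehicles in geometrically identical situations obtain the same image regardless of their absolute orientation on the map.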

In the proposed method (NOS), PRS-Net and DRS-Net were configured according to Alg. 1. For the learnable DNN function in the state update of equation (22), we used a small-sized CNN and chose α=0.05, ρ=0.75 to mitigate abrupt changes in x _i . Δt=2πασ ² _min =0.1π [s] was used as the optimal discrete time interval setting. Since π=1 for PRS-Net and π=1/2 for DRS-Net, the number of seconds covered by forward/backward propagation differs between the two.

To achieve realistic message passing, we assumed P to be a sparse matrix. We allowed communication between vehicle pairs (i, j∈E _i ) within a Manhattan distance of l _com =25.0 [m] at any time. That is, the communication range was set to 25.0 [m]. {z _i|j , A _i|j , b _i|j } were adaptively allocated on the i-th vehicle once communication between vehicles {i,j} was initiated.

When a vehicle connected by an edge is farther away than l _com , or the road changes (e.g., from an intersection to a straight road), message passing stops and the variables are deallocated to save memory. l _min =15.0 [m] was used as the minimum distance constraint in equation (23).
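The adaptive allocation and deallocation of edge variables can be sketched as follows. The class name `EdgeVariables`, the initial value 0.0 for newly allocated variables, and the use of "less than or equal to l _com " as the range test are assumptions of this sketch; the road-change condition is omitted for brevity.

```python
from typing import Dict, Set, Tuple

Vec2 = Tuple[float, float]


def manhattan(p: Vec2, q: Vec2) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def neighbours_in_range(i: int, positions: Dict[int, Vec2],
                        l_com: float = 25.0) -> Set[int]:
    """Vehicles j != i within Manhattan distance l_com of vehicle i."""
    return {j for j, p in positions.items()
            if j != i and manhattan(positions[i], p) <= l_com}


class EdgeVariables:
    """Per-edge variables {z_i|j, A_i|j, b_i|j} held by vehicle i."""

    def __init__(self) -> None:
        self.z: Dict[int, float] = {}
        self.A: Dict[int, float] = {}
        self.b: Dict[int, float] = {}

    def sync(self, neighbours: Set[int]) -> None:
        for j in neighbours - set(self.z):   # newly connected: allocate
            self.z[j] = self.A[j] = self.b[j] = 0.0
        for j in set(self.z) - neighbours:   # disconnected: free memory
            del self.z[j], self.A[j], self.b[j]


positions = {1: (0.0, 0.0), 2: (10.0, 10.0), 3: (30.0, 0.0)}
edges = EdgeVariables()
edges.sync(neighbours_in_range(1, positions))  # only vehicle 2 is within 25 m
```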

<Reference method>
Three reference systems were prepared to evaluate NOS, a theoretical DNN architecture based on the discretization of the ODE in equation (2).

As the first reference system, we used SUMO's native collision-free traffic rectification with appropriate parameter settings (TraCI's speed mode: 31). However, SUMO was thought to be designed to be controlled using centralized traffic lights, and SUMO-controlled vehicles often stopped before entering intersections to avoid collisions.

To remedy this situation under this SUMO collision-free setting, we built Plug and Play (PnP) Net as a second reference system. PnP-Net uses state transitions implemented in recursive DNNs for updating x. The method avoided analytical derivation as much as possible and learned in a data-driven manner. The DNN parameter sizes of this PnP-Net were initially set as close as possible to the NOS parameter values.

As a third reference system, NOS without the learnable DNN functions was used, in both the PRS-Net and DRS-Net types. As shown in FIG. 7, both PnP-Net and the non-learnable NOS alternately perform (i) internal state update and (ii) message passing between adjacent vehicles.

<Implementation>
We built the software running on a server with a GPU (NVIDIA DGX A100 8), with a CPU (AMD EPYC 7262 8-Core 4). PyTorch (1.9.0+cu111) was used for DNN. In each round of PRS-Net, the average computation time was 45.4 [s] without considering SUMO-GUI.

<Experimental results>
FIG. 14 shows the experimental results, namely how the average normalized speed changes as the number of rounds increases. The highest speed ( ^−x _av =0.98) was obtained with PRS-Net and the second highest ( ^−x _av =0.97) with DRS-Net. The performance difference between them is small; that is, both methods are effective for NOS-based federated dynamics learning. In each learning round, we confirmed that ^-θ was updated so as to maintain the inter-vehicle distance as much as possible.

Among the reference methods, PnP-Net obtained the highest speed ( ^−x _av =0.64). However, it could not reach the scores of NOS, because the added cost of keeping distance interfered with training to obtain the desired dynamics.

For the remaining non-learnable reference methods, in which vehicles often stop before entering an intersection to avoid a collision, the average normalized speed was 0.30 for non-learnable PRS-Net, 0.31 for non-learnable DRS-Net, and 0.56 for SUMO's native traffic control system. These experimental results show that NOS-based federated dynamics learning is effective.

FIG. 15 is a diagram that more clearly shows the difference between NOS-based federated dynamics learning (proposed method) and the SUMO implementation (conventional method). As shown in FIG. 15, in the proposed method, the normalized speed approaches the target value as learning progresses, and the speed is greatly improved over the conventional method.

Also, FIG. 16 shows a convergence curve for the evaluation set, showing better performance as it approaches 0.0. As shown in FIG. 16, the loss value (proportional to the difference between the normalized speed target value and the current value) decreases with learning.

(Device configuration example)
FIG. 17 shows a configuration example of the vehicle 1. As shown in FIG. 17, the vehicle 1 has a camera 11, a sensor 12, a control device 100, a communication unit 13, and a drive unit 14. The camera 11 acquires an image of the surroundings. The sensor 12 acquires the vehicle's own position information by, for example, Lidar or GPS. The sensor 12 may also be able to acquire the vehicle's own velocity. A sensor that acquires information other than these may also be mounted.

The control device 100 inputs external information acquired by the camera 11 and the sensor 12, performs the processing of Algorithm 1, and outputs the state (speed) x and the message z. The control device 100 may also include a function of parameter learning by backward propagation based on Equation (11).

The communication unit 13 receives a message transmitted from another adjacent vehicle, passes the message to the control device 100, and transmits a message output from the control device 100 to the other adjacent vehicle. In addition, when learning is performed by the control server 200, the communication unit 13 transmits the recorded data {x, o, A, b} obtained in the state update to the control server 200, and receives the latest learned parameter θ from the control server 200.

The drive unit 14 includes a function (engine, motor, etc.) for running according to the state x output from the control device 100 . For example, when a certain speed is output as a state from the control device 100, the vehicle is driven so as to run at that speed.

FIG. 18 shows a configuration example of the control device 100. As shown in FIG. 18, the control device 100 includes an input unit 110, a state update unit 120, an output unit 130, a data storage unit 140, and a learning unit 150. Note that when learning is performed by the control server 200, the learning unit 150 need not be provided in the control device 100.

The input unit 110 inputs external information o acquired by the camera 11 and the sensor 12 and a message z received from an adjacent vehicle. The state update unit 120 is a DNN that implements a NOS that updates the states of x and z according to Algorithm 1 .

The output unit 130 outputs the state x obtained by the state update unit 120 and the message z. The data storage unit 140 records the data {x, o, A, b} obtained in the process of processing by the state update unit 120 for each discrete time. Further, the data storage unit 140 stores the latest learned parameter θ, and the state update unit 120 executes state update processing using the latest learned parameter θ.

The learning unit 150 learns the parameter θ by backward propagation (11) using the recorded data {x, o, A, b}, and stores the learned parameter θ in the data storage unit 140.

FIG. 19 shows a configuration example of the control server 200. As shown in FIG. 19, the control server 200 includes an input unit 210, a learning unit 220, an output unit 230, and a data storage unit 240.

The input unit 210 receives recorded data {x, o, A, b} from each vehicle. The data storage unit 240 stores recorded data {x, o, A, b} received from each vehicle. The learning unit 220 learns the parameter θ by backward propagation (11) using the recorded data {x, o, A, b} stored in the data storage unit 240 . The output unit 230 transmits the learned parameter θ to each vehicle.

(Hardware configuration example)
Both the control device 100 and the control server 200 can be realized, for example, by causing a computer to execute a program.

That is, the device can be realized by executing a program corresponding to the processing performed by the device using hardware resources such as a CPU, GPU, and memory built into the computer. The above program can be recorded in a computer-readable recording medium (portable memory, etc.), saved, or distributed. It is also possible to provide the above program through a network such as the Internet or e-mail.

FIG. 20 is a diagram showing a hardware configuration example of the computer. The computer of FIG. 20 has a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a processor 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, etc., which are interconnected by a bus BS. The processor 1004 may be a CPU, a GPU, or both a CPU and a GPU.

A program that implements the processing in the computer is provided by a recording medium 1001 such as a CD-ROM or memory card, for example. When the recording medium 1001 storing the program is set in the drive device 1000 , the program is installed from the recording medium 1001 to the auxiliary storage device 1002 via the drive device 1000 . However, the program does not necessarily need to be installed from the recording medium 1001, and may be downloaded from another computer via the network. The auxiliary storage device 1002 stores installed programs, as well as necessary files and data.

The memory device 1003 reads and stores the program from the auxiliary storage device 1002 when a program activation instruction is received. The processor 1004 implements the functions of the control device 100 or the control server 200 according to programs stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network or the like. The display device 1006 displays a GUI (Graphical User Interface) or the like by a program. The input device 1007 is composed of a keyboard, a mouse, buttons, a touch panel, or the like, and is used to input various operational instructions. The output device 1008 outputs the calculation result.

(Summary, effect)
As described above, in this embodiment, NOS is proposed as an extension of NODE for federated dynamics learning, and the two sub-dynamics of equation (2) are assigned to (i) internal state updating and (ii) message passing. NOS-based DNN architectures such as PRS-Net and DRS-Net were also constructed through the discretization of equation (2) based on the operator decomposition method.

NOS has also been successfully applied to the problem of signal-free traffic rectification, with the goal of finding the dynamics parameter ^-θ that brings the average speed as close as possible to a target value.

As a result, it can be expected to reduce travel and transportation time to the limit while preventing traffic accidents (collisions) without using traffic lights.

(Appendix)
This specification discloses at least a traffic control system, a control device, a control method, and a program for each of the following items.
(Section 1)
A control device in a traffic rectification system that comprises a plurality of moving bodies each equipped with the control device and in which the plurality of moving bodies autonomously rectify traffic so as to prevent collisions between the moving bodies, the control device comprising:
a state update unit that updates the state of the moving body, under constraints for deterring collisions between moving bodies, based on state update dynamics including sub-dynamics for updating the state of the moving body and sub-dynamics for message passing between the moving body and other moving bodies in proximity to the moving body; and
an output unit that outputs the state updated by the state update unit and a message.
(Section 2)
The control device according to item 1, wherein the state update unit uses a neural network to alternately repeat the state update of the moving body and the message passing.
(Section 3)
The control device according to item 2, wherein the parameters of the neural network are updated by backpropagation calculation based on the adjoint method so that the speed of the moving body approaches a target speed under the constraints.
(Section 4)
The control device according to any one of items 1 to 3, wherein the state update unit calculates the state at the next point in time based on the state at a given point in time, the external control input at that point in time, and the message received from another moving body at that point in time.
(Section 5)
The control device according to item 4, wherein the constraint is that the distance between the moving body and the other moving body, calculated based on the external control input, is kept at or above a predetermined distance.
(Section 6)
A traffic rectification system comprising:
a plurality of moving bodies each equipped with the control device according to any one of items 1 to 5; and
a control server that updates the parameters of the neural network provided in the state update unit by backpropagation calculation based on the adjoint method so that the average speed of the plurality of moving bodies approaches a target speed under the constraints.
(Section 7)
A control method executed by a control device in a traffic rectification system that comprises a plurality of moving bodies each equipped with the control device and in which the plurality of moving bodies autonomously rectify traffic so as to prevent collisions between the moving bodies, the control method comprising:
a state update step of updating the state of the moving body, under constraints for deterring collisions between moving bodies, based on state update dynamics including sub-dynamics for updating the state of the moving body and sub-dynamics for message passing between the moving body and other moving bodies in proximity to the moving body; and
an output step of outputting the state updated in the state update step and a message.
(Section 8)
A program for causing a computer to function as each unit in the control device according to any one of items 1 to 5.

Although the present embodiment has been described above, the present invention is not limited to such a specific embodiment, and various modifications and changes can be made within the scope of the gist of the present invention described in the claims.

1 vehicle 11 camera 12 sensor 13 communication unit 14 drive unit 100 control device 110 input unit 120 state update unit 130 output unit 140 data storage unit 150 learning unit 200 control server 210 input unit 220 learning unit 230 output unit 240 data storage unit 300 network 1000 drive device 1001 recording medium 1002 auxiliary storage device 1003 memory device 1004 processor 1005 interface device 1006 display device 1007 input device

Claims

1. A control device in a traffic rectification system that comprises a plurality of moving bodies each equipped with the control device and in which the plurality of moving bodies autonomously rectify traffic so as to prevent collisions between the moving bodies, the control device comprising:
a state update unit that updates the state of the moving body, under constraints for deterring collisions between moving bodies, based on state update dynamics including sub-dynamics for updating the state of the moving body and sub-dynamics for message passing between the moving body and other moving bodies in proximity to the moving body; and
an output unit that outputs the state updated by the state update unit and a message.
2. The control device according to claim 1, wherein the state update unit uses a neural network to alternately repeat the state update of the moving body and the message passing.
3. The control device according to claim 2, wherein the parameters of the neural network are updated by backpropagation calculation based on the adjoint method so that the speed of the moving body approaches a target speed under the constraints.
4. The control device according to any one of claims 1 to 3, wherein the state update unit calculates the state at the next point in time based on the state at a given point in time, the external control input at that point in time, and the message received from another moving body at that point in time.
5. The control device according to claim 4, wherein the constraint is that the distance between the moving body and the other moving body, calculated based on the external control input, is greater than or equal to a predetermined distance.
6. A traffic rectification system comprising:
a plurality of moving bodies each equipped with the control device according to any one of claims 1 to 5; and
a control server that updates the parameters of the neural network provided in the state update unit by backpropagation calculation based on the adjoint method so that the average speed of the plurality of moving bodies approaches a target speed under the constraints.
7. A control method executed by a control device in a traffic rectification system that comprises a plurality of moving bodies each equipped with the control device and in which the plurality of moving bodies autonomously rectify traffic so as to prevent collisions between the moving bodies, the control method comprising:
a state update step of updating the state of the moving body, under constraints for deterring collisions between moving bodies, based on state update dynamics including sub-dynamics for updating the state of the moving body and sub-dynamics for message passing between the moving body and other moving bodies in proximity to the moving body; and
an output step of outputting the state updated in the state update step and a message.
8. A program for causing a computer to function as each unit in the control device according to any one of claims 1 to 5.