CN111489568B - Traffic signal lamp regulation and control method and device and computer readable storage medium

Info

Publication number: CN111489568B
Application number: CN201910074683.8A
Authority: CN
Inventors: 肖楠; 余亮; 刘跃虎; 张欣
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-01-25
Filing date: 2019-01-25
Publication date: 2022-08-02
Anticipated expiration: 2039-01-25
Also published as: CN111489568A

Abstract

The invention provides a traffic signal lamp regulation and control method, a traffic signal lamp regulation and control device and a computer readable storage medium. The method comprises the following steps: obtaining training samples from historical control data, wherein each training sample comprises historical signal lamp timing, historical traffic measurements and historical traffic indexes; training on the training samples to obtain a mapping function; acquiring actual traffic measurements and actual traffic indexes; and calculating the signal lamp timing to be output according to the mapping function, the actual traffic measurements and the actual traffic indexes. The invention effectively addresses two problems of existing signal lamp timing schemes: the poor performance of the signal lamp control system during the initial interactive training process, and the poor final performance of the signal lamp control system caused by errors in the traffic model. Both the initial performance and the final performance of the signal lamp control system are improved.

Description

Traffic signal lamp regulation and control method and device and computer readable storage medium

Technical Field

The invention relates to the technical field of traffic control, in particular to a method and a device for regulating and controlling a traffic signal lamp and a computer readable storage medium.

Background

With accelerating urbanization and the spread of motor vehicles, traffic congestion is becoming increasingly serious. Without building new roads, improving road utilization through reasonable traffic control, and thereby improving traffic efficiency, is an effective way to relieve traffic congestion.

At present, signal lamp regulation schemes based on a traffic model are applied in traffic signal control systems. Traffic models typically include traffic flow models, queue length models, delay models and the like, which establish a mathematical relationship between traffic measurements and signal lamp timing on the one hand and traffic states or traffic indexes on the other. A regulation scheme based on a traffic model outputs signal lamp timing by optimizing the traffic state or the traffic index. However, the traffic model contains errors, so the final performance of the signal lamp control system under such a scheme is poor.

A signal lamp regulation scheme based on traditional deep reinforcement learning, by contrast, randomly generates signal lamp timing in the initial stage to interact with the signal lamp control system. It trains on the randomly generated signal lamp timing together with the traffic measurement data and traffic index data acquired from the signal lamp control system in real time, and gradually improves the performance of the signal lamp control system over many rounds of training, thereby realizing signal lamp regulation. However, such a scheme must train on a large amount of data before it arrives at ideal signal lamp timing, so the performance of the signal lamp control system is poor during the initial interactive training process and the scheme is difficult to apply in practice.

Accordingly, the inventors have recognized a need for improvement in response to the problems with the prior art described above.

Disclosure of Invention

An object of the embodiments of the present invention is to provide a new technical solution for regulating and controlling a traffic signal lamp.

According to a first aspect of the embodiments of the present invention, there is provided a method for regulating a traffic signal lamp, the method including:

obtaining training samples according to historical control data, wherein each training sample comprises historical signal lamp timing, historical traffic measurement and historical traffic indexes;

training according to the training sample to obtain a mapping function;

acquiring actual traffic measurement and actual traffic indexes;

and calculating to obtain the timing of the signal lamp to be output according to the mapping function, the actual traffic measurement and the actual traffic index.

Optionally, the training according to the training sample to obtain a mapping function includes:

and training a loss function according to the training sample to obtain the mapping function.

Optionally, the loss function is a weighted sum of a time difference loss function, a supervised learning loss function, and a regularization loss function, which are respectively set to corresponding weight values.

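Optionally, the training a loss function according to the training sample to obtain the mapping function includes:

respectively determining a signal lamp timing prediction expression of each training sample by taking the undetermined coefficient of the mapping function as a variable according to each training sample;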

constructing a loss function according to the signal lamp timing prediction expression of each training sample and the historical signal lamp timing of each training sample;

and determining the undetermined coefficient according to the loss function, and finishing the training of the mapping function.

Optionally, after the signal lamp timing to be output is obtained through calculation, the method further includes:

acquiring the signal lamp timing to be output, the actual traffic measurement and the actual traffic index as new training samples;

and correcting the mapping function according to the new training sample.

Optionally, the traffic measurement at least includes traffic flow statistics of each lane or each phase at the intersection;

the traffic index at least comprises vehicle delay;

the signal lamp timing comprises at least one of the following items: signal period, split, phase difference and phase sequence.

Optionally, the method further includes: and storing the signal lamp timing to be output, the actual traffic measurement and the actual traffic index as the historical control data.

According to a second aspect of the present invention, there is provided a method of regulating a traffic signal lamp, the method comprising:

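acquiring a mapping function, actual traffic measurement and actual traffic indexes; the mapping function is obtained by training a training sample obtained according to historical control data;

and calculating to obtain the timing of the signal lamp to be output according to the mapping function, the actual traffic measurement and the actual traffic index.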

According to a third aspect of the present invention, there is provided a traffic signal lamp regulating device, the device comprising:

the historical data acquisition module is used for acquiring training samples according to historical control data, wherein each training sample comprises historical signal lamp timing, historical traffic measurement and historical traffic indexes;

the training module is used for training according to the training sample to obtain a mapping function;

the actual data acquisition module is used for acquiring actual traffic measurement and actual traffic indexes;

and the calculation module is used for calculating the timing of the signal lamp to be output according to the mapping function, the actual traffic measurement and the actual traffic index.

According to a fourth aspect of the present invention, there is provided a traffic signal lamp regulating device, the device comprising:

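the acquisition module is used for acquiring a mapping function, actual traffic measurement and actual traffic indexes; the mapping function is obtained by training a training sample obtained according to historical control data;

and the calculation module is used for calculating the timing of the signal lamp to be output according to the mapping function, the actual traffic measurement and the actual traffic index.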

According to a fifth aspect of the present invention, there is provided a traffic signal lamp control apparatus, comprising: a memory and a processor; the memory is configured to store executable instructions, and the processor is configured to execute the operations in the traffic signal lamp regulation method according to any one of the first aspect of the present invention according to the control of the instructions, or the processor is configured to execute the operations in the traffic signal lamp regulation method according to the second aspect of the present invention according to the control of the instructions.

According to a sixth aspect of the present invention, there is provided a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of regulating a traffic signal lamp according to any one of the first aspects of the present invention; alternatively, the computer program, when being executed by a processor, implements the method for regulating a traffic signal according to the second aspect of the present invention.

According to one embodiment of the invention, two problems of existing signal lamp timing schemes can be effectively solved: the poor performance of the signal lamp control system during the initial interactive training process, and the poor final performance of the signal lamp control system caused by errors in the traffic model. Both the initial performance and the final performance of the signal lamp control system are improved.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a block diagram showing a hardware configuration of a server that can implement an embodiment of the present invention.

Fig. 2 shows a schematic flow chart of a traffic signal lamp control method according to a first embodiment of the present invention.

Fig. 3 shows a schematic flow chart of step 2200 according to an embodiment of the present invention.

Fig. 4 shows a schematic flow chart of a traffic signal lamp control method according to a second embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a traffic signal lamp control device according to a first embodiment of the present invention.

Fig. 6 is a schematic structural diagram of a traffic signal lamp control device according to a second embodiment of the present invention.

Fig. 7 is a schematic structural diagram of a regulation and control device of a traffic signal lamp according to a third embodiment of the invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.

< hardware configuration >

The server 1000 may be, for example, a blade server or the like. In one example, server 1000 may be a computer. In another example, the server 1000 may be as shown in fig. 1, including a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600. Although the server 1000 may also include a speaker, a microphone, and the like, these components are not relevant to the present invention and are omitted here.

The processor 1100 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. Communication device 1400 is capable of wired or wireless communication, for example. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.

The server shown in fig. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 1200 of the server 1000 is used for storing instructions, and the instructions are used for controlling the processor 1100 to operate so as to execute any one of the methods for regulating and controlling a traffic signal lamp provided by the embodiment of the present invention. It should be understood by those skilled in the art that although a plurality of devices are shown for the server 1000 in fig. 1, the present invention may only relate to some of the devices, for example, only the processor 1100 and the memory 1200 of the server 1000. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.

< method embodiment >

The method for regulating and controlling a traffic signal lamp in this embodiment may be specifically executed by the server 1000 shown in fig. 1.

As shown in fig. 2, at step 2100, training samples are obtained based on historical control data.

It should be noted that the historical control data is input and output data in the interaction process between the existing traffic model-based signal lamp control algorithm and the signal lamp control system. The input data in the historical control data comprises historical traffic measurements and historical traffic indicators, and the output data in the historical control data comprises historical signal light timing.

In this embodiment, according to different application scenarios, the server 1000 may obtain historical traffic measurement, historical traffic index, and historical signal light timing of an intersection; alternatively, the server 1000 may obtain historical traffic measurements, historical traffic indicators, and historical signal light timing for a specified road segment; alternatively, the server 1000 may obtain historical traffic measurements, historical traffic indicators, and historical signal light timing for the entire traffic road network. This embodiment is not particularly limited thereto.

Specifically, the historical traffic measurement at least includes traffic flow statistics of each lane or each phase of the intersection. The historical traffic index at least includes vehicle delay. The historical signal lamp timing comprises at least one of: signal period, split, phase difference and phase sequence. Optionally, the historical traffic measurement may further include at least one of queue length, vehicle speed, and vehicle position. The historical traffic index may also include the number of vehicle stops.

The vehicle delay refers to the difference between the actual travel time of a vehicle and the time the vehicle would need to travel the same route at free-flow speed. The average vehicle delay is obtained by aggregating, according to the application scenario, the delays of all vehicles at the intersection, on the specified road section or in the traffic road network.
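The delay definition above can be illustrated with a minimal Python sketch. The function names, the tuple layout of a trip and the sample values are assumptions made for illustration only; the patent does not prescribe how travel times are recorded.

```python
from statistics import mean


def vehicle_delay(actual_travel_time_s, distance_m, free_flow_speed_mps):
    """Delay = actual travel time minus travel time at free-flow speed."""
    free_flow_time_s = distance_m / free_flow_speed_mps
    return actual_travel_time_s - free_flow_time_s


def average_delay(trips):
    """Average delay over trips given as
    (actual_travel_time_s, distance_m, free_flow_speed_mps) tuples."""
    return mean(vehicle_delay(t, d, v) for t, d, v in trips)


# Three hypothetical vehicles on the same 500 m road section,
# with an assumed free-flow speed of 12.5 m/s (free-flow time 40 s).
trips = [(60.0, 500.0, 12.5), (75.0, 500.0, 12.5), (40.0, 500.0, 12.5)]
avg = average_delay(trips)  # delays are 20 s, 35 s and 0 s
```

The same aggregation applies whether the trips are collected at one intersection, on a road section or over a whole road network; only the set of trips changes.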

The signal period refers to the time from when the green light of a phase at the intersection turns on until it turns on again. The split (green signal ratio) is the ratio of the green light display time of a given phase at the intersection to the signal period within one signal period. The phase difference refers to the time difference between the green lights of two adjacent intersections; in practical application, a good phase difference allows a vehicle to pass through several intersections in succession without stopping. The phase sequence refers to the execution order of the phases at the intersection.
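As an illustration of how these four timing quantities relate, the following Python sketch groups them in one record and derives the split of a phase. The class name, field names, phase labels and values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SignalTiming:
    cycle_s: float              # signal period, in seconds
    green_s: Dict[str, float]   # green display time of each phase, in seconds
    offset_s: float             # phase difference to the adjacent intersection
    phase_sequence: List[str]   # execution order of the phases

    def green_ratio(self, phase: str) -> float:
        """Split: green time of a phase divided by the signal period."""
        return self.green_s[phase] / self.cycle_s


timing = SignalTiming(
    cycle_s=120.0,
    green_s={"NS": 54.0, "EW": 42.0},
    offset_s=15.0,
    phase_sequence=["NS", "EW"],
)
ns_ratio = timing.green_ratio("NS")  # 54 / 120
```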

The queue length refers to the length of the queue of vehicles waiting to pass in each lane or each phase of the intersection during the red light time. The vehicle speed refers to the average vehicle speed, obtained by aggregating, according to the application scenario, the instantaneous speeds of all vehicles at the intersection, on the specified road section or in the traffic road network. The vehicle position refers to the instantaneous coordinate position of the vehicle; a time series of vehicle positions yields vehicle trajectory data. The number of vehicle stops is the total number of times a vehicle stops after entering the queuing state: a stop begins when the vehicle speed falls below 5 km/h and ends when the vehicle speed exceeds 10 km/h, and the interval from the beginning of the stop to its end is counted as one stop. The average number of stops or the total number of stops may be obtained by aggregating over all vehicles at the intersection, on the specified road section or in the traffic road network according to the application scenario.
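The stop-counting rule with its two thresholds can be sketched in Python as follows. The function name and the sampled speed series are illustrative assumptions. The sketch counts a stop only once it has ended, which is one reading of the rule; a stop still in progress at the end of the series is not counted.

```python
def count_stops(speeds_kmh, stop_below=5.0, resume_above=10.0):
    """Count completed stops in a series of speed samples (km/h).

    A stop begins when the speed falls below `stop_below` and ends when
    the speed exceeds `resume_above`; each begin/end pair is one stop.
    """
    stops = 0
    stopped = False
    for v in speeds_kmh:
        if not stopped and v < stop_below:
            stopped = True        # stop begins
        elif stopped and v > resume_above:
            stopped = False       # stop ends, count it
            stops += 1
    return stops


# Hypothetical speed samples of one vehicle approaching an intersection.
speeds = [30, 12, 4, 0, 3, 8, 11, 25, 6, 4, 7, 9, 12, 20]
n_stops = count_stops(speeds)
```

Using two thresholds instead of one keeps a vehicle creeping forward in a queue at 6 to 9 km/h from being counted as several separate stops.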

For example, in an application scenario in which historical control data of a specified road section is acquired, and assuming that the time unit is one day, the server 1000, when obtaining training samples from the historical control data, takes the historical traffic measurements and historical traffic indexes of the specified road section over the past day (24 hours), together with the corresponding historical signal lamp timing, as the training samples.

Step 2200, training according to the training sample to obtain a mapping function.

In this step, the server 1000 may use the gradient descent algorithm employed in deep neural network training to minimize the loss function offline on the training samples, thereby obtaining the mapping function. The loss function comprises a time difference loss function, a supervised learning loss function, and a regularization loss function.

The time difference loss function (known in reinforcement learning as the temporal-difference loss) is a function of the difference between the expected total future returns at two consecutive moments in the reinforcement learning algorithm, and is used to satisfy the starting condition of the deep reinforcement learning algorithm in the subsequent step. The supervised learning loss function is the output error function of the mapping function when the historical control data of the signal lamp regulation algorithm based on the traffic model is taken as the prediction target, and is used to improve the initial performance of the deep reinforcement learning algorithm in the subsequent steps. The regularization loss function is a norm of the weights of the mapping function; commonly used norms are the 2-norm (the square root of the sum of the squares of the weights) and the 1-norm (the sum of the absolute values of the weights). It is used to avoid over-fitting during pre-training.

In this embodiment, the loss function is a weighted sum of a time difference loss function, a supervised learning loss function, and a regularization loss function, which are respectively set to corresponding weight values.

It should be noted that the time difference loss function is a precondition for the deep reinforcement learning algorithm to work normally in the subsequent step. If the weight value set for the time difference loss function is too small, the deep reinforcement learning algorithm may not work; if it is too large, the effects of the other loss functions may be drowned out. The supervised learning loss function ensures the initial performance of the deep reinforcement learning algorithm. If the weight value set for the supervised learning loss function is too small, the initial performance of the deep reinforcement learning algorithm is poor; if it is too large, the effects of the other loss functions are drowned out. The regularization loss function ensures that over-fitting does not occur during deep reinforcement learning. If its weight value is too small, over-fitting occurs easily; if it is too large, the effects of the other loss functions may be drowned out or under-fitting may occur.

Therefore, the distribution of the weight values needs to reflect the balance relationship among the loss functions, and in practical application, the weight values set for the loss functions are determined through trial and error experiments according to different road network requirements. This embodiment is not particularly limited thereto.
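A minimal Python sketch of such a weighted loss is given below. The patent does not fix the exact form of the three terms, so the squared one-step temporal-difference error, the squared timing error, the discount factor `gamma`, and all function names and weight values are assumptions chosen for illustration.

```python
import math


def td_loss(q_now, reward, q_next, gamma=0.9):
    """Assumed form: squared one-step temporal-difference error."""
    return (reward + gamma * q_next - q_now) ** 2


def supervised_loss(predicted_timing, historical_timing):
    """Assumed form: squared error against the historical signal timing."""
    return sum((p - h) ** 2 for p, h in zip(predicted_timing, historical_timing))


def l2_norm(weights):
    """2-norm: square root of the sum of the squared weights."""
    return math.sqrt(sum(w * w for w in weights))


def l1_norm(weights):
    """1-norm: sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def total_loss(td, sl, reg, w_td, w_sl, w_reg):
    """Weighted sum of the three loss terms."""
    return w_td * td + w_sl * sl + w_reg * reg


td = td_loss(q_now=-40.0, reward=-30.0, q_next=-20.0)   # (-30 - 18 + 40)^2
sl = supervised_loss([0.45, 0.35], [0.50, 0.30])        # 0.0025 + 0.0025
reg = l2_norm([3.0, 4.0])                                # 5.0
loss = total_loss(td, sl, reg, w_td=1.0, w_sl=10.0, w_reg=0.01)
```

In this toy example the temporal-difference term dominates the sum, which illustrates why the weight values have to be balanced by experiment for each road network.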

Step 2300, obtaining actual traffic measurement and actual traffic index.

Specifically, the actual traffic measurement and the actual traffic index may be obtained from the signal lamp control system. The actual traffic measurement at least comprises traffic flow statistics of each lane or each phase of the intersection. The actual traffic index at least includes vehicle delay. Optionally, the actual traffic measurement may further include at least one of queue length, vehicle speed, and vehicle position. The actual traffic index may also include the number of vehicle stops.

Step 2400, calculating to obtain the timing of the signal lamp to be output according to the mapping function, the actual traffic measurement and the actual traffic index.

In this step, a deep reinforcement learning algorithm is adopted: the mapping function, the actual traffic measurement and the actual traffic index are taken as the inputs of the deep reinforcement learning algorithm, which calculates the signal lamp timing to be output.
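The patent does not specify which deep reinforcement learning algorithm is used or how the timing is selected. As one possible reading, the sketch below assumes that the mapping function scores a finite set of candidate timings for the current traffic state and that selection is epsilon-greedy. The names `select_timing`, `toy_mapping_function`, the candidate list and the scoring rule are all hypothetical.

```python
import random


def select_timing(mapping_function, traffic_measurement, traffic_index,
                  candidate_timings, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among candidate timings (assumed scheme)."""
    if rng.random() < epsilon:
        return rng.choice(candidate_timings)   # explore
    state = list(traffic_measurement) + [traffic_index]
    # Exploit: pick the timing the mapping function scores highest.
    return max(candidate_timings, key=lambda t: mapping_function(state, t))


def toy_mapping_function(state, timing):
    """Stand-in for a trained network: prefers a green ratio that
    matches the north-south share of the traffic flow."""
    ns_flow, ew_flow, _delay = state
    target = ns_flow / (ns_flow + ew_flow)
    return -abs(timing["ns_green_ratio"] - target)


candidates = [{"ns_green_ratio": r} for r in (0.3, 0.4, 0.5, 0.6, 0.7)]
chosen = select_timing(toy_mapping_function, [600.0, 400.0], 35.0,
                       candidates, epsilon=0.0)
```

With `epsilon=0.0` the choice is purely greedy; a small positive value would let the online algorithm keep exploring timings the historical data never contained.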

After the signal lamp timing to be output is obtained through calculation, in this embodiment the signal lamp timing to be output, the actual traffic measurement and the actual traffic index may further be taken as a new training sample, and the mapping function may be corrected according to the new training sample.

Further, in this embodiment, the signal lamp timing to be output, the actual traffic measurement, and the actual traffic index may also be stored as the historical control data.
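The feedback loop described above, in which each new sample is both stored as historical control data and used to correct the mapping function, might be sketched as follows. The patent leaves the correction rule open; here it is assumed, purely for illustration, that a linear model predicting the traffic index from the measurement and the timing is corrected by one gradient step per sample. The class name, function name and sample values are hypothetical.

```python
from collections import deque


class HistoricalControlData:
    """Bounded store of (timing, measurement, index) samples."""

    def __init__(self, max_samples=10000):
        self.samples = deque(maxlen=max_samples)

    def store(self, timing, measurement, index):
        sample = {"timing": list(timing),
                  "measurement": list(measurement),
                  "index": index}
        self.samples.append(sample)
        return sample


def correct_mapping(weights, sample, learning_rate=0.1):
    """One gradient step on the squared error of an assumed linear model
    that predicts the traffic index from measurement and timing."""
    features = sample["measurement"] + sample["timing"]
    predicted = sum(w * x for w, x in zip(weights, features))
    error = predicted - sample["index"]
    return [w - learning_rate * error * x for w, x in zip(weights, features)]


history = HistoricalControlData(max_samples=2)
weights = [0.0, 0.0, 0.0]
observations = [([0.5], [1.0, 2.0], 3.0),
                ([0.6], [1.5, 1.0], 2.5),
                ([0.4], [0.5, 2.5], 3.5)]
for timing, measurement, index in observations:
    new_sample = history.store(timing, measurement, index)
    weights = correct_mapping(weights, new_sample)
```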

In an embodiment, as shown in fig. 3, the training in step 2200 above according to the training sample to obtain the mapping function may further include the following steps 2201 to 2203:

Step 2201, according to each training sample, determining a signal light timing prediction expression of each training sample by taking the undetermined coefficient of the mapping function as a variable.

Step 2202, constructing a loss function according to the signal timing prediction expression of each training sample and the historical signal timing of each training sample.

Specifically, when the loss function is constructed, for each training sample, a corresponding loss expression is determined according to the signal timing prediction expression and the historical signal timing. And summing the loss expressions of each training sample to obtain the loss function.

Step 2203, determining the undetermined coefficient according to the loss function, and completing the training of the mapping function.

In a specific implementation, a neural network model is selected as the mapping function. The undetermined coefficients of the neural network model are randomly initialized. The historical traffic measurements and historical traffic indexes are used as input data and the historical signal lamp timing is used as output data, which together form the training samples.

The server 1000 sequentially inputs the historical traffic measurements of a group of training samples into the neural network model and obtains the corresponding output signal lamp timing. The number of training samples in the group can be set as required and is not specifically limited herein.

The server 1000 constructs a loss function according to the output signal lamp timing, the historical traffic index in the training sample, and the historical signal lamp timing. Specifically, a corresponding loss expression is determined for each of the training samples, and the loss expressions of all training samples in the selected group are summed to obtain the loss function.

According to the obtained loss function, the server 1000 updates the undetermined coefficients of the neural network model using a standard gradient descent algorithm, completing the training of the mapping function.

The server 1000 iterates the above steps multiple times until the loss function converges. The structure of the neural network model, the number of training samples in the selected group, and the weight values of the time difference loss function, the supervised learning loss function and the regularization loss function in the loss function are determined through the above steps.
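The iteration described above (random initialization, per-sample loss expressions summed into one loss function, gradient descent until convergence) can be sketched in plain Python. To keep the example short, a linear model stands in for the neural network, the time difference term is omitted, and the squared 2-norm is used for regularization; these simplifications, the function names, the hyperparameters and the toy data are all assumptions.

```python
import random


def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def loss_and_gradient(weights, samples, reg_weight):
    """Sum of per-sample squared timing errors plus squared 2-norm penalty."""
    loss = 0.0
    grad = [0.0] * len(weights)
    for features, historical_timing in samples:
        error = predict(weights, features) - historical_timing
        loss += error ** 2
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x
    loss += reg_weight * sum(w * w for w in weights)
    for i, w in enumerate(weights):
        grad[i] += 2.0 * reg_weight * w
    return loss, grad


def train(samples, n_features, learning_rate=0.05, reg_weight=1e-4,
          tol=1e-10, max_iters=10000, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]  # random init
    previous = float("inf")
    for _ in range(max_iters):
        loss, grad = loss_and_gradient(weights, samples, reg_weight)
        if abs(previous - loss) < tol:       # loss has converged
            break
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
        previous = loss
    return weights, loss


# Toy samples: features are [north-south flow share, bias term]; the
# hypothetical historical green ratio follows 0.2 + 0.6 * share.
samples = [([s, 1.0], 0.2 + 0.6 * s) for s in (0.2, 0.4, 0.5, 0.6, 0.8)]
weights, final_loss = train(samples, n_features=2)
```

On this toy data the learned coefficients approach 0.6 and 0.2, slightly shrunk by the regularization term.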

In this embodiment, training samples are obtained from historical control data; a mapping function is obtained by training on the training samples; actual traffic measurements and actual traffic indexes are acquired; and the signal lamp timing to be output is calculated according to the mapping function, the actual traffic measurements and the actual traffic indexes. The existing historical control data of signal lamp control based on the traffic model is used to train a deep neural network model offline to obtain the mapping function, and the obtained mapping function is used as the initial neural network model for online deep reinforcement learning. On the one hand, this improves the initial performance of the signal lamp control system in the early training period of online deep reinforcement learning; on the other hand, the online deep reinforcement learning algorithm can further improve the final performance of the signal lamp control system compared with the prior art. Therefore, the problems that the initial performance of the signal lamp control system is poor during the initial interactive training process of the conventional signal lamp timing scheme, and that the final performance of the signal lamp control system is poor due to errors of the traffic model, are effectively solved.

As shown in fig. 4, at step 4100, a mapping function, actual traffic measurements, and actual traffic indicators are obtained; the mapping function is obtained by training a training sample obtained according to historical control data.

In this step, the actual traffic measurement at least includes traffic flow statistics of each lane or each phase at the intersection. The actual traffic index at least includes vehicle delay. Optionally, the actual traffic measurement may further include at least one of queue length, vehicle speed, and vehicle position. The actual traffic index may also include the number of vehicle stops.

The historical control data includes historical traffic measurements, historical traffic indicators, and historical signal light timing. The historical signal timing comprises at least one of: signal period, split, phase difference and phase sequence. Each training sample includes historical signal light timing, historical traffic measurements, and historical traffic indicators.

The mapping function is obtained by the server 1000 obtaining a training sample according to historical control data and training the training sample, and the specific training process may refer to the description in the first embodiment, which is not described herein again.

In step 4200, the signal lamp timing to be output is calculated according to the mapping function, the actual traffic measurement and the actual traffic index.

Specifically, the server 1000 may use a deep reinforcement learning algorithm, and calculate the mapping function, the actual traffic measurement, and the actual traffic index as input values of the deep reinforcement learning algorithm to obtain the signal lamp timing to be output.

In this embodiment, the mapping function obtained by training on the training samples derived from historical control data, the actual traffic measurement and the actual traffic index are acquired, and the deep reinforcement learning algorithm calculates the signal lamp timing to be output from the mapping function, the actual traffic measurement and the actual traffic index. On the one hand, this improves the initial performance of the signal lamp control system in the early training period of deep reinforcement learning; on the other hand, the deep reinforcement learning algorithm can further improve the final performance of the signal lamp control system compared with the prior art. Therefore, the problems that the initial performance of the signal lamp control system is poor during the initial interactive training process of the conventional signal lamp timing scheme, and that the final performance of the signal lamp control system is poor due to errors of the traffic model, are effectively solved.

< apparatus embodiment >

As shown in fig. 5, the traffic signal lamp control device 5000 of the present embodiment may include: a historical data acquisition module 5100, a training module 5200, an actual data acquisition module 5300 and a calculation module 5400.

The historical data obtaining module 5100 is configured to obtain training samples according to historical control data, where each training sample includes historical signal light timing, historical traffic measurement, and historical traffic indicator.

The training module 5200 is configured to perform training according to the training sample to obtain a mapping function.

The actual data acquisition module 5300 is used for acquiring actual traffic measurements and actual traffic indicators.

The calculation module 5400 is configured to calculate, according to the mapping function, the actual traffic measurement, and the actual traffic index, a timing of the signal lamp to be output.

The training module 5200 is specifically configured to train a loss function according to the training sample to obtain the mapping function.

The loss function is a weighted sum of a time difference loss function, a supervised learning loss function and a regularization loss function which are respectively set to corresponding weight values.

Specifically, the training module 5200 is configured to respectively determine, according to each training sample, a signal light timing prediction expression of each training sample by using the undetermined coefficient of the mapping function as a variable; constructing a loss function according to the signal lamp timing prediction expression of each training sample and the historical signal lamp timing of each training sample; and determining the undetermined coefficient according to the loss function, and finishing the training of the mapping function.

Further, the actual data acquisition module 5300 may be further configured to take the signal lamp timing to be output, the actual traffic measurement, and the actual traffic index together as a new training sample. Correspondingly, the training module 5200 may be further configured to correct the mapping function according to the new training sample.
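As a non-limiting illustration, correcting the mapping function with one new training sample may be sketched as a single tabular Q-learning update. Several assumptions are made that the text does not state: the mapping function is a table of values indexed by (discretized traffic state, timing), the reward is the negative vehicle delay, and the traffic state observed after the timing was applied is available as next_state. All names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

# Assumed table: (flow bucket, green time in s) -> estimated value
QTable = DefaultDict[Tuple[int, int], float]


def correct_mapping(q: QTable, state: int, timing: int, delay_s: float,
                    next_state: int, candidate_timings: Sequence[int],
                    alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one temporal-difference correction and return the TD error."""
    reward = -delay_s  # assumption: lower vehicle delay means higher reward
    best_next = max(q[(next_state, t)] for t in candidate_timings)
    td_error = reward + gamma * best_next - q[(state, timing)]
    q[(state, timing)] += alpha * td_error
    return td_error


q: QTable = defaultdict(float)
err = correct_mapping(q, state=2, timing=30, delay_s=20.0,
                      next_state=2, candidate_timings=[20, 30, 40])
```

Each new sample moves only the entry for the timing that was actually output, so the correction stays incremental and the mapping function does not need to be retrained from scratch.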

The traffic measurement includes at least traffic flow statistics for each lane or each phase of the intersection. The traffic index includes at least vehicle delay. The signal lamp timing includes at least one of the following items: signal cycle (period), green split, offset (phase difference), and phase sequence.
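As a non-limiting illustration, the three quantities and a training sample could be represented by the following data structures. The class names, field names, and units are assumptions of this sketch; the text only lists which items each quantity contains.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrafficMeasurement:
    # Vehicles counted per lane (or per phase) during the sampling interval.
    flow_by_lane: Dict[str, int]


@dataclass
class TrafficIndicator:
    vehicle_delay_s: float  # vehicle delay in seconds


@dataclass
class SignalTiming:
    cycle_s: Optional[float] = None                 # signal cycle (period)
    green_split: Optional[Dict[str, float]] = None  # phase -> fraction of the cycle
    offset_s: Optional[float] = None                # offset (phase difference)
    phase_sequence: Optional[List[str]] = None      # order of the phases

    def __post_init__(self) -> None:
        # The timing comprises at least one of the four items.
        items = (self.cycle_s, self.green_split, self.offset_s, self.phase_sequence)
        if all(v is None for v in items):
            raise ValueError("at least one timing item is required")


@dataclass
class TrainingSample:
    timing: SignalTiming
    measurement: TrafficMeasurement
    indicator: TrafficIndicator


sample = TrainingSample(
    timing=SignalTiming(cycle_s=90.0, green_split={"NS": 0.55, "EW": 0.45}),
    measurement=TrafficMeasurement(flow_by_lane={"N1": 14, "N2": 9}),
    indicator=TrafficIndicator(vehicle_delay_s=21.5),
)
```

Making every timing item optional but rejecting an empty timing reflects the "at least one of" wording.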

Further, the traffic signal lamp control device 5000 may further include a storage module (not shown in the figure) for storing the signal lamp timing to be output, the actual traffic measurement, and the actual traffic indicator as the historical control data.

The traffic signal lamp regulation and control device of this embodiment can be used to execute the technical solution of the method embodiment; its implementation principle and technical effect are similar and are not repeated here.

As shown in fig. 6, in the present embodiment, the traffic signal lamp regulation and control device 6000 may include a memory 6100 and a processor 6200; the memory 6100 is configured to store executable instructions, and the processor 6200 is configured to execute, under the control of the instructions, the operations of the traffic signal lamp regulation and control method provided in any of the above embodiments.

As shown in fig. 7, in this embodiment, the traffic signal lamp control apparatus 7000 includes: an acquisition module 7100 and a calculation module 7200.

The acquisition module 7100 is used for obtaining a mapping function, an actual traffic measurement, and an actual traffic index; the mapping function is obtained by training on training samples derived from historical control data.

The calculation module 7200 is configured to calculate the signal lamp timing to be output according to the mapping function, the actual traffic measurement, and the actual traffic index.
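As a non-limiting illustration, the calculation step may be sketched as choosing, from a set of candidate timings, the one that an already trained mapping function scores highest. This reading is an assumption: the text does not state how the mapping function is evaluated, and the candidate set, the scoring signature, and the toy mapping function below are hypothetical.

```python
from typing import Callable, Sequence

# Assumed signature: (measurement, traffic index, candidate timing) -> score
MappingFunction = Callable[[float, float, float], float]


def calculate_timing(mapping: MappingFunction, measurement: float,
                     indicator: float, candidates: Sequence[float]) -> float:
    """Return the candidate timing with the highest score under the mapping function."""
    if not candidates:
        raise ValueError("at least one candidate timing is required")
    return max(candidates, key=lambda t: mapping(measurement, indicator, t))


def toy_mapping(measurement: float, indicator: float, timing: float) -> float:
    """Hypothetical stand-in for a trained mapping function.

    Prefers green times close to 10 s plus 0.1 s per veh/h; ignores the index.
    """
    preferred = 10.0 + 0.1 * measurement
    return -abs(timing - preferred)


timing_to_output = calculate_timing(toy_mapping, measurement=220.0,
                                    indicator=18.0, candidates=[20.0, 30.0, 40.0])
```

Because the mapping function is passed in as an argument, the device of fig. 7 only needs the acquisition and calculation roles; training can take place elsewhere.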

It will be appreciated by those skilled in the art that the traffic signal lamp regulation and control device can be implemented in various ways. For example, a processor may be configured by instructions to implement the device. For example, the instructions may be stored in ROM and read from the ROM into a programmable device when the device is started, thereby implementing the regulation and control device. For example, the regulation and control device may be hard-wired into a dedicated device (e.g., an ASIC). The device may be divided into mutually independent units, or the units may be combined into a single implementation. The device may be implemented in one of the above ways, or in a combination of two or more of them.

< computer-readable storage medium >

In this embodiment, a computer-readable storage medium is further provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for regulating a traffic signal lamp according to any embodiment of the present invention.

It is well known to those skilled in the art that, with the development of electronic information technology such as large-scale integrated circuit technology and the trend toward implementing software in hardware, it has become difficult to draw a clear boundary between the software and the hardware of a computer system, since any operation may be implemented in either software or hardware. Any instruction may be executed by hardware as well as by software. Whether a hardware implementation or a software implementation is adopted for a given machine function depends on non-technical factors such as price, speed, reliability, storage capacity, and change cycle. For the skilled person, a software implementation and a hardware implementation are equivalent, and either may be chosen as desired to implement the above scheme. Therefore, no specific software or hardware is prescribed here.

The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can then execute the computer-readable program instructions to implement aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.

While embodiments of the present invention have been described above, the above description is illustrative, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A method for regulating and controlling a traffic signal lamp, the method comprising:

acquiring training samples according to historical control data, wherein each training sample comprises historical signal lamp timing, historical traffic measurement and historical traffic indexes, and the historical control data is input and output data in the interaction process of an existing signal lamp regulation and control algorithm based on a traffic model and a signal lamp control system;

training a loss function according to the training sample to obtain a mapping function, wherein the loss function comprises a time difference loss function, a supervised learning loss function and a regularization loss function, and the supervised learning loss function is an output error function of the mapping function taking historical control data of a traffic signal lamp regulation and control algorithm based on a traffic model as a prediction target and is used for improving the initial starting performance of a deep reinforcement learning algorithm in the subsequent steps;

acquiring actual traffic measurement and actual traffic indexes;

2. The method of claim 1, wherein the training based on the training samples to obtain a mapping function comprises:

3. The method of claim 2, wherein the loss function is a weighted sum of a time difference loss function, a supervised learning loss function, and a regularization loss function, each of which is assigned a corresponding weight value.

4. The method of claim 1, wherein the training based on the training samples to obtain a mapping function comprises:

5. The method of claim 1, wherein after calculating the timing of the signal to be output, the method further comprises:

and correcting the mapping function according to the new training sample.

6. The method of claim 1, wherein the traffic measurements include at least traffic flow statistics for each lane or each phase of the intersection;

the traffic index at least comprises vehicle delay;

the signal timing comprises at least one of the following: signal period, split, phase difference and phase sequence.

7. The method of claim 1, further comprising:

and storing the signal lamp timing to be output, the actual traffic measurement and the actual traffic index as the historical control data.

8. A traffic signal light control apparatus, comprising:

the historical data acquisition module is used for acquiring training samples according to historical control data, wherein each training sample comprises historical signal lamp timing, historical traffic measurement and historical traffic indexes, and the historical control data is input and output data in the interaction process of the existing signal lamp regulation and control algorithm based on a traffic model and a signal lamp control system;

the training module is used for training a loss function according to the training sample to obtain a mapping function, wherein the loss function comprises a time difference loss function, a supervised learning loss function and a regularization loss function, the supervised learning loss function is an output error function of the mapping function taking historical control data of a traffic signal lamp regulation and control algorithm based on a traffic model as a prediction target, and is used for improving the initial starting performance of a deep reinforcement learning algorithm in the subsequent steps;

9. A traffic signal light control apparatus, comprising: a memory and a processor; the memory is used for storing executable instructions, and the processor is used for executing the operation in the traffic signal lamp regulation and control method according to any one of claims 1-7 according to the control of the instructions.

10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out a method of regulating a traffic signal lamp according to any one of claims 1 to 7.