CN117022313A - Task scheduling strategy generation method and device for automatic driving system - Google Patents

Task scheduling strategy generation method and device for automatic driving system

Info

Publication number
CN117022313A
Authority
CN
China
Prior art keywords
task
scheduling
sensor
training
time
Legal status
Pending
Application number
CN202210474595.9A
Other languages
Chinese (zh)
Inventor
徐倩
徐程远
汪建平
Current Assignee
City University of Hong Kong (CityU)
Original Assignee
City University of Hong Kong (CityU)
Application filed by City University of Hong Kong
Priority to CN202210474595.9A
Publication of CN117022313A
Classifications

    • B60W60/001: Planning or execution of driving tasks (under B60W60/00, drive control systems specially adapted for autonomous road vehicles)
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G06N20/00: Machine learning
    • B60W2050/0019: Control system elements or transfer functions (under B60W2050/0001, details of the control system)

Abstract

The invention provides a task scheduling policy generation method, a task scheduling method, and a task scheduling device for an automatic driving system. The method comprises: acquiring training configuration data, where the configuration data comprise a task scheduling period, a number of cores, a scheduling task set, training parameters, scheduling constraint conditions, and a scheduling optimization objective function; and performing reinforcement-learning training according to the training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, the scheduling constraint conditions, the scheduling optimization objective function, and a preset Markov decision model of the scheduling tasks, to obtain the task scheduling policy. The device is used to execute the method. The task scheduling policy generation method, task scheduling method, and task scheduling device of the automatic driving system provided by the embodiments of the invention improve the safety of automatic driving.

Description

Task scheduling strategy generation method and device for automatic driving system
Technical Field
The invention relates to the technical field of automatic control, in particular to a task scheduling strategy generation method, a task scheduling method and a task scheduling device of an automatic driving system.
Background
An automatic driving system (Autonomous Driving System, ADS for short) includes a variety of sensors; the system executes computing tasks on the data collected by these sensors to realize perception, motion planning, and control.
Task scheduling in an automatic driving system should aim to minimize the gap between the time a driving decision is generated and the state of the real environment. Extensive research has been devoted to models and optimization algorithms that shorten execution time and improve accuracy. The scheduling of automatic driving tasks falls into the category of real-time streaming task scheduling, in which the interdependencies of the computing tasks are described by a directed acyclic graph (Directed Acyclic Graph, DAG for short). Traditional DAG scheduling, i.e. scheduling dependent tasks on multiple cores, has been studied for many years, but the performance indicators considered by most research are response time and computational throughput, which are not directly related to driving safety and have limited effect on guaranteeing it.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a task scheduling strategy generation method, a task scheduling method and a task scheduling device of an automatic driving system, which can at least partially solve the problems in the prior art.
In a first aspect, the present invention provides a method for generating a task scheduling policy of an autopilot system, including:
acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function;
performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset.
In a second aspect, the present invention provides a task scheduling method of an autopilot system based on a task scheduling policy generated by the task scheduling policy generating method of an autopilot system according to any one of the embodiments, including:
acquiring sensor data periodically acquired by each sensor;
based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period to execute each calculation task to generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
In still another aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the task scheduling policy generating method of the autopilot system or the task scheduling method of the autopilot system according to any one of the embodiments described above when executing the program.
In yet another aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the task scheduling policy generation method of an autopilot system or the task scheduling method of an autopilot system described in any one of the above embodiments.
The task scheduling strategy generation method, the task scheduling method and the task scheduling device of the automatic driving system can acquire training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; and performing reinforcement learning training according to training data, task scheduling period, core number, scheduling task set, scheduling constraint conditions, scheduling optimization objective function and a Markov decision model of the scheduling tasks corresponding to the training parameters to obtain a task scheduling strategy, which can be applied to task scheduling of an automatic driving system, optimize utilization of vehicle-mounted computing resources and improve safety of automatic driving.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
fig. 1 is a flowchart of a task scheduling policy generation method of an autopilot system according to a first embodiment of the present application.
Fig. 2 is a flow chart of a task scheduling method of an autopilot system according to a second embodiment of the present application.
Fig. 3 is a schematic structural diagram of a task scheduling policy generating device of an autopilot system according to a third embodiment of the present application.
Fig. 4 is a schematic structural diagram of a task scheduling device of an autopilot system according to a fourth embodiment of the present application.
Fig. 5 is a schematic physical structure of an electronic device according to a fifth embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present application and their descriptions herein are for the purpose of explaining the present application, but are not to be construed as limiting the application. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be arbitrarily combined with each other.
In order to facilitate understanding of the technical scheme provided by the application, the following description will explain relevant contents of the technical scheme of the application.
In practical applications, an automatic driving system (ADS) collects data from a variety of sensors and executes computing tasks on the collected data to generate control commands. Once the computing tasks are placed on the algorithm stack, their scheduling can greatly influence the freshness of the data behind driving decisions, and thereby driving safety. The application therefore proposes the information age (Age of Information, AoI for short) of the control command as the performance index for task scheduling in an automatic driving system, obtains a task scheduling policy through reinforcement-learning training of a Markov decision model of the scheduling tasks, and applies the policy to the automatic driving system, thereby improving the utilization of vehicle-mounted computing resources and the safety of automatic driving. The information age of a control command is the time difference between the timestamp at which the control command is generated and the minimum timestamp of the sensor data used to generate the control command in the previous round, i.e. the generation time of the control command minus the earliest acquisition time of the sensor data used in the previous round. The computing tasks are the various tasks involved in turning sensor data into control commands, such as image recognition, localization, lidar-based target detection, and trajectory planning. Control commands include steering, throttle, braking, and the like. Generating one control command involves multiple computing tasks.
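As a minimal illustration of this definition (the timestamps and function name below are hypothetical, not taken from the application), the information age of a control command can be computed as:

```python
def control_command_aoi(command_ts_ms: float, prev_round_sensor_ts_ms: list) -> float:
    """Information age (AoI) of a control command: its generation time minus
    the earliest acquisition time of the sensor data used by the previous
    round of computing tasks."""
    return command_ts_ms - min(prev_round_sensor_ts_ms)

# Hypothetical timestamps (ms): command generated at t = 250; the previous
# round used sensor frames acquired at 120, 140, and 100.
print(control_command_aoi(250.0, [120.0, 140.0, 100.0]))  # 150.0
```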
To optimize task scheduling in the automatic driving system through the information age of the control command, an integer programming problem is constructed. The decision variables are the perception-data selection variable y_kk′z, i.e. whether the k′-th frame of data from sensor z is used by the corresponding computing task in round k, and the on-core execution variable x_jkpt, i.e. whether computing task j starts its k-th round of execution on core p at time t. Since the computation time unit in an automatic driving system is at the millisecond level, scheduling the computing tasks produces an enormous decision-variable space, making the integer programming problem intractable; a periodic scheduling scheme is therefore designed, and this scheme is the optimal solution under a periodic policy. Specifically, a periodic scheduling scheme repeats the computing tasks with a given period T_0, and T_0 also matches the acquisition periods of the sensor data, i.e. T_0 is a common multiple of the acquisition periods. The integer programming problem is thereby converted into a periodic scheduling problem. To solve this periodic scheduling problem and obtain a task scheduling policy, a Markov decision model of the scheduling tasks is established, a scheduling optimization objective function is constructed with the goal of minimizing the largest information age among the information ages of all control commands in each round of scheduling within a task scheduling period, and the relevant constraint conditions are set, so that the Markov decision model of the scheduling tasks can be trained.
The following describes a specific implementation procedure of the task scheduling policy generating method of the automatic driving system provided by the embodiment of the present invention, taking a server as an execution subject. It can be understood that the execution subject of the task scheduling policy generation method of the autopilot system provided by the embodiment of the present invention is not limited to a server.
Fig. 1 is a flowchart of a task scheduling policy generation method of an autopilot system according to a first embodiment of the present invention, where, as shown in fig. 1, the task scheduling policy generation method of an autopilot system according to an embodiment of the present invention includes:
s101, acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function;
specifically, training configuration data needs to be set before training the markov decision model of the scheduling task. The configuration data comprises a task scheduling period, a task scheduling set, training parameters, scheduling constraint conditions and a scheduling optimization objective function. The server may obtain the set training configuration data.
The task scheduling period is a common multiple of the acquisition periods of the sensors of the automatic driving system; its specific value is set according to actual needs, and the embodiment of the invention is not limited in this respect. The number of cores is determined by the number of cores of the processor of the automatic driving system on which the task scheduling policy is to be deployed. The scheduling task set includes all computing tasks of the automatic driving system, the execution time of each computing task, and the dependencies among the computing tasks. The training parameters are the input parameters for training the Markov decision model of the scheduling tasks; the training is unsupervised, the specific values of some training parameters are preset, and the specific values of the remaining training parameters are determined by tuning based on the training effect. The scheduling optimization objective function may be expressed as min max_k (c_nk − S_(k−1)), where c_nk is the timestamp at which the k-th round generates the control command, S_(k−1) is the minimum timestamp of the sensor data used by the n-th computing task scheduled in round k−1, n is the total number of computing tasks of one round included in the scheduling task set, the n-th computing task is the computing task that generates the control command, and k is a positive integer. The scheduling constraint conditions are set according to actual needs, and the embodiment of the invention is not limited in this respect.
S102, performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset.
Specifically, the server performs reinforcement-learning training on the Markov decision model of the scheduling tasks according to the training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, the scheduling constraint conditions, and the scheduling optimization objective function. During training, computing tasks that cannot be executed are screened out according to the scheduling constraint conditions, and the computing tasks are scheduled within the given task scheduling period according to the number of cores and the scheduling task set. At the end of each cycle of training, the AoI of the control command obtained in each round of training is calculated according to the scheduling optimization objective function. After several cycles of training, the task scheduling policy is obtained. The task scheduling policy specifies the scheduling of the computing tasks of each round within the given task scheduling period, namely the perception data used by each computing task, the relative start time of each computing task, and the core on which each task executes. The perception data used by a computing task are the data acquired in the corresponding acquisition period of the corresponding sensor; the relative start time of a computing task is its time offset from the start of the task scheduling period.
Each cycle of training comprises multiple rounds of training, and each round of training generates one control command: the round schedules each computing task needed to generate that command, and at the end of each cycle of training all computing tasks in the scheduling task set have been completed once. For example, if generating a control command requires executing 9 computing tasks, then the training round for that control command schedules those 9 computing tasks, and the control command is generated by the last of the 9.
The task scheduling strategy generation method of the automatic driving system provided by the embodiment of the invention can acquire training configuration data, wherein the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; and performing reinforcement learning training according to training data, task scheduling period, core number, scheduling task set, scheduling constraint conditions, scheduling optimization objective function and a Markov decision model of the scheduling tasks corresponding to the training parameters to obtain a task scheduling strategy, which can be applied to task scheduling of an automatic driving system, optimize utilization of vehicle-mounted computing resources and improve safety of automatic driving.
Further, on the basis of the above embodiments, the training parameters include sensor-related characteristic parameters, computing-task-related characteristic parameters, and core-related characteristic parameters, where:
The sensor-related characteristic parameters include the acquisition period of each sensor and the first, second, and third time differences corresponding to each sensor. The first time difference is the difference between the current time and the acquisition time of the sensor data used in the current round; the second time difference is the difference between the current time and the acquisition time of the sensor data used in the previous round; the third time difference is the difference between the current time and the acquisition time of the sensor's next data acquisition.
The computing-task-related characteristic parameters include the execution time of each computing task, the time interval corresponding to each computing task, and the execution requirement of each computing task. The time interval corresponding to a computing task is the interval between the current time and the start time of the computing task in the current round; the execution requirement of a computing task is whether its predecessor tasks have been executed.
The core-related characteristic parameters include the identification of the cores considered at the current decision point and the interval between the current time and the next available time of each core.
Specifically, each sensor involved in the automatic driving system acquires data periodically, and each sensor has an acquisition period. The acquisition period of sensor z is denoted τ_z; the current time is denoted t; the acquisition time of the sensor-z data used in the current round is denoted s_kz; the acquisition time of the sensor-z data used in the previous round is denoted s_(k−1)z; and the acquisition time of the next data acquired by sensor z is t + τ_z − t|τ_z, where t|τ_z denotes t modulo τ_z. The first time difference corresponding to sensor z may thus be expressed as t − s_kz, the second time difference as t − s_(k−1)z, and the third time difference as τ_z − t|τ_z. The acquisition period of each sensor is preset according to the acquisition period of each sensor involved in the automatic driving system.
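These three per-sensor time differences can be sketched directly (the function name and numbers below are hypothetical, in milliseconds), using the modulo expression above for the time until the next acquisition:

```python
def sensor_features(t, tau_z, s_kz, s_prev_kz):
    """Per-sensor state features at current time t.

    tau_z     : acquisition period of sensor z
    s_kz      : acquisition time of the sensor-z frame used in the current round
    s_prev_kz : acquisition time of the sensor-z frame used in the previous round
    """
    first = t - s_kz                 # first time difference, t - s_kz
    second = t - s_prev_kz           # second time difference, t - s_(k-1)z
    third = tau_z - (t % tau_z)      # third time difference, tau_z - t|tau_z
    return first, second, third

# Hypothetical sensor with a 100 ms acquisition period at current time t = 250 ms.
print(sensor_features(250.0, 100.0, 200.0, 100.0))  # (50.0, 150.0, 50.0)
```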
For each computing task there is a corresponding execution time. The current time is denoted t, and the start time of computing task i in the current round is denoted b_ik; the time interval corresponding to the computing task is |t − b_ik|. The predecessor tasks of computing task i are obtained from the dependencies among the computing tasks in the scheduling task set, and computing task i can be executed only after all of its predecessor tasks have completed. The execution time of each computing task is obtained in advance through simulation tests and is given in the scheduling task set. The execution-order logic among the computing tasks in each round of training, i.e. which predecessor computing tasks must finish before each computing task is executed, is determined by the dependencies among the computing tasks in the scheduling task set. These dependencies are determined by the dependencies among the computing tasks that the automatic driving system actually needs to schedule. The last computing task of each round of training generates the control command.
The cores considered at the current decision point are the cores not occupied by any computing task at the current time. They can be determined by checking each core in turn for a task executing at the current time; a core with no task executing at the current time is unoccupied. The next available time of a core is the completion time of the computing task most recently scheduled on that core.
Further, on the basis of the above embodiments, the task scheduling period is a common multiple of the acquisition period of all the sensors.
Specifically, the acquisition periods of all sensors involved in the automatic driving system are obtained, and a common multiple of these acquisition periods is taken as the task scheduling period. The specific value of the common multiple is set according to actual needs, and the embodiment of the application is not limited in this respect.
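As an illustrative sketch (the sensor periods below are hypothetical, in milliseconds), such a task scheduling period can be obtained as the least common multiple of the acquisition periods, optionally scaled by an integer factor chosen according to actual needs:

```python
from math import lcm

def task_scheduling_period(acquisition_periods_ms, multiple=1):
    """Smallest common multiple of all sensor acquisition periods, optionally
    scaled by an integer factor."""
    return multiple * lcm(*acquisition_periods_ms)

# Hypothetical sensors with acquisition periods 100, 50, and 200 ms.
print(task_scheduling_period([100, 50, 200]))  # 200
```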
Further, on the basis of the above embodiments, the markov decision model of the scheduling task includes a decision process, wherein:
the decision process is represented by the four-tuple (S, A, P, R). S is the state space, comprising the sensor-related, computing-task-related, and core-related characteristic parameters; A is the action space, comprising the computing tasks; P is the transition probability function; R is the reward function, expressed as 1/(αD), where α is a positive integer greater than or equal to 2 and D is the information age of the control command.
In particular, the periodic scheduling problem of the present application can be modeled as a discrete-time Markov decision problem, whose decision process is represented by the four-tuple (S, A, P, R): S is the state space, A the action space, P the transition probability function, and R the reward function.
The state space S provides the environmental features required for decision making, specifically the sensor-related, computing-task-related, and core-related characteristic parameters. The action space A comprises the computing tasks. The transition probability function P gives, as a positive real number, the probability of entering the next state after performing an action a in the current state.
At step l, the system state σ_l ∈ S is obtained from the automatic driving system; a scheduling action η_l ∈ A is then taken (a scheduling action schedules a computing task for execution on the current core at the current time), and a reward r_l ∈ R is obtained. Assuming the automatic driving system has P cores, the computing resources within one period T_0 can be represented as a P × T_0 table, defined as the resource table. The P × T_0 resource table simulates the interval [0, T_0], and a decision point (t, p) indicates that core p can start a computing task at time t. At each decision point, the system state is acquired, the learned policy (the probability function P fitted after training) determines whether a computing task is executed, a reward is obtained after the computing task is executed, and the resulting state is carried forward to the next decision point.
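The resource-table walk described above can be sketched as follows. This is an illustrative skeleton under assumed data structures, not the application's implementation: the policy and the task dicts are stand-ins, and the reward helper follows the stated form 1/(αD).

```python
def aoi_reward(aoi, alpha=2):
    """Reward of the form 1/(alpha * D), with D the AoI of the control command."""
    return 1.0 / (alpha * aoi)

def run_cycle(T0, num_cores, policy, tasks):
    """Walk the P x T0 resource table; at each decision point (t, p) where
    core p is free, the policy decides whether to start a pending task."""
    free_at = [0] * num_cores              # next available time of each core
    schedule = []                          # (task name, core, start time)
    pending = list(tasks)                  # tasks not yet scheduled this cycle
    for t in range(T0):                    # simulate the interval [0, T0]
        for p in range(num_cores):
            if free_at[p] <= t and pending:      # decision point (t, p)
                task = policy(t, p, pending)     # may return None (stay idle)
                if task is not None:
                    pending.remove(task)
                    free_at[p] = t + task["exec_time"]
                    schedule.append((task["name"], p, t))
    return schedule

# Hypothetical two-task cycle with a trivial "always take the first pending
# task" stand-in policy.
tasks = [{"name": "detect", "exec_time": 30}, {"name": "plan", "exec_time": 20}]
policy = lambda t, p, pending: pending[0]
print(run_cycle(100, 2, policy, tasks))  # [('detect', 0, 0), ('plan', 1, 0)]
```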
Further, based on the above embodiments, the information age of the control command is a time difference between a time stamp of the calculation task generating the control command and a minimum time stamp of the sensor data used for generating the control command in the previous round.
Further, on the basis of the above embodiments, the scheduling optimization objective function is expressed as min max_k (c_nk − S_(k−1)), where c_nk is the timestamp at which the k-th round generates the control command, S_(k−1) is the minimum timestamp of the sensor data used by the n-th computing task scheduled in round k−1, n is the total number of computing tasks of one round included in the scheduling task set, the n-th computing task is the computing task that generates the control command, and k is a positive integer.
Specifically, after each round of training ends, the AoI of the obtained control command can be calculated. The scheduling optimization objective is to minimize the maximum AoI over the rounds of a training cycle, and the task scheduling policy is the one obtained when this minimum of the maximum AoI is reached after training.
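This min-max objective can be sketched over hypothetical per-round AoI values (function names and numbers are illustrative, not from the application):

```python
def cycle_objective(round_aois):
    """Objective value of one training cycle: the largest AoI among the
    control commands of all rounds in the cycle."""
    return max(round_aois)

def best_over_training(per_cycle_round_aois):
    """Training seeks the schedule achieving the minimum of the per-cycle
    maximum AoI."""
    return min(cycle_objective(cycle) for cycle in per_cycle_round_aois)

# Hypothetical AoI values (ms) for two training cycles of two rounds each:
# the second cycle's schedule wins with a maximum AoI of 160 ms.
print(best_over_training([[150.0, 180.0], [140.0, 160.0]]))  # 160.0
```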
Further, on the basis of the foregoing embodiments, the scheduling constraint includes:
the acquired data of each acquisition period of each sensor is used at most once;
each computing task in the dispatching task set uses the latest data acquired by the corresponding sensor, and the starting time of each computing task is later than the acquisition time of the used sensor data;
each computing task is executed on only one core, and each core executes at most one computing task at the same time;
The completion time of the round of training cannot be earlier than the completion time of the previous round of training, and the execution sequence of each calculation task meets the dependency relationship among the calculation tasks in the dispatching task set.
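The constraints above can be turned into feasibility checks used to screen out schedules during training. The sketch below covers the frame-reuse, core-overlap, and dependency-order constraints; the schedule layout is an assumption made for illustration:

```python
def check_schedule(schedule, deps):
    """schedule: dicts with task, core, start, end, frame (sensor frame id or None).
    deps: task -> set of predecessor tasks. Returns the list of violated rules."""
    violations = []
    # Each sensor frame is used at most once.
    frames = [s["frame"] for s in schedule if s["frame"] is not None]
    if len(frames) != len(set(frames)):
        violations.append("frame reused")
    # Each core executes at most one computing task at a time.
    by_core = {}
    for s in schedule:
        by_core.setdefault(s["core"], []).append((s["start"], s["end"]))
    for spans in by_core.values():
        spans.sort()
        for (_, end1), (start2, _) in zip(spans, spans[1:]):
            if start2 < end1:
                violations.append("core overlap")
    # Each task starts only after all of its predecessors have finished.
    start = {s["task"]: s["start"] for s in schedule}
    end = {s["task"]: s["end"] for s in schedule}
    for task, preds in deps.items():
        for pred in preds:
            if task in start and pred in end and start[task] < end[pred]:
                violations.append(task + " starts before " + pred + " ends")
    return violations

# Hypothetical two-task schedule satisfying all three checks.
sched = [{"task": "a", "core": 0, "start": 0, "end": 10, "frame": "cam0"},
         {"task": "b", "core": 0, "start": 10, "end": 20, "frame": None}]
print(check_schedule(sched, {"b": {"a"}}))  # []
```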
Fig. 2 is a flow chart of a task scheduling method of an autopilot system according to a second embodiment of the present invention, and as shown in fig. 2, the task scheduling method of an autopilot system according to an embodiment of the present invention, which adopts the task scheduling policy generated by the task scheduling policy generating method of an autopilot system according to any embodiment of the present invention, includes:
s201, acquiring sensor data periodically acquired by each sensor;
specifically, each sensor related to the automatic driving system periodically acquires sensor data, and the vehicle-mounted terminal can acquire the sensor data acquired by each sensor. The execution main body of the task scheduling method of the automatic driving system provided by the embodiment of the invention comprises, but is not limited to, a vehicle-mounted terminal.
S202, based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period, executing each calculation task and generating a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
Specifically, the vehicle-mounted terminal sequentially acquires each calculation task to be executed in each task scheduling period according to a task scheduling policy, acquires sensor data corresponding to each calculation task, and then executes the calculation task based on the sensor data to generate a corresponding control instruction. Wherein, the corresponding relation between each scheduling task and the sensor data is preset.
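The per-period execution of S201 and S202 can be sketched as follows; the policy-entry layout and the sensor interface are hypothetical stand-ins, not the application's data structures:

```python
def execute_period(policy_entries, sensors, period_start):
    """Execute one task scheduling period.

    policy_entries: dicts with task (callable), rel_start (offset within the
        period), core, sensor (name of the sensor whose latest frame the task
        uses, or None), and emits_command.
    sensors: sensor name -> callable returning that sensor's latest frame.
    """
    commands = []
    for entry in sorted(policy_entries, key=lambda e: e["rel_start"]):
        start_time = period_start + entry["rel_start"]   # absolute start time
        frame = sensors[entry["sensor"]]() if entry["sensor"] else None
        result = entry["task"](frame)                    # run the computing task
        if entry["emits_command"]:
            commands.append((start_time, result))
    return commands

# Hypothetical single-task period: a control task that turns the latest
# lidar frame into a braking decision.
sensors = {"lidar": lambda: {"obstacle_dist": 3.0}}
entries = [{"task": lambda f: "brake" if f["obstacle_dist"] < 5 else "cruise",
            "rel_start": 10, "core": 0, "sensor": "lidar", "emits_command": True}]
print(execute_period(entries, sensors, 1000))  # [(1010, 'brake')]
```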
According to the task scheduling method of the automatic driving system, which is provided by the embodiment of the application, sensor data periodically collected by each sensor are obtained; based on a task scheduling strategy, corresponding sensor data is acquired in each task scheduling period to execute each calculation task to generate a corresponding control instruction, and the scheduling of the calculation tasks is realized through the task scheduling strategy, so that the utilization of vehicle-mounted calculation resources can be optimized, and the safety of automatic driving is improved.
The task scheduling method of the automatic driving system provided by the embodiment of the present application can be deployed in a ROS system or the Baidu Cyber RT system to schedule computing tasks for the automatic driving system; it can optimize response time and throughput at the same time, which is significant for improving driving safety.
The task scheduling strategy obtained by the task scheduling strategy generation method of the automatic driving system provided by the present application is applied to the Baidu Apollo automatic driving system, and a simulation experiment on task scheduling of the automatic driving system is carried out; the performance of the proposed task scheduling method is compared with that of prior-art task scheduling methods in terms of AoI of the computing tasks, throughput, worst-case response time, and the like. Experimental results show that the maximum AoI of the computing tasks obtained by applying the proposed task scheduling method to a 4-core Baidu Apollo automatic driving system is smaller than the maximum AoI obtained by applying a prior-art task scheduling method to an 8-core Baidu Apollo automatic driving system.
In the future commercialization of automatic driving, the information age of the control command proposed by the present application may be regarded as an important safe-driving index. The task scheduling method of the automatic driving system provided by the embodiment of the present application can be applied to middleware between the vehicle's automatic driving system and the computing unit, such as ROS (Robot Operating System) and Cyber RT; scheduling the automatic driving tasks through the task scheduling strategy can effectively reduce the information age of the control command and greatly improve the safety of automatic driving.
Fig. 3 is a schematic structural diagram of a task scheduling policy generating device of an autopilot system according to a third embodiment of the present application, and as shown in fig. 3, the task scheduling policy generating device of an autopilot system according to an embodiment of the present application includes a first acquiring unit 301 and a training unit 302, where:
the first acquiring unit 301 is configured to acquire training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; the training unit 302 is configured to perform reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, the scheduling constraint condition, the scheduling optimization objective function, and a markov decision model of the scheduling task, so as to obtain the task scheduling policy; wherein the Markov decision model of the scheduling task is preset.
Specifically, training configuration data needs to be set before training the Markov decision model of the scheduling task. The configuration data comprises a task scheduling period, a number of cores, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function. The first acquisition unit 301 may acquire the set training configuration data.
The training unit 302 performs reinforcement learning training on the Markov decision model of the scheduling task according to the training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, the scheduling constraint conditions and the scheduling optimization objective function; during training, the computing tasks that cannot be executed are screened out according to the scheduling constraint conditions, and the computing tasks are scheduled within the given task scheduling period according to the number of cores, the scheduling task set and the Markov decision model of the scheduling task. At the end of each round of training, the AoI of the control command obtained in that round is calculated according to the scheduling optimization objective function. After several cycles of training, the task scheduling policy is obtained. The task scheduling policy specifies, for each round of computing tasks within the given task scheduling period, the sensor data used by each computing task, the relative start execution time of each computing task, and the core on which each task is executed. The sensor data used by a computing task refers to the data collected in the corresponding acquisition period of the sensor used by that task; the relative start execution time of a computing task refers to its time difference relative to the start time of the task scheduling period.
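The training loop described above can be sketched with tabular Q-learning as a stand-in for the reinforcement-learning method (the patent does not commit to a specific algorithm; the `env` interface, learning rate, and exploration scheme here are assumptions):

```python
import random

def train_policy(env, episodes=100, alpha=2, epsilon=0.1, seed=0):
    """Hedged sketch of the reinforcement-learning training described above:
    tabular Q-learning over (state, task) pairs, with reward 1/(alpha * D),
    where D is the information age (AoI) of the round's control command.
    `env` is assumed to expose reset(), actions(state) returning the tasks
    that survive constraint screening, and step(action) -> (state, aoi, done)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.actions(state)  # infeasible tasks already screened out
            if rng.random() < epsilon:
                action = rng.choice(actions)          # explore
            else:                                      # exploit best known task
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, aoi, done = env.step(action)
            reward = 1.0 / (alpha * aoi)  # fresher commands earn larger rewards
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + 0.5 * (reward - old)
            state = next_state
    return q
```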
The task scheduling strategy generating device of the automatic driving system provided by the embodiment of the invention can acquire training configuration data, wherein the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; and performing reinforcement learning training according to training data, task scheduling period, core number, scheduling task set, scheduling constraint conditions, scheduling optimization objective function and a Markov decision model of the scheduling tasks corresponding to the training parameters to obtain a task scheduling strategy, which can be applied to task scheduling of an automatic driving system, optimize utilization of vehicle-mounted computing resources and improve safety of automatic driving.
Further, on the basis of the above embodiments, the training parameters include a sensor-related characteristic parameter, a calculation task-related characteristic parameter, and a core-related characteristic parameter, where:
the sensor related characteristic parameters comprise an acquisition period of each sensor, a first time difference corresponding to each sensor, a second time difference corresponding to each sensor and a third time difference corresponding to each sensor; the first time difference is equal to the time difference between the current time and the acquisition time of the sensor data used in the current round; the second time difference is equal to the time difference between the current time and the acquisition time of the sensor data used in the previous round; the third time difference is equal to the time difference between the current time and the acquisition time of the sensor's next data sample;
The relevant characteristic parameters of the computing tasks comprise the execution time of each computing task, the time interval corresponding to each computing task and the execution requirement of each computing task; the time interval corresponding to a computing task is equal to the time interval between the current time and the start execution time of that computing task in the current round; the execution requirement of a computing task is whether its predecessor tasks have been executed;
the core-related characteristic parameters include an identification of the cores considered by the current decision point, a time interval between the current time of each core and the next available time.
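The sensor-related characteristic parameters above can be assembled into per-sensor feature tuples as follows (the field names and dictionary layout are assumptions for illustration):

```python
def sensor_features(now, sensors):
    """Build per-sensor features from the training parameters described above:
    the acquisition period plus the first/second/third time differences
    (current time vs. the sample used this round, the sample used last round,
    and the next sample's acquisition time). `sensors` maps a sensor name to
    a dict with keys 'period', 'cur_sample_t', 'prev_sample_t' (assumed names)."""
    feats = {}
    for name, s in sensors.items():
        next_t = s["cur_sample_t"] + s["period"]  # next acquisition instant
        feats[name] = (
            s["period"],
            now - s["cur_sample_t"],   # first time difference
            now - s["prev_sample_t"],  # second time difference
            next_t - now,              # third time difference
        )
    return feats

# Illustrative values (milliseconds, made up):
feats = sensor_features(1000, {"cam": {"period": 100,
                                       "cur_sample_t": 950,
                                       "prev_sample_t": 850}})
```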
Further, on the basis of the above embodiments, the task scheduling period is a common multiple of the acquisition period of all the sensors.
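For example, the smallest such period is the least common multiple of the sensors' acquisition periods, so that every sensor completes a whole number of acquisition cycles per task scheduling period:

```python
from math import lcm

# Hypothetical sensors sampling every 20 ms, 50 ms and 100 ms: the smallest
# valid task scheduling period is the least common multiple of the three.
periods_ms = [20, 50, 100]
schedule_period = lcm(*periods_ms)
```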
Further, on the basis of the above embodiments, the Markov decision model of the scheduling task includes a decision process, wherein:
the decision process is represented by a four-tuple (S, A, P, R): S represents the state space, comprising the sensor-related characteristic parameters, the computing-task-related characteristic parameters, and the core-related characteristic parameters; A represents the action space, comprising the computing tasks; P represents the transition probability function; R represents the reward function, expressed as 1/(αD), where α is a positive integer of 2 or more and D represents the information age of the control command.
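The reward function R = 1/(αD) translates directly into code (a minimal sketch):

```python
def reward(aoi, alpha=2):
    """Reward R = 1/(alpha * D) from the decision process above, where D is
    the information age of the control command and alpha is an integer >= 2:
    fresher commands (smaller D) earn larger rewards."""
    if alpha < 2:
        raise ValueError("alpha must be a positive integer of 2 or more")
    return 1.0 / (alpha * aoi)
```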
Further, on the basis of the above embodiments, the information age of a control command is the time difference between the timestamp at which the computing task generates the control command and the minimum timestamp of the sensor data used to generate the control command in the previous round.
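The information age defined above reduces to a one-line computation (a sketch; representing timestamps as plain numbers is an assumption):

```python
def information_age(command_ts, prev_round_sensor_ts):
    """Information age of a control command, per the definition above: the
    timestamp at which the computing task produced the command minus the
    minimum timestamp among the sensor data used in the previous round."""
    return command_ts - min(prev_round_sensor_ts)

# Illustrative values: command at t=10, previous round used data from t=4 and t=6.
aoi = information_age(10.0, [4.0, 6.0])
```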
Further, on the basis of the above embodiments, the scheduling optimization objective function minimizes the information age of the control command in each round and can be expressed as: min (c_{n,k} − S_{k−1}), where c_{n,k} is the timestamp at which the control command is generated in the kth round, S_{k−1} is the minimum timestamp of the sensor data used by the nth computing task scheduled in the (k−1)th round, n is the total number of computing tasks of the (k−1)th round included in the scheduling task set, the nth computing task is the computing task that generates the control command, and k is a positive integer.
Further, on the basis of the foregoing embodiments, the scheduling constraint includes:
the acquired data of each acquisition period of each sensor is used at most once;
each computing task in the dispatching task set uses the latest data acquired by the corresponding sensor, and the starting time of each computing task is later than the acquisition time of the used sensor data;
each computing task is executed on only one core, and each core executes at most one computing task at the same time;
The completion time of the round of training cannot be earlier than the completion time of the previous round of training, and the execution sequence of each calculation task meets the dependency relationship among the calculation tasks in the dispatching task set.
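The constraint screening performed during training can be sketched as a feasibility check over the four constraints above (the `task` and `state` field names are assumptions for illustration):

```python
def feasible(task, state):
    """Feasibility check implementing the scheduling constraints above
    (a hedged sketch): each sensor sample is used at most once, a task
    starts no earlier than the acquisition time of the data it uses, a
    free core must exist, and all predecessor tasks must have finished."""
    if state["sample_used"].get(task["sensor"], False):
        return False  # each acquisition-period sample is used at most once
    if task["start"] < state["sample_ts"][task["sensor"]]:
        return False  # start time must be later than the data's acquisition time
    if not any(state["core_free"]):
        return False  # each core executes at most one task at a time
    return all(p in state["done"] for p in task["deps"])  # dependency order
```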
Fig. 4 is a schematic structural diagram of a task scheduling device of an automatic driving system according to a fourth embodiment of the present invention. As shown in Fig. 4, the task scheduling device of an automatic driving system according to an embodiment of the present invention, which adopts the task scheduling policy generated by the task scheduling policy generating device of an automatic driving system according to any one of the embodiments, includes a second acquiring unit 401 and a scheduling unit 402, where:
the second acquisition unit 401 is configured to acquire sensor data periodically acquired by each sensor; the scheduling unit 402 is configured to obtain corresponding sensor data in each task scheduling period based on a task scheduling policy, execute each computing task, and generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
The apparatus embodiments provided in the embodiments of the present invention may be specifically used to execute the processing flows of the foregoing method embodiments; their functions are not repeated here, and reference may be made to the detailed description of the method embodiments.
Fig. 5 is a schematic physical structure of an electronic device according to an embodiment of the present invention, as shown in fig. 5, the electronic device may include: a processor (processor) 501, a communication interface (Communications Interface) 502, a memory (memory) 503 and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other via the communication bus 504. The processor 501 may call logic instructions in the memory 503 to perform the following method: acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset. Or alternatively
Acquiring sensor data periodically acquired by each sensor; based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period to execute each calculation task to generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
Further, the logic instructions in the memory 503 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the above-described method embodiments, for example comprising: acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset. Or alternatively
Acquiring sensor data periodically acquired by each sensor; based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period to execute each calculation task to generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
The present embodiment provides a computer-readable storage medium storing a computer program that causes the computer to execute the methods provided by the above-described method embodiments, for example, including: acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function; performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset. Or alternatively
Acquiring sensor data periodically acquired by each sensor; based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period to execute each calculation task to generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present specification, reference to the terms "one embodiment," "one particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (12)

1. A method for generating a task scheduling strategy for an autopilot system, comprising:
acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function;
performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the number of cores, the scheduling task set, scheduling constraint conditions, a scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset.
2. The method of claim 1, wherein the training parameters include a sensor-related characteristic parameter, a computing task-related characteristic parameter, and a core-related characteristic parameter, wherein:
The sensor related characteristic parameters comprise an acquisition period of each sensor, a first time difference corresponding to each sensor, a second time difference corresponding to each sensor and a third time difference corresponding to each sensor; the first time difference is equal to the time difference between the current time and the acquisition time of the sensor data used in the current round; the second time difference is equal to the time difference between the current time and the acquisition time of the sensor data used in the previous round; the third time difference is equal to the time difference between the current time and the acquisition time of the sensor's next data sample;
the relevant characteristic parameters of the computing tasks comprise the execution time of each computing task, the time interval corresponding to each computing task and the execution requirement of each computing task; the time interval corresponding to a computing task is equal to the time interval between the current time and the start execution time of that computing task in the current round; the execution requirement of a computing task is whether its predecessor tasks have been executed;
the core-related characteristic parameters include an identification of the cores considered by the current decision point, a time interval between the current time of each core and the next available time.
3. The method of claim 1, wherein the task scheduling period is a common multiple of the acquisition period of all sensors.
4. The method of claim 1, wherein the Markov decision model of the scheduled task comprises a decision process, wherein:
the decision process is represented by a four-tuple (S, A, P, R): S represents the state space, comprising the sensor-related characteristic parameters, the computing-task-related characteristic parameters, and the core-related characteristic parameters; A represents the action space, comprising the computing tasks; P represents the transition probability function; R represents the reward function, expressed as 1/(αD), where α is a positive integer of 2 or more and D represents the information age of the control command.
5. The method of claim 4, wherein the information age of the control command is the time difference between the timestamp at which the computing task generates the control command and the minimum timestamp of the sensor data used for generating the control command in the previous round.
6. The method of claim 1, wherein the scheduling optimization objective function minimizes the information age of the control command in each round and is expressed as: min (c_{n,k} − S_{k−1}), where c_{n,k} is the timestamp at which the control command is generated in the kth round, S_{k−1} is the minimum timestamp of the sensor data used by the nth computing task scheduled in the (k−1)th round, n is the total number of computing tasks of the (k−1)th round included in the scheduling task set, the nth computing task is the computing task generating the control command, and k is a positive integer.
7. The method of claim 6, wherein the scheduling constraint comprises:
the acquired data of each acquisition period of each sensor is used at most once;
each computing task in the dispatching task set uses the latest data acquired by the corresponding sensor, and the starting time of each computing task is later than the acquisition time of the used sensor data;
each computing task is executed on only one core, and each core executes at most one computing task at the same time;
the completion time of the round of training cannot be earlier than the completion time of the previous round of training, and the execution sequence of each calculation task meets the dependency relationship among the calculation tasks in the dispatching task set.
8. A task scheduling method of an automatic driving system based on a task scheduling policy generated by the task scheduling policy generation method of an automatic driving system according to any one of claims 1 to 7, characterized by comprising:
Acquiring sensor data periodically acquired by each sensor;
based on a task scheduling strategy, acquiring corresponding sensor data in each task scheduling period to execute each calculation task to generate a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
9. A task scheduling policy generation device for an automatic driving system, comprising:
the first acquisition unit is used for acquiring training configuration data; the configuration data comprises a task scheduling period, a core number, a scheduling task set, training parameters, scheduling constraint conditions and a scheduling optimization objective function;
the training unit is used for performing reinforcement learning training according to training data corresponding to the training parameters, the task scheduling period, the core number, the scheduling task set, the scheduling constraint conditions, the scheduling optimization objective function and a Markov decision model of the scheduling task to obtain the task scheduling strategy; wherein the Markov decision model of the scheduling task is preset.
10. A task scheduling device for an automatic driving system based on the task scheduling policy generated by the task scheduling policy generating device for an automatic driving system according to claim 9, comprising:
The second acquisition unit is used for acquiring the sensor data periodically acquired by each sensor;
the scheduling unit is used for acquiring corresponding sensor data in each task scheduling period based on a task scheduling strategy, executing each calculation task and generating a corresponding control instruction; wherein, the corresponding relation between each scheduling task and the sensor data is preset.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 or the method of claim 8 when executing the computer program.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1 to 7, or the method of claim 8.
CN202210474595.9A 2022-04-29 2022-04-29 Task scheduling strategy generation method and device for automatic driving system Pending CN117022313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210474595.9A CN117022313A (en) 2022-04-29 2022-04-29 Task scheduling strategy generation method and device for automatic driving system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210474595.9A CN117022313A (en) 2022-04-29 2022-04-29 Task scheduling strategy generation method and device for automatic driving system

Publications (1)

Publication Number Publication Date
CN117022313A true CN117022313A (en) 2023-11-10

Family

ID=88637787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210474595.9A Pending CN117022313A (en) 2022-04-29 2022-04-29 Task scheduling strategy generation method and device for automatic driving system

Country Status (1)

Country Link
CN (1) CN117022313A (en)

Similar Documents

Publication Publication Date Title
JP6963627B2 (en) Neural architecture search for convolutional neural networks
CN109348707A (en) For the method and apparatus of the Q study trimming experience memory based on deep neural network
CN109034371B (en) Deep learning model reasoning period acceleration method, device and system
CN112313043A (en) Self-supervised robotic object interaction
CN110263869B (en) Method and device for predicting duration of Spark task
CN112764893B (en) Data processing method and data processing system
CN113746696A (en) Network flow prediction method, equipment, storage medium and device
CN112101547B (en) Pruning method and device for network model, electronic equipment and storage medium
KR20200100302A (en) Data processing method based on neural network, training method of neural network, and apparatuses thereof
JP2022504739A (en) Controlling agents over long timescales using time value transfer
CN114237869A (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN112163671A (en) New energy scene generation method and system
CN114675975A (en) Job scheduling method, device and equipment based on reinforcement learning
CN117311998B (en) Large model deployment method and system
CN117022313A (en) Task scheduling strategy generation method and device for automatic driving system
CN109840308B (en) Regional wind power probability forecasting method and system
CN114528966A (en) Local learning method, equipment and medium
US11501167B2 (en) Learning domain randomization distributions for transfer learning
CN111290118A (en) Decoupling control method and device for deformable mirror
CN111047074A (en) Power load fluctuation range prediction method and device
CN112861951B (en) Image neural network parameter determining method and electronic equipment
US20230351146A1 (en) Device and computer-implemented method for a neural architecture search
Haouari et al. PSRL: A New Method for Real-Time Task Placement and Scheduling Using Reinforcement Learning
CN116680054A (en) Neural network integrated hierarchical scheduling method and system for real-time intelligent perception system
CN114648100A (en) Model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination