WO2023124000A1

WO2023124000A1 - Multi-concurrency data processing method and device

Info

Publication number: WO2023124000A1
Application number: PCT/CN2022/104711
Authority: WO
Inventors: 王振东; 张伟德; 朱军; 刘坤鹏; 郑朝友; 段锐; 孙建蕾; 任思阳; 葛绍亮; 刘加银
Original assignee: 中国第一汽车股份有限公司
Priority date: 2021-12-31
Filing date: 2022-07-08
Publication date: 2023-07-06
Also published as: CN114327834A

Abstract

Disclosed in embodiments of the present application are a multi-concurrency data processing method and device. The method comprises: obtaining road test data acquired by a vehicle; decomposing the road test data to generate a plurality of vehicle work tasks to be processed in parallel, the vehicle work tasks being tasks for controlling the vehicle to work according to specified control instructions; submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, the container cluster manager being used for orchestrating and scheduling computing resources in a server cluster; and scheduling a plurality of processing containers by means of the container cluster manager to respectively perform parallel processing on the vehicle work tasks to be processed in parallel so as to generate processing results, each processing container having a set of computing resources, and the computing resources being used for processing the vehicle work task.

Description

Multi-concurrent data processing method and device

This application claims priority to Chinese patent application No. 202111676678.8, titled "Multi-Concurrent Data Processing Method and Device" and filed with the China Patent Office on December 31, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present application relate to the vehicle field, and in particular, relate to a multi-concurrent data processing method and device.

Background art

At present, it is necessary to train different modules in a timely manner by using massive amounts of high-quality data in order to continuously optimize the automatic driving system.

In related technologies, hardware resources cannot be expanded indefinitely, while the data collected by self-driving cars grows rapidly every day: a single car collects about 700 MB of data per second, and the volume reaches 10 TB in a day. At the same time, the speed of algorithm iteration depends on how quickly high-quality data can be produced, so if road test data cannot be processed in time, the optimization of the automatic driving system slows down considerably. Existing processing logic still cannot make full use of hardware resources for high-concurrency task processing.

Aiming at the problem that hardware resources cannot be fully utilized to process high-concurrency tasks in related technologies, no effective solution has been proposed so far.

Summary of the invention

Embodiments of the present application provide a multi-concurrent data processing method and device, so as to at least solve the technical problem in the related art that hardware resources cannot be fully utilized for high-concurrency task processing.

According to one aspect of the embodiments of the present application, a multi-concurrent data processing method is provided, including: acquiring road test data collected by a vehicle; decomposing the road test data to generate multiple vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task of controlling the vehicle to work according to a specified control instruction; submitting the multiple vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used to orchestrate and schedule computing resources in a server cluster; and scheduling multiple processing containers through the container cluster manager to process the vehicle work tasks in parallel and generate a processing result, wherein each processing container has a set of computing resources, and the computing resources are used to process the vehicle work task.

Optionally, the server cluster includes multiple servers, and a corresponding number of processing containers is obtained according to the amount of computing resources of the server cluster, the computing resources including at least processor resources and storage resources, wherein the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server to monitor the working status of the slave servers.

Optionally, the method further includes: creating a task processing program of the container cluster manager, wherein the task processing program is used to determine, according to preset parameters, the number of processing containers that need to be invoked; encapsulating the task processing program to build a task processing image; and building, based on the task processing image, the containers required to run the vehicle work tasks, wherein the containers include a management container for scheduling management and a processing container for running tasks.

Optionally, scheduling multiple processing containers through the container cluster manager to process the vehicle work tasks in parallel includes: receiving the multiple vehicle work tasks to be processed in parallel, and scheduling at least one management container and as many processing containers as there are vehicle work tasks; distributing, by each management container, the corresponding vehicle work tasks to designated processing containers; and starting the processing containers, each of which runs its assigned vehicle work task.

Optionally, after each processing container runs its assigned vehicle work task, the method further includes: merging the sub-run results of the vehicle work tasks to generate the processing result; and storing the processing result in a predetermined database, wherein the database is a database that allows interactive queries.

According to another aspect of the embodiments of the present application, a multi-concurrent data processing device is also provided, including: an acquisition component, configured to acquire road test data collected by a vehicle; a decomposition component, configured to decompose the road test data to generate multiple vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task of controlling the vehicle to work according to a specified control instruction; a submission component, configured to submit the multiple vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used to orchestrate and schedule computing resources in a server cluster; and a processing component, configured to schedule multiple processing containers through the container cluster manager to process the vehicle work tasks in parallel and generate a processing result, wherein each processing container has a set of computing resources used to process the vehicle work task.

Optionally, the server cluster includes multiple servers, and a corresponding number of processing containers is obtained according to the amount of computing resources of the server cluster, the computing resources including at least processor resources and storage resources, wherein the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server to monitor the working status of the slave servers.

Optionally, the device further includes: a creation component, configured to create a task processing program of the container cluster manager, wherein the task processing program is used to determine, according to preset parameters, the number of processing containers that need to be invoked; an encapsulation component, configured to encapsulate the task processing program and build a task processing image; and a building component, configured to build, based on the task processing image, the containers required to run the vehicle work tasks, wherein the containers include a management container for scheduling management and a processing container for running tasks.

Optionally, the processing component includes: a receiving subcomponent, configured to receive the multiple vehicle work tasks to be processed in parallel; a scheduling subcomponent, configured to schedule at least one management container and as many processing containers as there are vehicle work tasks; a distribution subcomponent, configured to have each management container distribute the corresponding vehicle work tasks to designated processing containers; and a running subcomponent, configured to start the processing containers, each of which runs its assigned vehicle work task.

Optionally, the device further includes: a merging component, configured to merge the sub-run results of the vehicle work tasks and generate the processing result; and a storage component, configured to store the processing result in a predetermined database, wherein the database is a database that allows interactive queries.

According to another aspect of the embodiments of the present invention, a non-volatile readable storage medium is also provided. The non-volatile storage medium includes a stored program, wherein when the program is running, the device where the non-volatile storage medium is located is controlled to execute the above multi-concurrent data processing method.

According to another aspect of the embodiments of the present application, a processor is also provided. The processor is configured to run a program, wherein the above-mentioned multi-concurrent data processing method is executed when the program is running.

In the embodiments of the present application, road test data collected by a vehicle is obtained; the road test data is decomposed to generate multiple vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task of controlling the vehicle to work according to a specified control instruction; the multiple vehicle work tasks are submitted to a container cluster manager, wherein the container cluster manager is used to orchestrate and schedule computing resources in a server cluster; and multiple processing containers are scheduled through the container cluster manager to process the vehicle work tasks in parallel and generate a processing result, wherein each processing container has a set of computing resources used to process the vehicle work task. That is, the present application decomposes the road test data into multiple vehicle work tasks to be processed in parallel and uses the container cluster manager to schedule multiple processing containers to process them in parallel, thereby solving the technical problem in the related art that hardware resources cannot be fully utilized for high-concurrency task processing.

Description of drawings

The drawings described here are used to provide a further understanding of the embodiments of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:

Fig. 1 is a flow chart of a multi-concurrent data processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of task scheduling according to an embodiment of the present application;

Fig. 3 is a schematic diagram of task decomposition according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a multi-concurrent data processing device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a non-volatile storage medium according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a processor according to an embodiment of the present application.

Detailed description

In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.

It should be noted that the terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or elements is not necessarily limited to those expressly listed, but may include other steps or elements that are not expressly listed or that are inherent to the process, method, product or device.

According to the embodiments of the present application, an embodiment of a multi-concurrent data processing method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown or described herein.

Fig. 1 is a flowchart of a multi-concurrent data processing method according to an embodiment of the present application. As shown in Figure 1, the method may include the following steps:

Step S102, acquiring road test data collected by the vehicle.

In the technical solution provided in step S102 above, the road test data may be data collected by the car during automatic driving, such as the speed, acceleration, and time during automatic driving, or the performance of each module during automatic driving.

In this embodiment, the driving trajectory, driving speed, acceleration and performance parameters of each module during the driving process of the test vehicle can be collected through the camera, radar, sensor of the target vehicle, controller, etc. during the automatic driving process.

Step S104, decomposing the drive test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to specified control instructions.

In the technical solution provided in step S104 above, a vehicle work task controls the vehicle to work according to a specified control instruction, for example whether to accelerate, decelerate, or brake suddenly on a certain road section. Decomposing the road test data may consist of labeling it according to established rules, such as whether there are traffic lights on the road section, whether there are pedestrians and how many, and whether sudden braking occurs during driving.

For example, the road test data collected by the vehicle is obtained, a data processing program decomposes it according to the established rules and marks it with specific labels, and the data processing program is then scheduled to process the labeled road test data and generate multiple vehicle work tasks to be processed in parallel. The data processing program may be a cluster computing engine (Spark).
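The rule-based labeling and decomposition described above can be sketched in plain Python. This is only a minimal illustration, not the applicant's implementation: the record fields (`has_traffic_light`, `pedestrian_count`, `max_deceleration`), the braking threshold, and the function names are all hypothetical, and a real deployment would run equivalent logic on the cluster computing engine.

```python
from typing import Dict, List

# Hypothetical threshold: deceleration (m/s^2) at or above which a record counts as sudden braking.
SUDDEN_BRAKE_THRESHOLD = 6.0


def label_record(record: Dict) -> List[str]:
    """Return the rule labels that apply to one road test record."""
    labels = []
    if record.get("has_traffic_light"):
        labels.append("traffic_light")
    if record.get("pedestrian_count", 0) > 0:
        labels.append("pedestrians")
    if record.get("max_deceleration", 0.0) >= SUDDEN_BRAKE_THRESHOLD:
        labels.append("sudden_brake")
    return labels


def decompose(records: List[Dict]) -> Dict[str, List[Dict]]:
    """Group labeled records into one work task per label."""
    tasks: Dict[str, List[Dict]] = {}
    for record in records:
        for label in label_record(record):
            tasks.setdefault(label, []).append(record)
    return tasks


records = [
    {"id": 1, "has_traffic_light": True, "pedestrian_count": 2, "max_deceleration": 1.5},
    {"id": 2, "has_traffic_light": False, "pedestrian_count": 0, "max_deceleration": 7.2},
    {"id": 3, "has_traffic_light": True, "pedestrian_count": 0, "max_deceleration": 0.3},
]
tasks = decompose(records)
```

Each key of `tasks` then corresponds to one independent unit of work that could be submitted for parallel processing.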

Step S106, submitting multiple vehicle work tasks to be processed in parallel to the container cluster manager, wherein the container cluster manager is used to arrange and schedule computing resources in the server cluster.

In the technical solution provided in step S106 above, the container cluster manager may be a large-scale container cluster management tool (Kubernetes, K8S for short), also called a container orchestration engine or container orchestrator. It is an open source container orchestration engine used for task scheduling and management. The container cluster manager enables automated deployment, large-scale scaling, and application container management, so that computing resources in the server cluster can be better orchestrated and scheduled.

Optionally, multiple vehicle work tasks to be processed in parallel are submitted to the container cluster manager, and the container cluster manager schedules and manages the submitted tasks.

In step S108, multiple processing containers are scheduled by the container cluster manager to process the vehicle work tasks in parallel and generate a processing result, wherein each processing container has a set of computing resources, and the computing resources are used to process the vehicle work tasks.

In the technical solution provided in step S108 of the present invention, the processing container may be a drive unit, which may be represented by driver, and used to distribute tasks. Computing resources can be execution units, which can be represented by executors, and are used to process vehicle work tasks.

Optionally, a server is selected as the node of the container cluster manager and the related environment is deployed on it; the multiple vehicle work tasks to be processed in parallel are submitted to the container cluster manager, and the container cluster manager schedules multiple drive units. The number of drive units may be defined by the system or specified as a parameter when the user submits a task (two, three, four, and so on), and is not specifically limited here. Each drive unit can start multiple execution units to process the vehicle work tasks in parallel and generate processing results, which can be queried in the database.

As can be seen from the above, in the above embodiments of the present application, road test data collected by a vehicle is obtained; the road test data is decomposed to generate multiple vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task of controlling the vehicle to work according to a specified control instruction; the multiple vehicle work tasks are submitted to a container cluster manager, wherein the container cluster manager is used to orchestrate and schedule computing resources in a server cluster; and multiple processing containers are scheduled through the container cluster manager to process the vehicle work tasks in parallel and generate a processing result, wherein each processing container has a set of computing resources used to process the vehicle work task. That is, the application decomposes the road test data into multiple vehicle work tasks to be processed in parallel and uses the container cluster manager to schedule multiple processing containers to process them in parallel, thereby solving the technical problem in the related art that hardware resources cannot be fully utilized for high-concurrency task processing.

The above-mentioned method of this embodiment will be further introduced below.

As an optional embodiment, the server cluster includes multiple servers, and a corresponding number of processing containers is obtained according to the amount of computing resources of the server cluster. The server cluster has at least one master server and at least one slave server, and the container cluster manager is installed on the master server to monitor the working status of the slave servers.

In this embodiment, the server cluster includes multiple servers, where the servers may be multiple central processing units in the target vehicle, and the corresponding number of processing containers are obtained according to the number of computing resources of the server cluster.
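One possible reading of "the corresponding number of processing containers is obtained according to the amount of computing resources" is a simple capacity calculation. The sketch below is an assumption for illustration only: the source does not state the actual formula, and the function name and the per-container CPU and memory requests are hypothetical.

```python
def max_processing_containers(total_cores: int, total_memory_gb: int,
                              cores_per_container: int,
                              memory_gb_per_container: int) -> int:
    """Return how many containers fit within both the CPU and the memory budget."""
    if cores_per_container < 1 or memory_gb_per_container < 1:
        raise ValueError("per-container requests must be positive")
    by_cpu = total_cores // cores_per_container
    by_memory = total_memory_gb // memory_gb_per_container
    # The scarcer resource (processor or storage) limits the container count.
    return min(by_cpu, by_memory)


capacity = max_processing_containers(total_cores=40, total_memory_gb=128,
                                     cores_per_container=2,
                                     memory_gb_per_container=8)
```

In this hypothetical cluster, memory rather than CPU is the binding constraint.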

Optionally, the data processing program is submitted to the container cluster manager, and the container cluster manager orchestrates and schedules the servers in the server cluster for resource management. At least one master server and at least one slave server are deployed in the server cluster, and the master server performs scheduling management of the slave servers; the container cluster manager is installed on the master server to monitor the working status of the slave servers.

For example, the data processing program is submitted to the container cluster manager, and the container cluster manager orchestrates and schedules the server sets in the server cluster to manage resources, and generates processing containers and computing containers. Optionally, the processing container may be a drive unit, which may be represented by a driver, and is used for distributing tasks. The computing container can be an execution unit, which can be represented by an executor, and is used to process vehicle work tasks.

As an optional embodiment, the method further includes: creating a task processing program of the container cluster manager, wherein the task processing program is used to determine, according to preset parameters, the number of processing containers that need to be invoked; encapsulating the task processing program to build a task processing image; and building, based on the task processing image, the containers required to run the vehicle work tasks, wherein the containers include a management container for scheduling management and a processing container for running tasks.

In this embodiment, a task processing program of the container cluster manager is created, and the task processing program is used to determine the number of containers that need to be called for processing according to preset parameters. For example, write a task processing program, and determine the amount of processing capacity that needs to be called according to preset parameters.

Optionally, in this embodiment, the data is screened according to the established rules and marked with specific labels, and the task processing program is packaged. Image build rules (a Dockerfile) are then written to instruct the system to build the image in the specified steps, producing the task processing image. Based on the task processing image, the management container for scheduling management and the processing container for running tasks, both required at run time by the vehicle work tasks, are built.

As an optional embodiment, scheduling multiple processing containers through the container cluster manager to process the vehicle work tasks in parallel includes: receiving the multiple vehicle work tasks to be processed in parallel, and scheduling at least one management container and as many processing containers as there are vehicle work tasks; distributing, by each management container, the corresponding vehicle work tasks to designated processing containers; and starting the processing containers, each of which runs its assigned vehicle work task.

In this embodiment, multiple vehicle work tasks to be processed in parallel are received, at least one management container and processing containers having the same number as the vehicle work tasks are dispatched, and each management container distributes corresponding vehicle work tasks to designated processing containers; The processing containers are started to run the assigned vehicle work tasks respectively.
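The distribute-then-run pattern in this embodiment can be imitated on a single machine. In the sketch below, threads stand in for processing containers and the calling function stands in for a management container; this is an analogy under stated assumptions, not how the container cluster manager itself schedules containers, and the handler and task fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_in_parallel(tasks: List[dict], handler: Callable[[dict], dict]) -> List[dict]:
    """Start one worker per task and return the results in task order."""
    if not tasks:
        return []
    # One worker per task mirrors "as many processing containers as tasks".
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(handler, tasks))


def count_frames(task: dict) -> dict:
    """Hypothetical handler: report how many frames a task contains."""
    return {"task": task["name"], "frames": len(task["frames"])}


results = run_in_parallel(
    [{"name": "a", "frames": [1, 2, 3]}, {"name": "b", "frames": [4]}],
    count_frames,
)
```

`pool.map` returns results in submission order, which keeps each sub-result aligned with the task that produced it.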

As an optional embodiment, after each processing container runs its assigned vehicle work task, the method further includes: merging the sub-run results of the vehicle work tasks to generate the processing result; and storing the processing result in a predetermined database, wherein the database is a database that allows interactive queries.

In this embodiment, the execution results of the split work tasks are merged to obtain a complete processing result, the processing result is stored in a predetermined database, and the data to be queried can be selected from that database. The predetermined database may be a MongoDB database, which is not specifically limited here.
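The merge-and-store step might look like the following sketch. The shape of a sub-result (a mapping from label to count) is an assumption, and `InMemoryStore` is a hypothetical stand-in for the predetermined database so that the example stays self-contained; it is not a MongoDB client.

```python
from collections import Counter
from typing import Dict, Iterable, List


def merge_results(sub_results: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-label counts reported by each processing container."""
    merged: Counter = Counter()
    for sub in sub_results:
        merged.update(sub)  # Counter.update adds counts instead of overwriting
    return dict(merged)


class InMemoryStore:
    """Hypothetical stand-in for the predetermined, queryable database."""

    def __init__(self) -> None:
        self._docs: List[dict] = []

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self, **criteria) -> List[dict]:
        """Return documents whose fields equal every given criterion."""
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in criteria.items())]


store = InMemoryStore()
merged = merge_results([{"pedestrians": 2, "sudden_brake": 1}, {"pedestrians": 3}])
store.insert({"run": "demo", "labels": merged})
```

Keeping the merge separate from storage means the same merged result could be written to any database that supports interactive queries.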

The technical solutions of the embodiments of the invention of the present application are illustrated below in conjunction with preferred implementation modes.

The core of a self-driving car is the automatic driving system, which combines artificial intelligence, visual computing, radar, and the global positioning system, so the quality of the automatic driving system determines whether self-driving cars can be used in real life on a large scale. In the final analysis, all of its behavior is data-driven: only by training the different modules in a timely manner on massive amounts of high-quality data can the automatic driving system be continuously optimized. Being able to process and screen valuable data quickly on existing hardware is therefore crucial for the rapid iteration and development of the automatic driving system.

How to use limited hardware resources to process rapidly accumulating test data in a timely manner is the main problem faced by an autonomous driving data platform. On the one hand, hardware resources cannot be expanded indefinitely. On the other hand, the data collected by self-driving cars is growing rapidly: a single car collects about 700 MB of data per second, and the volume reaches 10 TB in a day. At the same time, the speed of algorithm iteration depends on how quickly high-quality data can be produced, so if the data collected in road tests cannot be processed in time, the optimization of the automatic driving system slows down considerably. Existing processing logic also cannot make full use of hardware resources: a single machine may have a dozen or twenty central processing units, yet it does not run that many tasks in parallel, which wastes hardware resources.

To overcome the above problems, one related technology proposes a high-concurrency data processing system and method that makes use of the data processing characteristics of each server, thereby solving the problem of excessive server load caused by a large number of data requests. Another proposes a solution and system for high-concurrency reception of massive data, which allocates the processing capacity of each application server according to a load balancing strategy, thereby achieving high-concurrency sending and receiving of massive data and storing the processed message data, and solving the technical problem of inefficient data sending and receiving.

However, the scheduling units of the above two methods are only servers, and the number of servers needs to be greatly increased if high-concurrency execution is to be performed, and there is still a technical problem that hardware resources cannot be fully utilized.

In contrast, this application proposes a multi-concurrent automatic data processing system based on a large-scale container cluster management tool and a cluster computing engine. The cluster computing engine uses an advanced Directed Acyclic Graph (DAG) scheduler, query optimizer, and physical execution engine, and is described as up to 100 times faster than earlier data processing tools. The large-scale container cluster management tool, Kubernetes, is an open source container orchestration engine that supports automated deployment, large-scale scaling, and application container management.

This application splits a single task through the cluster computing engine, so that a single task can create a distributed data set, operate on it in parallel, and aggregate the results. At the same time, it uses the large-scale container cluster management tool to schedule multiple tasks for parallel execution. Existing hardware resources can thus be fully utilized for high-concurrency processing of massive data, so that data processing speed is limited only by the total number of CPUs across the servers rather than by the number of servers. This solves the technical problem in the prior art that hardware resources cannot be fully utilized for high-concurrency task processing. The multi-concurrent data processing method of this embodiment may include the following parts.

The first part: the construction of K8S cluster.

The first step is to select a server as the control node of the open source container orchestration engine to deploy related environments. The control (Master) node is mainly used as the management and control center of the cluster.

The second step is to use the other servers as workload (Node) nodes of the open source container orchestration engine and deploy the related environment on them. The workload on a workload node is allocated by the control node; workload nodes are mainly used to maintain and run containers and to provide the runtime environment for the container orchestration engine.

The third step is to test the operation of the container orchestration engine cluster to ensure that the functions such as communication between nodes are normal.

The second part: cluster computing engine processing program writing and container packaging.

In the first step, the cluster computing engine processing program is developed mainly in Python and includes two links:

Link 1: Divide the tasks submitted by the user into blocks to create a distributed data set for parallel operation. In the program, you can set the number of pieces into which a single task is split for execution, that is, the number of Tasks.

Link 2: The data processing program, which screens massive data according to established rules and marks it with specific labels. The main rules include whether there are traffic lights, whether there are pedestrians and how many, and whether there is emergency braking.
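The blocking in Link 1 amounts to partitioning the input into a chosen number of pieces. The helper below is a plain-Python sketch with a hypothetical function name; in the cluster computing engine itself this roughly corresponds to choosing the number of partitions when the distributed data set is created.

```python
from typing import List, Sequence


def split_into_blocks(items: Sequence, num_tasks: int) -> List[list]:
    """Split items into num_tasks contiguous blocks of near-equal size."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    base, extra = divmod(len(items), num_tasks)
    blocks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` blocks each take one additional item.
        size = base + (1 if i < extra else 0)
        blocks.append(list(items[start:start + size]))
        start += size
    return blocks


blocks = split_into_blocks(list(range(10)), 3)
```

Near-equal block sizes help keep every execution unit busy for about the same length of time.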

The second step is to encapsulate the processing program of the cluster computing engine into an image, so that the container orchestration engine cluster can schedule and run it.

The third part: Submit tasks for large-scale and high-concurrency data processing.

The first step is to set the relevant parameters for submitting tasks, mainly including the following. The number of running resource allocation units (spark.executor.instances): in the current setup each added resource allocation unit requires one CPU resource, and the parallelism of a single task is the number of resource allocation units multiplied by the number of cores of each unit. The base image used at run time (spark.kubernetes.container.image): the image obtained in the second step of the second part can be used. The mounted directories: because the original data is stored on the storage server, it is not visible to the containers started at run time, so it must be made visible through mounting; specifically, specify the local directory to be mounted at run time (for example, spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.options.path and spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.mount.path).
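The parameters listed above could be assembled into `--conf key=value` arguments for a submit command along the following lines. This is a sketch under assumptions: the image name and directory paths are hypothetical placeholders, and `spark.executor.cores` is not named in the text but is added here to express the number of cores per resource allocation unit.

```python
from typing import Dict, List


def build_submit_args(image: str, host_dir: str, mount_dir: str,
                      executors: int, cores_per_executor: int) -> List[str]:
    """Return spark-submit style '--conf key=value' arguments."""
    volume = "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2"
    conf: Dict[str, str] = {
        "spark.executor.instances": str(executors),
        # Assumption: spark.executor.cores is not mentioned in the source text.
        "spark.executor.cores": str(cores_per_executor),
        "spark.kubernetes.container.image": image,
        f"{volume}.options.path": host_dir,   # directory on the host
        f"{volume}.mount.path": mount_dir,    # where the container sees it
    }
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical image name and paths, for illustration only.
args = build_submit_args("example/roadtest:latest", "/data/roadtest",
                         "/data/roadtest", executors=5, cores_per_executor=2)
```

Building the arguments from a dictionary keeps each property name in one place, which makes it easier to vary the executor count per submission.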

The second step is to submit the task for execution.
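The parameters of Step 1 can be assembled into a submission command as in the following minimal sketch. It assumes that the task is submitted with the `spark-submit` command in cluster deploy mode; the function name and all argument values (master URL, image name, paths, application file) are hypothetical placeholders and are not taken from the application. Only the driver-side volume keys named in the text are set.

```python
from typing import List


def build_spark_submit_args(
    master_url: str,
    image: str,
    host_path: str,
    mount_path: str,
    app_file: str,
    executor_instances: int = 5,
) -> List[str]:
    """Assemble a spark-submit command line from the parameters named in the text."""
    volume = "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2"
    conf = {
        # Number of resource allocation units (executors) to start.
        "spark.executor.instances": str(executor_instances),
        # Image built from the processing program in Part II.
        "spark.kubernetes.container.image": image,
        # Host directory holding the raw data, and where it appears in the container.
        f"{volume}.options.path": host_path,
        f"{volume}.mount.path": mount_path,
    }
    args = ["spark-submit", "--master", master_url, "--deploy-mode", "cluster"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_file)
    return args
```

The returned list could be passed to `subprocess.run` on a machine where `spark-submit` is installed; building the list separately keeps the parameter set easy to inspect before a task is actually submitted.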

In this implementation, the following points should be noted. (1) Calculation of single-task concurrency: the number of computing nodes (Executors) requested and the number of cores per computing node determine how many tasks can execute in parallel at the same time. For example, when a task is split into ten parts, ten tasks are generated during computation. If the resource configuration is five computing nodes with two CPUs assigned to each node, ten tasks can run in parallel at the same time, that is, the concurrency of a single task is 5*2=10. (2) Calculation of multi-task concurrency: the concurrency of the whole cluster is the number of submitted tasks multiplied by the single-task concurrency. For example, if the single-task concurrency is 10 and five tasks are submitted, the concurrency of the entire cluster is 5*10=50.
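The two calculations above amount to simple multiplications, sketched below in Python. The function names are illustrative only and do not appear in the application.

```python
def single_task_concurrency(executors: int, cores_per_executor: int) -> int:
    """Tasks that one submitted job can run at the same time."""
    return executors * cores_per_executor


def cluster_concurrency(submitted_tasks: int, per_task_concurrency: int) -> int:
    """Concurrency of the whole cluster when several jobs run side by side."""
    return submitted_tasks * per_task_concurrency


# The figures from the text: five executors with two cores each, five submitted tasks.
per_task = single_task_concurrency(executors=5, cores_per_executor=2)
total = cluster_concurrency(submitted_tasks=5, per_task_concurrency=per_task)
print(per_task, total)  # prints: 10 50
```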

Each part of the technical solutions of the embodiments of the present invention will be further described in detail below.

The first part is shown in FIG. 2, which is a schematic diagram of task scheduling according to an embodiment of the present invention.

The driver unit 201 and the driver unit 202 are generated after the cluster computing engine task is submitted to the container orchestration engine. They are management units, mainly used for dispatching and managing the distributed tasks and the generated execution units. The driver unit 201 includes execution units 2011, 2012, 2013, 2014 and 2015; the driver unit 202 includes execution units 2021, 2022, 2023, 2024 and 2025.

Execution units 2011 to 2015 and 2021 to 2025 are generated after the cluster computing engine task is submitted to the container orchestration engine. Each serves as a specific task execution unit and is in fact a set of computing resources, that is, a set of central processing unit cores and storage capacity (cpu core, memory).

The container orchestration engine 203 mainly schedules and manages all server resources and submitted tasks.

The Spark processing program image 204 serves as the base image from which the containers required to run a task are built.

The MongoDB database 205 is used to save the data screening and processing results.

The specific steps are as follows: after the user submits a task, the driver units 201 and 202 and the execution units required for executing the task are created based on the image 204. The driver units 201 and 202 are mainly responsible for scheduling and managing the execution units, and the execution units are mainly responsible for executing specific tasks. The number of execution units is determined by the parameters specified by the user when submitting the task. The container orchestration engine 203 schedules and manages the driver units, and the final result is stored in the database 205.

The second part is shown in FIG. 3, which is a schematic diagram of task decomposition according to an embodiment of the present invention.

Data processing program task 1 (25G) 301, data processing program task 2 (25G) 302, data processing program task 3 (25G) 303 and data processing program task 4 (25G) 304 are the execution modules obtained after the original task data is decomposed, that is, the task execution units formed after the vehicle-collected data is decomposed according to the specified parameters.

The cluster computing engine tool 305 is responsible for decomposing the task of processing the vehicle-collected data, scheduling the data processing program to process the data, and merging the execution results of the split tasks to obtain a complete result.

The vehicle collection data (100G) 306 is the raw, unprocessed data collected during road tests.

The database 307 is used to store the results of the data screening process.

The specific steps are as follows: 306 is submitted to 305 as a pending task, and 305 decomposes 306 into a certain number of parallel tasks 301, 302, 303 and 304. During execution, 301, 302, 303 and 304 exchange data with 307 and store the data processing results in 307.
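The decompose-process-merge flow of FIG. 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the chunks are processed sequentially, standing in for the parallel execution performed by the cluster computing engine, and the names `split_evenly` and `process_and_merge` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_evenly(items: Sequence[T], parts: int) -> List[List[T]]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(items), parts)
    chunks: List[List[T]] = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def process_and_merge(
    items: Sequence[T], parts: int, worker: Callable[[List[T]], List[R]]
) -> List[R]:
    """Run `worker` on each chunk (sequentially here) and concatenate the sub-results."""
    merged: List[R] = []
    for chunk in split_evenly(items, parts):
        merged.extend(worker(chunk))
    return merged
```

With 100 items standing in for the 100G of collected data and `parts=4`, `split_evenly` yields four chunks of 25 items, mirroring the four 25G tasks 301 to 304.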

The technical solutions of the embodiments of the present invention will be further described in detail below in conjunction with specific embodiments.

The first step is to select a server as the control node of the container cluster management tool and install version v1.20.1 of the container cluster management tool on it. During installation, the image source can be specified with --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers, which points the download address to a mirror in China to avoid unstable pulls from the original source. Docker must be installed beforehand.

The second step is to use the other servers as load nodes of the container cluster management tool, deploy the related environment on them, and then add the load nodes to the cluster. After installation, the status of the load nodes can be viewed on the control node, and network, deployment and other related tests can be performed.

The third step is to write the processing program for the cluster computing engine. In the processing program, the number of blocks is set to five, that is, after the task is submitted, the cluster computing engine automatically divides the data to be processed in the submitted task into five roughly equal parts. The data screening rules mainly include whether there are traffic lights on the road section where the data was collected, whether there are pedestrians and how many, and whether sudden braking occurred during driving.

The third step also includes encapsulating the cluster computing engine processing program into an image: write the image build rules and build the image according to the specified steps. The entrypoint (that is, the command executed after the container starts) must be set to the operation to be performed by default when the container starts, which improves the degree of automation and meets the needs of running the program.

The fourth step is to submit the task for large-scale, high-concurrency data processing. First, set the relevant parameters of the submitted task: spark.executor.instances, that is, the number of load containers to create after the task is submitted, specified here as five; spark.kubernetes.container.image, that is, the name of the image in which the Spark processing program is encapsulated; and spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.options.path and spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.mount.path, that is, the local directory to be mounted at runtime. After the parameters are set, submit the task.

The fifth step is to submit the tasks. Select two folders, each containing 100G of drive test data, and submit one task for each folder. After submission, the container cluster management tool creates two management containers (spark driver), and each management container starts 5 processing containers (spark executor). At this point, a total of 200G of data is divided into ten 20G tasks that execute in parallel, and the cluster parallelism is 10. After the run is complete, the results can be queried in the database.
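The figures in this example can be reproduced with a short Python sketch. It assumes one management container per submitted folder, one data block per processing container, and one CPU core per processing container; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterPlan:
    drivers: int               # management containers (spark driver)
    executors: int             # processing containers (spark executor)
    task_sizes_gb: List[float]  # size of each parallel task
    parallelism: int           # tasks running at the same time


def plan_submission(
    folder_sizes_gb: List[float],
    executors_per_job: int,
    cores_per_executor: int = 1,
) -> ClusterPlan:
    """Derive container counts and task sizes for one task per folder."""
    drivers = len(folder_sizes_gb)
    executors = drivers * executors_per_job
    # Each folder is divided into one block per processing container.
    task_sizes = [
        size / executors_per_job
        for size in folder_sizes_gb
        for _ in range(executors_per_job)
    ]
    return ClusterPlan(drivers, executors, task_sizes, executors * cores_per_executor)


plan = plan_submission([100, 100], executors_per_job=5)
print(plan.drivers, plan.executors, plan.parallelism)  # prints: 2 10 10
```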

From the above, it can be seen that this implementation has the following advantages. When tasks are executed in units of servers, concurrency is limited by the number of servers, yet a single server may have 40 central processing units, which results in a huge waste of resources. Combining the container orchestration engine with the cluster computing engine makes full use of server hardware resources and greatly improves the concurrency of task execution: the number of task execution units changes from the number of servers to the number of central processing units, so that efficiency is maximized under limited resources, saving both cost and time. Using the container orchestration engine for task scheduling and management lets developers focus on program development instead of spending a lot of time on deploying and scaling containerized applications, and helps them manage the cluster simply and efficiently. Using the cluster computing engine for task decomposition frees developers from the decomposition and scheduling logic and saves development time on the data processing program.

According to an embodiment of the present application, a multi-concurrent data processing device is also provided. It should be noted that the multi-concurrent data processing apparatus can be used to execute the above-mentioned multi-concurrent data processing method.

Fig. 4 is a schematic diagram of a multi-concurrent data processing apparatus according to an embodiment of the present application. As shown in FIG. 4, the multi-concurrent data processing apparatus 400 may include: an acquisition component 401, a decomposition component 402, a submission component 403 and a processing component 404.

It should be noted here that the above acquisition component 401, decomposition component 402, submission component 403 and processing component 404 can run in a terminal as part of the device, and the functions realized by these components can be executed by a processor in the terminal. The terminal may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), a PAD or another terminal device.

The acquisition component 401 is configured to acquire the drive test data collected by the vehicle.

The decomposing component 402 is configured to decompose the drive test data to generate multiple vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to specified control instructions.

The submitting component 403 is configured to submit multiple vehicle work tasks to be processed in parallel to the container cluster manager, wherein the container cluster manager is used to arrange and schedule computing resources in the server cluster.

The processing component 404 is configured to schedule a plurality of processing containers through the container cluster manager to perform parallel processing on each vehicle work task to be processed in parallel to generate a processing result, wherein each processing container has a set of computing resources, and the computing resources are used to process the vehicle work tasks.

Optionally, the server cluster includes multiple servers, and a corresponding number of processing containers are obtained according to the number of computing resources of the server cluster. The computing resources include at least: processor resources and storage resources, wherein the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server to monitor the working status of the slave server.

It should be noted here that the above creation component, packaging component and construction component can be run in the terminal as a part of the device, and the functions implemented by the above modules can be executed by the processor in the terminal.

It should be noted here that the above receiving subcomponent, scheduling subcomponent, distribution subcomponent and running subcomponent can run in the terminal as part of the device, and the functions implemented by the above modules can be executed by the processor in the terminal.

It should be noted here that the above merging component and storage component may run in the terminal as a part of the device, and the functions implemented by the above modules may be executed by the processor in the terminal.

In the multi-concurrent data processing device of this embodiment, the drive test data is decomposed to generate a plurality of vehicle work tasks to be processed in parallel, and the container cluster manager schedules multiple processing containers to process each of those vehicle work tasks in parallel, thereby solving the technical problem in the related art that hardware resources cannot be fully utilized for highly concurrent task processing.

According to an embodiment of the present application, there is also provided a non-volatile readable storage medium, wherein the non-volatile readable storage medium includes a stored program, and the program, when run, executes the multi-concurrent data processing method of any one of the embodiments of the present application.

Each functional module provided by the embodiment of the present application can run in a multi-concurrent data processing device or a similar computing device, and can also be stored as a part of a non-volatile storage medium.

Fig. 5 is a schematic structural diagram of a non-volatile storage medium according to an embodiment of the present application. As shown in FIG. 5, a program product 50 according to an embodiment of the present application is described, on which a computer program is stored; when executed by a processor, the computer program implements the following steps:

Obtain the road test data collected by the vehicle; decompose the road test data to generate multiple vehicle work tasks to be processed in parallel, wherein the vehicle work task is the task of controlling the vehicle to work according to the specified control instructions.

Submit the multiple vehicle work tasks to be processed in parallel to the container cluster manager, wherein the container cluster manager is used to orchestrate and schedule computing resources in the server cluster; schedule, through the container cluster manager, multiple processing containers to process each of the vehicle work tasks to be processed in parallel, so as to generate processing results, wherein each processing container has a set of computing resources for processing the vehicle work tasks.

Optionally, when executed by the processor, the computer program further implements the following: the server cluster includes multiple servers, and a corresponding number of processing containers are obtained according to the number of computing resources of the server cluster, the computing resources including at least processor resources and storage resources, wherein at least one master server and at least one slave server are deployed in the server cluster, and the container cluster manager is installed on the master server to monitor the working status of the slave servers.

Optionally, when executed by the processor, the computer program further implements the following steps: creating a task processing program of the container cluster manager, wherein the task processing program is used to determine the number of processing containers that need to be transferred according to preset parameters; encapsulating the task processing program to build a task processing image; and, based on the task processing image, building the containers required for running the vehicle work tasks, wherein the containers include a management container for scheduling management and a processing container for running tasks.

Optionally, when executed by the processor, the computer program further implements the following steps: receiving the multiple vehicle work tasks to be processed in parallel, and scheduling at least one management container and the same number of processing containers as vehicle work tasks; distributing, by each management container, the corresponding vehicle work tasks to designated processing containers; and starting the processing containers, each of which runs its assigned vehicle work task.

Optionally, when executed by the processor, the computer program further implements the following steps: after each processing container executes its assigned vehicle work task, merging the sub-run results of the vehicle work tasks to generate a processing result; and storing the processing result in a predetermined database, wherein the database is a database that allows interactive query.

Optionally, in this embodiment, the non-volatile storage medium may also be configured as program codes of various preferred or optional method steps provided by the multi-concurrent data processing method.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments, and details are not repeated in this embodiment.

Non-volatile storage media may include a data signal carrying readable program code in baseband or as part of a carrier wave traveling as a data signal. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A non-volatile storage medium may send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

The program code contained in the non-volatile storage medium can be transmitted by any appropriate medium, including but not limited to wireless, cable, optical cable, radio frequency, etc., or any suitable combination of the above.

According to an embodiment of the present application, a processor is provided. Fig. 6 is a schematic structural diagram of a processor according to an embodiment of the present application. As shown in FIG. 6, the processor 60 is configured to run a program, wherein the program executes the multi-concurrent data processing method described in the embodiment of the present application when running.

In this embodiment of the invention, the above-mentioned processor 60 may execute the execution programs of the multi-concurrent data processing method.

Optionally, in this embodiment, the processor 60 may be configured to perform the following steps:

Acquiring drive test data collected by the vehicle; decomposing the drive test data to generate multiple vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to specified control instructions.

Optionally, the processor 60 may also be configured as follows: the server cluster includes multiple servers, and a corresponding number of processing containers are obtained according to the number of computing resources of the server cluster, the computing resources including at least processor resources and storage resources, wherein at least one master server and at least one slave server are deployed in the server cluster, and the container cluster manager is installed on the master server to monitor the working status of the slave server.

Optionally, the processor 60 may also be configured to perform the following steps: creating a task processing program of the container cluster manager, wherein the task processing program is used to determine the number of processing containers that need to be transferred according to preset parameters; encapsulating the task processing program to build a task processing image; and, based on the task processing image, building the containers required for running the vehicle work tasks, wherein the containers include a management container for scheduling management and a processing container for running tasks.

Optionally, the processor 60 may also be configured to perform the following steps: receiving a plurality of vehicle work tasks to be processed in parallel, and scheduling at least one management container and the same number of processing containers as vehicle work tasks; distributing, by each management container, the corresponding vehicle work tasks to the designated processing containers; and starting the processing containers, each of which runs its assigned vehicle work task.

Optionally, the processor 60 may also be configured to perform the following steps: after each processing container executes its assigned vehicle work task, merging the sub-run results of the vehicle work tasks to generate a processing result; and storing the processing result in a predetermined database, wherein the database is a database that allows interactive query.

The above-mentioned processor 60 can execute various functional applications and data processing by running software programs and modules stored in the memory, that is, realize the above-mentioned multi-concurrent data processing method.

The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.

In the above-mentioned embodiments of the present application, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content can be realized in other ways. The device embodiments described above are only illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of units or modules may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of this application, or the part that contributes to the related technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a mobile hard disk, a magnetic disk or an optical disc.

The above description is only the preferred embodiment of the present application. It should be pointed out that those of ordinary skill in the art can make improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of this application.

Industrial Applicability

The solution provided by the embodiment of the present application can be applied in the multi-concurrent data processing process: obtaining the drive test data collected by the vehicle; decomposing the drive test data to generate multiple vehicle work tasks to be processed in parallel, wherein the vehicle work task is a task of controlling the vehicle to work according to a specified control instruction; submitting the multiple vehicle work tasks to be processed in parallel to the container cluster manager, wherein the container cluster manager is used for orchestrating and scheduling computing resources in the server cluster; and scheduling, through the container cluster manager, a plurality of processing containers to perform parallel processing on each vehicle work task to be processed in parallel to generate a processing result, wherein each of the processing containers has a set of the computing resources for processing the vehicle work tasks. The above solution decomposes the drive test data to generate multiple vehicle work tasks to be processed in parallel, thereby making full use of hardware resources to process tasks in parallel and solving the technical problem in the related art that hardware resources cannot be fully utilized for highly concurrent task processing.

Claims

A multi-concurrent data processing method, including:

Obtain the road test data collected by the vehicle;

Decomposing the drive test data to generate multiple vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to specified control instructions;

Submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used to arrange and schedule computing resources in the server cluster;

A plurality of processing containers are scheduled by the container cluster manager to perform parallel processing on each vehicle work task to be processed in parallel to generate a processing result, wherein each processing container has a set of computing resources, and the computing resources are used to process the vehicle work tasks.
The method according to claim 1, wherein the server cluster includes a plurality of servers, and the corresponding number of processing containers are obtained by dividing the computing resources of the server cluster, and the computing resources include at least: processor resources and storage resources, wherein at least one master server and at least one slave server are deployed in the server cluster, and the container cluster manager is installed on the master server to monitor the working status of the slave server.
The method according to claim 2, wherein the method further comprises:

Create a task processing program of the container cluster manager, wherein the task processing program is used to determine the number of processing containers that need to be transferred according to preset parameters;

Encapsulating the task processing program to construct a task processing image;

Based on the task processing image, the containers required for running the vehicle work tasks are constructed, wherein the containers include: a management container for scheduling management and a processing container for running tasks.
The method according to claim 3, wherein, through the container cluster manager, scheduling a plurality of processing containers to perform parallel processing on each vehicle work task to be processed in parallel, comprising:

receiving the plurality of vehicle work tasks to be processed in parallel, scheduling at least one management container and the same number of processing containers as the vehicle work tasks;

Each of the management containers distributes the corresponding vehicle work tasks to designated processing containers;

The processing containers are started, and each of the processing containers runs its assigned vehicle work task.
The method according to claim 4, wherein, after each of the processing containers respectively executes the assigned vehicle work tasks, the method further comprises:

Merging the sub-running results of each of the vehicle work tasks to generate the processing results;

The processing result is stored in a predetermined database, wherein the database is a database that allows interactive query.
A multi-concurrent data processing device, comprising:

The acquisition component is set to acquire the road test data collected by the vehicle;

A decomposition component is configured to decompose the drive test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to specified control instructions;

The submission component is configured to submit the plurality of vehicle work tasks to be processed in parallel to the container cluster manager, wherein the container cluster manager is used to arrange and schedule computing resources in the server cluster;

The processing component is configured to schedule a plurality of processing containers through the container cluster manager to perform parallel processing on each vehicle work task to be processed in parallel to generate a processing result, wherein each processing container has a set of computing resources, the computing resources are used to process the vehicle work tasks.
The device according to claim 6, wherein the server cluster includes a plurality of servers, and the corresponding number of processing containers are obtained by dividing the computing resources of the server cluster, and the computing resources include at least: processor resources and storage resources, wherein at least one master server and at least one slave server are deployed in the server cluster, and the container cluster manager is installed on the master server to monitor the working status of the slave server.
The device according to claim 7, wherein the device further comprises:

The creation component is configured to create a task processing program of the container cluster manager, wherein the task processing program is configured to determine the number of processing containers that need to be transferred according to preset parameters;

An encapsulation component is configured to encapsulate the task processing program to construct a task processing image;

The building component is configured to build the container required for running the vehicle work task based on the task processing image, wherein the container includes: a management container for scheduling management and a processing container for running the task.
The apparatus of claim 8, wherein the processing component comprises:

The receiving subassembly is configured to receive the plurality of vehicle work tasks to be processed in parallel;

A scheduling subcomponent is configured to schedule at least one management container and the same number of processing containers as the number of vehicle work tasks;

The distribution subcomponent is configured to cause each of the management containers to distribute the corresponding vehicle work task to a designated processing container;

The running subcomponent is configured to start the processing containers, and each of the processing containers respectively runs the assigned vehicle work tasks.
The device according to claim 9, wherein the device further comprises:

A merging component is configured to merge the sub-running results of each of the vehicle work tasks to generate the processing results;

The storage component is configured to store the processing result in a predetermined database, wherein the database is a database that allows interactive query.
A non-volatile storage medium, the non-volatile storage medium includes a stored program, wherein, when the program is running, the device where the non-volatile storage medium is located is controlled to execute the multi-concurrent data processing method according to any one of claims 1 to 5.
A processor, the processor is configured to run a program, wherein the multi-concurrent data processing method according to any one of claims 1 to 5 is executed when the program is running.