CN114327834A - Multi-concurrent data processing method and device - Google Patents


Info

Publication number
CN114327834A
CN114327834A (application CN202111676678.8A)
Authority
CN
China
Prior art keywords
processing
container
task
tasks
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111676678.8A
Other languages
Chinese (zh)
Inventor
王振东
张伟德
朱军
刘坤鹏
郑朝友
段锐
孙建蕾
任思阳
葛绍亮
刘加银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202111676678.8A priority Critical patent/CN114327834A/en
Publication of CN114327834A publication Critical patent/CN114327834A/en
Priority to PCT/CN2022/104711 priority patent/WO2023124000A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; program switching, e.g. by interrupt
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Abstract

The invention discloses a multi-concurrency data processing method and device. The method comprises: acquiring road test data collected by a vehicle; decomposing the road test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for orchestrating and scheduling computing resources in a server cluster; and scheduling, by the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks. The invention solves the technical problem in the prior art that hardware resources cannot be fully utilized to process highly concurrent tasks.

Description

Multi-concurrent data processing method and device
Technical Field
The invention relates to the field of vehicles, in particular to a multi-concurrency data processing method and device.
Background
Currently, continuous optimization of an autonomous driving system is achieved by using large volumes of high-quality data to train its different modules in a timely manner.
In the related art, hardware resources cannot be expanded indefinitely, while the data collected by autonomous vehicles grows rapidly every day: a single vehicle can collect about 700 MB of data per second, and the daily volume can reach the 10 TB level. Meanwhile, the speed of algorithm iteration depends on the speed at which high-quality data is produced. If road test data cannot be processed in time, the optimization speed of the autonomous driving system is greatly affected; the existing processing logic therefore still cannot fully utilize hardware resources to process highly concurrent tasks.
For the problem in the prior art that hardware resources cannot be fully utilized to process highly concurrent tasks, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a multi-concurrency data processing method and device, which at least solve the technical problem in the prior art that hardware resources cannot be fully utilized to process highly concurrent tasks.
According to an aspect of the embodiments of the present invention, there is provided a multi-concurrent data processing method, comprising: acquiring road test data collected by a vehicle; decomposing the road test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for orchestrating and scheduling computing resources in a server cluster; and scheduling, by the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks.
Optionally, the method further comprises: the server cluster comprises a plurality of servers, and a corresponding number of processing containers is obtained by dividing according to the quantity of computing resources of the server cluster, wherein the computing resources at least comprise processor resources and storage resources, the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used for monitoring the working state of the slave servers.
Optionally, a task processing program of the container cluster manager is created, wherein the task processing program is used for determining the number of processing containers to be called according to preset parameters; the task processing program is encapsulated and a task processing image is built; and, based on the task processing image, the containers required for running the vehicle work tasks are built, the containers comprising: a management container for performing scheduling management and a processing container for running tasks.
Optionally, scheduling, by the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel comprises: receiving the plurality of vehicle work tasks to be processed in parallel, and scheduling at least one management container and a number of processing containers equal to the number of vehicle work tasks; each management container distributing its corresponding vehicle work task to a designated processing container; and starting the processing containers, each processing container running its assigned vehicle work task.
Optionally, after each processing container runs its assigned vehicle work task, the method further comprises: merging the sub-operation results of each vehicle work task to generate a processing result; and storing the processing result in a predetermined database, wherein the database allows interactive queries.
According to another aspect of the embodiments of the present invention, there is also provided a multi-concurrent data processing apparatus, comprising: an acquisition module for acquiring road test data collected by a vehicle; a decomposition module for decomposing the road test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; a submission module for submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for orchestrating and scheduling computing resources in a server cluster; and a processing module for scheduling, through the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks.
Optionally, the server cluster comprises a plurality of servers, and a corresponding number of processing containers is obtained by dividing according to the quantity of computing resources of the server cluster, wherein the computing resources at least comprise processor resources and storage resources, the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used for monitoring the working state of the slave servers.
Optionally, the apparatus further comprises: the system comprises a creating module, a calling module and a calling module, wherein the creating module is used for creating a task processing program of a container cluster manager, and the task processing program is used for determining the number of processing containers to be called according to preset parameters; the encapsulation module is used for encapsulating the task processing program and constructing a task processing mirror image; the construction module is used for constructing a container required by the running of a vehicle work task based on the task processing mirror image, wherein the container comprises: a management container for performing scheduling management and a processing container for running tasks.
Optionally, the processing module comprises: a receiving submodule for receiving the plurality of vehicle work tasks to be processed in parallel; a scheduling submodule for scheduling at least one management container and a number of processing containers equal to the number of vehicle work tasks; a distribution submodule through which each management container distributes its corresponding vehicle work task to a designated processing container; and an operation submodule for starting the processing containers, each processing container running its assigned vehicle work task.
Optionally, the apparatus further comprises: the merging module is used for merging the sub-operation results of each vehicle work task to generate a processing result; and the storage module is used for storing the processing result to a preset database, wherein the database is a database allowing interactive query.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium. The computer-readable storage medium comprises a stored program; when the program runs, the device in which the computer-readable storage medium resides is controlled to execute the multi-concurrent data processing method of the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided a processor. The processor is used for running a program, wherein the program executes the multi-concurrent data processing method of the embodiment of the invention when running.
In the embodiments of the invention, road test data collected by a vehicle is acquired; the road test data is decomposed to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; the plurality of vehicle work tasks is submitted to a container cluster manager, which orchestrates and schedules computing resources in a server cluster; and the container cluster manager schedules a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks. That is to say, the method and device decompose the road test data into a plurality of vehicle work tasks to be processed in parallel, and schedule a plurality of processing containers through the container cluster manager to process each task in parallel, thereby solving the technical problem that hardware resources cannot be fully utilized to process highly concurrent tasks in the prior art, and achieving the technical effect of fully utilizing hardware resources to process highly concurrent tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a method of multi-concurrent data processing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of task scheduling according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a task decomposition according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a multi-concurrent data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided a multi-concurrent data processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flow chart of a method of multi-concurrent data processing according to an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
and step S102, acquiring drive test data acquired by the vehicle.
In the technical solution provided in step S102 of the present invention, the road test data may be data collected while the vehicle drives autonomously, such as the speed, acceleration, and time of the autonomous driving process, or the performance of each module during autonomous driving.
In this embodiment, the driving trajectory, driving speed, acceleration, performance parameters of each module during driving, and the like of the test vehicle during autonomous driving can be collected through the target vehicle's cameras, radar, sensors, controllers, and the like.
Step S104: decompose the road test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction.
In the technical solution provided in step S104 of the present invention, a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction, such as whether to accelerate, decelerate, or brake suddenly on a certain road section. Decomposing the road test data may consist of labelling the road test data according to preset rules; the rules may concern whether the driving road section contains traffic lights, whether there are pedestrians and how many, and whether emergency braking occurred during driving.
For example, the method acquires the road test data collected by the vehicle, decomposes the road test data according to the preset rules through a data processing program, marks the road test data with specific labels, and schedules the data processing program to process the labelled road test data to generate a plurality of vehicle work tasks to be processed in parallel. The data processing program may be a cluster computing engine (Spark).
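The decomposition step described above can be sketched as follows. This is a minimal pure-Python illustration of rule-based labelling and splitting; the patent's system runs this on a Spark cluster computing engine, and the record fields and rule names here (`traffic_light`, `pedestrians`, `hard_brake`) are illustrative assumptions, not the patent's actual schema.

```python
# Sketch: decompose road test data into labelled, independently
# processable vehicle work tasks. Fields and rules are hypothetical.

def label_record(record):
    """Tag one road-test record according to preset rules."""
    labels = []
    if record.get("traffic_light"):
        labels.append("traffic_light")
    if record.get("pedestrians", 0) > 0:
        labels.append("pedestrians")
    if record.get("hard_brake"):
        labels.append("hard_brake")
    return labels

def decompose(road_test_data, chunk_size=2):
    """Label every record, then split the stream into work tasks."""
    labelled = [dict(r, labels=label_record(r)) for r in road_test_data]
    return [labelled[i:i + chunk_size]
            for i in range(0, len(labelled), chunk_size)]

data = [
    {"t": 0, "speed": 12.0, "traffic_light": True},
    {"t": 1, "speed": 8.5, "pedestrians": 2},
    {"t": 2, "speed": 0.0, "hard_brake": True},
    {"t": 3, "speed": 5.0},
]
tasks = decompose(data)  # two tasks of two records each
```

Each resulting task is self-contained, so it can be handed to a separate processing container without coordination between containers.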
Step S106: submit the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for orchestrating and scheduling computing resources in the server cluster.
In the technical solution provided by step S106 of the present invention, the container cluster manager may be a large-scale container cluster management tool (Kubernetes, abbreviated K8s), also called a container orchestration engine or container orchestrator. It is an open-source engine for task scheduling and management that supports automated deployment, large-scale scalability, and containerized application management, and can therefore better orchestrate and schedule the computing resources in the server cluster.
In this embodiment, a plurality of vehicle job tasks to be processed in parallel are submitted to the container cluster manager, and the container cluster manager schedules and manages the submitted tasks.
Step S108: schedule, by the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks.
In the technical solution provided by step S108 of the present invention, the processing container may be a driving unit, denoted driver, used for distributing tasks; the computing resource may be an execution unit, denoted executor, used for processing the vehicle work tasks.
Optionally, a server is selected as a node of the container cluster manager and the related environment is deployed. A plurality of vehicle work tasks to be processed in parallel is submitted to the container cluster manager, which schedules a plurality of driving units. The number of driving units may be system-defined or specified by the user as a parameter when submitting tasks; it may be two, three, four, and so on, and is not specifically limited here. Each driving unit can start a plurality of execution units, thereby completing the parallel processing of the vehicle work tasks and generating a processing result, which can then be queried in the database.
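The driver/executor pattern of step S108 can be sketched as follows. Here a thread pool stands in for the processing containers that Kubernetes would schedule; in the real system each executor would be a container with its own CPU and memory allocation, and the task structure is an assumption for illustration.

```python
# Sketch: one driver distributes vehicle work tasks to a pool of
# executors that process them in parallel. A ThreadPoolExecutor
# stands in for Kubernetes-scheduled processing containers.
from concurrent.futures import ThreadPoolExecutor

def process_task(task):
    """Executor work: process one vehicle work task (illustrative)."""
    return {"task_id": task["id"], "events": len(task["records"])}

def driver(tasks, num_executors=3):
    """Driver: fan tasks out to executors and collect sub-results."""
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        # pool.map preserves task order in the collected results
        return list(pool.map(process_task, tasks))

tasks = [{"id": i, "records": list(range(i + 1))} for i in range(5)]
results = driver(tasks)
```

The key property, matching the patent's claim, is that throughput scales with the number of executors rather than with the number of driver processes or servers.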
In steps S102 to S108 of the present application, road test data collected by the vehicle is acquired; the road test data is decomposed to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; the plurality of vehicle work tasks is submitted to a container cluster manager, which orchestrates and schedules computing resources in a server cluster; and the container cluster manager schedules a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks. That is to say, the method decomposes the road test data into a plurality of parallel vehicle work tasks and schedules a plurality of processing containers through the container cluster manager to process each task in parallel, thereby solving the technical problem that hardware resources cannot be fully utilized to process highly concurrent tasks in the prior art, and achieving the technical effect of fully utilizing hardware resources to process highly concurrent tasks.
The above-described method of this embodiment is further described below.
As an optional embodiment, the server cluster comprises a plurality of servers, and a corresponding number of processing containers is obtained by dividing according to the quantity of computing resources of the server cluster, wherein the computing resources at least comprise processor resources and storage resources, the server cluster is deployed with at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used for monitoring the working state of the slave servers.
In this embodiment, the server cluster comprises a plurality of servers, which may be a plurality of central processing units in the target vehicle, and a corresponding number of processing containers is obtained by dividing according to the quantity of computing resources of the server cluster.
Optionally, the data processing program is submitted to the container cluster manager, which orchestrates and schedules the set of servers in the server cluster for resource management. The server set is deployed with at least one master server and at least one slave server; the master server performs scheduling management over the slave servers, and the container cluster manager is installed on the master server to monitor the working state of the slave servers.
For example, the data processing program is submitted to the container cluster manager, which orchestrates and schedules the server sets in the server cluster to manage resources and generates a processing container and a computing container. Optionally, the processing container may be a driving unit, denoted driver, used for distributing tasks; the computing container may be an execution unit, denoted executor, used for processing vehicle work tasks.
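Dividing the cluster's computing resources into a corresponding number of processing containers can be sketched as below. The per-container quotas (4 cores, 8 GB) are assumed values for illustration; the patent only states that the container count is derived from the quantity of processor and storage resources.

```python
# Sketch: derive the number of processing containers from the
# cluster's total CPU and memory. Per-container quotas are assumed.
def container_count(servers, cores_per_container=4, mem_gb_per_container=8):
    total_cores = sum(s["cores"] for s in servers)
    total_mem = sum(s["mem_gb"] for s in servers)
    # The scarcer resource bounds how many containers fit.
    return min(total_cores // cores_per_container,
               total_mem // mem_gb_per_container)

cluster = [{"cores": 16, "mem_gb": 64}, {"cores": 24, "mem_gb": 96}]
n = container_count(cluster)  # bounded by the 40 total cores
```

Sizing by the cluster total rather than per-server counts is what makes throughput depend only on the total number of CPUs, as the application later argues.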
As an optional implementation, the method further comprises: creating a task processing program of the container cluster manager, wherein the task processing program is used for determining the number of processing containers to be called according to preset parameters; encapsulating the task processing program and building a task processing image; and building, based on the task processing image, the containers required for running the vehicle work tasks, the containers comprising: a management container for performing scheduling management and a processing container for running tasks.
In this embodiment, a task handler of the container cluster manager is created, and the task handler is configured to determine the number of processing containers to call according to preset parameters. For example, a task processing program is written, and the number of processing containers to be called is determined according to the preset parameters.
Optionally, in this embodiment, the data is screened according to predetermined rules and marked with specific labels, the task processing program is encapsulated, and a corresponding image build specification (Dockerfile) is written to instruct the system to build an image through the specified steps. The task processing image is thus built, and based on it the containers required for running the vehicle work tasks are constructed: a management container for scheduling management and a processing container for running tasks.
As an optional embodiment, scheduling, by the container cluster manager, a plurality of processing containers to process each vehicle work task in parallel comprises: receiving the plurality of vehicle work tasks to be processed in parallel, and scheduling at least one management container and a number of processing containers equal to the number of vehicle work tasks; each management container distributing its corresponding vehicle work task to a designated processing container; and starting the processing containers, each processing container running its assigned vehicle work task.
In this embodiment, a plurality of vehicle work tasks to be processed in parallel is received; at least one management container and a number of processing containers equal to the number of vehicle work tasks are scheduled; each management container distributes its corresponding vehicle work task to a designated processing container; and the processing containers are started and run their assigned vehicle work tasks.
As an optional embodiment, after each processing container runs its assigned vehicle work task, the method further comprises: merging the sub-operation results of each vehicle work task to generate a processing result; and storing the processing result in a predetermined database, wherein the database allows interactive queries.
In this embodiment, the operation results of the split work tasks are merged to obtain a complete processing result, which is stored in a predetermined database from which the data to be queried can be selected. The predetermined database may be, for example, a MongoDB database, which is not specifically limited here.
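The merge-and-store step can be sketched as follows. The shape of a sub-result and the merged fields are assumptions for illustration, and the MongoDB write is shown only as a comment (the database and collection names there are likewise hypothetical).

```python
# Sketch: merge per-task sub-results into one complete processing
# result. Sub-result fields are illustrative assumptions.
def merge_results(sub_results):
    merged = {"tasks": len(sub_results), "events": 0, "labels": set()}
    for r in sub_results:
        merged["events"] += r["events"]
        merged["labels"].update(r.get("labels", []))
    merged["labels"] = sorted(merged["labels"])  # deterministic order
    return merged

subs = [
    {"events": 3, "labels": ["hard_brake"]},
    {"events": 5, "labels": ["traffic_light", "pedestrians"]},
]
result = merge_results(subs)
# With pymongo one might then persist it for interactive queries, e.g.:
#   MongoClient()["road_test"]["results"].insert_one(result)
```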
In the embodiments of the invention, road test data collected by a vehicle is acquired; the road test data is decomposed to generate a plurality of vehicle work tasks to be processed in parallel, wherein a vehicle work task is a task for controlling the vehicle to operate according to a specified control instruction; the plurality of vehicle work tasks is submitted to a container cluster manager, which orchestrates and schedules computing resources in a server cluster; and the container cluster manager schedules a plurality of processing containers to process each vehicle work task in parallel and generate a processing result, wherein each processing container is provided with a group of computing resources used for processing the vehicle work tasks. That is to say, the method and device decompose the road test data into a plurality of vehicle work tasks to be processed in parallel, and schedule a plurality of processing containers through the container cluster manager to process each task in parallel, thereby solving the technical problem that hardware resources cannot be fully utilized to process highly concurrent tasks in the prior art, and achieving the technical effect of fully utilizing hardware resources to process highly concurrent tasks.
Example 2
The technical solutions of the embodiments of the present invention will be illustrated below with reference to preferred embodiments.
The core of an autonomous vehicle is an autonomous driving system combining artificial intelligence, computer vision, radar, and a global positioning system; the quality of this system determines whether autonomous driving can be applied to real life on a large scale. All behaviors of the autonomous driving system are in fact data-driven, and the system can only be continuously optimized by using large volumes of high-quality data to train its different modules in time. Facing the massive data collected during autonomous driving tests, fully utilizing existing hardware to process the data quickly and in a timely manner, and thereby screening out valuable data, is therefore of great importance to the rapid iteration and development of the autonomous driving system.
How to use limited hardware resources to process the rapidly accumulating test data in time is the main problem faced by an autonomous driving data platform. On the one hand, hardware resources cannot be expanded indefinitely; on the other hand, the data collected by autonomous vehicles grows extremely fast: a single vehicle can collect about 700 MB of data per second, and the daily volume can reach the 10 TB level. Meanwhile, the speed of algorithm iteration depends on the speed at which high-quality data is produced, so if road test data cannot be processed in time, the optimization speed of the autonomous driving system is greatly affected. The existing processing logic cannot fully utilize hardware resources: for example, a single machine may have ten or twenty central processing units but does not run an equal number of tasks in parallel, wasting hardware resources.
To overcome the above problems, one related art proposes a highly concurrent data processing system and method that exploit the data processing characteristics of each server itself, thereby relieving the heavy load placed on a server by a large number of data requests. Another related art proposes a method and system for solving the efficiency problem of highly concurrent sending and receiving of mass data, which distributes the processing load of each application server according to a load-balancing strategy so that mass data is sent and received with high concurrency and the processed message data is stored, addressing the technical problem of low data sending and receiving efficiency.
However, the scheduling unit of both methods is the server: a large number of servers is required for highly concurrent execution, and the technical problem that hardware resources cannot be fully utilized remains.
The present application provides a multi-concurrency automatic data processing system based on a large-scale container cluster management tool and a cluster computing engine. The cluster computing engine uses an advanced Directed Acyclic Graph (DAG) scheduler, a query optimizer, and a physical execution engine, and can be up to 100 times faster than previous data processing tools; the large-scale container cluster management tool, Kubernetes, is an open-source container orchestration engine that supports automatic deployment, large-scale scaling, and containerized application management.
In the method and device of this application, the cluster computing engine decomposes a single task: it can create distributed datasets that operate in parallel and aggregate the results, while the large-scale container cluster management tool schedules multiple tasks to execute in parallel. Existing hardware resources can thus be fully utilized to process mass data with high concurrency; the data processing speed is limited only by the total number of central processing units in the servers rather than by the number of servers, which solves the technical problem in the prior art that hardware resources cannot be fully utilized for highly concurrent task processing. The multi-concurrent data processing method of this embodiment may include the following parts.
Part one: building the K8S cluster.
First, one server is selected as the control node of the open-source container orchestration engine and the related environment is deployed on it; the control (Master) node mainly serves as the management control center of the cluster.
Second, the other servers are used as workload (Node) nodes of the open-source container orchestration engine and the related environment is deployed on them; the workload on these nodes is distributed by the control node, and they are mainly used to maintain running containers and provide the container runtime environment.
Third, the running condition of the container orchestration engine cluster is tested to ensure that functions such as inter-node communication work normally.
Part two: writing the cluster computing engine processing program and packaging it into a container.
First, the cluster computing engine processing program is developed mainly in the Python language and comprises two links.
Link one: the task submitted by the user is partitioned to create a distributed dataset that operates in parallel; the degree of parallelism of a single task, i.e., the number of Tasks, can be set in the program.
Link two: the data processing program screens the mass data according to set rules and then attaches specific labels; the main rules include whether there are traffic lights, whether there are pedestrians and how many, whether there was emergency braking, and so on.
Second, the cluster computing engine processing program is packaged into an image, which facilitates scheduling and running by the container orchestration engine cluster.
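As a concrete illustration of link two, the screening rules can be sketched as a plain-Python function. The record field names (`has_traffic_light`, `pedestrian_count`, `emergency_brake`) are hypothetical stand-ins for whatever schema the collected data actually uses; in the real handler this logic would run inside the Spark job over each partition.

```python
def tag_record(record):
    """Apply the screening rules to one collected-data record and
    return the list of labels it matches (field names hypothetical)."""
    labels = []
    if record.get("has_traffic_light"):
        labels.append("traffic_light")
    pedestrians = record.get("pedestrian_count", 0)
    if pedestrians > 0:
        # Record both the presence and the number of pedestrians
        labels.append("pedestrians:%d" % pedestrians)
    if record.get("emergency_brake"):
        labels.append("emergency_brake")
    return labels

# A record with two pedestrians and an emergency brake event
print(tag_record({"pedestrian_count": 2, "emergency_brake": True}))
```

Records whose label list is non-empty would be the "valuable data" kept by the screening step.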
Part three: submitting tasks for large-scale, highly concurrent data processing.
Step one: set the relevant parameters of the submitted task, mainly including the following. The number of resource allocation units to run (spark.executor.instances): here each additional resource allocation unit is set to require one central processing unit, and the single-task parallelism is the number of resource allocation units multiplied by the number of cores per unit. The base image used at runtime (image): the image packaged in the second step of part two can be used. The data mount: since the original data is stored on the storage server and is not visible to the container started at runtime, the data must be made visible to the container by mounting, specifically by specifying the local directory to mount at runtime (for example, via the corresponding spark configuration option).
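The parameters above can be sketched as a configuration dictionary. The image name, the mount path, and the `mount.local.dir` key are illustrative placeholders (the text does not give the exact spark option name for the mount), while `spark.executor.instances` and `spark.executor.cores` follow standard Spark configuration naming.

```python
# Illustrative submission parameters (values and some keys are examples):
submit_conf = {
    # number of resource allocation unit (executor) containers; each
    # executor is configured here to require one CPU core
    "spark.executor.instances": 5,
    "spark.executor.cores": 1,
    # base image built from the packaged processing program (name hypothetical)
    "spark.kubernetes.container.image": "spark-handler:latest",
    # directory on the storage server mounted so the data is visible
    # inside containers started at runtime (key and path hypothetical)
    "mount.local.dir": "/data/drive-test",
}

# Single-task parallelism = executor count x cores per executor
parallelism = (submit_conf["spark.executor.instances"]
               * submit_conf["spark.executor.cores"])
print(parallelism)  # 5
```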
Step two: submit the task for execution.
In this embodiment, the following points should be noted. (1) Single-task concurrency: the number of computing nodes (executors) and the number of cores per node determine how many Tasks can be executed in parallel at the same time. For example, if the resource configuration is five computing nodes with two central processing units allocated to each, then ten Tasks can run in parallel, i.e., the single-task concurrency is 5 × 2 = 10. (2) Multi-task concurrency is the number of tasks multiplied by the single-task concurrency: if the single-task concurrency is 10 and five tasks are submitted, the concurrency of the whole cluster is 5 × 10 = 50.
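The two concurrency formulas can be checked with a few lines of arithmetic:

```python
def single_task_concurrency(executors, cores_per_executor):
    # Tasks that one submitted job can run in parallel at the same time
    return executors * cores_per_executor

def cluster_concurrency(num_tasks, single_concurrency):
    # Total parallelism when several jobs run at once
    return num_tasks * single_concurrency

single = single_task_concurrency(5, 2)  # 5 executors x 2 CPUs = 10
total = cluster_concurrency(5, single)  # 5 submitted tasks x 10 = 50
print(single, total)
```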
Each part of the technical solution of the embodiment of the present invention is further described in detail below.
First, as shown in fig. 2, fig. 2 is a schematic diagram of task scheduling according to an embodiment of the present invention.
01: the driver unit created after the cluster computing engine task is submitted to the container orchestration engine; it is the management unit, mainly used to schedule and manage the distributed tasks and the execution units it creates.
02: the execution unit created after the cluster computing engine task is submitted to the container orchestration engine; it serves as the concrete task execution unit and is in fact a set of computing resources, i.e., a set of central processing unit cores and memory (cpu core, memory).
03: the container orchestration engine cluster, mainly used to schedule and manage all server resources and submitted tasks.
04: the cluster computing engine processing program image, mainly used as the base image to build the containers required for task execution.
05: the MongoDB database, which stores the results of the data screening process.
The specific flow is as follows: after a user submits a task, the 01 and 02 required to execute it are created on the basis of 04; 01 is mainly responsible for scheduling and managing 02, 02 is mainly responsible for executing concrete tasks, and the number of 02 is determined by the parameters specified when the user submits the task; 03 schedules and manages 01, and the final results are stored in 05.
Second, as shown in FIG. 3, FIG. 3 is a schematic diagram of task decomposition according to an embodiment of the present invention.
01: the execution modules formed by decomposing the original task data, i.e., the task execution units obtained by splitting the vehicle-collected data according to the specified parameters.
02: the cluster computing engine tool, responsible for decomposing the task of processing the vehicle-collected data, scheduling the data processing program to process it, and merging the results of the split task executions into one complete result.
03: the vehicle-collected data, i.e., raw, unprocessed drive test data.
04: the MongoDB database, which stores the results of the data screening process.
The specific flow is as follows: 03 is submitted to 02 as the task to be processed; 02 decomposes 03 into parallel tasks, obtaining a certain number of 01; during execution each 01 interacts with 04 and stores its data processing result in 04.
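The decompose, process-in-parallel, merge flow of this figure can be sketched with Python's standard library. The chunking and the doubling "processing" step here are simplified stand-ins for the cluster computing engine's partitioning and for the actual data-screening program, and results are merged into an in-memory list rather than written to MongoDB.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(data, num_parts):
    """Split the raw collected data into roughly equal parallel tasks."""
    size = (len(data) + num_parts - 1) // num_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    """Stand-in for the data-screening program run by one execution unit."""
    return [x * 2 for x in chunk]

def run(data, num_parts=5):
    chunks = decompose(data, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        results = pool.map(process_chunk, chunks)  # parallel execution
    # Merge the split execution results into one complete result
    return [y for part in results for y in part]

print(run(list(range(10))))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```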
The technical solutions of the embodiments of the present invention will be further described in detail with reference to specific embodiments.
First, one server is selected as the control node of the container cluster management tool, and version v1.20.1 of the tool is installed on it. During installation the image repository can be specified, i.e., image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers, specifying a Chinese mirror address so as to avoid unstable pulls from the original source; a prerequisite is that Docker is installed.
Second, the other servers are used as load nodes of the container cluster management tool and the related environment is deployed on them; the load nodes are then added to the cluster. After installation, the state of the load nodes is checked on the control node, and related tests such as networking and deployment are performed.
Third, the processing program is written in the cluster computing engine, which may be as follows: a related processing program is written with the number of partitions set to five; that is, after the task is submitted, the cluster computing engine automatically divides the to-be-processed data of the submitted task into five roughly equal parts. The data screening rules mainly include whether there are traffic lights on the road section where the data was collected, whether there are pedestrians and how many, whether there was emergency braking during driving, and so on.
Fourth, the cluster computing engine processing program is packaged into an image: the related image build rules are written and the image is built according to the specified steps, where entrypoint (the command executed after the container starts) should be set to the default operation the container performs on startup, which raises the degree of automation and meets the program's runtime requirements.
Fifth, the task is submitted for large-scale, highly concurrent data processing, and the relevant parameters of the submitted task are set first, as follows: spark.executor.instances, i.e., how many load containers are created after the task is submitted, here specified as five; image, i.e., the name of the image in which the spark handler is packaged; and the local directory to mount, among others. The task is submitted after these are set.
Sixth, the tasks are submitted: two folders each holding 100G of drive test data are selected and one task is submitted per folder. After submission, the container cluster management tool creates two management containers (spark drivers), and each management container starts 5 processing containers (spark executors); the 200G of data is divided into 10 tasks of 20G each to be executed in parallel, so the cluster parallelism is 10. After execution finishes, the related results can be queried in the database.
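The numbers in this scenario can be checked with simple arithmetic (all values taken from the text above):

```python
# Worked numbers for the submission scenario
folders = 2                # two folders, one task submitted per folder
data_per_folder_gb = 100   # 100G of drive test data in each folder
executors_per_task = 5     # five processing containers per management container

total_data_gb = folders * data_per_folder_gb       # 200G in total
parallel_tasks = folders * executors_per_task      # 10 tasks run in parallel
data_per_task_gb = total_data_gb / parallel_tasks  # 20G handled by each task

print(total_data_gb, parallel_tasks, data_per_task_gb)
```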
As can be seen from the above, this embodiment has the following advantages. When the task execution unit is the server, concurrency is limited by the number of servers, and a single server may have 40 central processing units, so resources are greatly wasted; by combining the container orchestration engine with the cluster computing engine, the hardware resources of the servers are fully utilized and task execution concurrency is greatly improved, since the unit of task execution changes from the number of servers to the number of central processing units, achieving maximum efficiency with limited resources and saving both cost and time. Using the container orchestration engine for task scheduling and management lets developers concentrate on program development rather than spending large amounts of time on deployment and scaling of containerized applications, and helps them manage the cluster simply and efficiently. Using the cluster computing engine for task decomposition means developers need not attend to the decomposition and scheduling logic, which saves time when developing the data processing program.
Example 3
According to an embodiment of the present invention, a multi-concurrent data processing apparatus is also provided. It should be noted that this multi-concurrent data processing apparatus can be used to execute the multi-concurrent data processing method in embodiment 1.
FIG. 4 is a schematic diagram of a multi-concurrent data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the multi-concurrent data processing apparatus 400 may include: an acquisition module 401, a decomposition module 402, a submission module 403, and a processing module 404.
The obtaining module 401 is configured to obtain drive test data collected by a vehicle.
The decomposition module 402 is configured to decompose the drive test data to generate a plurality of vehicle work tasks to be processed in parallel, where a vehicle work task is a task for controlling a vehicle to work according to a specified control instruction.
A submitting module 403, configured to submit a plurality of vehicle work tasks to be processed in parallel to a container cluster manager, where the container cluster manager is configured to orchestrate and schedule computing resources in a server cluster.
And a processing module 404, configured to schedule, by the container cluster manager, a plurality of processing containers to perform parallel processing on each vehicle job task to be processed in parallel, respectively, and generate a processing result, where each processing container has a set of the computing resources, and the computing resources are used to process the vehicle job task.
Optionally, the server cluster includes a plurality of servers, and a corresponding number of processing containers is obtained by division according to the quantity of computing resources of the server cluster, where the computing resources at least include processor resources and storage resources; the servers are deployed as at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used to monitor the working state of the slave servers.
Optionally, the apparatus further comprises: a creating module, configured to create a task processing program of the container cluster manager, where the task processing program is used to determine the number of processing containers to be called according to preset parameters; an encapsulation module, configured to encapsulate the task processing program and construct a task processing image; and a construction module, configured to construct, based on the task processing image, the containers required for running a vehicle work task, where the containers include: a management container for performing scheduling management and a processing container for running tasks.
Optionally, the processing module comprises: the receiving submodule is used for receiving a plurality of vehicle working tasks to be processed in parallel; the scheduling submodule is used for scheduling at least one management container and processing containers with the same number as the number of the vehicle work tasks; the distribution submodule is used for distributing the corresponding vehicle work task to the designated processing container by each management container; and the operation submodule is used for starting the processing containers, and each processing container respectively operates the distributed vehicle work tasks.
Optionally, the apparatus further comprises: the merging module is used for merging the sub-operation results of each vehicle work task to generate a processing result; and the storage module is used for storing the processing result to a preset database, wherein the database is a database allowing interactive query.
In the multi-concurrent data processing apparatus according to this embodiment, the drive test data is decomposed to generate a plurality of vehicle work tasks to be processed in parallel, and the plurality of processing containers are scheduled by the container cluster manager to perform parallel processing on each vehicle work task to be processed in parallel, so that a technical problem that high concurrent task processing cannot be performed by fully utilizing hardware resources in the prior art is solved, and a technical effect of performing high concurrent task processing by fully utilizing hardware resources is achieved.
Example 4
According to an embodiment of the present invention, there is also provided a computer-readable storage medium including a stored program, wherein the program executes the method of multi-concurrent data processing described in embodiment 1.
Example 5
According to an embodiment of the present invention, there is also provided a processor, configured to execute a program, where the program executes the method for processing multiple concurrent data described in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for processing multiple concurrent data, comprising:
acquiring drive test data acquired by a vehicle;
decomposing the drive test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling vehicles to work according to specified control instructions;
submitting the plurality of vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for arranging and scheduling computing resources in a server cluster;
and scheduling a plurality of processing containers by the container cluster manager to respectively perform parallel processing on each vehicle work task to be processed in parallel, and generating a processing result, wherein each processing container is provided with a group of computing resources, and the computing resources are used for processing the vehicle work tasks.
2. The method of claim 1, wherein the server cluster comprises a plurality of servers, and the processing containers with corresponding quantities are obtained by dividing according to the quantity of computing resources of the server cluster, and the computing resources at least comprise: a processor resource and a storage resource, wherein the servers are deployed as at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used for monitoring the working state of the slave servers.
3. The method of claim 2, further comprising:
creating a task processing program of the container cluster manager, wherein the task processing program is used for determining the number of the processing containers to be called according to preset parameters;
packaging the task processing program and constructing a task processing mirror image;
constructing a container required by the vehicle work task operation based on the task processing mirror image, wherein the container comprises: a management container for performing scheduling management and a processing container for running tasks.
4. The method of claim 3, wherein scheduling, by the container cluster manager, a plurality of processing containers to respectively process each vehicle job task to be processed in parallel comprises:
receiving the plurality of vehicle work tasks to be processed in parallel, and scheduling at least one management container and the processing containers with the same number as the vehicle work tasks;
each management container distributes the corresponding vehicle work task to a designated processing container;
and starting the processing containers, wherein each processing container respectively runs the distributed vehicle work tasks.
5. The method of claim 4, wherein after each of the processing containers is respectively operating the assigned vehicle job tasks, the method further comprises:
combining the sub-operation results of each vehicle work task to generate the processing result;
and storing the processing result to a preset database, wherein the database is a database allowing interactive query.
6. A multi-concurrent data processing apparatus, comprising:
the acquisition module is used for acquiring the drive test data acquired by the vehicle;
the decomposition module is used for decomposing the drive test data to generate a plurality of vehicle work tasks to be processed in parallel, wherein the vehicle work tasks are tasks for controlling the vehicle to work according to a specified control instruction;
the submitting module is used for submitting the vehicle work tasks to be processed in parallel to a container cluster manager, wherein the container cluster manager is used for arranging and scheduling computing resources in a server cluster;
and the processing module is used for scheduling a plurality of processing containers through the container cluster manager to respectively perform parallel processing on each vehicle work task to be processed in parallel to generate a processing result, wherein each processing container is provided with a group of computing resources, and the computing resources are used for processing the vehicle work tasks.
7. The apparatus of claim 6, wherein the server cluster comprises a plurality of servers, and the processing containers with corresponding quantities are obtained by dividing according to the quantity of computing resources of the server cluster, and the computing resources at least comprise: a processor resource and a storage resource, wherein the servers are deployed as at least one master server and at least one slave server, and the container cluster manager is installed on the master server and used for monitoring the working state of the slave servers.
8. The apparatus of claim 7, further comprising:
the creating module is used for creating a task processing program of the container cluster manager, wherein the task processing program is used for determining the number of the processing containers to be called according to preset parameters;
the encapsulation module is used for encapsulating the task processing program and constructing a task processing mirror image;
a building module, configured to build a container required by the vehicle work task when running based on the task processing image, where the container includes: a management container for performing scheduling management and a processing container for running tasks.
9. The apparatus of claim 8, wherein the processing module comprises:
the receiving submodule is used for receiving the plurality of vehicle working tasks to be processed in parallel;
the scheduling submodule is used for scheduling at least one management container and the processing containers with the same number as the vehicle work tasks;
the distribution submodule is used for distributing the corresponding vehicle work task to the designated processing container by each management container;
and the operation submodule is used for starting the processing containers, and each processing container respectively operates the distributed vehicle work task.
10. The apparatus of claim 9, further comprising:
the merging module is used for merging the sub-operation results of each vehicle work task to generate the processing result;
and the storage module is used for storing the processing result to a preset database, wherein the database is a database allowing interactive query.
CN202111676678.8A 2021-12-31 2021-12-31 Multi-concurrent data processing method and device Pending CN114327834A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111676678.8A CN114327834A (en) 2021-12-31 2021-12-31 Multi-concurrent data processing method and device
PCT/CN2022/104711 WO2023124000A1 (en) 2021-12-31 2022-07-08 Multi-concurrency data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111676678.8A CN114327834A (en) 2021-12-31 2021-12-31 Multi-concurrent data processing method and device

Publications (1)

Publication Number Publication Date
CN114327834A true CN114327834A (en) 2022-04-12

Family

ID=81023792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676678.8A Pending CN114327834A (en) 2021-12-31 2021-12-31 Multi-concurrent data processing method and device

Country Status (2)

Country Link
CN (1) CN114327834A (en)
WO (1) WO2023124000A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124000A1 (en) * 2021-12-31 2023-07-06 中国第一汽车股份有限公司 Multi-concurrency data processing method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9860569B1 (en) * 2015-10-05 2018-01-02 Amazon Technologies, Inc. Video file processing
US11249874B2 (en) * 2019-03-20 2022-02-15 Salesforce.Com, Inc. Content-sensitive container scheduling on clusters
CN110888722B (en) * 2019-11-15 2022-05-20 北京奇艺世纪科技有限公司 Task processing method and device, electronic equipment and computer readable storage medium
CN111897622B (en) * 2020-06-10 2022-09-30 中国科学院计算机网络信息中心 High-throughput computing method and system based on container technology
CN112650556A (en) * 2020-12-25 2021-04-13 芜湖雄狮汽车科技有限公司 Multitask concurrent testing method and device for vehicle
CN114327834A (en) * 2021-12-31 2022-04-12 中国第一汽车股份有限公司 Multi-concurrent data processing method and device


Also Published As

Publication number Publication date
WO2023124000A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
CN106789339B (en) Distributed cloud simulation method and system based on lightweight virtualization framework
Monot et al. Multisource software on multicore automotive ECUs—Combining runnable sequencing with task scheduling
CN103593242B (en) Resource sharing control system based on Yarn frameworks
CN110704186A (en) Computing resource allocation method and device based on hybrid distribution architecture and storage medium
CN108255619B (en) A kind of data processing method and device
CN109800937A (en) Robot cluster dispatches system
CN105745585A (en) Offloading human-machine-interaction tasks
CN112099917B (en) Regulation and control system containerized application operation management method, system, equipment and medium
CN114327834A (en) Multi-concurrent data processing method and device
CN113157379A (en) Cluster node resource scheduling method and device
CN115220787A (en) Driving control instruction generation method, heterogeneous calculation method, related device and system
CN114721806A (en) Task scheduling and executing method and system based on digital twin
CN113535321A (en) Virtualized container management method, system and storage medium
CN102932825B (en) The method of network O&M and device
CN112148481B (en) Method, system, equipment and medium for executing simulation test task
CN114943885A (en) Synchronous cache acceleration method and system based on training task
CN113225269B (en) Container-based workflow scheduling method, device and system and storage medium
CN113254143B (en) Virtualized network function network element arrangement scheduling method, device and system
CN113282396A (en) Image processing method, system, device, computer equipment and storage medium
CN109583071B (en) Parallel optimization method and system based on cloud simulation
CN113377503A (en) Task scheduling method, device and system for collaborative AI (artificial intelligence)
CN101169742B (en) Resource reservation for massively parallel processing systems
Sorkhpour et al. MeSViz: Visualizing Scenario-based Meta-Schedules for Adaptive Time-Triggered Systems
Lumpp et al. Enabling Kubernetes Orchestration of Mixed-Criticality Software for Autonomous Mobile Robots
Straesser et al. Kubernetes-in-the-Loop: Enriching Microservice Simulation Through Authentic Container Orchestration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination