CN106354563B - Distributed computing system for 3D reconstruction and 3D reconstruction method - Google Patents


Info

Publication number
CN106354563B
CN106354563B (application CN201610756715.9A)
Authority
CN
China
Prior art keywords
computing system
distributed computing
machines
master
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610756715.9A
Other languages
Chinese (zh)
Other versions
CN106354563A (en)
Inventor
戴作卓
方天
权龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou HKUST Fok Ying Tung Research Institute
Original Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou HKUST Fok Ying Tung Research Institute filed Critical Guangzhou HKUST Fok Ying Tung Research Institute
Priority to CN201610756715.9A priority Critical patent/CN106354563B/en
Publication of CN106354563A publication Critical patent/CN106354563A/en
Application granted granted Critical
Publication of CN106354563B publication Critical patent/CN106354563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition

Abstract

The invention discloses a distributed computing system for 3D reconstruction and a 3D reconstruction method. The system includes a distributed file system and a cluster of machines, one of which is elected as the master while the remaining machines serve as workers. The distributed file system is shared by the worker machines and is compatible with the Portable Operating System Interface (POSIX) standard. The master dispatches the received jobs of a 3D reconstruction project to the worker machines, which run them to complete the 3D reconstruction. With this distributed computing system, existing stand-alone 3D reconstruction programs can be easily migrated onto a cluster without modifying their source code.

Description

Distributed computing system for 3D reconstruction and 3D reconstruction method
Technical Field
The embodiment of the invention relates to a distributed computing system for 3D reconstruction and a 3D reconstruction method.
Background
With the advent of unmanned aerial vehicle (UAV) technology, building data collection has become much easier: several terabytes (TB) of data may be collected in a single day of flight. Large-scale three-dimensional (3D) reconstructions, such as urban 3D reconstructions, require significant computing resources, running for weeks to months on typical consumer-level machines. Moreover, current 3D reconstruction programs are huge projects written in C++ that can only run on a single machine and are not suited to currently popular distributed architectures such as Hadoop (a distributed system infrastructure developed by the Apache Foundation) and Spark (a general-purpose parallel framework similar to Hadoop MapReduce, open-sourced by UC Berkeley's AMPLab). Porting to these architectures is too costly, because they require developers to rewrite the application using their languages and APIs (Application Programming Interfaces).
Thus, UAV technology makes building data collection easier, but poses challenges to the scalability and effectiveness of current 3D reconstruction programs.
Disclosure of Invention
In view of the above challenges faced by the prior art, the inventors of the present invention provide a distributed computing system for 3D reconstruction and a 3D reconstruction method.
According to one embodiment of the invention, the distributed computing system for 3D reconstruction includes a distributed file system and a cluster of machines, wherein one of the machines is elected as a master and the remaining machines are worker machines. The distributed file system is shared by the worker machines and is compatible with the Portable Operating System Interface (POSIX) standard; the master dispatches the received jobs of a 3D reconstruction project to the worker machines, which run them to complete the 3D reconstruction. Optionally, the distributed file system is GlusterFS, a clustered file system supporting petabyte (PB)-scale data volumes. GlusterFS integrates storage space distributed across different servers into one large network-parallel file system via RDMA (Remote Direct Memory Access) and TCP/IP (Transmission Control Protocol/Internet Protocol).
According to another embodiment of the present invention, the distributed computing system further includes a distributed key-value storage system, which stores key values and the state of the master. Preferably, the distributed key-value storage system is etcd, a highly available key-value store mainly used for shared configuration and service discovery. etcd was developed and is maintained by CoreOS, was inspired by ZooKeeper and Doozer, is written in Go, and handles log replication through the Raft consensus algorithm to ensure strong consistency. Raft is a consensus algorithm from Stanford suited to log replication in distributed systems; it achieves consistency by means of election, and under Raft any node can become the leader (master).
Thus, electing one of the machines as the master may include: each machine attempts to place its own ID on a master key of the distributed key-value storage system, but only one machine can succeed; all machines then read the master ID back from the distributed key-value storage system; the machine whose own ID matches the retrieved master ID becomes the master, and the other machines are the workers.
In the embodiment of the present invention, when the master fails, a new master is elected and the state of the original master is recovered from the distributed key-value storage system, thereby improving the fault tolerance of the system.
In an embodiment of the invention, the worker machines operate in pairs and replicate each other's data. In this way, if one work machine crashes, the job can continue running using the data of its paired machine, further improving the fault tolerance of the system.
In various embodiments of the present invention, the attributes of a job include a job type, a number of tasks, and constraints. Jobs have logical and IO (input/output) dependencies between them. The master dispatches jobs to the worker machines according to these logical and IO dependencies; for example, jobs with IO dependencies are scheduled to run on the same work machine.
In various embodiments of the invention, the master communicates with the outside world over HTTP (Hypertext Transfer Protocol). In this way, a user may monitor cluster status and job status in the distributed computing system through a Web interface (Web UI). In one embodiment, the master receives jobs sent by the user in JSON format over HTTP. JSON (JavaScript Object Notation) is a lightweight data-interchange format based on a subset of ECMAScript.
In various embodiments of the invention, the master and the worker machines communicate via Remote Procedure Calls (RPC). Time limits may be set for the RPC encoders and decoders of the master and workers, respectively.
In various embodiments of the present invention, a work machine uses a set of containers to run the tasks of a received job. For example, the containers may be Docker containers; tasks run inside the Docker containers, isolated from the host operating system. If a task running within a container exhausts its allotted resources, the task is stopped without affecting the host operating system. In a further embodiment, an application may be exported as a Docker image, so that it can be run on other operating systems via that image. Thus, to deploy an application, the user no longer needs to manually install all of its dependency libraries.
In various embodiments of the present invention, the master schedules jobs based on their priorities and constraint factors, where the constraint factors may include job constraints and task constraints. The master handles tasks that lag behind other tasks using the LATE (Longest Approximate Time to End) algorithm: when idle resources exist, the master estimates the progress of running tasks and duplicates the ones estimated to finish last onto another work machine, thereby shortening job completion time.
In a further embodiment, the present invention also proposes a 3D reconstruction method, which may include: decomposing a 3D reconstruction project into a plurality of jobs and sending the decomposed jobs to a distributed computing system according to the various embodiments herein, which runs the jobs to complete the 3D reconstruction.
With the distributed computing system, existing stand-alone 3D reconstruction programs can be easily migrated onto a cluster without modifying their source code. A 3D reconstruction project can be divided into many small jobs; the distributed computing system manages the dependencies among these jobs, schedules them on appropriate machines, speculatively re-executes lagging (straggler) jobs, and restarts failed jobs. In addition, the distributed computing system is a cluster resource management system that connects multiple machines together and abstracts CPU (central processing unit), GPU (graphics processing unit), memory, and other computing resources away from these (physical or virtual) machines, thereby building a fault-tolerant, flexible distributed system that operates efficiently. The distributed computing system also provides a Web UI for monitoring cluster status and job status, so that users can add, delete, and restart jobs directly through Web pages. Industrial application shows that the distributed computing system successfully accelerates 3D reconstruction projects and adapts effectively to the use of current UAV technology in building data collection.
Drawings
FIG. 1 is a block diagram of a distributed computing system for 3D reconstruction, according to an embodiment of the present invention;
FIG. 2 is a block diagram of a distributed computing system for 3D reconstruction in accordance with another embodiment of the present invention;
FIG. 3 illustrates a process flow for determining a master from a plurality of machines in accordance with an embodiment of the present invention;
FIG. 4 illustrates a process flow for 3D reconstruction using a distributed computing system according to an embodiment of the present invention;
FIG. 5 illustrates a block diagram of a distributed computing system for 3D reconstruction, in accordance with another embodiment of the present invention;
FIG. 6 illustrates a master election process according to an embodiment of the present invention;
FIG. 7 illustrates a Remote Procedure Call (RPC) procedure between a master and a worker in a distributed computing system according to an embodiment of the present invention;
FIG. 8 illustrates a container for running tasks in a distributed computing system according to embodiments of the invention;
FIG. 9 illustrates a change in job status in a distributed computing system according to an embodiment of the present invention;
FIG. 10 illustrates a case where no engineering scheduling is performed in the distributed computing system according to an embodiment of the present invention;
FIG. 11 illustrates an exemplary engineering schedule in a distributed computing system according to embodiments of the present invention.
Detailed Description
To facilitate an understanding of the various aspects, features and advantages of the present inventive subject matter, reference is made to the following detailed description taken in conjunction with the accompanying drawings. It should be understood that the various embodiments described below are illustrative only and are not intended to limit the scope of the invention.
The present invention is described below with reference to example block diagrams of methods, systems, devices, apparatus, and programming, as well as computer program products. It will be understood that each block of the example block diagrams, and combinations of blocks therein, can be implemented by programming instructions, including computer program instructions. These computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed, producing a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the block diagrams or flowchart blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions that implement the specified functions. The programming instructions may also be stored in and/or implemented by electronic circuitry to perform the various functions and steps of the present invention.
Distributed computing system for 3D reconstruction
[ example 1 ]
FIG. 1 illustrates a distributed computing system for 3D reconstruction according to an embodiment of the present invention. In this embodiment, the distributed computing system may include a distributed file system 100 and a cluster of machines, where one machine is elected as the master 200 and the remaining machines are worker machines 300. The distributed file system 100 is shared by the worker machines 300 and is compatible with the Portable Operating System Interface (POSIX) standard, so that existing stand-alone 3D reconstruction programs, such as C++ programs, can read and write files using their own standard file APIs, just as if using a local disk.
The master 200 receives a 3D reconstruction project submitted by a user, where the project is composed of a plurality of jobs and each job is composed of a plurality of tasks. The master 200 dispatches the received jobs of the 3D reconstruction project to the worker machines 300, which run them to complete the 3D reconstruction.
Optionally, the distributed file system is GlusterFS, a clustered file system supporting PB-scale data volumes. GlusterFS integrates storage space distributed across different servers into one large network-parallel file system via RDMA and TCP/IP.
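As a deployment sketch (server and volume names are illustrative, not taken from the patent), each work machine mounts the shared GlusterFS volume with the standard GlusterFS FUSE client; because the mount point is POSIX-compatible, the stand-alone C++ program reads and writes it like a local disk:

```
# Illustrative GlusterFS client mount on each work machine.
# "gluster-server1" and "recon-volume" are hypothetical names.
mount -t glusterfs gluster-server1:/recon-volume /mnt/gluster

# The C++ reconstruction binary then uses ordinary file paths, e.g.
#   /mnt/gluster/project1/images/...
# with no source-code changes.
```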
By adopting the distributed computing system of this embodiment, existing stand-alone 3D reconstruction programs can easily be ported into the cluster without modifying their source code, effectively improving the scalability and effectiveness of 3D reconstruction programs.
[ example 2 ]
Fig. 2 illustrates a distributed computing system for 3D reconstruction according to another embodiment of the invention. In this embodiment, the distributed computing system further includes a distributed key-value storage system 400 in addition to the distributed file system 100, the master 200, and the worker machines 300. The master 200 and the worker machines 300 belong to a cluster of machines that compete to become the master; the machines that lose the election become workers. In an alternative embodiment, as shown in FIG. 3: at S300, each machine attempts to place its own ID (identity) on a master key of the distributed key-value storage system, but only one machine can succeed; at S320, the machines read the master ID back from the distributed key-value storage system; at S330, each machine checks whether its own ID equals the master ID; if so, at S351 that machine serves as the master 200; otherwise, at S352, the remaining machines are worker machines 300. The distributed key-value storage system 400 can thus be used to store key values and the state of the master 200.
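The election of FIG. 3 can be sketched as follows. The patent does not give code, so this is a minimal Python illustration in which an in-memory store with an atomic put-if-absent stands in for the distributed key-value system (in etcd this would be a transaction that writes the key only if it does not yet exist); names are assumptions:

```python
import threading

class StubKVStore:
    """In-memory stand-in for the distributed key-value store:
    put_if_absent is atomic, like a compare-and-swap that writes a
    key only when no value exists yet."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        # Returns True only for the first caller; later callers fail (S300).
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

def elect_master(store, machine_ids, master_key="master"):
    """Every machine tries to write its ID to the master key; exactly
    one succeeds. All machines then read the key back (S320) and compare
    it with their own ID (S330) to decide their role (S351/S352)."""
    for mid in machine_ids:   # in a real cluster each machine races concurrently
        store.put_if_absent(master_key, mid)
    winner = store.get(master_key)
    roles = {mid: ("master" if mid == winner else "worker") for mid in machine_ids}
    return winner, roles

store = StubKVStore()
master, roles = elect_master(store, ["m1", "m2", "m3"])
# Exactly one machine becomes master; the rest are workers.
```

Because the loop here is sequential, the first machine deterministically wins; with real concurrent machines, whichever write lands first wins, which is exactly why the store's write must be atomic.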
In an alternative embodiment of the present invention, the distributed key-value storage system 400 may be etcd, a highly available key-value store mainly used for shared configuration and service discovery. etcd was developed and is maintained by CoreOS, was inspired by ZooKeeper and Doozer, is written in Go, and handles log replication through the Raft consensus algorithm to ensure strong consistency. Raft is a consensus algorithm from Stanford suited to log replication in distributed systems; it achieves consistency by means of election, and under Raft any node can become the leader (master).
In various embodiments of the present invention, when the master fails, a new master is elected and the state of the original master is recovered from the distributed key-value storage system 400, thereby improving the fault tolerance of the system.
In a further embodiment of the invention, the worker machines can also operate in pairs and replicate each other's data. In this way, if one work machine crashes, the job can continue running using the data of its paired machine, further improving the fault tolerance of the system.
[ example 3 ]
In an alternative embodiment of the present invention, the distributed computing system for 3D reconstruction may have the functions, structures, and features described in Embodiment 1 or Embodiment 2 above. Further, the attributes of a job may include a job type, a number of tasks, and constraints. Jobs have logical dependencies and IO dependencies between them. The master 200 schedules jobs onto the worker machines 300 according to these logical and IO dependencies; for example, jobs with IO dependencies are scheduled to run on the same work machine, improving data locality.
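The patent does not specify the placement algorithm, but the co-location rule above can be sketched as a greedy placement: a job with an IO dependency on an already-placed job goes to the same machine so its input files stay local, while independent jobs spread across the cluster. Job names and the single-parent dependency map are illustrative assumptions:

```python
def assign_machines(jobs, io_deps, machines):
    """Greedy placement sketch: a job whose IO-dependency parent is
    already placed is co-located with it (data locality); other jobs
    are spread round-robin across the worker machines."""
    placement = {}
    rr = 0
    for job in jobs:  # jobs assumed topologically ordered by logical deps
        parent = io_deps.get(job)
        if parent in placement:
            placement[job] = placement[parent]   # same machine as its input
        else:
            placement[job] = machines[rr % len(machines)]
            rr += 1
    return placement

# Hypothetical reconstruction pipeline: "sfm" consumes "match" output,
# "mesh" consumes "densify" output.
jobs = ["match", "sfm", "densify", "mesh"]
io_deps = {"sfm": "match", "mesh": "densify"}
placement = assign_machines(jobs, io_deps, ["w1", "w2"])
```

Co-locating each consumer with its producer means the intermediate files never cross the network, which is the "data locality" benefit the embodiment describes.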
[ example 4 ]
In an alternative embodiment of the present invention, the distributed computing system for 3D reconstruction may have the functions, structures, and features described in Embodiment 1, 2, or 3 above. Furthermore, the master 200 communicates with the outside world over HTTP, so a user can monitor cluster status and job status through the Web UI. Optionally, the master 200 receives jobs sent by the user in JSON format over HTTP. JSON is a lightweight data-interchange format based on a subset of ECMAScript.
According to embodiments of the present invention, and in contrast to other distributed computing architectures, the distributed computing system for 3D reconstruction can accept programs written in any programming language, as long as the resulting JSON conforms to the RESTful API specification. REST is a software architectural style rather than a standard; it provides only a set of design principles and constraints and is mainly used for client-server interaction. Software designed in this style can be simpler, more layered, and better suited to mechanisms such as caching.
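A JSON job submission might look as follows. The patent names the job attributes (job type, number of tasks, constraints) but not a concrete schema, so every field name here, the endpoint URL, and the port are illustrative assumptions:

```python
import json

# Hypothetical job description matching the attributes the embodiments
# name: job type, number of tasks, and constraints. Not the patent's
# actual wire format.
job = {
    "name": "dense-matching",
    "type": "batch",                                 # job type
    "num_tasks": 16,                                 # number of tasks
    "constraints": {"gpu": True, "min_cores": 8},    # limitations
    "priority": 5,
}
payload = json.dumps(job)

# In a real cluster this would be POSTed to the master's RESTful API,
# e.g. requests.post("http://master:8080/jobs", data=payload).
decoded = json.loads(payload)   # what the master would parse back out
```

Because the contract is just "HTTP + JSON", the client can be written in any language, which is the point the embodiment makes.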
[ example 5 ]
In an alternative embodiment of the present invention, the distributed computing system for 3D reconstruction may have the functions, structures, features described in embodiments 1, 2, 3 or 4 above. Further, in an alternative embodiment of the present invention, the master controller 200 and the plurality of working machines 300 may communicate via RPC.
When there is too much traffic between the master 200 and the worker machines 300, the network becomes unstable. In that case the connection must be closed, otherwise packets will be lost and the RPC will hang. To address this, the embodiment of the present invention does not set a time limit (deadline) on the RPC itself, because job completion time is unpredictable; instead, deadlines are set on the encoder and decoder, because message encoding and decoding times reflect the state of the network. Therefore, an alternative embodiment of the invention sets time limits on the RPC encoders and decoders of the master and the workers, respectively.
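The codec-deadline idea can be sketched as follows. ZRPC's actual codec is not disclosed, so this minimal Python version stands in JSON for the wire format and a thread-pool timeout for the deadline; all names are assumptions:

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def encode_with_deadline(message, deadline_s=1.0):
    """Run the RPC message encoder under a deadline. The job itself gets
    no deadline (its runtime is unpredictable), but a slow encode/decode
    suggests an unhealthy network or peer, so the call is abandoned
    instead of hanging the RPC forever."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(json.dumps, message)  # stand-in encoder
        try:
            return future.result(timeout=deadline_s)
        except TimeoutError:
            future.cancel()
            raise RuntimeError("encoder deadline exceeded; close the connection")

wire = encode_with_deadline({"method": "RunTask", "task_id": 7})
```

A symmetric `decode_with_deadline` on the receiving side would wrap the decoder the same way, which matches the embodiment's choice of bounding the codec rather than the RPC.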
[ example 6 ]
In an alternative embodiment of the present invention, the distributed computing system for 3D reconstruction may have the functions, structures, and features described in any of Embodiments 1 to 5 above. Further, in an alternative embodiment, the work machine 300 uses a set of containers to run the tasks of a received job. For example, the containers may be Docker containers; tasks run inside the Docker containers, isolated from the host operating system. If a task running within a container exhausts its allotted resources, the task is stopped without affecting the host operating system.
Furthermore, in a further embodiment of the present invention, an application may be exported as a Docker image, so that it can be run on other operating systems (e.g., Windows, Linux, macOS) via that image. Thus, to deploy an application, the user no longer needs to manually install all of its dependency libraries.
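A work machine's container launch can be sketched as below. The `--cpus`, `--memory`, and `--rm` flags are standard Docker CLI options; the image and command names are illustrative, not ZRPC's actual ones:

```python
def docker_run_command(image, task_cmd, cpus=1.0, memory_mb=2048):
    """Build a `docker run` invocation that caps a task's CPU share and
    memory, so a task that exceeds its resources is stopped by the
    container runtime without touching the host OS. Image/command names
    are hypothetical."""
    return [
        "docker", "run", "--rm",          # remove container when the task ends
        "--cpus", str(cpus),              # CPU limit
        "--memory", f"{memory_mb}m",      # memory limit
        image,
    ] + task_cmd

cmd = docker_run_command("zrpc/reconstruct:latest",
                         ["./dense_match", "--block", "3"])
# On a machine with Docker installed, subprocess.run(cmd, check=True)
# would launch the task inside the container.
```

Shipping the application as an image is what removes the need to install dependency libraries by hand: the image already contains them.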
[ example 7 ]
In an alternative embodiment of the present invention, the distributed computing system for 3D reconstruction may have the functions, structures, and features described in any of Embodiments 1 to 6 above. Further, in an alternative embodiment, the master 200 schedules jobs based on their priorities and constraint factors, where the constraint factors may include job constraints and task constraints. The master handles tasks that lag behind other tasks using the LATE (Longest Approximate Time to End) algorithm: when idle resources exist, the master estimates the progress of running tasks and duplicates the ones estimated to finish last onto another work machine, thereby shortening job completion time.
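The core of LATE-style speculation can be sketched as follows: estimate each running task's time to completion from its progress rate and duplicate the one expected to finish last. The progress-fraction fields are illustrative; the patent does not define the estimator:

```python
def pick_speculative_task(tasks, now):
    """LATE-style sketch: for each running task, estimate time left as
    (remaining work) / (observed progress rate), then return the task
    expected to finish last -- the straggler to duplicate onto an idle
    work machine."""
    def est_time_left(t):
        elapsed = now - t["start"]
        rate = t["progress"] / elapsed if elapsed > 0 else float("inf")
        return (1.0 - t["progress"]) / rate if rate > 0 else float("inf")
    return max(tasks, key=est_time_left)

tasks = [
    {"id": "t1", "start": 0.0, "progress": 0.9},   # nearly done
    {"id": "t2", "start": 0.0, "progress": 0.2},   # straggler
]
straggler = pick_speculative_task(tasks, now=10.0)
```

Whichever copy of the duplicated task finishes first is kept, so a single slow machine no longer bounds the job's completion time.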
3D reconstruction method
Fig. 4 shows the flow of a 3D reconstruction method according to an embodiment of the present invention. The method may include: decomposing a 3D reconstruction project into a plurality of jobs (S400), sending the decomposed jobs to a distributed computing system according to the various embodiments herein (S420), and having that system run the jobs to complete the 3D reconstruction (S440). It should be understood that the 3D reconstruction program involved may be an existing stand-alone 3D reconstruction program or one reprogrammed as needed by those skilled in the art.
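Step S400 can be illustrated as below. The stage names follow a typical photogrammetry pipeline and are assumptions, not the patent's fixed job list; the point is only that a project decomposes into jobs with different task counts:

```python
def decompose_project(num_image_blocks):
    """Illustrative decomposition of a 3D reconstruction project into
    jobs (S400). Per-image-block stages parallelize into many tasks;
    global stages are a single task."""
    return [
        {"name": "feature-extraction",     "num_tasks": num_image_blocks},
        {"name": "feature-matching",       "num_tasks": num_image_blocks},
        {"name": "sparse-reconstruction",  "num_tasks": 1},
        {"name": "dense-reconstruction",   "num_tasks": num_image_blocks},
        {"name": "meshing-texturing",      "num_tasks": 1},
    ]

jobs = decompose_project(num_image_blocks=8)
# Each job would then be sent to the master (S420) and run by the
# worker machines (S440).
```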
According to the above embodiments, existing stand-alone 3D reconstruction programs can be easily ported into a cluster without modifying their source code. A 3D reconstruction project can be divided into many small jobs; the distributed computing system manages the dependencies among these jobs, schedules them on appropriate machines, speculatively re-executes lagging jobs, and restarts failed jobs. Furthermore, the distributed computing system of the embodiments connects machines together and abstracts CPU, GPU, memory, and other computing resources away from these (physical or virtual) machines, building a fault-tolerant, flexible distributed system that operates efficiently. The embodiments also provide a Web UI for monitoring cluster and job status, so that users can add, delete, and restart jobs directly through Web pages.
Applications of
Facing the challenges that current UAV applications pose to 3D reconstruction programs, the inventors developed a distributed computing architecture for large-scale 3D reconstruction, referred to as ZRPC, based on the teachings of the embodiments above. With ZRPC, an existing stand-alone 3D reconstruction program can be ported into the cluster without modifying its source code. A 3D reconstruction project can then be divided into a number of steps, each of which can be treated as a job consisting of tasks that require different numbers of machines. ZRPC manages the dependencies among these jobs, schedules them on appropriate machines, speculatively re-executes lagging jobs, and restarts failed jobs. Moreover, ZRPC is a cluster resource management system that connects machines together and abstracts CPU, GPU, memory, and other computing resources away from these physical or virtual machines, building a fault-tolerant, flexible distributed system that operates efficiently. ZRPC provides a Web UI for monitoring cluster and job status, so that a user can add, delete, and restart jobs directly through a Web page. Industrial application of ZRPC shows that it successfully accelerates 3D reconstruction projects.
The ZRPC proposed by the inventors of the present invention is described below in various aspects.
1. General description of the invention
1.1 purpose
There is a need for a distributed system that meets the following requirements:
1. The system should be able to treat a C++ package as a job without changing its source code; in other words, the distributed system is transparent to the user. This is a major advantage over existing popular architectures (e.g., Hadoop MapReduce and Spark), which require developers to rewrite applications using their languages and APIs.
2. Using a cluster of machines should significantly reduce the completion time of a 3D reconstruction project. To accomplish this, cluster resources must be shared and jobs scheduled appropriately. Total IO throughput, network bandwidth, and CPU and GPGPU (general-purpose computing on graphics processing units) computing power should increase linearly with the number of machines.
3. The distributed file system should be compatible with the Portable Operating System Interface (POSIX) standard, so that C++ programs can read and write files using their own standard file APIs, just as with a local disk. Without POSIX compatibility, the user would have to change the C++ program's source code, contradicting requirement 1.
4. The cluster should meet common distributed-system requirements: scalability and reliability. Scalability means the cluster can expand to handle more tasks; as the number and size of projects grow, users should be able to add machines dynamically without affecting jobs currently running in the cluster. Reliability means the system is fault tolerant. Failures are common in clusters and come in three types: job failures, machine failures, and network failures. All of them should be handled so as to minimize the impact on running projects.
1.2 Solution
In view of the above, the inventors of the present invention developed ZRPC, which runs jobs on a cluster of machines as if they were a single system, to accelerate 3D reconstruction. ZRPC is a distributed system for managing resources and scheduling jobs among the multiple hosts of a cluster, providing mechanisms for job deployment, scheduling, updating, maintenance, and extension. ZRPC has three advantages: (1) it provides a simple abstraction that hides the details of resource management and fault handling, so that users can focus on application development; (2) jobs run with high reliability, availability, and data safety; (3) tasks run in parallel, significantly reducing job run time.
Fig. 5 shows the architecture of ZRPC. It consists of 4 parts: a logically centralized controller called the master, a set of work machines that run tasks, a persistent store called etcd based on the Raft consensus algorithm, and a distributed file system (which stores all task outputs) shared by the work machines. A work machine is actually a ZRPC process that controls a set of containers in which tasks run. ZRPC communicates internally using Remote Procedure Calls (RPC) and externally using an HTTP Restful API. The user can run jobs and monitor the machines through a Web browser and a Python API. The embodiment of the invention uses Gluster as the underlying distributed file system, which is suitable for storing large numbers of small files and has low file-access latency. These parts are discussed in detail below.
2. Cluster management
2.1 Machine discovery
ZRPC relies on etcd, a distributed, consistent key-value store based on Raft, for sharing configuration and for service discovery. Raft is a consensus algorithm comparable to Paxos in fault tolerance and performance, except that it is decomposed into relatively independent sub-problems and cleanly addresses all the major parts required by practical systems. The machine registration information includes the number of logical CPU cores, the availability of GPGPUs, and distributed file system information. This information may be updated from time to time and serves as constraints on task execution.
2.2 Cluster leader election
The process of establishing a cluster consists of 3 steps. First, a small etcd cluster is established. The etcd cluster can keep data reliable and secure as long as at least half of its machines are available. Second, all machines are set up and registered on etcd. Each machine begins in the election state, in which it attempts to elect itself as the master by putting its own ID on the master key in etcd. However, only one machine can successfully write the master key. Third, all machines request the master ID from the etcd cluster. If the ID in the response equals the machine's own ID, the machine enters the master state; otherwise, it enters the work-machine state. The elected master must send heartbeats to etcd to maintain its master role. Once the master key's value expires, all machines re-enter the election state.
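The election logic above can be sketched as follows. This is a minimal in-memory simulation, not actual etcd client code: the `kvStore` type stands in for the etcd cluster, and `putIfAbsent` mimics etcd's atomic compare-and-swap on the master key.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a stand-in for the etcd cluster: a key-value map with an
// atomic put-if-absent operation, which is what the election relies on.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
}

// putIfAbsent stores value under key only if the key is currently unset,
// mimicking etcd's compare-and-swap; only one machine can succeed.
func (s *kvStore) putIfAbsent(key, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; ok {
		return false
	}
	s.data[key] = value
	return true
}

func (s *kvStore) get(key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

// elect runs the election for a set of machine IDs and returns the
// elected master ID and the remaining worker IDs.
func elect(store *kvStore, ids []string) (master string, workers []string) {
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id string) { // every machine tries to claim the master key
			defer wg.Done()
			store.putIfAbsent("master", id)
		}(id)
	}
	wg.Wait()
	master = store.get("master") // all machines then read the master ID
	for _, id := range ids {
		if id != master {
			workers = append(workers, id)
		}
	}
	return master, workers
}

func main() {
	store := &kvStore{data: map[string]string{}}
	master, workers := elect(store, []string{"cent01", "cent02", "cent03"})
	fmt.Println("master:", master, "workers:", len(workers))
}
```

Whichever goroutine wins the race becomes the master; the heartbeat/expiry mechanism described above is omitted for brevity.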
2.3 Fault tolerance
The master in ZRPC is not replicated, but its data is. The master copies state snapshots into etcd. When the master crashes, the cluster elects a new master and recovers the state of the original master from etcd.
As shown in fig. 6, work machines may join and leave during a job run. For reliability, the work machines in ZRPC are paired and replicate each other's data. If one machine of a pair goes down, the job can continue to run using the data of its paired peer. However, if both work machines in a pair go down at the same time, the job enters a fault state and requires an administrator to fix the problem manually. This second situation is serious, so its occurrence must be minimized. Assume a cluster with N nodes, i.e. N/2 pairs, where the probability of a node crashing per unit time is P and the administrator checks the cluster every T time units, so that the probability of one node crashing within T time units is PT. The probability P of the second scenario occurring anywhere in the cluster is therefore:
P = (N/2)·(PT)^2
For example, consider an existing cluster with 10 nodes, a typical average machine crash frequency of once every 3 months, and a system administrator who checks the cluster every 8 hours. The probability that both machines of some pair crash within one check interval is then p = 0.0000685871, meaning this is expected to happen once every 14580 check intervals.
2.4 Deployment
In practical applications, an easy-to-use installation tool is very important. In the present embodiment, the automated deployment tool is built on Ansible. Ansible is an IT automation tool: it can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployment or zero-downtime rolling updates. The tool can automatically install the programs of this embodiment, including ZRPC, Gluster, etcd, and the 3D reconstruction programs, onto a new cluster or a new machine.
[cluster]
cent01 Name=cent01 IP=192.168.226.140
cent02 Name=cent02 IP=192.168.226.141
cent03 Name=cent03 IP=192.168.226.142
cent04 Name=cent04 IP=192.168.226.143
cent05 Name=cent05 IP=192.168.226.144
[cluster:vars]
ClusterName=demo
EtcdEndpoints=["http://192.168.226.140:2379",
"http://192.168.226.141:2379","http://192.168.226.142:2379"]
The above is a 5-node cluster configuration example, where cent01, cent02, and cent03 constitute an etcd cluster for storing and synchronizing cluster information. The user need only specify the name and IP of each machine and "EtcdEndpoints" (the endpoints of the etcd cluster).
3. Job and task
3.1 User management
The user must have an account, issued by the system administrator, to use the ZRPC system. With this account, the user can submit jobs, upload input files, and download result files from the distributed file system. The monitoring web page displays the user's job status.
3.2 Job types
The attributes of a job include the job type, the number of tasks, and constraints. The constraints may be the number of CPUs, whether GPUs are relied upon, and machine performance. The job type may be Single (Single), Multiple (Multiple), Batch (Batch), and Service (Service). For these job types, ZRPC has a corresponding policy.
Single: a job of the "Single" type contains only one command and one input file. ZRPC assigns it to a random work machine.
Batch: a "Batch" type job contains a list of commands. Each command has its own input file and does not depend on other commands. ZRPC treats each command as a task and schedules them evenly.
Multiple: a job of the "Multiple" type has only one command but a list of input files. ZRPC intelligently breaks the list into multiple small lists, taking data locality and work-machine performance into account. Each small list then serves as the input of one task.
Service: service is a predefined job template. It is used to address the following security issues: the average user can run any jobs, including those that are harmful, on the ZRPC. The administrator registers Service on ZRPC, and the ordinary user can call it through the required parameters. Service has a program folder containing executable files and dependent files (dependent files). The user can update these files through the web interface of ZRPC.
There are two types of dependencies between jobs: logical dependencies and IO dependencies. If job A logically depends on job B, then job A should be scheduled after job B, since B passes some information to A. If job A IO-depends on job B, then job A should be scheduled after job B and their tasks should be placed on the same machines, because job A reads the output of job B. Placing their tasks on the same machines improves data locality.
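A minimal sketch of dependency-aware ordering is given below. The job names and the `ioDep` field are hypothetical, and ZRPC's actual scheduler is not disclosed in detail; this only illustrates running each job after the jobs it depends on.

```go
package main

import "fmt"

// job describes a reconstruction step with the jobs it depends on.
// ioDep marks an IO dependency, which would additionally pin the job
// to the machines that produced its input ("" if none).
type job struct {
	name  string
	deps  []string // logical dependencies: must run after these
	ioDep string   // IO dependency: co-locate with this job
}

// order returns a run order in which every job appears after its
// dependencies (a naive topological sort; assumes the graph is acyclic).
func order(jobs []job) []string {
	done := map[string]bool{}
	var out []string
	for len(out) < len(jobs) {
		for _, j := range jobs {
			if done[j.name] {
				continue
			}
			ready := true
			for _, d := range j.deps {
				if !done[d] {
					ready = false
					break
				}
			}
			if ready {
				done[j.name] = true
				out = append(out, j.name)
			}
		}
	}
	return out
}

func main() {
	// Hypothetical pipeline: match -> sfm -> densify -> meshing,
	// where densify reads sfm's output (IO dependency).
	jobs := []job{
		{name: "meshing", deps: []string{"densify"}, ioDep: "densify"},
		{name: "sfm", deps: []string{"match"}},
		{name: "densify", deps: []string{"sfm"}, ioDep: "sfm"},
		{name: "match"},
	}
	fmt.Println(order(jobs)) // match before sfm before densify before meshing
}
```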
3.3 Restful API
The user may send a job to the ZRPC server in JSON format over the HTTP protocol; the ZRPC server provides a Restful API. Representational State Transfer (REST) is a hybrid style derived from several network-based architectural styles; it implies that communication between the client and the server is stateless. By sending a job to the ZRPC server, the client obtains a job ID, which may then be used to query the job status or to stop (kill) the job.
For example, if a user wants to run a batch job on ZRPC, the following JSON string is sent to http://<server-IP>/cmd/batch:
(The JSON job specification is shown as an image in the original publication: Figure GDA0002197701730000141.)
"Name" is a description of the job. "Threads" represents the number of logical CPU cores per task in the job. "GPU" is a Boolean flag informing the system whether the job requires a GPU to run. "server IP" can be the IP address of any machine in the cluster, since ZRPC is multi-agent: if a non-master work machine receives a job request, it redirects the request to the master, so the user does not need to know which machine is the current master. "FileDepend" is used to specify IO-dependent jobs; if a job IO-depends on another job, the scheduler will try to place them on the same machines. When a job request is received, the server first parses it to check that it complies with the job specification, and then responds to the user with a job ID for subsequent use.
In contrast to other distributed computing architectures, ZRPC accepts programs written in any programming language, as long as the resulting JSON conforms to the Restful API specification.
3.4 Remote procedure calls
The master sends jobs to the work machines via Remote Procedure Calls (RPC). RPC extends the common programming abstraction of the procedure call to a distributed environment, allowing the master to call procedures in the work machines just like local calls. For example, when the master calls the "DoJob" method of a work machine, the parameters are actually sent to the work machine over the network. When the work machine completes the task, the result is returned to the master.
Fig. 7 shows the execution procedure of RPC in the system of the present embodiment. ZRPC transmits values in gob format (binary values exchanged between an encoder and a decoder) over the TCP protocol. TCP is faster and less redundant than the HTTP protocol used between the master and clients, and the gob format saves more space than JSON. Because the traffic between the master and the work machines is much greater than that between the master and the clients, performance deserves close attention here. Furthermore, when there is too much traffic between the master and the work machines, the network becomes unstable. In this case, the connection must be closed, otherwise packets will be lost and the RPC will hang. To solve this problem, the embodiment of the present invention does not set a time limit (deadline) on the RPC itself, because job completion time is unpredictable; instead, deadlines are set on the gob encoder and decoder, because message encoding and decoding time reflects the state of the network.
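The deadline-on-codec idea can be sketched with Go's standard `encoding/gob` package. Here `net.Pipe` stands in for the master-worker TCP connection, the `task` type is illustrative, and the 2-second deadline is an arbitrary example value.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
	"time"
)

// task is the kind of value the master would send to a work machine.
type task struct {
	JobID   int
	Command string
}

// roundTrip gob-encodes a task on one end of a connection and decodes
// it on the other, with deadlines on both ends so that a stalled
// network fails the encode/decode instead of hanging the RPC forever.
func roundTrip(t task) (task, error) {
	master, worker := net.Pipe()
	defer worker.Close()
	errc := make(chan error, 1)
	go func() {
		master.SetDeadline(time.Now().Add(2 * time.Second))
		errc <- gob.NewEncoder(master).Encode(t)
		master.Close()
	}()
	worker.SetDeadline(time.Now().Add(2 * time.Second))
	var got task
	if err := gob.NewDecoder(worker).Decode(&got); err != nil {
		return got, err
	}
	return got, <-errc
}

func main() {
	got, err := roundTrip(task{JobID: 42, Command: "./densify part0.cfg"})
	if err != nil {
		fmt.Println("rpc failed:", err)
		return
	}
	fmt.Printf("received job %d: %s\n", got.JobID, got.Command)
}
```

No deadline is placed on the overall call, mirroring the text's point that only encode/decode time is bounded.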
3.5 Container
Virtual machines are the traditional isolation method used by most cloud computing platforms. However, modern schedulers run tasks within resource containers based on Linux Cgroups. Cgroups is an abbreviation of control groups, a mechanism provided by the Linux kernel that can limit, account for, and isolate the physical resources (such as CPU, memory, and IO) used by process groups. Containers are lighter, occupy fewer system resources, and require less startup time than virtual machines. As shown in FIG. 8, at each ZRPC worker node, a ZRPCD process receives tasks from the master and starts a series of containers that run the tasks. Three advantages of using Docker (a popular container implementation; Docker is an open-source application container engine that allows developers to package their applications and dependencies into a portable container and then distribute them to any popular Linux machine, or to implement virtualization) are discussed below.
3.5.1 Resource isolation
Users may intentionally or unintentionally run jobs that use more computing resources than they request. For example, if a program has a memory leak, it will consume all memory resources and eventually starve all other jobs. Worse still, experiments have shown that poorly written programs can even crash the entire machine. Therefore, a resource isolation mechanism is needed to cap the resource usage of a job. Upon receiving a task message, ZRPCD launches a container with the declared resource limits, so that the task runs inside it, isolated from the host operating system. If a task running within a container exhausts its allotted resources, the task is stopped without affecting the host operating system. In addition, when a process runs in a container, every sub-process it creates also runs in the container, so the entire process group can easily be controlled.
3.5.2 Consistent runtime environment
Machines are heterogeneous in hardware, and in software too. Over time, new machines join the cluster, old machines leave, and the operating system kernel version and installed software may change. Worse, an application that runs normally on a developer's machine may fail in the cluster. Containers solve this problem because they ensure that the runtime environments of the cluster and of the developer machine are consistent. All Docker containers are built from one configuration file, and all containers are rebuilt promptly once the configuration file changes.
3.5.3 Facilitating migration
Facilitating migration means that applications can move freely to other operating systems. With Docker, an application can be exported as a Docker image of its container. The image can run on all mainstream platforms, including Linux, Windows, and Mac OS. Therefore, users are no longer required to install all dependency libraries manually in order to run the application.
Fig. 9 shows the job state transition process. As shown in fig. 9, a newly created job waits; it requests a work machine and runs on it; if the run finishes, the job succeeds; if the run errs, the job fails; the job can also be stopped; and if a connection error or system error occurs while it is running, the job returns to the waiting state as a newly created job.
3.6 Scheduling
ZRPC uses a simple and fast centralized (monolithic) scheduler. In the embodiment of the present invention, two factors are considered when scheduling jobs: priority and constraints. A received job is placed into the corresponding job queue based on its priority. The two kinds of constraints supported by ZRPC are per-job constraints and per-task constraints. A job constraint applies to all tasks, e.g., requiring all tasks to run with a GPU. A task constraint applies to a single task, e.g., a data-locality constraint requires the task to run on the machine holding its input data. ZRPC has a global job priority queue for scheduling jobs, and each job has its own task scheduler.
Through the job priority queue, ZRPC can perform fine-grained control over project jobs. As mentioned before, a reconstruction project comprises 4 steps (jobs) that run in succession. Figs. 10 and 11 illustrate how ZRPC schedules two projects to reduce the average project completion time. Assume that project A arrives at time 0 and project B arrives at time 2. FIG. 10 shows the case without priority scheduling: A has a time span of 0-7 and B has a time span of 2-14, so the average completion time is 9.5. In FIG. 11, because project A arrives before B, all jobs of A have a higher priority than the jobs of B; the scheduler runs A's jobs first and then B's jobs whenever there are free nodes. Therefore, the time span of A is 0-7, the time span of B is 2-10, and the average completion time is 7.5.
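The two averages can be verified with a few lines of code; completion time is measured from each project's arrival to its finish.

```go
package main

import "fmt"

// avgCompletion computes the average completion time of projects given
// their arrival and finish times.
func avgCompletion(arrive, finish []float64) float64 {
	sum := 0.0
	for i := range arrive {
		sum += finish[i] - arrive[i]
	}
	return sum / float64(len(arrive))
}

func main() {
	// Without priority scheduling (Fig. 10): A spans 0-7, B spans 2-14.
	fmt.Println(avgCompletion([]float64{0, 2}, []float64{7, 14})) // 9.5
	// With job-priority scheduling (Fig. 11): A spans 0-7, B spans 2-10.
	fmt.Println(avgCompletion([]float64{0, 2}, []float64{7, 10})) // 7.5
}
```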
A straggler is a task that runs slower than the other tasks and delays the completion of a job, because a job completes only when its last task completes. The ZRPC scheduler handles stragglers using the LATE algorithm. LATE is short for Longest Approximate Time to End: it speculatively executes the task estimated to finish farthest in the future. Whenever there is idle resource budget, the scheduler speculates on running tasks and clones the identified straggler onto another machine (ideally a fast node). The result of whichever copy completes first is taken as the final result. Consider ZRPC running 1 job composed of 10 tasks on a 10-node cluster, where the scheduler allocates 1 task to each node. Suppose a task's normal completion time is T_task. Then at time T_task, 9 tasks have completed and 1 task lags behind. Since there are now free resources, the scheduler clones the straggler to another work machine. The job completion time will therefore be 2·T_task, meaning that the job completion time is delayed by 100% in the worst case.
3.7 Fault handling
Several errors may be encountered while running a task. ZRPC processes them appropriately to ensure job fault tolerance.
1. Dialing Error. Such an error typically occurs when the master fails to connect to a remote work machine. At this point, the task information has not been sent to the work machine, so ZRPC only needs to mark the work machine as dead and reschedule the task.
2. Gluster Error. Before a task starts, the work machine checks the runtime environment the task chiefly depends on, e.g., the distributed file system Gluster. When it finds that the file system is working abnormally, it returns the error to the master. In this case, the task has not actually started, and ZRPC only needs to reschedule it.
3. Runtime Error. This means that the task ran and exited due to an error. ZRPC judges whether the task ran successfully by its return code: a nonzero return code indicates a runtime error. This is a problem of the task itself, so ZRPC retries twice more; if all attempts fail, the entire job fails.
4. Connection Lost Error. This is the most troublesome error. It means that the task was sent and ran successfully on the work machine, but the master lost its connection to that work machine, so the state of the task on the work machine is unknown. If ZRPC re-launches the task on another work machine, two identical processes may write the same file, corrupting the file system; however, if ZRPC does not rerun it, the entire job is blocked. Embodiments of the present invention handle such errors by setting a recovery period. During the recovery period, the master attempts to reconnect to the lost work machine and recover the job; alternatively, the user may decide whether to run the job on another machine or to repair the current machine. If both methods fail within the recovery period, the job is rescheduled onto another machine. If, after that, the machine and the connection recover, the master instructs that work machine to stop the task, to avoid duplication.
4. Conclusion
Due to the current popularity of unmanned aerial vehicles, data sets for 3D reconstruction are growing to a large scale. The inventors of the present invention recognized the need to introduce a distributed computing architecture to this field to address the scale problem. Although Hadoop is the most widely used big-data processing ecosystem, it is not suitable in this case because it places too many restrictions on the source code of the application. Thus, the inventors of the present invention built ZRPC, a fully functional distributed computing system, for production use.
ZRPC successfully decouples the computing architecture from the hardware resources and provides a simple API for users to run their programs on a cluster. It not only exploits concurrency to improve performance but also lets users organize cluster machines to achieve specific goals. This is particularly useful when existing code can hardly be ported to "new" architectures (e.g., MapReduce, Spark, etc.). ZRPC turns a stand-alone program into a distributed computing application through a simple JSON API. Although replicated checkpoints are generated, they achieve data reliability. A robust fault detector and handler enables jobs in the cluster to run safely.
ZRPC can reduce the project completion time for both small projects and large projects, and improve the cluster utilization rate. ZRPC can also be extended appropriately without interrupting the current job.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by combining software and a hardware platform. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
It should be understood by those skilled in the art that the foregoing is only illustrative of the present invention, and is not intended to limit the scope of the invention.

Claims (22)

1. A distributed computing system for 3D reconstruction, the distributed computing system comprising a distributed file system and a cluster of machines, wherein one of the machines is elected as a master and the remaining machines are worker machines;
wherein the distributed file system is shared by a plurality of working machines and is compatible with a portable operating system interface standard;
the main control computer dispatches a plurality of received jobs of the 3D reconstruction project to the plurality of working machines, and the plurality of working machines run the received jobs to complete the 3D reconstruction;
the distributed computing system also comprises a distributed key value storage system, and the distributed key value storage system stores the key value and the state of the main control computer;
when the main control machine fails, selecting a new main control machine, and recovering the state of the original main control machine from the distributed key value storage system;
one of the plurality of machines is elected as a master and the remaining machines are work machines, comprising:
the multiple machines place respective IDs on a master key of the distributed key value storage system, but only one machine can successfully place the ID on the master key;
the multiple machines acquire a master control machine ID from the distributed key value storage system;
and the machine with the ID same as the obtained ID of the master control machine in the plurality of machines is used as the master control machine, and the other machines are the working machines.
2. The distributed computing system of claim 1, wherein the distributed key value storage system is an etcd.
3. The distributed computing system of claim 2, wherein the work machines operate in pairs and replicate each other's data.
4. The distributed computing system of claim 1, wherein the attributes of the job include a job type, a number of tasks, and a limit.
5. The distributed computing system of claim 1, wherein there are logical dependencies and IO dependencies between the jobs.
6. The distributed computing system of claim 5, wherein the master schedules the plurality of jobs to run on the plurality of work machines according to the logical dependencies and IO dependencies between the jobs.
7. The distributed computing system of claim 6, wherein jobs having IO dependencies are scheduled to run on the same work machine.
8. The distributed computing system of claim 1, wherein the master communicates with the outside through the HTTP protocol.
9. The distributed computing system of claim 8, wherein a user monitors cluster status and operational status in the distributed computing system through a Web UI.
10. The distributed computing system of claim 9 wherein the host receives jobs sent by the user in JSON format over HTTP protocol.
11. The distributed computing system of claim 1, wherein the master and the plurality of work machines communicate via Remote Procedure Calls (RPCs).
12. The distributed computing system of claim 11, wherein the respective encoders and decoders for RPCs of the master and worker are set with time constraints.
13. The distributed computing system of claim 1, wherein the worker uses a set of containers to run tasks of the received job.
14. The distributed computing system of claim 13, wherein the container is a Docker container.
15. The distributed computing system of claim 14, wherein the task runs inside the Docker container and is isolated from a host operating system.
16. The distributed computing system of claim 15, wherein an application is exported by way of the container as a Docker image for running the application on other operating systems via the Docker image.
17. The distributed computing system of claim 1 wherein said master schedules jobs based on priority of jobs and constraint factors.
18. The distributed computing system of claim 17, wherein the constraint factors include job constraints and task constraints.
19. The distributed computing system of claim 18, wherein said master uses the LATE algorithm to process tasks that lag behind other tasks.
20. The distributed computing system of claim 19, wherein said master using the LATE algorithm to process tasks that lag behind other tasks comprises:
when there are idle resources, the master speculates on running tasks and clones the task estimated to lag behind other tasks onto another work machine to run.
21. The distributed computing system of claim 1, wherein the distributed file system is a Gluster system.
22. A method of 3D reconstruction, the method comprising:
the 3D reconstruction project is decomposed into a plurality of jobs,
sending the decomposed job to the distributed computing system of any one of claims 1 to 21,
the distributed computing system runs the job to complete the 3D reconstruction.
CN201610756715.9A 2016-08-29 2016-08-29 Distributed computing system for 3D reconstruction and 3D reconstruction method Active CN106354563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610756715.9A CN106354563B (en) 2016-08-29 2016-08-29 Distributed computing system for 3D reconstruction and 3D reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610756715.9A CN106354563B (en) 2016-08-29 2016-08-29 Distributed computing system for 3D reconstruction and 3D reconstruction method

Publications (2)

Publication Number Publication Date
CN106354563A CN106354563A (en) 2017-01-25
CN106354563B true CN106354563B (en) 2020-05-22

Family

ID=57856728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610756715.9A Active CN106354563B (en) 2016-08-29 2016-08-29 Distributed computing system for 3D reconstruction and 3D reconstruction method

Country Status (1)

Country Link
CN (1) CN106354563B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354563B (en) * 2016-08-29 2020-05-22 广州市香港科大霍英东研究院 Distributed computing system for 3D reconstruction and 3D reconstruction method
CN109471705B (en) * 2017-09-08 2021-08-13 杭州海康威视数字技术股份有限公司 Task scheduling method, device and system, and computer device
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment
CN110287009A (en) * 2019-05-28 2019-09-27 北京大米科技有限公司 A kind of working node selection method, device, storage medium and server
CN112381317A (en) * 2020-11-26 2021-02-19 方是哲如管理咨询有限公司 Big data platform for tissue behavior analysis and result prediction
CN112600942B (en) * 2021-02-18 2022-12-02 杭州网银互联科技股份有限公司 Method and system for improving routing calculation efficiency in sd-wan
CN115185667B (en) * 2022-09-13 2022-12-20 天津市天河计算机技术有限公司 Visual application acceleration method and device, electronic equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073546A (en) * 2010-12-13 2011-05-25 北京航空航天大学 Task-dynamic dispatching method under distributed computation mode in cloud computing environment
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN103064742A (en) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 Automatic deployment system and method of hadoop cluster
CN103092683A (en) * 2011-11-07 2013-05-08 Sap股份公司 Scheduling used for analyzing data and based on elicitation method
CN103414761A (en) * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on Hadoop framework
CN103617087A (en) * 2013-11-25 2014-03-05 华中科技大学 MapReduce optimizing method suitable for iterative computations
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
CN104679887A (en) * 2015-03-17 2015-06-03 广西大学 Large-scale image data similarity searching method based on EMD (earth mover's distance)
CN104731595A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 Big-data-analysis-oriented mixing computing system
CN105354076A (en) * 2015-10-23 2016-02-24 深圳前海达闼云端智能科技有限公司 Application deployment method and device
CN106354563A (en) * 2016-08-29 2017-01-25 广州市香港科大霍英东研究院 Distributed computing system for 3D (three-dimensional reconstruction) and 3D reconstruction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070168548A1 (en) * 2006-01-19 2007-07-19 International Business Machines Corporation Method and system for performing multi-cluster application-specific routing

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073546A (en) * 2010-12-13 2011-05-25 北京航空航天大学 Task-dynamic dispatching method under distributed computation mode in cloud computing environment
CN103092683A (en) * 2011-11-07 2013-05-08 Sap股份公司 Scheduling used for analyzing data and based on elicitation method
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN103064742A (en) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 Automatic deployment system and method of hadoop cluster
CN103414761A (en) * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on Hadoop framework
CN103617087A (en) * 2013-11-25 2014-03-05 华中科技大学 MapReduce optimizing method suitable for iterative computations
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
CN104679887A (en) * 2015-03-17 2015-06-03 广西大学 Large-scale image data similarity searching method based on EMD (earth mover's distance)
CN104731595A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 Hybrid computing system oriented to big data analysis
CN105354076A (en) * 2015-10-23 2016-02-24 深圳前海达闼云端智能科技有限公司 Application deployment method and device
CN106354563A (en) * 2016-08-29 2017-01-25 广州市香港科大霍英东研究院 Distributed computing system for 3D (three-dimensional) reconstruction and 3D reconstruction method

Also Published As

Publication number Publication date
CN106354563A (en) 2017-01-25

Similar Documents

Publication Publication Date Title
CN106354563B (en) Distributed computing system for 3D reconstruction and 3D reconstruction method
US11888599B2 (en) Scalable leadership election in a multi-processing computing environment
US9870291B2 (en) Snapshotting shared disk resources for checkpointing a virtual machine cluster
US10203992B2 (en) Worker node rebuild for parallel processing system
EP1839152B1 (en) Predictive method for managing, logging or replaying non-deterministic operations within the execution of an application process
EP3129903B1 (en) Systems and methods for fault tolerant communications
JP5191062B2 (en) Storage control system, operation method related to storage control system, data carrier, and computer program
US8904361B2 (en) Non-intrusive method for logging of internal events within an application process, and system implementing this method
Lorch et al. The SMART way to migrate replicated stateful services
US7613597B2 (en) Non-intrusive method for simulation or replay of external events related to an application process, and a system implementing said method
US8539434B2 (en) Method for the management, logging or replay of the execution of an application process
EP1839153B1 (en) Non- intrusive method for replaying internal events in an application process, and system implementing this method
US7568131B2 (en) Non-intrusive method for logging external events related to an application process, and a system implementing said method
US7840940B2 (en) Semantic management method for logging or replaying non-deterministic operations within the execution of an application process
US7536587B2 (en) Method for the acceleration of the transmission of logging data in a multi-computer environment and system using this method
US9195528B1 (en) Systems and methods for managing failover clusters
US9830263B1 (en) Cache consistency
CN112955874A (en) System and method for self-healing in decentralized model building using machine learning of blockchains
US7533296B2 (en) Method for optimizing the transmission of logging data in a multi-computer environment and a system implementing this method
US11363029B2 (en) Big data distributed processing and secure data transferring with hyper fencing
US11321430B2 (en) Big data distributed processing and secure data transferring with obfuscation
US11314874B2 (en) Big data distributed processing and secure data transferring with resource allocation and rebate
US11288004B1 (en) Consensus-based authority selection in replicated network-accessible block storage devices
US20230342065A1 (en) Failure recovery in a replication environment
Martella et al. Giraph architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant