CN104794194B - Distributed heterogeneous parallel computing system for large-scale multimedia retrieval - Google Patents

Distributed heterogeneous parallel computing system for large-scale multimedia retrieval

Info

Publication number
CN104794194B
CN104794194B (application CN201510186094.0A)
Authority
CN
China
Prior art keywords
node
partitioning
multimedia retrieval
distributed
distributed heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510186094.0A
Other languages
Chinese (zh)
Other versions
CN104794194A (en)
Inventor
王瀚漓
肖波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201510186094.0A priority Critical patent/CN104794194B/en
Publication of CN104794194A publication Critical patent/CN104794194A/en
Application granted granted Critical
Publication of CN104794194B publication Critical patent/CN104794194B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a distributed heterogeneous parallel computing system for large-scale multimedia retrieval. The distributed heterogeneous computer cluster includes multiple compute nodes, and each compute node includes one or more types of processors. The system includes: a performance estimation module for monitoring and updating, in real time, the computing performance of different computing modules on different processors; a data partitioning module for partitioning the input computing task according to a user-provided input data read/write function and the monitoring results of the performance estimation module; a dynamic hierarchical scheduling module for scheduling the partitioned computing tasks and performing load balancing; and a CHCF algorithm toolkit for implementing a variety of multimedia retrieval algorithms. Compared with the prior art, the present invention reduces the difficulty of writing multimedia retrieval applications and improves the efficiency of distributed heterogeneous computing systems.

Description

Distributed heterogeneous parallel computing system for large-scale multimedia retrieval
Technical field
The present invention relates to the field of multimedia retrieval, and more particularly to a distributed heterogeneous parallel computing system for large-scale multimedia retrieval.
Background technology
With the rise of original, interactive multimedia systems, the popularity of new media such as online and mobile multimedia, and the widespread adoption of portable smart terminals (e.g., smartphones and tablet computers), the volume of multimedia content on the Internet (video, images, etc.) is growing exponentially to massive scale. Searching for and viewing the wealth of image and video resources on the Internet has become an important way for many users to obtain information. Faced with massive multimedia data, effectively organizing, storing, analyzing, and retrieving it has become an urgent and highly challenging task, and is also a research hotspot in fields such as multimedia, search engines, and data mining. Large-scale content-based image retrieval is one of the most representative of these topics. It builds on the analysis and understanding of image content to extract, represent, and store image information effectively, and combines this with advanced retrieval algorithms so that large-scale image collections can be searched quickly and accurately. Unlike traditional documents, representing an image requires extracting a large amount of feature information; when facing massive image collections, the resulting computational load makes traditional computing and storage methods increasingly inadequate.
To handle the enormous computational load of processing massive multimedia data, the most convenient, fast, and effective approach is to use high-performance computers. With their powerful and ever-increasing computing capability, high-performance computers have, over the past decades, become the default choice for massive computing tasks. However, as semiconductor manufacturing processes approach their physical limits, as operating costs rise, and as environmental awareness grows, relying solely on raising the clock frequency of the central processing unit (CPU) and increasing the number of processors has become an increasingly difficult way to improve computing capability. At the same time, the world of high-performance computing is undergoing a new change: the heterogenization of computing resources. Heterogeneous computing resources no longer rely solely on CPUs; they also use coprocessors such as graphics processing units (GPUs) and Intel Xeon Phi MIC coprocessors. For example, Tianhe-2, the world's fastest supercomputer as of June 2014, uses 32,000 Intel Xeon CPUs and 48,000 Intel Xeon Phi coprocessors, while Tianhe-1, which likewise took the top spot in 2010, used ATI Radeon GPUs as its coprocessors. In recent years, with the rapid growth of the integrated circuit and semiconductor industries, the computing performance of GPUs has developed dramatically. The emergence and development of GPGPU (general-purpose GPU) technology means that GPUs are no longer limited to traditional graphics processing and computation, and the advantages and potential of GPUs in fine-grained parallel processing have attracted wide attention. If the parallel characteristics of a computing task are themselves diverse, a heterogeneous high-performance computer cluster can accelerate it in parallel at no fewer than three granularities: coarse-grained parallel acceleration across compute nodes, medium-grained parallel acceleration across the multiple processors (cores) within a node, and fine-grained parallel acceleration within a coprocessor.
Although heterogeneous computing resources bring exponential growth in computing performance, quickly and efficiently developing distributed applications for heterogeneous computer clusters is by no means easy, and ensuring that a distributed application can make full and effective use of the various computing resources in a cluster is even more difficult.
Summary of the invention
The object of the present invention is to overcome the above-mentioned drawbacks of the prior art and to provide a distributed heterogeneous parallel computing system for large-scale multimedia retrieval that reduces the difficulty of writing multimedia retrieval applications and improves the efficiency of distributed heterogeneous computing systems.
The object of the present invention can be achieved through the following technical solution:
A distributed heterogeneous parallel computing system for large-scale multimedia retrieval, wherein the distributed heterogeneous computer cluster comprises multiple compute nodes and each compute node includes one or more types of processors, the system comprising:
a performance estimation module for monitoring and updating, in real time, the computing performance of different computing modules on different processors;
a data partitioning module for partitioning the input computing task according to the user-provided input data read/write function and the monitoring results of the performance estimation module;
a dynamic hierarchical scheduling module for scheduling the partitioned computing tasks and performing load balancing;
a CHCF algorithm toolkit for implementing a variety of multimedia retrieval algorithms.
The data partitioning performed by the data partitioning module includes a preliminary partitioning that distributes the computing task across the compute nodes and a fine partitioning of the computing task within each compute node.
The dynamic hierarchical scheduling module includes:
a master scheduler for scheduling the computing subtasks obtained from the preliminary partitioning to different compute nodes;
a node scheduler for dispatching the finely partitioned computing load to the different processors within the corresponding compute node.
Multiple node schedulers are provided, each communicatively connected to the master scheduler, and the node schedulers are communicatively connected to one another.
Each node scheduler includes:
an idle notification unit for sending an "idle" notification to the other node schedulers when the corresponding compute node is idle;
a response judging unit for deciding, upon receiving an "idle" notification sent by another node scheduler, whether to respond to the notification;
a response sending unit for sending a response to the node scheduler that sent the "idle" notification when the judgment of the response judging unit is affirmative;
a load partitioning unit for, when the response sending unit sends a response, re-partitioning the computing subtask of the local compute node and distributing part of it to the node scheduler that sent the "idle" notification.
The multimedia retrieval algorithms include an image feature point extraction algorithm, a KMeans clustering algorithm, a BoW (bag-of-words) generation algorithm, SMK/ASMK/ASMK-Binary algorithms, HE (Hamming Embedding) and WGC (weak geometric consistency) algorithms, and an inverted index generation algorithm.
The resources on which the system depends at runtime include a distributed file system, a serialization/deserialization protocol, and a remote procedure call service.
Compared with the prior art, the present invention has the following advantages:
(1) The present invention integrates the computing resources in a heterogeneous system while shielding differences in the underlying hardware, takes over tasks such as computing resource management and scheduling, and provides a unified, efficient, and scalable platform for upper-layer application developers;
(2) Multimedia retrieval applications can be written using the CHCF language and the CHCF toolkit, which substantially reduces the difficulty of writing distributed applications in a distributed heterogeneous computing environment and shortens the development cycle of multimedia retrieval applications;
(3) The data partitioning module partitions the input data according to the user-provided input data read/write function and the monitoring results of the performance estimation module. This adaptive data partitioning greatly improves the flexibility of partitioning the input data of parallel tasks, avoids unnecessary load imbalance, and improves the utilization efficiency of the heterogeneous system;
(4) The present invention performs knowledge-based dynamic scheduling of computing tasks through the dynamic hierarchical scheduling module, distributing computing loads of different sizes according to the characteristics of the different computing resources in the system and dynamically adjusting the loads at runtime, thereby avoiding problems such as the inefficiency caused by load imbalance in traditional distributed systems.
Description of the drawings
Fig. 1 is a schematic diagram of the CHCF framework established on the basis of the present invention;
Fig. 2 is a schematic diagram of the adaptive data partitioning process of the present invention;
Fig. 3 is a schematic diagram of the knowledge-based dynamic hierarchical scheduling process of the present invention;
Fig. 4 is a schematic diagram of the dynamic load balancing process of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and specific operating process are given, but the protection scope of the present invention is not limited to the following embodiment.
The distributed heterogeneous parallel computing system for large-scale multimedia retrieval provided in this embodiment (Cloud-Based Heterogeneous Computing Framework, CHCF) takes a distributed heterogeneous computer cluster as its target platform. The cluster comprises multiple compute nodes, and each compute node includes one or more processors, such as CPUs, GPUs, and Intel Xeon Phi MIC coprocessors. Fig. 1 shows the CHCF framework built on this distributed heterogeneous parallel computing system. The bottom layer is the distributed heterogeneous computer cluster and operating system (Distributed Heterogeneous Systems & OS) 1; in this embodiment the operating system running on the hosts is Linux. The second layer is the resource layer 2 on which the CHCF framework depends at runtime; these resources include a distributed file system (Distributed File System), a serialization/deserialization protocol (Serialization Service), and a remote procedure call service (RPC Service). The third layer is the CHCF runtime layer (CHCF Runtime) 3. The top layer is the multimedia retrieval application layer (CHCF Application) 4, which runs on the CHCF framework and is written in the efficient CHCF language designed for this system. According to the CHCF design pattern, a typical CHCF program consists of one driver module (Driver) and several filter modules (Filter): the driver module implements the business logic and flow control of the specific application, while the filter modules implement the main core algorithms.
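By way of illustration only, the following minimal Python sketch mimics the driver/filter decomposition described above. The CHCF language and API are not reproduced in this document, so the class and method names used here (Driver, Filter, process, run) are hypothetical stand-ins rather than the disclosed interface.

```python
# Minimal sketch of the driver/filter pattern; names are hypothetical stand-ins.

class Filter:
    """A filter module implements one core algorithm step."""
    def process(self, record):
        raise NotImplementedError


class ExtractFeatures(Filter):
    """Toy stand-in for an image feature-point extraction filter."""
    def process(self, record):
        # Pretend to compute a few feature values for the input image name.
        return [hash((record, i)) % 997 for i in range(4)]


class Driver:
    """The driver module holds the application logic and flow control."""
    def __init__(self, filters):
        self.filters = filters

    def run(self, records):
        for record in records:
            data = record
            for f in self.filters:      # pipe each record through the filters
                data = f.process(data)
            yield data


if __name__ == "__main__":
    driver = Driver([ExtractFeatures()])
    for features in driver.run(["img_001.jpg", "img_002.jpg"]):
        print(features)
```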
The CHCF runtime layer 3 includes:
a performance estimation module (Performance Estimator) 301 for monitoring and updating, in real time, the computing performance of different computing modules on different processors (a minimal sketch of one possible estimator is given after this list);
a data partitioning module (Partitioner) 302 for partitioning the input computing task according to the user-provided input data read/write function (Input Format Function) and the monitoring results (Performance Assessment) of the performance estimation module;
a dynamic hierarchical scheduling module (Scheduler & Coordinator) 303 for scheduling the partitioned computing tasks and performing load balancing;
a CHCF algorithm toolkit 304 containing implementations of many algorithms commonly used in multimedia retrieval, each provided in both a CPU and a GPU version. The multimedia retrieval algorithms include an image feature point extraction algorithm, a KMeans clustering algorithm, a BoW generation algorithm, SMK/ASMK/ASMK-Binary algorithms, HE and WGC algorithms, an inverted index generation algorithm, and so on.
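The document does not specify how the performance estimation module 301 aggregates its measurements. The sketch below assumes one plausible realization: an exponential moving average of observed throughput per (computing module, processor) pair, which the scheduler could then query to pick the fastest available processor for a given algorithm. The names and the smoothing rule are assumptions, not the disclosed design.

```python
# Assumed sketch of a performance estimator; the EMA update rule is illustrative.

class PerformanceEstimator:
    """Tracks observed throughput (items/second) per (module, processor) pair."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha        # smoothing factor for the moving average
        self.throughput = {}      # (module, processor) -> estimated items/second

    def record(self, module, processor, items, seconds):
        """Fold a new runtime measurement into the running estimate."""
        observed = items / seconds
        key = (module, processor)
        old = self.throughput.get(key)
        self.throughput[key] = observed if old is None else (
            self.alpha * observed + (1.0 - self.alpha) * old)

    def estimate(self, module, processor, default=1.0):
        return self.throughput.get((module, processor), default)

    def fastest(self, module, processors):
        """Pick the processor with the highest estimated throughput for a module."""
        return max(processors, key=lambda p: self.estimate(module, p))


if __name__ == "__main__":
    est = PerformanceEstimator()
    est.record("feature_extraction", "gpu0", items=2000, seconds=1.0)
    est.record("feature_extraction", "cpu", items=400, seconds=1.0)
    print(est.fastest("feature_extraction", ["cpu", "gpu0"]))   # gpu0
```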
The data partitioning performed by the data partitioning module 302 includes a preliminary partitioning (Preliminary Partitioning) that distributes the computing task across the compute nodes and a fine partitioning (Fine Partitioning) of the computing task within each compute node. The dynamic hierarchical scheduling module 303 includes a master scheduler (Master Scheduler) and node schedulers (Node Scheduler): the former schedules the computing subtasks obtained from the preliminary partitioning to different compute nodes, and the latter dispatch the finely partitioned computing load to the different processors within the corresponding compute node. Multiple node schedulers are provided; each is communicatively connected to the master scheduler, and the node schedulers are communicatively connected to one another. A minimal sketch of this two-level structure is given below.
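The sketch below gives a minimal object layout for this two-level dispatch. It is illustrative only: the fine partitioning here is plain round-robin, whereas the system described above weights it using the performance estimates, and all class and method names are hypothetical.

```python
# Minimal sketch of the master scheduler / node scheduler split; names hypothetical.

class NodeScheduler:
    def __init__(self, name, processors):
        self.name = name
        self.processors = processors    # e.g. ["cpu", "gpu0"]
        self.subtask = []               # work assigned by the master scheduler

    def assign(self, subtask):
        self.subtask = list(subtask)

    def dispatch(self):
        """Fine partitioning: hand slices of this node's subtask to its processors."""
        per_proc = {p: [] for p in self.processors}
        for i, item in enumerate(self.subtask):
            per_proc[self.processors[i % len(self.processors)]].append(item)
        return per_proc


class MasterScheduler:
    def __init__(self, node_schedulers):
        self.nodes = node_schedulers

    def schedule(self, subtasks):
        """Preliminary scheduling: one subtask per node, assigned round-robin."""
        for i, subtask in enumerate(subtasks):
            self.nodes[i % len(self.nodes)].assign(subtask)


if __name__ == "__main__":
    nodes = [NodeScheduler("node0", ["cpu", "gpu0"]), NodeScheduler("node1", ["cpu"])]
    MasterScheduler(nodes).schedule([["img_0", "img_1", "img_2"], ["img_3", "img_4"]])
    print(nodes[0].dispatch())   # {'cpu': ['img_0', 'img_2'], 'gpu0': ['img_1']}
```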
Each node scheduler includes:
an idle notification unit for sending an "idle" notification to the other node schedulers when the corresponding compute node is idle;
a response judging unit for deciding, upon receiving an "idle" notification sent by another node scheduler, whether to respond to the notification;
a response sending unit for sending a response to the node scheduler that sent the "idle" notification when the judgment of the response judging unit is affirmative;
a load partitioning unit for, when the response sending unit sends a response, re-partitioning the computing subtask of the local compute node and distributing part of it to the node scheduler that sent the "idle" notification.
Each node scheduler implements load balancing through the above units.
To improve the utilization efficiency of the distributed heterogeneous computing resources, the system partitions the computing task through the data partitioning module and the dynamic hierarchical scheduling module. When developing a CHCF application, the user needs to design and implement a corresponding input data read/write function (Input Format Function). After the user submits a computing task, the data partitioning module performs a preliminary partitioning (Preliminary Partitioning) of the input data according to the user-provided input data read/write function and the data provided by the performance estimation module (Performance Assessment), taking into account the computing resources currently available in the heterogeneous cluster. The master scheduler (Master Scheduler) then schedules the computing subtasks to different compute nodes, and the node scheduler (Node Scheduler) dispatches the further finely partitioned computing load to the different processors for processing, as shown in Figs. 2 and 3.
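As an illustration of this adaptive preliminary partitioning, the sketch below combines a user-supplied input-format function with per-node capability figures (which in the real system would come from the performance estimation module). The reader signature and the proportional-split rule are assumptions, not the disclosed implementation.

```python
# Assumed sketch of adaptive preliminary partitioning.

def read_image_list(manifest_path):
    """User-supplied Input Format Function: yields one record per input item."""
    with open(manifest_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield line


def preliminary_partition(records, node_capability):
    """Split the input into per-node chunks proportional to estimated capability."""
    records = list(records)
    nodes = list(node_capability)
    total = sum(node_capability.values())
    chunks, start = {}, 0
    for node in nodes:
        size = round(len(records) * node_capability[node] / total)
        chunks[node] = records[start:start + size]
        start += size
    chunks[nodes[-1]] += records[start:]      # keep any rounding remainder
    return chunks


if __name__ == "__main__":
    # Records would normally come from read_image_list("manifest.txt").
    records = [f"img_{i:04d}.jpg" for i in range(100)]
    chunks = preliminary_partition(records, {"node0": 3.0, "node1": 1.0})
    print({n: len(c) for n, c in chunks.items()})   # {'node0': 75, 'node1': 25}
```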
At the same time, the node schedulers cooperate with one another to dynamically balance the computing load at runtime, as shown in Fig. 4. When the computing task of a compute node is completed, it informs its node scheduler, which broadcasts an "idle" notification to the other nodes in the system. After receiving the notification, a node that has not yet finished decides, according to certain rules, whether to respond. If a node responds, it further partitions its own computing load and answers the idle node's request (Divide workload and response). After receiving the response and the corresponding computing load, the idle node processes the data.
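The sketch below reduces this broadcast-and-respond protocol of Fig. 4 to in-process function calls. In a real deployment the node schedulers would communicate over the RPC service, and the rule used here (respond only if at least a few items remain) is merely one example of the "certain rules" mentioned above.

```python
# Assumed sketch of the idle-notification load balancing protocol.

class BalancingNode:
    def __init__(self, name, workload):
        self.name = name
        self.workload = list(workload)
        self.peers = []                 # other node schedulers in the system

    def broadcast_idle(self):
        """When this node runs out of work, ask peers until one shares some."""
        for peer in self.peers:
            granted = peer.on_idle_notice(self.name)
            if granted:
                self.workload.extend(granted)
                return granted
        return []

    def on_idle_notice(self, idle_node):
        """Respond only if enough work remains to be worth splitting (assumed rule)."""
        if len(self.workload) < 4:
            return []
        half = len(self.workload) // 2
        shared, self.workload = self.workload[half:], self.workload[:half]
        return shared


if __name__ == "__main__":
    busy = BalancingNode("node0", [f"img_{i}" for i in range(8)])
    idle = BalancingNode("node1", [])
    idle.peers = [busy]
    moved = idle.broadcast_idle()
    print(len(moved), len(busy.workload))   # 4 4
```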

Claims (5)

1. A distributed heterogeneous parallel computing system for large-scale multimedia retrieval, which takes a distributed heterogeneous computer cluster as its target platform, the distributed heterogeneous computer cluster comprising multiple compute nodes and each compute node including one or more types of processors, characterized by comprising:
a performance estimation module for monitoring and updating, in real time, the computing performance of different compute nodes on different processors;
a data partitioning module for partitioning the input computing task according to the user-provided input data read/write function and the monitoring results of the performance estimation module, wherein the data partitioning performed by the data partitioning module includes a preliminary partitioning that distributes the computing task across the compute nodes and a fine partitioning of the computing task within each compute node;
a dynamic hierarchical scheduling module for scheduling the partitioned computing tasks and performing load balancing;
an algorithm toolkit for implementing a variety of multimedia retrieval algorithms;
the dynamic hierarchical scheduling module comprising:
a master scheduler for scheduling the computing subtasks obtained from the preliminary partitioning to different compute nodes; and
a node scheduler for dispatching the finely partitioned computing load to the different processors within the corresponding compute node.
2. The distributed heterogeneous parallel computing system for large-scale multimedia retrieval according to claim 1, characterized in that multiple node schedulers are provided, each communicatively connected to the master scheduler, and the node schedulers are communicatively connected to one another.
3. The distributed heterogeneous parallel computing system for large-scale multimedia retrieval according to claim 1, characterized in that the node scheduler comprises:
an idle notification unit for sending an "idle" notification to the other node schedulers when the corresponding compute node is idle;
a response judging unit for deciding, upon receiving an "idle" notification sent by another node scheduler, whether to respond to the notification;
a response sending unit for sending a response to the node scheduler that sent the "idle" notification when the judgment of the response judging unit is affirmative; and
a load partitioning unit for, when the response sending unit sends a response, re-partitioning the computing subtask of the local compute node and distributing part of it to the node scheduler that sent the "idle" notification.
4. The distributed heterogeneous parallel computing system for large-scale multimedia retrieval according to claim 1, characterized in that the multimedia retrieval algorithms include an image feature point extraction algorithm, a KMeans clustering algorithm, a BoW generation algorithm, SMK/ASMK/ASMK-Binary algorithms, HE and WGC algorithms, and an inverted index generation algorithm.
5. The distributed heterogeneous parallel computing system for large-scale multimedia retrieval according to claim 1, characterized in that the resources on which the system depends at runtime include a distributed file system, a serialization/deserialization protocol, and a remote procedure call service.
CN201510186094.0A 2015-04-17 2015-04-17 Distributed heterogeneous parallel computing system for large-scale multimedia retrieval Active CN104794194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510186094.0A CN104794194B (en) 2015-04-17 2015-04-17 Distributed heterogeneous parallel computing system for large-scale multimedia retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510186094.0A CN104794194B (en) 2015-04-17 2015-04-17 Distributed heterogeneous parallel computing system for large-scale multimedia retrieval

Publications (2)

Publication Number Publication Date
CN104794194A CN104794194A (en) 2015-07-22
CN104794194B true CN104794194B (en) 2018-10-26

Family

ID=53558986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510186094.0A Active CN104794194B (en) 2015-04-17 2015-04-17 Distributed heterogeneous parallel computing system for large-scale multimedia retrieval

Country Status (1)

Country Link
CN (1) CN104794194B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180017B (en) * 2016-03-11 2021-05-28 阿里巴巴集团控股有限公司 Sample serialization method and device
CN106155680B (en) * 2016-06-27 2019-08-02 上海波汇科技有限公司 A kind of distribution High Availabitity, expansible, portable software framework
CN106201931B (en) * 2016-08-10 2019-01-18 长沙中部翼天智能装备科技有限公司 A kind of hypervelocity matrix operation coprocessor system
CN106354480B (en) * 2016-08-24 2019-01-29 长沙中部芯空微电子研究所有限公司 A kind of MPP type isomery high speed MCU system
CN106599898A (en) * 2016-12-13 2017-04-26 郑州云海信息技术有限公司 Image feature extraction method and system
CN106790636A (en) * 2017-01-09 2017-05-31 上海承蓝科技股份有限公司 A kind of equally loaded system and method for cloud computing server cluster
CN107451427A (en) * 2017-07-27 2017-12-08 江苏微锐超算科技有限公司 The computing system and accelerate platform that a kind of restructural gene compares
CN110502331A (en) * 2018-05-16 2019-11-26 北京理工大学 A kind of Heterogeneous Computing method of clinical medical data
CN109343939B (en) * 2018-07-31 2022-01-07 国家电网有限公司 Distributed cluster and parallel computing task scheduling method
CN109523022B (en) * 2018-11-13 2022-04-05 Oppo广东移动通信有限公司 Terminal data processing method and device and terminal
CN111258744A (en) * 2018-11-30 2020-06-09 中兴通讯股份有限公司 Task processing method based on heterogeneous computation and software and hardware framework system
CN109855646B (en) * 2019-04-30 2020-02-28 奥特酷智能科技(南京)有限公司 Distributed centralized autopilot system and method
CN110516738B (en) * 2019-08-23 2022-09-16 佳都科技集团股份有限公司 Distributed comparison clustering method and device, electronic equipment and storage medium
CN111444017A (en) * 2020-03-27 2020-07-24 北京金山云网络技术有限公司 Multimedia data processing method, device and system, electronic equipment and storage medium
CN113268495A (en) * 2021-05-25 2021-08-17 深圳壹账通智能科技有限公司 Data searching method and device, electronic equipment and storage medium
CN113535410B (en) * 2021-09-15 2022-02-08 航天宏图信息技术股份有限公司 Load balancing method and system for GIS space vector distributed computation


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279391A (en) * 2013-06-09 2013-09-04 浪潮电子信息产业股份有限公司 Load balancing optimization method based on CPU (central processing unit) and MIC (many integrated core) framework processor cooperative computing
CN104331271A (en) * 2014-11-18 2015-02-04 李桦 Parallel computing method and system for CFD (Computational Fluid Dynamics)

Also Published As

Publication number Publication date
CN104794194A (en) 2015-07-22

Similar Documents

Publication Publication Date Title
CN104794194B (en) Distributed heterogeneous parallel computing system for large-scale multimedia retrieval
WO2020108303A1 (en) Heterogeneous computing-based task processing method and software-hardware framework system
US10733019B2 (en) Apparatus and method for data processing
US9946563B2 (en) Batch scheduler management of virtual machines
US11740941B2 (en) Method of accelerating execution of machine learning based application tasks in a computing device
CN111367630A (en) Multi-user multi-priority distributed cooperative processing method based on cloud computing
US20130339978A1 (en) Load balancing for heterogeneous systems
CN110162393B (en) Task scheduling method, device and storage medium
CN111506434B (en) Task processing method and device and computer readable storage medium
CN111080766A (en) WebGL-based GPU (graphics processing unit) method for accelerating efficient rendering of massive targets
Li et al. Resource scheduling based on improved spectral clustering algorithm in edge computing
CN114675964A (en) Distributed scheduling method, system and medium based on Federal decision tree model training
CN114610474A (en) Multi-strategy job scheduling method and system in heterogeneous supercomputing environment
CN112799598A (en) Data processing method, processor and electronic equipment
Zhang et al. A locally distributed mobile computing framework for dnn based android applications
CN111190704A (en) Task classification processing method based on big data processing framework
CN115686784A (en) Geographic grid pyramid parallel construction method based on multiple machines and multiple processes
CN110837419B (en) Reasoning engine system and method based on elastic batch processing and electronic equipment
Lifflander et al. Dynamic scheduling for work agglomeration on heterogeneous clusters
CN115378937B (en) Distributed concurrency method, device, equipment and readable storage medium for tasks
Chandarapu et al. Balanced Prediction Based Dynamic Resource Allocation Model for Online Big Data Streams using Historical Data
Patil et al. Review on a comparative study of various task scheduling algorithm in cloud computing environment
CN115951974B (en) Management method, system, equipment and medium of GPU virtual machine
Zhao et al. Multitask oriented GPU resource sharing and virtualization in cloud environment
CN113923212B (en) Network data packet processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant