CN107239570A - Data processing method and server cluster - Google Patents

Info

Publication number
CN107239570A
Authority
CN
China
Prior art keywords
data
tables
memory system
distributed memory
computational
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710504831.6A
Other languages
Chinese (zh)
Inventor
尹正军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201710504831.6A priority Critical patent/CN107239570A/en
Publication of CN107239570A publication Critical patent/CN107239570A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method applied to a cluster computing process that includes at least two computing frameworks. The method includes: when a first computing framework performs an operation on a first data table, determining whether data information corresponding to the first data table exists in a distributed memory system; and, if it does not exist, performing the operation on the first data table and synchronizing the data information corresponding to the first data table to the distributed memory system. The present disclosure also provides a server cluster.

Description

Data processing method and server cluster
Technical field
The present disclosure relates to a data processing method and a server cluster.
Background
In large-scale computing clusters, different computing frameworks often need to load the same data table. For example, a cluster may run several computing frameworks at once, such as Impala, Hive, and SparkSQL, and different computing tasks may each involve loading the data of a given table. Each computing engine loads the table on its own, so the same data is in fact loaded repeatedly. This causes a large number of disk read and write operations, and overall performance is poor.
Summary
One aspect of the present disclosure provides a data processing method applied to a cluster computing process that includes at least two computing frameworks. The method includes: when a first computing framework performs an operation on a first data table, determining whether data information corresponding to the first data table exists in a distributed memory system; and, if it does not exist, performing the operation on the first data table and synchronizing the data information corresponding to the first data table to the distributed memory system.
Optionally, the method includes: if data information corresponding to the first data table exists in the distributed memory system, the first computing framework loads the data information corresponding to the first data table from the distributed memory system.
Optionally, synchronizing the data information corresponding to the first data table to the distributed memory system includes: determining whether the first data table is a data table that may be shared by different computing frameworks, where the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans; and, if so, synchronizing the data information corresponding to the first data table to the distributed memory system.
Optionally, performing the operation on the first data table includes running a query execution plan for the first data table.
Optionally, determining whether data information corresponding to the first data table exists in the distributed memory system includes: the first computing framework obtaining the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, where the storage class of each of the at least two computing frameworks has been extended; and determining whether data information corresponding to the first data table exists in the distributed memory system.
Another aspect of the present disclosure provides a server cluster that includes at least one processor and at least one memory. The memory stores a computer-readable program. When the program is executed by the at least one processor, it causes the at least one processor to: when a first computing framework performs an operation on a first data table, determine whether data information corresponding to the first data table exists in a distributed memory system; and, when the first data table does not exist in the distributed memory system, perform the operation on the first data table and synchronize the first data table to the distributed memory system.
Optionally, the at least one processor further performs: when data information corresponding to the first data table exists in the distributed memory system, causing the first computing framework to load the data information corresponding to the data in the first data table from the distributed memory system.
Optionally, the at least one processor synchronizing the data information corresponding to the first data table to the distributed memory system includes: determining whether the first data table is a data table that may be shared by different computing frameworks, where the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans; and, if the first data table is such a table, synchronizing the data information corresponding to the first data table to the distributed memory system.
Optionally, performing the operation on the first data table includes running a query execution plan for the first data table.
Optionally, the at least one processor determining whether data information corresponding to the first data table exists in the distributed memory system includes: the first computing framework obtaining the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, where the storage class of each of the at least two computing frameworks has been extended; and determining whether data information corresponding to the first data table exists in the distributed memory system.
Another aspect of the present disclosure provides a data processing system that includes a determination module and a synchronization module. The determination module is configured to determine, when a first computing framework performs an operation on a first data table, whether data information corresponding to the first data table exists in a distributed memory system. The synchronization module is configured to, if it does not exist, perform the operation on the first data table and synchronize the data information corresponding to the first data table to the distributed memory system.
Optionally, the system further includes a loading module configured to, when data information corresponding to the first data table exists in the distributed memory system, cause the first computing framework to load the data information corresponding to the first data table from the distributed memory system.
Optionally, the synchronization module includes a first determination submodule and a synchronization submodule. The first determination submodule is configured to determine whether the first data table is a data table that may be shared by different computing frameworks, where the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans. The synchronization submodule is configured to, when the first data table is a data table that may be shared by different computing frameworks, synchronize the data information corresponding to the first data table to the distributed memory system.
Optionally, performing the operation on the first data table includes running a query execution plan for the first data table.
Optionally, the determination module includes an acquisition submodule and a second determination submodule. The acquisition submodule is configured to cause the first computing framework to obtain the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, where the storage class of each of the at least two computing frameworks has been extended. The second determination submodule is configured to determine whether data information corresponding to the first data table exists in the distributed memory system.
Another aspect of the present disclosure provides a non-volatile storage medium storing computer-executable instructions that, when executed, implement the method described above.
Brief description of the drawings
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 schematically shows a server cluster according to an embodiment of the present disclosure;
Fig. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of a data processing method according to an embodiment of the present disclosure;
Fig. 4 schematically shows a flowchart of synchronizing the data information corresponding to the first data table to the distributed memory system according to an embodiment of the present disclosure;
Fig. 5 schematically shows a flowchart of determining whether data information corresponding to the first data table exists in the distributed memory system according to an embodiment of the present disclosure;
Fig. 6 schematically shows a data processing system according to an embodiment of the present disclosure;
Fig. 7 schematically shows a data processing system according to an embodiment of the present disclosure;
Fig. 8 schematically shows a data processing system according to an embodiment of the present disclosure;
Fig. 9 schematically shows a determination module according to an embodiment of the present disclosure; and
Fig. 10 schematically shows a block diagram of a compute node in a server cluster according to an embodiment of the present disclosure.
Detailed description
Hereinafter, embodiments of the present disclosure are described with reference to the accompanying drawings. It should be understood, however, that these descriptions are only exemplary and are not intended to limit the scope of the present disclosure. In addition, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present disclosure.
The terms used herein are only for describing specific embodiments and are not intended to limit the present disclosure. The words "a", "an", and "the" as used herein also include the meaning of "multiple" and "a plurality of", unless the context clearly indicates otherwise. In addition, the terms "include", "comprise", and the like as used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Some block diagrams and/or flowcharts are shown in the drawings. It should be understood that some blocks in the block diagrams and/or flowcharts, or combinations of them, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flowcharts.
Therefore, the technology of the present disclosure can be implemented in the form of hardware and/or software (including firmware, microcode, and the like). In addition, the technology of the present disclosure can take the form of a computer program product on a computer-readable medium that stores instructions, where the computer program product can be used by, or in connection with, an instruction execution system. In the context of the present disclosure, a computer-readable medium can be any medium that can contain, store, communicate, propagate, or transport instructions. For example, a computer-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of computer-readable media include: magnetic storage devices, such as magnetic tape or hard disks (HDD); optical storage devices, such as compact discs (CD-ROM); memory, such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
Because different data processing scenarios have different needs, multiple computing frameworks may coexist in the server cluster 100. For example, the Impala computing framework is preferred for interactive query scenarios with small result sets, while the Hive computing framework is preferred for extract-transform-load (ETL) processes with high data throughput. The data processed by the different computing frameworks is all stored in a distributed file system. Therefore, when operating on data, different computing frameworks need to locate the data table through the same metadata, load it into memory, and then perform distributed parallel computation.
Embodiments of the present disclosure provide a data processing method and a server cluster to which the method can be applied. When a first computing framework performs an operation on a first data table, the method can synchronize the data information corresponding to the first data table to a distributed memory system, so that other computing frameworks operating on the same data table can load it directly from the distributed memory system. This reduces disk read and write operations and improves the efficiency of queries on the same table across computing frameworks.
A distributed memory system is a system formed from the memory distributed across multiple machines; the data stored in it is periodically synchronized with the files on disk. Synchronizing data to the distributed memory system means loading the data from disk files into the memory of multiple machines and storing it in shards.
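The sharded storage described above can be illustrated with a minimal single-process sketch. It is not the patent's implementation: the class name ShardedMemoryStore, its methods, and the round-robin placement of rows are all assumptions made for illustration, and each "machine" is simply a Python dict.

```python
from typing import Dict, List, Tuple

Row = Tuple  # one table row, kept generic for the sketch


class ShardedMemoryStore:
    """Toy stand-in for a distributed memory system (hypothetical API).

    Each element of `nodes` plays the role of one machine's memory.
    """

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, List[Row]]] = [{} for _ in range(num_nodes)]

    def sync_table(self, table: str, rows: List[Row]) -> None:
        """Store rows (as if just read from a disk file) shard by shard.

        Round-robin placement is an assumption; the source only says the
        data is stored in shards across multiple machines.
        """
        for node in self.nodes:
            node[table] = []
        for i, row in enumerate(rows):
            self.nodes[i % len(self.nodes)][table].append(row)

    def has_table(self, table: str) -> bool:
        """Report whether any node holds a shard of the table."""
        return any(table in node for node in self.nodes)

    def load_table(self, table: str) -> List[Row]:
        """Gather all shards of a table; original row order is not preserved."""
        return [row for node in self.nodes for row in node.get(table, [])]
```

A caller would invoke sync_table once after reading a table from disk, and any later reader could call has_table and load_table instead of touching the disk again.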
Fig. 1 schematically shows a server cluster according to an embodiment of the present disclosure.
As shown in Fig. 1, the server cluster 100 may include at least one compute node 110 and a network 120. The network 120 is the medium that provides communication links between compute nodes 110. The network 120 may include various connection types, such as wired links, wireless communication links, or fiber optic cables. A compute node 110 may be a server that provides various services, for example a server that stores data tables or provides query or modification functions, but is not limited to these. The server cluster 100 can use the method of the embodiments of the present disclosure to speed up the loading of data tables.
It should be understood that the architecture in Fig. 1 is only an example. The components included in a specific architecture can be adjusted as circumstances require, and there can be any number of networks and compute nodes according to implementation needs.
The data processing method of the embodiments of the present disclosure is described below with reference to Fig. 2.
Fig. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present disclosure.
As shown in Fig. 2, the method includes operation S210 and operation S220.
In operation S210, when the first computing framework performs an operation on the first data table, it is determined whether data information corresponding to the first data table exists in the distributed memory system.
In operation S220, if it does not exist, the operation on the first data table is performed, and the data information corresponding to the first data table is synchronized to the distributed memory system.
When the first computing framework performs an operation on the first data table, the method can synchronize the data information corresponding to the first data table to the distributed memory system, so that other computing frameworks operating on the same data table can load it directly from the distributed memory system, which reduces disk read and write operations.
According to an embodiment of the present disclosure, in operation S210, the first computing framework performing the operation on the first data table may be the first computing framework running a query execution plan for the first data table. A query execution plan runs before the actual computation and is used to predict and optimize the query process. Therefore, when the query execution plan for the first data table is run, it is determined whether data information corresponding to the first data table exists in the distributed memory system, and, if it does not, the data information corresponding to the first data table is synchronized to the distributed memory system, so that subsequent operations on the table data can load it from distributed memory.
According to an embodiment of the present disclosure, the operation that the first computing framework performs on the first data table may also be another operation on the first data table, such as adding, deleting, or modifying information in the data table.
In operation S220, according to an embodiment of the present disclosure, when data information corresponding to the first data table does not exist in the distributed memory system, the operation that the first computing framework performs on the first data table is carried out, for example running a query execution plan for the first data table or adding, deleting, or modifying information in the data table, and in addition the data information corresponding to the first data table is synchronized to the distributed memory system, so that other computing frameworks operating on the same data table can load it directly from the distributed memory system.
Fig. 3 schematically shows a flowchart of a data processing method according to an embodiment of the present disclosure.
As shown in Fig. 3, the method includes operation S210, operation S220, and operation S230.
In operation S210, when the first computing framework performs an operation on the first data table, it is determined whether data information corresponding to the first data table exists in the distributed memory system. For details, refer to the operation described with reference to Fig. 2, which is not repeated here.
In operation S220, if it does not exist, the operation on the first data table is performed, and the data information corresponding to the first data table is synchronized to the distributed memory system.
In operation S230, if data information corresponding to the first data table exists in the distributed memory system, the first computing framework loads the data information corresponding to the first data table from the distributed memory system. For a data table already stored in the distributed memory system, the first computing framework no longer loads it from disk but loads it directly through the distributed memory system, which reduces disk read and write operations.
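Operations S210 to S230 can be sketched as a simple check-then-load routine. This is a minimal illustration under stated assumptions, not the patent's implementation: the ComputeFramework class, the operate_on method, and the disk_reads counter are hypothetical names, a plain dict stands in for the distributed memory system, and the "operation" is reduced to returning the table rows.

```python
from typing import Callable, Dict, List


class ComputeFramework:
    """Hypothetical stand-in for one computing framework in the cluster."""

    def __init__(
        self,
        name: str,
        shared_memory: Dict[str, List[tuple]],
        read_from_disk: Callable[[str], List[tuple]],
    ) -> None:
        self.name = name
        self.shared_memory = shared_memory    # stands in for the distributed memory system
        self.read_from_disk = read_from_disk  # stands in for a disk load of the table
        self.disk_reads = 0                   # counts disk loads, for illustration only

    def operate_on(self, table: str) -> List[tuple]:
        # S210: determine whether the table's data is already in shared memory.
        if table in self.shared_memory:
            # S230: load directly from shared memory, with no disk access.
            return self.shared_memory[table]
        # S220: perform the operation (here, just a load from disk) ...
        rows = self.read_from_disk(table)
        self.disk_reads += 1
        # ... and synchronize the table's data to shared memory for other frameworks.
        self.shared_memory[table] = rows
        return rows
```

With two instances sharing one dict, the first framework to touch a table pays the disk read and the second one is served from memory, which is the effect the method aims for.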
Fig. 4 schematically shows a flowchart of synchronizing, in operation S220, the data information corresponding to the first data table to the distributed memory system according to an embodiment of the present disclosure.
As shown in Fig. 4, synchronizing the data information corresponding to the first data table to the distributed memory system includes operation S221 and operation S222.
In operation S221, it is determined whether the first data table is a data table that may be shared by different computing frameworks. According to an embodiment of the present disclosure, the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans. According to an embodiment of the present disclosure, before multiple computing frameworks compute in parallel, the server cluster obtains the data table query plans in order to predict and optimize the query process as a whole. From these plans, the data tables that will be loaded and how they are shared can be obtained, which in turn determines the data tables that may be shared by different computing frameworks.
In operation S222, if the first data table is a data table that may be shared by different computing frameworks, the data information corresponding to the first data table is synchronized to the distributed memory system.
By identifying the data tables that may be shared, the method avoids loading into distributed memory a data table that is operated on only once, which saves system resources.
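Operations S221 and S222 can be sketched as follows. The source does not specify how the query plan statistics are computed, so this sketch assumes the simplest possible rule: a table counts as "possibly shared" when the planned queries of at least two different frameworks reference it. The function names and the (framework, tables) input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_shareable_tables(
    planned_queries: Iterable[Tuple[str, List[str]]],
) -> Set[str]:
    """Return tables referenced by the plans of two or more frameworks.

    `planned_queries` is an assumed format: (framework name, tables the plan will load).
    """
    frameworks_per_table: Dict[str, Set[str]] = defaultdict(set)
    for framework, tables in planned_queries:
        for table in tables:
            frameworks_per_table[table].add(framework)
    return {t for t, fws in frameworks_per_table.items() if len(fws) >= 2}


def maybe_sync(
    table: str,
    rows: List[tuple],
    shareable: Set[str],
    shared_memory: Dict[str, List[tuple]],
) -> bool:
    """S221: check shareability. S222: synchronize only if the table may be shared."""
    if table in shareable:
        shared_memory[table] = rows
        return True
    return False
```

Under this rule a table used by a single framework is never copied into shared memory, which matches the stated goal of not spending memory on tables that are operated on only once.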
Fig. 5 schematically shows a flowchart of determining, in operation S210, whether data information corresponding to the first data table exists in the distributed memory system according to an embodiment of the present disclosure.
As shown in Fig. 5, determining whether data information corresponding to the first data table exists in the distributed memory system includes operation S211 and operation S212.
In operation S211, the first computing framework obtains the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, where the storage class of each of the at least two computing frameworks has been extended.
In existing computing frameworks, unless the framework source code is modified, each computing framework has to load data tables on its own. According to an embodiment of the present disclosure, the storage classes of each computing framework can be extended, for example the memoryStore class and the tachyonStore class, and the execution logic of the operators can be modified so that they support redirected queries for specific metadata. For example, a data table already stored in the distributed memory system is no longer loaded from disk but is loaded directly through the distributed memory system. In this way, one computing framework can obtain the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system.
In operation S212, it is determined whether data information corresponding to the first data table exists in the distributed memory system. Once the first computing framework is able to obtain the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, it can determine whether the first data table it needs already exists in the distributed memory system.
By extending the storage classes of the computing frameworks, the method overcomes the difficulty a computing framework has in obtaining data tables that other computing frameworks have already synchronized, and achieves the technical effect that different computing frameworks can share a data table once it has been loaded.
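The idea of extending a storage class so that reads are redirected can be sketched with ordinary subclassing. This is only an analogy under stated assumptions: DiskStore and SharedMemoryStore are hypothetical classes, not the memoryStore or tachyonStore classes named above, and no real framework API is used.

```python
from typing import Dict, List, Set


class DiskStore:
    """Hypothetical per-framework storage class that always reads from disk."""

    def __init__(self, disk: Dict[str, List[tuple]]) -> None:
        self.disk = disk
        self.disk_reads = 0  # counts disk loads, for illustration only

    def read_table(self, table: str) -> List[tuple]:
        self.disk_reads += 1
        return self.disk[table]


class SharedMemoryStore(DiskStore):
    """Extended storage class: redirects reads to a shared in-memory catalog."""

    def __init__(
        self, disk: Dict[str, List[tuple]], shared_memory: Dict[str, List[tuple]]
    ) -> None:
        super().__init__(disk)
        self.shared_memory = shared_memory  # stands in for the distributed memory system

    def synced_tables(self) -> Set[str]:
        """S211: tables that any framework has already synchronized."""
        return set(self.shared_memory)

    def read_table(self, table: str) -> List[tuple]:
        # S212: is the needed table already in shared memory?
        if table in self.synced_tables():
            return self.shared_memory[table]  # redirected read, no disk access
        rows = super().read_table(table)      # fall back to the original disk path
        self.shared_memory[table] = rows      # make it visible to other frameworks
        return rows
```

Because the subclass keeps the read_table signature of its parent, code that calls the storage class does not need to change, which is the point of extending the class rather than rewriting the callers.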
Fig. 6 schematically shows a data processing system 600 according to an embodiment of the present disclosure.
As shown in Fig. 6, the data processing system 600 includes a determination module 610 and a synchronization module 620.
The determination module 610, which for example performs operation S210 described above with reference to Fig. 2, is configured to determine, when the first computing framework performs an operation on the first data table, whether data information corresponding to the first data table exists in the distributed memory system.
The synchronization module 620, which for example performs operation S220 described above with reference to Fig. 2, is configured to, if it does not exist, perform the operation on the first data table and synchronize the data information corresponding to the first data table to the distributed memory system.
Fig. 7 schematically shows a data processing system 700 according to an embodiment of the present disclosure.
As shown in Fig. 7, the data processing system 700 includes the determination module 610, the synchronization module 620, and a loading module 730.
The determination module 610, which for example performs operation S210 described above with reference to Fig. 3, is configured to determine, when the first computing framework performs an operation on the first data table, whether data information corresponding to the first data table exists in the distributed memory system.
The synchronization module 620, which for example performs operation S220 described above with reference to Fig. 3, is configured to, if it does not exist, perform the operation on the first data table and synchronize the data information corresponding to the first data table to the distributed memory system.
The loading module 730, which for example performs operation S230 described above with reference to Fig. 3, is configured to, when data information corresponding to the first data table exists in the distributed memory system, cause the first computing framework to load the data information corresponding to the first data table from the distributed memory system.
Fig. 8 schematically shows the synchronization module 620 according to an embodiment of the present disclosure.
As shown in Fig. 8, the synchronization module 620 includes a first determination submodule 621 and a synchronization submodule 622.
The first determination submodule 621, which for example performs operation S221 described above with reference to Fig. 4, is configured to determine whether the first data table is a data table that may be shared by different computing frameworks, where the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans.
The synchronization submodule 622, which for example performs operation S222 described above with reference to Fig. 4, is configured to, when the first data table is a data table that may be shared by different computing frameworks, synchronize the data information corresponding to the first data table to the distributed memory system.
Fig. 9 schematically shows the determination module 610 according to an embodiment of the present disclosure.
As shown in Fig. 9, the determination module 610 includes an acquisition submodule 611 and a second determination submodule 612.
The acquisition submodule 611, which for example performs operation S211 described above with reference to Fig. 5, is configured to cause the first computing framework to obtain the data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system.
The second determination submodule 612, which for example performs operation S212 described above with reference to Fig. 5, is configured to determine whether data information corresponding to the first data table exists in the distributed memory system.
It can be understood that the determination module 610, the acquisition submodule 611, the second determination submodule 612, the synchronization module 620, the first determination submodule 621, the synchronization submodule 622, and the loading module 730 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to embodiments of the present invention, at least one of the determination module 610, the acquisition submodule 611, the second determination submodule 612, the synchronization module 620, the first determination submodule 621, the synchronization submodule 622, and the loading module 730 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package, or an application-specific integrated circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging circuits; or may be implemented in an appropriate combination of the three implementation approaches of software, hardware, and firmware. Alternatively, at least one of the determination module 610, the acquisition submodule 611, the second determination submodule 612, the synchronization module 620, the first determination submodule 621, the synchronization submodule 622, and the loading module 730 may be implemented at least partially as a computer program module that, when run by a computer, performs the functions of the corresponding module.
Fig. 10 schematically shows a block diagram of a computing node in the server cluster according to an embodiment of the present disclosure.
As shown in Fig. 10, the server cluster includes at least one computing node 1000. According to an embodiment of the present disclosure, the computing node 1000 includes one processor 1010 and one memory 1020. In other embodiments of the present disclosure, the computing node 1000 may include any number of processors 1010 or memories 1020. The computing node 1000 may, for example, implement the computing node 110 described above with reference to Fig. 1 and form the server cluster 100. The server cluster 100 may perform the methods described above with reference to Figs. 2 to 5, so that when the first computing framework performs an operation on the first data table, the data information corresponding to that data table is synchronized to the distributed memory system. When other computing frameworks later operate on the same data table, they can load the data information directly from the distributed memory system, which reduces disk read and write operations.
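The flow summarized above can be illustrated with a short sketch. It assumes a plain in-process dictionary in place of the distributed memory system and a dictionary in place of disk storage; the classes `DistributedMemory` and `Framework`, and the `disk_reads` counter, are hypothetical names introduced only for illustration and are not the implementation of the disclosure.

```python
class DistributedMemory:
    """Hypothetical in-process stand-in for the distributed memory system."""

    def __init__(self):
        self._tables = {}

    def get(self, table):
        return self._tables.get(table)

    def put(self, table, info):
        self._tables[table] = info


class Framework:
    """Hypothetical computing framework that reads tables from a simulated disk."""

    def __init__(self, name, disk, memory):
        self.name = name
        self.disk = disk      # dict: table name -> rows, simulating disk storage
        self.memory = memory
        self.disk_reads = 0

    def operate(self, table):
        info = self.memory.get(table)
        if info is not None:
            # Data information already synchronized: load it from memory.
            return info
        # Not present: perform the operation (here, a disk read) and synchronize.
        self.disk_reads += 1
        info = list(self.disk[table])
        self.memory.put(table, info)
        return info


disk = {"orders": [("o1", 10), ("o2", 25)]}
memory = DistributedMemory()
first = Framework("first", disk, memory)
other = Framework("other", disk, memory)

first.operate("orders")  # reads from disk, then synchronizes
other.operate("orders")  # served from the distributed memory system
print(first.disk_reads, other.disk_reads)  # 1 0
```

In this sketch only the first computing framework touches the simulated disk; the second operation on the same data table is served entirely from memory.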
Specifically, the processor 1010 may include, for example, a general-purpose microprocessor, an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), and so on. The processor 1010 may also include on-board memory for caching purposes. The processor 1010 may be a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure described with reference to Figs. 2 to 5.
The memory 1020 may be, for example, any medium that can contain, store, communicate, propagate, or transport instructions. For example, a readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of readable storage media include: magnetic storage devices, such as magnetic tape or hard disk drives (HDD); optical storage devices, such as compact discs (CD-ROM); semiconductor memories, such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
The memory 1020 may include a computer program 1021. The computer program 1021 may include code or computer-executable instructions which, when executed by the processor 1010, cause the processor 1010 to perform, for example, the method flows described above in connection with Figs. 2 to 5 and any variations thereof.
The computer program 1021 may be configured with, for example, computer program code comprising computer program modules. For example, in an exemplary embodiment, the code in the computer program 1021 may include one or more program modules, for example module 1021A, module 1021B, .... It should be noted that the way the modules are divided and their number are not fixed. Those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation. When these combinations of program modules are executed by the processor 1010, the processor 1010 can perform, for example, the method flows described above in connection with Figs. 2 to 5 and any variations thereof.
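The point that module division is not fixed can be illustrated with a small sketch in which the same behaviour is packaged either as two program modules or as one merged module. The function names `module_1021a`, `module_1021b`, `merged_module`, and `run_program`, and the dictionary used as shared memory, are hypothetical and chosen only for illustration; the disclosure does not specify what each module contains.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Module = Callable[[Context], Context]


def module_1021a(ctx: Context) -> Context:
    """Hypothetical module: judge whether the table is already in shared memory."""
    ctx["exists"] = ctx["table"] in ctx["memory"]
    return ctx


def module_1021b(ctx: Context) -> Context:
    """Hypothetical module: load from memory, or execute and synchronize."""
    memory = ctx["memory"]
    if ctx["exists"]:
        ctx["result"] = memory[ctx["table"]]
    else:
        ctx["result"] = f"rows of {ctx['table']}"  # placeholder for the real operation
        memory[ctx["table"]] = ctx["result"]
    return ctx


def merged_module(ctx: Context) -> Context:
    """The same behaviour packaged as a single module."""
    return module_1021b(module_1021a(ctx))


def run_program(modules: List[Module], ctx: Context) -> Context:
    """Run the modules in order, as a processor would execute the combined program."""
    for module in modules:
        ctx = module(ctx)
    return ctx


shared = {}
split = run_program([module_1021a, module_1021b], {"table": "orders", "memory": shared})
merged = run_program([merged_module], {"table": "orders", "memory": shared})
print(split["exists"], merged["exists"])  # False True
```

Both packagings produce the same result for the same table; the second run finds the data already synchronized by the first.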
According to embodiments of the present invention, at least one of the judging module 610, the acquisition submodule 611, the second judging submodule 612, the synchronization module 620, the first judging submodule 621, the synchronization submodule 622, and the loading module 730 may be implemented as a computer program module described with reference to Fig. 10 which, when executed by the processor 1010, implements the corresponding operations described above.
Those skilled in the art will understand that the features described in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly described in the present disclosure. In particular, the features described in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
Although the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art should understand that various changes in form and detail may be made to the present disclosure without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Therefore, the scope of the present disclosure should not be limited to the above embodiments, but should be determined not only by the appended claims but also by their equivalents.

Claims (10)

1. A data processing method, applied to a cluster computing process involving at least two computing frameworks, the method comprising:
when a first computing framework performs an operation on a first data table, determining whether data information corresponding to the first data table exists in a distributed memory system; and
if it does not exist, performing the operation on the first data table, and synchronizing the data information corresponding to the first data table to the distributed memory system.
2. The method according to claim 1, further comprising:
if the data information corresponding to the first data table exists in the distributed memory system, loading, by the first computing framework, the data information corresponding to the first data table from the distributed memory system.
3. The method according to claim 1, wherein synchronizing the data information corresponding to the first data table to the distributed memory system comprises:
determining whether the first data table belongs to data tables that may be shared by different computing frameworks, wherein the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans; and
if so, synchronizing the data information corresponding to the first data table to the distributed memory system.
4. The method according to claim 1, wherein performing the operation on the first data table comprises:
running a query execution plan for the first data table.
5. The method according to claim 1, wherein determining whether the data information corresponding to the first data table exists in the distributed memory system comprises:
obtaining, by the first computing framework, data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, wherein a storage class of each of the at least two computing frameworks has been extended; and
determining whether the data information corresponding to the first data table exists in the distributed memory system.
6. A server cluster, comprising:
at least one processor; and
at least one memory storing a computer-readable program which, when executed by the at least one processor, causes the at least one processor to:
in a case where a first computing framework performs an operation on a first data table, determine whether data information corresponding to the first data table exists in a distributed memory system; and
in a case where the first data table does not exist in the distributed memory system, perform the operation on the first data table, and synchronize the first data table to the distributed memory system.
7. The server cluster according to claim 6, wherein the at least one processor further performs:
in a case where the data information corresponding to the first data table exists in the distributed memory system, causing the first computing framework to load, from the distributed memory system, the data information corresponding to the data in the first data table.
8. The server cluster according to claim 6, wherein the at least one processor synchronizing the data information corresponding to the first data table to the distributed memory system comprises:
determining whether the first data table belongs to data tables that may be shared by different computing frameworks, wherein the data tables that may be shared by different computing frameworks are determined based on statistics of data table query plans; and
if so, synchronizing the data information corresponding to the first data table to the distributed memory system.
9. The server cluster according to claim 6, wherein the at least one processor performing the operation on the first data table comprises:
running a query execution plan for the first data table.
10. The server cluster according to claim 6, wherein the at least one processor determining whether the data information corresponding to the first data table exists in the distributed memory system comprises:
obtaining, by the first computing framework, data information corresponding to data tables that other computing frameworks have synchronized to the distributed memory system, wherein a storage class of each of the at least two computing frameworks has been extended; and
determining whether the data information corresponding to the first data table exists in the distributed memory system.
CN201710504831.6A 2017-06-27 2017-06-27 Data processing method and server cluster Pending CN107239570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710504831.6A CN107239570A (en) 2017-06-27 2017-06-27 Data processing method and server cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710504831.6A CN107239570A (en) 2017-06-27 2017-06-27 Data processing method and server cluster

Publications (1)

Publication Number Publication Date
CN107239570A true CN107239570A (en) 2017-10-10

Family

ID=59987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710504831.6A Pending CN107239570A (en) 2017-06-27 2017-06-27 Data processing method and server cluster

Country Status (1)

Country Link
CN (1) CN107239570A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976905A (en) * 2019-03-01 2019-07-05 联想(北京)有限公司 EMS memory management process, device and electronic equipment
CN110968599A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Inquiry method and device based on Impala

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6519592B1 (en) * 1999-03-31 2003-02-11 Verizon Laboratories Inc. Method for using data from a data query cache
CN105488155A (en) * 2015-11-30 2016-04-13 浪潮集团有限公司 Method for quickly querying mass data
CN105516284A (en) * 2015-12-01 2016-04-20 深圳市华讯方舟软件技术有限公司 Clustered database distributed storage method and device
CN105574010A (en) * 2014-10-13 2016-05-11 阿里巴巴集团控股有限公司 Data querying method and device
CN105700902A (en) * 2014-11-27 2016-06-22 航天信息股份有限公司 Data loading and refreshing method and apparatus
CN106021484A (en) * 2016-05-18 2016-10-12 中国电子科技集团公司第三十二研究所 Customizable multi-mode big data processing system based on memory calculation
CN106651748A (en) * 2015-10-30 2017-05-10 华为技术有限公司 Image processing method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6519592B1 (en) * 1999-03-31 2003-02-11 Verizon Laboratories Inc. Method for using data from a data query cache
CN105574010A (en) * 2014-10-13 2016-05-11 阿里巴巴集团控股有限公司 Data querying method and device
CN105700902A (en) * 2014-11-27 2016-06-22 航天信息股份有限公司 Data loading and refreshing method and apparatus
CN106651748A (en) * 2015-10-30 2017-05-10 华为技术有限公司 Image processing method and apparatus
CN105488155A (en) * 2015-11-30 2016-04-13 浪潮集团有限公司 Method for quickly querying mass data
CN105516284A (en) * 2015-12-01 2016-04-20 深圳市华讯方舟软件技术有限公司 Clustered database distributed storage method and device
CN106021484A (en) * 2016-05-18 2016-10-12 中国电子科技集团公司第三十二研究所 Customizable multi-mode big data processing system based on memory calculation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卜尧; 吴斌; 陈玉峰; 白德盟: "BDAP: a Spark-based data mining tool platform", 《中国科学技术大学学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968599A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Inquiry method and device based on Impala
CN110968599B (en) * 2018-09-30 2023-04-07 北京国双科技有限公司 Inquiry method and device based on Impala
CN109976905A (en) * 2019-03-01 2019-07-05 联想(北京)有限公司 EMS memory management process, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171010