CN110222018A - Data summarization execution method and device - Google Patents
- Publication number
- CN110222018A CN201910397774.5A
- Authority
- CN
- China
- Prior art keywords
- file
- aggregation process
- process module
- target aggregation
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present invention provides a data summarization execution method and device. The method is applied to a terminal that supports at least two aggregation processing modules and comprises: obtaining the file attribute of an input file of a target aggregation processing module; matching the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result; and, if the matching result is yes, processing the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data summarization execution method and device.
Background
With the widespread use of big data processing technology, and in particular the growing maturity of the open-source Hadoop system (Hadoop is a distributed system infrastructure developed by the Apache Foundation), Hadoop has become an important piece of infrastructure in data warehouse construction. A Hadoop system is divided into data storage, HDFS (a distributed file system), and data computation, MapReduce. MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB).
In data warehouse construction, basic data is generally stored in the Hive table format. The Hive table format is similar to that of an ordinary relational database, except that the underlying data exists in the form of HDFS files (HFile).
In the usual processing scheme, a separate MapReduce program is written for each calculation task. Each program sets its own underlying Hive input files, implements the corresponding map and reduce logic, and generates the corresponding result table.
As a result, executing multiple calculation tasks requires writing multiple MapReduce programs, and even when different MapReduce programs read the same table files, those files must be read repeatedly. Whether the programs run sequentially or in parallel, they occupy a large amount of system resources and time. Creating a new calculation task requires writing and submitting yet another MapReduce program, which increases processing complexity.
Summary of the invention
In view of the problems in the prior art, embodiments of the present invention provide a data summarization execution method and device.
An embodiment of the present invention provides a data summarization execution method, applied to a terminal that supports at least two aggregation processing modules, the method comprising:
obtaining the file attribute of an input file of a target aggregation processing module;
matching the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result;
if the matching result is yes, processing the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
An embodiment of the present invention provides a data summarization execution device, applied to a terminal that supports at least two aggregation processing modules, the device comprising:
an acquiring unit, configured to obtain the file attribute of an input file of a target aggregation processing module;
a matching unit, configured to match the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result;
a processing unit, configured to, when the matching result is yes, process the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
An embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the data summarization execution method described above when executing the program.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program implements the steps of the data summarization execution method described above when executed by a processor.
In the data summarization execution method and device provided by the embodiments of the present invention, at least two aggregation processing modules are supported, each executing a different summarization task. The file attribute of the input file of a target aggregation processing module is obtained and matched against the property parameters; after a successful match, the input file is processed according to the corresponding summarization task parameters to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the data summarization execution method of the present invention;
Fig. 2 is a flowchart of an embodiment of the data summarization execution method of the present invention;
Fig. 3 is a structural diagram of an embodiment of the data summarization execution device of the present invention;
Fig. 4 is a structural diagram of an embodiment of the data summarization execution device of the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the electronic device of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 shows a data summarization execution method provided by an embodiment of the present invention. The method is applied to a terminal that supports at least two aggregation processing modules and comprises:
S11, obtaining the file attribute of an input file of a target aggregation processing module;
S12, matching the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result;
S13, if the matching result is yes, processing the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
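Steps S11 to S13 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the class `AggregationModule`, the helper names, the assumed path layout, and the file name `part-0` are all hypothetical, and the summarization task parameters are reduced to a single callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AggregationModule:
    """Hypothetical stand-in for one aggregation processing module."""
    module_id: str
    table_names: set                 # property parameters: tables the module reads
    process: Callable[[str], str]    # summarization task parameters, as one callable


def get_file_attribute(path: str) -> str:
    """S11: derive the file attribute; here, just the table directory name."""
    # Assumed layout: /warehouse/hive/db/<table>/<index>=<value>/<file>
    return path.strip("/").split("/")[3]


def execute(module: AggregationModule, path: str) -> Optional[str]:
    attribute = get_file_attribute(path)        # S11: obtain the file attribute
    matched = attribute in module.table_names   # S12: match against property parameters
    if not matched:
        return None                             # no match: the module skips this file
    return module.process(path)                 # S13: process the input file


module_a = AggregationModule("A", {"table_base"}, lambda p: f"A processed {p}")
result = execute(module_a, "/warehouse/hive/db/table_base/index1=value1/part-0")
skipped = execute(module_a, "/warehouse/hive/db/other_table/index1=value1/part-0")
```

In this sketch a failed match simply returns `None`; the source does not say what happens to unmatched files beyond the module not processing them.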
Regarding steps S11 to S13, it should be noted that, in this embodiment, the method is applied to a terminal that supports at least two aggregation processing modules, and the terminal has a single built-in MapReduce application that performs the data summarization processing. In current data summarization processing, each summarization task is controlled by its own independent MapReduce application and must follow the MapReduce framework. MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB) and comprises two processing stages, "Map" and "Reduce". In an implementation, a Map function is specified to map a set of key-value pairs to a new set of key-value pairs, and a Reduce function is specified to complete the processing. In this embodiment, by contrast, the summarization tasks no longer each require a separate MapReduce application. Instead, the different summarization tasks correspond to different independent modules within one summarizing MapReduce application, namely the aggregation processing modules. Each module only needs a configuration file and an execution file that conform to a common interface, which reduces development difficulty and workload. During summarization, all summarization tasks run as one MapReduce program, and a single run can produce the results needed by all aggregation processing modules. Modules can be developed in a customized way without considering priorities among the different summarization calculation tasks, which makes the scheme flexible to apply.
This embodiment involves multiple aggregation processing modules, which must be configured. A configuration request input by the user is therefore obtained; the configuration request includes the number of aggregation processing modules and the aggregation processing module identifiers. The number determines how many modules are set up, and the identifiers distinguish the different processing modules.
After the aggregation processing modules have been set up, parameters must be configured for them to define which summarization task each module handles, what resource allocation the task requires, and so on.
To this end, the configuration request input by the user further includes a configuration file and an execution file. The configuration file includes: the name of the Hive base data table to be read (Hive is a data warehouse tool based on Hadoop that can map structured data files to database tables), the output file directory, the index of the table to be read, the number of Reduce tasks, and the resource information of each Map/Reduce task (parameters such as CPU, memory, and JVM limits). The execution file includes the specific Map-stage task and Reduce-stage task that the processing module needs to execute.
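The source does not specify a file format for the configuration file. As a hedged illustration, its contents could be modeled as below; the class names, field names, sample values, and the output path are all assumptions chosen to mirror the items listed above.

```python
from dataclasses import dataclass


@dataclass
class TaskResources:
    """Per-task resource limits (field names are illustrative)."""
    cpu_cores: int
    memory_mb: int
    jvm_heap_mb: int


@dataclass
class ModuleConfig:
    """Configuration-file contents for one aggregation processing module."""
    module_id: str
    hive_table_names: list    # Hive base data tables to read
    read_index: dict          # index name -> index value of the table to read
    output_dir: str           # output file directory
    reduce_tasks: int         # number of Reduce tasks
    resources: TaskResources  # resource information of each Map/Reduce task


def parse_config(raw: dict) -> ModuleConfig:
    """Build a ModuleConfig from a plain dictionary, e.g. one loaded from JSON."""
    fields = dict(raw)
    fields["resources"] = TaskResources(**raw["resources"])
    return ModuleConfig(**fields)


raw_config = {
    "module_id": "ModuleA",
    "hive_table_names": ["table_base"],
    "read_index": {"index1": "value1"},
    "output_dir": "/warehouse/output/module_a",  # made-up path for illustration
    "reduce_tasks": 4,
    "resources": {"cpu_cores": 2, "memory_mb": 2048, "jvm_heap_mb": 1536},
}
config = parse_config(raw_config)
```

The execution file (the Map-stage and Reduce-stage tasks) is deliberately left out here, since it holds code rather than settings.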
After the configuration file and the execution file are obtained, the property parameters of each aggregation processing module are set according to the configuration file, and the summarization task parameters of each aggregation processing module are set according to the execution file. The property parameters and the summarization task parameters therefore correspond, respectively, to the contents of the configuration file and of the execution file described above.
After configuration, the identifier, property parameters, and summarization task parameters of each aggregation processing module are combined into configuration information and stored in a configuration directory.
During summary file processing, a Map-stage task and a Reduce-stage task must be executed. The summarization task parameters of each aggregation processing module include a MapRun function and a ReduceRun function. Before execution, the Reduce task numbers in the property parameters of all aggregation processing modules are summed, and the maximum is taken over the resource occupation values of the Map/Reduce tasks in the property parameters of all aggregation processing modules. The summarizing MapReduce application thereby knows and satisfies the running requirements of all modules, which improves system efficiency and saves resources.
Specifically, the property parameters of each aggregation processing module are read; they include the Reduce task number and the resource information of the Map/Reduce tasks. For additive resource parameters, such as the Reduce task number, an addition is performed, so that the Reduce task number of the summarizing MapReduce application is the sum of the Reduce task numbers set by all aggregation processing modules. For limiting resource parameters, such as CPU, memory, and JVM limits, a Max operation is performed, so that the resource occupation value of the Map/Reduce tasks of the summarizing MapReduce application is the maximum of the values set by all aggregation processing modules.
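The sum-and-max rule above can be sketched in a few lines. The dictionary keys and the sample numbers are assumptions; only the rule itself (add the Reduce task numbers, take the maximum of each resource limit) comes from the text.

```python
# Limiting resource parameters: merged with max() across modules.
LIMIT_KEYS = ("cpu_cores", "memory_mb", "jvm_heap_mb")


def merge_requirements(modules: list) -> dict:
    """Combine per-module property parameters into one job-level setting."""
    merged = {"reduce_tasks": 0}
    merged.update({key: 0 for key in LIMIT_KEYS})
    for module in modules:
        # Additive parameter: the job needs the sum of all Reduce task numbers.
        merged["reduce_tasks"] += module["reduce_tasks"]
        # Limiting parameters: the job must satisfy the most demanding module.
        for key in LIMIT_KEYS:
            merged[key] = max(merged[key], module[key])
    return merged


module_a = {"reduce_tasks": 4, "cpu_cores": 2, "memory_mb": 2048, "jvm_heap_mb": 1536}
module_b = {"reduce_tasks": 6, "cpu_cores": 1, "memory_mb": 4096, "jvm_heap_mb": 3072}
job_settings = merge_requirements([module_a, module_b])
```

With these sample values the merged job asks for 10 Reduce tasks, 2 CPU cores, 4096 MB of memory, and a 3072 MB JVM heap.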
The input files of the aggregation processing modules are read in sequence, and the file attribute of each input file is obtained. The file attribute is then matched against the property parameters corresponding to the target aggregation processing module to obtain a matching result: it is judged whether the preset attribute information in the file attribute is present in the property parameters. If it is, the match succeeds; otherwise, the match fails.
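A minimal sketch of this membership check follows. It assumes that the preset attribute information is the Hive base data table name, as in the Fig. 2 embodiment; the dictionary keys are hypothetical.

```python
def matches(file_attribute: dict, property_parameters: dict) -> bool:
    """Return True when the file's table name appears in the module's parameters."""
    return file_attribute["table_name"] in property_parameters["table_names"]


file_attribute = {"table_name": "table_base", "index": {"index1": "value1"}}
params_a = {"table_names": {"table_base", "table_extra"}}  # module that reads table_base
params_b = {"table_names": {"table_other"}}                # module that does not

match_a = matches(file_attribute, params_a)
match_b = matches(file_attribute, params_b)
```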
A successful match indicates that the target aggregation processing module can execute its Map-stage and Reduce-stage tasks on the input file.
The summarization task parameters of each aggregation processing module include a MapRun function and a ReduceRun function. Therefore, when the matching result is yes, the input file is processed according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
In the data summarization execution method provided by this embodiment, at least two aggregation processing modules are supported, each executing a different summarization task. The file attribute of the input file of a target aggregation processing module is obtained and matched against the property parameters; after a successful match, the input file is processed according to the corresponding summarization task parameters to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Fig. 2 shows a data summarization execution method provided by an embodiment of the present invention. The method is applied to a terminal that supports at least two aggregation processing modules and comprises:
S21, obtaining the file attribute of an input file of a target aggregation processing module, the file attribute including a Hive base data table name;
S22, judging whether the Hive base data table name in the file attribute matches the property parameters corresponding to the target aggregation processing module to obtain a matching result, the property parameters including a Hive base data table name;
S23, when the matching result is yes, executing the MapRun function corresponding to the target aggregation processing module to perform mapping processing on the input file and obtain an intermediate file, the attribute information of the intermediate file including the module identifier of the target aggregation processing module;
S24, calling the intermediate file, determining the target aggregation processing module according to the module identifier in the attribute information of the intermediate file, and executing the ReduceRun function corresponding to the target aggregation processing module to perform reduction processing on the intermediate file and obtain a processing result.
Regarding steps S21 to S24, it should be noted that the file attribute of each input file includes a Hive base data table name and an index name, from which the corresponding file directory can be generated. For example, for summarization calculation task A, if the base data table to be read is named table_base and the value of the primary index index1 is value1, the path of the file to be read is: /warehouse/hive/db/table_base/index1=value1/***.
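The directory derivation in this example can be sketched as follows. The warehouse root is taken from the example path; treating it as a fixed constant and the function name `input_directory` are assumptions. The elided file name (`***`) is left out, so the function returns only the directory.

```python
from pathlib import PurePosixPath

# Root directory taken from the example path; assumed constant for illustration.
WAREHOUSE_ROOT = PurePosixPath("/warehouse/hive/db")


def input_directory(table_name: str, index_name: str, index_value: str) -> str:
    """Build the directory that holds the files of one table and index value."""
    return str(WAREHOUSE_ROOT / table_name / f"{index_name}={index_value}")


directory = input_directory("table_base", "index1", "value1")
```

`PurePosixPath` is used so that the result keeps forward slashes regardless of the operating system on which the sketch runs.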
After the file attribute of the input file of the target aggregation processing module is obtained, the input file is placed in a preset read-in file set so that no input file is read in more than once.
For example, if summarization calculation task B also reads table_base with the primary index index1 equal to value1, the file does not need to be read in again.
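The read-in file set can be illustrated with an ordinary set, as in this sketch. The function name and the third task C with its table are made up for the example; tasks A and B mirror the text.

```python
def collect_inputs(requests: list) -> list:
    """Return each requested input directory once, in first-seen order."""
    read_in = set()   # the preset read-in file set
    ordered = []
    for _task, directory in requests:
        if directory in read_in:
            continue  # already scheduled by an earlier task; do not read again
        read_in.add(directory)
        ordered.append(directory)
    return ordered


requests = [
    ("A", "/warehouse/hive/db/table_base/index1=value1"),
    ("B", "/warehouse/hive/db/table_base/index1=value1"),   # same input as task A
    ("C", "/warehouse/hive/db/table_other/index1=value1"),  # hypothetical third task
]
inputs = collect_inputs(requests)
```

Here tasks A and B share one entry, so the job reads two directories instead of three.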
In the subsequent matching of the file attribute against the property parameters, however, only the Hive base data table name is extracted from the file path, and it is judged whether the corresponding data table name is present in the property parameters.
If the matching result is yes, the input file is processed according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result. The processing comprises a Map stage and a Reduce stage, as follows:
The Map stage:
The configuration files of all aggregation processing modules are loaded, and an execution instance of each aggregation processing module (Module) is generated from its name. Because all aggregation processing modules implement the same common interface, this is easy to implement in software and efficient. Once the instance has been generated, the mapRun function of the aggregation processing module can be executed. The following operation is then performed on every record of the input file:
All aggregation processing modules are traversed, and for each one it is judged whether the file path of the input file needs to be processed by that module. For example, if the path of the file is /warehouse/hive/db/table_base/index1=value1/*** and the tables read by Module A do not include table_base, the mapRun function of Module A is not executed; otherwise, the mapRun function of the Module is executed.
After the mapRun function of Module A is executed, its output must be written to the intermediate file in <Key, Value> form. The prefix of the Key is set to the name of Module A, so that the complete Key is: ModuleName + business primary key ServiceKey. This ensures that different intermediate files can be matched to their Modules; the intermediate file name prefix is the same for the same Module.
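The Map stage described above can be sketched in plain Python, without Hadoop. The module classes, the comma-separated record format, the file name `part-0`, and the tab separator between the module name and ServiceKey are all assumptions; the source only states that the Key is ModuleName followed by ServiceKey.

```python
SEPARATOR = "\t"  # assumed delimiter between module name and ServiceKey


class ModuleA:
    name = "ModuleA"
    tables = {"table_base"}   # tables this module reads

    def map_run(self, record: str):
        # Hypothetical record format: "<service_key>,<amount>"
        service_key, amount = record.split(",")
        yield service_key, amount


class ModuleB(ModuleA):
    name = "ModuleB"
    tables = {"table_other"}  # does not read table_base


def table_of(path: str) -> str:
    # Assumed layout: /warehouse/hive/db/<table>/<index>=<value>/<file>
    return path.strip("/").split("/")[3]


def map_stage(modules: list, path: str, records: list) -> list:
    """Run map_run of every matching module and prefix keys with its name."""
    table = table_of(path)
    intermediate = []
    for record in records:
        for module in modules:
            if table not in module.tables:
                continue  # this module does not read the table; skip it
            for service_key, value in module.map_run(record):
                intermediate.append((module.name + SEPARATOR + service_key, value))
    return intermediate


pairs = map_stage(
    [ModuleA(), ModuleB()],
    "/warehouse/hive/db/table_base/index1=value1/part-0",
    ["u1,3", "u2,5"],
)
```

An explicit separator is a design choice of this sketch: it lets the Reduce stage split the prefix from ServiceKey unambiguously even when one module name is a prefix of another.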
The Reduce stage:
The configuration files of all Modules are loaded, and an execution instance of each Module is generated from its name. Because all Modules implement the same common interface, this is easy to implement in software and efficient. Once the instance has been generated, the reduceRun function of the Module can be executed. The following operation is then performed on every record of the input file:
The Module to which the prefix of the record's Key belongs is determined. Once it has been determined, the business primary key ServiceKey is extracted from the Key, and the reduceRun function of the corresponding Module is executed to obtain the processing result.
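The prefix-based dispatch of the Reduce stage can be sketched as below. As before, this is a plain-Python illustration under assumptions: the tab separator, the `SumModule` class, and its summing logic are hypothetical, and the grouping step stands in for the shuffle that a MapReduce framework would perform.

```python
from collections import defaultdict

SEPARATOR = "\t"  # assumed delimiter between module name and ServiceKey


class SumModule:
    """Hypothetical module whose reduce_run sums the values of one ServiceKey."""

    def reduce_run(self, service_key: str, values: list) -> int:
        return sum(int(v) for v in values)


def reduce_stage(modules_by_name: dict, intermediate: list) -> dict:
    # Group values by their full key, as the framework's shuffle would.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    results = {}
    for key, values in grouped.items():
        # The Key prefix identifies the module; the remainder is the ServiceKey.
        module_name, service_key = key.split(SEPARATOR, 1)
        module = modules_by_name[module_name]
        results[(module_name, service_key)] = module.reduce_run(service_key, values)
    return results


intermediate = [("ModuleA\tu1", "3"), ("ModuleA\tu1", "4"), ("ModuleA\tu2", "5")]
totals = reduce_stage({"ModuleA": SumModule()}, intermediate)
```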
In addition, because the file attribute includes the output file directory, after the processing result is obtained the output file directory is read and the processing result is written to it.
A complete example follows:
Summarization calculation task A is obtained. The base data table to be read is named table_base, the value of the primary index index1 is value1, and the path of the file to be read is: /warehouse/hive/db/table_base/index1=value1/***.
The path of the file of summarization calculation task A is judged to be /warehouse/hive/db/table_base/index1=value1/***. The tables read by Module A include table_base, so the mapRun function of that Module is executed.
After the mapRun function of Module A is executed, the output of summarization calculation task A is written to the intermediate file in <Key, Value> form. The prefix of the Key is set to the name of Module A, so that the complete Key is: Module A + business primary key ServiceKey.
When the prefix of a record's Key is judged to belong to Module A, the business primary key ServiceKey is extracted from the Key, and the reduceRun function of Module A is executed to obtain the processing result.
The output file directory is read from the configuration file of Module A, and the processing result is written to that directory.
In the data summarization execution method provided by this embodiment, at least two aggregation processing modules are supported, each executing a different summarization task. The file attribute of the input file of a target aggregation processing module is obtained and matched against the property parameters; after a successful match, the input file is processed according to the corresponding summarization task parameters to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Fig. 3 shows a data summarization execution device provided by an embodiment of the present invention. The device is applied to a terminal that supports at least two aggregation processing modules and includes an acquiring unit 31, a matching unit 32, and a processing unit 33, where:
the acquiring unit 31 is configured to obtain the file attribute of an input file of a target aggregation processing module;
the matching unit 32 is configured to match the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result;
the processing unit 33 is configured to, when the matching result is yes, process the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
Because the device of this embodiment works on the same principle as the method of the embodiments above, the more detailed explanation is not repeated here.
It should be noted that, in this embodiment, the related functions can be implemented by a hardware processor.
In the data summarization execution device provided by this embodiment, at least two aggregation processing modules are supported, each executing a different summarization task. The file attribute of the input file of a target aggregation processing module is obtained and matched against the property parameters; after a successful match, the input file is processed according to the corresponding summarization task parameters to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Fig. 4 shows a data summarization execution device provided by an embodiment of the present invention. The device is applied to a terminal that supports at least two aggregation processing modules and includes an acquiring unit 31, a matching unit 32, a processing unit 33, and a storage unit 41, where:
the acquiring unit 31 is configured to obtain the file attribute of an input file of a target aggregation processing module;
the matching unit 32 is configured to judge whether the Hive base data table name in the file attribute matches the Hive base data table name in the property parameters to obtain a matching result;
the processing unit 33 is configured to, when the matching result is yes, execute the MapRun function corresponding to the target aggregation processing module to perform mapping processing on the input file and obtain an intermediate file, the attribute information of the intermediate file including the module identifier of the target aggregation processing module;
and to call the intermediate file, determine the target aggregation processing module according to the module identifier in the attribute information of the intermediate file, and execute the ReduceRun function corresponding to the target aggregation processing module to perform reduction processing on the intermediate file and obtain a processing result;
the storage unit 41 is configured to, after the processing result is obtained, store it according to the output file directory.
Because the device of this embodiment works on the same principle as the method of the embodiments above, the more detailed explanation is not repeated here.
It should be noted that, in this embodiment, the related functions can be implemented by a hardware processor.
In the data summarization execution device provided by this embodiment, at least two aggregation processing modules are supported, each executing a different summarization task. The file attribute of the input file of a target aggregation processing module is obtained and matched against the property parameters; after a successful match, the input file is processed according to the corresponding summarization task parameters to obtain a processing result. Different summarization tasks therefore do not each need a separate MapReduce application; each task is handled by a dedicated built-in functional module, which reduces development difficulty and workload and simplifies execution.
Fig. 5 is a schematic diagram of the physical structure of a server. As shown in Fig. 5, the server may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with one another over the communication bus 540. The processor 510 can call logic instructions in the memory 530 to execute the following method: obtaining the file attribute of an input file of a target aggregation processing module; matching the file attribute against the property parameters corresponding to the target aggregation processing module to obtain a matching result; and, if the matching result is yes, processing the input file according to the summarization task parameters corresponding to the target aggregation processing module to obtain a processing result.
In addition, when the logic instructions in the memory 530 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solutions above, in essence, or the part of them that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the embodiments above merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A data summarization execution method, characterized in that the method is applied to a terminal supporting at least two aggregation processing modules, and the method comprises:
obtaining a file attribute of an input file of a target aggregation processing module;
matching the file attribute against the attribute parameters corresponding to the target aggregation processing module to obtain a matching result; and
if the matching result is yes, processing the input file according to the aggregation task parameters corresponding to the target aggregation processing module to obtain a processing result.
2. The data summarization execution method according to claim 1, characterized in that the file attribute includes a Hive base data table name, and the attribute parameters include a Hive base data table name;
matching the file attribute against the attribute parameters corresponding to the target aggregation processing module to obtain a matching result specifically comprises:
determining whether the Hive base data table name in the file attribute matches the Hive base data table name in the attribute parameters, to obtain the matching result.
3. The data summarization execution method according to claim 2, characterized in that the aggregation task parameters include a MapRun function and a ReduceRun function;
if the matching result is yes, processing the input file according to the aggregation task parameters corresponding to the target aggregation processing module to obtain a processing result specifically comprises:
if the matching result is yes, executing the MapRun function corresponding to the target aggregation processing module to perform mapping processing on the input file and obtain an intermediate file, wherein the attribute information of the intermediate file includes a module identifier of the target aggregation processing module; and
calling the intermediate file, determining the target aggregation processing module according to the module identifier in the attribute information of the intermediate file, and executing the ReduceRun function corresponding to the target aggregation processing module to perform reduction processing on the intermediate file and obtain the processing result.
4. The data summarization execution method according to claim 3, characterized in that the file attribute further includes an output file directory;
the method further comprises: after the processing result is obtained, storing the processing result according to the output file directory.
5. The data summarization execution method according to claim 1, characterized in that the method further comprises:
after the file attribute of the input file of the target aggregation processing module is obtained, placing the input file of the target aggregation processing module into a preset set of already-read files.
6. The data summarization execution method according to claim 1, characterized in that, before obtaining the file attribute of the input file of the target aggregation processing module, the method further comprises:
where the attribute parameters include a number of Reduce tasks, summing the numbers of Reduce tasks in the attribute parameters of all aggregation processing modules; and
where the attribute parameters include resource information of Map/Reduce tasks, taking the maximum of the occupancy values in the resource information of the Map/Reduce tasks in the attribute parameters of all aggregation processing modules.
7. A data summarization execution device, characterized in that the device is applied to a terminal supporting at least two aggregation processing modules, and the device comprises:
an acquiring unit, configured to obtain a file attribute of an input file of a target aggregation processing module;
a matching unit, configured to match the file attribute against the attribute parameters corresponding to the target aggregation processing module to obtain a matching result; and
a processing unit, configured to, when the matching result is yes, process the input file according to the aggregation task parameters corresponding to the target aggregation processing module to obtain a processing result.
8. The data summarization execution device according to claim 7, characterized in that the file attribute includes a Hive base data table name, and the attribute parameters include a Hive base data table name;
the matching unit is specifically configured to:
determine whether the Hive base data table name in the file attribute matches the Hive base data table name in the attribute parameters, to obtain the matching result.
9. The data summarization execution device according to claim 7, characterized in that the aggregation task parameters include a MapRun function and a ReduceRun function;
the processing unit is specifically configured to:
when the matching result is yes, execute the MapRun function corresponding to the target aggregation processing module to perform mapping processing on the input file and obtain an intermediate file, wherein the attribute information of the intermediate file includes a module identifier of the target aggregation processing module; and
call the intermediate file, determine the target aggregation processing module according to the module identifier in the attribute information of the intermediate file, and execute the ReduceRun function corresponding to the target aggregation processing module to perform reduction processing on the intermediate file and obtain the processing result.
10. The data summarization execution device according to claim 7, characterized in that the file attribute further includes an output file directory;
the device further comprises a storage unit, configured to store the processing result according to the output file directory after the processing result is obtained.
11. The data summarization execution device according to claim 7, characterized in that the device further comprises a read-file storage module, configured to:
after the file attribute of the input file of the target aggregation processing module is obtained, place the input file of the target aggregation processing module into a preset set of already-read files.
12. The data summarization execution device according to claim 7, characterized in that the device further comprises a detection module, configured to:
where the attribute parameters include a number of Reduce tasks, sum the numbers of Reduce tasks in the attribute parameters of all aggregation processing modules before the file attribute of the input file of the target aggregation processing module is obtained; and
where the attribute parameters include resource information of Map/Reduce tasks, take the maximum of the occupancy values in the resource information of the Map/Reduce tasks in the attribute parameters of all aggregation processing modules before the file attribute of the input file of the target aggregation processing module is obtained.
13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the data summarization execution method according to any one of claims 1 to 6.
14. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the data summarization execution method according to any one of claims 1 to 6.
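The flow recited in claims 1 to 3 and claim 6 can be illustrated with a minimal single-process sketch. It is not the patented implementation and does not use Hadoop or Hive: the class `AggregationModule`, the functions `merge_job_settings` and `run_job`, the field `memory_mb` (standing in for the Map/Reduce resource information), and the sample table names and data are all assumptions made for illustration. An in-memory dictionary keyed by module identifier plays the role of the intermediate file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AggregationModule:
    """Hypothetical stand-in for one aggregation processing module."""
    module_id: str
    hive_table: str                               # attribute parameter: Hive base data table name
    map_run: Callable[[str], Tuple[str, int]]     # aggregation task parameter (MapRun)
    reduce_run: Callable[[str, List[int]], int]   # aggregation task parameter (ReduceRun)
    reduce_tasks: int = 1                         # attribute parameter: number of Reduce tasks
    memory_mb: int = 1024                         # assumed form of the resource information


def merge_job_settings(modules: List[AggregationModule]) -> Dict[str, int]:
    """Claim 6: sum the Reduce task counts and take the maximum resource value."""
    return {
        "reduce_tasks": sum(m.reduce_tasks for m in modules),
        "memory_mb": max(m.memory_mb for m in modules),
    }


def run_job(modules: List[AggregationModule],
            input_files: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """input_files maps a Hive table name (the file attribute) to the file's lines."""
    # Intermediate records are tagged with the module identifier.
    intermediate: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for table, lines in input_files.items():
        for module in modules:
            if module.hive_table != table:        # matching result is "no": skip
                continue
            for line in lines:
                key, value = module.map_run(line)
                intermediate[(module.module_id, key)].append(value)

    # Reduce side: find the module by its identifier and run its ReduceRun.
    by_id = {m.module_id: m for m in modules}
    results: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (module_id, key), values in intermediate.items():
        results[module_id][key] = by_id[module_id].reduce_run(key, values)
    return dict(results)


# Two illustrative modules sharing one job: page-view counts and sales totals.
modules = [
    AggregationModule("pv", "access_log",
                      map_run=lambda line: (line.split(",")[0], 1),
                      reduce_run=lambda key, values: sum(values),
                      reduce_tasks=2, memory_mb=1024),
    AggregationModule("sales", "orders",
                      map_run=lambda line: (line.split(",")[0], int(line.split(",")[1])),
                      reduce_run=lambda key, values: sum(values),
                      reduce_tasks=3, memory_mb=2048),
]
input_files = {
    "access_log": ["/home,u1", "/home,u2", "/cart,u1"],
    "orders": ["east,10", "west,5", "east,7"],
}
settings = merge_job_settings(modules)
results = run_job(modules, input_files)
```

In this sketch, adding a new summarization task means adding one more `AggregationModule` entry instead of writing a separate job, which mirrors the motivation stated in the abstract.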
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910397774.5A CN110222018A (en) | 2019-05-14 | 2019-05-14 | Data summarization executes method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110222018A true CN110222018A (en) | 2019-09-10 |
Family
ID=67821094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910397774.5A Pending CN110222018A (en) | 2019-05-14 | 2019-05-14 | Data summarization executes method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110222018A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113987791A (en) * | 2021-10-26 | 2022-01-28 | 成都飞机工业(集团)有限责任公司 | Method and device for acquiring part attribute information, terminal equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103347038A (en) * | 2013-05-30 | 2013-10-09 | 上海斐讯数据通信技术有限公司 | Method of WEB server for processing http messages |
CN103902378A (en) * | 2012-12-26 | 2014-07-02 | 鸿富锦精密工业(深圳)有限公司 | File allocation system and method |
US20140188948A1 (en) * | 2012-12-31 | 2014-07-03 | Smartprocure, Llc | Database aggregation of purchase data |
CN105760234A (en) * | 2016-03-17 | 2016-07-13 | 联动优势科技有限公司 | Thread pool management method and device |
CN107368300A (en) * | 2017-06-26 | 2017-11-21 | 北京天元创新科技有限公司 | A kind of data aggregation system and method based on MapReduce |
CN107608773A (en) * | 2017-08-24 | 2018-01-19 | 阿里巴巴集团控股有限公司 | task concurrent processing method, device and computing device |
CN108184078A (en) * | 2017-12-28 | 2018-06-19 | 可贝熊(湖北)文化传媒股份有限公司 | A kind of processing system for video and its method |
CN108491255A (en) * | 2018-02-08 | 2018-09-04 | 昆仑智汇数据科技(北京)有限公司 | The data-optimized distribution method of self-service MapReduce and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11100420B2 (en) | Input processing for machine learning | |
US11182691B1 (en) | Category-based sampling of machine learning data | |
US20150379425A1 (en) | Consistent filtering of machine learning data | |
CN103927314B (en) | A kind of method and apparatus of batch data processing | |
CN107784026A (en) | A kind of ETL data processing methods and device | |
CN103310460A (en) | Image characteristic extraction method and system | |
CN112181951B (en) | Heterogeneous database data migration method, device and equipment | |
CN113196231A (en) | Techniques for decoupling access to infrastructure models | |
EP2965492B1 (en) | Selection of data storage settings for an application | |
CN107403110A (en) | HDFS data desensitization method and device | |
CN107480205A (en) | A kind of method and apparatus for carrying out data partition | |
CN110019298A (en) | Data processing method and device | |
CN113779060A (en) | Data query method and device | |
CN110134646B (en) | Knowledge platform service data storage and integration method and system | |
US11221846B2 (en) | Automated transformation of applications to a target computing environment | |
US11782888B2 (en) | Dynamic multi-platform model generation and deployment system | |
US20230418842A1 (en) | Data processing independent of storage, format or schema | |
CN110222018A (en) | Data summarization executes method and device | |
CN108089871A (en) | Automatic updating method of software, device, equipment and storage medium | |
US11262944B1 (en) | Placement of data objects in storage for improved retrieval | |
CN111951112A (en) | Intelligent contract execution method based on block chain, terminal equipment and storage medium | |
CN110222105A (en) | Data summarization processing method and processing device | |
CN114564211A (en) | Cluster deployment method, cluster deployment device, equipment and medium | |
CN107992286A (en) | Intelligent coding method, device, terminal device and storage medium | |
US12135696B2 (en) | Dynamic multi-platform model generation and deployment system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190910 |