CN101593197B - Method for processing mass data based on SQL like function of file - Google Patents
- Publication number
- CN101593197B (application CN200810249730XA)
- Authority
- CN
- China
- Prior art keywords
- data
- file
- result
- processing
- sql
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to data processing in telecommunication networks and provides a method for processing mass data with SQL-like functions operating on files. The method comprises the following steps: A, converting the text files produced by data acquisition into a standardized format; B, processing the mass data on the basis of the standardized text files, with the data operations expressed as SQL-like statements; and C, defining and outputting the result of each operation as either a final result or an intermediate result, where an intermediate result that conforms to the standardized format can be processed further until the required result is obtained. The method combines the advantages of file-based and database-based processing, keeps mass data processing efficient, is convenient and flexible to operate, and is highly extensible.
Description
Technical field
The present invention relates to the field of data processing in communication networks, and specifically provides a method for processing mass data based on SQL-like functions operating on files.
Background technology
The development of computers has gone hand in hand with the development of data processing, each reinforcing the other. Early data processing was entirely file based, and storing data in files has many shortcomings: it is inconvenient to operate on, cannot be reused, and lacks standards. Various relational databases therefore emerged and advanced the development of data processing applications.
For mass data, choosing a processing mode that suits the characteristics of the data is the key to improving processing efficiency. The main measures currently used to make mass data processing more efficient are:
● choose a good database tool;
● write good program code;
● partition the mass data;
● build extensive indexes;
● improve the hardware, adding CPU and memory;
● set up a caching mechanism;
● increase virtual memory;
● process in batches;
● optimize the SQL query statements;
● process the data in text format;
● define strong cleaning rules and error-handling mechanisms;
● create views or materialized views;
● avoid 32-bit servers (in extreme cases);
● take operating system issues into account;
● use data warehouses and multidimensional database storage;
● use sampled data for data mining;
● use an in-memory database.
China covers a vast territory, and networks such as the power and communication networks are operated as integrated wholes, so they are very large. These networks nevertheless have structural characteristics of their own, and a suitable mass data processing mode can be chosen to improve processing efficiency. The management of a communication network follows regular patterns: each node of the network is a basic unit of management, and statistical analysis is carried out separately by administrative area, local network and the whole network, which is referred to as network element granularity. Data are generated at regular intervals of 5, 15 or 60 minutes, while statistical analysis generally requires 60 minutes (1 hour), day, week, month, year and so on, which is referred to as time granularity.
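The idea of time granularity can be illustrated with a minimal Python sketch that rolls 15-minute samples up to hourly totals. The function name `time_bucket`, the sample values and the timestamps are hypothetical and are not taken from the patent; they only show how a finer interval maps onto a coarser one.

```python
from datetime import datetime

def time_bucket(ts: datetime, granularity: str) -> datetime:
    """Truncate a sample timestamp to the start of its hour or day."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

# Hypothetical 15-minute samples (timestamp, counter value) for one network element.
samples = [
    (datetime(2008, 12, 30, 10, 0), 5),
    (datetime(2008, 12, 30, 10, 15), 7),
    (datetime(2008, 12, 30, 10, 45), 2),
    (datetime(2008, 12, 30, 11, 0), 4),
]

# Accumulate the samples into hourly buckets.
hourly = {}
for ts, value in samples:
    key = time_bucket(ts, "hour")
    hourly[key] = hourly.get(key, 0) + value
```

The same pattern extends to day granularity by passing `"day"`; week, month and year would need their own truncation rules.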
Data are produced per node (network element) for a given period, and for various reasons the data of some network elements in the whole network may be produced late.
Network management requires the data to be timely, and the data being analyzed must be complete. Given how network data are produced, collecting and summarizing them in a database requires a large amount of bookkeeping, such as recording the time points of data collection and of data summarization. Summarization is itself a large-volume operation that consumes much of the database's resources and weakens its ability to serve users; delays in data production mean that summarization is triggered at different times, which may leave the data incomplete. Every optimization and operation built on the database depends on the database itself, so the limits of database operation lead to data that are late and incomplete and to weak capacity for serving external users.
In the data produced by network elements, the network element granularity relationship is explicitly identified and the network element is the basic unit that produces data. Based on these characteristics, the common SQL operations of a database (accumulation, deletion, association, maximum, minimum, average and so on) can be implemented on this mass data in a file-based manner.
With the development of server technology, direct computation on data has become very fast, which provides the hardware basis for data processing; hash arrays, which locate an element directly by a descriptive subscript, provide the software basis for the computation. An implementation designed according to the open-closed principle is highly extensible, so new data operations (for example, a special formula for certain data) can be added easily.
Summary of the invention
In view of the above, the present invention provides a method for processing mass data based on SQL-like functions operating on files. Targeted at communication networks, it works on files while imitating the convenience of database operations, so that mass data processing remains efficient while the operations are simple, flexible and highly extensible.
The technical solution adopted by the present invention to solve the technical problem is as follows:
A method for processing mass data based on SQL-like functions operating on files comprises the following steps:
A, converting the text files produced by data acquisition into a standardized format;
B, processing the mass data on the basis of the standardized text files, with the data operations expressed as SQL-like statements;
C, defining and outputting the result of each operation as a final result or an intermediate result; an intermediate result that conforms to the standardized format can be processed further until the required result is obtained.
The data file format used in step A is as follows:
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
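A minimal Python sketch of a reader for this format follows. It assumes that every section opens with `##START|<name>` and closes with `##END|<name>`, that fields are separated by `|`, and that the first line inside a section names the columns. The function name `parse_blocks` is hypothetical; the patent does not publish its parser.

```python
def parse_blocks(text):
    """Parse '##START|NAME' ... '##END|NAME' sections into {name: [row dicts]}."""
    blocks = {}
    name, columns, rows = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##START|"):
            # Open a new section; its name follows the first '|'.
            name, columns, rows = line.split("|", 1)[1], None, []
        elif line.startswith("##END|"):
            blocks[name] = rows
            name = None
        elif name is not None:
            fields = line.split("|")
            if columns is None:
                columns = fields  # first line of a section names the columns
            else:
                rows.append(dict(zip(columns, fields)))
    return blocks

sample = """##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0002|1|1|2|3
##END|DATA_BLOCK"""

blocks = parse_blocks(sample)
```

Each row becomes a dictionary keyed by column name, so later operations can address fields such as `CELL` or `COUNT1` directly.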
In step B the mass data are processed outside the database, directly on text files. The operations are expressed as SQL-like statements, that is, the data are processed through the predicates of SQL statements.
In step C the output is redefined: the output of any operation can in turn be used as an input, so that the required result is derived from the original input through a series of SQL-like operations.
The invention is implemented according to the open-closed principle, which allows the SQL-like statements to be extended flexibly and customized.
The open-closed principle is one of the principles of object-oriented design: 'open to requirements, closed to modification'. It means that the system is highly extensible: new user requirements can be accommodated without limit and without modifying the existing program; only the new requirement itself has to be implemented. The new implementation delivers the function the user needs by being called from the existing main function. A main function that calls newly added extension functions in this way is referred to as the engine mode. In this example, if another aggregate function used in SQL statements, or some other special requirement, has to be added, the function is implemented in code and described in the configuration, after which it can be used. The purpose of this patent is to make mass data processing efficient and convenient; convenient here means imitating SQL operations, which everyone is familiar with.
The method of the present invention for processing mass data based on SQL-like functions operating on files keeps mass data processing efficient while making the operations simple, flexible and highly extensible. Its main features are:
1, Data processing outside the database
Timely and complete processing of mass data is the key to network management. Processing in the database, however, not only consumes a large amount of database resources but also, because of the limits of database processing capacity, fails to deliver data in time. This affects users' use of other application functions, and affects applications built on derived data (data produced from basic data by certain judgment rules are called derived data). Database-like operations performed on files improve efficiency and leave the database, as far as possible, to the end users, saving investment.
Data processing here refers to operations on data such as accumulation, deletion, association, maximum, minimum and average. These are also the common data processing operations in a database.
2, SQL-like operations on files
Any processing of the data in a data file can be achieved by programming, but such processing is ad hoc: it is not easy to call and inconvenient to use.
In the present invention the architecture follows the open-closed principle, which makes the functions easy to call and use. The main functions implemented are listed in the following table:
Function | Equivalent SQL operation | Remarks |
---|---|---|
Accumulation | Select sum(a) from tab where condition group by col1, col2 | The grouping condition can be set, and an arithmetic calculation can be applied before summing |
Deletion | Delete * from tab where condition | Unlike the SQL operation, the deletion can also be set to keep, i.e. the inverse of the condition: rows that satisfy the condition are kept and rows that do not are deleted |
Association | Select a.col, b.col from a, b where condition | Association of multiple tables |
Maximum | Select max(col) from table where condition group by grouping columns | Takes the maximum value under the grouping condition |
Minimum | Select min(col) from table where condition group by grouping columns | Takes the minimum value under the grouping condition |
Average | Select avg(col) from table where condition group by grouping columns | Takes the average value under the grouping condition |
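The grouped operations and the deletion with its keep mode can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: rows are assumed to be dictionaries keyed by column name, and the helper names `group_aggregate`, `mean` and `filter_rows` are hypothetical.

```python
from collections import defaultdict

def group_aggregate(rows, group_by, column, func):
    """Emulate SELECT func(column) FROM rows GROUP BY group_by."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_by)
        groups[key].append(float(row[column]))
    return {key: func(values) for key, values in groups.items()}

def mean(values):
    return sum(values) / len(values)

def filter_rows(rows, predicate, keep=False):
    """Delete rows matching predicate; with keep=True, retain only matching rows."""
    return [row for row in rows if bool(predicate(row)) == keep]

rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "3"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "5"},
]

totals = group_aggregate(rows, ["CELL"], "COUNT1", sum)    # accumulation
maxima = group_aggregate(rows, ["CELL"], "COUNT1", max)    # maximum
averages = group_aggregate(rows, ["CELL"], "COUNT1", mean) # average

def is_df0001(row):
    return row["CELL"] == "DF0001"

deleted = filter_rows(rows, is_df0001)          # delete matching rows
kept = filter_rows(rows, is_df0001, keep=True)  # keep mode: retain matching rows
```

Passing the aggregate as a function means that minimum, or any other reduction over a list of values, needs no change to `group_aggregate`.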
3, Intermediate data operations
In many cases a single SQL statement cannot produce the final result; several steps are needed, and in a database this inevitably involves temporary tables that hold the intermediate data.
To make data processing more flexible, the present invention also supports operations on intermediate data. File processing can be configured to produce an intermediate result (the counterpart of a temporary table), and the same operations can be applied to this intermediate file, which is treated as a raw data file awaiting processing. In this way a complex computation can be split into several steps, which improves practicality and adaptability.
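A two-step computation over an intermediate block might look like the Python sketch below. The helpers `format_block` and `read_block`, the block names `RAW` and `PER_CELL`, and the sample values are all hypothetical; the point is only that step 1 writes its result in the same block format that step 2 then reads as input.

```python
def format_block(name, columns, rows):
    """Serialize rows into a '##START|name' ... '##END|name' block."""
    lines = [f"##START|{name}", "|".join(columns)]
    lines += ["|".join(str(row[c]) for c in columns) for row in rows]
    lines.append(f"##END|{name}")
    return "\n".join(lines)

def read_block(text, name):
    """Read one named block back into a list of row dictionaries."""
    lines = text.splitlines()
    start = lines.index(f"##START|{name}")
    end = lines.index(f"##END|{name}")
    columns = lines[start + 1].split("|")
    return [dict(zip(columns, line.split("|"))) for line in lines[start + 2:end]]

raw = format_block("RAW", ["CELL", "TRX", "COUNT1"], [
    {"CELL": "DF0001", "TRX": 1, "COUNT1": 1},
    {"CELL": "DF0001", "TRX": 2, "COUNT1": 2},
    {"CELL": "DF0002", "TRX": 1, "COUNT1": 4},
])

# Step 1: sum COUNT1 per CELL and write the result as an intermediate block.
per_cell = {}
for row in read_block(raw, "RAW"):
    per_cell[row["CELL"]] = per_cell.get(row["CELL"], 0) + int(row["COUNT1"])
intermediate = format_block(
    "PER_CELL", ["CELL", "COUNT1"],
    [{"CELL": cell, "COUNT1": total} for cell, total in per_cell.items()],
)

# Step 2: treat the intermediate block as ordinary input and take the maximum.
final = max(int(row["COUNT1"]) for row in read_block(intermediate, "PER_CELL"))
```

Because input and intermediate data share one format, no step needs to know whether it is reading raw data or the output of an earlier step.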
4, A driving engine that follows the open-closed principle
Each data processing function can be attached to the main program as a plug-in and is executed by the main program's driving engine. An operation is carried out simply by setting it up in the configuration, and newly added data processing functions are called and used through the same kind of configuration.
The system implemented in this invention is open and extensible, and processes data through the driving engine. The driving engine has multi-level logging and a debugging mechanism, so problems can be located easily.
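One way to picture such a plug-in engine is a registry of rule functions keyed by `RULE_TYPE`, sketched below in Python. The registry `RULES`, the decorator `rule`, the function `run_engine` and the configuration keys `group_by` and `column` are assumptions made for illustration; only `RULE_TYPE` and the rule name `ACCUMULATE` appear in the patent.

```python
RULES = {}

def rule(name):
    """Register a data-processing function under a rule name."""
    def register(func):
        RULES[name] = func
        return func
    return register

@rule("ACCUMULATE")
def accumulate(rows, config):
    # Sum config["column"] per distinct value of config["group_by"].
    totals = {}
    for row in rows:
        key = row[config["group_by"]]
        totals[key] = totals.get(key, 0) + row[config["column"]]
    return [{config["group_by"]: k, config["column"]: v} for k, v in totals.items()]

def run_engine(rows, configs):
    """Apply each configured rule in turn; the engine never names a rule itself."""
    for config in configs:
        rows = RULES[config["RULE_TYPE"]](rows, config)
    return rows

# A function added later: run_engine is left untouched (closed to modification).
@rule("MAXIMUM")
def maximum(rows, config):
    return [max(rows, key=lambda row: row[config["column"]])]

rows = [{"CELL": "A", "N": 1}, {"CELL": "A", "N": 2}, {"CELL": "B", "N": 5}]
result = run_engine(rows, [
    {"RULE_TYPE": "ACCUMULATE", "group_by": "CELL", "column": "N"},
    {"RULE_TYPE": "MAXIMUM", "column": "N"},
])
```

Adding a new operation means writing one function and one configuration entry, which is the behaviour the open-closed principle asks for.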
Description of drawings
Fig. 1 is a flow chart of the method of the present invention for processing mass data based on SQL-like functions operating on files.
Embodiment
The method of the present invention for processing mass data based on SQL-like functions operating on files is further described below with reference to the drawing and a specific embodiment.
The file-based data processing scheme makes full use of current computer hardware: the data to be processed are loaded into memory as arrays, which greatly speeds up processing. Arrays are a common way of handling variables in programs, but an ordinary array is indexed by number, so a named variable cannot be found directly (it is generally found by traversing the array and comparing values). The hash array introduced in this scheme avoids this shortcoming: unlike an ordinary array, it uses the variable name directly as the subscript. Hash arrays make direct operations on variables convenient and flexible and speed up data processing.
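The difference between the two kinds of array can be shown with a short Python sketch, in which a `dict` plays the role of the hash array. The function name `lookup_by_scan` is hypothetical; the column names and values are taken from the sample data file.

```python
columns = ["CELL", "TRX", "COUNT1", "COUNT2", "COUNT3"]
values = ["DF0001", "1", "1", "2", "3"]

def lookup_by_scan(name):
    """Positional array: find a named field by scanning the column names."""
    for index, column in enumerate(columns):
        if column == name:
            return values[index]
    raise KeyError(name)

# Hash array: the field name itself is the subscript, so no scan is needed.
record = dict(zip(columns, values))

by_scan = lookup_by_scan("COUNT2")
by_hash = record["COUNT2"]
```

Both lookups return the same value, but the scan takes time proportional to the number of columns, while the hash lookup takes roughly constant time on average.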
The driving engine is designed according to the open-closed principle, that is, 'closed to modification, open to requirements', which gives the engine strong adaptability and good extensibility.
To make data processing more convenient and flexible, temporary data records are introduced. The system stipulates that input files and temporary files have the same format. For convenience and processing speed, the format is required to be as follows:
##START|HEADER
##END|HEADER
##START|DATA_BLOCK    DATA_BLOCK is the block name, used to distinguish data; START marks the beginning of the block
##END|DATA_BLOCK      END marks the end of the block
The main program is shown in Fig. 1.
These operations do not depend on one another's order; each can be called several times in one processing run, and several operations can be combined to reach a required result. For example, to accumulate data that satisfy a certain condition at whole-network granularity, a deletion can be performed first and the required data then obtained by accumulation.
Which processing the program applies to which data is determined by rule configuration. The following is the configuration of the data accumulation rule:
Name | Description | Remarks | Required | Type |
---|---|---|---|---|
RULE_TYPE | Name of the functional module | SUM | Required | Scalar |
RULE_DESC | Description of the operation being executed; this text appears in the log | Rule description | Optional; if omitted, the log shows the content of RULE_TYPE | Scalar |
INPUT_FILE_DESCRIPTION | Description of the names of the files to process | Can be given as a regular expression | Required | Array |
OUTPUT_BLOCK_NAME | Name of the output block | Can be used to tell the output apart from the raw data | Required | Scalar |
COUNTERS_TO_SORT_ON | Fields to sort on for accumulation, i.e. the fields that form the accumulation condition | Several fields may be given | Required | Array |
REDUNDANT_COUNTERS | List of redundant counters, separated by "," | Columns not wanted in the generated file | Optional, none by default | Array |
PRODUCE_PIF | Produce an intermediate temporary format file | True: produce; 0: do not produce | Optional, produced by default | Scalar |
PRODUCE_LIF | Produce a database-loading format file | True: produce; 0: do not produce | Optional, produced by default | Scalar |
NON_ADDITIVE_COUNTERS | List of fields that are not to be accumulated, separated by "," | For example names and times, which need no accumulation; the fields sorted on for accumulation need not be listed again and are not accumulated | Optional, none by default | Array |
APPEND_STR | String appended to the titles of the columns that take part in accumulation | Ignored if not set | Optional, none by default | Scalar |
OLD_COUNTER_NAMES | List of column names to be renamed | Ignored if not set | Optional, none by default | Array |
NEW_COUNTER_NAMES | Column names after renaming, in positions corresponding to the previous list | Ignored if not set | Optional, none by default | Array |
OUTPUT_DIR | Path where the loading files are stored | A specific location for the files can be specified | Optional, see note for default | Scalar |
Keep_files | Backup path for the loading files | If not set, no backup is made; the backup mainly serves as a data source for third parties | Optional, see note for default | Scalar |
COMPUTE_EXPRESSION | Expression for a calculated column | | Optional, none by default | Array |
COMPUTE_NAME | Name of the output | | Optional, none by default | Array |
'Required' marks items that must be set in the configuration; optional items may be left out. A concrete example follows:
'RULE_TYPE' => 'ACCUMULATE',    handle by which the accumulation function is called
'RULE_DESC' => 'Accumulate IN',
'PRODUCE_PIF' => 'True',
'PRODUCE_LIF' => 0,
'OUTPUT_BLOCK_NAME' => 'NICELASS_0'
'INPUT_FILE_DESCRIPTION' => ('NICELASS#*#Epif')    input file name; wildcards allowed
'COUNTERS_TO_SORT_ON' => ('OBJ_ID_1')    variable names of the GROUP BY part of the SQL statement
'COMPUTE_EXPRESSION' => ('COL1/COL2')    division of two variables, yielding a new column
'COMPUTE_NAME' => ('COMPUTE_1')    title of the new column; the result accumulates the computed values
'APPEND_STR' => '0'.
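How such a configuration might drive an accumulation can be sketched in Python. The sketch assumes that `COMPUTE_EXPRESSION` is evaluated per row and the computed values are then summed per group, which is one reading of the remarks above and not confirmed in detail by the patent. The functions `compute` and `accumulate`, the restriction to two-operand expressions and the sample rows are hypothetical.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def compute(expression, row):
    """Evaluate a two-operand expression such as 'COL1/COL2' against one row."""
    for symbol, func in OPERATORS.items():
        if symbol in expression:
            left, right = expression.split(symbol, 1)
            return func(float(row[left.strip()]), float(row[right.strip()]))
    raise ValueError(f"unsupported expression: {expression}")

def accumulate(rows, config):
    """Sum each computed column per group named by COUNTERS_TO_SORT_ON."""
    totals = {}
    for row in rows:
        key = tuple(row[c] for c in config["COUNTERS_TO_SORT_ON"])
        bucket = totals.setdefault(key, {})
        pairs = zip(config["COMPUTE_EXPRESSION"], config["COMPUTE_NAME"])
        for expression, name in pairs:
            bucket[name] = bucket.get(name, 0.0) + compute(expression, row)
    return totals

config = {
    "RULE_TYPE": "ACCUMULATE",
    "COUNTERS_TO_SORT_ON": ["OBJ_ID_1"],
    "COMPUTE_EXPRESSION": ["COL1/COL2"],
    "COMPUTE_NAME": ["COMPUTE_1"],
}

# Hypothetical input rows; the values are chosen only for illustration.
rows = [
    {"OBJ_ID_1": "A", "COL1": "6", "COL2": "3"},
    {"OBJ_ID_1": "A", "COL1": "1", "COL2": "2"},
    {"OBJ_ID_1": "B", "COL1": "9", "COL2": "3"},
]

result = accumulate(rows, config)
```

The expression is parsed by splitting on a single operator rather than evaluated with `eval`, which keeps the sketch safe for untrusted configuration at the cost of supporting only simple expressions.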
The embodiment described above is merely a preferred embodiment of the present invention; ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the scope of protection of the present invention.
Claims (1)
1. A method for processing mass data based on SQL-like functions operating on files, comprising the following steps:
A, converting the text files produced by data acquisition into a standardized format;
B, processing the mass data on the basis of the text files in the standardized format, with the data operations expressed as SQL-like statements;
C, defining and outputting the result of each operation as a final result or an intermediate result; an intermediate result that conforms to the standardized format is processed further until the required result is obtained;
The data file format used in said step A is as follows:
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
wherein DATA_BLOCK is the block name, used to distinguish data, and the START before the block name marks the beginning of the block; a block is analogous to a data table; the part between '##START|DATA_BLOCK' and '##END|DATA_BLOCK' is the data body, in which each row corresponds to a data row of a data table and the first row holds the variables, which correspond to the columns of the data table; the END in '##END|DATA_BLOCK' marks the end of the data;
in step B the mass data are processed outside the database, directly on text files, and the operations are expressed as SQL-like statements, that is, the data are processed through the predicates of SQL statements;
in step C the output is redefined, that is, for an original input, the output of any operation is reused, and the required output is obtained through SQL-like operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810249730XA CN101593197B (en) | 2008-12-30 | 2008-12-30 | Method for processing mass data based on SQL like function of file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101593197A CN101593197A (en) | 2009-12-02 |
CN101593197B true CN101593197B (en) | 2011-10-05 |
Family
ID=41407855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810249730XA Expired - Fee Related CN101593197B (en) | 2008-12-30 | 2008-12-30 | Method for processing mass data based on SQL like function of file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101593197B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150220898A1 (en) * | 2014-02-04 | 2015-08-06 | Seth Priebatsch | Dynamic ingestion and processing of transactional data at the point of sale |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541884B (en) * | 2010-12-10 | 2014-07-02 | 中国移动通信集团贵州有限公司 | Method and device for database optimization |
CN102163231A (en) * | 2011-04-13 | 2011-08-24 | 浪潮(北京)电子信息产业有限公司 | Method and device for data collection |
US8639619B1 (en) | 2012-07-13 | 2014-01-28 | Scvngr, Inc. | Secure payment method and system |
US8770478B2 (en) | 2013-07-11 | 2014-07-08 | Scvngr, Inc. | Payment processing with automatic no-touch mode selection |
CN103425779A (en) * | 2013-08-19 | 2013-12-04 | 曙光信息产业股份有限公司 | Data processing method and data processing device |
CN107577803A (en) * | 2017-09-25 | 2018-01-12 | 北京维联众诚科技有限公司 | Data processing method based on class SQL engines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20111005 Termination date: 20131230 |