CN101593197B - Method for processing mass data based on SQL like function of file - Google Patents


Info

Publication number
CN101593197B
Authority
CN
China
Prior art keywords
data
file
result
processing
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810249730XA
Other languages
Chinese (zh)
Other versions
CN101593197A (en)
Inventor
祝乃国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd
Priority to CN200810249730XA
Publication of CN101593197A
Application granted
Publication of CN101593197B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data processing in telecommunication networks, and in particular provides a method for processing mass data based on file-based SQL-like functions. The method comprises the following steps: A, applying a standardized format to the text files generated after data acquisition; B, processing the mass data based on the standardized text files, performing the relevant data operations by means of SQL-like statements; and C, defining and outputting the result of an operation as either a final result or an intermediate result, and, where the intermediate result conforms to the standardized format, continuing to perform data operations on that file until the required result is obtained. The method combines the respective advantages of the file mode and the database mode, ensures high efficiency in mass data processing, is convenient and flexible to operate, and is highly extensible.

Description

A method for processing mass data based on file-based SQL-like functions
Technical field
The present invention relates to the field of data processing in communication networks, and specifically provides a method for processing mass data based on file-based SQL-like functions.
Background
The development of computers has gone hand in hand with data processing applications, each driving the other. Early data processing stored all data in data files. File-based storage has many shortcomings: it is inconvenient to operate on, cannot be reused, and lacks standards. Various relational databases therefore emerged, which promoted the development of data processing applications.
When processing mass data, choosing a processing mode suited to the characteristics of the data is the key to improving efficiency. The main measures currently used to improve the efficiency of mass data processing are:
● choose a good database tool;
● write good program code;
● partition the mass data;
● build extensive indexes;
● improve the hardware, adding CPU and memory;
● set up a caching mechanism;
● increase virtual memory;
● process in batches;
● optimize the SQL query statements;
● process the data in text format;
● define robust cleaning rules and error-handling mechanisms;
● create views or materialized views;
● avoid 32-bit servers (in extreme cases);
● take operating system issues into account;
● store the data in data warehouses and multidimensional databases;
● use sampled data for data mining;
● use in-memory databases.
China is vast, and networks such as the power and communication networks are operated as integrated wholes, so they are very large. These networks nevertheless have structural characteristics that allow a suitable mass data processing mode to be chosen to improve efficiency. The management of a communication network follows regular patterns. Each node of the network is a basic unit of management, and statistical analysis is carried out separately at the level of the administrative area, the local network, and the whole network; this is called the network element granularity of management. Data may be generated at equal intervals of 5, 15, or 60 minutes, while statistical analysis generally requires periods of 60 minutes (1 hour), a day, a week, a month, a year, and so on; this is called the time granularity.
Data for a given period are produced per node (network element), and for various reasons the data from some network elements in the whole network may be delayed.
Network management requires that data be timely and that the data analyzed be complete. Given the way network data are produced, data acquisition and summarization performed in a database require a large amount of bookkeeping, such as recording marks for the time points of acquisition and summarization. Summarization is itself a large-volume operation that consumes much of the database's resources and weakens its ability to serve users. Delays in data generation cause summarization to be triggered at different times, which may leave the data incomplete. All database-based optimization and operation remains bound to the database, and the limits of database operation result in data that are late or incomplete and in a weak capacity to serve external users.
In the data produced by network elements, the network element granularity relationships are clearly identified, and the network element is the basic unit that produces data. Based on these characteristics, the common SQL operations of a database, such as accumulation, deletion, join, maximum, minimum, and average, are implemented on this mass data in file mode.
With the development of server technology, direct computation on data has become very fast, which provides the hardware basis for data processing. Hash arrays, which locate an element directly by a descriptive key, provide the software basis for the computation. An implementation designed according to the open-closed principle is highly extensible, so that new data operations (calculations), such as a special formula, can be added easily.
Summary of the invention
In view of the above, the method of the present invention for processing mass data based on file-based SQL-like functions is a mass data processing scheme for communication networks that works on files while imitating the convenience of database operations. It ensures high efficiency in mass data processing while keeping operation simple and flexible and providing strong extensibility.
The technical solution adopted by the present invention to solve the technical problem is:
A method for processing mass data based on file-based SQL-like functions, comprising the following steps:
A, applying a standardized format to the text files produced after data acquisition;
B, processing the mass data based on the standardized text files, using SQL-like statements to perform the relevant data operations;
C, defining the result of an operation and outputting it as a final result or an intermediate result; where an intermediate result conforms to the standardized format, data operations continue on that file until the required result is reached.
The data file format used in step A is as follows:
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
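The block format above can be read with very little code. The following is a minimal Python sketch, not the patented implementation: the function name `parse_blocks` and the shortened sample are illustrative assumptions, and the header marker is assumed to be `##START|HEADER`, matching the data block marker.

```python
from typing import Dict, List

def parse_blocks(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Parse '##START|NAME' ... '##END|NAME' blocks into lists of row dicts."""
    blocks: Dict[str, List[Dict[str, str]]] = {}
    name = None      # name of the block currently being read
    columns = None   # column names, taken from the first line of the block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##START|"):
            name = line.split("|", 1)[1]
            columns = None
            blocks[name] = []
        elif line.startswith("##END|"):
            name = None
        elif name is not None:
            fields = line.split("|")
            if columns is None:
                columns = fields  # first line of a block: variable (column) names
            else:
                blocks[name].append(dict(zip(columns, fields)))
    return blocks

# Shortened version of the sample file (assumed data, same layout).
SAMPLE = """\
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0002|1|1|2|3
##END|DATA_BLOCK
"""

blocks = parse_blocks(SAMPLE)
```

Each block behaves like a small table: `blocks["DATA_BLOCK"]` is a list of rows keyed by column name, which is the shape the SQL-like operations work on.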
Step B processes the mass data without a database, operating directly on text. The operations are expressed in the manner of SQL-like statements, that is, data are processed through the predicates of SQL statements.
Step C allows the output result to be redefined: for one original input, the output of any operation can be reused, so that the required output is obtained through a sequence of SQL-like operations.
The invention is implemented according to the open-closed principle, which allows the SQL-like statements to be flexibly extended and customized.
The open-closed principle is an object-oriented design principle: 'open to extension, closed to modification'. It means that the system is highly extensible: any new user requirement can be accommodated without modifying the existing program, by implementing only the newly requested function. The new implementation delivers the function the user needs and is invoked through the existing calling mechanism. The main program that calls the newly added functions is referred to as the engine. In this example, if further SQL aggregate functions or other special requirements are needed, the function is implemented in code and then described in the configuration, after which it can be used. The purpose of this patent is to process mass data efficiently and conveniently, where convenience means imitating SQL operations, a mode familiar to everyone.
The method of the present invention for processing mass data based on file-based SQL-like functions ensures high efficiency in mass data processing while keeping operation simple and flexible and providing strong extensibility. Its main features are:
1. Data processing without a database
Timely and complete processing of mass data is the key to network management. Processing in database mode, however, not only consumes a large amount of database resources but, because of the limits of database processing capacity, also fails to deliver data in time. This affects users' use of other application functions, as well as applications based on derived data (data produced from basic data and certain judgment rules). Database-like operations in file mode improve efficiency and leave the database, as far as possible, to end users, saving investment.
Data processing here refers to operations on data such as accumulation, deletion, join, maximum, minimum, and average. These are also the common data processing operations in a database.
2. File-based SQL-like operations
Any processing of the data in a data file can be implemented by programming, but such processing is ad hoc, hard to call, and inconvenient to use.
In the present invention the architecture is designed according to the open-closed principle, which makes the operations easy to call and use. The main functions implemented are listed in the following table:
Function | Equivalent SQL operation | Remarks
Accumulate | SELECT sum(a) FROM tab WHERE condition GROUP BY col1, col2 | Grouping conditions can be set, and an arithmetic calculation can be applied before summing
Delete | DELETE FROM tab WHERE condition | Unlike the SQL operation, this delete can also be set to "keep", that is, the negation of the condition: rows that satisfy the condition are kept and rows that do not are deleted
Join | SELECT a.col, b.col FROM a, b WHERE condition | Join across multiple tables
Maximum | SELECT max(col) FROM table WHERE condition GROUP BY grouping columns | Maximum value under the grouping condition
Minimum | SELECT min(col) FROM table WHERE condition GROUP BY grouping columns | Minimum value under the grouping condition
Average | SELECT avg(col) FROM table WHERE condition GROUP BY grouping columns | Average value under the grouping condition
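To make the table concrete, the grouped aggregates and the delete-with-keep behaviour might be sketched in Python as follows. This is an illustration under assumptions, not the patent's code: the helper names `aggregate`, `delete`, and `average`, and the sample rows, are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]

def aggregate(rows: List[Row], group_by: Sequence[str], column: str,
              func: Callable[[List[float]], float]) -> Dict[Tuple[str, ...], float]:
    """SELECT func(column) FROM rows GROUP BY group_by."""
    groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_by)
        groups[key].append(float(row[column]))
    return {key: func(values) for key, values in groups.items()}

def delete(rows: List[Row], condition: Callable[[Row], bool],
           keep: bool = False) -> List[Row]:
    """Delete rows matching condition; with keep=True, keep only the matching rows."""
    if keep:
        return [r for r in rows if condition(r)]
    return [r for r in rows if not condition(r)]

def average(values: List[float]) -> float:
    return sum(values) / len(values)

rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "4"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "7"},
]

totals = aggregate(rows, ["CELL"], "COUNT1", sum)       # accumulate
maxima = aggregate(rows, ["CELL"], "COUNT1", max)       # maximum
means = aggregate(rows, ["CELL"], "COUNT1", average)    # average
remaining = delete(rows, lambda r: r["CELL"] == "DF0001")
kept = delete(rows, lambda r: r["CELL"] == "DF0001", keep=True)
```

Passing the aggregate as a function keeps one grouping routine for sum, maximum, minimum, and average, which is one plausible way to get the extensibility the text describes.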
3. Operations on intermediate data
In many cases the final result cannot be obtained with a single SQL statement; several steps are often needed, and a database inevitably uses temporary tables to store the intermediate data along the way.
To make data processing more flexible, the present invention also supports operations on intermediate data. By configuration, file processing can also produce an intermediate result (a temporary table), and the same operations can be applied to it, that is, the intermediate file is treated as an original data file to be processed. In this way a complex computation can be split into several steps, which improves practicality and adaptability.
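The idea that an intermediate result is just another file in the standard format can be sketched as a two-step pipeline. The Python below is a simplified illustration under assumptions: the helpers `write_block`, `read_block`, and `sum_by`, the block names `STEP1` and `RESULT`, and the sample data are all hypothetical, and strings stand in for files.

```python
from collections import defaultdict

def write_block(name, columns, rows):
    """Serialize rows as one '##START|name' ... '##END|name' block."""
    lines = [f"##START|{name}", "|".join(columns)]
    lines += ["|".join(str(row[c]) for c in columns) for row in rows]
    lines.append(f"##END|{name}")
    return "\n".join(lines)

def read_block(text):
    """Read back a single block written by write_block."""
    lines = text.splitlines()
    name = lines[0].split("|", 1)[1]
    columns = lines[1].split("|")
    rows = [dict(zip(columns, line.split("|"))) for line in lines[2:-1]]
    return name, columns, rows

def sum_by(rows, key, column):
    """Accumulate one column grouped by one key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row[column])
    return [{key: k, column: v} for k, v in totals.items()]

source = write_block("DATA_BLOCK", ["CELL", "TRX", "COUNT1"], [
    {"CELL": "DF0001", "TRX": 1, "COUNT1": 1},
    {"CELL": "DF0001", "TRX": 2, "COUNT1": 1},
    {"CELL": "DF0002", "TRX": 1, "COUNT1": 1},
    {"CELL": "DF0002", "TRX": 3, "COUNT1": 1},
])

# Step 1: filter, and write the intermediate result in the same format.
_, cols, rows = read_block(source)
intermediate = write_block("STEP1", cols, [r for r in rows if int(r["TRX"]) <= 2])

# Step 2: treat the intermediate block as ordinary input and accumulate it.
_, _, rows2 = read_block(intermediate)
final = write_block("RESULT", ["CELL", "COUNT1"], sum_by(rows2, "CELL", "COUNT1"))
```

Because step 2 reads exactly what step 1 wrote, further steps can be chained without any special handling of temporary data.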
4. A driving engine that follows the open-closed principle
Each data processing function can be attached to the main program as a plug-in and is executed by the main program's engine. An operation is carried out simply by configuring it, and newly added data processing functions are called and used in the same way.
The system realized in this invention is open and extensible, and processes data through the driving engine. The engine has multi-level logging and a debugging mechanism, so problems can be found easily.
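One common way to obtain this plug-in behaviour is a registry that maps a rule type to a handler. The Python sketch below assumes that design; it is not taken from the patent. The names `OPERATIONS`, `operation`, `run`, and the `COLUMN` key are hypothetical, while `RULE_TYPE` and `RULE_DESC` follow the rule configuration described in this document.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("engine")

# Registry of data processing functions, keyed by rule type.
OPERATIONS: Dict[str, Callable[[List[dict], dict], List[dict]]] = {}

def operation(rule_type: str):
    """Decorator that attaches a function to the engine under a rule type."""
    def register(func):
        OPERATIONS[rule_type] = func
        return func
    return register

@operation("SUM")
def op_sum(rows, rule):
    column = rule["COLUMN"]  # hypothetical key naming the column to add up
    return [{column: sum(r[column] for r in rows)}]

def run(rows, rules):
    """The engine: look up each configured rule and apply it in order."""
    for rule in rules:
        handler = OPERATIONS[rule["RULE_TYPE"]]
        # The log shows RULE_DESC, falling back to RULE_TYPE when it is absent.
        logger.info("running %s", rule.get("RULE_DESC", rule["RULE_TYPE"]))
        rows = handler(rows, rule)
    return rows

# A function added later needs no change to run(): closed to modification.
@operation("COUNT")
def op_count(rows, rule):
    return [{"COUNT": len(rows)}]

data = [{"COUNT1": 1}, {"COUNT1": 2}, {"COUNT1": 3}]
result = run(data, [{"RULE_TYPE": "SUM", "COLUMN": "COUNT1"}])
counted = run(data, [{"RULE_TYPE": "COUNT"}])
```

The engine itself never names a concrete operation, so new rule types are added by registration and configuration alone.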
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention for processing mass data based on file-based SQL-like functions.
Specific embodiments
The method of the present invention for processing mass data based on file-based SQL-like functions is further described below with reference to the drawing and specific embodiments.
The file-based data processing scheme makes full use of current computer hardware: the data to be processed are loaded into memory as arrays, which greatly speeds up processing. Arrays are a common way of handling variables in a program, but an ordinary array is indexed by number, so a named variable cannot be found directly (it is generally found by traversing the array and comparing each entry). The hash array introduced in this scheme avoids this shortcoming: unlike an ordinary array, it uses the variable name itself as the subscript. Hash arrays make direct operations on variables convenient and flexible and speed up data processing.
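The difference between the two lookups can be shown in a few lines of Python, where a `dict` plays the role of the hash array. The column names come from the sample data file; the function name `find_by_scan` is an illustrative assumption.

```python
columns = ["CELL", "TRX", "COUNT1", "COUNT2", "COUNT3"]
values = ["DF0001", "1", "1", "2", "3"]

def find_by_scan(name):
    """Ordinary array: traverse and compare names until the variable is found."""
    for index, column in enumerate(columns):
        if column == name:
            return values[index]
    raise KeyError(name)

# Hash array: the variable name is the subscript, so no traversal is needed.
record = dict(zip(columns, values))

by_scan = find_by_scan("COUNT2")
by_key = record["COUNT2"]
```

Both lookups return the same value, but the scan costs time proportional to the number of columns, while the keyed lookup takes roughly constant time on average.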
The driving engine is designed according to the open-closed principle, that is, 'closed to modification, open to extension'. This gives the engine strong adaptability and good extensibility.
To make data processing more convenient and flexible, a mode for recording temporary data is introduced. The system requires that input files and temporary files share the same format. For the sake of convenient and fast processing, the format is required to be as follows:
##START|HEADER
##END|HEADER
##START|DATA_BLOCK    DATA_BLOCK is the block name, used to distinguish data; START marks the beginning of the block
[Figure: data body of the block; the first line holds the column names and each following line is one data row]
##END|DATA_BLOCK      END marks the end of this block of data
The main program is shown in Fig. 1.
These operations have no ordering dependency on one another. Each can be called several times within one processing run, and several operations can be combined to reach a required result. For example, to accumulate data satisfying a certain condition at whole-network granularity, a delete operation can be applied first, followed by an accumulate operation to obtain the required data.
Which processing the program applies to which data is determined by rule configuration. The following is the configuration of the data accumulation rule:
Name | Description | Remarks | Required | Type
RULE_TYPE | Name of the functional module | SUM | Required | Scalar
RULE_DESC | Description of the rule being executed; this text appears in the log | Rule description | Optional; if omitted, the log shows the content of RULE_TYPE | Scalar
INPUT_FILE_DESCRIPTION | Description of the names of the files to process | May be given as a regular expression | Required | Array
OUTPUT_BLOCK_NAME | Name of the output block | May differ from the raw data block name to tell the two apart | Required | Scalar
COUNTERS_TO_SORT_ON | Sort fields for accumulation, that is, the fields that define the accumulation condition | Multiple fields allowed | Required | Array
REDUNDANT_COUNTERS | List of redundant counters, separated by "," | Columns not wanted in the generated file | Optional, default none | Array
PRODUCE_PIF | Whether to produce the intermediate temporary format file | True: produce; 0: do not produce | Optional, produced by default | Scalar
PRODUCE_LIF | Whether to produce the database-load format file | True: produce; 0: do not produce | Optional, produced by default | Scalar
NON_ADDITIVE_COUNTERS | List of fields that are not to be accumulated, separated by "," | For example names and times, which need no accumulation; the accumulation sort fields need not be listed again, since they are never accumulated | Optional, default none | Array
APPEND_STR | String appended to the names of the columns that take part in accumulation | Ignored if not set | Optional, default none | Scalar
OLD_COUNTER_NAMES | List of column names to be renamed | Ignored if not set | Optional, default none | Array
NEW_COUNTER_NAMES | Column names after renaming, matched by position with the previous list | Ignored if not set | Optional, default none | Array
OUTPUT_DIR | Path where the database-load files are stored | A specific storage location can be given | Optional, default: see note | Scalar
Keep_files | Backup path for the database-load files | If not set, no backup is made; the backup mainly serves as a data source for third parties | Optional, default: see note | Scalar
COMPUTE_EXPRESSION | Expression for a computed column | | Optional, default none | Array
COMPUTE_NAME | Name of the output column | | Optional, default none | Array
Items marked Required must be set in the configuration; optional items may be left out. A concrete example follows:
'RULE_TYPE'              => 'ACCUMULATE',          # handle by which the accumulation function is called
'RULE_DESC'              => 'Accumulate IN',
'PRODUCE_PIF'            => 'True',
'PRODUCE_LIF'            => 0,
'OUTPUT_BLOCK_NAME'      => 'NICELASS_0',
'INPUT_FILE_DESCRIPTION' => ('NICELASS#*#Epif'),   # input file name; wildcards allowed
'COUNTERS_TO_SORT_ON'    => ('OBJ_ID_1'),          # variable names of the GROUP BY part of the SQL statement
'COMPUTE_EXPRESSION'     => ('COL1/COL2'),         # divide one variable by the other to obtain a new column
'COMPUTE_NAME'           => ('COMPUTE_1'),         # name of the new column; the result accumulates the computed values
'APPEND_STR'             => '0'
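How an engine might interpret such a rule can be sketched in Python. This is an assumption-laden illustration rather than the patented implementation: the function `accumulate`, the sample rows, and the restriction of `COMPUTE_EXPRESSION` to the single form `A/B` are all hypothetical; only the configuration key names come from the rule table.

```python
from collections import defaultdict

rule = {
    "RULE_TYPE": "ACCUMULATE",
    "COUNTERS_TO_SORT_ON": ["OBJ_ID_1"],    # GROUP BY columns
    "COMPUTE_EXPRESSION": ["COL1/COL2"],    # computed per row, then accumulated
    "COMPUTE_NAME": ["COMPUTE_1"],
}

def accumulate(rows, rule):
    """Sum every additive column per group, after adding the computed columns."""
    group_cols = rule["COUNTERS_TO_SORT_ON"]
    skip = set(group_cols) | set(rule.get("NON_ADDITIVE_COUNTERS", []))
    computed = list(zip(rule.get("COMPUTE_NAME", []),
                        rule.get("COMPUTE_EXPRESSION", [])))
    totals = {}
    for original in rows:
        row = dict(original)
        for name, expression in computed:
            left, right = expression.split("/")  # this sketch only handles 'A/B'
            row[name] = row[left] / row[right]
        key = tuple(row[c] for c in group_cols)
        bucket = totals.setdefault(key, defaultdict(float))
        for column, value in row.items():
            if column not in skip:
                bucket[column] += value
    return [dict(zip(group_cols, key), **bucket) for key, bucket in totals.items()]

rows = [
    {"OBJ_ID_1": "A", "COL1": 6, "COL2": 3},
    {"OBJ_ID_1": "A", "COL1": 8, "COL2": 2},
    {"OBJ_ID_1": "B", "COL1": 9, "COL2": 3},
]

result = accumulate(rows, rule)
```

Here the quotient is computed row by row and then summed within each group, following the comment on `COMPUTE_NAME`; an engine that divides the group totals instead would give different numbers.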
The embodiment described above is merely a preferred embodiment of the present invention. Ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the scope of protection of the present invention.

Claims (1)

1. handle the method for mass data based on SQL like function of file for one kind, may further comprise the steps:
A, carry out standardized format and handle gathering text that the back produces;
B, the text after handling based on standardized format carry out mass data processing, use the mode of class SQL statement to carry out the related data operation;
Result's definition of C, operation is output as net result or intermediate result, for the form after the intermediate result conformance with standardization, continues file is carried out data manipulation, until reaching requirement as a result;
The document format data that described steps A is used is as follows:
##STAR|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
wherein DATA_BLOCK is the block name, used to distinguish data, and START before the block name marks the beginning of the block's data; a block is analogous to a data table; the part between '##START|DATA_BLOCK' and '##END|DATA_BLOCK' is the data body, in which each line is equivalent to a row of a data table and the first line holds the variable names, equivalent to the columns of the table; END in '##END|DATA_BLOCK' marks the end of the data;
in step B, the mass data are processed without a database, operating on text; in this process SQL-like statements are used, that is, data are processed through the predicates of SQL statements;
in step C, the output result is redefined, that is, for one original input the output of any operation is reused, and the required output is obtained through SQL-like operations.
CN200810249730XA 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file Expired - Fee Related CN101593197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810249730XA CN101593197B (en) 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file

Publications (2)

Publication Number Publication Date
CN101593197A CN101593197A (en) 2009-12-02
CN101593197B true CN101593197B (en) 2011-10-05

Family

ID=41407855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810249730XA Expired - Fee Related CN101593197B (en) 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file

Country Status (1)

Country Link
CN (1) CN101593197B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150220898A1 (en) * 2014-02-04 2015-08-06 Seth Priebatsch Dynamic ingestion and processing of transactional data at the point of sale

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541884B (en) * 2010-12-10 2014-07-02 中国移动通信集团贵州有限公司 Method and device for database optimization
CN102163231A (en) * 2011-04-13 2011-08-24 浪潮(北京)电子信息产业有限公司 Method and device for data collection
US8639619B1 (en) 2012-07-13 2014-01-28 Scvngr, Inc. Secure payment method and system
US8770478B2 (en) 2013-07-11 2014-07-08 Scvngr, Inc. Payment processing with automatic no-touch mode selection
CN103425779A (en) * 2013-08-19 2013-12-04 曙光信息产业股份有限公司 Data processing method and data processing device
CN107577803A (en) * 2017-09-25 2018-01-12 北京维联众诚科技有限公司 Data processing method based on class SQL engines

Also Published As

Publication number Publication date
CN101593197A (en) 2009-12-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111005

Termination date: 20131230