CN101593197A - A method for processing mass data using file-based SQL-like functions - Google Patents

A method for processing mass data using file-based SQL-like functions

Info

Publication number
CN101593197A
Authority
CN
China
Prior art keywords
data
file
sql
mass data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA200810249730XA
Other languages
Chinese (zh)
Other versions
CN101593197B (en)
Inventor
祝乃国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd filed Critical Inspur Communication Information System Co Ltd
Priority to CN200810249730XA priority Critical patent/CN101593197B/en
Publication of CN101593197A publication Critical patent/CN101593197A/en
Application granted granted Critical
Publication of CN101593197B publication Critical patent/CN101593197B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to data processing in communication networks and specifically provides a method for processing mass data using file-based SQL-like functions. The method comprises the following steps: A. applying standardized formatting to the text files produced by data collection; B. processing the mass data on the basis of the standardized text files, performing the relevant data operations by means of SQL-like statements; C. defining the result of an operation to be output as either a final result or an intermediate result; an intermediate result conforms to the standardized format, so data operations can continue on that file until the required result is reached. The invention combines the respective advantages of the file approach and the database approach: it ensures efficient processing of mass data while keeping operation simple and flexible and providing strong extensibility.

Description

A method for processing mass data using file-based SQL-like functions
Technical field
The present invention relates to data processing in communication networks and specifically provides a method for processing mass data using file-based SQL-like functions.
Background art
Computer technology has developed in step with the needs of data processing, each reinforcing the other. Early data processing relied entirely on data files. Storing data in files has many shortcomings, such as inconvenient operation, lack of reusability, and lack of standards. Various relational databases therefore emerged during this evolution and promoted the development of data processing applications.
When processing mass data, choosing a processing approach that matches the characteristics of the data is the key to improving efficiency. The main measures currently used to improve the efficiency of mass data processing are:
● choose a good database tool;
● write good program code;
● partition the mass data;
● build indexes extensively;
● improve the hardware, adding CPU and memory;
● set up a caching mechanism;
● increase virtual memory;
● process in batches;
● optimize SQL query statements;
● process data in text format;
● define robust cleaning rules and error-handling mechanisms;
● create views or materialized views;
● avoid 32-bit servers (in extreme cases);
● consider operating-system issues;
● use data warehouses and multidimensional database storage;
● use sampled data for data mining;
● use in-memory databases.
China is vast, and networks such as power and communication networks are operated in an integrated manner, so the networks are huge. These networks nevertheless have structural characteristics that allow a suitable mass-data processing approach to be chosen to improve efficiency. Communication network management follows regular patterns: each node of the network is the basic unit of management, and statistical analysis is carried out per administrative area, per local network and for the whole network; this is called network element granularity. Data are generated at regular intervals such as 5, 15 or 60 minutes, while statistical analysis generally requires 60-minute (hourly), daily, weekly, monthly and yearly figures; this is called time granularity.
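The roll-up from a collection interval to a coarser time granularity can be sketched as follows. This is only an illustration under assumptions: the function name `roll_up_to_hour`, the `(timestamp, value)` sample layout and the sample values are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from datetime import datetime

def roll_up_to_hour(samples):
    """Sum (timestamp, value) samples into one total per clock hour."""
    totals = defaultdict(int)
    for stamp, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += value
    return dict(totals)

# Hypothetical 15-minute counters from one network element.
samples = [
    (datetime(2008, 12, 30, 10, 0), 4),
    (datetime(2008, 12, 30, 10, 15), 6),
    (datetime(2008, 12, 30, 10, 30), 5),
    (datetime(2008, 12, 30, 10, 45), 5),
    (datetime(2008, 12, 30, 11, 0), 7),
]
hourly = roll_up_to_hour(samples)
```

The same truncation idea would extend to daily or monthly granularity by also resetting the hour or day fields.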
Data for a given time are generated per node (network element); for various reasons, the data of some network elements across the network may be delayed.
Network management requires that data be timely and that the data analyzed be complete. Given how network data are generated, database-based collection and summarization requires a large amount of bookkeeping, such as recording markers for the time points of data collection and data summarization. Summarization is itself a large-volume operation that consumes substantial database resources and weakens the database's ability to serve users. Delays in data generation cause summarization to be triggered at inconsistent times, which may leave the data incomplete. All database-based optimization and operation depends on the database, and the limits of database operation result in data that are late and incomplete and in weak capacity to serve external users.
In the data generated by network elements, network element granularity relationships are clearly identified, and the network element is the basic unit that produces data. Based on these characteristics, the common SQL operations of a database, such as accumulation, deletion, association (join), maximum, minimum and average, can be implemented on the mass data in file mode.
With the development of server technology, direct computation on data has become very fast, which provides the hardware basis for this kind of data processing. Hash arrays, whose descriptive subscripts allow direct lookup, provide the software basis for the computation. An implementation designed according to the open-closed principle is highly extensible, so new data operations (computations), such as a special formula, can be added easily.
Summary of the invention
In view of the above, the present invention provides a mass data processing scheme for communication networks that works in file mode while imitating the convenience of database operations. It ensures efficient processing of mass data while keeping operation simple and flexible and providing strong extensibility.
The technical solution adopted by the present invention to solve this technical problem is:
A method for processing mass data using file-based SQL-like functions, comprising the following steps:
A. applying standardized formatting to the text files produced by data collection;
B. processing the mass data on the basis of the standardized text files, performing the relevant data operations by means of SQL-like statements;
C. defining the result of an operation to be output as either a final result or an intermediate result; an intermediate result conforms to the standardized format, so data operations can continue on that file until the required result is reached.
The data file format used in step A is as follows:
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
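A minimal sketch of how a file in this format might be read is given below. It assumes that every section opens with `##START|<name>` and closes with `##END|<name>`, and that the first line inside a section names the columns; the function name `parse_blocks` is hypothetical and not part of the patent.

```python
def parse_blocks(text):
    """Parse '##START|NAME' ... '##END|NAME' sections into lists of row dicts."""
    blocks = {}
    name = None
    columns = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##START|"):
            name = line.split("|", 1)[1]
            blocks[name] = []
            columns = None
        elif line.startswith("##END|"):
            name = None
        elif name is not None:
            fields = line.split("|")
            if columns is None:
                columns = fields  # first line of a section names the columns
            else:
                blocks[name].append(dict(zip(columns, fields)))
    return blocks

SAMPLE = """\
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0002|1|1|2|3
##END|DATA_BLOCK
"""
blocks = parse_blocks(SAMPLE)
```

Each row becomes a dictionary keyed by column name, which is one way to obtain the name-based access that the description later attributes to hash arrays.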
Step B applies a database-free approach to the mass data and operates directly on text files. The operations are expressed as SQL-like statements; that is, the data are processed through the predicates of SQL statements.
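Predicate-based processing of rows read from a text file might look like the sketch below. The helper `select_where` and its parameters are hypothetical; the patent does not specify how predicates are represented, so a plain Python function stands in for the WHERE clause.

```python
def select_where(rows, predicate, columns=None):
    """Rough equivalent of: SELECT columns FROM rows WHERE predicate(row)."""
    out = []
    for row in rows:
        if predicate(row):
            out.append(dict(row) if columns is None else {c: row[c] for c in columns})
    return out

rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "1"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "1"},
]
# WHERE CELL = 'DF0001' AND TRX > 1
picked = select_where(
    rows,
    lambda r: r["CELL"] == "DF0001" and int(r["TRX"]) > 1,
    columns=["CELL", "TRX"],
)
```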
Step C allows the output to be redefined: for a given original input, the output of any operation can be reused, so that the required output is obtained through SQL-like operations.
The invention is implemented according to the open-closed principle, which allows the SQL-like statements to be flexibly extended and customized.
The open-closed principle is one of the principles of object-oriented design: "open to new requirements, closed to modification". It means that the implemented system is highly extensible: new user requirements can always be accommodated without modifying the existing program; only the newly requested functionality needs to be implemented. The new code then delivers the function the user needs by being called from the existing program. The main function that calls such extensions is referred to as the engine. In this example, if another aggregate function of SQL or some other special requirement is needed, it can be implemented in code and described in the configuration, after which it can be used. The purpose of this patent is to make mass data processing efficient and convenient, where convenience means imitating SQL operations, a style familiar to everyone.
The method of the present invention ensures efficient processing of mass data while keeping operation simple and flexible and providing strong extensibility. Its main features are:
1. Data processing without a database
Timely and complete processing of mass data is the key to network management. Database-based processing, however, consumes large amounts of database resources and, because database processing capacity is limited, cannot deliver data in time. This affects users' use of other application functions and the use of applications based on derived data (data generated from basic data and certain judgment rules). Performing database-like operations in file mode improves efficiency and leaves the database as far as possible to end users, saving investment.
Data processing here means operations on data such as accumulation, deletion, association, maximum, minimum and average. These are also the common data processing operations in a database.
2. File-based SQL-like operations
Any processing of the data in a data file can be achieved by programming, but such processing is ad hoc, not easy to invoke, and inconvenient to use.
In the present invention the architecture is designed according to the open-closed principle, which makes the functions easy to invoke and use. The main functions implemented are listed in the following table:
Function | Equivalent SQL operation | Remarks
Accumulation | SELECT sum(a) FROM tab WHERE condition GROUP BY col1, col2 | Grouping conditions can be set; values can also be computed arithmetically first and then summed
Deletion | DELETE FROM tab WHERE condition | Unlike the SQL operation, the deletion can also be set to "keep", i.e. the inverse of the condition: rows that satisfy the condition are retained and rows that do not are deleted
Association | SELECT a.col, b.col FROM a, b WHERE condition | Association (join) of multiple tables
Maximum | SELECT max(col) FROM table WHERE condition GROUP BY grouping columns | Maximum value under the grouping condition
Minimum | SELECT min(col) FROM table WHERE condition GROUP BY grouping columns | Minimum value under the grouping condition
Average | SELECT avg(col) FROM table WHERE condition GROUP BY grouping columns | Average value under the grouping condition
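The grouped functions in the table (accumulation, maximum, minimum, average) can be illustrated with a small sketch over in-memory rows. The names `AGGREGATES` and `group_aggregate` are hypothetical, and the rows are assumed to have already been read from a standardized text file.

```python
from collections import defaultdict

# Supported aggregate functions, keyed by a short name.
AGGREGATES = {
    "sum": sum,
    "max": max,
    "min": min,
    "avg": lambda values: sum(values) / len(values),
}

def group_aggregate(rows, group_by, column, how):
    """Rough equivalent of: SELECT how(column) FROM rows GROUP BY group_by."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in group_by)
        groups[key].append(float(row[column]))
    return {key: AGGREGATES[how](values) for key, values in groups.items()}

rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "3"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "5"},
]
totals = group_aggregate(rows, ["CELL"], "COUNT1", "sum")
averages = group_aggregate(rows, ["CELL"], "COUNT1", "avg")
```

Keying the groups by a tuple of column values keeps multi-column grouping (`GROUP BY col1, col2`) and single-column grouping on the same code path.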
3. Operations on intermediate data
In many cases a single SQL statement cannot produce the final result; several steps are often needed, and a database inevitably uses temporary tables to hold the intermediate data along the way.
To make data processing more flexible, the present invention also supports operations on intermediate data. File processing can be configured to produce an intermediate result (the counterpart of a temporary table), and the same operations can then be applied to it; that is, the intermediate file is treated as an original data file to be processed. In this way a complex computation can be split into several steps, which improves practicality and adaptability.
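The idea that an intermediate result has the same format as an input file can be sketched as a write-then-read round trip. The helpers `format_block` and `read_block` and the block name `TMP_BLOCK` are hypothetical; only the `##START|` and `##END|` markers and the `|` separator are taken from the patent's format.

```python
def format_block(name, columns, rows):
    """Serialize rows as a '##START|name' ... '##END|name' section."""
    lines = ["##START|" + name, "|".join(columns)]
    lines += ["|".join(str(row[c]) for c in columns) for row in rows]
    lines.append("##END|" + name)
    return "\n".join(lines) + "\n"

def read_block(text, name):
    """Read one named section back into (columns, rows)."""
    lines = text.splitlines()
    start = lines.index("##START|" + name)
    end = lines.index("##END|" + name)
    columns = lines[start + 1].split("|")
    rows = [dict(zip(columns, line.split("|"))) for line in lines[start + 2:end]]
    return columns, rows

# Step 1: keep only cell DF0001 and store it as an intermediate block.
source = [
    {"CELL": "DF0001", "COUNT1": "1"},
    {"CELL": "DF0002", "COUNT1": "9"},
    {"CELL": "DF0001", "COUNT1": "2"},
]
kept = [row for row in source if row["CELL"] == "DF0001"]
intermediate = format_block("TMP_BLOCK", ["CELL", "COUNT1"], kept)

# Step 2: treat the intermediate text exactly like an original input file.
columns, rows = read_block(intermediate, "TMP_BLOCK")
total = sum(int(row["COUNT1"]) for row in rows)
```

Because step 2 only sees text in the standard format, it does not need to know whether that text came from data collection or from an earlier operation.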
4. A driving engine that conforms to the open-closed principle
Each data processing function can be attached to the main program as a plug-in and is executed by the main program's driving engine. The operations to be carried out are specified through configuration, and newly added data processing functions can be invoked and used in the same way.
The system implemented in this invention is open and extensible, and processes data through the driving engine. The driving engine provides multi-level logging and a debugging mechanism, so that problems can be found easily.
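One possible shape for such an engine is a registry that maps rule names to functions, so that adding a function never requires changing the engine. In the sketch below only the `RULE_TYPE` key comes from the patent's configuration; the registry `RULES`, the decorator `rule`, the rule names `DELETE` and `MAX`, and the keys `CONDITION` and `COLUMN` are assumptions made for illustration.

```python
RULES = {}

def rule(name):
    """Register a processing function under a RULE_TYPE name."""
    def register(func):
        RULES[name] = func
        return func
    return register

@rule("DELETE")
def delete_rows(rows, config):
    # Drop every row that satisfies the configured condition.
    return [row for row in rows if not config["CONDITION"](row)]

@rule("MAX")
def max_value(rows, config):
    column = config["COLUMN"]
    return [{column: max(int(row[column]) for row in rows)}]

def run_engine(rows, configs):
    """Apply each configured rule in order; the engine itself never changes."""
    for config in configs:
        rows = RULES[config["RULE_TYPE"]](rows, config)
    return rows

data = [{"CELL": "DF0001", "COUNT1": "4"}, {"CELL": "DF0002", "COUNT1": "9"}]
result = run_engine(data, [
    {"RULE_TYPE": "DELETE", "CONDITION": lambda row: row["CELL"] == "DF0002"},
    {"RULE_TYPE": "MAX", "COLUMN": "COUNT1"},
])
```

A new operation would be added by writing one more decorated function; `run_engine` stays untouched, which is the property the open-closed principle asks for.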
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention for processing mass data using file-based SQL-like functions.
Embodiment
The method of the present invention is further described below with reference to the drawing and a specific embodiment.
The file-based data processing scheme makes full use of current computer hardware: the data to be processed are loaded into memory and handled in arrays, which greatly speeds up processing. Arrays are a common way of handling variables in programs, but ordinary arrays are indexed by number, so the variable to be processed cannot be found directly by name (this is usually done by traversing the array and comparing against the corresponding values). The hash array introduced in this scheme avoids this drawback: unlike an ordinary array, it can use the variable name directly as the subscript. Using hash arrays makes direct operations on variables convenient and flexible and speeds up data processing.
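The difference between a positional array and a hash array can be shown with a short sketch. A Python `dict` is used here as the hash array; the column names come from the sample file format, while the variable names and the accumulation values are illustrative.

```python
header = ["CELL", "TRX", "COUNT1", "COUNT2", "COUNT3"]
fields = "DF0001|2|1|2|3".split("|")

# Positional array: finding a named counter means searching the header first.
position = header.index("COUNT2")
by_position = fields[position]

# Hash array: the counter name itself is the subscript.
record = dict(zip(header, fields))
by_name = record["COUNT2"]

# Accumulating per cell works the same way, keyed by the cell name.
totals = {}
for cell, count in [("DF0001", 1), ("DF0002", 5), ("DF0001", 2)]:
    totals[cell] = totals.get(cell, 0) + count
```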
The driving engine is designed according to the open-closed principle, that is, "closed to modification, open to new requirements". This property gives the engine strong adaptability and good extensibility.
To make data processing more convenient and flexible, temporary data records are introduced. The system stipulates that input files and temporary files have the same format. For ease and speed of processing, the format is required to be as follows:
##START|HEADER
[Figure A20081024973000111]
##END|HEADER
##START|DATA_BLOCK (the block name is used to distinguish data; START marks the beginning of the block)
[Figure A20081024973000112]
##END|DATA_BLOCK (END marks the end of this data block)
The main program is shown in Fig. 1.
These operations have no ordering dependencies in use. An operation can be invoked several times within one processing run, and several operations can be combined to reach a required result. For example, to accumulate data at whole-network granularity, a deletion operation can first be used to obtain the data that satisfy a given condition, and an accumulation operation then produces the required data.
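The combination described above, a deletion followed by an accumulation, might be sketched as follows. The functions `delete_rows` and `accumulate` and the `keep` flag are hypothetical names; the `keep` behaviour mirrors the "keep" variant of deletion mentioned in the function table, in which rows satisfying the condition are retained.

```python
def delete_rows(rows, condition, keep=False):
    """Drop rows matching condition; with keep=True, retain only the matching rows."""
    return [row for row in rows if bool(condition(row)) == keep]

def accumulate(rows, columns):
    """Sum the given counter columns over all rows (one whole-network total)."""
    return {c: sum(int(row[c]) for row in rows) for c in columns}

rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1", "COUNT2": "2"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "1", "COUNT2": "2"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "1", "COUNT2": "2"},
]
# First keep only the rows that satisfy the condition, then accumulate them.
valid = delete_rows(rows, lambda r: r["TRX"] == "1", keep=True)
network_total = accumulate(valid, ["COUNT1", "COUNT2"])
```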
The program invokes processing on the specified data according to rule configuration. The following is the configuration of a data accumulation rule:
Name | Description | Remarks | Required | Type
RULE_TYPE | Name of the function module | SUM | Required | Scalar
RULE_DESC | Description of the rule being executed; this text appears in the log | Rule description | Optional; if omitted, the log shows the content of RULE_TYPE | Scalar
INPUT_FILE_DESCRIPTION | Description of the names of the files to process | May be given as a regular expression | Required | Array
OUTPUT_BLOCK_NAME | Name of the output block | Can differ from the raw data block, to tell them apart | Required | Scalar
COUNTERS_TO_SORT_ON | Fields to sort on for accumulation, i.e. the accumulation condition fields | Multiple fields allowed | Required | Array
REDUNDANT_COUNTERS | List of redundant counters, separated by "," | Columns not wanted in the generated file | Optional; default none | Array
PRODUCE_PIF | Whether to produce the intermediate temporary-format file | True: produce; 0: do not produce | Optional; produced by default | Scalar
PRODUCE_LIF | Whether to produce the database-load format file | True: produce; 0: do not produce | Optional; produced by default | Scalar
NON_ADDITIVE_COUNTERS | List of fields that are not to be accumulated, separated by "," | Fields such as names and times need no accumulation; the accumulation sort fields need not be listed again, as they are never accumulated | Optional; default none | Array
APPEND_STR | String appended to the names of the columns that take part in accumulation | Ignored if not set | Optional; default none | Scalar
OLD_COUNTER_NAMES | List of column names to be renamed | Ignored if not set | Optional; default none | Array
NEW_COUNTER_NAMES | Column names after renaming, matched by position to the previous list | Ignored if not set | Optional; default none | Array
OUTPUT_DIR | Path where the load files are stored | A specific location for the files can be given | Optional; default: see remarks | Scalar
Keep_files | Backup path for the load files | If not set, no backup is made; backups mainly serve as a data source for third parties | Optional; default: see remarks | Scalar
COMPUTE_EXPRESSION | Expression for a computed column | | Optional; default none | Array
COMPUTE_NAME | Name of the output column | | Optional; default none | Array
Required items must be set in the configuration; optional items may be omitted. A concrete example follows:
'RULE_TYPE' => 'ACCUMULATE',   # handle by which the accumulation function is invoked
'RULE_DESC' => 'Acccumulate IN',
'PRODUCE_PIF' => 'True',
'PRODUCE_LIF' => 0,
'OUTPUT_BLOCK_NAME' => 'NICELASS_0',
'INPUT_FILE_DESCRIPTION' => ['NICELASS#*#E, pif'],   # input file name; wildcards allowed
'COUNTERS_TO_SORT_ON' => ['OBJ_ID_1'],   # variable names of the GROUP BY part of the SQL statement
'COMPUTE_EXPRESSION' => ['COL1/COL2'],   # divide one variable by the other to obtain a new column
'COMPUTE_NAME' => ['COMPUTE_1'],   # name of the new column; the computed values are accumulated in the result
'APPEND_STR' => '_0'
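How an engine might interpret such a configuration is sketched below. The configuration keys are taken from the example, but the behaviour is an assumption: the sketch treats every non-grouping column as a numeric counter, supports only the simple `A/B` form of `COMPUTE_EXPRESSION`, applies `APPEND_STR` to computed columns as well, and does not guard against division by zero. The function name `accumulate` and the sample rows are hypothetical.

```python
from collections import defaultdict

config = {
    "RULE_TYPE": "ACCUMULATE",
    "OUTPUT_BLOCK_NAME": "NICELASS_0",
    "COUNTERS_TO_SORT_ON": ["OBJ_ID_1"],
    "COMPUTE_EXPRESSION": ["COL1/COL2"],
    "COMPUTE_NAME": ["COMPUTE_1"],
    "APPEND_STR": "_0",
}

def accumulate(rows, config):
    """Group by COUNTERS_TO_SORT_ON, then sum the counters and computed columns."""
    keys = config["COUNTERS_TO_SORT_ON"]
    suffix = config.get("APPEND_STR", "")
    expressions = config.get("COMPUTE_EXPRESSION", [])
    names = config.get("COMPUTE_NAME", [])
    sums = defaultdict(lambda: defaultdict(float))
    for row in rows:
        group = tuple(row[k] for k in keys)
        # Assumption: every column that is not a grouping key is numeric.
        values = {name: float(v) for name, v in row.items() if name not in keys}
        for expr, name in zip(expressions, names):
            left, right = expr.split("/")  # only 'A/B' is handled in this sketch
            values[name] = values[left] / values[right]
        for name, value in values.items():
            sums[group][name + suffix] += value
    return [dict(zip(keys, group), **totals) for group, totals in sums.items()]

rows = [
    {"OBJ_ID_1": "A", "COL1": "4", "COL2": "2"},
    {"OBJ_ID_1": "A", "COL1": "6", "COL2": "3"},
    {"OBJ_ID_1": "B", "COL1": "1", "COL2": "4"},
]
result = accumulate(rows, config)
```

The computed column is evaluated per row and then summed per group, which follows the remark in the example that the computed values are accumulated in the result.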
The embodiment described above is merely a preferred embodiment of the present invention; ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the invention shall all fall within the protection scope of the invention.

Claims (4)

1. A method for processing mass data using file-based SQL-like functions, comprising the following steps:
A. applying standardized formatting to the text files produced by data collection;
B. processing the mass data on the basis of the standardized text files, performing the relevant data operations by means of SQL-like statements;
C. defining the result of an operation to be output as a final result or an intermediate result, wherein an intermediate result conforms to the standardized format and data operations continue on that file until the required result is reached.
2. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that the data file format used in step A is as follows:
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0001|3|1|2|3
DF0001|4|1|2|3
DF0002|1|1|2|3
DF0002|2|1|2|3
DF0002|3|1|2|3
DF0002|4|1|2|3
##END|DATA_BLOCK
3. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that step B applies a database-free approach to the mass data and operates on text files, the operations being expressed as SQL-like statements, that is, the data are processed through the predicates of SQL statements.
4. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that step C allows the output to be redefined, that is, for a given original input the output of any operation can be reused, so that the required output is obtained through SQL-like operations.
CN200810249730XA 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file Expired - Fee Related CN101593197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810249730XA CN101593197B (en) 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810249730XA CN101593197B (en) 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file

Publications (2)

Publication Number Publication Date
CN101593197A true CN101593197A (en) 2009-12-02
CN101593197B CN101593197B (en) 2011-10-05

Family

ID=41407855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810249730XA Expired - Fee Related CN101593197B (en) 2008-12-30 2008-12-30 Method for processing mass data based on SQL like function of file

Country Status (1)

Country Link
CN (1) CN101593197B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541884A (en) * 2010-12-10 2012-07-04 中国移动通信集团贵州有限公司 Method and device for database optimization
CN102541884B (en) * 2010-12-10 2014-07-02 中国移动通信集团贵州有限公司 Method and device for database optimization
CN102163231A (en) * 2011-04-13 2011-08-24 浪潮(北京)电子信息产业有限公司 Method and device for data collection
US11481754B2 (en) 2012-07-13 2022-10-25 Scvngr, Inc. Secure payment method and system
US9530289B2 (en) 2013-07-11 2016-12-27 Scvngr, Inc. Payment processing with automatic no-touch mode selection
CN103425779A (en) * 2013-08-19 2013-12-04 曙光信息产业股份有限公司 Data processing method and data processing device
US8924260B1 (en) * 2014-02-04 2014-12-30 Scvngr, Inc. Dynamic ingestion and processing of transactional data at the point of sale
US20150220898A1 (en) * 2014-02-04 2015-08-06 Seth Priebatsch Dynamic ingestion and processing of transactional data at the point of sale
US10489764B2 (en) * 2014-02-04 2019-11-26 Scvngr, Inc. Dynamic ingestion and processing of transactional data at the point of sale
CN107577803A (en) * 2017-09-25 2018-01-12 北京维联众诚科技有限公司 Data processing method based on class SQL engines

Also Published As

Publication number Publication date
CN101593197B (en) 2011-10-05

Similar Documents

Publication Publication Date Title
CN101593197B (en) Method for processing mass data based on SQL like function of file
Buneman et al. Comprehension syntax
Hoffer et al. The use of cluster analysis in physical data base design
CN102542007B (en) Method and system for synchronization of relational databases
CN109559231B (en) Block chain-oriented tracing query method
CN103761318B (en) A kind of method and system of relationship type synchronization of data in heterogeneous database
CN101542478B (en) Methods and apparatus for improving data warehouse performance
CN102289507B (en) Method for mining data flow weighted frequent mode based on sliding window
CN106951552A (en) A kind of user behavior data processing method based on Hadoop
CN104778540A (en) BOM (bill of material) management method and management system for building material equipment manufacturing
CN101944116B (en) Complex multi-dimensional hierarchical connection and aggregation method for data warehouse
US20230067182A1 (en) Data Processing Device and Method, and Computer Readable Storage Medium
CN101710336A (en) Method for accelerating data processing by using relational middleware
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN100589101C (en) Data access method based on the Oracle relational database of routine call interface
CN102508971B (en) Method for establishing product function model in concept design stage
CN112651618A (en) Construction method of audit dimension model for online audit of metering data
CN115687468A (en) System for processing data in distributed service by ETL process button
US11928083B2 (en) Determining collaboration recommendations from file path information
CN110825718A (en) Information system data architecture model and construction method thereof
CN115145736B (en) Cloud platform quota intelligent distribution system based on Spark distributed computing
CN100403308C (en) SQL load mining-based automatic design method for physical database
CN109754131B (en) SCD file configuration method and device based on NXD
CN111737268B (en) Data processing method based on document database
Ru Design of Archives Management System Based on Data Mining Technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111005

Termination date: 20131230