CN101593197B

CN101593197B - Method for processing mass data with file-based SQL-like functions

Info

Publication number: CN101593197B
Application number: CN200810249730XA
Authority: CN
Inventors: 祝乃国 (Zhu Naiguo)
Original assignee: Inspur Communication Information System Co Ltd
Current assignee: Inspur Communication Information System Co Ltd
Priority date: 2008-12-30
Filing date: 2008-12-30
Publication date: 2011-10-05
Anticipated expiration: 2028-12-30
Also published as: CN101593197A

Abstract

The invention relates to the field of data processing in telecommunication networks, and in particular provides a method for processing mass data with file-based SQL-like functions. The method comprises the following steps: A, converting the text files generated after data collection into a standardized format; B, processing the mass data based on the standardized text files, performing the relevant data operations in the manner of SQL-like statements; and C, defining and outputting the operation result as a final result or an intermediate result, where an intermediate result that conforms to the standardized format can be operated on again as a file until the required result is obtained. The method combines the respective advantages of the file and database approaches, ensures high efficiency in mass data processing, is convenient and flexible to operate, and has strong extensibility.

Description

A method for processing mass data with file-based SQL-like functions

Technical field

The present invention relates to the field of data processing in communication networks, and specifically provides a method for processing mass data with file-based SQL-like functions.

Background art

The development of computers has gone hand in hand with the needs of data processing; the two have advanced together. Early data processing stored everything in data files, and file-based storage has many drawbacks, such as inconvenient operation, lack of reusability, and lack of standards. Various relational databases therefore emerged over time and promoted the development of data-processing applications.

For mass data, choosing a processing method that suits the characteristics of the data is the key to efficiency. The main measures currently used to improve the efficiency of mass data processing are:

● choose a high-quality database tool;

● write efficient program code;

● partition large data sets;

● build indexes extensively;

● improve hardware, adding CPU capacity and memory;

● set up caching mechanisms;

● enlarge virtual memory;

● use batch processing;

● optimize SQL queries;

● process data in text format;

● define robust data-cleansing rules and error-handling mechanisms;

● create views or materialized views;

● avoid 32-bit servers (in extreme cases);

● take operating-system issues into account;

● use data warehouses and multidimensional database storage;

● use sampled data for data mining;

● use in-memory databases.

China has a vast territory, and networks such as power and communication networks are operated as integrated wholes, which makes them very large. Each of these networks, however, has its own structural characteristics, so a suitable mass data processing method can be chosen to improve processing efficiency. Communication network management follows regular patterns: each node of the network is the basic unit of management, and statistical analysis is performed separately by administrative area, local network, and overall network; this is called network-element granularity. Data can be generated at equal time intervals such as 5, 15, or 60 minutes, while statistical analysis generally requires periods of 60 minutes (1 hour), a day, a week, a month, a year, and so on; this is called time granularity.

Data for a given time is produced separately by each node (network element), and for various reasons the data from network elements across the network may be delayed.

Network management requires data to be real-time, and the data analysed must be complete. Because of the way network data is generated, collecting and aggregating data in a database requires a large number of markers, such as markers recording the time points of data collection and data aggregation. Aggregation is itself a large-volume operation that consumes substantial database resources and weakens the database's ability to serve users. Delays in data generation cause discrepancies in when aggregation is triggered, which can leave the data incomplete. All optimizations and operations based on a database remain dependent on the database, and the limits of database operations lead to untimely and incomplete data and a weakened capacity to provide external services.

In the data generated by network elements, the network-element granularity relationships are clearly identified, and the network element is the basic unit that generates data. Based on these characteristics, the SQL operations commonly used in databases, such as accumulation, deletion, association, maximum, minimum, and average, are implemented on a file basis for this mass data.

With the development of server technology, direct computation on data has become very fast, which provides the hardware basis for this kind of data processing; hash arrays, which locate elements directly by key, provide the software basis. An implementation designed according to the open-closed principle has excellent extensibility, so new data operations (calculations), such as a special formula added for certain data, can be added easily.

Summary of the invention

In view of the above, the method of the present invention for processing mass data with file-based SQL-like functions is based on the communication network and uses a file-based approach that imitates the convenience of database operations. It provides a mass data processing scheme that ensures high efficiency while keeping operation simple and flexible, with strong extensibility.

The technical solution adopted by the present invention to solve the technical problem is as follows:

A method for processing mass data with file-based SQL-like functions comprises the following steps:

A. Convert the text files produced after data collection into a standardized format;

B. Process the mass data based on the standardized text files, performing the relevant data operations in the manner of SQL-like statements;

C. Define and output the operation result as a final result or an intermediate result. An intermediate result that conforms to the standardized format can be operated on again as a file, until the required result is obtained.

The data file format used in step A is as follows:

##START|HEADER

COMPANY|DEPARTMENT

Inspur|oss

##END|HEADER

##START|DATA_BLOCK

CELL|TRX|COUNT1|COUNT2|COUNT3

DF0001|1|1|2|3

DF0001|2|1|2|3

DF0001|3|1|2|3

DF0001|4|1|2|3

DF0002|1|1|2|3

DF0002|2|1|2|3

DF0002|3|1|2|3

DF0002|4|1|2|3

##END|DATA_BLOCK
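To make the format concrete, the following is a minimal Python sketch of a reader for it, not the patent's own implementation. It assumes, as the claims describe, that the first line inside a block names the columns and each later line is one record. The names `Block` and `parse_blocks` are hypothetical, and all values are kept as strings.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One ##START|<name> ... ##END|<name> section of a standardized file."""
    name: str
    columns: list[str] = field(default_factory=list)
    rows: list[dict[str, str]] = field(default_factory=list)


def parse_blocks(text: str, sep: str = "|") -> dict[str, Block]:
    """Parse the pipe-delimited block format into {block name: Block} (hypothetical helper).

    Assumption: the first non-empty line inside a block holds the column
    names, and every following line up to the END marker is one record.
    """
    blocks: dict[str, Block] = {}
    current = None  # the block currently being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith("##"):
            tag, _, name = line[2:].partition(sep)
            if tag in ("START", "STAR"):  # also tolerate a truncated START tag
                current = Block(name)
                blocks[name] = current
            elif tag == "END":
                current = None
            continue
        if current is None:
            continue  # text outside any block is ignored
        fields = line.split(sep)
        if not current.columns:
            current.columns = fields  # first line of a block: column names
        else:
            current.rows.append(dict(zip(current.columns, fields)))
    return blocks
```

Applied to the sample above, `parse_blocks(text)["DATA_BLOCK"].rows` would hold eight records, the first being `{"CELL": "DF0001", "TRX": "1", "COUNT1": "1", "COUNT2": "2", "COUNT3": "3"}`.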

In step B, the mass data is processed with a scheme that does not depend on a database and operates directly on text. This processing uses SQL-like statements, i.e., the data is processed through the predicates of SQL statements.

Step C redefines the output: the output of any operation can be reused as an original input, so that the required output is obtained through SQL-like operations.

The invention is implemented according to the open-closed principle, which allows SQL-like statements to be flexibly extended and custom-developed.

The open-closed principle is one of the principles of object-oriented design: "open to requirements, closed to modification". It means the implemented system is highly extensible: new requirements raised by users can be accommodated without limit and without modifying the existing program; only the newly requested functionality needs to be implemented, and it is then invoked through the existing functions. The main function that calls the new extension functions is referred to as the engine. In this example, if an aggregate function used in SQL statements, or some other special requirement, needs to be added, the function is implemented in code and then described in the configuration, after which it can be used. The purpose of this patent is to make the processing of mass data efficient and convenient, where "convenient" means imitating SQL operations, which are familiar to everyone.
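As an illustration of processing data through SQL predicates without a database, the sketch below expresses a WHERE clause as a composable Python predicate. It assumes records are dicts mapping column name to string value; the helpers `where`, `col_eq`, `col_gt` and `and_` are hypothetical names chosen for this example.

```python
from typing import Callable

Row = dict[str, str]
Predicate = Callable[[Row], bool]


def where(rows: list[Row], predicate: Predicate) -> list[Row]:
    """Like SELECT * FROM block WHERE <predicate>: keep matching rows."""
    return [row for row in rows if predicate(row)]


def col_eq(column: str, value: str) -> Predicate:
    """Predicate for `column = value` (string comparison)."""
    return lambda row: row[column] == value


def col_gt(column: str, threshold: float) -> Predicate:
    """Predicate for `column > threshold` (numeric comparison)."""
    return lambda row: float(row[column]) > threshold


def and_(*predicates: Predicate) -> Predicate:
    """Predicate for `p1 AND p2 AND ...`."""
    return lambda row: all(p(row) for p in predicates)


rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "3", "COUNT1": "1"},
    {"CELL": "DF0002", "TRX": "3", "COUNT1": "1"},
]
# SELECT * FROM DATA_BLOCK WHERE CELL = 'DF0001' AND TRX > 2
selected = where(rows, and_(col_eq("CELL", "DF0001"), col_gt("TRX", 2)))
```

Building predicates as small functions keeps each SQL condition reusable, so the same predicate can serve a filter, a deletion, or an aggregation.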
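A minimal sketch of such an engine, assuming Python and one configuration dict per rule: operations register themselves under a `RULE_TYPE` name and the engine only looks them up, so adding an operation never changes the engine. `OPERATIONS`, `register`, `run_rules` and the `COLUMN`/`VALUE` rule keys are hypothetical; `RULE_TYPE`, `OLD_COUNTER_NAMES` and `NEW_COUNTER_NAMES` are taken from the configuration described later in this document.

```python
from typing import Callable

Row = dict[str, str]
Operation = Callable[[list[Row], dict], list[Row]]

# Plug-in registry: RULE_TYPE -> operation (hypothetical). Adding an operation
# extends this table; the engine function run_rules() itself is never modified.
OPERATIONS: dict[str, Operation] = {}


def register(rule_type: str) -> Callable[[Operation], Operation]:
    """Decorator that plugs an operation into the engine under `rule_type`."""
    def wrap(func: Operation) -> Operation:
        OPERATIONS[rule_type] = func
        return func
    return wrap


def run_rules(rows: list[Row], rules: list[dict]) -> list[Row]:
    """Engine: look up each configured rule and apply it to the previous output."""
    for rule in rules:
        rows = OPERATIONS[rule["RULE_TYPE"]](rows, rule)
    return rows


@register("DELETE")
def delete(rows: list[Row], rule: dict) -> list[Row]:
    """Drop rows whose COLUMN equals VALUE (hypothetical rule keys)."""
    return [r for r in rows if r[rule["COLUMN"]] != rule["VALUE"]]


# A later extension, added without touching run_rules():
@register("RENAME")
def rename(rows: list[Row], rule: dict) -> list[Row]:
    """Rename the columns in OLD_COUNTER_NAMES to those in NEW_COUNTER_NAMES."""
    mapping = dict(zip(rule["OLD_COUNTER_NAMES"], rule["NEW_COUNTER_NAMES"]))
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]
```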

The method of the present invention for processing mass data with file-based SQL-like functions ensures high efficiency in mass data processing while keeping operation simple and flexible, with strong extensibility. Its main features are:

1. Data processing independent of a database

Timely and complete processing of mass data is the key to network management. Database-based processing, however, not only consumes a large share of database resources; the limited processing capacity of the database also prevents data from being available in time. This affects users' use of other application functions, as well as applications that use derived data (data produced from base data by certain judgment rules). SQL-like operations on files can improve efficiency and leave the database, as far as possible, to end users, which saves investment.

Data processing mainly involves operations on indicator data such as accumulation, deletion, association, maximum, minimum, and average. These are also the operations most commonly performed on data in a database.

2. SQL-like operations on files

Any processing of the data in a data file can be achieved by writing a program, but such processing tends to be ad hoc, hard to call, and inconvenient to use.

In the present invention, the architecture is designed according to the open-closed principle, which makes the functions easy to call and use. The main functions implemented are listed in the following table:

Function	Equivalent SQL operation	Remarks
Accumulate	SELECT SUM(a) FROM tab WHERE condition GROUP BY col1, col2	Grouping conditions can be set, and arithmetic can be evaluated first and the results then summed
Delete	DELETE FROM tab WHERE condition	Unlike SQL, the deletion can also be set to keep mode, i.e., rows that satisfy the condition are kept and rows that do not are deleted
Associate	SELECT a.col, b.col FROM a, b WHERE condition	Association (join) of multiple tables
Maximum	SELECT MAX(col) FROM table WHERE condition GROUP BY grouping columns	Maximum value under the grouping conditions
Minimum	SELECT MIN(col) FROM table WHERE condition GROUP BY grouping columns	Minimum value under the grouping conditions
Average	SELECT AVG(col) FROM table WHERE condition GROUP BY grouping columns	Average value under the grouping conditions
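The aggregate rows of the table (accumulate, maximum, minimum, average) and the association can be sketched over in-memory rows as follows. This is an illustrative assumption, not the patent's code: `group_aggregate` and `join` are hypothetical names, numeric columns are converted with `float`, and the association is shown as an equi-join on one column implemented with a hash index.

```python
from collections import defaultdict
from typing import Callable, Optional

Row = dict[str, str]

# Aggregate functions from the table, keyed by their SQL names.
AGGREGATES: dict[str, Callable[[list[float]], float]] = {
    "SUM": sum,
    "MAX": max,
    "MIN": min,
    "AVG": lambda values: sum(values) / len(values),
}


def group_aggregate(rows: list[Row], func: str, column: str,
                    group_by: tuple[str, ...] = (),
                    where: Optional[Callable[[Row], bool]] = None) -> dict[tuple, float]:
    """SELECT <group_by>, func(column) FROM rows WHERE where GROUP BY <group_by>."""
    groups: dict[tuple, list[float]] = defaultdict(list)
    for row in rows:
        if where is None or where(row):
            key = tuple(row[c] for c in group_by)
            groups[key].append(float(row[column]))
    aggregate = AGGREGATES[func]
    return {key: aggregate(values) for key, values in groups.items()}


def join(left: list[Row], right: list[Row], on: str) -> list[Row]:
    """SELECT * FROM left, right WHERE left.on = right.on, done as a hash join."""
    index: dict[str, list[Row]] = defaultdict(list)
    for rrow in right:
        index[rrow[on]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[on], [])]
```

With the sample file's eight records, `group_aggregate(rows, "SUM", "COUNT3", ("CELL",))` would return 12.0 for both DF0001 and DF0002.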

3. Operations on intermediate data

In many cases the final result cannot be obtained with a single SQL statement; several steps are often needed, and in a database such a process inevitably uses temporary tables to store intermediate data.

To make data processing more flexible, the present invention also supports operations on intermediate data. The file-processing program can be configured to produce an intermediate result (the equivalent of a temporary table), and the same operations can be applied to this intermediate table, i.e., the intermediate file is treated as an original data file to be processed. In this way, a complex data calculation can be broken into several steps, improving practicality and adaptability.

4. A driving engine that follows the open-closed principle

Each data-processing function can be attached to the main program as a plug-in and is driven by the main program's engine. The operations to be performed are specified through configuration, and newly added data-processing functions can be called and used in the same way.

The system implemented in this invention is open and extensible, and processes data through the driving engine. The engine provides multi-level logging and a debugging mechanism, which makes problems easy to find.
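The intermediate-file idea can be sketched as follows, assuming the pipe-delimited block format above and a hypothetical file name: step 1 writes its result as an ordinary block, and step 2 reads that block back exactly as it would read raw input. `write_block`, `read_block` and `two_step` are illustrative names only.

```python
import tempfile
from pathlib import Path

Row = dict[str, str]


def write_block(path: Path, name: str, columns: list[str], rows: list[Row]) -> None:
    """Write rows as one ##START|name ... ##END|name block (the standard format)."""
    lines = [f"##START|{name}", "|".join(columns)]
    lines += ["|".join(str(row[c]) for c in columns) for row in rows]
    lines.append(f"##END|{name}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_block(path: Path, name: str) -> list[Row]:
    """Read back the named block; the first line inside it holds column names."""
    lines = path.read_text(encoding="utf-8").splitlines()
    start = lines.index(f"##START|{name}")
    end = lines.index(f"##END|{name}", start)
    columns = lines[start + 1].split("|")
    return [dict(zip(columns, line.split("|"))) for line in lines[start + 2:end]]


def two_step(rows: list[Row], workdir: Path) -> Row:
    """Step 1: SUM(COUNT3) per CELL into an intermediate file.
    Step 2: re-read that file as ordinary input and pick the largest sum."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["CELL"]] = totals.get(row["CELL"], 0.0) + float(row["COUNT3"])
    tmp = workdir / "intermediate.pif"  # hypothetical file name
    write_block(tmp, "SUM_BY_CELL", ["CELL", "SUM_COUNT3"],
                [{"CELL": c, "SUM_COUNT3": t} for c, t in totals.items()])
    intermediate = read_block(tmp, "SUM_BY_CELL")  # same format as raw input
    return max(intermediate, key=lambda r: float(r["SUM_COUNT3"]))


sample = [{"CELL": "DF0001", "COUNT3": "3"}, {"CELL": "DF0002", "COUNT3": "5"},
          {"CELL": "DF0001", "COUNT3": "1"}]
with tempfile.TemporaryDirectory() as workdir:
    best = two_step(sample, Path(workdir))
```

Because the intermediate file uses the same format as the input, any later step can consume it without special handling, which is what lets a complex calculation be split into stages.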

Description of drawings

Fig. 1 is a flow chart of the method of the present invention for processing mass data with file-based SQL-like functions.

Embodiment

The method of the present invention for processing mass data with file-based SQL-like functions is further described below with reference to the drawing and a specific embodiment.

The file-based data processing scheme makes full use of current computer hardware: the data to be processed is loaded into memory and handled in arrays, which greatly accelerates processing. Arrays are a common way of handling variables in programs, but ordinary arrays are indexed by number, so a variable to be processed cannot be found directly by name (usually the array must be traversed and each value compared). The hash array introduced in this scheme avoids this drawback: the variable name itself can be used as the array subscript to distinguish array data. Hash arrays make direct operation on variables convenient and flexible and speed up data processing.

The driving engine is designed according to the open-closed principle, i.e., "closed to modification, open to requirements"; this gives the engine strong adaptability and good extensibility.

To make data processing more convenient and flexible, a temporary data-record mode is introduced. The system requires input files and temporary files to have the same format; with convenience and processing speed in mind, the format is specified as follows:
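The difference between traversing a numerically indexed array and locating an entry directly in a hash array can be shown with a Python `dict`, which plays the role of the hash array here (an assumption about how the idea maps onto Python). Keying on `(CELL, TRX)` is a hypothetical choice that makes each record in the sample data unique; `find_by_scan` and `build_hash` are illustrative names.

```python
from typing import Optional

Row = dict[str, str]


def find_by_scan(rows: list[Row], cell: str, trx: str) -> Optional[Row]:
    """Numerically indexed array: traverse and compare each element (O(n))."""
    for row in rows:
        if row["CELL"] == cell and row["TRX"] == trx:
            return row
    return None


def build_hash(rows: list[Row], keys: tuple[str, ...] = ("CELL", "TRX")) -> dict[tuple, Row]:
    """Hash array: the key values themselves act as the subscript."""
    return {tuple(row[k] for k in keys): row for row in rows}


rows = [{"CELL": c, "TRX": str(t), "COUNT1": str(t * 10)}
        for c in ("DF0001", "DF0002") for t in range(1, 5)]
table = build_hash(rows)
# Direct location by name, with no traversal (average O(1)):
hit = table[("DF0002", "3")]
```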

##START|HEADER

##END|HEADER

##START|DATA_BLOCK    (DATA_BLOCK is the block name, used to distinguish data; START marks the beginning of the block)

[Figure: data body of the block; the first line holds the column names and each following line is one record]

##END|DATA_BLOCK    (END marks the end of this block of data)

The flow of the main program is shown in Fig. 1.

These operations have no ordering dependencies: each can be called several times within one processing run, and several operations can be combined to produce the required result. For example, at whole-network granularity, a deletion can first remove the data that meets a certain condition, and an accumulation can then produce the required data.

Which processing a program performs, and on which data, is specified by rule configuration. The following is the configuration of the data accumulation rule:
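A minimal sketch of that two-step example, assuming in-memory rows and a hypothetical deletion condition (`TRX` equal to 4). The deletion includes the keep mode mentioned in the function table, and an empty grouping accumulates over the whole network. `delete_rows` and `accumulate` are hypothetical names.

```python
from typing import Callable

Row = dict[str, str]


def delete_rows(rows: list[Row], predicate: Callable[[Row], bool],
                keep: bool = False) -> list[Row]:
    """DELETE ... WHERE predicate. With keep=True the sense is inverted:
    rows satisfying the predicate stay and all other rows are deleted."""
    return [row for row in rows if bool(predicate(row)) == keep]


def accumulate(rows: list[Row], group_by: tuple[str, ...],
               counters: list[str]) -> dict[tuple, dict[str, float]]:
    """SELECT group_by, SUM(c1), SUM(c2), ... GROUP BY group_by.
    An empty group_by accumulates over the whole network."""
    totals: dict[tuple, dict[str, float]] = {}
    for row in rows:
        key = tuple(row[c] for c in group_by)
        acc = totals.setdefault(key, dict.fromkeys(counters, 0.0))
        for c in counters:
            acc[c] += float(row[c])
    return totals


rows = [{"CELL": c, "TRX": str(t), "COUNT1": "1", "COUNT3": "3"}
        for c in ("DF0001", "DF0002") for t in (1, 2, 3, 4)]
# Step 1: delete the TRX 4 records (hypothetical condition).
remaining = delete_rows(rows, lambda r: r["TRX"] == "4")
# Step 2: accumulate whole-network totals from what is left.
network_totals = accumulate(remaining, (), ["COUNT1", "COUNT3"])
```

Here `network_totals` evaluates to `{(): {"COUNT1": 6.0, "COUNT3": 18.0}}`, since six of the eight records remain after the deletion.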

Name	Description	Remarks	Required	Type
RULE_TYPE	Name of the functional module	SUM	Required	Scalar
RULE_DESC	Description of the operation; this text appears in the log	Rule description	Optional; if omitted, the log shows the RULE_TYPE value	Scalar
INPUT_FILE_DESCRIPTION	Name description of the file(s) to process	Can be described by a regular expression	Required	Array
OUTPUT_BLOCK_NAME	Name of the output block	Distinguishes the output from the raw data	Required	Scalar
COUNTERS_TO_SORT_ON	Accumulation sort fields, i.e., the fields by which data is accumulated	Multiple fields allowed	Required	Array
REDUNDANT_COUNTERS	List of unneeded counters, separated by ","	Columns not wanted in the generated file	Optional, default none	Array
PRODUCE_PIF	Whether to produce the intermediate temporary-format file	True = produce, 0 = do not produce	Optional, default produce	Scalar
PRODUCE_LIF	Whether to produce the database-load format file	True = produce, 0 = do not produce	Optional, default produce	Scalar
NON_ADDITIVE_COUNTERS	List of fields that are not accumulated, separated by ","	For example names and times; the accumulation sort fields need not be listed again; these fields are not accumulated	Optional, default none	Array
APPEND_STR	Suffix string appended to the name of each accumulated column	Ignored if not set	Optional, default none	Scalar
OLD_COUNTER_NAMES	List of column names to be renamed	Ignored if not set	Optional, default none	Array
NEW_COUNTER_NAMES	Column names after renaming, in the same order as the previous list	Ignored if not set	Optional, default none	Array
OUTPUT_DIR	Storage path of the database-load files	Specifies where the files are stored	Optional, default see note	Scalar
Keep_files	Backup path of the database-load files	If not set, no backup is made; backups mainly provide a data source for third parties	Optional, default see note	Scalar
COMPUTE_EXPRESSION	Expression for a calculated column		Optional, default none	Array
COMPUTE_NAME	Name of the calculated output column		Optional, default none	Array

"Required" items must be set in the configuration; optional items may be omitted. A concrete example follows:

'RULE_TYPE' => 'ACCUMULATE',  # handle under which the accumulation function is called

'RULE_DESC' => 'Accumulate IN',

'PRODUCE_PIF' => 'True',

'PRODUCE_LIF' => 0,

'OUTPUT_BLOCK_NAME' => 'NICELASS_0',

'INPUT_FILE_DESCRIPTION' => ('NICELASS#*#Epif'),  # input file name; wildcards are allowed

'COUNTERS_TO_SORT_ON' => ('OBJ_ID_1'),  # the variable name in the GROUP BY part of the SQL statement

'COMPUTE_EXPRESSION' => ('COL1/COL2'),  # divides one variable by the other to obtain a new column

'COMPUTE_NAME' => ('COMPUTE_1'),  # name of the new column; the calculated values are accumulated in the result

'APPEND_STR' => ' 0'
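How an engine might interpret such a rule is sketched below in Python. It is an assumption-laden illustration, not the patent's implementation: configuration values are given as Python lists, `COMPUTE_EXPRESSION` is limited to a single binary operation such as `COL1/COL2`, non-additive fields keep the first value seen in each group, `APPEND_STR` is appended to every accumulated column name, and file handling (`INPUT_FILE_DESCRIPTION`, `PRODUCE_PIF`, `PRODUCE_LIF`, `OUTPUT_DIR`) is left out. `evaluate` and `run_accumulate_rule` are hypothetical names.

```python
import operator
import re

Row = dict[str, str]

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
_BINARY = re.compile(r"^\s*(\w+)\s*([-+*/])\s*(\w+)\s*$")


def evaluate(expr: str, row: dict) -> float:
    """Evaluate a single binary expression such as 'COL1/COL2' on one row."""
    match = _BINARY.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    left, op, right = match.groups()
    return _OPS[op](float(row[left]), float(row[right]))


def run_accumulate_rule(rows: list[Row], rule: dict) -> list[dict]:
    """Interpret an ACCUMULATE rule over already-parsed rows (hypothetical semantics).
    Every column not grouped, dropped or non-additive must be numeric."""
    group_by = list(rule["COUNTERS_TO_SORT_ON"])
    dropped = set(rule.get("REDUNDANT_COUNTERS", ())) | set(group_by)
    fixed = set(rule.get("NON_ADDITIVE_COUNTERS", ()))  # kept, not summed
    suffix = rule.get("APPEND_STR", "")
    computed = list(zip(rule.get("COMPUTE_EXPRESSION", ()), rule.get("COMPUTE_NAME", ())))
    out: dict[tuple, dict] = {}
    for source in rows:
        row: dict = dict(source)
        for expr, name in computed:
            row[name] = evaluate(expr, row)  # calculate first, then accumulate
        key = tuple(row[c] for c in group_by)
        if key not in out:
            out[key] = {c: row[c] for c in group_by}
            out[key].update({c: row[c] for c in fixed if c in row})  # first value seen
        acc = out[key]
        for col, value in row.items():
            if col in dropped or col in fixed:
                continue
            acc[col + suffix] = acc.get(col + suffix, 0.0) + float(value)
    return list(out.values())


rule = {
    "RULE_TYPE": "ACCUMULATE",
    "COUNTERS_TO_SORT_ON": ["CELL"],
    "REDUNDANT_COUNTERS": ["TRX"],
    "COMPUTE_EXPRESSION": ["COUNT1/COUNT2"],
    "COMPUTE_NAME": ["COMPUTE_1"],
    "APPEND_STR": "_SUM",  # hypothetical suffix
}
rows = [{"CELL": c, "TRX": str(t), "COUNT1": "1", "COUNT2": "2", "COUNT3": "3"}
        for c in ("DF0001", "DF0002") for t in (1, 2, 3, 4)]
result = run_accumulate_rule(rows, rule)
```

With this hypothetical rule, the DF0001 group of the sample data comes out as `{"CELL": "DF0001", "COUNT1_SUM": 4.0, "COUNT2_SUM": 8.0, "COUNT3_SUM": 12.0, "COMPUTE_1_SUM": 2.0}`.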

The embodiment described above is only one preferred embodiment of the present invention. Ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall all fall within the protection scope of the present invention.

Claims

1. A method for processing mass data with file-based SQL-like functions, comprising the following steps:

A. Performing standardized-format processing on the text files generated after data collection;

B. Performing mass data processing on the text files after standardized-format processing, using SQL-like statements to carry out the relevant data operations;

C. Defining and outputting the operation result as a final result or an intermediate result; for an intermediate result that conforms to the standardized format, continuing to perform data operations on the file until the required result is obtained;

The data file format used in said step A is as follows:

##START|HEADER

COMPANY|DEPARTMENT

Inspur|oss

##END|HEADER

##START|DATA_BLOCK

CELL|TRX|COUNT1|COUNT2|COUNT3

DF0001|1|1|2|3

DF0001|2|1|2|3

DF0001|3|1|2|3

DF0001|4|1|2|3

DF0002|1|1|2|3

DF0002|2|1|2|3

DF0002|3|1|2|3

DF0002|4|1|2|3

##END|DATA_BLOCK

wherein DATA_BLOCK is the block name, used to distinguish data; START before the block name marks the beginning of the block, and a block is similar to a data table; the part between "##START|DATA_BLOCK" and "##END|DATA_BLOCK" is the data body, each line of which is equivalent to a row of a data table, the first line holding the variable names, equivalent to the columns of the data table; and END in "##END|DATA_BLOCK" marks the end of this block of data;

in step B, the mass data is processed with a scheme that does not depend on a database and operates on text; this processing uses SQL-like statements, i.e., the data is processed through the predicates of SQL statements;

in step C, the output result is redefined, i.e., the output of any operation can be reused as an original input, so that the required data output is obtained through SQL-like operations.