CN101593197A

CN101593197A - A method for processing mass data using file-based SQL-like functions

Info

Publication number: CN101593197A
Application number: CNA200810249730XA
Authority: CN
Inventors: 祝乃国
Original assignee: Inspur Communication Information System Co Ltd
Current assignee: Inspur Communication Information System Co Ltd
Priority date: 2008-12-30
Filing date: 2008-12-30
Publication date: 2009-12-02
Anticipated expiration: 2028-12-30
Also published as: CN101593197B

Abstract

The present invention relates to data processing in communication networks and specifically provides a method for processing mass data using file-based SQL-like functions. The method comprises the following steps: A. the text files produced by data collection are converted to a standardized format; B. mass data processing is carried out on the standardized text files, with the data operations expressed in the manner of SQL-like statements; C. the result of an operation can be defined as either a final result or an intermediate result; an intermediate result conforms to the standardized format, so data operations can continue on that file until the required result is reached. The invention combines the respective advantages of the file approach and the database approach: it ensures efficient mass data processing while making operation simple and flexible and providing strong extensibility.

Description

A method for processing mass data using file-based SQL-like functions

Technical field

The present invention relates to data processing in communication networks and specifically provides a method for processing mass data using file-based SQL-like functions.

Background art

The development of computers has gone hand in hand with the needs of data processing, each reinforcing the other. Early data processing was entirely file-based. Storing data in files has many shortcomings, such as inconvenient operation, lack of reusability, and lack of standards. Various relational databases therefore emerged, which promoted the development of data processing applications.

For mass data, choosing a processing approach that matches the characteristics of the data is the key to improving processing efficiency. The main measures currently used to improve the efficiency of mass data processing are:

● choosing a good database tool;

● writing good program code;

● partitioning the mass data;

● building extensive indexes;

● improving hardware, adding CPU and memory;

● establishing a caching mechanism;

● increasing virtual memory;

● batch processing;

● optimizing SQL query statements;

● processing data in text format;

● customizing powerful cleansing rules and error handling mechanisms;

● creating views or materialized views;

● avoiding 32-bit servers (in extreme cases);

● taking operating system issues into account;

● using data warehouses and multidimensional databases for storage;

● using sampled data for data mining;

● using in-memory databases.

China covers a vast territory, and networks such as power and communication networks are operated as integrated wholes, so these networks are very large. Each of these networks nevertheless has its own structural characteristics, and a suitable mass data processing approach can be chosen to improve data processing efficiency. The management of a communication network follows regular patterns: each node that makes up the network is a basic unit of management, and statistical analysis is carried out separately per administrative area, per local network, and for the whole network; this is referred to as network element granularity. Data can be produced at equal intervals of 5, 15, or 60 minutes, and statistical analysis generally requires periods of 60 minutes (1 hour), a day, a week, a month, a year, and so on; this is referred to as time granularity.

Each node (network element) produces data for a given period, and for various reasons the data of some network elements across the whole network may be produced with a delay.

Network management requires data to be timely and the analyzed data to be complete. Given how network data is produced, collecting and aggregating the data in a database requires a large number of markers, for example records marking the time points of data collection and data aggregation. Aggregation is itself a large-volume operation that consumes a great deal of database resources and weakens the database's ability to serve users. Delays in data production cause differences in when aggregation is triggered, which may leave the data incomplete. All database-based optimizations and operations still depend on the database, and the limits of database operation result in data that is late or incomplete and in a weak capacity to serve external users.

In the data produced by network elements, the network element granularity relationship is clearly identified, and the network element is the basic unit that produces data. Based on these characteristics, the common SQL operations of a database, such as accumulation, deletion, join, maximum, minimum, and average, are implemented for this mass data in a file-based manner.

With the development of server technology, direct computation on data has become very fast, which provides the hardware basis for this data processing. The hash array, whose descriptive subscripts allow elements to be located directly, provides the software basis for the computation. An implementation designed according to the open-closed principle has outstanding extensibility, so data operations (computations) can be extended easily, for example by adding a special formula.

Summary of the invention

In view of the above, the method of the present invention for processing mass data using file-based SQL-like functions is a mass data processing scheme for communication networks that works on files while imitating the convenience of database operations. It thereby ensures efficient mass data processing while making operation simple and flexible and providing strong extensibility.

The technical solution adopted by the present invention to solve its technical problem is as follows:

A method for processing mass data using file-based SQL-like functions comprises the following steps:

A. the text files produced by data collection are converted to a standardized format;

B. mass data processing is carried out on the standardized text files, with the data operations expressed in the manner of SQL-like statements;

C. the result of an operation can be defined as either a final result or an intermediate result; an intermediate result conforms to the standardized format, so data operations can continue on that file until the required result is reached.

The data file format used in step A is as follows:

##START|HEADER

COMPANY|DEPARTMENT

Inspur|oss

##END|HEADER

##START|DATA_BLOCK

CELL|TRX|COUNT1|COUNT2|COUNT3

DF0001|1|1|2|3

DF0001|2|1|2|3

DF0001|3|1|2|3

DF0001|4|1|2|3

DF0002|1|1|2|3

DF0002|2|1|2|3

DF0002|3|1|2|3

DF0002|4|1|2|3

##END|DATA_BLOCK

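The block format above can be read with a few lines of code. The following is a minimal Python sketch, not the patent's implementation: it assumes that every block opens with `##START|<name>` and closes with `##END|<name>`, that the first line inside a block lists the column names, and that fields are separated by `|`. The function name `parse_standard_file` and the shortened sample are illustrative assumptions.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def parse_standard_file(text: str) -> Dict[str, List[Row]]:
    """Parse '##START|NAME' ... '##END|NAME' blocks into lists of row dicts."""
    blocks: Dict[str, List[Row]] = {}
    name: Optional[str] = None             # block currently being read
    columns: Optional[List[str]] = None    # column names of that block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                       # blank lines carry no data
        if line.startswith("##START|"):
            name = line.split("|", 1)[1]
            columns = None
            blocks[name] = []
        elif line.startswith("##END|"):
            name = None
        elif name is not None:
            fields = line.split("|")
            if columns is None:
                columns = fields           # first line of a block names the columns
            else:
                blocks[name].append(dict(zip(columns, fields)))
    return blocks


SAMPLE = """\
##START|HEADER
COMPANY|DEPARTMENT
Inspur|oss
##END|HEADER
##START|DATA_BLOCK
CELL|TRX|COUNT1|COUNT2|COUNT3
DF0001|1|1|2|3
DF0001|2|1|2|3
DF0002|1|1|2|3
##END|DATA_BLOCK
"""

blocks = parse_standard_file(SAMPLE)
print(blocks["HEADER"])
print(len(blocks["DATA_BLOCK"]))
```

Each data row becomes a dictionary keyed by column name, which is the form the later operations (accumulate, delete, join) can work on directly.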

In step B, the mass data is processed without a database: operations are carried out on the text files, expressed in the manner of SQL-like statements, that is, the data is processed through the predicates of SQL statements.

In step C, the output result can be redefined: the output of any operation can be reused as an original input, so that the required output is reached through successive SQL-like operations.

The invention is implemented according to the open-closed principle, which allows the SQL-like statements to be flexibly extended and customized.

The open-closed principle is one of the principles of object-oriented design: 'open to requirements, closed to modification'. It means that the implemented system is highly extensible: new requirements raised by users can be accommodated without limit, yet the existing program does not need to be modified; only the newly raised requirement needs to be implemented. The new implementation can provide the function the user needs by calling the existing functions. The main function that calls the new extension functions is referred to as the engine. In this example, if an additional aggregate function or some other special requirement used in SQL statements is needed, the function can be implemented in code and then described in the configuration, after which it can be used within the system. The purpose of this patent is to make mass data processing efficient and convenient, where convenient means imitating the SQL style of operation that everyone is familiar with.
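The open-closed idea described above can be illustrated with a small registry. This is a hedged Python sketch under assumed names (`OPERATIONS`, `register`, `run`, and the `RANGE` function are all hypothetical and do not come from the patent): the engine dispatches by configured name, so a new aggregate function is added by writing and registering it, without touching the engine.

```python
from typing import Callable, Dict, List

# Registry of operations; the engine looks functions up here by name.
OPERATIONS: Dict[str, Callable[[List[float]], float]] = {}


def register(name: str):
    """Decorator that plugs a new operation into the registry."""
    def wrap(func):
        OPERATIONS[name] = func
        return func
    return wrap


def run(name: str, values: List[float]) -> float:
    """The 'engine': dispatches purely by name and is never edited for new functions."""
    return OPERATIONS[name](values)


@register("SUM")
def op_sum(values: List[float]) -> float:
    return sum(values)


@register("MAX")
def op_max(values: List[float]) -> float:
    return max(values)


# A later requirement is met by adding code only; run() stays unchanged.
@register("RANGE")
def op_range(values: List[float]) -> float:
    return max(values) - min(values)


print(run("SUM", [1, 2, 3]))
print(run("RANGE", [1, 2, 3]))
```

The design choice is that `run` never needs to know which operations exist; the configuration only has to name a registered function.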

The method of the present invention for processing mass data using file-based SQL-like functions ensures efficient mass data processing while making operation simple and flexible and providing strong extensibility. Its main characteristics are:

1. Data processing without a database

Timely and complete processing of mass data is the key to network management. Database-based processing, however, not only consumes large amounts of database resources but, because of the limits of database processing capacity, cannot deliver data in time. This affects users' use of other application functions and affects applications based on derived data (data produced from basic data and certain judgment rules is called derived data). File-based operations in a database-like language improve efficiency and leave the database, as far as possible, to the end users, saving investment.

Data processing here refers to operations on data such as accumulation, deletion, join, maximum, minimum, and average. These are also the common data processing operations in a database.

2. File-based SQL-like operations

Any processing of the data in a data file can be implemented by programming, but such processing is ad hoc, not easy to call, and inconvenient to use.

In the present invention, the architecture is designed according to the open-closed principle so that functions are easy to call and use. The main functions implemented are shown in the following table:

Function	Equivalent SQL operation	Remarks
Accumulate	SELECT SUM(a) FROM tab WHERE condition GROUP BY col1, col2	Grouping conditions can be set; an arithmetic expression can be evaluated first and its result then summed
Delete	DELETE FROM tab WHERE condition	Unlike the SQL operation, this delete can also be set to "keep", i.e. the inverse of the condition: rows that satisfy the condition are kept and rows that do not are deleted
Join	SELECT a.col, b.col FROM a, b WHERE condition	Join across multiple tables
Maximum	SELECT MAX(col) FROM table WHERE condition GROUP BY grouping columns	Takes the maximum value within each group
Minimum	SELECT MIN(col) FROM table WHERE condition GROUP BY grouping columns	Takes the minimum value within each group
Average	SELECT AVG(col) FROM table WHERE condition GROUP BY grouping columns	Takes the mean value within each group
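The operations in the table can be approximated over in-memory rows as follows. This is a minimal Python sketch, not the patent's code: the function names `delete_rows`, `aggregate`, and `join`, the `keep` flag, and the `AREA` column are illustrative assumptions, and rows are assumed to be dictionaries keyed by column name.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def delete_rows(rows: List[Row], cond: Callable[[Row], bool], keep: bool = False) -> List[Row]:
    """DELETE ... WHERE cond; with keep=True the matching rows are kept instead."""
    return [r for r in rows if bool(cond(r)) == keep]


def aggregate(rows: List[Row], group_by: List[str], column: str, func: str) -> Dict[Tuple[str, ...], float]:
    """SELECT func(column) ... GROUP BY group_by, for func in SUM, MAX, MIN, AVG."""
    groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in group_by)].append(float(r[column]))
    funcs = {"SUM": sum, "MAX": max, "MIN": min, "AVG": lambda v: sum(v) / len(v)}
    return {key: funcs[func](vals) for key, vals in groups.items()}


def join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """SELECT ... FROM a, b WHERE a.on = b.on, using a hash index on the right side."""
    index: Dict[str, List[Row]] = defaultdict(list)
    for r in right:
        index[r[on]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


rows = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "4"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "2"},
]
cells = [{"CELL": "DF0001", "AREA": "north"}, {"CELL": "DF0002", "AREA": "south"}]

sums = aggregate(rows, ["CELL"], "COUNT1", "SUM")
averages = aggregate(rows, ["CELL"], "COUNT1", "AVG")
deleted = delete_rows(rows, lambda r: r["TRX"] == "1")
kept = delete_rows(rows, lambda r: r["TRX"] == "1", keep=True)
joined = join(rows, cells, "CELL")
print(sums, averages, len(deleted), len(kept), len(joined))
```

The `keep` flag mirrors the remark in the table that the delete operation can be inverted so that matching rows stay and all others are removed.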

3. Operations on intermediate data

In many cases data processing cannot reach the final result with a single SQL statement and often requires several steps; in a database this inevitably involves temporary tables to store intermediate data.

To make data processing more flexible, the present invention also supports operations on intermediate data. During file processing, an intermediate result (the counterpart of a temporary table) can be produced by configuration, and the same operations can be applied to this intermediate file, that is, the intermediate file is treated as an original data file to be processed. In this way a complex computation can be split into several steps, which improves practicality and adaptability.
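The intermediate-file idea can be sketched as a two-step run in which the output of the first step is written in the same block format and then read back as if it were an original input. This Python sketch rests on assumptions: the helper names `write_block` and `read_block`, the block name `STEP1`, and the filter condition are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def write_block(name: str, columns: List[str], rows: List[Row]) -> str:
    """Serialize rows into a '##START|name' ... '##END|name' block."""
    lines = [f"##START|{name}", "|".join(columns)]
    lines += ["|".join(str(r[c]) for c in columns) for r in rows]
    lines.append(f"##END|{name}")
    return "\n".join(lines) + "\n"


def read_block(text: str, name: str) -> List[Row]:
    """Read one named block back into row dicts."""
    lines = text.splitlines()
    start = lines.index(f"##START|{name}")
    end = lines.index(f"##END|{name}")
    columns = lines[start + 1].split("|")
    return [dict(zip(columns, line.split("|"))) for line in lines[start + 2:end]]


columns = ["CELL", "TRX", "COUNT1"]
raw = write_block("DATA_BLOCK", columns, [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0001", "TRX": "2", "COUNT1": "5"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "2"},
])

# Step 1: drop rows that fail the condition and emit an intermediate file.
kept = [r for r in read_block(raw, "DATA_BLOCK") if int(r["COUNT1"]) < 5]
intermediate = write_block("STEP1", columns, kept)

# Step 2: treat the intermediate file exactly like an original input and accumulate.
totals: Dict[str, int] = {}
for r in read_block(intermediate, "STEP1"):
    totals[r["CELL"]] = totals.get(r["CELL"], 0) + int(r["COUNT1"])
print(totals)
```

Because both steps share one format, any operation can consume the output of any other, which is what allows a complex computation to be split into stages.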

4. A driving engine that conforms to the open-closed principle

Each data processing function can easily be attached to the main program as a plug-in and is executed under the control of the main program's engine. The operations to be carried out are specified through configuration, and newly added data processing functions can be called and used in the same way.

The system implemented in this invention is open and extensible and processes data through the driving engine. The driving engine has multi-level logging and a debugging mechanism, so problems can be found easily.
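A configuration-driven engine with multi-level logging might look roughly like the following Python sketch. It is an illustration under assumptions, not the patent's engine: the rule types `KEEP` and `DROP_COLUMNS` and the keys `COLUMN`, `VALUE`, and `COLUMNS` are hypothetical, while `RULE_TYPE` and `RULE_DESC` follow the configuration names used in this document, including the fallback to `RULE_TYPE` in the log when no description is given.

```python
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("engine")

Row = Dict[str, str]
Handler = Callable[[List[Row], dict], List[Row]]


def keep_rows(rows: List[Row], rule: dict) -> List[Row]:
    """Keep only rows whose COLUMN equals VALUE (hypothetical rule keys)."""
    return [r for r in rows if r[rule["COLUMN"]] == rule["VALUE"]]


def drop_columns(rows: List[Row], rule: dict) -> List[Row]:
    """Remove the listed COLUMNS from every row (hypothetical rule key)."""
    return [{k: v for k, v in r.items() if k not in rule["COLUMNS"]} for r in rows]


# Plug-ins are attached here; run_rules itself does not change when one is added.
HANDLERS: Dict[str, Handler] = {"KEEP": keep_rows, "DROP_COLUMNS": drop_columns}


def run_rules(rows: List[Row], rules: List[dict]) -> List[Row]:
    """Apply each configured rule in order, logging at several levels."""
    for rule in rules:
        desc = rule.get("RULE_DESC", rule["RULE_TYPE"])  # log RULE_TYPE if no description
        handler = HANDLERS.get(rule["RULE_TYPE"])
        if handler is None:
            log.error("no handler registered for %s", rule["RULE_TYPE"])
            raise KeyError(rule["RULE_TYPE"])
        log.info("running %s", desc)
        rows = handler(rows, rule)
        log.debug("%s produced %d rows", desc, len(rows))
    return rows


rules = [
    {"RULE_TYPE": "KEEP", "RULE_DESC": "keep cell DF0001", "COLUMN": "CELL", "VALUE": "DF0001"},
    {"RULE_TYPE": "DROP_COLUMNS", "COLUMNS": ["TRX"]},
]
data = [
    {"CELL": "DF0001", "TRX": "1", "COUNT1": "1"},
    {"CELL": "DF0002", "TRX": "1", "COUNT1": "2"},
]
result = run_rules(data, rules)
print(result)
```

Raising the logger level from `DEBUG` to `INFO` or `ERROR` reduces the detail, which is one simple way to obtain the multi-level behaviour the text describes.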

Description of drawings

Fig. 1 is a flow chart of the method of the present invention for processing mass data using file-based SQL-like functions.

Embodiment

The method of the present invention for processing mass data using file-based SQL-like functions is further described below with reference to the drawing and a specific embodiment.

The file-based data processing scheme makes full use of current computer hardware: the data to be processed is loaded into memory and handled as arrays, which greatly speeds up processing. Arrays are a common way of handling variables in programs, but an ordinary array is indexed by number, so the variable that needs processing cannot be found directly by name (this is generally done by traversing the array and comparing against the corresponding value). The hash array introduced in this scheme avoids this shortcoming: the variable name can be used directly as the array subscript, unlike an ordinary array. The hash array makes direct operations on variables convenient and flexible and speeds up data processing.
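The difference between a numerically indexed array and a hash array can be shown in a few lines. In this Python sketch a `dict` stands in for the hash array; the helper name `get_by_scan` is an illustrative assumption, and the sample record is taken from the data format shown earlier.

```python
columns = ["CELL", "TRX", "COUNT1", "COUNT2", "COUNT3"]
values = ["DF0001", "1", "1", "2", "3"]


def get_by_scan(name: str) -> str:
    """Ordinary array: locate the name by traversing and comparing."""
    for i, col in enumerate(columns):
        if col == name:
            return values[i]
    raise KeyError(name)


# Hash array: the variable name itself is the subscript.
record = dict(zip(columns, values))

print(get_by_scan("COUNT2"), record["COUNT2"])
```

Both lookups return the same value, but the scan costs time proportional to the number of columns, whereas the hash lookup takes roughly constant time on average.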

The driving engine is designed according to the open-closed principle, that is, 'closed to modification, open to requirements'. This gives the engine stronger adaptability and better extensibility.

To make data processing more convenient and flexible, a temporary data record mode is introduced. The system stipulates that input files and temporary files have the same format. For convenience and processing speed, the format must meet the following requirements:

##START|HEADER

##END|HEADER

##START|DATA_BLOCK    (DATA_BLOCK is the block name, used to distinguish the data; START marks the beginning of the block)

##END|DATA_BLOCK    (END marks the end of this block of data)

The main program is shown in Fig. 1.

These operations do not depend on one another's order; an operation can be called repeatedly within one processing run, and several operations can be combined to reach a required result. For example, to accumulate data at whole-network granularity subject to a condition, a delete operation can be applied first, and the accumulate operation then produces the required data.

The program's call to process the specified data is realized through rule configuration. The following is the configuration of the data accumulation rule:

Name	Description	Remarks	Required	Type
RULE_TYPE	Name of the functional module	SUM	Required	Scalar
RULE_DESC	Description of the rule being executed; this text appears in the log	Rule description	Optional. If omitted, the log shows the content of RULE_TYPE.	Scalar
INPUT_FILE_DESCRIPTION	Description of the names of the files to process	Can be given as a regular expression	Required	Array
OUTPUT_BLOCK_NAME	Name of the output block	Can be used to distinguish the output from the raw data	Required	Scalar
COUNTERS_TO_SORT_ON	Sort fields for accumulation, i.e. the fields that form the accumulation condition	Multiple fields are allowed	Required	Array
REDUNDANT_COUNTERS	List of redundant counters, separated by ","	Columns not wanted in the generated file	Optional, default none	Array
PRODUCE_PIF	Produce the intermediate temporary format file	True: produce; 0: do not produce	Optional, default is to produce	Scalar
PRODUCE_LIF	Produce the format file for loading into the database	True: produce; 0: do not produce	Optional, default is to produce	Scalar
NON_ADDITIVE_COUNTERS	List of fields that are not to be accumulated, separated by ","	For example names and times, which need no accumulation; the accumulation sort fields need not be listed again; these fields are not accumulated	Optional, default none	Array
APPEND_STR	Suffix string appended to the names of the columns that take part in accumulation	Ignored if not set	Optional, default none	Scalar
OLD_COUNTER_NAMES	List of column names to be renamed	Ignored if not set	Optional, default none	Array
NEW_COUNTER_NAMES	Column names after renaming, corresponding by position to the previous list	Ignored if not set	Optional, default none	Array
OUTPUT_DIR	Path where the load files are stored	A specific storage location for the files can be specified	Optional, default see note	Scalar
Keep_files	Backup path for the load files	If not set, no backup is made; the backup mainly serves as a data source for third parties	Optional, default see note	Scalar
COMPUTE_EXPRESSION	Expression for a computed column		Optional, default none	Array
COMPUTE_NAME	Name of the output		Optional, default none	Array

Required items must be set in the configuration; optional items may be left unconfigured. A concrete example is as follows:

'RULE_TYPE' => 'ACCUMULATE',    # handle by which the accumulation function is called

'RULE_DESC' => 'Acccumulate IN',

'PRODUCE_PIF' => 'True',

'PRODUCE_LIF' => 0,

'OUTPUT_BLOCK_NAME' => 'NICELASS_0',

'INPUT_FILE_DESCRIPTION' => ['NICELASS#*#E, pif],    # input file name; wildcards are allowed

'COUNTERS_TO_SORT_ON' => ['OBJ_ID_1'],    # variable names of the GROUP BY part of the SQL statement

'COMPUTE_EXPRESSION' => ['COL1/COL2'],    # divides one variable by the other to obtain a new column

'COMPUTE_NAME' => ['COMPUTE_1'],    # name of the new column; the computed values are accumulated in the result

'APPEND_STR' => '_0'
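How such a rule might drive an accumulation can be sketched in Python. This is an interpretation under stated assumptions rather than the patent's implementation: the function `accumulate` is hypothetical, only `COUNTERS_TO_SORT_ON`, `APPEND_STR`, `COMPUTE_EXPRESSION`, and `COMPUTE_NAME` are handled, computed columns are assumed to be evaluated per row and then summed, only expressions of the form `A/B` are supported, and the suffix is assumed not to apply to computed column names.

```python
from typing import Dict, List, Tuple

rule = {
    "RULE_TYPE": "ACCUMULATE",
    "COUNTERS_TO_SORT_ON": ["OBJ_ID_1"],
    "COMPUTE_EXPRESSION": ["COL1/COL2"],
    "COMPUTE_NAME": ["COMPUTE_1"],
    "APPEND_STR": "_0",
}


def accumulate(rows: List[dict], rule: dict) -> List[dict]:
    """Sum every non-key column per group, as a GROUP BY on COUNTERS_TO_SORT_ON."""
    keys = rule["COUNTERS_TO_SORT_ON"]
    suffix = rule.get("APPEND_STR", "")
    computed = list(zip(rule.get("COMPUTE_NAME", []), rule.get("COMPUTE_EXPRESSION", [])))
    groups: Dict[Tuple, dict] = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        acc = groups.setdefault(group, {k: row[k] for k in keys})
        for col, val in row.items():
            if col not in keys:
                # Accumulated columns get the configured suffix appended to their name.
                acc[col + suffix] = acc.get(col + suffix, 0) + val
        for name, expr in computed:
            num, den = expr.split("/")   # sketch: only 'A/B' expressions are supported
            acc[name] = acc.get(name, 0) + row[num] / row[den]
    return list(groups.values())


rows = [
    {"OBJ_ID_1": "DF0001", "COL1": 1, "COL2": 2},
    {"OBJ_ID_1": "DF0001", "COL1": 3, "COL2": 4},
    {"OBJ_ID_1": "DF0002", "COL1": 5, "COL2": 10},
]
result = accumulate(rows, rule)
print(result)
```

A fuller version would also honour `NON_ADDITIVE_COUNTERS`, `REDUNDANT_COUNTERS`, and the rename lists; they are left out here because the text does not specify their exact behaviour.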

The embodiment described above is only a preferred embodiment of the present invention; ordinary variations and substitutions made by those skilled in the art within the scope of the technical solution of the invention shall all fall within the scope of protection of the invention.

Claims

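1. A method for processing mass data using file-based SQL-like functions, comprising the following steps:

A. the text files produced by data collection are converted to a standardized format;

B. mass data processing is carried out on the standardized text files, with the data operations expressed in the manner of SQL-like statements;

C. the result of an operation can be defined as either a final result or an intermediate result; an intermediate result conforms to the standardized format, so data operations can continue on that file until the required result is reached.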

2. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that the data file format used in step A is as follows:

##START|HEADER

COMPANY|DEPARTMENT

Inspur|oss

##END|HEADER

##START|DATA_BLOCK

CELL|TRX|COUNT1|COUNT2|COUNT3

DF0001|1|1|2|3

DF0001|2|1|2|3

DF0001|3|1|2|3

DF0001|4|1|2|3

DF0002|1|1|2|3

DF0002|2|1|2|3

DF0002|3|1|2|3

DF0002|4|1|2|3

##END|DATA_BLOCK


3. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that in step B the mass data is processed without a database: operations are carried out on the text files, expressed in the manner of SQL-like statements, that is, the data is processed through the predicates of SQL statements.

4. The method for processing mass data using file-based SQL-like functions according to claim 1, characterized in that in step C the output result can be redefined: the output of any operation can be reused as an original input, so that the required output is reached through successive SQL-like operations.