CN105095247B - Symbolic data analysis method and system - Google Patents


Info

Publication number: CN105095247B
Authority: CN (China)
Legal status: Active (as listed by Google; an assumption, not a legal conclusion)
Application number: CN201410184644.0A
Other languages: Chinese (zh)
Other versions: CN105095247A (en)
Inventors: 鲍明曦, 朱源, 何忠江, 邓丽华, 武翊
Current assignee: China Telecom Corp Ltd (as listed by Google; may be inaccurate)
Original assignee: China Telecom Corp Ltd
Events:
  • Application CN201410184644.0A filed by China Telecom Corp Ltd
  • Publication of CN105095247A
  • Application granted
  • Publication of CN105095247B
  • Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a symbolic data analysis method and system. The method includes: when a database (DB) trigger detects a data update in a database table of a business system that stores data in row storage mode, recording the data update information in a log table; reading, by a row-column storage conversion unit, newly added data update information from the log table in real time; if a preset database table has been updated, reading the updated data from the corresponding position in the business system's database and synchronizing it to the corresponding position in an in-memory database that stores database table data in column storage mode, where a preset database table is a database table designated in advance for real-time synchronization to the in-memory database; and analyzing, by a data symbolization analysis unit using a symbolic data analysis method, the data of the updated preset database table to generate a symbolic data table in interval format for each target variable in that table. Embodiments of the invention enable efficient real-time data analysis.

Description

Symbolic data analysis method and system
Technical field
The present invention relates to computer technology, and in particular to a symbolic data analysis method and system.
Background
In a traditional application system, data are stored in a traditional database. When a user issues an operation on the data through the front-end interface of the application, the application layer reads the data from the database, performs the logical operations in the application layer, and returns the result to the front-end interface for display or for the next operation. In this process, reading data from the database becomes a bottleneck because of the limited input/output (I/O) performance of disks, and the bottleneck is especially pronounced when large volumes of data are read; report analysis based on a data warehouse is the most obvious example. The reason is that a traditional database essentially stores data on disk as files and provides applications with an interface for accessing them, so reading data from the database amounts to reading files from disk. Over the past few decades of hardware development, memory and central processing unit (CPU) performance have improved rapidly, whereas disk I/O performance has improved only modestly: reading data from disk still takes on the order of milliseconds.
General data analysis techniques are severely limited when processing data sets that are well organized but enormous in volume. The main difficulties are twofold: 1) because of the number of sample points and dimensions, the amount of computation is often very large; 2) it is difficult to capture the overall characteristics of groups of data points.
To address these two difficulties, the prior art proposes symbolic data analysis methods based on row-storage data warehouses, for example "Canonical Correlation Analysis of Interval Data and Its Application in Stock Market Analysis" (Systems Engineering, Vol. 22, No. 8) and "An Analysis Technique for Massive Data" (Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition), Vol. 17, No. 2). Symbolic data analysis applies the idea of "data packing": it builds a higher-level data layer on top of the original multidimensional sample space, which greatly simplifies computation over large sample sets and avoids the common problem that dimensionality reduction of the sample space yields results whose physical meaning is hard to interpret.
In the course of implementing the present invention, the inventors found that although existing symbolic data analysis methods based on row-storage data warehouses can effectively reduce the dimensionality of high-dimensional variable spaces and improve data processing, they still have the following problems:
Existing symbolic data analysis methods based on row-storage data warehouses perform non-real-time analysis of business-system data. With the arrival of big data and the demand for efficient real-time analysis, this limitation becomes apparent: such methods cannot deliver efficient real-time data analysis. In addition, when symbolic data analysis is performed, an unreasonable choice of the data sample space often distorts the data when samples are converted into symbolic interval data.
Summary of the invention
One technical problem to be solved by the embodiments of the present invention is to provide a symbolic data analysis method and system that achieve efficient real-time data analysis.
A symbolic data analysis method provided by an embodiment of the present invention includes:
a database (DB) trigger of a business system monitors the database tables of the business system, where the business system's database stores the data of its database tables in row storage mode;
in response to detecting a data update in a database table of the business system, the DB trigger records data update information describing the update in a log table, where a data update is an event in which data is inserted into, modified in, or deleted from the database table, and the data update information includes the identifier (ID) of the updated database table and the location of the update;
a row-column storage conversion unit reads newly added data update information from the log table in real time;
if the newly added data update information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the business system's database, according to the location information in the data update information of the preset database table, and synchronizes the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode; a preset database table is a database table designated in advance for real-time synchronization to the in-memory database;
a data symbolization analysis unit uses a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database and generates a symbolic data table in interval format for each target variable in that table.
In a further embodiment of the above method, the method further includes:
presetting, through the row-column storage conversion unit, the database tables of the business system that need real-time synchronization to the in-memory database, or updating the set of such tables.
In a further embodiment of the above method, the row-column storage conversion unit reading newly added data update information from the log table in real time includes:
a control module in the row-column storage conversion unit invokes, in real time, a reading unit in the business system to read newly added data update information from the log table;
and the step in which, if the newly added data update information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the business system's database according to the location information and synchronizes it to the corresponding position in the column-storage in-memory database includes:
the control module determines, from the newly added data update information in the log table, whether any preset database table has been updated;
if a preset database table has been updated, the control module invokes the reading unit to read the updated data from the corresponding position in the business system's database according to the location information in the data update information of the preset database table;
the control module passes the updated data that was read to a data writing module in the row-column storage conversion unit and, according to a preset row-column conversion position mapping rule, instructs the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column storage mode;
the data writing module synchronizes the updated data to the corresponding position in the in-memory database by a write operation.
In a further embodiment of the above method, the control module invokes the reading unit through a Remote Function Call (RFC) connection to read the data update information and the updated data.
In a further embodiment of the above method, using the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database includes:
using the parallel processing capability of a multi-core central processing unit (CPU) to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
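A minimal sketch of per-column parallel analysis, assuming the in-memory table is available as a Python dict mapping column names to value lists and that the per-column work is the interval (minimum, maximum) computation of the symbolic method. The function names are hypothetical. `ThreadPoolExecutor` is used so the sketch runs anywhere; because of CPython's global interpreter lock, CPU-bound work would need `ProcessPoolExecutor` (same `map` interface) to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def column_interval(name_and_values):
    """Reduce one column to its (min, max) interval."""
    name, values = name_and_values
    return name, (min(values), max(values))


def analyze_columns_parallel(columns, max_workers=4):
    """Dispatch each column to a worker; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(column_interval, columns.items()))


# Hypothetical columns taken from Table 1 of this document.
columns = {"service_years": [4, 5, 8], "income": [20000, 37000, 52000]}
result = analyze_columns_parallel(columns)
print(result)
```

Because each column is independent, no locking is needed; the only shared step is collecting the per-column intervals into one dict.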
In a further embodiment of the above method, the method further includes:
a data sample preprocessing unit performs sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviate from a preset normal range of business values;
correspondingly, the data symbolization analysis unit using the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database includes: the data symbolization analysis unit uses the symbolic data analysis method to analyze the data of the updated preset database table that has passed sampling-analysis preprocessing.
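The patent does not specify the sampling scheme or the smoothing technique. The sketch below substitutes the simplest reading of the step: draw a random sample of one column, then drop values that fall outside a preset normal business range. `sample_and_filter`, the range, and the sample data are all hypothetical.

```python
import random


def sample_and_filter(values, normal_range, sample_size, seed=0):
    """Sample a column and split it into in-range and out-of-range values.

    A plain range check stands in for the patent's unspecified smoothing step.
    """
    low, high = normal_range
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    if sample_size >= len(values):
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)
    kept = [v for v in sample if low <= v <= high]
    dropped = [v for v in sample if not (low <= v <= high)]
    return kept, dropped


# Hypothetical income column with two implausible entries.
incomes = [20000, 37000, 52000, -5, 9_999_999]
kept, dropped = sample_and_filter(incomes, (0, 1_000_000), sample_size=5)
print(kept, dropped)
```

Only `kept` would then be passed to the symbolization step, so out-of-range values cannot stretch the resulting intervals.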
In a further embodiment of the above method, the method further includes:
an application analysis unit performs application analysis on the interval-format symbolic data table of each target variable according to application requirements, obtaining the relationships between target variables and the characteristics of the data samples of each target variable.
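The patent leaves the application analysis open. One common way to relate two interval-valued variables is the center method: take the midpoint of each interval and compute an ordinary Pearson correlation on the midpoints. The sketch assumes that method and uses hypothetical interval tables; it is not presented as the patent's own procedure.

```python
def midpoint(interval):
    low, high = interval
    return (low + high) / 2


def center_correlation(xs, ys):
    """Pearson correlation between the midpoints of two interval-valued variables."""
    cx = [midpoint(i) for i in xs]
    cy = [midpoint(i) for i in ys]
    n = len(cx)
    mx, my = sum(cx) / n, sum(cy) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(cx, cy))
    vx = sum((a - mx) ** 2 for a in cx)
    vy = sum((b - my) ** 2 for b in cy)
    return cov / (vx * vy) ** 0.5


# Hypothetical symbolic tables: one interval per group for each target variable.
service_years = [(1, 4), (4, 6), (7, 9)]
income = [(15000, 25000), (30000, 40000), (50000, 54000)]
r = center_correlation(service_years, income)
print(round(r, 4))
```

Interval widths (high minus low) could be examined alongside the midpoints to describe how spread out each group's samples are.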
A symbolic data analysis system provided by an embodiment of the present invention includes:
a database (DB) trigger, configured to monitor the database tables of a business system whose database stores database table data in row storage mode, and, in response to detecting a data update in a database table, to record data update information describing the update in a log table, where a data update is an event in which data is inserted into, modified in, or deleted from the database table, and the data update information includes the ID of the updated database table and the location of the update;
a first storage unit, configured to store the log table;
a row-column storage conversion unit, configured to read newly added data update information from the log table in real time and, if that information indicates that a preset database table has been updated, to read the updated data from the corresponding position in the business system's database according to the location information in the data update information of the preset database table, and to synchronize the updated data to the corresponding position in the in-memory database that stores database table data in column storage mode, where a preset database table is a database table designated in advance for real-time synchronization to the in-memory database;
a second storage unit, configured to store a list of tables to be synchronized, which records information on all preset database tables that need real-time synchronization to the in-memory database;
an in-memory database, configured to store database table data in column storage mode;
a data symbolization analysis unit, configured to use a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database and to generate a symbolic data table in interval format for each target variable in that table.
In a further embodiment of the above system, the row-column storage conversion unit is further configured to preset, according to user operations, the database tables of the business system that need real-time synchronization to the in-memory database, or to update the set of such tables.
In a further embodiment of the above system, the system further includes:
a reading unit, configured to read newly added data update information from the log table and to read updated data from the business system's database;
the row-column storage conversion unit includes a control module and a data writing module;
the control module is configured to invoke the reading unit in real time to read newly added data update information from the log table; to determine from that information whether any preset database table has been updated; if so, to invoke the reading unit to read the updated data from the corresponding position in the business system's database according to the location information in the data update information of the preset database table; and to pass the updated data to the data writing module in the row-column storage conversion unit and, according to a preset row-column conversion position mapping rule, instruct the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column storage mode;
the data writing module is configured to synchronize the updated data to the corresponding position in the in-memory database by a write operation.
In a further embodiment of the above system, the control module invokes the reading unit through a Remote Function Call (RFC) connection to read the data update information and the updated data.
In a further embodiment of the above system, the data symbolization analysis unit uses the parallel processing capability of a multi-core CPU to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
In a further embodiment of the above system, the system further includes:
a data sample preprocessing unit, configured to perform sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviate from a preset normal range of business values;
the data symbolization analysis unit specifically uses the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database that has passed sampling-analysis preprocessing.
In a further embodiment of the above system, the system further includes:
an application analysis unit, configured to perform application analysis on the interval-format symbolic data table of each target variable according to application requirements, obtaining the relationships between target variables and the characteristics of the data samples of each target variable.
In the symbolic data analysis method and system provided by the above embodiments, a DB trigger is set in a business system that stores data in row storage mode to monitor the business system's database tables. When a database table is updated, the DB trigger records the data update information in a log table. The row-column storage conversion unit reads newly added data update information from the log table in real time and, if a preset database table has been updated, synchronizes the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode. The data symbolization analysis unit then analyzes the data of the preset database table in the in-memory database with a symbolic data analysis method and generates an interval-format symbolic data table for each target variable. Implementing symbolic data analysis with column-storage in-memory computing thus enables efficient real-time analysis of massive data. Optionally, before the symbolic data analysis method is applied to the in-memory data, the column-stored data of the preset database table can undergo sampling-analysis preprocessing, and symbolic data analysis is performed only after data deviating from the preset normal range of business values have been deleted. This avoids the distortion that arises when an unreasonable choice of the data sample space causes samples to be converted into an interval-format symbolic data table.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The drawings, which form part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain its principles.
The present invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flowchart of one embodiment of the symbolic data analysis method of the present invention.
Fig. 2 is a flowchart of another embodiment of the symbolic data analysis method of the present invention.
Fig. 3 is a schematic structural diagram of one embodiment of the symbolic data analysis system of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention are now described in detail with reference to the drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use.
Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely illustrative, not limiting. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Currently, reading data from memory takes on the order of nanoseconds, so memory-based data reading is about 1,000,000 times faster than disk-based reading. In-memory computing refers to storing data entirely in an in-memory database in an optimized column storage layout with efficient data compression, making full use of multi-core CPUs to process the data in parallel. Thus, in data-warehouse-based report analysis, if reading a large volume of data from a traditional database takes tens of minutes, reading the same data from an in-memory database takes less than a second. Amid today's explosive growth of data, in-memory databases and in-memory computing engines give users fast, efficient data processing and analysis capabilities.
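The "1,000,000 times" figure is simply the ratio of the two access-time orders of magnitude stated in this document (milliseconds for disk, nanoseconds for memory):

```latex
\frac{t_{\text{disk}}}{t_{\text{mem}}} \approx \frac{10^{-3}\,\text{s}}{10^{-9}\,\text{s}} = 10^{6}
```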
Because data reads from an in-memory database are fast and processing is efficient, an in-memory computing engine can move operations that were originally performed in the application layer down to the database layer, so that data-intensive computation is carried out in the database layer. Given these characteristics, in-memory computing can perform real-time analytical computation on very large data sets without prior modeling or data preprocessing. For example, to analyze data along any dimension, a model can be built in real time, and analyzing hundreds of millions of records may take only a few seconds. Because processing is so fast, arbitrary data models can be tried quickly to simulate a variety of future scenarios.
As shown in Table 1 below, a traditional database table in a business system is a two-dimensional table made up of rows and columns, in which users record data. Table 2 below shows one way of organizing the data of the database table in Table 1 in row storage mode. Table 3 below shows one way in which an embodiment of the present invention organizes the data of the database table in Table 1 in column storage mode.
Table 1 Database table
Name Length of service Income
Zhang San 4 20000
Li Ming 5 37000
Liu Li 8 52000
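Table 2 Row storage
Zhang San | 4 | 20000 | Li Ming | 5 | 37000 | Liu Li | 8 | 52000
Table 3 Column storage
Zhang San | Li Ming | Liu Li | 4 | 5 | 8 | 20000 | 37000 | 52000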
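The row and column layouts of Tables 2 and 3 can be derived mechanically from the rows of Table 1. The sketch below assumes a flat Python list stands in for the storage medium; `TABLE_1`, `row_layout`, and `column_layout` are illustrative names.

```python
# Rows of Table 1: (name, length of service, income)
TABLE_1 = [
    ("Zhang San", 4, 20000),
    ("Li Ming", 5, 37000),
    ("Liu Li", 8, 52000),
]


def row_layout(rows):
    """Row storage: each record's fields are stored contiguously."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column storage: all values of one column are stored contiguously."""
    return [value for column in zip(*rows) for value in column]


print(row_layout(TABLE_1))
print(column_layout(TABLE_1))
```

In the column layout, all incomes sit next to each other, so a query over income can read one contiguous slice instead of skipping over names and service years.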
In practical data analysis, it is usually necessary to compute over the attribute values of a particular target variable in a database table (such as name, length of service, or income in Tables 1 to 3) or over certain columns. Column storage is more efficient than row storage for this because irrelevant attributes need not be read. Column storage therefore has two major advantages: 1. It increases the throughput of queries on attribute columns and reduces I/O. Because database table data are stored by column, the required data columns can be located quickly without reading unrelated columns, which reduces unnecessary disk reads and writes; the benefit grows as the table has more columns. 2. Column storage favors data compression. Compared with row storage, column storage is better suited to compression because the values in one column share the same data type and tend to be highly similar, whereas row storage keeps each record's attributes contiguous, and the attributes within a record have different data types, which makes it hard to apply a single compression algorithm to data of different types.
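The patent does not name a compression scheme. Run-length encoding is one simple technique that exploits the property described above: a single column often contains long runs of identical values of one type. The `region` column below is hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


region = ["North", "North", "North", "South", "South", "North"]
runs = rle_encode(region)
print(runs)  # [('North', 3), ('South', 2), ('North', 1)]
```

Applied to a row-storage record such as `("Zhang San", 4, 20000)`, the same encoder finds no runs at all, which is the point the paragraph makes about mixed data types.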
For example, when a customer buys a product from a retailer, the retailer's business system creates a data record holding target-variable fields such as the time, place, customer, amount, and address of the sale. After the data are entered and submitted in the front end, the back-end system inserts one row into a table in the database, containing the data related to that sale. However, a row-storage database is inefficient and ill-suited to supporting data analysis applications. Continuing the example, suppose the retailer stores 300 million records in a traditional database in row storage mode and needs to compute the average amount of a single sale from these records. It must first read all 300 million records, extract the amount field, and then compute the average. The data actually analyzed (the amount field) make up only 5% of the total (assuming 20 fields per record), which is clearly very inefficient. Under column storage, these 300 million records are stored column by column, so in effect there are only 20 records in total (20 fields, one record per field). For the same analysis, only the column of the amount target variable needs to be read to compute the average. Compared with row storage, the efficiency of data processing in this example scenario is improved by a factor of 50.
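The sketch below scales the retail example down to 1,000 hypothetical records of 20 integer fields and counts how many stored values each layout must touch to average one field. It only illustrates the 1/20 (5%) read ratio stated above and makes no claim about end-to-end speed-ups; the field position and values are invented.

```python
import random

NUM_FIELDS = 20     # the example's assumption: 20 fields per sales record
AMOUNT_FIELD = 3    # hypothetical position of the "amount" field
NUM_RECORDS = 1000  # scaled down from the example's 300 million

rng = random.Random(42)
row_store = [[rng.randint(1, 500) for _ in range(NUM_FIELDS)] for _ in range(NUM_RECORDS)]
column_store = [list(col) for col in zip(*row_store)]  # same data, one list per field


def avg_row_store(rows, field):
    """A row store reads every field of every record to reach one field."""
    touched = 0
    total = 0
    for record in rows:
        touched += len(record)
        total += record[field]
    return total / len(rows), touched


def avg_column_store(cols, field):
    """A column store reads only the requested column."""
    col = cols[field]
    return sum(col) / len(col), len(col)


avg_r, touched_r = avg_row_store(row_store, AMOUNT_FIELD)
avg_c, touched_c = avg_column_store(column_store, AMOUNT_FIELD)
print(avg_r == avg_c, touched_r, touched_c)  # True 20000 1000
```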
Fig. 1 is a flowchart of one embodiment of the symbolic data analysis method of the present invention. As shown in Fig. 1, the symbolic data analysis method of this embodiment includes:
110: A database (DB) trigger of the business system monitors the database tables of the business system.
The business system may be, for example, an enterprise resource planning (ERP) business system; its database stores database table data in row storage mode.
120: In response to detecting a data update in a database table of the business system, the DB trigger records data update information describing the update in a log table.
Here, a data update is an event in which data is inserted into, modified in, or deleted from a database table of the business system. The data update information includes the ID of the updated database table and the location of the update within that table. The database table ID can be any value, such as the table name or number, that uniquely identifies a database table in the business system.
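SQLite's `AFTER INSERT`, `AFTER UPDATE`, and `AFTER DELETE` triggers can illustrate the DB-trigger and log-table mechanism of operations 110 and 120. The sketch assumes a hypothetical `employee` table (the columns of Table 1) and a hypothetical `change_log` table whose `table_id` and `row_id` columns play the roles of the table ID and location information; a production business system such as an ERP database would use its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, service_years INTEGER, income INTEGER);

-- Log table: one row per data update, in the order the updates happened.
CREATE TABLE change_log (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,
    table_id TEXT NOT NULL,
    op       TEXT NOT NULL,
    row_id   INTEGER NOT NULL
);

CREATE TRIGGER employee_ins AFTER INSERT ON employee
BEGIN
    INSERT INTO change_log (table_id, op, row_id) VALUES ('employee', 'INSERT', NEW.rowid);
END;

CREATE TRIGGER employee_upd AFTER UPDATE ON employee
BEGIN
    INSERT INTO change_log (table_id, op, row_id) VALUES ('employee', 'UPDATE', NEW.rowid);
END;

CREATE TRIGGER employee_del AFTER DELETE ON employee
BEGIN
    INSERT INTO change_log (table_id, op, row_id) VALUES ('employee', 'DELETE', OLD.rowid);
END;
""")

conn.execute("INSERT INTO employee VALUES ('Zhang San', 4, 20000)")
conn.execute("UPDATE employee SET income = 21000 WHERE name = 'Zhang San'")
conn.execute("DELETE FROM employee WHERE name = 'Zhang San'")

log = conn.execute("SELECT table_id, op, row_id FROM change_log ORDER BY seq").fetchall()
print(log)
```

The monotonically increasing `seq` column lets a reader remember how far it has consumed the log, which is what "reading newly added data update information" requires.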
130: The row-column storage conversion unit reads newly added data update information from the log table in real time.
140: If the newly added data update information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the business system's database, according to the location information in the data update information of that table, and synchronizes the updated data to the corresponding position in the in-memory database that stores database table data in column storage mode.
A preset database table is a database table designated in advance for real-time synchronization to the in-memory database. Specifically, the row-column storage conversion unit can maintain a list of tables to be synchronized, which records the IDs of the database tables that need real-time synchronization to the in-memory database. The table IDs in this list can be updated as needed, for example by adding or deleting IDs, and a given table ID can be configured to require real-time synchronization either for a certain period or permanently.
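A minimal sketch of operations 130 and 140, assuming the log table has already been read into `(seq, table_id, op, row_id)` tuples, each business-system table is a dict keyed by row ID, and each preset table has a hypothetical `ColumnTable` column store in memory. `last_seq` acts as a high-water mark so each poll processes only newly added log entries. All names here are illustrative, not taken from the patent.

```python
class ColumnTable:
    """Hypothetical in-memory column store: one Python list per column."""

    def __init__(self, n_cols):
        self.columns = [[] for _ in range(n_cols)]
        self.pos = {}  # business-system row ID -> index in every column

    def upsert(self, row_id, row):
        if row_id in self.pos:  # UPDATE: overwrite in place
            i = self.pos[row_id]
            for column, value in zip(self.columns, row):
                column[i] = value
        else:  # INSERT: append one value to each column
            self.pos[row_id] = len(self.columns[0])
            for column, value in zip(self.columns, row):
                column.append(value)

    def delete(self, row_id):
        i = self.pos.pop(row_id, None)
        if i is None:
            return
        for column in self.columns:
            column.pop(i)
        # Rows after the deleted one shift left by one position.
        self.pos = {rid: p - 1 if p > i else p for rid, p in self.pos.items()}


def sync(log, last_seq, source, store, preset):
    """Apply log entries newer than last_seq for preset tables; return the new mark."""
    for seq, table_id, op, row_id in log:
        if seq <= last_seq:
            continue
        last_seq = seq
        if table_id not in preset:  # not on the list of tables to be synchronized
            continue
        row = source[table_id].get(row_id)  # read current data at the logged location
        if op == "DELETE" or row is None:
            store[table_id].delete(row_id)
        else:
            store[table_id].upsert(row_id, row)
    return last_seq


source = {
    "employee": {1: ("Zhang San", 4, 20000), 2: ("Li Ming", 5, 37000)},
    "audit": {1: ("login", "ok")},
}
preset = {"employee"}
store = {"employee": ColumnTable(3)}
log = [(1, "employee", "INSERT", 1), (2, "employee", "INSERT", 2), (3, "audit", "INSERT", 1)]
last = sync(log, 0, source, store, preset)

source["employee"][1] = ("Zhang San", 4, 21000)
log.append((4, "employee", "UPDATE", 1))
last = sync(log, last, source, store, preset)
print(store["employee"].columns)
```

Reading the row's current value at sync time, rather than storing values in the log, mirrors the patent's design of logging only the table ID and location.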
150: The data symbolization analysis unit uses a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database and generates a symbolic data table in interval format for each target variable in that table.
Applying the symbolic data analysis method to the data of the updated preset database table in the in-memory database uses the idea of "data packing": a higher-level data layer is built on top of the original multidimensional sample space, that is, the upper and lower bounds of each target variable dimension in the data sample space are determined to produce an interval-format symbolic data table. This achieves dimensionality reduction, greatly simplifying computation over large sample sets, and avoids the common problem that a reduced sample space is hard to interpret physically. It also removes the influence of the sample-space and variable-space dimensions on analyzing the overall characteristics of groups of data points, so that the characteristics of the data samples can be observed efficiently and accurately in real time.
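A minimal sketch of the "data packing" step, assuming records are dicts and that the grouping variable used to form higher-level units (here a hypothetical `region`) is chosen by the analyst; the patent does not say how groups are formed. Each target variable in each group is reduced to its interval (lower bound, upper bound).

```python
from collections import defaultdict


def symbolize(records, group_key, targets):
    """Collapse raw records into one (min, max) interval per target variable per group."""
    bounds = defaultdict(dict)
    for rec in records:
        group = rec[group_key]
        for var in targets:
            v = rec[var]
            lo, hi = bounds[group].get(var, (v, v))
            bounds[group][var] = (min(lo, v), max(hi, v))
    return dict(bounds)


# Hypothetical sales records.
sales = [
    {"region": "North", "amount": 120, "quantity": 2},
    {"region": "North", "amount": 80, "quantity": 1},
    {"region": "South", "amount": 300, "quantity": 5},
    {"region": "South", "amount": 260, "quantity": 4},
    {"region": "North", "amount": 150, "quantity": 3},
]
table = symbolize(sales, "region", ["amount", "quantity"])
print(table)
```

Five raw records become two symbolic rows, and each interval still carries the original units, which is why the reduced table remains easy to interpret.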
In the symbolic data analysis method provided by the above embodiment, a DB trigger set in a business system that stores data in row storage mode monitors the business system's database tables and, when a table is updated, records the data update information in a log table. The row-column storage conversion unit reads newly added data update information from the log table in real time and, if a preset database table has been updated, synchronizes the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode. The data symbolization analysis unit then analyzes the preset table's data in the in-memory database with a symbolic data analysis method and generates an interval-format symbolic data table for each target variable. Implementing symbolic data analysis with column-storage in-memory computing thus enables efficient real-time analysis of massive data.
Fig. 2 is the flow chart of another embodiment of symbol data analysis method of the present invention.It is analyzed in symbol data of the present invention In another embodiment of method, compared with embodiment shown in FIG. 1, operation 130 can specifically be realized in the following way: 230, ranks store the control module in converting unit and read log recording by the reading unit in real-time calling operation system Newer data update information in table.Correspondingly, operation 140 can specifically be realized in the following way:
240: the control module determines, from the newly added update information in the log table, whether any preset database table has been updated. Specifically, it may check whether the newly added update information contains the ID of a preset database table.
If a preset database table has been updated, operation 250 is executed. Otherwise, if no preset database table has been updated, the subsequent steps of this embodiment are not executed.
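A minimal sketch of the decision in operations 230 to 250 follows. It assumes each log entry carries a monotonically increasing sequence number, so the control module can remember how far it has read (a watermark). The `LogEntry` fields and the `poll_log` function are illustrative assumptions, not structures defined in the patent.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class LogEntry:
    seq: int       # assumed monotonically increasing log sequence number
    table_id: str  # ID of the table that changed
    op: str        # "INSERT", "UPDATE" or "DELETE"
    row_key: int   # location of the change in the business database


def poll_log(log: List[LogEntry], last_seq: int,
             preset_tables: Set[str]) -> Tuple[List[LogEntry], int]:
    """Return new entries that touch preset tables, plus the new watermark."""
    new_entries = [e for e in log if e.seq > last_seq]
    # Operation 240: keep only entries whose table ID is in the preset list.
    relevant = [e for e in new_entries if e.table_id in preset_tables]
    next_seq = max((e.seq for e in new_entries), default=last_seq)
    return relevant, next_seq
```

The watermark also advances past entries for non-preset tables, so those entries are skipped once rather than re-scanned on every poll.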
250: the control module calls the reading unit, which reads the updated data from the corresponding position in the business system's database according to the location information in the update information of the updated preset database table.
For example, the control module may call the reading unit over a Remote Function Call (RFC) connection to read the update information and the updated data.
260: the control module passes the updated data it has read to the data writing module in the row-column storage conversion unit and, according to a preset row-column position mapping rule, instructs the data writing module to write the updated data synchronously into the in-memory database, which stores table data in column-oriented format.
The row-column position mapping rule may be the correspondence or positional relationship between row-store positions and column-store positions when converting from row storage to column storage; or the rule that determines where each value of a target variable in the business system's database table (called a data sample) is stored in the in-memory database table; or any other rule set as the situation requires. In short, the mapping rule tells the row-column storage conversion unit at which specific position in the in-memory database the data from a given position in the business system's database should be written.
270: the data writing module synchronizes the updated data to the corresponding position in the in-memory database by a write operation.
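The patent leaves the row-column position mapping rule open. The sketch below assumes the simplest possible rule: each business-system row key maps to one fixed offset shared by every column array, and a deleted row leaves an empty (`None`) slot instead of shifting the arrays. `ColumnTable` and its methods are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional


class ColumnTable:
    """Hypothetical in-memory column store for one table: one list per column."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: Dict[str, List[Optional[float]]] = {c: [] for c in columns}
        self.offset: Dict[int, int] = {}  # assumed mapping rule: row key -> offset
        self.n_slots = 0

    def upsert(self, row_key: int, row: Dict[str, float]) -> None:
        """Apply an INSERT or UPDATE from the row store to the column arrays."""
        if row_key not in self.offset:
            # New row: reserve the same offset in every column array.
            self.offset[row_key] = self.n_slots
            self.n_slots += 1
            for values in self.columns.values():
                values.append(None)
        pos = self.offset[row_key]
        for col, value in row.items():
            self.columns[col][pos] = value  # only the supplied columns are written

    def delete(self, row_key: int) -> None:
        """Apply a DELETE by blanking the row's slot in every column."""
        pos = self.offset.pop(row_key, None)
        if pos is not None:
            for values in self.columns.values():
                values[pos] = None
```

Leaving an empty slot on delete keeps every other row's offset stable, so the mapping never has to be rebuilt; a production column store would typically need to reclaim such slots periodically.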
In a specific, non-limiting example of an embodiment of the symbolic data analysis method, when the data symbolization analysis unit applies the symbolic data analysis method to the data of the updated preset database table in the in-memory database, it may use the concurrency of a multi-core CPU to analyze each column of that table in parallel, for example by analyzing the data of each target variable separately and concurrently. This parallel processing of column-stored data can further increase analysis speed several-fold.
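The per-column parallelism can be sketched with Python's `concurrent.futures`, treating each column as an independent task that computes its interval. The names `column_interval` and `analyze_columns` are hypothetical. The default `ThreadPoolExecutor` keeps the example portable; because of CPython's global interpreter lock, CPU-bound work would normally be given a `ProcessPoolExecutor` instead, which requires module-level task functions and an `if __name__ == "__main__":` guard on platforms that spawn worker processes.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Interval = Optional[Tuple[float, float]]


def column_interval(values: List[Optional[float]]) -> Interval:
    """Interval [min, max] of one column, skipping empty (deleted) slots."""
    present = [v for v in values if v is not None]
    return (min(present), max(present)) if present else None


def analyze_columns(columns: Dict[str, List[Optional[float]]],
                    executor: Optional[Executor] = None) -> Dict[str, Interval]:
    """Submit one task per column and collect the intervals by column name."""
    pool = executor if executor is not None else ThreadPoolExecutor()
    try:
        names = list(columns)
        results = pool.map(column_interval, [columns[n] for n in names])
        return dict(zip(names, results))
    finally:
        if executor is None:  # only shut down a pool this function created
            pool.shutdown()
```

Because each column is independent, no locking is needed between tasks; the column-oriented layout is what makes this split natural.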
In addition, in the prior art, an unreasonable choice of data sample space often distorts the data when samples are converted into interval-format symbolic data. To address this, in another embodiment of the symbolic data analysis method, before the symbolic data analysis method is applied to the data of the updated preset database table in the in-memory database, operation 280 may be performed: a data sample preprocessing unit performs sampling-analysis preprocessing on the column-stored data of the preset database table, using data smoothing techniques to identify and delete data that deviates from the normal range of preset business values. This determines a reasonable data sample space for the analysis and prevents the distortion that an unreasonable sample space would cause when samples are converted into interval-format symbolic data. Operation 150 in Fig. 1 is then implemented by operation 290: the data symbolization analysis unit applies the symbolic data analysis method to the preprocessed data of the updated preset database table in the in-memory database.
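The patent does not name a specific smoothing technique. As one possible reading, the sketch below first drops samples outside a preset business range and then drops samples that differ from a centered moving median by more than a fixed tolerance. The function `preprocess` and the range, `window`, and `tolerance` values are hypothetical parameters chosen for illustration.

```python
from statistics import median
from typing import List, Tuple


def preprocess(values: List[float], business_range: Tuple[float, float],
               window: int = 5, tolerance: float = 3.0) -> List[float]:
    """Illustrative two-stage cleaning: business-range filter, then moving-median check."""
    lo, hi = business_range
    in_range = [v for v in values if lo <= v <= hi]  # stage 1: preset business range
    half = window // 2
    kept = []
    for i, v in enumerate(in_range):
        # Stage 2: compare each sample with the median of its neighbourhood
        # (the window is truncated at both ends of the series).
        neighbourhood = in_range[max(0, i - half): i + half + 1]
        if abs(v - median(neighbourhood)) <= tolerance:
            kept.append(v)
    return kept
```

With the series `[10, 11, 10, 12, 50, 11, -5, 10, 400]` and a business range of (0, 100), the out-of-range values -5 and 400 are removed in stage 1, and the spike of 50 is removed by the moving-median check, leaving `[10, 11, 10, 12, 11, 10]`.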
In addition, in another embodiment of the symbolic data analysis method, after the interval-format symbolic data tables of the target variables in the updated preset database table have been generated, an application analysis unit may perform various application analyses on them according to application requirements, such as symbolic data factor analysis and symbolic data canonical correlation analysis, to obtain the relationships and underlying connections among the target variables and the characteristic state of each target variable's data samples.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Fig. 3 is a schematic structural diagram of one embodiment of the symbolic data analysis system of the present invention. The system of this embodiment can be used to implement the symbolic data analysis methods of the embodiments above. As shown in Fig. 3, it includes a DB trigger 310, a first storage unit 320, a row-column storage conversion unit 330, a second storage unit 340, an in-memory database 350, and a data symbolization analysis unit 360, where:
The DB trigger 310 may optionally be deployed in the business system to monitor the business system's database tables; the business system's database stores table data in row-oriented format. In response to detecting that a database table of the business system has been updated, the DB trigger records update information describing that update in the log table. An update of a database table includes an event in which data is added to, modified in, or deleted from the table. The update information includes the ID of the updated database table and the location information of the update.
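To make the trigger-and-log mechanism concrete, the sketch below uses SQLite, chosen only because it ships with Python; the patent does not name a database product, and each product has its own trigger syntax. The `orders` table, the `change_log` columns, the trigger names, and the `capture_changes` function are hypothetical. Each log row records the table ID and the changed row's key, matching the update information described above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
CREATE TABLE change_log (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,  -- read order for the converter
    table_id TEXT NOT NULL,                      -- which table changed
    op       TEXT NOT NULL,                      -- INSERT / UPDATE / DELETE
    row_key  INTEGER NOT NULL                    -- where the change happened
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN INSERT INTO change_log (table_id, op, row_key) VALUES ('orders', 'INSERT', NEW.id); END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders
BEGIN INSERT INTO change_log (table_id, op, row_key) VALUES ('orders', 'UPDATE', NEW.id); END;
CREATE TRIGGER orders_del AFTER DELETE ON orders
BEGIN INSERT INTO change_log (table_id, op, row_key) VALUES ('orders', 'DELETE', OLD.id); END;
"""


def capture_changes() -> list:
    """Run a few business-table changes and return what the triggers logged."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO orders (id, region, amount) VALUES (1, 'A', 10.0)")
    conn.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 1")
    rows = conn.execute(
        "SELECT table_id, op, row_key FROM change_log ORDER BY seq").fetchall()
    conn.close()
    return rows
```

Running `capture_changes()` yields one log row per insert, update, and delete on `orders`, in the order they occurred; the `seq` column plays the role of the read position a downstream converter would track.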
The first storage unit 320 stores the log table.
The row-column storage conversion unit 330 reads newly added update information from the log table in real time. If that information indicates that a preset database table has been updated, the unit reads the updated data from the corresponding position in the business system's database according to the location information in that table's update information, and synchronizes the updated data to the corresponding position in the in-memory database 350, which stores table data in column-oriented format. A preset database table is a database table designated in advance as needing real-time synchronization to the in-memory database 350.
Optionally, the row-column storage conversion unit 330 may also preset, according to user operations, a list of tables to be synchronized, and update that list according to further user operations. The list records the database tables in the business system that need real-time synchronization to the in-memory database.
The second storage unit 340 may optionally be deployed in the row-column storage conversion unit 330. It stores the list of tables to be synchronized, which records information about the preset database tables that need real-time synchronization to the in-memory database 350.
The in-memory database 350 stores database table data in column-oriented format.
The data symbolization analysis unit 360 applies the symbolic data analysis method to the data of the updated preset database table in the in-memory database 350 and generates an interval-format symbolic data table for each target variable in that table.
Referring again to Fig. 3, in another embodiment of the symbolic data analysis system, the system may further include a reading unit 370, which may optionally be deployed in the business system to read newly added update information from the log table and to read updated data from the business system's database. Correspondingly, the row-column storage conversion unit 330 may specifically include a control module and a data writing module, where:
The control module calls the reading unit 370 in real time to read newly added update information from the log table; determines from that information whether any preset database table has been updated; if so, calls the reading unit 370 to read the updated data from the corresponding position in the business system's database according to the location information in that table's update information; and passes the updated data to the data writing module in the row-column storage conversion unit 330, instructing it, according to the preset row-column position mapping rule, to write the updated data synchronously into the in-memory database that stores table data in column-oriented format. Specifically, the control module may call the reading unit 370 over an RFC connection to read the update information and the updated data.
The data writing module synchronizes the updated data to the corresponding position in the in-memory database by a write operation.
In a specific, non-limiting example of an embodiment of the symbolic data analysis system, the data symbolization analysis unit 360 may use the concurrency of a multi-core CPU to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
Referring again to Fig. 3, another embodiment of the symbolic data analysis system may further include a data sample preprocessing unit 380, which performs sampling-analysis preprocessing on the column-stored data of the preset database table, using data smoothing techniques to identify and delete data that deviates from the normal range of preset business values. Correspondingly, when analyzing the data of the updated preset database table in the in-memory database 350, the data symbolization analysis unit 360 analyzes the preprocessed data.
Further, referring again to Fig. 3, yet another embodiment of the symbolic data analysis system may further include an application analysis unit 390, which performs application analyses on the interval-format symbolic data table of each target variable according to application requirements, to obtain the relationships among the target variables and the characteristic state of each target variable's data samples.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the system embodiments correspond essentially to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
The method and system of the present invention may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware, and firmware. The order of the method steps given above is for illustration only; unless otherwise stated, the steps of the method of the present invention are not limited to that order. In addition, in some embodiments the present invention may also be implemented as programs recorded on a recording medium, the programs comprising machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention is given for purposes of example and description; it is not exhaustive and does not limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.

Claims (12)

1. A symbolic data analysis method, comprising:
monitoring, by a database (DB) trigger of a business system, the database tables of the business system, wherein the database of the business system stores the data of the database tables in row-oriented format;
in response to detecting that a database table of the business system has been updated, recording, by the DB trigger, update information describing the update of the database table in a log table, wherein the update of the database table includes an event in which data is added to, modified in, or deleted from the database table, and the update information includes the identifier (ID) of the updated database table and the location information of the update;
reading, by a row-column storage conversion unit, newly added update information from the log table in real time;
if the newly added update information indicates that a preset database table has been updated, reading, by the row-column storage conversion unit, updated data from the corresponding position in the database of the business system according to the location information in the update information of the preset database table, and synchronizing the updated data to the corresponding position in an in-memory database that stores database table data in column-oriented format, wherein the preset database table is a database table designated in advance as needing real-time synchronization to the in-memory database;
performing, by a data sample preprocessing unit, sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviates from a normal range of preset business values; and
analyzing, by a data symbolization analysis unit using a symbolic data analysis method, the preprocessed data of the updated preset database table in the in-memory database, and generating an interval-format symbolic data table for each target variable in the updated preset database table.
2. The method according to claim 1, further comprising:
presetting, by the row-column storage conversion unit, the database tables in the business system that need real-time synchronization to the in-memory database, or further updating the database tables that need real-time synchronization to the in-memory database.
3. The method according to claim 2, wherein reading, by the row-column storage conversion unit, newly added update information from the log table in real time comprises:
calling, by a control module in the row-column storage conversion unit, a reading unit in the business system in real time to read newly added update information from the log table;
and wherein, if the newly added update information indicates that a preset database table has been updated, reading, by the row-column storage conversion unit, updated data from the corresponding position in the database of the business system according to the location information in the update information of the preset database table, and synchronizing the updated data to the corresponding position in the in-memory database that stores database table data in column-oriented format, comprises:
determining, by the control module, from the newly added update information in the log table, whether any preset database table has been updated;
if a preset database table has been updated, calling, by the control module, the reading unit to read updated data from the corresponding position in the database of the business system according to the location information in the update information of the preset database table;
passing, by the control module, the updated data that has been read to a data writing module in the row-column storage conversion unit and, according to a preset row-column position mapping rule, instructing the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column-oriented format; and
synchronizing, by the data writing module, the updated data to the corresponding position in the in-memory database by a write operation.
4. The method according to claim 3, wherein the control module calls the reading unit over a Remote Function Call (RFC) connection to read the update information and the updated data.
5. The method according to claim 3, wherein analyzing, using the symbolic data analysis method, the data of the updated preset database table in the in-memory database comprises:
using the concurrency of a multi-core central processing unit (CPU) to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
6. The method according to claim 1, further comprising:
performing, by an application analysis unit, application analysis on the interval-format symbolic data table of each target variable according to application requirements, to obtain the relationships among the target variables and the characteristic state of the data samples of each target variable.
7. A symbolic data analysis system, comprising:
a database (DB) trigger, configured to monitor the database tables of a business system, wherein the database of the business system stores the data of the database tables in row-oriented format, and configured to, in response to detecting that a database table of the business system has been updated, record update information describing the update of the database table in a log table, wherein the update of the database table includes an event in which data is added to, modified in, or deleted from the database table, and the update information includes the identifier (ID) of the updated database table and the location information of the update;
a first storage unit, configured to store the log table;
a row-column storage conversion unit, configured to read newly added update information from the log table in real time and, if the newly added update information indicates that a preset database table has been updated, to read updated data from the corresponding position in the database of the business system according to the location information in the update information of the preset database table and synchronize the updated data to the corresponding position in an in-memory database that stores database table data in column-oriented format, wherein the preset database table is a database table designated in advance as needing real-time synchronization to the in-memory database;
a second storage unit, configured to store a list of tables to be synchronized, the list recording information about the preset database tables that need real-time synchronization to the in-memory database;
the in-memory database, configured to store database table data in column-oriented format;
a data sample preprocessing unit, configured to perform sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviates from a normal range of preset business values; and
a data symbolization analysis unit, configured to analyze, using a symbolic data analysis method, the preprocessed data of the updated preset database table in the in-memory database, and to generate an interval-format symbolic data table for each target variable in the updated preset database table.
8. The system according to claim 7, wherein the row-column storage conversion unit is further configured to preset, according to user operations, the database tables in the business system that need real-time synchronization to the in-memory database, or to further update the database tables that need real-time synchronization to the in-memory database.
9. The system according to claim 8, further comprising:
a reading unit, configured to read newly added update information from the log table and to read updated data from the database of the business system;
wherein the row-column storage conversion unit comprises a control module and a data writing module;
the control module is configured to: call the reading unit in real time to read newly added update information from the log table; determine, from that information, whether any preset database table has been updated; if a preset database table has been updated, call the reading unit to read updated data from the corresponding position in the database of the business system according to the location information in the update information of the preset database table; and pass the updated data that has been read to the data writing module in the row-column storage conversion unit and, according to a preset row-column position mapping rule, instruct the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column-oriented format; and
the data writing module is configured to synchronize the updated data to the corresponding position in the in-memory database by a write operation.
10. The system according to claim 9, wherein the control module calls the reading unit over a Remote Function Call (RFC) connection to read the update information and the updated data.
11. The system according to claim 9, wherein the data symbolization analysis unit uses the concurrency of a multi-core central processing unit (CPU) to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
12. The system according to claim 7, further comprising:
an application analysis unit, configured to perform application analysis on the interval-format symbolic data table of each target variable according to application requirements, to obtain the relationships among the target variables and the characteristic state of the data samples of each target variable.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410184644.0A CN105095247B (en) 2014-05-05 2014-05-05 symbol data analysis method and system

Publications (2)

Publication Number Publication Date
CN105095247A CN105095247A (en) 2015-11-25
CN105095247B true CN105095247B (en) 2018-07-17

Family

ID=54575705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410184644.0A Active CN105095247B (en) 2014-05-05 2014-05-05 symbol data analysis method and system

Country Status (1)

Country Link
CN (1) CN105095247B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671594B2 (en) 2014-09-17 2020-06-02 Futurewei Technologies, Inc. Statement based migration for adaptively building and updating a column store database from a row store database based on query demands using disparate database systems
US9836507B2 (en) * 2014-09-17 2017-12-05 Futurewei Technologies, Inc. Method and system for adaptively building a column store database from a temporal row store database based on query demands
CN105787129B (en) * 2016-03-29 2020-06-23 联想(北京)有限公司 Data storage method and electronic equipment
CN106570314A (en) * 2016-10-19 2017-04-19 北京千医健康管理有限公司 ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard
CN108108411A * 2017-12-12 2018-06-01 苏州蜗牛数字科技股份有限公司 A reading system and method for information list files
CN111159176A (en) * 2019-11-29 2020-05-15 中国科学院计算技术研究所 Method and system for storing and reading mass stream data
CN113515569B (en) * 2020-04-09 2023-12-26 阿里巴巴集团控股有限公司 Data synchronization method, device and system
CN113064919B (en) * 2021-03-31 2022-11-22 北京达佳互联信息技术有限公司 Data processing method, data storage system, computer device and storage medium
CN113901069B (en) * 2021-12-08 2022-03-15 威讯柏睿数据科技(北京)有限公司 Data storage method and device of distributed database

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102663116A * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (Online Analytical Processing) query processing method for column-store data warehouses
CN102880709A (en) * 2012-09-28 2013-01-16 用友软件股份有限公司 Data warehouse management system and data warehouse management method
CN103218415A (en) * 2013-03-27 2013-07-24 互爱互动(北京)科技有限公司 Data processing system and method based on data warehouse
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
CN103744906A (en) * 2013-12-26 2014-04-23 乐视网信息技术(北京)股份有限公司 System, method and device for data synchronization

Non-Patent Citations (1)

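Title
An analysis technique for massive data: symbolic data analysis and its applications; Hu Yan et al.; Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition); 2004-06-25; Vol. 17, No. 2; pp. 40-44 *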

Also Published As

Publication number Publication date
CN105095247A (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20151125

Assignee: Tianyiyun Technology Co.,Ltd.

Assignor: CHINA TELECOM Corp.,Ltd.

Contract record no.: X2024110000020

Denomination of invention: Symbolic Data Analysis Methods and Systems

Granted publication date: 20180717

License type: Common License

Record date: 20240315