CN105095247A - Symbolic data analysis method and system - Google Patents


Info

Publication number
CN105095247A
Authority
CN
China
Prior art keywords
data
database
database table
presetting
data update
Prior art date
Legal status
Granted
Application number
CN201410184644.0A
Other languages
Chinese (zh)
Other versions
CN105095247B (en)
Inventor
鲍明曦
朱源
何忠江
邓丽华
武翊
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN201410184644.0A
Publication of CN105095247A
Application granted
Publication of CN105095247B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a symbolic data analysis method and system. In the method, when a database (DB) trigger detects a data update in a database table of a business system that stores data in row storage, the trigger records the update information in a log table. A row-column storage conversion unit reads the newly recorded update information in real time and, when a preset database table has been updated, reads the updated data from the corresponding position in the business system database and synchronizes it to the corresponding position in an in-memory database that stores table data in column storage. The preset database tables are those configured in advance for real-time synchronization into the in-memory database. A data symbolization analysis unit then analyzes the data of the updated preset database table using symbolic data analysis and generates a symbolic data table in which each target variable is expressed in interval form. The method and system enable efficient, real-time data analysis.

Description

Symbolic data analysis method and system
Technical field
The present invention relates to computer technology, and in particular to a symbolic data analysis method and system.
Background
In a traditional application system, data is stored in a traditional database. After a user issues an operation instruction through the application's front-end interface, the application layer reads the data from the database, performs the logic in the application layer, and returns the result to the front-end interface for display or for the next operation. In this process, reading data from the database becomes a bottleneck because of the limited performance of disk input/output (I/O), and the bottleneck is most pronounced when large volumes of data are read; report analysis on a data warehouse is the most obvious example. The cause is that a traditional database actually stores data as files on disk and gives applications an interface for accessing them, so reading from the database is essentially reading files from disk. Over the past decades of hardware progress, the performance of memory and of the central processing unit (CPU) has improved rapidly, while disk I/O performance has improved only modestly. Reading data from disk takes on the order of milliseconds.
Conventional data analysis techniques are severely limited when processing data sets that are well ordered but enormous. The main difficulties are twofold: 1) the number of sample points and dimensions often makes the amount of computation very large; 2) it is hard to obtain the overall characteristics of groups of data.
To address these two difficulties, the prior art proposes symbolic data analysis methods built on row-store data warehouses, for example "Canonical correlation analysis of interval data and its application in stock market analysis" (Systems Engineering, Vol. 22, No. 8) and "An analysis technique for massive data" (Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition), Vol. 17, No. 2). Symbolic data analysis applies the idea of "data packing": it builds a higher-level data layer on top of the original multidimensional sample space, which greatly simplifies computation over large sample sets and avoids the common problem that conventional dimensionality reduction of the sample space yields results whose physical meaning is hard to interpret.
In the course of realizing the present invention, the inventors found that although existing symbolic data analysis methods based on row-store data warehouses effectively reduce the dimensionality of high-dimensional variable spaces and improve data processing, they still have the following problems:
Existing symbolic data analysis based on a row-store data warehouse is non-real-time with respect to the business system. With the arrival of big data and the demand for efficient real-time analysis, this limitation becomes apparent: efficient real-time data analysis cannot be achieved. In addition, when symbolic data analysis is performed, an unreasonable choice of data sample space often distorts the data when the samples are converted into symbolic interval data.
Summary of the invention
One of the technical problems to be solved by the embodiments of the present invention is to provide a symbolic data analysis method and system that achieve efficient real-time data analysis.
A symbolic data analysis method provided by an embodiment of the present invention comprises:
A database (DB) trigger of a business system monitors the database tables of the business system, whose database stores the data of those tables in row storage;
In response to detecting a data update in a database table of the business system, the DB trigger records data update information describing the update in a log table. A data update includes an event that adds, modifies, or deletes data in the table; the data update record includes the identifier (ID) of the updated table and the position of the update;
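The trigger-and-log step above can be sketched with SQLite, which ships with Python and supports `AFTER INSERT/UPDATE/DELETE` triggers. This is a minimal illustration under stated assumptions, not the patented implementation: the table names `sales` and `change_log`, the column layout, and the one-letter operation codes are all hypothetical, and the row's primary key stands in for the "position information" of the update.

```python
import sqlite3

# Hypothetical schema for illustration only: "sales" stands in for a
# business-system table and "change_log" for the log table. ":memory:" is just
# a throwaway demo database, not the in-memory column store described in the text.
SCHEMA = """
CREATE TABLE sales (
    id       INTEGER PRIMARY KEY,
    customer TEXT,
    amount   REAL
);
CREATE TABLE change_log (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,  -- order in which updates occurred
    table_id TEXT    NOT NULL,                   -- ID of the updated table
    row_id   INTEGER NOT NULL,                   -- position of the update
    op       TEXT    NOT NULL                    -- 'I' add, 'U' modify, 'D' delete
);
CREATE TRIGGER sales_ins AFTER INSERT ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, op) VALUES ('sales', NEW.id, 'I');
END;
CREATE TRIGGER sales_upd AFTER UPDATE ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, op) VALUES ('sales', NEW.id, 'U');
END;
CREATE TRIGGER sales_del AFTER DELETE ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, op) VALUES ('sales', OLD.id, 'D');
END;
"""


def make_business_db() -> sqlite3.Connection:
    """Create the demo business table, the log table and the three triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = make_business_db()
    conn.execute("INSERT INTO sales (customer, amount) VALUES ('Zhang San', 120.0)")
    conn.execute("UPDATE sales SET amount = 150.0 WHERE id = 1")
    conn.execute("DELETE FROM sales WHERE id = 1")
    for entry in conn.execute("SELECT table_id, row_id, op FROM change_log ORDER BY seq"):
        print(entry)  # ('sales', 1, 'I'), then ('sales', 1, 'U'), then ('sales', 1, 'D')
```

Because the trigger runs inside the same transaction as the business write, the log entry cannot be lost if the write commits, which is why a log table is a natural hand-off point for a separate reader.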
A row-column storage conversion unit reads newly recorded data update information from the log table in real time;
If the newly recorded data update information indicates that a preset database table has been updated, the row-column storage conversion unit uses the position information in that update information to read the updated data from the corresponding position in the business system database, and synchronizes it to the corresponding position in an in-memory database that stores database table data in column storage. A preset database table is a table configured in advance to be synchronized into the in-memory database in real time;
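The synchronization step above can be sketched as a loop that consumes log entries past a high-water mark, skips tables that are not preset, re-reads the current row from the business database, and upserts or deletes it in a column store. Everything here is a hypothetical illustration: `LogEntry`, `ColumnTable`, `sync_once` and the `read_row` callback are invented names, and a real column store would use compressed column segments rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LogEntry:
    seq: int       # order of the entry in the log table
    table_id: str  # ID of the table that was updated
    row_id: int    # position of the update in that table
    op: str        # 'I' add, 'U' modify, 'D' delete


@dataclass
class ColumnTable:
    """Toy column store: one list per column, aligned by position."""
    column_names: List[str]
    row_ids: List[int] = field(default_factory=list)
    columns: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.column_names:
            self.columns.setdefault(name, [])

    def upsert(self, row_id: int, row: dict) -> None:
        if row_id in self.row_ids:               # existing row: overwrite in place
            pos = self.row_ids.index(row_id)
            for name in self.column_names:
                self.columns[name][pos] = row[name]
        else:                                    # new row: append to every column
            self.row_ids.append(row_id)
            for name in self.column_names:
                self.columns[name].append(row[name])

    def delete(self, row_id: int) -> None:
        if row_id in self.row_ids:
            pos = self.row_ids.index(row_id)
            del self.row_ids[pos]
            for name in self.column_names:
                del self.columns[name][pos]


def sync_once(
    log: List[LogEntry],
    last_seq: int,
    preset_tables: set,
    read_row: Callable[[str, int], Optional[dict]],
    column_db: Dict[str, ColumnTable],
) -> int:
    """Apply log entries newer than last_seq and return the new high-water mark."""
    for entry in sorted(log, key=lambda e: e.seq):
        if entry.seq <= last_seq:
            continue                              # already processed
        last_seq = entry.seq
        if entry.table_id not in preset_tables:
            continue                              # not a preset table: skip
        target = column_db[entry.table_id]
        # Re-read the current row from the business database (None if it is gone).
        row = None if entry.op == "D" else read_row(entry.table_id, entry.row_id)
        if row is None:
            target.delete(entry.row_id)
        else:
            target.upsert(entry.row_id, row)
    return last_seq
```

Re-reading the current row instead of trusting the log payload keeps the log entries small (table ID plus position, as the text describes) and makes replaying an entry harmless.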
A data symbolization analysis unit analyzes the data of the updated preset database table in the in-memory database using a symbolic data analysis method, generating a symbolic data table in which each target variable of that table is in interval form.
In a further embodiment of the above method, the method also comprises:
The row-column storage conversion unit configures in advance which database tables of the business system need real-time synchronization to the in-memory database, or updates that configuration.
In a further embodiment of the above method, the row-column storage conversion unit reading newly recorded data update information from the log table in real time comprises:
A control module in the row-column storage conversion unit calls, in real time, a reading unit in the business system to read the newly recorded data update information from the log table;
and the step in which, if the newly recorded information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the business system database according to the position information and synchronizes it to the corresponding position in the column-store in-memory database comprises:
The control module determines, from the newly recorded data update information in the log table, whether a preset database table has been updated;
If a preset database table has been updated, the control module calls the reading unit to read the updated data from the corresponding position in the business system database according to the position information in that table's update information;
The control module passes the updated data it has read to a data writing module in the row-column storage conversion unit and, according to a preset row-column position mapping rule, instructs the data writing module to write the updated data synchronously into the column-store in-memory database;
The data writing module synchronizes the updated data to the corresponding position in the in-memory database through a write operation.
In a further embodiment of the above method, the control module connects to the reading unit through a remote function call (RFC) to read the data update information and the updated data.
In a further embodiment of the above method, analyzing the data of the updated preset database table in the in-memory database using the symbolic data analysis method comprises:
Using the concurrency of a multi-core central processing unit (CPU), the symbolic data analysis method is applied to each column of the updated preset database table in the in-memory database in parallel.
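Per-column parallelism, as described above, maps naturally onto an executor that submits one task per column. A minimal sketch, assuming the columns are already in memory as Python sequences; `analyse_columns_parallel` and `bounds` are hypothetical names. The default `ThreadPoolExecutor` shows the structure, but in CPython it does not run pure-Python work on several cores at once; passing `concurrent.futures.ProcessPoolExecutor` (same interface) gives true multi-core execution, provided the analysis function is defined at module level so it can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def analyse_columns_parallel(
    columns: Dict[str, Sequence[float]],
    analyse: Callable[[Sequence[float]], object],
    executor_cls=ThreadPoolExecutor,
    max_workers: int = 4,
) -> Dict[str, object]:
    """Run `analyse` on every column concurrently and collect results by column name."""
    with executor_cls(max_workers=max_workers) as pool:
        futures = {name: pool.submit(analyse, values) for name, values in columns.items()}
        return {name: fut.result() for name, fut in futures.items()}


def bounds(values: Sequence[float]) -> tuple:
    """Per-column interval summary: (lower bound, upper bound)."""
    return (min(values), max(values))
```

For example, `analyse_columns_parallel({"service": [4, 5, 8], "income": [20000, 37000, 52000]}, bounds)` returns `{"service": (4, 8), "income": (20000, 52000)}`. Column storage makes this split cheap because each task reads one contiguous column and never touches the others.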
In a further embodiment of the above method, the method also comprises:
A data sample preprocessing unit performs sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviate from a preset normal range of business values;
The data symbolization analysis unit analyzing the data of the updated preset database table in the in-memory database using the symbolic data analysis method comprises: the data symbolization analysis unit applies the symbolic data analysis method to the data of the updated preset database table in the in-memory database after that data has undergone sampling-analysis preprocessing.
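The text does not specify which smoothing technique is used, so the following is only a sketch under an explicit assumption: a value is dropped if it lies outside the preset normal business range `[low, high]`, and, optionally, if it deviates from the median of a small moving window by more than `max_dev` (a moving-median spike check swapped in here for illustration). The function name and parameters are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def clean_column(
    values: Sequence[float],
    low: float,
    high: float,
    window: int = 5,
    max_dev: Optional[float] = None,
) -> List[float]:
    """Drop values outside the preset normal range [low, high]; if max_dev is set,
    also drop values farther than max_dev from their local moving median."""
    half = window // 2
    kept = []
    for i, v in enumerate(values):
        if not (low <= v <= high):
            continue                                   # outside the business range
        if max_dev is not None:
            local = values[max(0, i - half): i + half + 1]
            if abs(v - median(local)) > max_dev:
                continue                               # isolated spike
        kept.append(v)
    return kept
```

With `clean_column([100, 102, 99, 900, 101, 103, 98], low=0, high=1000, max_dev=50)` the value 900 is inside the range but far from its neighbours' median (102), so it is removed and the result is `[100, 102, 99, 101, 103, 98]`. Removing such points before packing matters because a single outlier would otherwise stretch the interval bounds of its whole group.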
In a further embodiment of the above method, the method also comprises:
An application analysis unit performs application analysis on the interval-form symbolic data tables of the target variables according to application requirements, obtaining the relationships between the target variables and the characteristic states of the data samples of each target variable.
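The text leaves the application analysis open. One well-known way to relate two interval-valued variables, used here purely as an illustrative assumption, is the "centre method": replace each interval by its midpoint and compute an ordinary Pearson correlation between the midpoints. `midpoint_correlation` is a hypothetical name, and this is not claimed to be the method of the cited papers.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def midpoint_correlation(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Pearson correlation between the midpoints of paired interval observations."""
    xs = [(lo + hi) / 2 for lo, hi in x]
    ys = [(lo + hi) / 2 for lo, hi in y]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)
```

For instance, intervals `[(1, 3), (3, 5), (5, 7)]` and `[(10, 20), (20, 40), (40, 50)]` have midpoints 2, 4, 6 and 15, 30, 45, which are perfectly linearly related, so the result is 1.0. The centre method ignores interval widths; a fuller analysis would also examine the spread of each interval.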
A symbolic data analysis system provided by an embodiment of the present invention comprises:
A database (DB) trigger, configured to monitor the database tables of a business system whose database stores table data in row storage and, in response to detecting a data update in a table of the business system, to record data update information describing the update in a log table, where a data update includes an event that adds, modifies, or deletes data in the table, and the data update record includes the ID of the updated table and the position of the update;
A first storage unit, configured to store the log table;
A row-column storage conversion unit, configured to read newly recorded data update information from the log table in real time and, if that information indicates that a preset database table has been updated, to read the updated data from the corresponding position in the business system database according to the position information in that table's update information and synchronize it to the corresponding position in the column-store in-memory database, where a preset database table is a table configured in advance for real-time synchronization to the in-memory database;
A second storage unit, configured to store a list of tables to be synchronized, which records information on the preset database tables that need real-time synchronization to the in-memory database;
An in-memory database, configured to store database table data in column storage;
A data symbolization analysis unit, configured to analyze the data of the updated preset database table in the in-memory database using a symbolic data analysis method and to generate a symbolic data table in which each target variable of that table is in interval form.
In a further embodiment of the above system, the row-column storage conversion unit is also configured to preset, according to user operations, the database tables in the business system that need real-time synchronization to the in-memory database, or to update that setting.
In a further embodiment of the above system, the system also comprises:
A reading unit, configured to read newly recorded data update information from the log table and to read updated data from the business system database;
Described ranks store converting unit and comprise control module and write data module;
The control module is configured to: call the reading unit in real time to read newly recorded data update information from the log table; determine from that information whether a preset database table has been updated; if so, call the reading unit to read the updated data from the corresponding position in the business system database according to the position information in that table's update information; and pass the updated data to the data writing module and, according to a preset row-column position mapping rule, instruct the data writing module to write it synchronously into the column-store in-memory database;
The data writing module is configured to synchronize the updated data to the corresponding position in the in-memory database through a write operation.
In a further embodiment of the above system, the control module specifically connects to the reading unit through a remote function call (RFC) to read the data update information and the updated data.
In a further embodiment of the above system, the data symbolization analysis unit specifically uses the concurrency of a multi-core CPU to apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
In a further embodiment of the above system, the system also comprises:
A data sample preprocessing unit, configured to perform sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviate from a preset normal range of business values;
The data symbolization analysis unit specifically applies the symbolic data analysis method to the data of the updated preset database table in the in-memory database after that data has undergone sampling-analysis preprocessing.
In a further embodiment of the above system, the system also comprises:
An application analysis unit, configured to perform application analysis on the interval-form symbolic data tables of the target variables according to application requirements, obtaining the relationships between the target variables and the characteristic states of the data samples of each target variable.
In the symbolic data analysis method and system provided by the above embodiments, a DB trigger is set up in a business system that stores data in row storage to monitor its database tables. When a table is updated, the DB trigger records the update information in a log table; the row-column storage conversion unit reads the newly recorded information in real time and, if a preset database table has been updated, synchronizes the updated data to the corresponding position in the column-store in-memory database; the data symbolization analysis unit then analyzes that preset table in the in-memory database using symbolic data analysis and generates an interval-form symbolic data table for each target variable. Implementing symbolic data analysis on column-store in-memory computing in this way achieves efficient real-time analysis of massive data. Optionally, before the in-memory data is analyzed, sampling-analysis preprocessing can be applied to the column-stored data of the preset table so that data outside the preset normal range of business values are deleted first; this avoids the data distortion that an unreasonable choice of sample space causes when data samples are converted into an interval-form symbolic data table.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
The description of the invention is given for purposes of illustration and description; it is not exhaustive and does not limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.
Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles.
The invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flowchart of an embodiment of the symbolic data analysis method of the present invention.
Fig. 2 is a flowchart of another embodiment of the symbolic data analysis method of the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the symbolic data analysis system of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the drawings. Note that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses.
Techniques, methods, and devices known to persons of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be construed as merely exemplary rather than limiting; other examples of the exemplary embodiments may therefore have different values.
Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be discussed further in subsequent drawings.
At present, reading data from memory takes on the order of nanoseconds, roughly a million times faster than reading from disk. In-memory computing stores all data in an in-memory database in an optimized column format with efficient compression, and makes full use of multi-core CPUs to process the data in parallel. For report analysis on a data warehouse, a mass data read that takes tens of minutes from a traditional database may therefore take less than a second from an in-memory database. Amid today's data explosion, in-memory databases and in-memory computing engines give users fast, efficient data processing and analysis capabilities.
Because reads from an in-memory database are fast and processing is efficient, computation that used to run in the application layer can be moved to the database layer, where the in-memory computing engine performs data-intensive operations. As a result, in-memory computing can run real-time analysis over very large data sets without prior modeling or data preprocessing. For example, to analyze data along any dimension, a model can be built and the analysis completed in real time, possibly in only a few seconds for hundreds of millions of records; at this speed, arbitrary data models can be tried quickly and many future scenarios simulated.
As shown in Table 1 below, a traditional database table in a business system is a two-dimensional table made up of columns and rows, in which users record data. Table 2 below shows one way of organizing the database table of Table 1 in row storage, and Table 3 shows one way of organizing it in column storage according to the embodiment of the present invention.
Table 1 Database table
Name | Length of service | Income
Zhang San | 4 | 20000
Li Ming | 5 | 37000
Liu Li | 8 | 52000
Table 2 Row storage
Table 3 Column storage
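The difference between the two layouts can be illustrated by flattening the three records of Table 1 into the order in which each layout would place the values in storage. This is a simplified sketch: real row and column stores add page headers, encodings and compression, and the function names are hypothetical.

```python
# Table 1 as a list of records; field order: name, length of service, income.
TABLE_1 = [
    ("Zhang San", 4, 20000),
    ("Li Ming", 5, 37000),
    ("Liu Li", 8, 52000),
]


def row_layout(records):
    """Row storage: the fields of each record are stored next to each other."""
    return [value for record in records for value in record]


def column_layout(records):
    """Column storage: all values of one field are stored next to each other."""
    return [value for column in zip(*records) for value in column]


if __name__ == "__main__":
    print(row_layout(TABLE_1))
    # ['Zhang San', 4, 20000, 'Li Ming', 5, 37000, 'Liu Li', 8, 52000]
    print(column_layout(TABLE_1))
    # ['Zhang San', 'Li Ming', 'Liu Li', 4, 5, 8, 20000, 37000, 52000]
```

In the column layout the incomes are contiguous and all of one type, which is what makes single-attribute scans and per-column compression cheap.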
In practical data analysis, one usually wants to compute over the attribute values of a particular target variable (such as the name, length of service, or income in Tables 1 to 3) or over a particular column attribute. Column storage is more efficient than row storage for this, because unrelated attributes need not be read. Column storage therefore has two notable advantages:
1. It raises the throughput of queries on attribute columns and reduces I/O. Because table data is stored by column, the required columns can be located quickly without reading irrelevant columns, which reduces wasted disk reads and writes. The more columns a table has, the larger the gain.
2. It favors data compression. The values in a column share the same data type and tend to be similar, whereas in row storage the attributes of each record are stored contiguously and have different data types, which makes it hard to apply a single compression algorithm.
For example, when a customer buys a product, a retailer creates a sales record in the business system with target variable fields such as time, place, customer, amount, and address. After the front end completes data entry and submits it to the back-end system, one row containing the data related to that sale is inserted into a table in the database. A row-oriented database, however, is inefficient and poorly suited to data analysis applications. In the same example, suppose the retailer has stored 300 million records in row storage in a traditional database and wants to compute the average amount per sale from them. It must first read all 300 million records, extract the amount field from each, and then compute the average. The data actually analyzed (the amount field) is only 5% of the total (assuming 20 fields per record), which is clearly very inefficient. Under column storage, the 300 million records are stored by column, that is, as only 20 column records (one per field). The same analysis then needs to read only the amount column and compute its mean, so compared with row storage, the volume of data read in this scenario drops by a factor of 20.
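The 5% figure in the retail example follows from counting how many stored values each layout must touch to average one field. The sketch below scales the example down to 1,000 hypothetical records of 20 numeric fields and counts touched values rather than measuring time; field index 3 is an arbitrary stand-in for the amount field.

```python
def make_rows(n_records: int, n_fields: int) -> list:
    """Deterministic stand-in data: n_records rows of n_fields numeric fields."""
    return [[float(r * n_fields + f) for f in range(n_fields)] for r in range(n_records)]


def mean_row_store(rows: list, field: int) -> tuple:
    """Row layout: every record is read in full to reach one field."""
    touched, total = 0, 0.0
    for row in rows:
        touched += len(row)
        total += row[field]
    return total / len(rows), touched


def mean_column_store(columns: list, field: int) -> tuple:
    """Column layout: only the requested column is read."""
    col = columns[field]
    return sum(col) / len(col), len(col)


if __name__ == "__main__":
    rows = make_rows(1000, 20)              # scaled down from 300 million records
    columns = [list(c) for c in zip(*rows)]
    (m_row, t_row), (m_col, t_col) = mean_row_store(rows, 3), mean_column_store(columns, 3)
    print(m_row == m_col, t_row // t_col)   # True 20
```

Both layouts give the same mean, but the row layout touches 20 times as many values (1/0.05 = 20), independent of the number of records.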
Fig. 1 is a flowchart of an embodiment of the symbolic data analysis method of the present invention. As shown in Fig. 1, the symbolic data analysis method of this embodiment comprises:
110: The database (DB) trigger of the business system monitors the database tables of the business system.
The business system may be, for example, an enterprise resource planning (ERP) business system, and its database stores the data of its database tables in row storage.
120: In response to detecting a data update in a database table of the business system, the DB trigger records data update information describing the update in a log table.
A data update includes an event that adds, modifies, or deletes data in a database table of the business system; the data update record includes the identifier (ID) of the updated table and the position of the update. A table ID may be any value that uniquely identifies a table in the business system, such as the table's name or number.
130: The row-column storage conversion unit reads newly recorded data update information from the log table in real time.
140: If the newly recorded information indicates that a preset database table has been updated, the row-column storage conversion unit uses the position information in that table's update information to read the updated data from the corresponding position in the business system database, and synchronizes it to the corresponding position in the in-memory database, which stores table data in column storage.
A preset database table is a table configured in advance for real-time synchronization to the in-memory database. Specifically, the row-column storage conversion unit may maintain a list of tables to be synchronized that records the IDs of the tables needing real-time synchronization. The IDs in this list can be updated as needed, for example by adding or deleting table IDs, and a given table ID can be set to be synchronized only during a specified time period or permanently.
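The list of tables to be synchronized, including the optional time period per table, can be sketched as a small registry. The class and method names, and the convention that a missing start or end means "unbounded", are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class SyncEntry:
    table_id: str
    start: Optional[datetime] = None  # None on both ends means "always synchronize"
    end: Optional[datetime] = None    # end is exclusive


class SyncList:
    """Hypothetical registry of preset tables that need real-time synchronization."""

    def __init__(self) -> None:
        self._entries: Dict[str, SyncEntry] = {}

    def add(self, table_id: str, start: Optional[datetime] = None,
            end: Optional[datetime] = None) -> None:
        self._entries[table_id] = SyncEntry(table_id, start, end)

    def remove(self, table_id: str) -> None:
        self._entries.pop(table_id, None)

    def needs_sync(self, table_id: str, now: datetime) -> bool:
        entry = self._entries.get(table_id)
        if entry is None:
            return False                  # not a preset table
        if entry.start is not None and now < entry.start:
            return False                  # period has not started yet
        if entry.end is not None and now >= entry.end:
            return False                  # period has ended
        return True
```

The conversion unit would call `needs_sync(entry.table_id, now)` where the earlier sketches use a plain set of preset tables.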
150: The data symbolization analysis unit analyzes the data of the updated preset database table in the in-memory database using a symbolic data analysis method, and generates a symbolic data table in which each target variable of that table is in interval form.
Analyzing the updated preset database table in this way applies the idea of "data packing": a higher-level data layer is built on the original multidimensional sample space by determining the lower and upper bounds of each target variable dimension within the data sample space, producing an interval-form symbolic data table. This reduces the dimensionality of the data and greatly simplifies computation over large sample sets. It also avoids the common problem that conventional dimensionality reduction of the sample space yields results whose physical meaning is hard to interpret, and removes the influence of sample-space size and variable-space dimension on analyzing the overall characteristics of groups of data, so that the characteristic states of the data samples can be observed accurately, efficiently, and in real time.
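Data packing into an interval table, as described above, amounts to grouping the rows into symbolic objects and recording the lower and upper bound of each target variable per group. A minimal sketch, assuming the grouping key (here a hypothetical `region` field) is given; how the patent chooses the groups is not specified in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def symbolic_interval_table(
    rows: Iterable[dict], group_key: str, target_vars: List[str]
) -> Dict[object, Dict[str, Tuple[float, float]]]:
    """Collapse the rows of each group into one symbolic row whose cells are
    (lower bound, upper bound) intervals of each target variable."""
    table: Dict[object, Dict[str, Tuple[float, float]]] = defaultdict(dict)
    for row in rows:
        cell = table[row[group_key]]
        for var in target_vars:
            v = row[var]
            lo, hi = cell.get(var, (v, v))
            cell[var] = (min(lo, v), max(hi, v))
    return dict(table)
```

For example, sales rows `{"region": "north", "amount": 10}` and `{"region": "north", "amount": 30}` collapse to the single symbolic row `north: amount = (10, 30)`, however many raw rows fall between those bounds.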
In the symbolic data analysis method provided by the above embodiment, a DB trigger is set up in a business system that stores data in row storage to monitor its database tables. When a table is updated, the DB trigger records the update information in a log table; the row-column storage conversion unit reads the newly recorded information in real time and, if a preset database table has been updated, synchronizes the updated data to the corresponding position in the column-store in-memory database; the data symbolization analysis unit then analyzes that preset table in the in-memory database using symbolic data analysis and generates an interval-form symbolic data table for each target variable. Implementing symbolic data analysis on column-store in-memory computing in this way achieves efficient real-time analysis of massive data.
Fig. 2 is a flowchart of another embodiment of the symbolic data analysis method of the present invention. Compared with the embodiment shown in Fig. 1, operation 130 may be implemented as follows: 230. The control module in the row-column storage conversion unit calls the reading unit in the business system in real time to read newly added data update information from the log table. Correspondingly, operation 140 may be implemented as follows:
240. The control module determines, from the newly added data update information in the log table, whether any preset database table has been updated. Specifically, it may check whether that information contains the ID of a preset database table.
If a preset database table has been updated, operation 250 is performed. Otherwise, the subsequent steps of this embodiment are not performed.
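Steps 230 and 240 could be sketched as an incremental poll over the log table. A watermark skips entries that have already been processed, and only entries whose table ID is in the set of preset tables are passed on. This is an illustrative sketch. The `LogEntry` fields, the watermark mechanism and the function name `poll_updates` are assumptions; the text only requires that each record carry the table ID and the position of the update.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LogEntry:
    log_id: int    # assumed monotonically increasing sequence number
    table_id: str  # ID of the database table that was updated
    row_key: int   # position information, assumed to be the row's primary key
    op: str        # "INSERT", "UPDATE" or "DELETE"


def poll_updates(log: Iterable[LogEntry], last_seen: int,
                 preset_tables: Set[str]) -> Tuple[List[LogEntry], int]:
    """Return new log entries that concern preset tables, plus the advanced watermark."""
    relevant: List[LogEntry] = []
    watermark = last_seen
    for entry in sorted(log, key=lambda e: e.log_id):
        if entry.log_id <= last_seen:
            continue  # already handled in an earlier polling round
        watermark = entry.log_id  # advance even for tables that are not synchronized
        if entry.table_id in preset_tables:
            relevant.append(entry)
    return relevant, watermark


log = [
    LogEntry(1, "orders", 10, "INSERT"),
    LogEntry(2, "audit", 3, "UPDATE"),
    LogEntry(3, "orders", 11, "UPDATE"),
]
new, watermark = poll_updates(log, last_seen=1, preset_tables={"orders"})
print([e.row_key for e in new], watermark)  # [11] 3
```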
250. The control module calls the reading unit to read the updated data from the corresponding position in the business system's database. It locates that position from the position information in the data update information of the updated preset database table.
For example, the control module may connect to the reading unit through a Remote Function Call (RFC) to read the data update information and the updated data.
260. The control module passes the updated data it has read to the data writing module in the row-column storage conversion unit. Following a preset row-column position mapping rule, it then instructs the data writing module to write the updated data synchronously into the in-memory database, which stores the table data in column storage.
The row-column position mapping rule can take several forms:
- the correspondence between row storage positions and column storage positions, or the relationship between those positions, when row storage is converted to column storage;
- a rule mapping the storage position of each value of a target variable (called a data sample) in a business-system database table to its storage position in the corresponding table of the in-memory database;
- any other rule configured as the case requires.
In short, this rule tells the row-column storage conversion unit exactly where in the in-memory database the data from a given position in the business system's database should be written.
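One possible form of the row-column position mapping rule is a per-table dictionary that maps the row key in the business system to an offset shared by all column arrays in the in-memory copy. The sketch below assumes that form. It uses a plain Python dictionary of lists as a stand-in for the in-memory column store and handles deletes with tombstones. `ColumnTable`, `upsert` and `delete` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence


class ColumnTable:
    """Column-store copy of one row-store table (illustrative sketch, not a real engine)."""

    def __init__(self, columns: Sequence[str]) -> None:
        # One array per column; position i in every array belongs to the same row.
        self.columns: Dict[str, List[Optional[float]]] = {c: [] for c in columns}
        # Mapping rule (assumed form): business-system row key -> offset in the arrays.
        self.offset_of: Dict[int, int] = {}
        self._size = 0

    def upsert(self, row_key: int, row: Dict[str, float]) -> None:
        """Write one row-store record into the matching position of each column."""
        offset = self.offset_of.get(row_key)
        if offset is None:  # new row: reserve a slot at the end of every column
            offset = self._size
            self._size += 1
            self.offset_of[row_key] = offset
            for values in self.columns.values():
                values.append(None)
        for name, value in row.items():
            if name in self.columns:
                self.columns[name][offset] = value

    def delete(self, row_key: int) -> None:
        """Tombstone the row's slot; compaction is left out of this sketch."""
        offset = self.offset_of.pop(row_key, None)
        if offset is not None:
            for values in self.columns.values():
                values[offset] = None

    def column(self, name: str) -> List[float]:
        """Live values of one column, ready for per-column analysis."""
        return [v for v in self.columns[name] if v is not None]


t = ColumnTable(["minutes", "fee"])
t.upsert(10, {"minutes": 12.0, "fee": 3.5})
t.upsert(11, {"minutes": 40.0, "fee": 9.0})
t.upsert(10, {"fee": 4.0})  # an UPDATE touches only the changed column
t.delete(11)
print(t.column("minutes"), t.column("fee"))  # [12.0] [4.0]
```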
270. The data writing module synchronizes the updated data to the corresponding position in the in-memory database through a write operation.
As a specific, non-limiting example of this embodiment, the data symbolization analysis unit may use the concurrency of a multi-core CPU when analyzing the updated preset database table in the in-memory database. It analyzes each column of the table in parallel with the symbolic data analysis method, for example handling the data of each target variable separately. This parallel processing of column-stored data can increase analysis speed several-fold.
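The per-column parallelism could look like the following sketch. It submits one task per target-variable column, and each task computes that column's lower and upper bounds. For brevity it treats the whole table as a single group, and the function names are hypothetical. In CPython, threads share the global interpreter lock. To spread pure-Python work across CPU cores, pass a `concurrent.futures.ProcessPoolExecutor` instead of relying on the default thread pool, together with the usual `if __name__ == "__main__":` guard.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def column_interval(values: List[float]) -> Tuple[float, float]:
    """Lower and upper bound of one target-variable column."""
    return min(values), max(values)


def parallel_intervals(columns: Dict[str, List[float]],
                       executor: Optional[Executor] = None) -> Dict[str, Tuple[float, float]]:
    """Analyze each column as an independent task and collect the resulting intervals."""
    owns_pool = executor is None
    pool = executor if executor is not None else ThreadPoolExecutor()
    try:
        futures = {name: pool.submit(column_interval, values)
                   for name, values in columns.items()}
        return {name: future.result() for name, future in futures.items()}
    finally:
        if owns_pool:
            pool.shutdown()


print(parallel_intervals({"minutes": [12.0, 40.0, 5.0], "fee": [3.5, 9.0, 1.2]}))
# {'minutes': (5.0, 40.0), 'fee': (1.2, 9.0)}
```

Column storage suits this pattern because each task reads one contiguous array and never touches the other columns.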
In addition, another embodiment addresses a problem of the prior art: an unreasonably chosen data sample space often distorts the data when samples are converted into interval-valued symbolic data. In this embodiment, operation 280 is performed before the updated preset database table in the in-memory database is analyzed with the symbolic data analysis method. A data sample preprocessing unit performs sampling-analysis preprocessing on the column-stored data of the preset database table. It uses data smoothing to identify and delete data outside the preset normal range of business values, and thereby determines a reasonable data sample space for the analysis. This prevents an unreasonably chosen sample space from distorting the data during conversion into interval-valued symbolic data. Operation 150 in Fig. 1 is then implemented as operation 290: the data symbolization analysis unit applies the symbolic data analysis method to the preprocessed data of the updated preset database table in the in-memory database.
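The text does not say which smoothing technique the preprocessing unit uses. The sketch below makes an assumption for illustration. It first drops values outside a preset normal business range, then drops points that deviate from a moving median by more than a tolerance. `preprocess`, `window` and `tolerance` are hypothetical names and defaults.

```python
from statistics import median
from typing import List, Sequence, Tuple


def preprocess(values: Sequence[float], normal_range: Tuple[float, float],
               window: int = 5, tolerance: float = 10.0) -> List[float]:
    """Drop out-of-range values, then drop points far from their moving median.

    window and tolerance are hypothetical tuning parameters, not values from the patent.
    """
    lo, hi = normal_range
    in_range = [v for v in values if lo <= v <= hi]  # preset business-value range
    half = window // 2
    kept: List[float] = []
    for i, v in enumerate(in_range):
        # Moving median over a window centred on i (truncated at both ends).
        neighbours = in_range[max(0, i - half): i + half + 1]
        if abs(v - median(neighbours)) <= tolerance:
            kept.append(v)
    return kept


# 500 and -3 fall outside the range; 90 is far from its local median.
print(preprocess([12, 14, 13, 500, 15, -3, 90, 14], normal_range=(0, 200)))
# [12, 14, 13, 15, 14]
```

The moving median is used here rather than a moving mean because a single outlier shifts a mean toward itself and can hide.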
In addition, another embodiment adds a step after the interval-valued symbolic data table of each target variable in the updated preset database table has been generated. An application analysis unit performs various application analyses on those tables according to application requirements, such as symbolic data factor analysis or symbolic data canonical correlation analysis. These analyses yield the relationships between the target variables, their underlying meaning, and the characteristic state of the data samples of each target variable.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs steps including those of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk or an optical disc.
Fig. 3 is a schematic structural diagram of an embodiment of the symbolic data analysis system of the present invention. The system of this embodiment can be used to implement the symbolic data analysis methods of the above embodiments. As shown in Fig. 3, it includes a DB trigger 310, a first storage unit 320, a row-column storage conversion unit 330, a second storage unit 340, an in-memory database 350 and a data symbolization analysis unit 360, where:
DB trigger 310, which may optionally be located in the business system, monitors the database tables of the business system. The business system's database stores table data in row storage. When DB trigger 310 detects a data update in a business-system database table, it records data update information describing that update in the log table. A data update means data being inserted into, modified in or deleted from the table. The data update record includes the ID of the updated table and the position information of the update.
First storage unit 320 stores the log table.
Row-column storage conversion unit 330 reads newly added data update information from the log table in real time. Suppose that information shows that a preset database table has been updated. The unit then reads the updated data from the corresponding position in the business system's database, locating it from the position information in that table's data update information. It synchronizes the data to the corresponding position in in-memory database 350, which stores table data in column storage. A preset database table is a table designated in advance for real-time synchronization to in-memory database 350.
Optionally, row-column storage conversion unit 330 may also preset a list of tables to be synchronized according to user operations, or further update that list. The list records which database tables in the business system need real-time synchronization to the in-memory database.
Second storage unit 340, which may optionally be located in row-column storage conversion unit 330, stores the list of tables to be synchronized. That list records the preset database tables that need real-time synchronization to in-memory database 350.
In-memory database 350 stores table data in column storage.
Data symbolization analysis unit 360 applies the symbolic data analysis method to the data of the updated preset database tables in in-memory database 350. It generates an interval-valued symbolic data table for each target variable in those tables.
Referring again to Fig. 3, in another embodiment the symbolic data analysis system may further include a reading unit 370, which may optionally be located in the business system. Reading unit 370 reads newly added data update information from the log table and reads updated data from the business system's database. Correspondingly, row-column storage conversion unit 330 may include a control module and a data writing module, where:
The control module does the following:
- It calls reading unit 370 in real time to read newly added data update information from the log table.
- It determines from that information whether a preset database table has been updated.
- If so, it calls reading unit 370 to read the updated data from the corresponding position in the business system's database, using the position information in that table's data update information.
- It passes the updated data to the data writing module in row-column storage conversion unit 330. Following a preset row-column position mapping rule, it instructs that module to write the updated data synchronously into the in-memory database, which stores table data in column storage.
Specifically, the control module may connect to reading unit 370 through RFC to read the data update information and the updated data.
The data writing module synchronizes the updated data to the corresponding position in the in-memory database through a write operation.
As a specific, non-limiting example of this system embodiment, data symbolization analysis unit 360 may use the concurrency of a multi-core CPU. With it, the unit analyzes each column of the updated preset database table in the in-memory database in parallel with the symbolic data analysis method.
Referring again to Fig. 3, another embodiment of the system may further include a data sample preprocessing unit 380. This unit performs sampling-analysis preprocessing on the column-stored data of the preset database tables, using data smoothing to identify and delete data outside the preset normal range of business values. Correspondingly, data symbolization analysis unit 360 analyzes the preprocessed data of the updated preset database tables in in-memory database 350.
Further, referring again to Fig. 3, another embodiment of the system may also include an application analysis unit 390. It performs application analyses on the interval-valued symbolic data table of each target variable according to application requirements. These analyses yield the relationships between the target variables and the characteristic state of the data samples of each target variable.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of different embodiments may be cross-referenced. Because the system embodiments essentially correspond to the method embodiments, they are described briefly; see the description of the method embodiments for the relevant details.
The method and system of the present invention may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware and firmware. The order of the method steps described above is for illustration only; unless otherwise stated, the steps of the method are not limited to that order. In some embodiments, the invention may also be implemented as programs recorded on a recording medium, comprising machine-readable instructions for implementing the method according to the invention. The invention therefore also covers a recording medium storing a program for executing the method according to the invention.

Claims (14)

1. A symbolic data analysis method, characterized by comprising:
monitoring, by a database (DB) trigger of a business system, the database tables of the business system, the database of the business system storing the data of the database tables in row storage;
in response to detecting a data update in a database table of the business system, recording, by the DB trigger, data update information describing the update in a log table, wherein a data update includes data being inserted into, modified in or deleted from the database table, and the data update record includes the identifier (ID) of the updated database table and the position information of the update;
reading, by a row-column storage conversion unit, newly added data update information from the log table in real time;
if the newly added data update information indicates that a preset database table has been updated, reading, by the row-column storage conversion unit, the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and synchronizing the updated data to the corresponding position in an in-memory database that stores the data of the database tables in column storage, the preset database table being a database table designated in advance for real-time synchronization to the in-memory database;
analyzing, by a data symbolization analysis unit, the data of the updated preset database table in the in-memory database with a symbolic data analysis method, and generating an interval-valued symbolic data table for each target variable in the updated preset database table.
2. The method according to claim 1, characterized by further comprising:
presetting, by the row-column storage conversion unit, the database tables in the business system that need real-time synchronization to the in-memory database, or further updating the database tables that need real-time synchronization to the in-memory database.
3. The method according to claim 2, characterized in that reading, by the row-column storage conversion unit, newly added data update information from the log table in real time comprises:
calling in real time, by a control module in the row-column storage conversion unit, a reading unit in the business system to read newly added data update information from the log table;
and in that, if the newly added data update information indicates that a preset database table has been updated, reading, by the row-column storage conversion unit, the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and synchronizing the updated data to the corresponding position in the in-memory database that stores the data of the database tables in column storage, comprises:
determining, by the control module, from the newly added data update information in the log table, whether a preset database table has been updated;
if a preset database table has been updated, calling, by the control module, the reading unit to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table;
passing, by the control module, the updated data read to a data writing module in the row-column storage conversion unit, and instructing, according to a preset row-column position mapping rule, the data writing module to write the updated data synchronously into the in-memory database that stores the data of the database tables in column storage;
synchronizing, by the data writing module, the updated data to the corresponding position in the in-memory database through a write operation.
4. The method according to claim 3, characterized in that the control module connects to the reading unit through a Remote Function Call (RFC) to read the data update information and the updated data.
5. The method according to claim 3, characterized in that analyzing the data of the updated preset database table in the in-memory database with the symbolic data analysis method comprises:
using the concurrency of a multi-core central processing unit (CPU) to analyze each column of the updated preset database table in the in-memory database in parallel with the symbolic data analysis method.
6. The method according to any one of claims 1 to 5, characterized by further comprising:
performing, by a data sample preprocessing unit, sampling-analysis preprocessing on the column-stored data of the preset database table, using data smoothing to identify and delete data outside a preset normal range of business values;
wherein analyzing, by the data symbolization analysis unit, the data of the updated preset database table in the in-memory database with the symbolic data analysis method comprises: analyzing, by the data symbolization analysis unit, the preprocessed data of the updated preset database table in the in-memory database with the symbolic data analysis method.
7. The method according to claim 6, characterized by further comprising:
performing, by an application analysis unit, application analysis on the interval-valued symbolic data table of each target variable according to application requirements, to obtain the relationships between the target variables and the characteristic state of the data samples of each target variable.
8. A symbolic data analysis system, characterized by comprising:
a database (DB) trigger, configured to monitor the database tables of a business system, the database of the business system storing the data of the database tables in row storage, and, in response to detecting a data update in a database table of the business system, to record data update information describing the update in a log table, wherein a data update includes data being inserted into, modified in or deleted from the database table, and the data update record includes the identifier (ID) of the updated database table and the position information of the update;
a first storage unit, configured to store the log table;
a row-column storage conversion unit, configured to read newly added data update information from the log table in real time and, if that information indicates that a preset database table has been updated, to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and to synchronize the updated data to the corresponding position in an in-memory database that stores the data of the database tables in column storage, the preset database table being a database table designated in advance for real-time synchronization to the in-memory database;
a second storage unit, configured to store a list of tables to be synchronized, the list recording the preset database tables that need real-time synchronization to the in-memory database;
the in-memory database, configured to store the data of the database tables in column storage;
a data symbolization analysis unit, configured to analyze the data of the updated preset database table in the in-memory database with a symbolic data analysis method and to generate an interval-valued symbolic data table for each target variable in the updated preset database table.
9. The system according to claim 8, characterized in that the row-column storage conversion unit is further configured to preset, according to user operations, the database tables in the business system that need real-time synchronization to the in-memory database, or to further update those database tables.
10. The system according to claim 9, characterized by further comprising:
a reading unit, configured to read newly added data update information from the log table and to read updated data from the database of the business system;
wherein the row-column storage conversion unit comprises a control module and a data writing module;
the control module is configured to call the reading unit in real time to read newly added data update information from the log table; to determine from that information whether a preset database table has been updated; if so, to call the reading unit to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table; and to pass the updated data read to the data writing module in the row-column storage conversion unit, instructing it, according to a preset row-column position mapping rule, to write the updated data synchronously into the in-memory database that stores the data of the database tables in column storage;
the data writing module is configured to synchronize the updated data to the corresponding position in the in-memory database through a write operation.
11. The system according to claim 10, characterized in that the control module specifically connects to the reading unit through a Remote Function Call (RFC) to read the data update information and the updated data.
12. The system according to claim 10, characterized in that the data symbolization analysis unit specifically uses the concurrency of a multi-core central processing unit (CPU) to analyze each column of the updated preset database table in the in-memory database in parallel with the symbolic data analysis method.
13. The system according to any one of claims 8 to 12, characterized by further comprising:
a data sample preprocessing unit, configured to perform sampling-analysis preprocessing on the column-stored data of the preset database table, using data smoothing to identify and delete data outside a preset normal range of business values;
wherein the data symbolization analysis unit specifically analyzes the preprocessed data of the updated preset database table in the in-memory database with the symbolic data analysis method.
14. The system according to claim 13, characterized by further comprising:
an application analysis unit, configured to perform application analysis on the interval-valued symbolic data table of each target variable according to application requirements, to obtain the relationships between the target variables and the characteristic state of the data samples of each target variable.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410184644.0A CN105095247B (en) 2014-05-05 2014-05-05 symbol data analysis method and system


Publications (2)

Publication Number Publication Date
CN105095247A 2015-11-25
CN105095247B 2018-07-17

Family

ID=54575705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410184644.0A Active CN105095247B (en) 2014-05-05 2014-05-05 symbol data analysis method and system

Country Status (1)

Country Link
CN (1) CN105095247B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse
CN102880709A (en) * 2012-09-28 2013-01-16 用友软件股份有限公司 Data warehouse management system and data warehouse management method
CN103218415A (en) * 2013-03-27 2013-07-24 互爱互动(北京)科技有限公司 Data processing system and method based on data warehouse
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
CN103744906A (en) * 2013-12-26 2014-04-23 乐视网信息技术(北京)股份有限公司 System, method and device for data synchronization


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
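Hu Yan (胡艳) et al.: "A Technique for Analyzing Massive Data: Symbolic Data Analysis and Its Applications", Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition) *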

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671594B2 (en) 2014-09-17 2020-06-02 Futurewei Technologies, Inc. Statement based migration for adaptively building and updating a column store database from a row store database based on query demands using disparate database systems
CN107077480A (en) * 2014-09-17 2017-08-18 华为技术有限公司 The method and system of column storage database is adaptively built from the row data storage storehouse of current time based on query demand
CN107077480B (en) * 2014-09-17 2020-04-28 华为技术有限公司 Method and system for constructing column storage database
CN105787129A (en) * 2016-03-29 2016-07-20 联想(北京)有限公司 Data storage method and electronic equipment
CN105787129B (en) * 2016-03-29 2020-06-23 联想(北京)有限公司 Data storage method and electronic equipment
CN106570314A (en) * 2016-10-19 2017-04-19 北京千医健康管理有限公司 ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard
CN108108411A (en) * 2017-12-12 2018-06-01 苏州蜗牛数字科技股份有限公司 A kind of reading system and method for information list file
CN111159176A (en) * 2019-11-29 2020-05-15 中国科学院计算技术研究所 Method and system for storing and reading mass stream data
CN113515569A (en) * 2020-04-09 2021-10-19 阿里巴巴集团控股有限公司 Data synchronization method, device and system
CN113515569B (en) * 2020-04-09 2023-12-26 阿里巴巴集团控股有限公司 Data synchronization method, device and system
CN113064919A (en) * 2021-03-31 2021-07-02 北京达佳互联信息技术有限公司 Data processing method, data storage system, computer device and storage medium
CN113064919B (en) * 2021-03-31 2022-11-22 北京达佳互联信息技术有限公司 Data processing method, data storage system, computer device and storage medium
CN113901069A (en) * 2021-12-08 2022-01-07 威讯柏睿数据科技(北京)有限公司 Data storage method and device of distributed database

Also Published As

Publication number Publication date
CN105095247B (en) 2018-07-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
  Application publication date: 20151125
  Assignee: Tianyiyun Technology Co.,Ltd.
  Assignor: CHINA TELECOM Corp.,Ltd.
  Contract record no.: X2024110000020
  Denomination of invention: Symbolic Data Analysis Methods and Systems
  Granted publication date: 20180717
  License type: Common License
  Record date: 20240315