CN105095247B - symbol data analysis method and system - Google Patents
- Publication number
- CN105095247B (CN201410184644.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- database
- database table
- update
- presetting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention disclose a symbolic data analysis method and system. The method includes: when a DB trigger detects that a data update has occurred in a database table of a business system that stores data in row storage mode, recording data update information in a log table; reading, by a row-column storage conversion unit, newly added data update information from the log table in real time; if a preset database table has been updated, reading the updated data from the corresponding position in the database of the business system and synchronizing it to the corresponding position in an in-memory database that stores database table data in column storage mode, where a preset database table is a database table designated in advance for real-time synchronization to the in-memory database; and analyzing, by a data symbol analysis unit using a symbolic data analysis method, the data of the updated preset database table to generate an interval-format symbolic data table of the indicator variables in that table. The embodiments of the present invention enable efficient real-time data analysis.
Description
Technical field
The present invention relates to computer technology, and in particular to a symbolic data analysis method and system.
Background technology
In a traditional application system, data is stored in a traditional database. After a user issues an operation instruction on the data through the front-end interface of the application, the application layer reads the data from the database, performs the logical operations, and returns the result to the front-end interface for display or for the next operation. In this process, reading data from the database becomes a bottleneck because of the limited performance of disk input/output (I/O), and the bottleneck is especially pronounced when large volumes of data are read; report analysis based on a data warehouse is the most obvious example. The cause is that a traditional database actually stores data on disk in the form of files and provides applications with an interface for accessing that data, so reading data from the database is in essence reading files from disk. Over the past few decades of hardware development, the performance of memory and of the central processing unit (CPU) has risen rapidly, whereas disk I/O performance has improved only slightly. Reading data from disk takes on the order of milliseconds.
General data analysis techniques are severely limited when processing data sets that are well organized but enormous in volume. The main difficulties are twofold: 1) because of the number of sample points and dimensions, the amount of computation is often very large; 2) it is difficult to obtain the overall characteristics of the data point set.
To address these two difficulties, the prior art proposes a symbolic data analysis method based on a row-storage data warehouse; see, for example, "Canonical Correlation Analysis of Interval Data and Its Application in Stock Market Analysis" (Systems Engineering, Vol. 22, No. 8) and "An Analysis Technique for Massive Data" (Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition), Vol. 17, No. 2). Symbolic data analysis uses the idea of "data packing" to build a higher-level data stratum within the original multidimensional sample space. This greatly simplifies computation over large sample sets and avoids the earlier situation in which the result of reducing the dimension of the sample space was often hard to interpret physically.
In the course of implementing the present invention, the inventor found that the existing symbolic data analysis method based on a row-storage data warehouse, although it effectively reduces the dimension of a high-dimensional variable space and improves data processing, still has the following problems:
The existing symbolic data analysis method based on a row-storage data warehouse performs non-real-time analysis of business system data. With the arrival of big data, it cannot meet the demand for efficient real-time data analysis. In addition, when symbolic data analysis is performed, an unreasonable choice of the data sample space often distorts the data when the data samples are converted into interval-valued symbolic data.
Summary of the invention
One of the technical problems to be solved by the embodiments of the present invention is to provide a symbolic data analysis method and system that achieve efficient real-time data analysis.
A symbolic data analysis method provided by an embodiment of the present invention includes:
A database (DB) trigger of a business system monitors the database tables of the business system, where the database of the business system stores the data of the database tables in row storage mode;
In response to detecting that a data update has occurred in a database table of the business system, the DB trigger records, in a log table, data update information describing the update, where a data update of a database table includes an event that adds, modifies, or deletes data in the database table, and the data update information includes the identifier (ID) of the database table in which the update occurred and the location information of the update;
A row-column storage conversion unit reads newly added data update information from the log table in real time;
If the newly added data update information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the database of the business system according to the location information in the data update information of the preset database table, and synchronizes the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode, where a preset database table is a database table designated in advance for real-time synchronization to the in-memory database;
A data symbol analysis unit uses a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database, and generates an interval-format symbolic data table of the indicator variables in the updated preset database table.
In a further embodiment of the above method of the present invention, the method further includes:
Designating in advance, through the row-column storage conversion unit, the database tables of the business system that need to be synchronized to the in-memory database in real time, or subsequently updating the set of database tables that need to be synchronized to the in-memory database in real time.
In a further embodiment of the above method of the present invention, the row-column storage conversion unit reading newly added data update information from the log table in real time includes:
A control module in the row-column storage conversion unit calls, in real time, a reading unit in the business system to read the newly added data update information from the log table;
The step of reading the updated data and synchronizing it to the in-memory database when the newly added data update information indicates that a preset database table has been updated includes:
The control module determines, according to the newly added data update information in the log table, whether a preset database table has been updated;
If a preset database table has been updated, the control module calls the reading unit to read the updated data from the corresponding position in the database of the business system according to the location information in the data update information of the preset database table;
The control module passes the updated data that was read to a data writing module in the row-column storage conversion unit and, according to a preset row-column position correspondence rule, instructs the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column storage mode;
The data writing module synchronizes the updated data to the corresponding position in the in-memory database through a write operation.
In a further embodiment of the above method of the present invention, the control module calls the reading unit through a remote function call (RFC) connection to read the data update information and the updated data.
In a further embodiment of the above method of the present invention, using the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database includes:
Using the parallel capability of a multi-core central processing unit (CPU), applying the symbolic data analysis method to each data column of the updated preset database table in the in-memory database in parallel.
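The per-column parallel analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `column_interval` and `analyze_columns_in_parallel` are hypothetical, the "analysis" is reduced to finding each column's lower and upper bound, and a thread pool is used only to show the one-task-per-column structure (CPU-bound pure-Python work would need a process pool to occupy several cores).

```python
from concurrent.futures import ThreadPoolExecutor


def column_interval(values):
    """Return the (lower bound, upper bound) interval of one data column."""
    return (min(values), max(values))


def analyze_columns_in_parallel(columns, max_workers=4):
    """Submit one analysis task per column and collect the intervals.

    `columns` maps a column name to the list of values stored for it,
    which is the natural shape of a column-stored table.
    """
    names = list(columns)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and results line up.
        intervals = pool.map(column_interval, (columns[n] for n in names))
        return dict(zip(names, intervals))


# Hypothetical column-stored table with two indicator variables.
table = {
    "years_of_service": [4, 5, 8],
    "income": [20000, 37000, 52000],
}
result = analyze_columns_in_parallel(table)
print(result)
```

Because each column is already stored contiguously, every task touches only its own column, which is what makes this kind of split straightforward.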
In a further embodiment of the above method of the present invention, the method further includes:
A data sample preprocessing unit performs sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviates from a preset normal range of business values;
The data symbol analysis unit using the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database includes: the data symbol analysis unit uses the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database after that data has undergone the sampling-analysis preprocessing.
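The deletion step of this preprocessing can be sketched as a simple range filter. The text does not specify the smoothing technique, so the sketch below only assumes that a normal range is preset for each indicator variable; the function name `drop_out_of_range`, the field name `income`, and the range values are hypothetical.

```python
def drop_out_of_range(records, normal_ranges):
    """Keep only records whose values all lie inside the preset ranges.

    `normal_ranges` maps a field name to an inclusive (low, high) pair.
    """
    kept = []
    for record in records:
        if all(low <= record[name] <= high
               for name, (low, high) in normal_ranges.items()):
            kept.append(record)
    return kept


# Hypothetical samples; the last income is an obvious entry error.
samples = [
    {"income": 20000},
    {"income": 37000},
    {"income": 52000},
    {"income": 99000000},
]
cleaned = drop_out_of_range(samples, {"income": (0, 1000000)})

# Interval of the income variable before and after preprocessing.
raw_interval = (min(s["income"] for s in samples),
                max(s["income"] for s in samples))
clean_interval = (min(s["income"] for s in cleaned),
                  max(s["income"] for s in cleaned))
print(raw_interval, clean_interval)
```

Without the filter, a single abnormal value stretches the interval of the variable and distorts the interval-format result, which is the distortion the preprocessing is meant to avoid.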
In a further embodiment of the above method of the present invention, the method further includes:
An application analysis unit performs application analysis on the interval-format symbolic data table of the indicator variables according to application requirements, to obtain the relationships among the indicator variables and the characteristic state of the data samples of each indicator variable.
A symbolic data analysis system provided by an embodiment of the present invention includes:
A database (DB) trigger, configured to monitor the database tables of a business system, where the database of the business system stores the data of the database tables in row storage mode; and, in response to detecting that a data update has occurred in a database table of the business system, to record in a log table data update information describing the update, where a data update of a database table includes an event that adds, modifies, or deletes data in the database table, and the data update information includes the identifier (ID) of the database table in which the update occurred and the location information of the update;
A first storage unit, configured to store the log table;
A row-column storage conversion unit, configured to read newly added data update information from the log table in real time; and, if the newly added data update information indicates that a preset database table has been updated, to read the updated data from the corresponding position in the database of the business system according to the location information in the data update information of the preset database table, and to synchronize the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode, where a preset database table is a database table designated in advance for real-time synchronization to the in-memory database;
A second storage unit, configured to store a list of database tables to be synchronized, where the list records information on all preset database tables that need to be synchronized to the in-memory database in real time;
An in-memory database, configured to store database table data in column storage mode;
A data symbol analysis unit, configured to use a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database and to generate an interval-format symbolic data table of the indicator variables in the updated preset database table.
In a further embodiment of the above system of the present invention, the row-column storage conversion unit is further configured to designate in advance, according to a user operation, the database tables of the business system that need to be synchronized to the in-memory database in real time, or to subsequently update the set of database tables that need to be synchronized to the in-memory database in real time.
In a further embodiment of the above system of the present invention, the system further includes:
A reading unit, configured to read newly added data update information from the log table and to read updated data from the database of the business system;
The row-column storage conversion unit includes a control module and a data writing module;
The control module is configured to call the reading unit in real time to read the newly added data update information from the log table; to determine, according to the newly added data update information in the log table, whether a preset database table has been updated; if a preset database table has been updated, to call the reading unit to read the updated data from the corresponding position in the database of the business system according to the location information in the data update information of the preset database table; and to pass the updated data that was read to the data writing module in the row-column storage conversion unit and, according to a preset row-column position correspondence rule, instruct the data writing module to write the updated data synchronously into the in-memory database that stores database table data in column storage mode;
The data writing module is configured to synchronize the updated data to the corresponding position in the in-memory database through a write operation.
In a further embodiment of the above system of the present invention, the control module specifically calls the reading unit through a remote function call (RFC) connection to read the data update information and the updated data.
In a further embodiment of the above system of the present invention, the data symbol analysis unit specifically uses the parallel capability of a multi-core central processing unit (CPU) to apply the symbolic data analysis method to each data column of the updated preset database table in the in-memory database in parallel.
In a further embodiment of the above system of the present invention, the system further includes:
A data sample preprocessing unit, configured to perform sampling-analysis preprocessing on the column-stored data of the preset database table, using a data smoothing technique to identify and delete data that deviates from a preset normal range of business values;
The data symbol analysis unit specifically uses the symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database after that data has undergone the sampling-analysis preprocessing.
In a further embodiment of the above system of the present invention, the system further includes:
An application analysis unit, configured to perform application analysis on the interval-format symbolic data table of the indicator variables according to application requirements, to obtain the relationships among the indicator variables and the characteristic state of the data samples of each indicator variable.
In the symbolic data analysis method and system provided by the above embodiments of the present invention, a DB trigger is set in a business system that stores data in row storage mode to monitor the database tables of the business system. When a data update occurs in a database table of the business system, the DB trigger records data update information in a log table. The row-column storage conversion unit reads the newly added data update information from the log table in real time and, if it indicates that a preset database table has been updated, synchronizes the updated data to the corresponding position in an in-memory database that stores database table data in column storage mode. The data symbol analysis unit then uses the symbolic data analysis method to analyze the data of the preset database table in the in-memory database and generates an interval-format symbolic data table of the indicator variables. The symbolic data analysis method is thus implemented with column-storage in-memory computing, which achieves efficient real-time analysis of massive data. Optionally, before the data in the in-memory database is analyzed with the symbolic data analysis method, sampling-analysis preprocessing can be performed on the column-stored data of the preset database table, so that data deviating from the preset normal range of business values is deleted before the symbolic data analysis. This avoids the data distortion that an unreasonable choice of the data sample space would otherwise cause when the data samples are converted into an interval-format symbolic data table.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Description of the drawings
The drawings, which form a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flowchart of one embodiment of the symbolic data analysis method of the present invention.
Fig. 2 is a flowchart of another embodiment of the symbolic data analysis method of the present invention.
Fig. 3 is a structural schematic diagram of one embodiment of the symbolic data analysis system of the present invention.
Detailed description of the embodiments
Various exemplary embodiments of the present invention are now described in detail with reference to the drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use.
Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
At present, reading data from memory takes on the order of nanoseconds, which makes memory-based data reads about one million times faster than disk-based reads. In-memory computing refers to storing all data in an in-memory database in an optimized column storage mode, using an efficient data compression scheme, and exploiting the capability of multi-core CPUs to process the data in parallel. Thus, when report analysis is performed on a data warehouse, if reading a large volume of data from a traditional database takes tens of minutes, reading the same data from an in-memory database takes less than one second. With data volumes exploding, in-memory databases and in-memory computing engines give users efficient and fast data processing and analysis capability.
Because an in-memory database reads data quickly and processes it efficiently, the in-memory computing engine allows operations that were originally performed in the application layer to be moved to the database layer, so that data-intensive computation is carried out in the database layer. Given these characteristics, in-memory computing can perform real-time analysis on massive data without prior modeling or data preprocessing. For example, to analyze data along an arbitrary dimension, a model can be built in real time and the analysis completed; hundreds of millions of records may be processed in only a few seconds. Because processing is so fast, arbitrary data models can be tried quickly and a variety of future scenarios simulated.
As shown in Table 1 below, a traditional database table in a business system is a two-dimensional table made up of columns and rows, in which the user records data. Table 2 below shows how the data of the database table in Table 1 is organized in row storage mode. Table 3 below shows how, in the embodiments of the present invention, the data of the database table in Table 1 is organized in column storage mode.
Table 1: Database table
Name | Length of service | Income |
Zhang San | 4 | 20000 |
Li Ming | 5 | 37000 |
Liu Li | 8 | 52000 |
Table 2: Row storage
Table 3: Column storage
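The difference between the two organizations can be sketched for the data of Table 1 as follows. The flat-list representation and the names `to_row_layout` and `to_column_layout` are assumptions made for illustration; they are not the storage format defined by the patent.

```python
# The three records of Table 1: (name, length of service, income).
TABLE_1 = [
    ("Zhang San", 4, 20000),
    ("Li Ming", 5, 37000),
    ("Liu Li", 8, 52000),
]


def to_row_layout(records):
    """Row storage: the fields of one record are stored next to each other."""
    return [value for record in records for value in record]


def to_column_layout(records):
    """Column storage: the values of one column are stored next to each other."""
    return [value for column in zip(*records) for value in column]


row_layout = to_row_layout(TABLE_1)
column_layout = to_column_layout(TABLE_1)
print(row_layout)
print(column_layout)
```

In the row layout the three incomes are scattered across the sequence, whereas in the column layout they form one contiguous run at the end, which is why reading a single indicator variable is cheaper under column storage.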
In practical data analysis, it is usually necessary to compute over the attribute values of one indicator variable (for example name, length of service, or income in Tables 1, 2, and 3 above), that is, over one column of attributes of a database table. For this, column storage is more efficient than row storage because unrelated attributes need not be read. Column storage therefore has two significant advantages: 1. It improves the throughput of queries on attribute column values and reduces I/O operations. Because the database table data is stored by column, the required data column can be located quickly and unrelated columns are not read, which reduces useless disk reads and writes. The more data columns the database table has, the more pronounced the gain. 2. Column storage favors data compression. Compared with row storage, column storage is better suited to compression because the values in one data column share the same data type and are highly similar. In row storage, attributes are stored contiguously record by record, and the attributes within one record have different data types, so it is difficult to apply one compression algorithm across them.
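Why a column of similar values compresses well can be illustrated with run-length encoding. The text does not name a compression algorithm, so run-length encoding is chosen here purely as an example, and the `place` column and its values are hypothetical.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column holding the place of sale for six sales records.
place_column = ["Beijing", "Beijing", "Beijing", "Shanghai", "Shanghai", "Beijing"]
encoded = run_length_encode(place_column)
print(encoded)
```

A row-stored sequence interleaves names, numbers, and places, so runs of equal values rarely occur and the same encoding would save little.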
For example, when a customer buys a product, a retailer needs to create a data record in the business system containing indicator variable fields such as the time and place of the sale, the customer, the amount, and the address. After the data is entered at the front end and submitted to the back-end system, one row is inserted into a data table in the database, and this record contains the data related to the sale. A row-storage database, however, is inefficient and inadequate when supporting data analysis applications. In the same example, suppose the retailer has saved 300 million records in a traditional database in row storage mode and needs to analyze the average amount of a single sale from these sales records. All 300 million records must first be read, the amount field extracted from each, and the average then computed. The data actually analyzed (the amount field) accounts for only 5% of the total data (assuming 20 fields per record), which is clearly very inefficient. Under a column-storage mechanism, the 300 million records are stored by column, that is, as only 20 records in total (20 fields, one record per field). For the same analysis, only the record of the amount indicator variable column needs to be read and averaged. Compared with the row-storage mechanism, data processing efficiency in this example scenario improves 50 times.
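The share of data touched in this example can be checked with a small sketch that counts how many field values each layout reads to compute the average. It uses 1,000 synthetic records instead of 300 million; the function names, the field names `field_0` to `field_19`, and the choice of field 3 as the amount field are hypothetical.

```python
NUM_FIELDS = 20
AMOUNT_INDEX = 3  # hypothetical position of the amount field


def average_from_rows(rows, field_index):
    """Row storage: every record is read in full to reach one field."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)
        total += row[field_index]
    return total / len(rows), values_read


def average_from_columns(columns, field_name):
    """Column storage: only the requested column is read."""
    column = columns[field_name]
    return sum(column) / len(column), len(column)


# Synthetic data: 1,000 records with 20 numeric fields each.
rows = [[(i + f) % 100 for f in range(NUM_FIELDS)] for i in range(1000)]
columns = {f"field_{f}": [row[f] for row in rows] for f in range(NUM_FIELDS)}

row_avg, row_read = average_from_rows(rows, AMOUNT_INDEX)
col_avg, col_read = average_from_columns(columns, f"field_{AMOUNT_INDEX}")
print(row_avg, col_avg, col_read / row_read)
```

Both layouts give the same average, but the column layout reads one field value per record instead of twenty, which corresponds to the 5% share stated above.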
Fig. 1 is a flowchart of one embodiment of the symbolic data analysis method of the present invention. As shown in Fig. 1, the symbolic data analysis method of this embodiment includes:
110, a database (DB) trigger of the business system monitors the database tables of the business system.
The business system may be, for example, an Enterprise Resource Planning (ERP) business system, and the database of the business system stores the data of the database tables in row storage mode.
120, in response to detecting that a data update has occurred in a database table of the business system, the DB trigger records, in a log table, data update information describing the update.
A data update of a database table includes an event that adds, modifies, or deletes data in a database table of the business system. The data update information includes the identifier (ID) of the database table in which the update occurred and the location information of the update in that table. A database table ID may be the name or number of the database table, or any other identifier that uniquely identifies one database table in the business system.
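The trigger-and-log mechanism of operation 120 can be sketched with SQLite, which serves only as a stand-in for the business system database. The table names `sales` and `change_log`, the column names, and the use of the primary key as the location information are assumptions made for illustration and are not specified in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);

-- Log table: which table changed, where, and what kind of event it was.
CREATE TABLE change_log (
    log_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    table_id TEXT NOT NULL,
    row_id   INTEGER NOT NULL,
    event    TEXT NOT NULL
);

CREATE TRIGGER sales_after_insert AFTER INSERT ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, event)
    VALUES ('sales', NEW.id, 'INSERT');
END;

CREATE TRIGGER sales_after_update AFTER UPDATE ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, event)
    VALUES ('sales', NEW.id, 'UPDATE');
END;

CREATE TRIGGER sales_after_delete AFTER DELETE ON sales BEGIN
    INSERT INTO change_log (table_id, row_id, event)
    VALUES ('sales', OLD.id, 'DELETE');
END;
""")

# One add, one modify, and one delete event on the monitored table.
conn.execute("INSERT INTO sales (customer, amount) VALUES ('A', 10.0)")
conn.execute("UPDATE sales SET amount = 12.5 WHERE id = 1")
conn.execute("DELETE FROM sales WHERE id = 1")

log = conn.execute(
    "SELECT table_id, row_id, event FROM change_log ORDER BY log_id"
).fetchall()
print(log)
```

The log stores only the table ID and the position of the change, not the changed values, so a downstream reader still has to fetch the updated data from the source table.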
130, the row-column storage conversion unit reads newly added data update information from the log table in real time.
140, if the newly added data update information indicates that a preset database table has been updated, the row-column storage conversion unit reads the updated data from the corresponding position in the database of the business system according to the location information in the data update information of the updated preset database table, and synchronizes the updated data to the corresponding position in the in-memory database that stores database table data in column storage mode.
A preset database table is a database table designated in advance for real-time synchronization to the in-memory database. Specifically, a list of database tables to be synchronized can be created through the row-column storage conversion unit, and the IDs of the database tables that need to be synchronized to the in-memory database in real time are recorded in this list. The database table IDs recorded in the list can be updated as needed, for example by adding or deleting a database table ID. It is also possible to specify, as needed, whether a given database table ID is to be synchronized to the in-memory database in real time only during a certain time period or permanently.
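Operation 140 can be sketched as a function that filters log entries against the synchronization list and copies the affected rows into a column-oriented structure. Everything named here is hypothetical: `sync_updates`, the shape of the log entries, and the representation of the in-memory database as nested dictionaries keyed by table, column, and row ID stand in for whatever the actual system uses.

```python
def sync_updates(log_entries, source_db, sync_list, column_store):
    """Apply logged updates of preset tables to a column-oriented store.

    log_entries:  iterable of (table_id, row_id, event) tuples
    source_db:    {table_id: {row_id: {column: value}}}  (row-oriented)
    sync_list:    set of table IDs designated for real-time synchronization
    column_store: {table_id: {column: {row_id: value}}}  (column-oriented)
    Returns the number of log entries that were applied.
    """
    applied = 0
    for table_id, row_id, event in log_entries:
        if table_id not in sync_list:
            continue  # not a preset table: nothing to synchronize
        columns = column_store.setdefault(table_id, {})
        if event == "DELETE":
            for values in columns.values():
                values.pop(row_id, None)
        else:
            # Read the updated row at the logged position, then place each
            # field at the same row position inside its own column.
            record = source_db[table_id][row_id]
            for name, value in record.items():
                columns.setdefault(name, {})[row_id] = value
        applied += 1
    return applied


source_db = {
    "sales": {1: {"customer": "A", "amount": 10.0},
              2: {"customer": "B", "amount": 25.0}},
    "audit": {1: {"note": "x"}},
}
sync_list = {"sales"}
column_store = {}

first = sync_updates(
    [("sales", 1, "INSERT"), ("audit", 1, "INSERT"), ("sales", 2, "INSERT")],
    source_db, sync_list, column_store)

del source_db["sales"][2]  # the row is deleted in the business system
second = sync_updates([("sales", 2, "DELETE")],
                      source_db, sync_list, column_store)
print(column_store)
```

Adding or removing an entry in `sync_list` corresponds to updating the list of database tables to be synchronized.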
150, the data symbol analysis unit uses a symbolic data analysis method to analyze the data of the updated preset database table in the in-memory database, and generates an interval-format symbolic data table of the indicator variables in the updated preset database table.
Analyzing the data of the updated preset database table in the in-memory database with the symbolic data analysis method applies the idea of "data packing": a higher-level data stratum is built within the original multidimensional data sample space, that is, the upper and lower bounds of each indicator variable dimension in the data sample space are determined and an interval-format symbolic data table is generated. This reduces the dimension of the data and greatly simplifies computation over large sample sets, and it avoids the earlier situation in which the result of reducing the dimension of the sample space was often hard to interpret physically. It resolves the effect that the dimensions of the data sample space and of the variable space have on analyzing the overall characteristics of the data point set, so that data analysis can observe the characteristic state of the data samples in real time, more efficiently and more accurately.
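The "data packing" step of operation 150 can be sketched as follows: individual samples are packed into higher-level groups, and each indicator variable of a group is described by its lower and upper bound. The grouping column `department`, the sample values, and the function name `to_interval_table` are assumptions for illustration; the text does not specify how samples are assigned to groups.

```python
def to_interval_table(columns, group_column, variables):
    """Build an interval-format table: group -> variable -> (low, high).

    `columns` is a column-stored table mapping each column name to a list
    of values; row i of the table is made up of element i of every list.
    """
    table = {}
    for i, group in enumerate(columns[group_column]):
        entry = table.setdefault(group, {})
        for var in variables:
            value = columns[var][i]
            low, high = entry.get(var, (value, value))
            entry[var] = (min(low, value), max(high, value))
    return table


# Hypothetical column-stored samples.
staff = {
    "department": ["sales", "sales", "rd", "rd", "rd"],
    "years": [4, 5, 8, 2, 6],
    "income": [20000, 37000, 52000, 18000, 41000],
}
intervals = to_interval_table(staff, "department", ["years", "income"])
print(intervals)
```

Five samples are reduced to two interval-valued rows, and each interval still has a direct meaning (the range of the variable within the group), which is the interpretability the text contrasts with conventional dimension reduction.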
In the symbolic data analysis method provided by the above embodiment of the present invention, a DB trigger is set in a business system that stores data in row storage mode, and monitors the database tables of the business system. When a data update occurs in a database table of the business system, the DB trigger records data update information in a log record table. The row-column storage conversion unit reads the newly added data update information from the log record table in real time; if this information indicates that a data update has occurred in a preset database table, the updated data is synchronized to the corresponding position in an in-memory database that stores database table data in column storage mode. The data symbol analysis unit then uses the symbolic data analysis method to analyze the data of the preset database table in the in-memory database and generates a symbolic data table in interval form for each indicator variable. The symbolic data analysis method is thus implemented with column-store in-memory computing technology, achieving efficient real-time analysis of massive data.
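The trigger-and-log mechanism can be sketched with SQLite from the Python standard library. This is only an assumed stand-in for the business system's database: the table names `orders` and `update_log`, and the use of the row ID as position information, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);

    -- Log record table: which table changed, where, and how.
    CREATE TABLE update_log (
        seq      INTEGER PRIMARY KEY AUTOINCREMENT,
        table_id TEXT NOT NULL,
        row_id   INTEGER NOT NULL,
        op       TEXT NOT NULL
    );

    CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
        INSERT INTO update_log (table_id, row_id, op)
        VALUES ('orders', NEW.id, 'INSERT');
    END;
    CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
        INSERT INTO update_log (table_id, row_id, op)
        VALUES ('orders', NEW.id, 'UPDATE');
    END;
    CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
        INSERT INTO update_log (table_id, row_id, op)
        VALUES ('orders', OLD.id, 'DELETE');
    END;
    """
)

conn.execute("INSERT INTO orders (id, amount) VALUES (1, 10.0)")
conn.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

log = conn.execute(
    "SELECT table_id, row_id, op FROM update_log ORDER BY seq"
).fetchall()
```

Each add, modify or delete event leaves one row in `update_log`, so a reader that remembers the last `seq` it processed can pick up only the newly added entries.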
Fig. 2 is a flow chart of another embodiment of the symbolic data analysis method of the present invention. In this embodiment, compared with the embodiment shown in Fig. 1, operation 130 may be implemented as follows:
230, the control module in the row-column storage conversion unit calls, in real time, the reading unit in the business system to read the newly added data update information in the log record table.
Correspondingly, operation 140 may be implemented as follows:
240, the control module determines, according to the newly added data update information in the log record table, whether a data update has occurred in any preset database table. Specifically, it may check whether the newly added data update information in the log record table contains the ID of a preset database table.
If a data update has occurred in a preset database table, operation 250 is executed. Otherwise, if no preset database table has been updated, the subsequent flow of this embodiment is not executed.
250, the control module calls the reading unit to read the updated data from the corresponding position in the database of the business system, according to the position information in the data update information of the preset database table in which the data update occurred.
For example, the control module may call the reading unit through a remote function call (RFC) connection to read the data update information and the updated data.
260, the control module passes the updated data it has read to the data writing module in the row-column storage conversion unit and, according to a preset row-column conversion position correspondence rule, instructs the data writing module to synchronously write the updated data into the in-memory database that stores database table data in column storage mode.
Here, the row-column conversion position correspondence rule may be the correspondence, or a rule describing the positional relationship, between row-storage positions and column-storage positions when row storage is converted to column storage. It may also be a rule specifying where each value (called a data sample) of an indicator variable in a business system database table is stored in the corresponding database table of the in-memory database, or any other rule set as the situation requires. In short, according to the row-column conversion position correspondence rule, the row-column storage conversion unit knows to which specific position in the in-memory database the data at a given position in the business system database should be written.
270, the data writing module synchronizes the updated data to the corresponding position in the in-memory database through a write operation.
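Operations 230 to 270 can be sketched in Python as follows. The sketch makes several assumptions that are not in the patent: the row store and column store are plain dictionaries, the position correspondence rule is a mapping from (table ID, row ID) to an index in each column list, every row of a table has the same columns, and delete events are not handled. The name `sync_updates` is hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

# Row-store source: table_id -> row_id -> {column: value} (stand-in for the business DB).
RowStore = Dict[str, Dict[int, Dict[str, Any]]]
# Column-store target: table_id -> column -> list of values.
ColumnStore = Dict[str, Dict[str, List[Any]]]


def sync_updates(
    log_entries: List[Tuple[str, int]],        # (table_id, row_id) read from the log table
    preset_tables: Set[str],
    row_store: RowStore,
    column_store: ColumnStore,
    row_position: Dict[Tuple[str, int], int],  # assumed position correspondence rule
) -> int:
    """Copy updated rows of preset tables into the column store; return rows synced."""
    synced = 0
    for table_id, row_id in log_entries:
        if table_id not in preset_tables:       # 240: ignore tables that are not preset
            continue
        row = row_store[table_id][row_id]       # 250: read the updated data
        columns = column_store.setdefault(table_id, {})
        key = (table_id, row_id)
        pos = row_position.get(key)             # 260: apply the position rule
        if pos is None:
            # New row: it takes the next free index in every column.
            pos = max((len(c) for c in columns.values()), default=0)
            row_position[key] = pos
            for name, value in row.items():
                columns.setdefault(name, []).append(value)   # 270: write
        else:
            # Existing row: overwrite the value at its known index.
            for name, value in row.items():
                columns[name][pos] = value                   # 270: write
        synced += 1
    return synced


row_store = {
    "orders": {1: {"amount": 10.0, "qty": 2}, 2: {"amount": 5.0, "qty": 1}},
    "audit": {1: {"msg": "x"}},
}
column_store: ColumnStore = {}
positions: Dict[Tuple[str, int], int] = {}
sync_updates([("orders", 1), ("audit", 1), ("orders", 2)], {"orders"},
             row_store, column_store, positions)
```

After the call, `column_store["orders"]` holds one list per column, which is the layout the later per-column analysis operates on.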
As a specific, non-limiting example of an embodiment of the symbolic data analysis method of the present invention, when the data symbol analysis unit analyzes the data of the updated preset database table in the in-memory database, it may exploit the parallel capability of a multi-core CPU and apply the symbolic data analysis method to each column of the updated preset database table in parallel. For example, the data of each indicator variable may be analyzed in parallel. Data stored in column storage mode is thereby processed in parallel, which further increases the analysis speed several-fold.
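A minimal sketch of per-column parallel analysis, assuming each column's analysis is the min/max interval computation and using the standard-library `concurrent.futures` module. The function names are hypothetical. The sketch uses a thread pool for simplicity; in CPython, pure-Python work of this kind would need `ProcessPoolExecutor` (same `map` interface) to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def column_interval(values: List[float]) -> Tuple[float, float]:
    """Analysis of a single column: its lower and upper bound."""
    return (min(values), max(values))


def analyze_columns_parallel(
    columns: Dict[str, List[float]], workers: int = 4
) -> Dict[str, Tuple[float, float]]:
    """Submit one analysis task per indicator variable (column)."""
    names = list(columns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = list(pool.map(column_interval, (columns[n] for n in names)))
    return dict(zip(names, results))


table = {
    "temperature": [21.5, 19.0, 23.2, 20.1],
    "load": [0.42, 0.77, 0.31],
    "pressure": [101.2, 100.8, 101.9],
}
intervals = analyze_columns_parallel(table)
```

Because a column store keeps each indicator variable contiguous and independent of the others, the columns can be handed to workers without any coordination between tasks.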
In addition, another embodiment of the symbolic data analysis method of the present invention addresses a problem that often arises in prior-art symbolic data analysis: an unreasonably chosen data sample space distorts the data when the data samples are converted to symbolic interval data. In this embodiment, before the symbolic data analysis method is applied to the data of the updated preset database table in the in-memory database, operation 280 may be performed: a data sample preprocessing unit performs sampling-analysis preprocessing on the data of the preset database table held in column storage, using a data smoothing technique to identify and delete data that deviates from the preset normal range of business values. This determines a reasonable data sample space for the analysis and prevents the distortion that an unreasonable choice of sample space would cause when the data samples are converted to symbolic data in interval form. Operation 150 in Fig. 1 is then implemented by the following operation 290: the data symbol analysis unit uses the symbolic data analysis method to analyze the sampling-analysis-preprocessed data of the updated preset database table in the in-memory database.
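The effect of this preprocessing on the resulting interval can be illustrated with a short Python sketch. The patent does not specify the smoothing technique; a moving median is assumed here, and the `window` and `tolerance` parameters stand in for the preset normal range of business values. The function name `drop_outliers` and the sample values are hypothetical.

```python
from statistics import median
from typing import List


def drop_outliers(
    values: List[float], window: int = 5, tolerance: float = 10.0
) -> List[float]:
    """Remove samples that deviate from a moving-median smoothed value
    by more than `tolerance` (assumed stand-in for the normal range)."""
    half = window // 2
    kept = []
    for i, v in enumerate(values):
        neighbourhood = values[max(0, i - half): i + half + 1]
        if abs(v - median(neighbourhood)) <= tolerance:
            kept.append(v)
    return kept


raw = [20.0, 21.0, 19.5, 250.0, 20.5, 21.5, 20.0]
clean = drop_outliers(raw)

raw_interval = (min(raw), max(raw))        # stretched by the single abnormal sample
clean_interval = (min(clean), max(clean))  # reflects the normal samples only
```

Without preprocessing, the one abnormal reading widens the interval from (19.5, 21.5) to (19.5, 250.0), which is the kind of distortion the embodiment aims to prevent.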
In addition, in another embodiment of the symbolic data analysis method of the present invention, after the symbolic data table in interval form has been generated for each indicator variable in the updated preset database table, an application analysis unit may further perform various application analyses on the symbolic data table of each indicator variable according to application requirements, such as symbolic data factor analysis and symbolic data canonical correlation analysis, to obtain the relationships and underlying meaning among the indicator variables and the characteristic state of the data samples of each indicator variable.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk or an optical disc.
Fig. 3 is a structural schematic diagram of one embodiment of the symbolic data analysis system of the present invention. The symbolic data analysis system of this embodiment can be used to implement the symbolic data analysis method of the above embodiments of the present invention. As shown in Fig. 3, it comprises a DB trigger 310, a first storage unit 320, a row-column storage conversion unit 330, a second storage unit 340, an in-memory database 350 and a data symbol analysis unit 360. Wherein:
The DB trigger 310, which may optionally be arranged in the business system, is used to monitor the database tables of the business system, the database of the business system storing the data of the database tables in row storage mode; and, in response to detecting that a data update occurs in a database table of the business system, to record in the log record table data update information indicating the data update that occurred in the database table. Here, a data update of a database table includes an event of adding, modifying or deleting data in the database table; the data update information includes the ID of the database table in which the data update occurred and the position information of the data update.
The first storage unit 320 is used to store the log record table.
The row-column storage conversion unit 330 is used to read the newly added data update information in the log record table in real time; and, if the newly added data update information indicates that a data update has occurred in a preset database table, to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of that preset database table, and to synchronize the updated data to the corresponding position in the in-memory database 350, which stores the data of database tables in column storage mode. Here, the preset database table is a database table designated in advance as needing real-time synchronization to the in-memory database 350.
Optionally, the row-column storage conversion unit 330 may also be used to preset, according to a user operation, a list of database tables to be synchronized, and further to update this list according to a user operation. The list records the database tables in the business system that need to be synchronized to the in-memory database in real time.
The second storage unit 340, which may optionally be arranged in the row-column storage conversion unit 330, is used to store the list of database tables to be synchronized, which records information on the preset database tables that need to be synchronized to the in-memory database 350 in real time.
The in-memory database 350 is used to store the data of database tables in column storage mode.
The data symbol analysis unit 360 is used to analyze, with the symbolic data analysis method, the data of the preset database table whose data has been updated in the in-memory database 350, and to generate a symbolic data table in interval form for each indicator variable in that preset database table.
Referring again to Fig. 3, in another embodiment of the symbolic data analysis system of the present invention, the system may further comprise a reading unit 370, which may optionally be arranged in the business system and is used to read the newly added data update information in the log record table and to read the updated data from the database of the business system. Correspondingly, the row-column storage conversion unit 330 may specifically comprise a control module and a data writing module. Wherein:
The control module is used to call the reading unit 370 in real time to read the newly added data update information in the log record table; to determine, according to that information, whether a data update has occurred in any preset database table; if so, to call the reading unit 370 to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table in which the data update occurred; and to pass the updated data it has read to the data writing module in the row-column storage conversion unit 330 and, according to the preset row-column conversion position correspondence rule, instruct the data writing module to synchronously write the updated data into the in-memory database that stores database table data in column storage mode. Specifically, the control module may call the reading unit 370 through an RFC connection to read the data update information and the updated data.
The data writing module is used to synchronize the updated data to the corresponding position in the in-memory database through a write operation.
As a specific, non-limiting example of an embodiment of the symbolic data analysis system of the present invention, the data symbol analysis unit 360 may exploit the parallel capability of a multi-core CPU and apply the symbolic data analysis method to each column of the updated preset database table in the in-memory database in parallel.
Referring again to Fig. 3, another embodiment of the symbolic data analysis system of the present invention may further comprise a data sample preprocessing unit 380, used to perform sampling-analysis preprocessing on the data of the preset database table held in column storage, using a data smoothing technique to identify and delete data that deviates from the preset normal range of business values. Correspondingly, when the data symbol analysis unit 360 analyzes the data of the updated preset database table in the in-memory database 350, it specifically analyzes the sampling-analysis-preprocessed data of that preset database table.
Further, referring again to Fig. 3, yet another embodiment of the symbolic data analysis system of the present invention may also comprise an application analysis unit 390, used to perform application analysis on the symbolic data table in interval form of each indicator variable according to application requirements, to obtain the relationships among the indicator variables and the characteristic state of the data samples of each indicator variable.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the system embodiments substantially correspond to the method embodiments, their description is relatively brief; for related details, refer to the corresponding description of the method embodiments.
The method and system of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only; the steps of the method of the present invention are not limited to the order described above unless otherwise specifically stated. In addition, in some embodiments the present invention may also be embodied as programs recorded on a recording medium, these programs comprising machine-readable instructions for implementing the method according to the present invention. The present invention thus also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention is given for the purposes of illustration and description; it is not exhaustive and does not limit the invention to the form disclosed. Many modifications and variations will be apparent to a person of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical application of the invention, and to enable a person of ordinary skill in the art to understand the invention and design various embodiments, with various modifications, suited to particular uses.
Claims (12)
1. A symbolic data analysis method, characterized by comprising:
a database (DB) trigger of a business system monitoring database tables of the business system, the database of the business system storing the data of the database tables in row storage mode;
in response to detecting that a data update occurs in a database table of the business system, the DB trigger recording, in a log record table, data update information indicating the data update that occurred in the database table, wherein a data update of the database table comprises an event of adding, modifying or deleting data in the database table, and the data update information comprises the identifier (ID) of the database table in which the data update occurred and position information of the data update;
a row-column storage conversion unit reading newly added data update information in the log record table in real time;
if the newly added data update information indicates that a data update has occurred in a preset database table, the row-column storage conversion unit reading the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and synchronizing the updated data to the corresponding position in an in-memory database that stores the data of database tables in column storage mode, the preset database table being a database table preset as needing real-time synchronization to the in-memory database;
a data sample preprocessing unit performing sampling-analysis preprocessing on the data of the preset database table held in column storage, using a data smoothing technique to identify and delete data that deviates from a preset normal range of business values;
a data symbol analysis unit using the symbolic data analysis method to analyze the sampling-analysis-preprocessed data of the preset database table whose data has been updated in the in-memory database, and generating a symbolic data table in interval form for each indicator variable in the preset database table whose data has been updated.
2. The method according to claim 1, characterized by further comprising:
presetting, by means of the row-column storage conversion unit, the database tables in the business system that need to be synchronized to the in-memory database in real time, or further updating the database tables that need to be synchronized to the in-memory database in real time.
3. The method according to claim 2, characterized in that the row-column storage conversion unit reading newly added data update information in the log record table in real time comprises:
a control module in the row-column storage conversion unit calling, in real time, a reading unit in the business system to read the newly added data update information in the log record table;
and in that, if the newly added data update information indicates that a data update has occurred in a preset database table, the row-column storage conversion unit reading the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and synchronizing the updated data to the corresponding position in the in-memory database that stores the data of database tables in column storage mode, comprises:
the control module determining, according to the newly added data update information in the log record table, whether a data update has occurred in any preset database table;
if a data update has occurred in a preset database table, the control module calling the reading unit to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table;
the control module passing the updated data it has read to a data writing module in the row-column storage conversion unit and, according to a preset row-column conversion position correspondence rule, instructing the data writing module to synchronously write the updated data into the in-memory database that stores the data of database tables in column storage mode;
the data writing module synchronizing the updated data to the corresponding position in the in-memory database through a write operation.
4. The method according to claim 3, characterized in that the control module calls the reading unit through a remote function call (RFC) connection to read the data update information and the updated data.
5. The method according to claim 3, characterized in that using the symbolic data analysis method to analyze the data of the preset database table whose data has been updated in the in-memory database comprises:
using the parallel capability of a multi-core central processing unit (CPU) to apply the symbolic data analysis method in parallel to each column of the preset database table whose data has been updated in the in-memory database.
6. The method according to claim 1, characterized by further comprising:
an application analysis unit performing application analysis on the symbolic data table in interval form of each indicator variable according to application requirements, to obtain the relationships among the indicator variables and the characteristic state of the data samples of each indicator variable.
7. A symbolic data analysis system, characterized by comprising:
a database (DB) trigger, used to monitor database tables of a business system, the database of the business system storing the data of the database tables in row storage mode; and, in response to detecting that a data update occurs in a database table of the business system, to record in a log record table data update information indicating the data update that occurred in the database table, wherein a data update of the database table comprises an event of adding, modifying or deleting data in the database table, and the data update information comprises the identifier (ID) of the database table in which the data update occurred and position information of the data update;
a first storage unit, used to store the log record table;
a row-column storage conversion unit, used to read newly added data update information in the log record table in real time; and, if the newly added data update information indicates that a data update has occurred in a preset database table, to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table, and to synchronize the updated data to the corresponding position in an in-memory database that stores the data of database tables in column storage mode, the preset database table being a database table preset as needing real-time synchronization to the in-memory database;
a second storage unit, used to store a list of database tables to be synchronized, the list recording information on the preset database tables that need to be synchronized to the in-memory database in real time;
the in-memory database, used to store the data of database tables in column storage mode;
a data sample preprocessing unit, used to perform sampling-analysis preprocessing on the data of the preset database table held in column storage, using a data smoothing technique to identify and delete data that deviates from a preset normal range of business values;
a data symbol analysis unit, used to analyze, with the symbolic data analysis method, the sampling-analysis-preprocessed data of the preset database table whose data has been updated in the in-memory database, and to generate a symbolic data table in interval form for each indicator variable in the preset database table whose data has been updated.
8. The system according to claim 7, characterized in that the row-column storage conversion unit is further used to preset, according to a user operation, the database tables in the business system that need to be synchronized to the in-memory database in real time, or further to update the database tables that need to be synchronized to the in-memory database in real time.
9. The system according to claim 8, characterized by further comprising:
a reading unit, used to read the newly added data update information in the log record table and to read the updated data from the database of the business system;
the row-column storage conversion unit comprising a control module and a data writing module;
the control module being used to call the reading unit in real time to read the newly added data update information in the log record table; to determine, according to the newly added data update information in the log record table, whether a data update has occurred in any preset database table; if a data update has occurred in a preset database table, to call the reading unit to read the updated data from the corresponding position in the database of the business system according to the position information in the data update information of the preset database table; and to pass the updated data it has read to the data writing module in the row-column storage conversion unit and, according to a preset row-column conversion position correspondence rule, instruct the data writing module to synchronously write the updated data into the in-memory database that stores the data of database tables in column storage mode;
the data writing module being used to synchronize the updated data to the corresponding position in the in-memory database through a write operation.
10. The system according to claim 9, characterized in that the control module specifically calls the reading unit through a remote function call (RFC) connection to read the data update information and the updated data.
11. The system according to claim 9, characterized in that the data symbol analysis unit specifically uses the parallel capability of a multi-core central processing unit (CPU) to apply the symbolic data analysis method in parallel to each column of the preset database table whose data has been updated in the in-memory database.
12. The system according to claim 7, characterized by further comprising:
an application analysis unit, used to perform application analysis on the symbolic data table in interval form of each indicator variable according to application requirements, to obtain the relationships among the indicator variables and the characteristic state of the data samples of each indicator variable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410184644.0A CN105095247B (en) | 2014-05-05 | 2014-05-05 | symbol data analysis method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410184644.0A CN105095247B (en) | 2014-05-05 | 2014-05-05 | symbol data analysis method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105095247A | 2015-11-25 |
CN105095247B | 2018-07-17 |
Family
ID=54575705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410184644.0A Active CN105095247B (en) | 2014-05-05 | 2014-05-05 | symbol data analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105095247B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10671594B2 (en) | 2014-09-17 | 2020-06-02 | Futurewei Technologies, Inc. | Statement based migration for adaptively building and updating a column store database from a row store database based on query demands using disparate database systems |
US9836507B2 (en) * | 2014-09-17 | 2017-12-05 | Futurewei Technologies, Inc. | Method and system for adaptively building a column store database from a temporal row store database based on query demands |
CN105787129B (en) * | 2016-03-29 | 2020-06-23 | 联想(北京)有限公司 | Data storage method and electronic equipment |
CN106570314A (en) * | 2016-10-19 | 2017-04-19 | 北京千医健康管理有限公司 | ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard |
CN108108411A (en) * | 2017-12-12 | 2018-06-01 | 苏州蜗牛数字科技股份有限公司 | A kind of reading system and method for information list file |
CN111159176A (en) * | 2019-11-29 | 2020-05-15 | 中国科学院计算技术研究所 | Method and system for storing and reading mass stream data |
CN113515569B (en) * | 2020-04-09 | 2023-12-26 | 阿里巴巴集团控股有限公司 | Data synchronization method, device and system |
CN113064919B (en) * | 2021-03-31 | 2022-11-22 | 北京达佳互联信息技术有限公司 | Data processing method, data storage system, computer device and storage medium |
CN113901069B (en) * | 2021-12-08 | 2022-03-15 | 威讯柏睿数据科技(北京)有限公司 | Data storage method and device of distributed database |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663116A (en) * | 2012-04-11 | 2012-09-12 | 中国人民大学 | Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse |
CN102880709A (en) * | 2012-09-28 | 2013-01-16 | 用友软件股份有限公司 | Data warehouse management system and data warehouse management method |
CN103218415A (en) * | 2013-03-27 | 2013-07-24 | 互爱互动(北京)科技有限公司 | Data processing system and method based on data warehouse |
CN103678665A (en) * | 2013-12-24 | 2014-03-26 | 焦点科技股份有限公司 | Heterogeneous large data integration method and system based on data warehouses |
CN103744906A (en) * | 2013-12-26 | 2014-04-23 | 乐视网信息技术(北京)股份有限公司 | System, method and device for data synchronization |
Non-Patent Citations (1)
Title |
---|
《一种海量数据的分析技术——符号数据分析及应用》;胡艳等;《北京航空航天大学学报(社会科学版)》;20040625;第17卷(第2期);第40-44页 * |
Also Published As
Publication number | Publication date |
---|---|
CN105095247A (en) | 2015-11-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20151125; Assignee: Tianyiyun Technology Co.,Ltd.; Assignor: CHINA TELECOM Corp.,Ltd.; Contract record no.: X2024110000020; Denomination of invention: Symbolic Data Analysis Methods and Systems; Granted publication date: 20180717; License type: Common License; Record date: 20240315 |