CN106844478A - Method and device for evaluating data information
- Publication number: CN106844478A (application CN201611207105.XA)
- Authority: CN (China)
- Prior art keywords: data, log, information, template, multiple logical
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/217—Database tuning
Abstract
The present invention provides a method and device for evaluating data information. The method involves multiple template tables, where each template table corresponds to the class table of one category attribute and each class of table includes multiple logical tables with different information. The method includes: collecting logs of data; parsing the logs to obtain keywords of the data; according to the keywords, finding the template table whose category attribute is the same as that of the data and, from the class table corresponding to that template table, finding the multiple logical tables whose information is the same as that of the data; and evaluating the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is in use and analyzes how each piece of data in the matching logical tables is used, so that information about the currently collected data, such as its storage period and its redundant fields, can be evaluated, and the most reasonable storage period, redundant fields and similar information can be assessed scientifically.
Description
Technical field
The present invention relates to the technical field of minimum data timeliness, and more specifically to a method and device for evaluating data information, so as to obtain parameters such as the minimum timeliness of data and field usage information.
Background art
In the current field of minimum data timeliness, the existing way of evaluating data information is to rely on manual business experience or on decisions by senior management to set up a parameter configuration scheme for the data held in the various databases of a data center or platform (DB2, Oracle, MPP, TERADATA, HADOOP, etc.). The parameter configuration includes, for example, the cleanup date and the storage period of the data. The operations staff of the data center are then reminded periodically to manage the data according to this configuration.
A data center stores a large amount of data, and each piece of data is stored in the form of a table. How long each piece of data should be stored, and how redundant data should be optimized, are therefore defined by subjective human experience or decisions. Such manual definition clearly cannot scientifically assess the most reasonable storage period of the data, the redundant fields in the data, and similar information.
Summary of the invention
In view of this, the present invention provides a method and device for evaluating data information, to solve the problem that existing methods cannot scientifically assess the most reasonable storage period of data, the redundant fields in the data, and similar information. The technical solution is as follows:
According to one aspect of the invention, a method for evaluating data information is provided. It involves multiple template tables, each template table corresponding to the class table of one category attribute, and each class of table including multiple logical tables with different information. The method includes:
collecting logs of data;
parsing the logs to obtain keywords of the data;
according to the keywords of the data, finding the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, finding the multiple logical tables whose information is the same as that of the data;
evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
Preferably, the logs include task run logs and real-time database access logs.
Preferably, parsing the logs includes parsing the logs with a Structured Query Language (SQL) parsing engine.
Preferably, the class tables of different category attributes include at least one of the following: day tables, month tables, year tables.
Preferably, evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables includes:
evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
According to another aspect of the invention, a device for evaluating data information is provided. It involves multiple template tables, each template table corresponding to the class table of one category attribute, and each class of table including multiple logical tables with different information. The device includes:
a collecting unit, for collecting logs of data;
a parsing unit, for parsing the logs to obtain keywords of the data;
a searching unit, for finding, according to the keywords of the data, the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit, for evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
Preferably, the logs include task run logs and real-time database access logs.
Preferably, the parsing unit is specifically configured to parse the logs with a Structured Query Language (SQL) parsing engine.
Preferably, the class tables of different category attributes include at least one of the following: day tables, month tables, year tables.
Preferably, the evaluation unit includes:
a first evaluation subunit, for evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit, for evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
The method provided by the invention involves multiple template tables, each corresponding to the class table of one category attribute, and the class table of each category attribute in turn includes multiple logical tables with different information. When evaluating data, the invention first collects and parses the logs of the data, uses the keywords obtained by parsing to find the template table whose category attribute is the same as that of the data, finds in the class table corresponding to that template table the multiple logical tables whose information is the same as that of the data, and then evaluates the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is in use and analyzes how each piece of data in the matching logical tables is used, so that information about the currently collected data, such as its storage period (i.e. minimum timeliness) and its redundant fields, can be evaluated, and the most reasonable storage period, redundant fields and similar information can be assessed scientifically.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for evaluating data information provided by the invention;
Fig. 2 is a schematic diagram of the logical relations between the nodes in the invention;
Fig. 3 is a schematic structural diagram of a device for evaluating data information provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Refer to Fig. 1, which shows a flowchart of a method for evaluating data information provided by the invention. In the embodiments of the invention, the data center stores multiple template tables, each corresponding to the class table of one category attribute, as shown in Table 1 below. The class tables of different category attributes include at least one of the following: day tables, month tables, year tables, month-to-date tables, other tables, and so on. The class table of each category attribute includes multiple logical tables with different information. Taking day tables as an example, template table 1 corresponds to the day table, and the day table in turn covers multiple daily call-record tables, such as daily call-record tables 20160901, 20160902 and 20160903, as well as multiple daily traffic-record tables, such as daily traffic-record tables 20160901, 20160902 and 20160903.
Table 1
The mapping between class tables and template tables can be set up as follows. For a day table such as ODS_USER_20150102, the variable date part 20150102 is replaced with the field YYYYMMDD. A day table named prefix+YYYYMMDD can thus be mapped to the template table "prefix+YYYYMMDD", which classifies the data at the same time.
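The date-to-placeholder replacement described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `to_template` is hypothetical, and the suffix rules for month tables (6 digits) and year tables (4 digits) are assumed by analogy with the day-table example; the source only spells out the YYYYMMDD case.

```python
import re

# Assumption: day tables end in an 8-digit date, month tables in a 6-digit
# month, year tables in a 4-digit year. Only the 8-digit case is from the source.
_DATE_SUFFIXES = [
    (re.compile(r"^(?P<prefix>.+_)\d{8}$"), "YYYYMMDD", "day"),
    (re.compile(r"^(?P<prefix>.+_)\d{6}$"), "YYYYMM", "month"),
    (re.compile(r"^(?P<prefix>.+_)\d{4}$"), "YYYY", "year"),
]


def to_template(table_name):
    """Return (template_name, category_attribute) for a physical table name."""
    for pattern, placeholder, category in _DATE_SUFFIXES:
        match = pattern.match(table_name)
        if match:
            # Keep the prefix, replace the variable date part with the placeholder.
            return match.group("prefix") + placeholder, category
    # No recognizable date suffix: treat the table as its own template.
    return table_name, "other"


print(to_template("ODS_USER_20150102"))
```

Tables that share a template name, such as ODS_USER_20150102 and ODS_USER_20150103, end up in the same class, which is the classification effect the paragraph above mentions.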
Preferably, the correspondence between each template table and the class table of each category attribute may also be stored in the form of a table.
Specifically, the method for evaluating data information provided by the invention includes:
Step 101, collecting logs of data.
In the invention, the data of the data center is stored in relational databases and on the Hadoop platform. The logs include task run logs and real-time database access logs. Task run logs are mainly the logs recorded by programs in the data center while they run. Real-time database access logs are the real-time SQL access records that the database generates when a program connects to it to perform computation, i.e. the database SQL logs.
Step 102, parsing the logs to obtain the keywords of the data.
In the invention, the logs are parsed with an SQL (Structured Query Language) parsing engine. The keywords may include delete, insert, update, select, from, where, order by, group by, and so on. The embodiments of the invention focus mainly on insert and select statements.
In the invention, an SQL statement can be split into multiple root nodes, as shown in Fig. 2, including select, from, where and Column List. Each root node is connected to at least one child node, so a logical dependency is formed between each root node and the child nodes connected to it.
Sorting out the logical relations between the nodes shown in Fig. 2 yields Table 2.
Logical table name | Referenced fields | Reference condition |
---|---|---|
ODS_USER_20150102 | USER_ID, AREA, AGE | USER_ID |
ODS_USER_20150103 | USER_ID | USER_ID |
Table 2
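To make the shape of a Table 2 row concrete, here is a minimal sketch that extracts the table name, the selected fields and the fields used in the condition from a single-table SELECT statement. It is a regex-based stand-in, not the SQL parsing engine the source refers to (which is not specified); the function name `parse_select` and the sample statement are hypothetical, and joins, subqueries and aliases are deliberately out of scope.

```python
import re

# Matches only the simple form: SELECT cols FROM table [WHERE condition]
_SELECT = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_SQL_WORDS = {"and", "or", "not", "in", "is", "null", "like", "between"}


def parse_select(sql):
    """Return one Table-2-style row as a dict, or None if sql is not a simple SELECT."""
    match = _SELECT.match(sql.strip())
    if match is None:
        return None
    fields = [col.strip() for col in match.group("cols").split(",")]
    condition = match.group("cond") or ""
    # Drop quoted literals so words inside strings are not mistaken for columns.
    condition = re.sub(r"'[^']*'", " ", condition)
    condition_fields = []
    for word in re.findall(r"[A-Za-z_]\w*", condition):
        if word.lower() not in _SQL_WORDS and word not in condition_fields:
            condition_fields.append(word)
    return {
        "table": match.group("table"),
        "fields": fields,
        "conditions": condition_fields,
    }


row = parse_select(
    "SELECT USER_ID, AREA, AGE FROM ODS_USER_20150102 WHERE USER_ID = 1001"
)
print(row)
```

A production system would more likely rely on a real SQL parser, since log statements routinely contain joins, nested queries and vendor-specific syntax that a pattern like this cannot handle.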
Step 103, according to the keywords of the data, finding the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, finding the multiple logical tables whose information is the same as that of the data.
Taking day table ODS_USER_20150102 as an example, its keywords include 20150102, i.e. its description format is YYYYMMDD. This YYYYMMDD is matched one by one against the description format recorded in each template table. When template table 1 records the description format prefix+YYYYMMDD, day table ODS_USER_20150102 is determined to match template table 1, and template table 1 is selected. The multiple logical tables whose information is the same as that of the data are then found in the class table corresponding to template table 1.
The information of the data may include the header name of the logical table, for example ODS_USER.
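The lookup in step 103 can be pictured as grouping a catalog of table names by template and then picking the group the collected table belongs to. The sketch below assumes day tables only (8-digit suffix); `group_by_template`, `sibling_tables`, and the catalog contents other than the two ODS_USER names are hypothetical.

```python
import re
from collections import defaultdict


def _template_of(table_name):
    # Assumption: only the day-table form prefix_YYYYMMDD is handled here.
    return re.sub(r"_\d{8}$", "_YYYYMMDD", table_name)


def group_by_template(table_names):
    """Map each template name to the logical tables that match it."""
    groups = defaultdict(list)
    for name in table_names:
        groups[_template_of(name)].append(name)
    return dict(groups)


def sibling_tables(table_name, catalog):
    """Logical tables in the catalog that share the template of table_name."""
    return group_by_template(catalog).get(_template_of(table_name), [])


catalog = ["ODS_USER_20150102", "ODS_USER_20150103", "ODS_FLOW_20150102"]
print(sibling_tables("ODS_USER_20150102", catalog))
```

Here the shared header name ODS_USER plays the role of the "same information" criterion, so ODS_FLOW_20150102 is left out even though it is also a day table.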
Step 104, evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
Specifically, the invention can evaluate the usage period of the data according to how each piece of data in the multiple logical tables is used, and can evaluate the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
With reference to Table 3 below,
Table 3
That is, for logical tables of the form ODS_USER_YYYYMMDD, the invention recommends storing 2 periods, i.e. the usage period of the data is 2 weeks; the referenced fields of the data include USER_ID, AREA and AGE, and the unreferenced fields include name, remakr and bak.
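A small sketch can show how such a result might be derived from parsed log records. The source does not state how the usage period is computed (Table 3 is not reproduced here), so the rule below is an assumption: the period is taken as the largest gap, in days, between the date in a table's name and the date on which that table was accessed. The function name `evaluate` and the access dates are hypothetical; the field names are the ones listed in the text.

```python
from datetime import date, datetime


def evaluate(accesses, all_fields):
    """Summarize usage for one template.

    accesses: iterable of (table_name, access_date, referenced_fields), where
    table_name ends in YYYYMMDD. all_fields: every column of the logical table.
    """
    referenced = set()
    max_age_days = 0
    for table_name, access_date, fields in accesses:
        referenced.update(fields)
        data_date = datetime.strptime(table_name[-8:], "%Y%m%d").date()
        # Assumed rule: how old was the data when it was still being read?
        max_age_days = max(max_age_days, (access_date - data_date).days)
    return {
        "referenced": [f for f in all_fields if f in referenced],
        "unreferenced": [f for f in all_fields if f not in referenced],
        "max_age_days": max_age_days,
    }


# Hypothetical access records; the dates are invented for illustration.
accesses = [
    ("ODS_USER_20150102", date(2015, 1, 16), ["USER_ID", "AREA", "AGE"]),
    ("ODS_USER_20150103", date(2015, 1, 10), ["USER_ID"]),
]
# Column names as listed in the text above.
columns = ["USER_ID", "AREA", "AGE", "name", "remakr", "bak"]
print(evaluate(accesses, columns))
```

With these invented dates the oldest data still being read is 14 days old, which lines up with a two-week usage period, and the three columns that never appear in any statement come out as unreferenced, i.e. candidates for the redundant fields the text describes.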
The method provided by the invention therefore involves multiple template tables, each corresponding to the class table of one category attribute, and the class table of each category attribute in turn includes multiple logical tables with different information. When evaluating data, the invention first collects and parses the logs of the data, uses the keywords obtained by parsing to find the template table whose category attribute is the same as that of the data, finds in the class table corresponding to that template table the multiple logical tables whose information is the same as that of the data, and then evaluates the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is in use and analyzes how each piece of data in the matching logical tables is used, so that information about the currently collected data, such as its storage period (i.e. minimum timeliness) and its redundant fields, can be evaluated, and the most reasonable storage period, redundant fields and similar information can be assessed scientifically.
Based on the method for evaluating data information described above, the invention also provides a device for evaluating data information. The device involves multiple template tables, each template table corresponding to the class table of one category attribute, and each class of table including multiple logical tables with different information. The class tables of different category attributes include at least one of the following: day tables, month tables, year tables. Specifically, the structure of the device is shown in Fig. 3 and includes:
a collecting unit 100, for collecting logs of data, where the logs may include task run logs and real-time database access logs;
a parsing unit 200, for parsing the logs to obtain the keywords of the data, where the parsing unit 200 may specifically parse the logs with an SQL parsing engine;
a searching unit 300, for finding, according to the keywords of the data, the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit 400, for evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
The evaluation unit 400 includes:
a first evaluation subunit 401, for evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit 402, for evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the parts that are identical or similar the embodiments can be referred to one another. Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, see the description of the method embodiments.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.
The method and device for evaluating data information provided by this application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is only intended to help in understanding the method of the application and its core idea. A person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the application.
Claims (10)
1. A method for evaluating data information, characterized in that it involves multiple template tables, each template table corresponding to the class table of one category attribute, and each class of table including multiple logical tables with different information; the method includes:
collecting logs of data;
parsing the logs to obtain keywords of the data;
according to the keywords of the data, finding the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, finding the multiple logical tables whose information is the same as that of the data;
evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
2. The method according to claim 1, characterized in that the logs include task run logs and real-time database access logs.
3. The method according to claim 1, characterized in that parsing the logs includes parsing the logs with a Structured Query Language (SQL) parsing engine.
4. The method according to claim 1, characterized in that the class tables of different category attributes include at least one of the following: day tables, month tables, year tables.
5. The method according to any one of claims 1-4, characterized in that evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables includes:
evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
6. A device for evaluating data information, characterized in that it involves multiple template tables, each template table corresponding to the class table of one category attribute, and each class of table including multiple logical tables with different information; the device includes:
a collecting unit, for collecting logs of data;
a parsing unit, for parsing the logs to obtain keywords of the data;
a searching unit, for finding, according to the keywords of the data, the template table whose category attribute is the same as that of the data and, from the class table corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit, for evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
7. The device according to claim 6, characterized in that the logs include task run logs and real-time database access logs.
8. The device according to claim 6, characterized in that the parsing unit is specifically configured to parse the logs with a Structured Query Language (SQL) parsing engine.
9. The device according to claim 6, characterized in that the class tables of different category attributes include at least one of the following: day tables, month tables, year tables.
10. The device according to any one of claims 6-9, characterized in that the evaluation unit includes:
a first evaluation subunit, for evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit, for evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611207105.XA CN106844478B (en) | 2016-12-23 | 2016-12-23 | Method and device for evaluating data information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611207105.XA CN106844478B (en) | 2016-12-23 | 2016-12-23 | Method and device for evaluating data information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844478A | 2017-06-13 |
CN106844478B | 2020-08-04 |
Family
ID=59136974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611207105.XA Active CN106844478B (en) | 2016-12-23 | 2016-12-23 | Method and device for evaluating data information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844478B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040243576A1 (en) * | 1998-12-07 | 2004-12-02 | Oracle International Corporation | System and method for querying data for implicit hierarchies |
CN101141525A (en) * | 2007-10-10 | 2008-03-12 | 中兴通讯股份有限公司 | Information management system and information management method |
CN105868373A (en) * | 2016-03-31 | 2016-08-17 | 国网江西省电力公司信息通信分公司 | Method and device for processing key data of power service information system |
Also Published As
Publication number | Publication date |
---|---|
CN106844478B (en) | 2020-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20171020. Address after: 100193 floor 19, building 1, No. 10 East Hospital, 101 East Road, Haidian District, Beijing. Applicant after: AsiaInfo Science & Technology (China) Co., Ltd. Address before: 100193, Beijing, Haidian District East Road 10 East Asia headquarters R & D center building C block 5 northeast side of the building. Applicant before: BEIJING ASIALNFO SMART DATA TECHNOLOGY CO., LTD. |
GR01 | Patent grant | ||