CN106844478A - Method and device for evaluating data information - Google Patents

Method and device for evaluating data information

Info

Publication number
CN106844478A
CN106844478A CN201611207105.XA CN201611207105A CN106844478A CN 106844478 A CN106844478 A CN 106844478A CN 201611207105 A CN201611207105 A CN 201611207105A CN 106844478 A CN106844478 A CN 106844478A
Authority
CN
China
Prior art keywords
data
log
information
template
multiple logical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611207105.XA
Other languages
Chinese (zh)
Other versions
CN106844478B (en)
Inventor
冯文
王全胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Asiainfo Technologies China Inc
Original Assignee
Beijing Asialnfo Smart Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Asialnfo Smart Data Technology Co Ltd
Priority to CN201611207105.XA
Publication of CN106844478A
Application granted
Publication of CN106844478B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning

Abstract

The present invention provides a method and device for evaluating data information. The scheme involves multiple template tables; each template table corresponds to the class tables of one category attribute, and each type of class table includes multiple logical tables carrying different information. The method includes: collecting the logs of data; parsing the logs to obtain keywords of the data; according to the keywords of the data, finding the template table whose category attribute is the same as that of the data, and, from the class tables corresponding to that template table, finding the multiple logical tables whose information is the same as that of the data; and evaluating the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is being used and analyzes how each piece of data is used in the multiple logical tables that share its information. It thereby evaluates information about the currently collected data, such as its storage period and its redundant fields, so that the most reasonable storage period, the redundant fields, and similar information can be determined on an objective basis.

Description

Method and device for evaluating data information
Technical field
The present invention relates to the technical field of minimum data timeliness, and more specifically to a method and device for evaluating data information, so as to obtain parameters such as the minimum timeliness of data and field usage information.
Background
In the current field of minimum data timeliness, the existing way of evaluating data information relies on manual business experience or senior management decisions to establish a parameter configuration scheme for the data in the various databases of a data center or platform (DB2, Oracle, MPP, TERADATA, HADOOP, etc.). The parameter configuration includes, for example, the cleanup date and the storage period of the data. The operation and maintenance personnel of the data center are then reminded periodically to manage the data according to that configuration.
A data center stores a large amount of data, and each piece of data is stored in the form of a table. How long each piece of data should be stored and how data redundancy should be optimized are therefore defined entirely by subjective human experience or decisions. Such manual definition clearly cannot determine, on an objective basis, the most reasonable storage period of the data, the redundant fields in the data, and similar information.
Summary of the invention
In view of this, the present invention provides a method and device for evaluating data information, to solve the problem that existing methods of evaluating data information cannot objectively determine the most reasonable storage period of data, the redundant fields in the data, and similar information. The technical solution is as follows:
According to one aspect, the present invention provides a method for evaluating data information, involving multiple template tables, where each template table corresponds to the class tables of one category attribute and each type of class table includes multiple logical tables with different information. The method includes:
collecting the logs of data;
parsing the logs to obtain keywords of the data;
according to the keywords of the data, finding the template table whose category attribute is the same as that of the data, and, from the class tables corresponding to the template table, finding the multiple logical tables whose information is the same as that of the data;
evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
Preferably, the logs include task run logs and database real-time access logs.
Preferably, parsing the logs includes parsing the logs with a Structured Query Language (SQL) parsing engine.
Preferably, the class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables.
Preferably, evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables includes:
evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
According to another aspect, the present invention provides a device for evaluating data information, involving multiple template tables, where each template table corresponds to the class tables of one category attribute and each type of class table includes multiple logical tables with different information. The device includes:
a collecting unit, configured to collect the logs of data;
a parsing unit, configured to parse the logs and obtain keywords of the data;
a searching unit, configured to find, according to the keywords of the data, the template table whose category attribute is the same as that of the data, and to find, from the class tables corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit, configured to evaluate the information of the data according to the usage information of each piece of data in the multiple logical tables.
Preferably, the logs include task run logs and database real-time access logs.
Preferably, the parsing unit is specifically configured to parse the logs with a Structured Query Language (SQL) parsing engine.
Preferably, the class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables.
Preferably, the evaluation unit includes:
a first evaluation subunit, configured to evaluate the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit, configured to evaluate the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
The method for evaluating data information provided by the present invention involves multiple template tables. Each template table corresponds to the class tables of one category attribute, and the class tables of each category attribute in turn include multiple logical tables with different information. When evaluating data, the invention first collects and parses the logs of the data, uses the keywords obtained by parsing to find the template table whose category attribute is the same as that of the data, and finds, from the class tables corresponding to that template table, the multiple logical tables whose information is the same as that of the data. It then evaluates the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is being used and analyzes how each piece of data is used in the multiple logical tables that share its information. It thereby evaluates information about the currently collected data, such as its storage period (i.e. minimum timeliness) and its redundant fields, so that the most reasonable storage period, the redundant fields, and similar information can be determined on an objective basis.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for evaluating data information provided by the present invention;
Fig. 2 is a schematic diagram of the logical relationships between the nodes in the present invention;
Fig. 3 is a schematic structural diagram of a device for evaluating data information provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Refer to Fig. 1, which shows a flowchart of the method for evaluating data information provided by the present invention. In this embodiment, the data center stores multiple template tables, each of which corresponds to the class tables of one category attribute, as shown in Table 1 below. The class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables, month-to-date tables, other tables, and so on. The class tables of each category attribute include multiple logical tables with different information. Taking daily tables as an example, template table 1 corresponds to the daily tables, which in turn include multiple daily call list tables, such as daily call list table 20160901, daily call list table 20160902, and daily call list table 20160903, and multiple daily traffic list tables, such as daily traffic list table 20160901, daily traffic list table 20160902, and daily traffic list table 20160903.
Table 1
The mapping between class tables and template tables can be set up as follows. For a daily table such as ODS_USER_20150102, the variable date part 20150102 is replaced with the field YYYYMMDD. A daily table named prefix+YYYYMMDD can thus be mapped to the template table "prefix+YYYYMMDD", which also classifies the data at the same time.
Preferably, the correspondence between each template table and the class tables of each category attribute is also stored in the form of a table.
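The date-suffix substitution described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the function name `to_template` and the YYYYMM and YYYY suffix conventions for monthly and yearly tables are assumptions; only the YYYYMMDD case for daily tables comes from the text.

```python
import re
from typing import Optional, Tuple

# Hypothetical suffix rules. Only the daily (YYYYMMDD) rule is taken from the
# text; the monthly and yearly rules are assumed by analogy.
_DATE_SUFFIXES = [
    (re.compile(r"_(\d{8})$"), "YYYYMMDD"),  # daily table, e.g. ODS_USER_20150102
    (re.compile(r"_(\d{6})$"), "YYYYMM"),    # monthly table (assumed convention)
    (re.compile(r"_(\d{4})$"), "YYYY"),      # yearly table (assumed convention)
]


def to_template(table_name: str) -> Tuple[str, Optional[str]]:
    """Return (template name, date part) for a concrete table name.

    If the name carries no recognised date suffix, it is returned unchanged
    with a date part of None.
    """
    for pattern, placeholder in _DATE_SUFFIXES:
        match = pattern.search(table_name)
        if match:
            prefix = table_name[: match.start()]
            return prefix + "_" + placeholder, match.group(1)
    return table_name, None


print(to_template("ODS_USER_20150102"))
```

Keeping the rules in an ordered list means a new category attribute (for example a month-to-date table) would only need one more entry, under whatever naming convention the data center actually uses.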
Specifically, the method for evaluating data information provided by the present invention includes:
Step 101: collect the logs of data.
In the present invention, the data in the data center is stored in relational databases and on the Hadoop platform. The logs include task run logs and database real-time access logs. A task run log is mainly the log recorded by a program in the data center while it runs. A database real-time access log is the record of real-time SQL access to the database generated when a program connects to the database for computation, i.e. the database SQL log.
Step 102: parse the logs to obtain keywords of the data.
The present invention parses the logs with an SQL (Structured Query Language) parsing engine. The keywords can include delete, insert, update, select, from, where, order by, group by, and so on. The embodiments of the present invention focus mainly on insert and select statements.
In the present invention, an SQL statement can be split into multiple root nodes, as shown in Fig. 2, including select, from, where, and Column List. Each root node is connected to at least one child node, so a logical dependency is formed between each root node and the child nodes connected to it.
Table 2 is obtained by organizing the logical relationships between the nodes shown in Fig. 2.
Logical table name    Referenced fields      Reference condition
ODS_USER_20150102     USER_ID, AREA, AGE     USER_ID
ODS_USER_20150103     USER_ID                USER_ID
Table 2
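To make the parsing step concrete, the following Python sketch extracts one Table 2 style row from a logged select statement. It is only an illustration under strong assumptions: the patent does not name its SQL parsing engine, and this sketch swaps in a regular expression that handles nothing beyond a single-table `select ... from ... where ...` statement. The function name `parse_select` and the sample SQL text are hypothetical.

```python
import re

# Matches only the simplest single-table form: select <cols> from <table> [where <cond>]
_SELECT_RE = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_select(sql: str):
    """Return the table, referenced fields and condition fields of a simple select.

    Returns None when the statement does not fit the simple pattern.
    """
    match = _SELECT_RE.match(sql.strip())
    if match is None:
        return None
    fields = [col.strip() for col in match.group("cols").split(",")]
    condition = match.group("cond") or ""
    # Condition fields: identifiers that appear directly left of a comparison.
    condition_fields = re.findall(
        r"(\w+)\s*(?:=|<>|<|>|\bin\b|\blike\b)", condition, re.IGNORECASE
    )
    return {
        "table": match.group("table"),
        "fields": fields,
        "condition_fields": condition_fields,
    }


row = parse_select(
    "SELECT USER_ID, AREA, AGE FROM ODS_USER_20150102 WHERE USER_ID = 1001"
)
print(row)
```

A production system would more plausibly build the node tree of Fig. 2 with a real SQL parser, since joins, subqueries, and aliases are outside what a regular expression can handle reliably.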
Step 103: according to the keywords of the data, find the template table whose category attribute is the same as that of the data, and, from the class tables corresponding to the template table, find the multiple logical tables whose information is the same as that of the data.
Taking the daily table ODS_USER_20150102 as an example, its keywords include 20150102, i.e. its description format is YYYYMMDD. This YYYYMMDD format is matched one by one against the description format recorded in each template table. When template table 1 records the description format +YYYYMMDD, the daily table ODS_USER_20150102 is determined to match template table 1. The multiple logical tables whose information is the same as that of the data are then found among the class tables corresponding to template table 1.
The information of the data can include the header name of the logical table, such as ODS_USER.
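The lookup in step 103 can be sketched as follows. This is a hedged illustration: the function names, the placeholder table, and the idea of passing the known table names as a plain list (`catalog`) are assumptions made for the example; the patent only states that the description format is matched against each template table one by one.

```python
import re

# Assumed mapping from description-format placeholders to digit patterns.
PLACEHOLDERS = {"YYYYMMDD": r"\d{8}", "YYYYMM": r"\d{6}", "YYYY": r"\d{4}"}


def template_regex(template):
    """Compile a template name such as ODS_USER_YYYYMMDD into a regex."""
    pattern = re.escape(template)
    # Substitute the longest placeholder first so YYYYMMDD is not read as YYYY.
    for name in sorted(PLACEHOLDERS, key=len, reverse=True):
        pattern = pattern.replace(name, PLACEHOLDERS[name])
    return re.compile(pattern)


def find_logical_tables(table, templates, catalog):
    """Match `table` against each template in turn.

    Returns the first matching template together with every table in
    `catalog` that fits the same template (same header name, any date).
    """
    for template in templates:
        regex = template_regex(template)
        if regex.fullmatch(table):
            return template, sorted(t for t in catalog if regex.fullmatch(t))
    return None, []


templates = ["ODS_USER_YYYYMMDD", "ODS_USER_YYYYMM"]
catalog = [
    "ODS_USER_20150102",
    "ODS_USER_20150103",
    "ODS_USER_201501",
    "ODS_BILL_20150102",
]
print(find_logical_tables("ODS_USER_20150102", templates, catalog))
```

Using `fullmatch` keeps a monthly table such as ODS_USER_201501 from being mistaken for a daily one, which is the classification effect the template mapping is meant to provide.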
Step 104: evaluate the information of the data according to the usage information of each piece of data in the multiple logical tables.
Specifically, the present invention can evaluate the usage period of the data according to how each piece of data in the multiple logical tables is used, and evaluate the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
As shown in Table 3 below,
Table 3
that is, for the logical tables ODS_USER_YYYYMMDD, the present invention recommends storing 2 periods, i.e. the usage period of the data is 2 weeks; the referenced fields of the data include USER_ID, AREA, and AGE, and the unreferenced fields include name, remakr, and bak.
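One way the evaluation in step 104 might be computed is sketched below in Python. The contents of Table 3 are not available in this text, so the rule used here is an assumption: the usage period is taken as the largest gap, in days, between a table's data date and the date it was accessed, and a field counts as referenced if any access used it. The function name `evaluate_usage` and the sample access records are hypothetical.

```python
from datetime import date


def evaluate_usage(accesses, all_fields):
    """Summarise usage across the logical tables of one template.

    accesses   -- iterable of (data_date, access_date, fields_used) tuples
    all_fields -- every field defined on the logical tables
    Returns (max_lag_days, referenced_fields, unreferenced_fields).
    """
    max_lag_days = 0
    referenced = set()
    for data_date, access_date, fields_used in accesses:
        # How long after its data date the table was still being read.
        max_lag_days = max(max_lag_days, (access_date - data_date).days)
        referenced.update(fields_used)
    unreferenced = [f for f in all_fields if f not in referenced]
    return max_lag_days, sorted(referenced), unreferenced


accesses = [
    (date(2015, 1, 2), date(2015, 1, 10), ["USER_ID", "AREA", "AGE"]),
    (date(2015, 1, 3), date(2015, 1, 17), ["USER_ID"]),
]
all_fields = ["USER_ID", "AREA", "AGE", "name", "remakr", "bak"]
print(evaluate_usage(accesses, all_fields))
```

With these made-up records the largest gap is 14 days, which lines up with the 2-week usage period quoted in the text, and the unreferenced fields are the candidates for redundancy cleanup. How the patent actually derives the period from Table 3 may differ.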
The method for evaluating data information provided by the present invention therefore involves multiple template tables. Each template table corresponds to the class tables of one category attribute, and the class tables of each category attribute in turn include multiple logical tables with different information. When evaluating data, the invention first collects and parses the logs of the data, uses the keywords obtained by parsing to find the template table whose category attribute is the same as that of the data, and finds, from the class tables corresponding to that template table, the multiple logical tables whose information is the same as that of the data. It then evaluates the information of the data according to the usage information of each piece of data in those logical tables. The invention starts from the logs produced while data is being used and analyzes how each piece of data is used in the multiple logical tables that share its information. It thereby evaluates information about the currently collected data, such as its storage period (i.e. minimum timeliness) and its redundant fields, so that the most reasonable storage period, the redundant fields, and similar information can be determined on an objective basis.
Based on the method for evaluating data information provided above, the present invention also provides a device for evaluating data information. The device involves multiple template tables, where each template table corresponds to the class tables of one category attribute and each type of class table includes multiple logical tables with different information. The class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables. Specifically, the structure of the device is shown in Fig. 3 and includes:
a collecting unit 100, configured to collect the logs of data, where the logs can include task run logs and database real-time access logs;
a parsing unit 200, configured to parse the logs and obtain keywords of the data,
where the parsing unit 200 can be specifically configured to parse the logs with an SQL parsing engine;
a searching unit 300, configured to find, according to the keywords of the data, the template table whose category attribute is the same as that of the data, and to find, from the class tables corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit 400, configured to evaluate the information of the data according to the usage information of each piece of data in the multiple logical tables.
The evaluation unit 400 includes:
a first evaluation subunit 401, configured to evaluate the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit 402, configured to evaluate the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
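The unit structure listed above can be pictured as four pluggable stages wired in sequence. The Python sketch below is only a structural illustration: the class name `EvaluationDevice`, the callable signatures, and the stub functions in the usage example are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class EvaluationDevice:
    """Chains the four units of Fig. 3; each unit is supplied as a callable."""

    collect: Callable[[], Iterable[str]]   # collecting unit 100: yields log entries
    parse: Callable[[str], Any]            # parsing unit 200: log entry -> keywords
    search: Callable[[Any], List[str]]     # searching unit 300: keywords -> logical tables
    evaluate: Callable[[List[str]], Any]   # evaluation unit 400: logical tables -> result

    def run(self) -> List[Any]:
        results = []
        for entry in self.collect():
            keywords = self.parse(entry)
            tables = self.search(keywords)
            results.append(self.evaluate(tables))
        return results


# Usage with trivial stand-in units (purely illustrative).
device = EvaluationDevice(
    collect=lambda: ["select USER_ID from ODS_USER_20150102"],
    parse=lambda entry: entry.split()[-1],
    search=lambda table: [table, "ODS_USER_20150103"],
    evaluate=len,
)
print(device.run())
```

Passing the units in as callables keeps each one replaceable, for instance swapping in a different parsing engine without touching the other stages.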
It should be noted that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the identical or similar parts of the embodiments can be cross-referenced. Because the device embodiments are substantially similar to the method embodiments, they are described briefly; for the relevant parts, refer to the description of the method embodiments.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The method and device for evaluating data information provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. For those of ordinary skill in the art, the specific implementation and scope of application may change according to the idea of the application. In summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A method for evaluating data information, characterized in that it involves multiple template tables, each template table corresponds to the class tables of one category attribute, and each type of class table includes multiple logical tables with different information; the method includes:
collecting the logs of data;
parsing the logs to obtain keywords of the data;
according to the keywords of the data, finding the template table whose category attribute is the same as that of the data, and, from the class tables corresponding to the template table, finding the multiple logical tables whose information is the same as that of the data;
evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables.
2. The method according to claim 1, characterized in that the logs include task run logs and database real-time access logs.
3. The method according to claim 1, characterized in that parsing the logs includes parsing the logs with a Structured Query Language (SQL) parsing engine.
4. The method according to claim 1, characterized in that the class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables.
5. The method according to any one of claims 1-4, characterized in that evaluating the information of the data according to the usage information of each piece of data in the multiple logical tables includes:
evaluating the usage period of the data according to how each piece of data in the multiple logical tables is used;
evaluating the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
6. A device for evaluating data information, characterized in that it involves multiple template tables, each template table corresponds to the class tables of one category attribute, and each type of class table includes multiple logical tables with different information; the device includes:
a collecting unit, configured to collect the logs of data;
a parsing unit, configured to parse the logs and obtain keywords of the data;
a searching unit, configured to find, according to the keywords of the data, the template table whose category attribute is the same as that of the data, and to find, from the class tables corresponding to the template table, the multiple logical tables whose information is the same as that of the data;
an evaluation unit, configured to evaluate the information of the data according to the usage information of each piece of data in the multiple logical tables.
7. The device according to claim 6, characterized in that the logs include task run logs and database real-time access logs.
8. The device according to claim 6, characterized in that the parsing unit is specifically configured to parse the logs with a Structured Query Language (SQL) parsing engine.
9. The device according to claim 6, characterized in that the class tables of different category attributes include at least one of the following: daily tables, monthly tables, yearly tables.
10. The device according to any one of claims 6-9, characterized in that the evaluation unit includes:
a first evaluation subunit, configured to evaluate the usage period of the data according to how each piece of data in the multiple logical tables is used;
a second evaluation subunit, configured to evaluate the referenced fields and unreferenced fields of the data according to how each data field in the multiple logical tables is used.
CN201611207105.XA 2016-12-23 2016-12-23 Method and device for evaluating data information Active CN106844478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611207105.XA CN106844478B (en) 2016-12-23 2016-12-23 Method and device for evaluating data information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611207105.XA CN106844478B (en) 2016-12-23 2016-12-23 Method and device for evaluating data information

Publications (2)

Publication Number Publication Date
CN106844478A 2017-06-13
CN106844478B 2020-08-04

Family

ID=59136974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611207105.XA Active CN106844478B (en) 2016-12-23 2016-12-23 Method and device for evaluating data information

Country Status (1)

Country Link
CN (1) CN106844478B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243576A1 (en) * 1998-12-07 2004-12-02 Oracle International Corporation System and method for querying data for implicit hierarchies
CN101141525A (en) * 2007-10-10 2008-03-12 中兴通讯股份有限公司 Information management system and information management method
CN105868373A (en) * 2016-03-31 2016-08-17 国网江西省电力公司信息通信分公司 Method and device for processing key data of power service information system

Also Published As

Publication number Publication date
CN106844478B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20171020

Address after: 100193 floor 19, building 1, No. 10 East Hospital, 101 East Road, Haidian District, Beijing

Applicant after: AsiaInfo Science & Technology (China) Co., Ltd.

Address before: 100193, Beijing, Haidian District East Road 10 East Asia headquarters R & D center building C block 5 northeast side of the building

Applicant before: BEIJING ASIALNFO SMART DATA TECHNOLOGY CO., LTD.

GR01 Patent grant