CN106202580A - The double publicity production data acquisition systems realized based on ETL data warehouse technology - Google Patents

The double publicity production data acquisition systems realized based on ETL data warehouse technology

Info

Publication number
CN106202580A
Authority
CN
China
Prior art keywords
data
etl
double
publicity
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610753313.3A
Other languages
Chinese (zh)
Inventor
曾水根
朱科支
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Number Plus Data Technology Co Ltd
Original Assignee
Jiangsu Number Plus Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Number Plus Data Technology Co Ltd
Priority to CN201610753313.3A
Publication of CN106202580A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a double publicity production data acquisition system based on ETL data warehouse technology, comprising data collection and data processing. Compared with existing double publicity production data acquisition systems, the system makes better use of server resources, speeds up the manual data-entry process, and cleans data more effectively. Because data cleaning and validity checking run automatically while the server is idle, more complex validation rules can be applied to improve the usability of the data, and the server resources of the acquisition system are used more rationally.

Description

The double publicity production data acquisition systems realized based on ETL data warehouse technology
Technical field
The present invention relates to the fields of big data applications, Internet technology, and computer application technology, and in particular to a double publicity production data acquisition system based on ETL data warehouse technology.
Background technology
In the prior art, acquisition systems for the production data of administrative license publicity and administrative penalty publicity (together, "double publicity") work like most other production data acquisition systems. An administrative agency typically logs in with an account that has the appropriate permissions and uses the system's data-entry function to enter its double publicity production data into the acquisition system's database. During entry, the system directly performs precise, multi-dimensional checks on both the validity of the data and whether it is a duplicate. Precise multi-dimensional validity checking inevitably consumes more hardware resources on both the acquisition system's server and the client computer. Determining whether a record already exists requires comparing the data being entered, one record at a time and in real time, against the data in the data warehouse. For large data volumes, this comparison occupies a great deal of the server's hardware resources and costs operators more data-entry time. If a network fault or a slow connection occurs during the process, the operation easily times out, and a single collection often needs several attempts to complete. When several agencies enter production data at the same time, the frequent queries against the acquisition system's database often push server hardware usage too high, which makes double publicity production data acquisition inefficient.
Summary of the invention
To address the shortcomings of the double publicity production data acquisition systems currently on the market, the present invention provides a double publicity production data acquisition system that balances the utilization of the acquisition system's server resources and improves acquisition efficiency. It uses ETL data warehouse technology to carry out the time- and resource-consuming operations on double publicity production data (extraction, cleaning, validity checking, duplicate checking, conversion, and loading) while the server is idle.
The object of the invention is achieved through the following technical solution:
A double publicity production data acquisition system based on ETL data warehouse technology, comprising data collection and data processing.
Data collection comprises the following steps:
S1: open the login interface of the double publicity production data acquisition system and enter an account and password;
S2: the system verifies the account; if verification succeeds, the login succeeds; if verification fails, the system returns to the login interface;
S3: after a successful login, preset the automatic start time of the ETL, then enter the double publicity production data acquisition system to enter data;
S4: the system checks the validity of the entered data; if the check fails, it reports the reason for the failure and returns to the system's data-entry interface;
S5: once the data passes the validity check, the system retrieves the data in the ETL temporary database and compares it with the entered data to determine whether it is a duplicate; if it is, the system reports that the data already exists in the ETL temporary database and need not be entered again;
S6: once the duplicate check in step S5 confirms that the data does not duplicate any data in the ETL temporary database, the data is stored in the ETL temporary database.
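Steps S4 to S6 can be illustrated with a short sketch. It assumes an SQLite database standing in for the ETL temporary database and a record identified by a document number; the table name `staging`, the fields `doc_no`, `kind`, and `subject`, and the function names are hypothetical and do not come from the patent.

```python
import sqlite3


def open_staging(path=":memory:"):
    """Create the temporary (staging) database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        " doc_no TEXT PRIMARY KEY,"  # hypothetical business key
        " kind TEXT NOT NULL,"       # 'permit' or 'penalty'
        " subject TEXT NOT NULL)"
    )
    return conn


def validate(record):
    """S4: cheap validity check; returns a failure reason, or None if valid."""
    for field in ("doc_no", "kind", "subject"):
        if not str(record.get(field) or "").strip():
            return f"missing field: {field}"
    if record["kind"] not in ("permit", "penalty"):
        return "kind must be 'permit' or 'penalty'"
    return None


def enter_record(conn, record):
    """S4-S6: validate, check duplicates in staging only, then store."""
    reason = validate(record)
    if reason is not None:
        return ("invalid", reason)  # S4: report the reason to the operator
    row = conn.execute(
        "SELECT 1 FROM staging WHERE doc_no = ?", (record["doc_no"],)
    ).fetchone()
    if row is not None:
        # S5: the record is already staged, so it need not be entered again
        return ("duplicate", "record already exists in the temporary database")
    conn.execute(
        "INSERT INTO staging (doc_no, kind, subject) VALUES (?, ?, ?)",
        (record["doc_no"], record["kind"], record["subject"]),
    )
    conn.commit()  # S6: store the record in the temporary database
    return ("stored", None)
```

The duplicate check touches only the staging table, which is what keeps the interactive path cheap; the comparison against the data warehouse is deferred to the batch phase.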
Data processing comprises the following steps:
S7: when the automatic start time of the ETL arrives, the ETL accesses the temporary database; if the temporary database contains no data, the ETL exits directly; if it contains data, the ETL performs data extraction, cleaning, validity checking, duplicate checking against the data in the data warehouse, and conversion;
S8: the data that passes the operations in step S7 is loaded into the data warehouse, and the ETL exits.
Compared with the prior art, embodiments of the present invention have at least the following advantages:
Compared with existing double publicity production data acquisition systems, the system of the present invention makes better use of server resources, speeds up the manual data-entry process, and cleans data more effectively. Because data cleaning and validity checking run automatically while the server is idle, more complex validation rules can be applied to improve the usability of the data, and the server resources of the acquisition system are used more rationally.
Description of the drawings
Fig. 1 is a schematic flow chart of data collection in the double publicity production data acquisition system based on ETL data warehouse technology according to the present invention;
Fig. 2 is a flow chart of data processing in the double publicity production data acquisition system based on ETL data warehouse technology according to the present invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solution is described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. The components of the embodiments described and illustrated in the drawings can generally be arranged and designed in various configurations. The detailed description of the embodiments given below with reference to the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative work fall within the scope of protection of the present invention.
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference labels denote, throughout, the same or similar elements or elements with the same or similar functions. The embodiments described with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The invention is further described below with reference to the drawings and an embodiment.
As shown in Fig. 1 and Fig. 2, a double publicity production data acquisition system based on ETL data warehouse technology comprises data collection and data processing.
Data collection comprises the following steps:
S1: open the login interface of the double publicity production data acquisition system and enter an account and password;
S2: the system verifies the account; if verification succeeds, the login succeeds; if verification fails, the system returns to the login interface;
S3: after a successful login, preset the automatic start time of the ETL, then enter the double publicity production data acquisition system to enter data;
S4: the system checks the validity of the entered data; if the check fails, it reports the reason for the failure and returns to the system's data-entry interface;
S5: once the data passes the validity check, the system retrieves the data in the ETL temporary database and compares it with the entered data to determine whether it is a duplicate; if it is, the system reports that the data already exists in the ETL temporary database and need not be entered again;
S6: once the duplicate check in step S5 confirms that the data does not duplicate any data in the ETL temporary database, the data is stored in the ETL temporary database.
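Steps S1 and S2 leave the verification mechanism open. As one possible illustration, the sketch below assumes salted PBKDF2 password hashes held in an in-memory dictionary; the storage layout, the hashing scheme, and the names `hash_password`, `verify_login`, and `accounts` are all assumptions, not details from the patent.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Derive a salted hash; a fresh random salt is used when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_login(accounts, account, password):
    """S2: return True on success; the caller returns to the login page on False."""
    entry = accounts.get(account)
    if entry is None:
        return False  # unknown account
    salt, expected = entry
    _, actual = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(actual, expected)
```

For example, with `accounts = {"bureau_a": hash_password("s3cret")}`, a call to `verify_login(accounts, "bureau_a", "s3cret")` succeeds, and any other password or account name fails.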
Data processing comprises the following steps:
S7: when the automatic start time of the ETL arrives, the ETL accesses the temporary database; if the temporary database contains no data, the ETL exits directly; if it contains data, the ETL performs data extraction, cleaning, validity checking, duplicate checking against the data in the data warehouse, and conversion;
S8: the data that passes the operations in step S7 is loaded into the data warehouse, and the ETL exits.
After an operator logs in to the double publicity production data acquisition system with an authorized account, the operator uses the data-entry (or import) function on the acquisition interface to enter (or import) data into the acquisition system's temporary database. During this process the system performs only a simple validity check and checks for duplicates against the data in the temporary database. Because the temporary database usually holds little data, this does not take up many resources on the acquisition system's server or the client computer, and it does not keep the operator waiting long. The system then uses the temporary database as a data source: when the system is idle, it automatically starts the ETL process, which extracts the data from the temporary database, cleans it, checks its validity, checks it for duplicates against the data in the data warehouse, converts it, and finally loads it into the data warehouse according to a predefined data warehouse model.
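The import path mentioned above can be sketched in the same spirit. The example assumes the imported file is CSV text with a `doc_no` column and that the keys already in the temporary database are available as a set; the file format, the column name, and the function `import_rows` are hypothetical.

```python
import csv
import io


def import_rows(csv_text, staged_keys):
    """Split an imported CSV batch into accepted rows and rejected line numbers.

    staged_keys holds the doc_no values already in the temporary database and
    is updated in place as rows are accepted.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        doc_no = (row.get("doc_no") or "").strip()
        if not doc_no:
            rejected.append((line_no, "missing doc_no"))  # simple validity check
        elif doc_no in staged_keys:
            rejected.append((line_no, "duplicate in staging"))
        else:
            staged_keys.add(doc_no)
            accepted.append(row)
    return accepted, rejected
```

Only the small set of staged keys is consulted here, which mirrors the point that the interactive check stays light while the heavier warehouse comparison waits for the scheduled ETL run.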
The ETL operations (extraction, cleaning, validity checking, duplicate checking against the data warehouse, conversion, and loading) consume a certain amount of system resources. To keep them from affecting the normal use of the system's other functions, the double publicity production data acquisition system schedules the ETL process to run automatically during the idle period at 1 a.m. every day, which balances the use of system resources more reasonably and improves the utilization efficiency of the acquisition system's server resources.
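How the 1 a.m. start is triggered is not specified. One simple approach, shown here only as a sketch, is to compute the next start time from the current time and sleep or schedule until then; the function name `next_run` and its `run_at` parameter are assumptions.

```python
from datetime import datetime, time, timedelta


def next_run(now, run_at=time(1, 0)):
    """Return the next datetime at which the daily ETL job should start."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has passed (or is exactly now), so use tomorrow's
        candidate += timedelta(days=1)
    return candidate
```

A scheduler loop would call `next_run(datetime.now())`, wait until that moment, run the ETL job, and repeat. In practice an operating-system scheduler such as cron could serve the same purpose.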
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention falls within the scope of protection of the present invention. The scope of protection of the present invention is therefore defined by the scope of protection of the claims.

Claims (1)

1. A double publicity production data acquisition system based on ETL data warehouse technology, characterized in that it comprises data collection and data processing,
wherein data collection comprises the following steps:
S1: open the login interface of the double publicity production data acquisition system and enter an account and password;
S2: the system verifies the account; if verification succeeds, the login succeeds; if verification fails, the system returns to the login interface;
S3: after a successful login, preset the automatic start time of the ETL, then enter the double publicity production data acquisition system to enter data;
S4: the system checks the validity of the entered data; if the check fails, it reports the reason for the failure and returns to the system's data-entry interface;
S5: once the data passes the validity check, the system retrieves the data in the ETL temporary database and compares it with the entered data to determine whether it is a duplicate; if it is, the system reports that the data already exists in the ETL temporary database and need not be entered again;
S6: once the duplicate check in step S5 confirms that the data does not duplicate any data in the ETL temporary database, the data is stored in the ETL temporary database;
and data processing comprises the following steps:
S7: when the automatic start time of the ETL arrives, the ETL accesses the temporary database; if the temporary database contains no data, the ETL exits directly; if it contains data, the ETL performs data extraction, cleaning, validity checking, duplicate checking against the data in the data warehouse, and conversion;
S8: the data that passes the operations in step S7 is loaded into the data warehouse, and the ETL exits.
CN201610753313.3A 2016-08-29 2016-08-29 The double publicity production data acquisition systems realized based on ETL data warehouse technology Pending CN106202580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610753313.3A CN106202580A (en) 2016-08-29 2016-08-29 The double publicity production data acquisition systems realized based on ETL data warehouse technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610753313.3A CN106202580A (en) 2016-08-29 2016-08-29 The double publicity production data acquisition systems realized based on ETL data warehouse technology

Publications (1)

Publication Number Publication Date
CN106202580A true CN106202580A (en) 2016-12-07

Family

ID=57526452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610753313.3A Pending CN106202580A (en) 2016-08-29 2016-08-29 The double publicity production data acquisition systems realized based on ETL data warehouse technology

Country Status (1)

Country Link
CN (1) CN106202580A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101105793A (en) * 2006-07-11 2008-01-16 阿里巴巴公司 Data processing method and system of data library
CN103902268A (en) * 2012-12-27 2014-07-02 方正国际软件(北京)有限公司 ETL process execution system and method
CN104933098A (en) * 2015-05-28 2015-09-23 浪潮软件集团有限公司 Data cleaning platform design method based on elimination of repeated records

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
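德妍: "Research on ETL technology in a telecom key performance indicator analysis system" (电信关键性指标分析系统中ETL技术研究), China Master's Theses Full-text Database, Information Science and Technology *
曹爱华: "Research on data warehouse technology and its application in a telecom business analysis system" (数据仓库技术研究及在电信经营分析系统的应用), China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology *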

Similar Documents

Publication Publication Date Title
US10607190B2 (en) Mobile check-in with push notification services
US9811445B2 (en) Methods and systems for the use of synthetic users to performance test cloud applications
CN101193027A (en) A single-point login system and method for integrated isomerous system
CN103248699A (en) Multi-account processing method of single sign on (SSO) information system
CN105868258A (en) Crawler system
CN103136619B (en) The online management method of acceptance of engineering quality form
CN101626369A (en) Method, device and system for single sign-on
CN107547595A (en) cloud resource scheduling system, method and device
CN106656514A (en) kerberos authentication cluster access method, SparkStandalone cluster, and driving node of SparkStandalone cluster
CN107070894A (en) A kind of software integrating method based on enterprise's cloud service platform
CN102542367A (en) Cloud computing network workflow processing method, device and system based on domain model
WO2018226807A1 (en) Centralized authenticating abstraction layer with adaptive assembly line pathways
CN107689941A (en) A kind of apparatus and method for preventing same user's repeat logon
CN111368165A (en) Spatio-temporal streaming data integration platform
CN104182846A (en) Client management system
CN104991831A (en) SSO system integration method based on server
CN107566406A (en) A kind of meeting summary management system based on cloud storage
CN106657112A (en) Authentication method and apparatus
CN106375334A (en) Authentication method for distributed system
CN113688376A (en) Tenant authority control method for realizing container cloud platform based on CMDB system and RBAC model
CN109241712A (en) A kind of method and apparatus for accessing file system
CN103179089A (en) System and method for identity authentication for accessing of different software development platforms
CN104539658B (en) One kind is based on enterprise's private clound big data processing method
CN106202580A (en) The double publicity production data acquisition systems realized based on ETL data warehouse technology
CN101945130A (en) Composite domain name based service array load balancing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161207