CN106407329A - Method for automatically importing incremental data from massive platform to hadoop platform - Google Patents
- Publication number
- CN106407329A (application number CN201610801065.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- platform
- magnanimity
- hadoop
- imports
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
- G06F16/2386—Bulk updating operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for automatically importing incremental data from a massive platform to a hadoop platform. The massive platform serves as the source database, the Hadoop big data platform serves as the target database, and an automatic incremental data import method developed in Java imports the incremental data in the source database into the target database. Compared with traditional Hadoop platform data import methods, the method greatly improves data import efficiency and improves the quality of the imported data, providing a good foundation for the correctness of subsequent big data statistical analysis.
Description
Technical field
The present invention relates to a method for automatically importing incremental data from a massive platform to a hadoop platform.
Background technology
After decades of rapid development, and with construction of the next-generation smart power system now fully under way, China's power system has become the world's largest specialized Internet of Things bearing on the national economy and people's livelihood. To some extent, this network of production relations, which spans every link of production and operation, has built the largest big data computing platform in China and laid a foundation for large-scale energy resource allocation across multiple dimensions such as time and space. For the power industry, electric power big data will run through every link of future power production and management and play a unique and significant role. It is the key for China's power industry to cope with problems such as limited resources and environmental pressure during construction of the next-generation power system, and to achieve steady, well-founded, green, and sustainable development.
In recent years, informatization of the power industry has also made considerable progress. Informatization of China's electric power enterprises began in the 1960s, moving from the initial automation of power production, to management informatization represented by financial computerization in the 1980s, and on to the large-scale enterprise informatization of recent years. In particular, with the all-round construction of the next-generation smart grid and the wide application in the power industry of new-generation IT represented by electric power big data, electric power data resources have begun to grow sharply and have reached a certain scale. In the long run, as a "barometer" of China's economic and social development, electric power data, with its close and extensive ties to economic development, will show unmatched positive externalities and become a stronger driving force for China's socio-economic development and even for the progress of human society.
What technology should be used to process the mass data that the power industry continuously generates? Hadoop has become the first choice of many enterprises. Because of its wide applicability and good ease of use in big data processing, Hadoop was quickly adopted in industry after its release in 2007 and also received extensive attention and research from academia. Within a few years, Hadoop became the most successful and most widely accepted mainstream big data processing technology and system platform to date, and a de facto industry standard for big data processing. It has been further developed and improved extensively by industry and is widely used across application sectors, especially the Internet industry.
Given this mass data and the corresponding Hadoop big data processing technology, the question becomes how to import the mass data into the Hadoop platform. Against this background, this patent proposes an efficient and stable solution.
Summary of the invention
The object of the present invention is to provide an efficient and stable method for automatically importing incremental data from a massive platform to a hadoop platform.
The technical solution of the present invention is:
A method for automatically importing incremental data from a massive platform to a hadoop platform, characterized in that: the massive platform serves as the source database, the Hadoop big data platform serves as the target database, and an automatic incremental data import method developed in Java is used to import the incremental data in the source database into the target database;
In the Hadoop big data platform, MapReduce serves as the big data computing engine, the HDFS distributed file system stores unstructured and semi-structured data, and the HBase distributed database stores structured data; the Java automatic incremental data import program needs to run once a day, and the functions of the Java program are implemented in two parts:
(1) The Java program collects the incremental data from the massive platform; the massive platform provides an interface for querying data, which takes time range, table name, and query condition parameters, and the Java program calls this interface with the corresponding parameters configured to obtain the corresponding data; because the Java program obtains data from the massive platform by calling a remote interface, network anomalies can occur; to handle this, before executing the method that obtains data, the Java program first checks whether the network connection is normal; if it is abnormal, the program writes a log entry and records the corresponding time tag, and once the network is back to normal the Java program re-runs according to the recorded date tag and reacquires the data for that date; in addition, if a network anomaly occurs while the Java program is executing the method that obtains data, the data already obtained is not imported into the big data platform, and a log entry and tag are recorded in the same way;
(2) After the Java program has obtained the incremental data from the massive platform, it imports the data into the big data platform; before doing so, it first verifies whether the obtained data is normal, and abnormal data is rejected; the data obtained by the Java program is imported into the big data platform by appending; before importing, the Java program likewise first checks whether the network connection is normal, and if the network is abnormal it records a log entry and a date tag and does not execute the import method; in addition, if a network anomaly occurs while the import method is executing, the program records a log entry, a date tag, and a data tag, so that it is known which of the current date's data has already been imported into the big data platform, and data already imported into the big data platform is not imported again.
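By way of illustration only, the date-tag handling described in parts (1) and (2) might be sketched as follows. The class DateTagReplay and its methods are hypothetical names that do not appear in the disclosure, and the sketch assumes the daily job is modeled as a function that returns true on success; a real program would also persist the tags and write log entries.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical sketch: remembers dates whose run failed and replays them later. */
class DateTagReplay {
    // Date tags of runs that failed (for example, because of a network anomaly), in date order.
    private final TreeSet<LocalDate> failedDates = new TreeSet<>();

    /** Runs the job for one date; on failure the date tag is recorded for a later re-run. */
    boolean runFor(LocalDate date, Predicate<LocalDate> job) {
        boolean ok;
        try {
            ok = job.test(date);
        } catch (RuntimeException e) {
            ok = false; // any runtime failure is treated as an abnormal run
        }
        if (ok) {
            failedDates.remove(date);
        } else {
            failedDates.add(date);
        }
        return ok;
    }

    /** Re-runs every recorded date; returns the dates that succeeded this time. */
    List<LocalDate> replay(Predicate<LocalDate> job) {
        List<LocalDate> recovered = new ArrayList<>();
        // Iterate over a copy because runFor modifies failedDates.
        for (LocalDate date : new ArrayList<>(failedDates)) {
            if (runFor(date, job)) {
                recovered.add(date);
            }
        }
        return recovered;
    }

    /** Dates still waiting for a successful re-run. */
    List<LocalDate> pending() {
        return new ArrayList<>(failedDates);
    }
}
```

Calling replay once the network is healthy again corresponds to re-running the program according to the recorded date tags.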
Verifying whether the obtained data is normal means checking whether the data contains null values or negative values.
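A minimal sketch of this null/negative check is given below, assuming for illustration that each record reduces to a single numeric value. The class name ReadingValidator and its methods are hypothetical; the disclosure does not specify the record layout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the null/negative check applied before import. */
class ReadingValidator {
    /** A value counts as normal only if it is present and not negative. */
    static boolean isValid(Double value) {
        return value != null && value >= 0.0;
    }

    /** Returns only the normal values, preserving their original order. */
    static List<Double> filterValid(List<Double> values) {
        List<Double> kept = new ArrayList<>();
        for (Double v : values) {
            if (isValid(v)) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

Filtering before the append step means rejected values never reach the target platform, which matches the order of operations described in part (2).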
The export function, import function, and exception handling function are developed in the Java language using the Eclipse 3.6 development tool.
Implementing the export function requires confirming that the network port between the Java program and the massive platform is open and that the Java program can successfully call the query interface provided by the massive platform.
Implementing the import function requires confirming that the network port between the Java program and the big data platform is open and that the Java program can successfully use the commands provided by the Hadoop big data platform to append the obtained data to the HDFS file system.
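One way the port check for either platform might be sketched is a TCP connection attempt with a timeout. This is an assumption about how "whether the network port is open" could be tested; the class name PortCheck, the method name, and the timeout parameter are illustrative and not taken from the disclosure.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch: tests whether a TCP port on a remote host accepts connections. */
class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host: treat all as "network abnormal".
            return false;
        }
    }
}
```

A successful connection only shows that the port is open; whether the query interface or the HDFS append actually works still has to be confirmed by the call itself.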
Implementing the network anomaly detection and data verification function: both when exporting data from the massive platform and when importing data into the big data platform, the program must check whether the network is abnormal and record the relevant log entry and date tag, so as to identify when the data ran into problems and, once the network is normal, export or import the affected data again; when importing data, the program checks whether the data is problematic and itself filters out abnormal data such as null and negative values, which improves the overall quality of the imported data and provides high-quality data for subsequent quality evaluation.
Compared with traditional Hadoop platform data import methods, the present method not only greatly improves data import efficiency but also improves the quality of the imported data, providing a good foundation for the accuracy of subsequent big data statistical analysis. It addresses the low efficiency, large error, and poor timeliness of traditional data import methods; the proposed method for automatically importing incremental data into the Hadoop platform greatly improves the efficiency of importing data into the Hadoop platform and ensures the timeliness and accuracy of the imported data.
Brief description of the drawings
The invention will be further described with reference to the accompanying drawings and examples.
Fig. 1 is the system flow chart of the present invention.
Detailed description of the embodiments
A method for automatically importing incremental data from a massive platform to a hadoop platform, in which the massive platform serves as the source database, the Hadoop big data platform serves as the target database, and an automatic incremental data import method developed in Java is used to import the incremental data in the source database into the target database;
In the Hadoop big data platform, MapReduce serves as the big data computing engine, the HDFS distributed file system stores unstructured and semi-structured data, and the HBase distributed database stores structured data; the Java automatic incremental data import program needs to run once a day, and because the massive platform is used less at night, the program's execution time is set to 24:00; the functions of the Java program are implemented in two parts:
(1) The Java program collects the incremental data from the massive platform; the massive platform provides an interface for querying data, which takes time range, table name, and query condition parameters, and the Java program calls this interface with the corresponding parameters configured to obtain the corresponding data; because the Java program obtains data from the massive platform by calling a remote interface, network anomalies can occur; to handle this, before executing the method that obtains data, the Java program first checks whether the network connection is normal; if it is abnormal, the program writes a log entry and records the corresponding time tag, and once the network is back to normal the Java program re-runs according to the recorded date tag and reacquires the data for that date; in addition, if a network anomaly occurs while the Java program is executing the method that obtains data, the data already obtained is not imported into the big data platform, and a log entry and tag are recorded in the same way;
(2) After the Java program has obtained the incremental data from the massive platform, it imports the data into the big data platform; before doing so, it first verifies whether the obtained data is normal, and abnormal data is rejected; the data obtained by the Java program is imported into the big data platform by appending; before importing, the Java program likewise first checks whether the network connection is normal, and if the network is abnormal it records a log entry and a date tag and does not execute the import method; in addition, if a network anomaly occurs while the import method is executing, the program records a log entry, a date tag, and a data tag, so that it is known which of the current date's data has already been imported into the big data platform, and data already imported into the big data platform is not imported again.
Verifying whether the obtained data is normal means checking whether the data contains null values or negative values.
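The daily run at 24:00 described in this embodiment might be scheduled as in the following sketch, which assumes the standard java.util.concurrent scheduler is acceptable for the deployment. The class name MidnightSchedule and both method names are illustrative; the disclosure does not say how the program is triggered.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: runs the import job once a day at local midnight. */
class MidnightSchedule {
    /** Time remaining from now until the next 00:00, that is, 24:00 of the current day. */
    static Duration untilNextMidnight(LocalDateTime now) {
        LocalDateTime nextMidnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, nextMidnight);
    }

    /** Schedules importJob to first run at the next midnight and then every 24 hours. */
    static ScheduledExecutorService scheduleDaily(Runnable importJob) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMs = untilNextMidnight(LocalDateTime.now()).toMillis();
        scheduler.scheduleAtFixedRate(
                importJob, initialDelayMs, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A fixed 24-hour period is the simplest choice; an operating-system scheduler such as cron on the CentOS servers would be an equally plausible alternative.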
Deploy the hardware and software environment: install the CentOS 6.5 operating system on the cluster servers and, once it is installed, configure the network environment; because the Java program depends on the Hadoop big data platform, the Hadoop-related software must be installed on the server cluster.
The export function, import function, and exception handling function are developed in the Java language using the Eclipse 3.6 development tool.
Implementing the export function requires confirming that the network port between the Java program and the massive platform is open and that the Java program can successfully call the query interface provided by the massive platform.
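The parameters passed to the query interface (time range, table name, query condition) might be assembled for one day's increment as sketched below. The parameter keys, the class name IncrementQuery, and the half-open time window are assumptions made for illustration; the actual interface of the massive platform is not described in the disclosure.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: builds the query parameters for one day's incremental data. */
class IncrementQuery {
    /** Covers the window [day 00:00, next day 00:00) for the given table and condition. */
    static Map<String, String> forDay(String tableName, LocalDate day, String condition) {
        LocalDateTime start = day.atStartOfDay();
        LocalDateTime end = day.plusDays(1).atStartOfDay();
        Map<String, String> params = new LinkedHashMap<>();
        params.put("tableName", tableName);       // assumed key name
        params.put("startTime", start.toString()); // ISO-8601 local date-time
        params.put("endTime", end.toString());
        params.put("condition", condition);
        return params;
    }
}
```

Deriving the window from a date makes a re-run for a recorded date tag request exactly the same range as the original run.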
Implementing the import function requires confirming that the network port between the Java program and the big data platform is open and that the Java program can successfully use the commands provided by the Hadoop big data platform to append the obtained data to the HDFS file system.
The code is as follows:
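import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

String hdfs_path = "hdfs://mycluster/home/wyp/wyp.txt"; // target file on HDFS
String inpath = "/home/wyp/append.txt"; // local file whose contents are appended
Configuration conf = new Configuration();
conf.setBoolean("dfs.support.append", true); // allow appending to existing files
try {
    FileSystem fs = FileSystem.get(URI.create(hdfs_path), conf);
    // Stream of the local file to be appended
    InputStream in = new BufferedInputStream(new FileInputStream(inpath));
    OutputStream out = fs.append(new Path(hdfs_path));
    // Copy in 4096-byte buffers; the final 'true' closes both streams afterwards
    IOUtils.copyBytes(in, out, 4096, true);
} catch (IOException e) {
    e.printStackTrace();
}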
Implementing the network anomaly detection and data verification function: both when exporting data from the massive platform and when importing data into the big data platform, the program must check whether the network is abnormal and record the relevant log entry and date tag, so as to identify when the data ran into problems and, once the network is normal, export or import the affected data again; when importing data, the program checks whether the data is problematic and itself filters out abnormal data such as null and negative values, which improves the overall quality of the imported data and provides high-quality data for subsequent quality evaluation.
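The data tags mentioned in part (2), which let an interrupted import resume without importing anything twice, might be kept as in the following sketch. It assumes each record carries a unique string identifier that can serve as its data tag; the class name ImportLedger and its methods are hypothetical, and a real program would persist the set rather than hold it in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: tracks which records have already reached the target platform. */
class ImportLedger {
    // Data tags (here, record IDs) of records that were imported successfully.
    private final Set<String> imported = new HashSet<>();

    /** Marks one record as imported after its append has succeeded. */
    void markImported(String recordId) {
        imported.add(recordId);
    }

    /** Returns the records of a batch that still need importing, in batch order. */
    List<String> pending(List<String> batchIds) {
        List<String> todo = new ArrayList<>();
        for (String id : batchIds) {
            if (!imported.contains(id)) {
                todo.add(id);
            }
        }
        return todo;
    }
}
```

After a network anomaly in the middle of an import, re-running the date's batch through pending yields only the records that were not yet appended.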
Claims (7)
1. A method for automatically importing incremental data from a massive platform to a hadoop platform, characterized in that: the massive platform serves as the source database, the Hadoop big data platform serves as the target database, and an automatic incremental data import method developed in Java is used to import the incremental data in the source database into the target database;
In the Hadoop big data platform, MapReduce serves as the big data computing engine, the HDFS distributed file system stores unstructured and semi-structured data, and the HBase distributed database stores structured data; the Java automatic incremental data import program needs to run once a day, and the functions of the Java program are implemented in two parts:
(1) The Java program collects the incremental data from the massive platform; the massive platform provides an interface for querying data, which takes time range, table name, and query condition parameters, and the Java program calls this interface with the corresponding parameters configured to obtain the corresponding data; because the Java program obtains data from the massive platform by calling a remote interface, network anomalies can occur; to handle this, before executing the method that obtains data, the Java program first checks whether the network connection is normal; if it is abnormal, the program writes a log entry and records the corresponding time tag, and once the network is back to normal the Java program re-runs according to the recorded date tag and reacquires the data for that date; in addition, if a network anomaly occurs while the Java program is executing the method that obtains data, the data already obtained is not imported into the big data platform, and a log entry and tag are recorded in the same way;
(2) After the Java program has obtained the incremental data from the massive platform, it imports the data into the big data platform; before doing so, it first verifies whether the obtained data is normal, and abnormal data is rejected; the data obtained by the Java program is imported into the big data platform by appending; before importing, the Java program likewise first checks whether the network connection is normal, and if the network is abnormal it records a log entry and a date tag and does not execute the import method; in addition, if a network anomaly occurs while the import method is executing, the program records a log entry, a date tag, and a data tag, so that it is known which of the current date's data has already been imported into the big data platform, and data already imported into the big data platform is not imported again.
2. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 1, characterized in that: verifying whether the obtained data is normal means checking whether the data contains null values or negative values.
3. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 1, characterized in that: the export function, import function, and exception handling function are developed in the Java language using the Eclipse 3.6 development tool.
4. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 2, characterized in that: the export function, import function, and exception handling function are developed in the Java language using the Eclipse 3.6 development tool.
5. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 1, characterized in that: implementing the export function requires confirming that the network port between the Java program and the massive platform is open and that the Java program can successfully call the query interface provided by the massive platform.
6. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 1, characterized in that: implementing the import function requires confirming that the network port between the Java program and the big data platform is open and that the Java program can successfully use the commands provided by the Hadoop big data platform to append the obtained data to the HDFS file system.
7. The method for automatically importing incremental data from a massive platform to a hadoop platform according to claim 1, characterized in that: implementing the network anomaly detection and data verification function: both when exporting data from the massive platform and when importing data into the big data platform, the program must check whether the network is abnormal and record the relevant log entry and date tag, so as to identify when the data ran into problems and, once the network is normal, export or import the affected data again; when importing data, the program checks whether the data is problematic and itself filters out abnormal data such as null and negative values, which improves the overall quality of the imported data and provides high-quality data for subsequent quality evaluation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610801065.5A CN106407329B (en) | 2016-09-05 | 2016-09-05 | Magnanimity platform automates the method for importing incremental data toward hadoop platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610801065.5A CN106407329B (en) | 2016-09-05 | 2016-09-05 | Magnanimity platform automates the method for importing incremental data toward hadoop platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106407329A true CN106407329A (en) | 2017-02-15 |
CN106407329B CN106407329B (en) | 2019-06-25 |
Family
ID=57999461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610801065.5A Active CN106407329B (en) | 2016-09-05 | 2016-09-05 | Magnanimity platform automates the method for importing incremental data toward hadoop platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106407329B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107085613A (en) * | 2017-05-17 | 2017-08-22 | 广州四三九九信息科技有限公司 | Enter the filter method and device of library file |
CN108596686A (en) * | 2018-05-09 | 2018-09-28 | 珠海横琴盛达兆业科技投资有限公司 | A method of realizing that chain partner drugstore management system and Foshan food medicine prison system data are integrated |
CN108647362A (en) * | 2018-05-21 | 2018-10-12 | 珠海横琴盛达兆业科技投资有限公司 | A method of realizing that Mono-drugstore management system and Foshan food medicine prison system data are integrated |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102546247A (en) * | 2011-12-29 | 2012-07-04 | 华中科技大学 | Massive data continuous analysis system suitable for stream processing |
CN105243067A (en) * | 2014-07-07 | 2016-01-13 | 北京明略软件系统有限公司 | Method and apparatus for realizing real-time increment synchronization of data |
US20160196311A1 (en) * | 2013-09-26 | 2016-07-07 | Shenzhen Audaque Data Technology Ltd | Data quality measurement method and system based on a quartile graph |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102546247A (en) * | 2011-12-29 | 2012-07-04 | 华中科技大学 | Massive data continuous analysis system suitable for stream processing |
US20160196311A1 (en) * | 2013-09-26 | 2016-07-07 | Shenzhen Audaque Data Technology Ltd | Data quality measurement method and system based on a quartile graph |
CN105243067A (en) * | 2014-07-07 | 2016-01-13 | 北京明略软件系统有限公司 | Method and apparatus for realizing real-time increment synchronization of data |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107085613A (en) * | 2017-05-17 | 2017-08-22 | 广州四三九九信息科技有限公司 | Enter the filter method and device of library file |
CN107085613B (en) * | 2017-05-17 | 2020-07-28 | 广州四三九九信息科技有限公司 | Method and device for filtering files to be put in storage |
CN108596686A (en) * | 2018-05-09 | 2018-09-28 | 珠海横琴盛达兆业科技投资有限公司 | A method of realizing that chain partner drugstore management system and Foshan food medicine prison system data are integrated |
CN108647362A (en) * | 2018-05-21 | 2018-10-12 | 珠海横琴盛达兆业科技投资有限公司 | A method of realizing that Mono-drugstore management system and Foshan food medicine prison system data are integrated |
Also Published As
Publication number | Publication date |
---|---|
CN106407329B (en) | 2019-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||