CN109710667A - Method and system for implementing multi-source data fusion and sharing based on a big data platform - Google Patents


Info

Publication number
CN109710667A
Authority
CN
China
Legal status
Pending
Application number
CN201811426832.4A
Other languages
Chinese (zh)
Inventor
张帅
谢莹莹
郭庆
宋怀明
蒋丹东
Current Assignee
Zhongke Dawning International Information Industry Co Ltd
Dawning Information Industry Co Ltd
Original Assignee
Zhongke Dawning International Information Industry Co Ltd
Application filed by Zhongke Dawning International Information Industry Co Ltd
Priority to CN201811426832.4A
Publication of CN109710667A
Legal status: Pending (current)


Abstract

The present invention provides a method and system for implementing multi-source data fusion and sharing based on a big data platform. The method includes: configuring information for at least one data source together with timing rules, and executing data access jobs according to the configured timing rules, where a data access job extracts data from the at least one acquired data source, collects Internet data, receives data, or loads data into the big data platform; performing data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs; storing the fused data in layered, partitioned databases to form a repository, and building a secondary index library on the repository; and sharing data by providing a unified data exchange interface on the constructed big data platform. Faced with different scenarios and multi-source data, the invention needs only flexible configuration rather than new development, which greatly improves the efficiency of deploying projects online and greatly simplifies data retrieval from the big data platform by upper-layer applications.

Description

Method and system for implementing multi-source data fusion and sharing based on a big data platform
Technical field
The present invention relates to the field of big data technology, and in particular to a method and system for implementing multi-source data fusion and sharing based on a big data platform.
Background
In recent years, with the rapid development of the Internet, social networks, cloud computing, search engines, communication technology and other IT technologies, hundreds of millions of users generate large amounts of data every day. The emergence of large-scale data brings valuable opportunities to many industries, but this data also has typical characteristics such as large scale, multiple sources, diverse types and schemas (heterogeneity), high dimensionality and uneven quality, which pose great challenges at every stage, including data representation, understanding, computation and application. Data quality is the bottleneck that restricts data use. As important techniques for improving data quality, data cleaning and data fusion are active research areas in multi-source heterogeneous data processing and are of significant value. However, traditional data cleaning methods implement business logic through hard coding, which leaves the resulting systems with poor reusability, scalability and flexibility. In addition, many real-world applications need to integrate heterogeneous data from different channels, and ensuring the consistency of that data, that is, entity recognition, has become a problem that must be solved.
At present, the deployed traffic systems include 31 traffic service systems, 194 signal-controlled intersections, 66 road sections with police speed-measurement checkpoints, red-light enforcement cameras at 192 intersections, 86 traffic guidance systems, 369 traffic flow monitoring systems, 652 road video cameras, 32 high-altitude HD video systems, 45 vehicle-mounted 3G video systems, 248 incident monitoring systems, 273 mobile enforcement terminals, and so on.
By data source, "big data" in the traffic management field mainly includes: archival data collected administratively, such as records of motor vehicles, drivers and roads; vehicle and driver information collected by on-road law enforcement officers, investigated traffic violations and handled traffic accidents; road and traffic information, videos, images, traffic volumes, GPS trajectories and other data collected automatically by roadside electronic monitoring equipment; fragmented data related to the various traffic management public services; and data exchanged with related departments such as population, insurance, taxation and planning. By type, this data includes images, videos and two-dimensional tables, i.e. structured, semi-structured and unstructured data; by channel, it covers application scenarios such as traditional service counters, the Internet and the mobile Internet.
Therefore, there is a need, driven by actual business requirements and accumulated data, to use advanced big data technology to build an efficient, stable, high-performance big data foundation platform that collects multi-source heterogeneous data and uses a unified big data storage and processing architecture to provide data access, data fusion, data storage, data computation, data sharing and related capabilities, giving strong support to all kinds of big data applications.
During enterprise informatization, factors such as the different stages at which each business system was built and its data management system implemented, technical choices, and other economic and human considerations have led enterprises to accumulate large amounts of business data stored in different ways. The data management systems in use also differ widely, ranging from simple file databases to complex network databases, and together they make up the enterprise's heterogeneous data sources.
Existing solutions usually incur high time overhead, and their running time can grow exponentially as the number of attribute dimensions in a data set increases. In a big data environment, large structural differences between data sets, a wide range of data sources, low value density and real-time updates pose great challenges to multi-source data fusion technology, while fusing multi-source heterogeneous data gives researchers a very effective means of acquiring, organizing and using knowledge in a big data environment. However, current knowledge fusion methods still have many shortcomings in moving from theory to practice.
Summary of the invention
The method and system for implementing multi-source data fusion and sharing based on a big data platform provided by the present invention can handle different scenarios and multi-source data through flexible configuration alone, without new development, which greatly improves the efficiency of deploying projects online and greatly simplifies data retrieval from the big data platform by upper-layer applications.
In a first aspect, the present invention provides a method for implementing multi-source data fusion and sharing based on a big data platform, comprising:
configuring information for at least one data source and timing rules, and executing data access jobs according to the configured timing rules, where a data access job extracts data from the at least one acquired data source, collects Internet data, receives data, or loads data into the big data platform;
performing data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs;
storing the data output by the data fusion jobs in layered, partitioned databases to form a repository, and building a secondary index library on the repository; and
sharing data by providing a unified data exchange interface on the constructed big data platform.
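The configuration-driven scheduling in the first step can be sketched in Java as below. This is a minimal illustration under assumed names (`DataSourceConfig`, `AccessScheduler` and the access-type labels are hypothetical); the patent does not disclose its configuration format or scheduler, and a real deployment would more likely hand due jobs to a scheduling framework than poll like this.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical configuration entry: one data source and the timing rule for its access job.
final class DataSourceConfig {
    final String name;        // illustrative source name
    final String accessType;  // assumed labels: "extract", "crawl", "receive", "load"
    final Duration period;    // timing rule: run the access job once per period

    DataSourceConfig(String name, String accessType, Duration period) {
        this.name = name;
        this.accessType = accessType;
        this.period = period;
    }
}

// Picks the configured sources whose access job is due. Adding a source means
// adding a DataSourceConfig entry, not writing new access code.
final class AccessScheduler {
    private final List<DataSourceConfig> sources = new ArrayList<>();
    private final Map<String, Instant> lastRun = new HashMap<>();

    void register(DataSourceConfig cfg) {
        sources.add(cfg);
    }

    // A source is due if it has never run or at least one full period has elapsed.
    List<String> dueSources(Instant now) {
        List<String> due = new ArrayList<>();
        for (DataSourceConfig cfg : sources) {
            Instant last = lastRun.get(cfg.name);
            if (last == null || !now.isBefore(last.plus(cfg.period))) {
                due.add(cfg.name);
            }
        }
        return due;
    }

    void markRun(String name, Instant when) {
        lastRun.put(name, when);
    }
}
```

Keeping the period in configuration rather than in code is what lets a new source be onboarded without redevelopment, which is the benefit the text claims.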
Optionally, performing data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs includes:
when the accessed data is record-level data, the fusion job on the record-level data includes checking each record against the configured conditions;
when the accessed data is field-level data, the fusion job on the field-level data includes field verification or field conversion.
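The split between record-level and field-level fusion can be sketched as follows. The rule types (a predicate per record, a regex per field, a conversion function per field) and the class name `FusionRules` are assumptions for illustration; the patent only states that record-level fusion checks records against conditions and field-level fusion verifies or converts fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical rule set combining record-level checks with field-level verification/conversion.
final class FusionRules {
    private final List<Predicate<String[]>> recordConditions = new ArrayList<>();
    private final Map<Integer, Pattern> fieldChecks;                       // field index -> required pattern
    private final Map<Integer, Function<String, String>> fieldConversions; // field index -> converter

    FusionRules(Map<Integer, Pattern> fieldChecks, Map<Integer, Function<String, String>> fieldConversions) {
        this.fieldChecks = new LinkedHashMap<>(fieldChecks);
        this.fieldConversions = new LinkedHashMap<>(fieldConversions);
    }

    // Record level: every condition must hold for the record as a whole.
    FusionRules addRecordCondition(Predicate<String[]> condition) {
        recordConditions.add(condition);
        return this;
    }

    // Returns a fused copy of the record, or null if any record- or field-level check fails.
    String[] apply(String[] record) {
        for (Predicate<String[]> condition : recordConditions) {
            if (!condition.test(record)) return null;
        }
        // Field level: verify first, then convert, on a copy so the input is untouched.
        String[] out = record.clone();
        for (Map.Entry<Integer, Pattern> e : fieldChecks.entrySet()) {
            int i = e.getKey();
            if (i >= out.length || out[i] == null || !e.getValue().matcher(out[i]).matches()) return null;
        }
        for (Map.Entry<Integer, Function<String, String>> e : fieldConversions.entrySet()) {
            int i = e.getKey();
            if (i < out.length && out[i] != null) out[i] = e.getValue().apply(out[i]);
        }
        return out;
    }
}
```

For instance, a rule set could require three fields per record, verify a plate number with `[A-Z0-9]{5,8}` and normalize dates from `2018/11/27` to `2018-11-27`, all declared as data rather than hard-coded logic.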
Optionally, the data fusion job processes the data to be fused using an ETL method, wherein
the ETL implementation classes in the ETL method use the decorator pattern, and a corresponding configuration file is configured so that a filtering step, a conversion step and a filtering step are applied in sequence.
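A minimal decorator-pattern sketch of the filter → convert → filter sequence is shown below. The concrete rules (drop blank lines, upper-case, require a fixed field count) and all class names are hypothetical stand-ins; the patent does not publish its ETL classes.

```java
import java.util.Locale;

// Component interface: each ETL step maps one input line to one output line, or null to drop it.
interface EtlStep {
    String process(String line);
}

// Innermost component: passes the raw line through unchanged.
final class Identity implements EtlStep {
    public String process(String line) {
        return line;
    }
}

// Base decorator: runs the wrapped step first, then applies its own rule to the result.
abstract class EtlDecorator implements EtlStep {
    private final EtlStep inner;

    EtlDecorator(EtlStep inner) {
        this.inner = inner;
    }

    public final String process(String line) {
        String r = inner.process(line);
        return r == null ? null : apply(r);  // a dropped line stays dropped
    }

    protected abstract String apply(String line);
}

// Filter: drops blank lines (illustrative rule).
final class NonBlankFilter extends EtlDecorator {
    NonBlankFilter(EtlStep inner) { super(inner); }
    protected String apply(String line) { return line.trim().isEmpty() ? null : line; }
}

// Conversion: upper-cases the line (illustrative rule).
final class UpperCaseConverter extends EtlDecorator {
    UpperCaseConverter(EtlStep inner) { super(inner); }
    protected String apply(String line) { return line.toUpperCase(Locale.ROOT); }
}

// Filter: drops lines without the expected number of comma-separated fields.
final class FieldCountFilter extends EtlDecorator {
    private final int expected;
    FieldCountFilter(EtlStep inner, int expected) { super(inner); this.expected = expected; }
    protected String apply(String line) { return line.split(",", -1).length == expected ? line : null; }
}
```

The chain `new FieldCountFilter(new UpperCaseConverter(new NonBlankFilter(new Identity())), 3)` applies the filter, the conversion and the second filter in that order; swapping or adding steps changes only how the objects are wrapped.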
Optionally, storing the data output by the data fusion jobs in layered, partitioned databases to form a repository, and building a secondary index library on the repository, includes:
inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey fields, and thematic library name;
instantiating a connection according to the HBase connection type and the thematic library name;
reading the most recent load date or time recorded for the corresponding data type, and calculating the interval since the last load;
determining whether the interval is greater than the time period configured in the timing rule;
when the interval is greater than the configured time period, logging that the previous one or more load cycles failed, then auditing the log and performing a reload;
otherwise, when the interval is not greater than the configured time period, splitting the records one by one using the supplied separator;
comparing the length of each split array with the supplied total number of fields, and keeping only the records for which the two match;
combining fields into the primary key (rowkey) according to the supplied field indices;
putting the data into HBase;
after execution completes, recording that the load for the current time period succeeded.
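The load steps above can be sketched as one method. To keep the example self-contained, a `Map` stands in for the HBase table, and the class name `PeriodicLoader` and the `_` rowkey delimiter are assumptions; a real job would open an HBase `Connection`, obtain a `Table` for the thematic library and issue `Put` operations instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for one scheduled load into a thematic library.
final class PeriodicLoader {
    final List<String> log = new ArrayList<>();
    final Map<String, String[]> table = new LinkedHashMap<>();  // plays the role of the HBase table
    private Instant lastSuccess;

    PeriodicLoader(Instant lastSuccess) {
        this.lastSuccess = lastSuccess;
    }

    // Returns false when missed cycles are detected; the caller should audit the log and reload.
    // keyFields must contain indices smaller than fieldCount.
    boolean load(List<String> lines, String separator, int fieldCount,
                 int[] keyFields, Duration period, Instant now) {
        Duration interval = Duration.between(lastSuccess, now);
        if (interval.compareTo(period) > 0) {
            log.add("interval " + interval + " exceeds period " + period
                    + ": previous cycle(s) failed, reload required");
            return false;
        }
        String sep = Pattern.quote(separator);  // treat the separator literally, e.g. "|"
        for (String line : lines) {
            String[] fields = line.split(sep, -1);      // -1 keeps trailing empty fields
            if (fields.length != fieldCount) continue;   // keep only rows with the expected field count
            StringBuilder rowKey = new StringBuilder();
            for (int k = 0; k < keyFields.length; k++) {
                if (k > 0) rowKey.append('_');           // assumed rowkey delimiter
                rowKey.append(fields[keyFields[k]]);
            }
            table.put(rowKey.toString(), fields);        // stands in for Table.put(new Put(...))
        }
        lastSuccess = now;
        log.add("cycle ending " + now + " loaded successfully");
        return true;
    }
}
```

Checking the interval before touching any data means a gap in the schedule is surfaced as an auditable log entry instead of silently producing an incomplete load.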
Optionally, after storing the fused data in layered, partitioned databases to form the repository and building the secondary index library on the repository, the method further includes:
providing configurable scripts to automate the creation of databases and tables and the loading of data.
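One way such a configurable script could work is to generate HBase shell DDL from a table configuration, as sketched below. The configuration shape (`namespace:table` mapped to column families) and the example names are assumptions; only the `create_namespace` and `create` shell commands are standard HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical generator: turns "namespace:table" -> column families into HBase shell DDL,
// so a new library or table needs a configuration entry rather than a hand-written script.
final class DdlScriptGenerator {
    static List<String> generate(Map<String, List<String>> tables) {
        List<String> script = new ArrayList<>();
        Set<String> namespaces = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : tables.entrySet()) {
            String fullName = e.getKey();
            int colon = fullName.indexOf(':');
            // Emit create_namespace once, the first time a namespace appears.
            if (colon > 0 && namespaces.add(fullName.substring(0, colon))) {
                script.add("create_namespace '" + fullName.substring(0, colon) + "'");
            }
            StringBuilder create = new StringBuilder("create '" + fullName + "'");
            for (String family : e.getValue()) {
                create.append(", '").append(family).append('\'');
            }
            script.add(create.toString());
        }
        return script;
    }
}
```

With a configuration of `base:vehicle` → `[info]` and `topic:violation` → `[d, idx]`, the generator yields `create_namespace 'base'`, `create 'base:vehicle', 'info'`, `create_namespace 'topic'` and `create 'topic:violation', 'd', 'idx'`, which could be saved to a file and passed to the HBase shell.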
Optionally, sharing data by providing a unified data exchange interface on the big data platform includes:
when the data sharing is data query sharing, providing upper-layer applications with a shared service in request-response mode through a Java API or REST;
when the data sharing is data retrieval, constraining retrieval permissions through access control in system administration, where by default retrieval can return any requested data;
when the data sharing is data access, recording data access logs through the external sharing interface.
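The three sharing behaviours (request-response query, default-allow retrieval constrained by access control, and access logging) can be combined in a small facade. `SharingGateway`, its deny-list model and the log format are hypothetical; the patent does not specify how permissions are represented, and a REST endpoint would simply delegate to a method like `query`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical facade behind the unified exchange interface (Java API or REST).
final class SharingGateway {
    private final Map<String, Map<String, String>> store;            // table -> (rowkey -> value)
    private final Map<String, Set<String>> denied = new HashMap<>(); // user -> tables they may not read
    final List<String> accessLog = new ArrayList<>();

    SharingGateway(Map<String, Map<String, String>> store) {
        this.store = store;
    }

    // Access control in system administration adds constraints; without one, any request is served.
    void deny(String user, String table) {
        denied.computeIfAbsent(user, u -> new HashSet<>()).add(table);
    }

    // Request-response query: returns the value, or empty if denied or absent.
    Optional<String> query(String user, String table, String rowKey) {
        boolean allowed = !denied.getOrDefault(user, Set.of()).contains(table);
        // Every call through the external interface is logged, whether allowed or not.
        accessLog.add(user + " " + table + " " + rowKey + (allowed ? " ALLOWED" : " DENIED"));
        if (!allowed) return Optional.empty();
        return Optional.ofNullable(store.getOrDefault(table, Map.of()).get(rowKey));
    }
}
```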
In a second aspect, the present invention provides a system for implementing multi-source data fusion and sharing based on a big data platform, comprising:
a configuration module, configured to configure information for at least one data source and timing rules;
a data access module, configured to execute data access jobs according to the configured timing rules, where a data access job extracts data from the at least one acquired data source, collects Internet data, receives data, or loads data into the big data platform;
a data fusion module, configured to perform data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs;
a storage module, configured to store the data output by the data fusion jobs in layered, partitioned databases to form a repository and to build a secondary index library on the repository; and
a data sharing module, configured to share data by providing a unified data exchange interface on the constructed big data platform.
Optionally, the data fusion module includes:
a first fusion submodule, configured so that, when the accessed data is record-level data, the fusion job on the record-level data includes checking each record against the configured conditions;
a second fusion submodule, configured so that, when the accessed data is field-level data, the fusion job on the field-level data includes field verification or field conversion.
Optionally, the storage module includes:
a parameter input submodule, configured to input one or any combination of the following parameters: data directory, number of data fields, data rowkey fields, and thematic library name;
a connection instantiation submodule, configured to instantiate a connection according to the HBase connection type and the thematic library name;
a calculation submodule, configured to read the most recent load date or time recorded for the corresponding data type and calculate the interval since the last load;
a judgment submodule, configured to determine whether the interval is greater than the time period configured in the timing rule;
a first operation submodule, configured to, when the interval is greater than the configured time period, log that the previous one or more load cycles failed, then audit the log and perform a reload;
a second operation submodule, configured to, when the interval is not greater than the configured time period, split the records one by one using the supplied separator; compare the length of each split array with the supplied total number of fields and keep only the records for which the two match; combine fields into the primary key (rowkey) according to the supplied field indices; put the data into HBase; and, after execution completes, record that the load for the current time period succeeded.
Optionally, the data sharing module includes:
a data query sharing submodule, configured to provide upper-layer applications with a shared service in request-response mode through a Java API or REST;
a data retrieval submodule, configured to constrain retrieval permissions through access control in system administration, where by default retrieval can return any requested data;
a data access submodule, configured to record data access logs through the external sharing interface.
In the method and system for implementing multi-source data fusion and sharing based on a big data platform provided by the embodiments of the present invention, the method mainly works by flexibly configuring, directly on the big data platform, the data source information and the timing rule for each data processing job. First, because at least one data source can be configured directly and flexibly, the method can handle different scenarios and multi-source data through configuration alone, without new development; the entire data access and loading process is automated, which greatly improves the efficiency of deploying projects online. Second, the method can configure a timing rule for each job and run data access and data fusion jobs according to those rules, so that the platform can use a scheduled-job framework to access multi-source heterogeneous data automatically and incrementally. Third, the method stores data uniformly in layered, partitioned databases to form a repository, for example by providing configurable unified storage for the multi-source heterogeneous data to be stored, and additionally builds a secondary index library on that unified repository to speed up queries over multi-source data on the platform. Fourth, the method shares query results through a unified data exchange interface, which greatly reduces the complexity of data retrieval from the big data platform for upper-layer applications.
Description of the drawings
Fig. 1 is a flowchart of a method for implementing multi-source data fusion and sharing based on a big data platform according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for implementing multi-source data fusion and sharing based on a big data platform according to another embodiment of the present invention;
Fig. 3 is a flowchart of the data fusion job according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a system for implementing multi-source data fusion and sharing based on a big data platform according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for implementing multi-source data fusion and sharing based on a big data platform. As shown in Fig. 1, the method includes:
S11: configuring information for at least one data source and the timing rule for each job, and executing data access jobs according to the configured timing rules, where a data access job extracts data from the at least one acquired data source, collects Internet data, receives data, or loads data into the big data platform;
S12: performing data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs;
S13: storing the data output by the data fusion jobs in layered, partitioned databases to form a repository, and building a secondary index library on the repository;
S14: sharing data by providing a unified data exchange interface on the constructed big data platform.
The method provided by this embodiment of the present invention mainly works by flexibly configuring, directly on the big data platform, the data source information and the timing rule for each data processing job. First, because at least one data source can be configured directly and flexibly, the method can handle different scenarios and multi-source data through configuration alone, without new development; the entire data access and loading process is automated, which greatly improves the efficiency of deploying projects online. Second, the method can configure a timing rule for each job and run data access and data fusion jobs according to those rules, so that the platform can use a scheduled-job framework to access multi-source heterogeneous data automatically and incrementally. Third, the method stores data uniformly in layered, partitioned databases to form a repository, for example by providing configurable unified storage for the multi-source heterogeneous data to be stored, and additionally builds a secondary index library on that unified repository to speed up queries over multi-source data on the platform. Fourth, the method shares query results through a unified data exchange interface, which greatly reduces the complexity of data retrieval from the big data platform for upper-layer applications.
Specifically, the data access job in the method of this embodiment extracts data from the data sources of multiple business systems and platforms, collects Internet data, receives data, or loads data into the big data platform. Data extraction uses a data extraction client: data is extracted by configuring the data, defining collection rules and mounting an extraction job, and the extraction process does not affect the normal operation of the original system;
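One common way to extract data without disturbing the source system, and to support the incremental access mentioned in the summary, is watermark-based extraction, sketched below. This is a swapped-in technique, not one the patent names; `SourceRow`, `IncrementalExtractor` and the SQL in the comment are hypothetical. A plain watermark can miss rows committed late with an older timestamp, so production systems often add an overlap window or read a change log instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source row: a primary key, a last-modified timestamp and the payload.
final class SourceRow {
    final long id;
    final long updatedAt;  // e.g. epoch millis of the last change in the source system
    final String payload;

    SourceRow(long id, long updatedAt, String payload) {
        this.id = id;
        this.updatedAt = updatedAt;
        this.payload = payload;
    }
}

// Watermark-based incremental extraction: each run reads only rows changed since the previous run.
final class IncrementalExtractor {
    private long watermark;  // highest updatedAt already extracted

    IncrementalExtractor(long initialWatermark) {
        this.watermark = initialWatermark;
    }

    // 'source' stands in for a read-only query such as
    //   SELECT id, updated_at, payload FROM t WHERE updated_at > ?
    List<SourceRow> extract(List<SourceRow> source) {
        List<SourceRow> batch = new ArrayList<>();
        long newWatermark = watermark;
        for (SourceRow r : source) {
            if (r.updatedAt > watermark) {
                batch.add(r);
                newWatermark = Math.max(newWatermark, r.updatedAt);
            }
        }
        watermark = newWatermark;  // advance only after the whole batch has been read
        return batch;
    }

    long watermark() {
        return watermark;
    }
}
```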
data reception receives source data, either from within the system or from data sources outside it, and can be provided as two functional modules: a data reception service and a data collection client;
Internet data acquisition collects web page data from the Internet using the collection URL (Uniform Resource Locator) addresses, configuration and rules provided by the user, and ultimately produces HDFS (Hadoop Distributed File System) files.
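A user-supplied collection rule could be as simple as a URL plus a regular expression whose capture groups become output columns, as in the sketch below. `CrawlRule` is hypothetical, and both fetching (for example with `java.net.http.HttpClient`) and writing to HDFS (for example through Hadoop's `FileSystem` API) are omitted so the example runs on its own. Regular expressions are brittle on real HTML; an HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical user-supplied collection rule: a URL plus a regex whose capture groups are the columns.
final class CrawlRule {
    final String url;       // page to fetch (fetching itself is omitted here)
    final Pattern pattern;  // each match becomes one output line

    CrawlRule(String url, String regex) {
        this.url = url;
        this.pattern = Pattern.compile(regex);
    }

    // Turns a fetched page body into tab-separated lines, ready to be written to an HDFS file.
    List<String> extract(String pageBody) {
        List<String> lines = new ArrayList<>();
        Matcher m = pattern.matcher(pageBody);
        while (m.find()) {
            List<String> cols = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                cols.add(m.group(g) == null ? "" : m.group(g));  // unmatched optional group -> empty column
            }
            lines.add(String.join("\t", cols));
        }
        return lines;
    }
}
```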
Optionally, as shown in Fig. 2, performing data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs includes:
when the accessed data is record-level data, the fusion job on the record-level data includes checking each record against the configured conditions, where the accessed data may be in a non-heterogeneous or heterogeneous format;
when the accessed data is field-level data, the fusion job on the field-level data includes field verification or field conversion.
Optionally, the data fusion job processes the data to be fused using an ETL (Extraction-Transformation-Loading, i.e. the process of data extraction (Extract), transformation (Transform) and loading (Load)) method, wherein
the ETL implementation classes in the ETL method use the decorator pattern, and a corresponding configuration file is configured so that a filtering step, a conversion step and a filtering step are applied in sequence.
Specifically, data fusion in the method of this embodiment is carried out by configurable data fusion jobs, covering both record-level and field-level data fusion. Fusion jobs on record-level data include cleaning and checking records against various conditions; fusion jobs on field-level data include operations such as field verification and field conversion. As shown in Fig. 3, the data fusion job passes the HDFS file corresponding to the data to be fused through the TextInputETLMapper and TextInputETLReducer frameworks for fusion processing, ultimately producing a new HDFS file; this process invokes the same processing functions repeatedly. In addition, the ETL implementation classes use the decorator pattern and are configured in a configuration file, for example as filter A (FilterA) → filter B (FilterB) → conversion A (TransferA) → filter A (FilterA) → filter B (FilterB), and the job is then submitted for execution by the job scheduling module.
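Assembling the chain from a configuration file, as in the FilterA → FilterB → TransferA → FilterA → FilterB example, might look like the sketch below. The property key `etl.chain`, the registry and the example step behaviours are assumptions; the patent names the steps but not their logic or the file format. A step returning `null` drops the line, and the remaining steps are skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.UnaryOperator;

// Hypothetical builder: named ETL steps are registered once, and the order is read from configuration.
final class EtlChainBuilder {
    private final Map<String, UnaryOperator<String>> registry = new HashMap<>();

    EtlChainBuilder register(String name, UnaryOperator<String> step) {
        registry.put(name, step);
        return this;
    }

    // Reads e.g. etl.chain=FilterA,FilterB,TransferA,FilterA,FilterB and composes the steps in order.
    UnaryOperator<String> build(Properties conf) {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        for (String raw : conf.getProperty("etl.chain", "").split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) continue;
            UnaryOperator<String> step = registry.get(name);
            if (step == null) throw new IllegalArgumentException("unknown ETL step: " + name);
            steps.add(step);
        }
        return line -> {
            String cur = line;
            for (UnaryOperator<String> step : steps) {
                if (cur == null) return null;  // an earlier filter dropped the line
                cur = step.apply(cur);
            }
            return cur;
        };
    }
}
```

Because the order lives in configuration, changing the pipeline for a new data source is an edit to the properties file rather than a code change; in a MapReduce job, the composed operator would be applied to each input line inside the mapper.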
Optionally, storing the data output by the data fusion jobs in layered, partitioned databases to form a repository, and building a secondary index library on the repository, includes:
inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey fields, and thematic library name;
instantiating a connection according to the HBase connection type and the thematic library name;
reading the most recent load date or time recorded for the corresponding data type, and calculating the interval since the last load;
determining whether the interval is greater than the time period configured in the timing rule;
when the interval is greater than the configured time period, logging that the previous one or more load cycles failed, then auditing the log and performing a reload;
otherwise, when the interval is not greater than the configured time period, splitting the records one by one using the supplied separator;
comparing the length of each split array with the supplied total number of fields, and keeping only the records for which the two match;
combining fields into the primary key (rowkey) according to the supplied field indices;
putting the data into HBase (Hadoop Database, a distributed storage system);
after execution completes, recording that the load for the current time period succeeded.
Optionally, after storing the fused data in layered, partitioned databases to form the repository and building the secondary index library on the repository, the method further includes:
providing configurable scripts to automate the creation of databases and tables and the loading of data.
Specifically, the method of this embodiment stores the data output by the data fusion jobs in layered, partitioned databases to form a unified repository, which includes a base library, thematic libraries, a full-text library and so on. Configurable scripts then automate the creation of databases and tables and the loading of data, and a secondary index library is built on the repository to keep retrieval over big data efficient.
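The idea behind the secondary index library can be modelled in memory as below: an index maps the value of one non-key field to the rowkeys of the primary table, so a lookup by that field avoids a full scan. `SecondaryIndex` is hypothetical; in HBase this is often realized as a separate index table whose rowkey combines the indexed value with the primary rowkey, but the patent does not describe its index layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of a secondary index over one field of the primary table.
final class SecondaryIndex {
    private final Map<String, String[]> primary = new HashMap<>();   // rowkey -> record
    private final Map<String, Set<String>> index = new HashMap<>();  // field value -> rowkeys
    private final int indexedField;

    SecondaryIndex(int indexedField) {
        this.indexedField = indexedField;
    }

    void put(String rowKey, String[] record) {
        String[] old = primary.put(rowKey, record);
        if (old != null) {
            // Remove the stale index entry so the index stays consistent with the primary table.
            Set<String> keys = index.get(old[indexedField]);
            if (keys != null) keys.remove(rowKey);
        }
        index.computeIfAbsent(record[indexedField], v -> new TreeSet<>()).add(rowKey);
    }

    // Index lookup followed by point reads on the primary table (rowkeys in sorted order).
    List<String[]> findBy(String value) {
        List<String[]> result = new ArrayList<>();
        for (String rowKey : index.getOrDefault(value, Set.of())) {
            result.add(primary.get(rowKey));
        }
        return result;
    }
}
```

The cost of this design is extra work on every write to keep the index in step with the primary table, which is the usual trade-off for fast reads by a non-key field.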
Optionally, sharing data by providing a unified data exchange interface on the big data platform includes:
when the data sharing is data query sharing, providing upper-layer applications with a shared service in request-response mode through a Java API or REST;
when the data sharing is data retrieval, constraining retrieval permissions through access control in system administration, where by default retrieval can return any requested data;
when the data sharing is data access, recording data access logs through the external sharing interface.
Specifically, the data query sharing job in the method of this embodiment provides upper-layer applications with a shared service in request-response mode through two unified channels: a Java API (Application Programming Interface) and a REST service. The retrieval permission job is constrained by access control in system administration, and by default the corresponding module can return any requested data. The data access job records a data access log whenever the above methods are invoked through the external sharing interface.
An embodiment of the present invention also provides a system for implementing multi-source data fusion and sharing based on a big data platform. As shown in Fig. 4, the system includes:
a configuration module 11, configured to configure information for at least one data source and timing rules;
a data access module 12, configured to execute data access jobs according to the configured timing rules, where a data access job extracts data from the at least one acquired data source, collects Internet data, receives data, or loads data into the big data platform;
a data fusion module 13, configured to perform data fusion jobs, according to the configured timing rules, on the data accessed by the data access jobs;
a storage module 14, configured to store the data output by the data fusion jobs in layered, partitioned databases to form a repository and to build a secondary index library on the repository; and
a data sharing module 15, configured to share data by providing a unified data exchange interface on the constructed big data platform.
The system provided by this embodiment of the present invention mainly works by having the configuration module flexibly configure, directly on the big data platform, the data source information and the timing rule for each data processing job. First, because the configuration module configures at least one data source directly and flexibly, the system can handle different scenarios and multi-source data through configuration alone, without new development; the entire data access and loading process is automated, which greatly improves the efficiency of deploying projects online. Second, the configuration module can also configure a timing rule for each job, and the data access module and data fusion module run data access and data fusion jobs according to those rules, so that the platform can use a scheduled-job framework to access multi-source heterogeneous data automatically and incrementally. Third, the storage module stores data uniformly in layered, partitioned databases to form a repository, for example by providing configurable unified storage for the multi-source heterogeneous data to be stored, and additionally builds a secondary index library on that unified repository to speed up queries over multi-source data on the platform. Fourth, the data sharing module shares query results through a unified data exchange interface, which greatly reduces the complexity of data retrieval from the big data platform for upper-layer applications.
Optionally, the data fusion module includes:
a first fusion submodule, configured so that, when the accessed data is record-level data, the fusion job on the record-level data includes checking each record against the configured conditions;
a second fusion submodule, configured so that, when the accessed data is field-level data, the fusion job on the field-level data includes field verification or field conversion.
Optionally, the storage module includes:
A parameter input submodule, configured to input one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
A connection instantiation submodule, configured to instantiate a connection according to the HBase connection type and the thematic database name;
A computation submodule, configured to read the record of the most recent load date or time for the corresponding data type and compute the time interval since the last load;
A judging submodule, configured to judge whether the time interval is greater than the time period configured in the timing rule;
A first operation submodule, configured to, when the time interval is greater than the time period configured in the timing rule, log the previous one or more failed load cycles, then audit the log and perform a reload operation;
A second operation submodule, configured to, when the time interval is not greater than the time period configured in the timing rule, split each record by the incoming separator; compare the array length after splitting with the incoming total number of fields and keep only the records for which the two are equal; combine the fields at the incoming field indexes into the primary key; put the data into HBase; and, after execution, record that the load for this time period succeeded.
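The load procedure of the storage module can be sketched end to end. The Python below is a minimal illustration under several assumptions. A plain `dict` stands in for the HBase table, whereas a real implementation would typically go through the HBase Java client's `Table.put`. The rowkey is formed by joining the indexed fields with `_`. The name `load_batch` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def load_batch(lines, sep, field_count, key_indexes,
               table, last_load, now, period, log):
    """Load one batch of delimited records into `table` (a dict standing in for HBase).

    Returns True if the batch was loaded, False if a reload is required instead.
    """
    # If more than one period has passed since the last successful load,
    # earlier cycles were missed: log them and signal that a reload is needed.
    if now - last_load > period:
        log.append(f"missed load cycle(s) since {last_load.isoformat()}; reload required")
        return False

    loaded = 0
    for line in lines:
        fields = line.split(sep)          # split each record on the incoming separator
        if len(fields) != field_count:    # keep only records with the expected field count
            continue
        rowkey = "_".join(fields[i] for i in key_indexes)  # combine key fields into the rowkey
        table[rowkey] = fields            # stand-in for putting a row into HBase
        loaded += 1

    # Record that this cycle's load succeeded.
    log.append(f"cycle at {now.isoformat()} loaded {loaded} record(s)")
    return True


if __name__ == "__main__":
    table, log = {}, []
    ok = load_batch(["1|A|x", "2|B", "3|C|z"], "|", 3, [0, 1],
                    table, datetime(2018, 11, 27, 0), datetime(2018, 11, 27, 1),
                    timedelta(hours=1), log)
    print(ok, sorted(table))  # True ['1_A', '3_C']
```

If this step is ported to Java, keep in mind that `String.split` interprets its argument as a regular expression, so a separator such as `|` must be quoted (for example with `Pattern.quote`).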
Optionally, the data sharing module includes:
A data query sharing submodule, configured to provide upper-layer applications with a request/response sharing process through a Java API or a REST interface;
A data retrieval submodule, configured to constrain retrieval by setting retrieval permissions in the access control of system administration, wherein the retrieval can return the results of any request;
A data access submodule, configured to record data access logs through the external sharing interface.
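The following is a minimal sketch of the three sharing submodules, written as a single in-process Python class. It is an illustration that depends on several assumptions. The permission model (a per-user set of readable thematic databases), the layout of the log tuple, and the `SharedQueryService` name are all hypothetical. In the described system the same logic would sit behind a Java API or REST endpoint.

```python
from datetime import datetime, timezone


class SharedQueryService:
    """Hypothetical facade for the unified data exchange interface.

    `store` maps thematic database -> {rowkey: record};
    `permissions` maps user -> set of thematic databases the user may read.
    """

    def __init__(self, store, permissions):
        self.store = store
        self.permissions = permissions
        self.access_log = []  # (timestamp, user, database, rowkey, allowed)

    def query(self, user, database, rowkey):
        allowed = database in self.permissions.get(user, set())
        # Every request is logged, whether or not it is permitted.
        self.access_log.append(
            (datetime.now(timezone.utc).isoformat(), user, database, rowkey, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {database}")
        return self.store.get(database, {}).get(rowkey)


if __name__ == "__main__":
    svc = SharedQueryService({"people": {"1_A": {"name": "A"}}}, {"alice": {"people"}})
    print(svc.query("alice", "people", "1_A"))  # {'name': 'A'}
```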
The system of this embodiment can be used to carry out the technical solution of the method embodiment above. Its implementation principle and technical effects are similar and are not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (10)

1. A method for implementing multi-source data fusion and sharing based on a big data platform, characterized by comprising:
Configuring at least one piece of data source information and a timing rule, and performing a data access operation according to the configured timing rule, wherein the data access operation is extracting data from at least one acquired data source, acquiring internet data, capturing changed data, or loading data into the big data platform;
Performing a data fusion operation, according to the configured timing rule, on the data accessed in the data access operation;
Storing the data produced by the data fusion operation in layered, partitioned databases to form a repository, and building a secondary index database on the repository;
Performing data sharing by providing a unified data exchange interface on the constructed big data platform.
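Claim 1 pairs the repository with a secondary index. In an HBase-style store, rows are looked up efficiently by rowkey, so queries on other columns typically go through a separate index that maps a column value back to rowkeys. The in-memory Python sketch below illustrates that idea under simplifying assumptions: it uses dictionaries instead of HBase tables and supports exact-match lookups only. `IndexedRepository` is a hypothetical name, and the patent does not specify how its secondary index database is built.

```python
from collections import defaultdict


class IndexedRepository:
    """Primary rows keyed by rowkey, plus per-column indexes: value -> set of rowkeys."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def put(self, rowkey, record):
        # Remove stale index entries if the row is being overwritten.
        old = self.rows.get(rowkey)
        if old is not None:
            for col, index in self.indexes.items():
                if col in old:
                    index[old[col]].discard(rowkey)
        self.rows[rowkey] = record
        for col, index in self.indexes.items():
            if col in record:
                index[record[col]].add(rowkey)

    def find_by(self, column, value):
        """Exact-match lookup via the secondary index instead of a full scan."""
        keys = self.indexes[column].get(value, set())
        return [self.rows[k] for k in sorted(keys)]


if __name__ == "__main__":
    repo = IndexedRepository(["city"])
    repo.put("1_A", {"name": "A", "city": "Beijing"})
    repo.put("2_B", {"name": "B", "city": "Jinan"})
    print([r["name"] for r in repo.find_by("city", "Beijing")])  # ['A']
```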
2. The method according to claim 1, characterized in that performing the data fusion operation, according to the configured timing rule, on the data accessed in the data access operation comprises:
When the accessed data is record-level data, the fusion operation on the record-level data includes verifying the data of each condition in the record;
When the accessed data is field-level data, the fusion operation on the field-level data includes field verification or field conversion.
3. The method according to claim 2, characterized in that the data fusion operation processes the data to be fused by an ETL method, wherein
The ETL implementation classes in the ETL method use the decorator pattern, and corresponding configuration files are configured so as to carry out a filtering process, a conversion process, and a filtering process in turn.
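Claim 3 states that the ETL implementation classes use the decorator pattern, with configuration determining a filter, convert, filter sequence. A minimal Python sketch of that structure follows. The class names, and the use of an in-code list of `(kind, function)` pairs in place of a configuration file, are assumptions made for illustration.

```python
class EtlStep:
    """Component interface: every step produces a list of rows."""

    def run(self):
        raise NotImplementedError


class Extract(EtlStep):
    """Concrete component: yields the raw input rows."""

    def __init__(self, rows):
        self._rows = rows

    def run(self):
        return list(self._rows)


class StepDecorator(EtlStep):
    """Base decorator: wraps another step and post-processes its output."""

    def __init__(self, inner):
        self._inner = inner

    def run(self):
        return self.apply(self._inner.run())

    def apply(self, rows):
        return rows


class Filter(StepDecorator):
    def __init__(self, inner, predicate):
        super().__init__(inner)
        self._predicate = predicate

    def apply(self, rows):
        return [r for r in rows if self._predicate(r)]


class Convert(StepDecorator):
    def __init__(self, inner, fn):
        super().__init__(inner)
        self._fn = fn

    def apply(self, rows):
        return [self._fn(r) for r in rows]


def build_pipeline(rows, steps):
    """Wrap Extract in decorators, in the order given by the (hypothetical) config."""
    kinds = {"filter": Filter, "convert": Convert}
    step = Extract(rows)
    for kind, fn in steps:
        step = kinds[kind](step, fn)
    return step


if __name__ == "__main__":
    config = [("filter", lambda s: s.strip() != ""),
              ("convert", str.strip),
              ("filter", lambda s: s != "c")]
    print(build_pipeline([" a ", "", " b", "c "], config).run())  # ['a', 'b']
```

Because each decorator only knows the `EtlStep` interface, the order and number of stages can come entirely from configuration without changing any class.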
4. The method according to any one of claims 1 to 3, characterized in that storing the data produced by the data fusion operation in layered, partitioned databases to form a repository, and building a secondary index database on the repository, comprises:
Inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
Instantiating a connection according to the HBase connection type and the thematic database name;
Reading the record of the most recent load date or time for the corresponding data type, and computing the time interval since the last load;
Judging whether the time interval is greater than the time period configured in the timing rule;
When the time interval is greater than the time period configured in the timing rule, logging the previous one or more failed load cycles, then auditing the log and performing a reload operation;
Or, when the time interval is not greater than the time period configured in the timing rule, splitting each record by the incoming separator;
Comparing the array length after splitting with the incoming total number of fields, and keeping only the records for which the two are equal;
Combining the fields at the incoming field indexes into the primary key;
Putting the data into HBase;
After execution, recording that the load for this time period succeeded.
5. The method according to any one of claims 1 to 4, characterized in that, after storing the fused data in layered, partitioned databases to form a repository and building a secondary index database on the repository, the method further comprises:
Providing a configurable script that automates the creation of databases and tables and the loading of data.
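The configurable script in claim 5 can be approximated by generating HBase shell commands from a declarative configuration. In this hypothetical Python sketch, each thematic database is mapped to an HBase namespace and each table lists its column families. `create_namespace` and `create` are standard HBase shell commands. The configuration layout and `render_hbase_script` are assumptions, and data loading is left out.

```python
def render_hbase_script(config):
    """Render HBase shell DDL for every namespace and table in `config`."""
    lines = []
    for ns in config["namespaces"]:
        lines.append(f"create_namespace '{ns['name']}'")
        for table in ns["tables"]:
            families = ", ".join(f"'{cf}'" for cf in table["families"])
            lines.append(f"create '{ns['name']}:{table['name']}', {families}")
    return "\n".join(lines)


if __name__ == "__main__":
    config = {"namespaces": [{"name": "theme_people",
                              "tables": [{"name": "person", "families": ["info", "ext"]}]}]}
    print(render_hbase_script(config))
    # create_namespace 'theme_people'
    # create 'theme_people:person', 'info', 'ext'
```

The generated text could then be run through the HBase shell, for example by saving it to a file and passing that file to `hbase shell`.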
6. The method according to any one of claims 1 to 5, characterized in that performing data sharing by establishing a standard, unified data exchange interface in the big data platform comprises:
When the data sharing is data query sharing, providing upper-layer applications with a request/response sharing process through a Java API or REST;
When the data sharing is data retrieval, constraining retrieval by setting retrieval permissions in the access control of system administration, wherein the retrieval can return the results of any request;
When the data sharing is data access, recording data access logs through the external sharing interface.
7. A system for implementing multi-source data fusion and sharing based on a big data platform, characterized by comprising:
A configuration module, configured to configure at least one piece of data source information and a timing rule;
A data access module, configured to perform a data access operation according to the configured timing rule, wherein the data access operation is extracting data from at least one acquired data source, acquiring internet data, capturing changed data, or loading data into the big data platform;
A data fusion module, configured to perform a data fusion operation, according to the configured timing rule, on the data accessed in the data access operation;
A storage module, configured to store the data produced by the data fusion operation in layered, partitioned databases to form a repository, and to build a secondary index database on the repository;
A data sharing module, configured to perform data sharing by providing a unified data exchange interface on the constructed big data platform.
8. The system according to claim 7, characterized in that the data fusion module comprises:
A first fusion submodule, configured to, when the accessed data is record-level data, perform a fusion operation on the record-level data that includes verifying the data of each condition in the record;
A second fusion submodule, configured to, when the accessed data is field-level data, perform a fusion operation on the field-level data that includes field verification or field conversion.
9. The system according to claim 7 or 8, characterized in that the storage module comprises:
A parameter input submodule, configured to input one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
A connection instantiation submodule, configured to instantiate a connection according to the HBase connection type and the thematic database name;
A computation submodule, configured to read the record of the most recent load date or time for the corresponding data type and compute the time interval since the last load;
A judging submodule, configured to judge whether the time interval is greater than the time period configured in the timing rule;
A first operation submodule, configured to, when the time interval is greater than the time period configured in the timing rule, log the previous one or more failed load cycles, then audit the log and perform a reload operation;
A second operation submodule, configured to, when the time interval is not greater than the time period configured in the timing rule, split each record by the incoming separator; compare the array length after splitting with the incoming total number of fields and keep only the records for which the two are equal; combine the fields at the incoming field indexes into the primary key; put the data into HBase; and, after execution, record that the load for this time period succeeded.
10. The system according to any one of claims 7 to 9, characterized in that the data sharing module comprises:
A data query sharing submodule, configured to provide upper-layer applications with a request/response sharing process through a Java API or REST;
A data retrieval submodule, configured to constrain retrieval by setting retrieval permissions in the access control of system administration, wherein the retrieval can return the results of any request;
A data access submodule, configured to record data access logs through the external sharing interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811426832.4A CN109710667A (en) 2018-11-27 2018-11-27 A kind of shared realization method and system of the multisource data fusion based on big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811426832.4A CN109710667A (en) 2018-11-27 2018-11-27 A kind of shared realization method and system of the multisource data fusion based on big data platform

Publications (1)

Publication Number Publication Date
CN109710667A 2019-05-03

Family

ID=66254399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811426832.4A Pending CN109710667A (en) 2018-11-27 2018-11-27 A kind of shared realization method and system of the multisource data fusion based on big data platform

Country Status (1)

Country Link
CN (1) CN109710667A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695000A (en) * 2020-06-16 2020-09-22 山东蓝海领航大数据发展有限公司 Multi-source big data loading method and system
CN110110234B (en) * 2019-05-13 2020-10-16 重庆天蓬网络有限公司 Big data real-time searching system and method
CN112732811A (en) * 2020-12-31 2021-04-30 广西中科曙光云计算有限公司 Data open platform
CN112765183A (en) * 2021-02-02 2021-05-07 浙江公共安全技术研究院有限公司 Multi-source data fusion method and device, storage medium and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
CN104216962A (en) * 2014-08-22 2014-12-17 南京邮电大学 Mass network management data indexing design method based on HBase
CN105159951A (en) * 2015-08-17 2015-12-16 成都中科大旗软件有限公司 Open tourism multi-source heterogeneous data fusion method and system
CN105389402A (en) * 2015-12-29 2016-03-09 曙光信息产业(北京)有限公司 Big-data-oriented ETL (Extraction-Transformation-Loading) method and device
US20160164924A1 (en) * 2014-12-05 2016-06-09 Cisco Technology, Inc. Stack Fusion Software Communication Service
US20160299959A1 (en) * 2011-12-19 2016-10-13 Microsoft Corporation Sensor Fusion Interface for Multiple Sensor Input
CN106326381A (en) * 2016-08-16 2017-01-11 梁猛 HBase data retrieval method based on MapDB construction
CN106777227A (en) * 2016-12-26 2017-05-31 河南信安通信技术股份有限公司 Multidimensional data convergence analysis system and method based on cloud platform

Non-Patent Citations (3)

Title
XIANSHANNAN: "Html5 Player", HTTPS://GITHUB.COM/DOG-DAYS/HTML5-PLAYER/TREE/B7C6091FDB910EBEFF7F0B57277C36DDB7922095 *
孟亚辉; 张党进: "Analysis and Solution of Timeout Problems in .NET Application Systems", Journal of Maoming College (茂名学院学报) *
沐海—化茧成蝶: "A Detailed Explanation of jQuery AJAX Timeout Issues", HTTPS://WWW.JB51.NET/ARTICLE/87003.HTM *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190503)