CN109710667A - Method and system for implementing multi-source data fusion and sharing based on a big data platform - Google Patents
- Publication number: CN109710667A
- Application number: CN201811426832.4A
- Authority: CN (China)
- Prior art keywords: data, fusion, record, platform, submodule
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention provides a method and system for implementing multi-source data fusion and sharing based on a big data platform. The method includes: configuring at least one data source and a timing rule, and executing a data access job according to the configured timing rule, where the data access job extracts data from the at least one acquired data source, or collects internet data, or converts data, or loads data, into the big data platform; performing a data fusion job, according to the configured timing rule, on the data brought in by the data access job; storing the fused data in layers and in separate databases to form a repository, and building a secondary index store on the repository; and sharing data through a unified data exchange interface provided in the constructed big data platform. Faced with different scenarios and multi-source data, the invention needs only flexible configuration rather than redevelopment, which greatly improves the efficiency of bringing a project online and greatly simplifies how upper-layer applications retrieve data from the big data platform.
Description
Technical field
The present invention relates to the field of big data technology, and in particular to a method and system for implementing multi-source data fusion and sharing based on a big data platform.
Background
In recent years, with the rapid development of IT and communication technologies such as the internet, social networks, cloud computing, and search engines, hundreds of millions of users generate large amounts of data every day. The emergence of large-scale data brings valuable opportunities to many industries, but the typical characteristics that accompany this data, such as large scale, multiple sources, varied types and schemas (heterogeneity), high dimensionality, and uneven quality, pose great challenges to representing, understanding, computing on, and applying the data. Data quality is the bottleneck that restricts the use of data. As important techniques for improving data quality, data cleaning and data fusion are active research areas in multi-source heterogeneous data processing and have significant value. However, traditional data cleaning methods implement business logic by hard coding, which leaves systems with poor reusability, scalability, and flexibility. In addition, many real-world applications need to integrate heterogeneous data from different channels, and ensuring the consistency of that data has increasingly become a problem that must be solved; this is the problem addressed by entity recognition techniques.
At present there are 31 traffic business systems, 194 signal-controlled intersections, 66 public-security speed-measurement checkpoints, red-light capture systems at 192 intersections, 86 traffic guidance systems, 369 traffic flow monitoring systems, 652 road video feeds, 32 high-altitude HD video systems, 45 vehicle-mounted 3G video systems, 248 event monitoring systems, 273 mobile law-enforcement terminals, and so on.
In terms of data source, the "big data" of the traffic management field mainly includes: archive data collected administratively on motor vehicles, drivers, and roads; vehicle and driver information collected by road law-enforcement officers, together with data on traffic violations investigated and traffic accidents handled; road and traffic information, video, pictures, traffic flow, GPS tracks, and similar data collected automatically by road electronic monitoring equipment; the various fragmented data generated by traffic management public services; and data exchanged with related departments such as population, insurance, taxation, and planning. In terms of type, this data includes pictures, video, two-dimensional tables, and structured, semi-structured, and unstructured data; in terms of channel, it comes from application scenarios such as traditional service windows, the internet, and the mobile internet.
Therefore, what is needed is a big data foundation platform that is efficient, stable, and high-performing, built with advanced big data technology according to actual business needs and accumulated data. Such a platform gathers multi-source heterogeneous data and uses a unified big data storage and processing architecture to provide data access, data fusion, data storage, data computation, data sharing, and so on, giving strong support and assurance to all kinds of big data applications.
In the course of enterprise informatization, the construction of each business system and the rollout of data management systems are affected by the stage of development, by technology, and by economic and human factors. As a result, enterprises accumulate large amounts of business data stored in different ways, and the data management systems in use also differ widely, from simple file databases to complex network databases. Together these constitute the enterprise's heterogeneous data sources.
Existing solutions usually have high time overhead, and their running time can grow exponentially with the number of attribute dimensions in the data set. In a big data environment, data differs greatly in structure, comes from a wide range of sources, has low value density, and is updated in real time, all of which pose a huge challenge to multi-source data fusion technology. Multi-source heterogeneous data fusion nevertheless gives researchers a very effective means of acquiring, organizing, and using knowledge in a big data environment. However, current knowledge fusion methods still have many shortcomings in moving from theory to practice.
Summary of the invention
The method and system for implementing multi-source data fusion and sharing based on a big data platform provided by the present invention can handle different scenarios and multi-source data through flexible configuration alone, without redevelopment. This greatly improves the efficiency of bringing a project online and greatly simplifies how upper-layer applications retrieve data from the big data platform.
In a first aspect, the present invention provides a method for implementing multi-source data fusion and sharing based on a big data platform, comprising:
configuring at least one data source and a timing rule, and executing a data access job according to the configured timing rule, wherein the data access job extracts data from the at least one acquired data source, or collects internet data, or converts data, or loads data, into the big data platform;
performing a data fusion job, according to the configured timing rule, on the data accessed in the data access job;
storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository;
sharing data through a unified data exchange interface provided in the constructed big data platform.
Optionally, performing the data fusion job, according to the configured timing rule, on the data accessed in the data access job includes:
when the accessed data is record-level data, the fusion job on the record-level data includes validating the data of each record against each condition;
when the accessed data is field-level data, the fusion job on that data includes field validation or field conversion.
Optionally, the data fusion job processes the data to be fused using an ETL method, wherein
the ETL implementation classes in the ETL method use the decorator pattern, and a corresponding configuration file is set up so that a filtering process, a conversion process, and a filtering process are carried out in sequence.
Optionally, storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository, includes:
inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
instantiating a connection according to the HBase connection type and the thematic database name;
reading the most recent load date or time record for the corresponding data type, and calculating the time interval since the last load;
determining whether the time interval is greater than the time period configured in the timing rule;
when the time interval is greater than the time period configured in the timing rule, logging that the previous one or more load cycles failed, then checking the log and performing a reload;
or, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter;
comparing the length of each split array with the passed-in total number of fields, and keeping the data for which the two are equal;
combining fields into the primary key according to the passed-in field indexes;
putting the data into HBase;
after execution, recording that the load for this time period succeeded.
Optionally, after the fused data has been stored in layers and in separate databases to form a repository and the secondary index store has been built on the repository, the method further includes:
setting up configurable scripts that automate the creation of databases and tables and the loading of data.
Optionally, sharing data by establishing a standard, unified data exchange interface in the big data platform includes:
when the data sharing is data query sharing, providing upper-layer applications with a request-response sharing process through a Java API or REST;
when the data sharing is data retrieval, setting retrieval permissions that are constrained by access control in system administration, wherein the retrieval can return the retrieved data for any request;
when the data sharing is data access, recording a data access log through the external sharing interface.
In a second aspect, the present invention provides a system for implementing multi-source data fusion and sharing based on a big data platform, comprising:
a configuration module, for configuring at least one data source and a timing rule;
a data access module, for executing a data access job according to the configured timing rule, wherein the data access job extracts data from the at least one acquired data source, or collects internet data, or converts data, or loads data, into the big data platform;
a data fusion module, for performing a data fusion job, according to the configured timing rule, on the data accessed in the data access job;
a storage module, for storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository;
a data sharing module, for sharing data through a unified data exchange interface provided in the constructed big data platform.
Optionally, the data fusion module includes:
a first fusion submodule, for when the accessed data is record-level data, in which case the fusion job on the record-level data includes validating the data of each record against each condition;
a second fusion submodule, for when the accessed data is field-level data, in which case the fusion job on that data includes field validation or field conversion.
Optionally, the storage module includes:
a parameter input submodule, for inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
a connection instantiation submodule, for instantiating a connection according to the HBase connection type and the thematic database name;
a calculation submodule, for reading the most recent load date or time record for the corresponding data type and calculating the time interval since the last load;
a judgment submodule, for determining whether the time interval is greater than the time period configured in the timing rule;
a first job submodule, for, when the time interval is greater than the time period configured in the timing rule, logging that the previous one or more load cycles failed, then checking the log and performing a reload;
a second job submodule, for, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter; comparing the length of each split array with the passed-in total number of fields and keeping the data for which the two are equal; combining fields into the primary key according to the passed-in field indexes; putting the data into HBase; and, after execution, recording that the load for this time period succeeded.
Optionally, the data sharing module includes:
a data query sharing submodule, for providing upper-layer applications with a request-response sharing process through a Java API or REST;
a data retrieval submodule, for setting retrieval permissions that are constrained by access control in system administration, wherein the retrieval can return the retrieved data for any request;
a data access submodule, for recording a data access log through the external sharing interface.
In the method and system for implementing multi-source data fusion and sharing based on a big data platform provided by embodiments of the present invention, the method works mainly by flexibly configuring, directly in the big data platform, the data source information and the timing rule for each job in the data processing flow. First, the method flexibly configures at least one data source directly, so that it can handle different scenarios and multi-source data through configuration alone, without redevelopment; the data access and loading process is fully automated, which greatly improves the efficiency of bringing a project online. Second, the method can also configure a timing rule for each job and run the data access job and the data fusion job according to that rule, so that the resulting big data platform can use a timed scheduling framework to bring in multi-source heterogeneous data automatically, in batches, and incrementally. Third, the method stores data uniformly in layers and in separate databases to form a repository, for example by setting up configurable unified storage for the multi-source heterogeneous data to be stored; it also builds a secondary index store on the unified repository, which speeds up queries over multi-source data on the big data platform. Fourth, the method can also share data queries through a unified data exchange interface, which greatly reduces the complexity upper-layer applications face when retrieving data from the big data platform.
Brief description of the drawings
Fig. 1 is a flowchart of a method for implementing multi-source data fusion and sharing based on a big data platform according to one embodiment of the present invention;
Fig. 2 is a flowchart of a method for implementing multi-source data fusion and sharing based on a big data platform according to another embodiment of the present invention;
Fig. 3 is a flowchart of the data fusion job in one embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a system for implementing multi-source data fusion and sharing based on a big data platform according to one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for implementing multi-source data fusion and sharing based on a big data platform. As shown in Fig. 1, the method includes:
S11, configuring at least one data source and the timing rule corresponding to each job, and executing a data access job according to the configured timing rule, wherein the data access job extracts data from the at least one acquired data source, or collects internet data, or converts data, or loads data, into the big data platform;
S12, performing a data fusion job, according to the configured timing rule, on the data accessed in the data access job;
S13, storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository;
S14, sharing data through a unified data exchange interface provided in the constructed big data platform.
The method for implementing multi-source data fusion and sharing based on a big data platform provided by this embodiment works mainly by flexibly configuring, directly in the big data platform, the data source information and the timing rule for each job in the data processing flow. First, the method flexibly configures at least one data source directly, so that it can handle different scenarios and multi-source data through configuration alone, without redevelopment; the data access and loading process is fully automated, which greatly improves the efficiency of bringing a project online. Second, the method can also configure a timing rule for each job and run the data access job and the data fusion job according to that rule, so that the resulting big data platform can use a timed scheduling framework to bring in multi-source heterogeneous data automatically, in batches, and incrementally. Third, the method stores data uniformly in layers and in separate databases to form a repository, for example by setting up configurable unified storage for the multi-source heterogeneous data to be stored; it also builds a secondary index store on the unified repository, which speeds up queries over multi-source data on the big data platform. Fourth, the method can also share data queries through a unified data exchange interface, which greatly reduces the complexity upper-layer applications face when retrieving data from the big data platform.
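The configuration-driven scheduling described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `JobConfig` fields, the job and source names, and the reduction of a timing rule to a fixed period are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobConfig:
    """One configured job: which data source it reads and how often it runs."""
    name: str                 # hypothetical job name, e.g. "access" or "fusion"
    source: str               # hypothetical data source identifier
    period: timedelta         # the timing rule, reduced here to a fixed period
    last_run: Optional[datetime] = None  # None means the job has never run


def due_jobs(jobs, now):
    """Return the jobs whose timing rule says they should run at `now`."""
    return [j for j in jobs if j.last_run is None or now - j.last_run >= j.period]


def run_due(jobs, now, runner):
    """Run every due job through `runner` and stamp its last run time."""
    ran = []
    for job in due_jobs(jobs, now):
        runner(job)
        job.last_run = now
        ran.append(job.name)
    return ran
```

Adding a new data source in this sketch means adding one more `JobConfig` entry; no job code changes, which is the property the embodiment attributes to flexible configuration.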
Specifically, the data access job in this embodiment extracts data from data sources across multiple different business systems and platforms, or collects internet data, or converts data, or loads data, into the big data platform. Here:
Data extraction uses a data extraction client to collect and extract data through the steps of configuring the data source, defining collection rules, and executing the data extraction job; the extraction process does not affect the normal operation of the source system.
Data reception provides reception of source data, accepting data from inside the system or from data sources outside it; two functional modules can also be provided: a data reception service and a data collection client.
Internet data collection uses the collection URL (Uniform Resource Locator) addresses and the rule configuration provided by the user to collect web page data from the internet, ultimately producing HDFS (Hadoop Distributed File System) files.
Optionally, as shown in Fig. 2, performing the data fusion job, according to the configured timing rule, on the data accessed in the data access job includes:
when the accessed data is record-level data, the fusion job on the record-level data includes validating the data of each record against each condition, where the format of the accessed data may be non-heterogeneous or heterogeneous;
when the accessed data is field-level data, the fusion job on that data includes field validation or field conversion.
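The two fusion levels can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary representation of a record, and the example fields `plate` and `speed` used in the usage note are assumptions, not details from the source.

```python
def record_passes(record, conditions):
    """Record-level check: keep a record only if every condition holds."""
    return all(cond(record) for cond in conditions)


def fuse_fields(record, validators, converters):
    """Field-level fusion: validate the named fields, then convert them.

    Returns the converted record, or None if any field fails validation.
    """
    for field, is_valid in validators.items():
        if not is_valid(record.get(field)):
            return None
    fused = dict(record)  # copy so the input record is left untouched
    for field, convert in converters.items():
        if field in fused:
            fused[field] = convert(fused[field])
    return fused
```

For example, with a validator requiring a non-empty `plate` and converters `{"plate": str.upper, "speed": int}`, the record `{"plate": "a123", "speed": "66"}` would come out with an upper-cased plate and an integer speed, while a record with an empty plate would be rejected.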
Optionally, the data fusion job processes the data to be fused using an ETL method (ETL is short for Extraction-Transformation-Loading, that is, the process of extracting, transforming, and loading data), wherein
the ETL implementation classes in the ETL method use the decorator pattern, and a corresponding configuration file is set up so that a filtering process, a conversion process, and a filtering process are carried out in sequence.
Specifically, data fusion in this embodiment is performed by configurable data fusion jobs, covering record-level and field-level fusion. A record-level fusion job includes cleaning and validating records against multiple conditions; a field-level fusion job includes operations such as field validation and field conversion. As shown in Fig. 3, the data fusion job passes the HDFS files holding the data to be fused through the TextInputETLMapper and TextInputETLReducer frameworks for fusion processing, ultimately producing HDFS files in a new format; this flow involves calling the same functions multiple times. In addition, the ETL implementation classes use the decorator pattern and are configured in a configuration file, for example to implement the repeated sequence filter A (FilterA) → filter B (FilterB) → conversion A (TransferA) → filter A (FilterA) → filter B (FilterB); the job is then submitted and run by the job scheduling module.
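A decorator-style chain assembled from a configured list of step names might look like the Python sketch below. The step names FilterA, FilterB, and TransferA come from the text, but what each step does, the `Step` base class, the `REGISTRY` mapping, and `build_chain` are hypothetical and chosen only to make the example runnable.

```python
class Step:
    """Base ETL step that decorates an optional inner step."""

    def __init__(self, inner=None):
        self.inner = inner  # the step this one wraps, if any

    def process(self, record):
        # Run the wrapped step first; None means the record was filtered out.
        if self.inner is not None:
            record = self.inner.process(record)
        if record is None:
            return None
        return self.apply(record)

    def apply(self, record):
        return record


class FilterA(Step):
    """Hypothetical filter: drop blank records."""

    def apply(self, record):
        return record if record.strip() else None


class FilterB(Step):
    """Hypothetical filter: drop comment lines."""

    def apply(self, record):
        return None if record.startswith("#") else record


class TransferA(Step):
    """Hypothetical conversion: trim whitespace and upper-case."""

    def apply(self, record):
        return record.strip().upper()


REGISTRY = {"FilterA": FilterA, "FilterB": FilterB, "TransferA": TransferA}


def build_chain(names):
    """Wrap steps so they execute in the configured left-to-right order."""
    chain = None
    for name in names:
        chain = REGISTRY[name](chain)
    return chain
```

With the configured order `["FilterA", "FilterB", "TransferA", "FilterA", "FilterB"]`, the last step wraps all earlier ones, so changing the processing sequence only requires editing the list, not the step classes.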
Optionally, storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository, includes:
inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
instantiating a connection according to the HBase connection type and the thematic database name;
reading the most recent load date or time record for the corresponding data type, and calculating the time interval since the last load;
determining whether the time interval is greater than the time period configured in the timing rule;
when the time interval is greater than the time period configured in the timing rule, logging that the previous one or more load cycles failed, then checking the log and performing a reload;
or, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter;
comparing the length of each split array with the passed-in total number of fields, and keeping the data for which the two are equal;
combining fields into the primary key according to the passed-in field indexes;
putting the data into HBase (Hadoop Database, a distributed storage system);
after execution, recording that the load for this time period succeeded.
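The load steps above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: a plain dictionary stands in for the HBase table, the underscore used to join key fields is an assumption, and the reload of an overdue cycle is left to the caller.

```python
from datetime import datetime, timedelta


def load_records(lines, delimiter, field_count, key_indexes,
                 last_load, now, period, table, log):
    """Load delimited records into `table` (a dict standing in for HBase).

    Returns True if the load ran, False if an overdue load was detected.
    """
    # Interval since the most recent recorded load of this data type.
    if now - last_load > period:
        log.append("previous load cycle(s) failed; reload required")
        return False

    for line in lines:
        fields = line.split(delimiter)
        # Keep only records whose field count matches the expected total.
        if len(fields) != field_count:
            continue
        # Combine the selected fields into the row key (separator assumed).
        rowkey = "_".join(fields[i] for i in key_indexes)
        table[rowkey] = fields  # stand-in for an HBase put

    log.append("load succeeded for this cycle")
    return True
```

In a real deployment the dictionary assignment would be replaced by a write through an HBase client, and the `False` branch would trigger the log inspection and reload described in the text.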
Optionally, after the fused data has been stored in layers and in separate databases to form a repository and the secondary index store has been built on the repository, the method further includes:
setting up configurable scripts that automate the creation of databases and tables and the loading of data.
Specifically, the method of this embodiment stores the data produced by the data fusion job in layers and in separate databases to form a unified repository, which includes a base database, thematic databases, a full-text database, and so on. Configurable scripts are then set up to automate the creation of databases and tables and the loading of data, and a secondary index store is built on the repository to keep big data queries efficient.
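The source does not say how the secondary index store is built, so the following Python sketch only illustrates the general idea of a secondary index: a mapping from a non-key field value to the row keys that hold it, maintained alongside the primary table. The `IndexedStore` class and its in-memory dictionaries are assumptions for illustration.

```python
from collections import defaultdict


class IndexedStore:
    """Primary table keyed by row key, plus one secondary index per field."""

    def __init__(self, indexed_fields):
        self.rows = {}  # row key -> record (dict)
        self.index = {f: defaultdict(set) for f in indexed_fields}

    def put(self, rowkey, record):
        # Remove stale index entries if the row is being overwritten.
        old = self.rows.get(rowkey)
        if old is not None:
            for field, idx in self.index.items():
                if field in old:
                    idx[old[field]].discard(rowkey)
        self.rows[rowkey] = record
        for field, idx in self.index.items():
            if field in record:
                idx[record[field]].add(rowkey)

    def find(self, field, value):
        """Look up records by an indexed field without scanning all rows."""
        keys = self.index[field].get(value, set())
        return [self.rows[k] for k in sorted(keys)]
```

The trade-off shown here is the usual one: each write also updates the index, in exchange for lookups on a non-key field that avoid a full scan of the primary table.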
Optionally, sharing data by establishing a standard, unified data exchange interface in the big data platform includes:
when the data sharing is data query sharing, providing upper-layer applications with a request-response sharing process through a Java API or REST;
when the data sharing is data retrieval, setting retrieval permissions that are constrained by access control in system administration, wherein the retrieval can return the retrieved data for any request;
when the data sharing is data access, recording a data access log through the external sharing interface.
Specifically, the data query sharing job in this embodiment provides a request-response sharing service to upper-layer applications in two ways, through a unified Java API (Application Programming Interface) and through REST services. The retrieval permission job is constrained by access control in system administration, and by default the corresponding module can return the retrieved data for any request. The data access job records a data access log whenever the above methods are called through the external sharing interface.
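A single sharing entry point that combines a request-response query, a permission check, and access logging could be sketched in Python as below. The `SharingService` class, its in-memory data and permission tables, and the log entry format are all hypothetical; the source specifies a Java API and REST services, for which this is only a language-neutral illustration.

```python
class SharingService:
    """Single entry point for queries, with permission checks and logging."""

    def __init__(self, data, permissions):
        self.data = data                # dataset name -> list of records
        self.permissions = permissions  # user -> set of readable datasets
        self.access_log = []            # one entry per call to query()

    def query(self, user, dataset, predicate=lambda r: True):
        allowed = dataset in self.permissions.get(user, set())
        # Every call through the interface is logged, allowed or not.
        self.access_log.append((user, dataset, "ok" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{user} may not read {dataset}")
        return [r for r in self.data.get(dataset, []) if predicate(r)]
```

Routing every request through one method is what lets the permission check and the access log stay in a single place, regardless of which upper-layer application makes the call.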
An embodiment of the present invention also provides a system for implementing multi-source data fusion and sharing based on a big data platform. As shown in Fig. 4, the system includes:
a configuration module 11, for configuring at least one data source and a timing rule;
a data access module 12, for executing a data access job according to the configured timing rule, wherein the data access job extracts data from the at least one acquired data source, or collects internet data, or converts data, or loads data, into the big data platform;
a data fusion module 13, for performing a data fusion job, according to the configured timing rule, on the data accessed in the data access job;
a storage module 14, for storing the data produced by the data fusion job in layers and in separate databases to form a repository, and building a secondary index store on the repository;
a data sharing module 15, for sharing data through a unified data exchange interface provided in the constructed big data platform.
The system for implementing multi-source data fusion and sharing based on a big data platform provided by this embodiment works mainly by using the configuration module to flexibly configure, directly in the big data platform, the data source information and the timing rule for each job in the data processing flow. First, the configuration module flexibly configures at least one data source directly, so that the system can handle different scenarios and multi-source data through configuration alone, without redevelopment; the data access and loading process is fully automated, which greatly improves the efficiency of bringing a project online. Second, the configuration module can also configure a timing rule for each job, and the data access module and the data fusion module run the data access job and the data fusion job according to that rule, so that the resulting big data platform can use a timed scheduling framework to bring in multi-source heterogeneous data automatically, in batches, and incrementally. Third, the storage module stores data uniformly in layers and in separate databases to form a repository, for example by setting up configurable unified storage for the multi-source heterogeneous data to be stored; it also builds a secondary index store on the unified repository, which speeds up queries over multi-source data on the big data platform. Fourth, the data sharing module can also share data queries through a unified data exchange interface, which greatly reduces the complexity upper-layer applications face when retrieving data from the big data platform.
Optionally, the data fusion module includes:
a first fusion submodule, for when the accessed data is record-level data, in which case the fusion job on the record-level data includes validating the data of each record against each condition;
a second fusion submodule, for when the accessed data is field-level data, in which case the fusion job on that data includes field validation or field conversion.
Optionally, the storage module includes:
a parameter input submodule, for inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic database name;
a connection instantiation submodule, for instantiating a connection according to the HBase connection type and the thematic database name;
a calculation submodule, for reading the most recent load date or time record for the corresponding data type and calculating the time interval since the last load;
a judgment submodule, for determining whether the time interval is greater than the time period configured in the timing rule;
a first job submodule, for, when the time interval is greater than the time period configured in the timing rule, logging that the previous one or more load cycles failed, then checking the log and performing a reload;
a second job submodule, for, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter; comparing the length of each split array with the passed-in total number of fields and keeping the data for which the two are equal; combining fields into the primary key according to the passed-in field indexes; putting the data into HBase; and, after execution, recording that the load for this time period succeeded.
Optionally, the data sharing module includes:
A data query sharing submodule, for providing the upper-layer application with a request-response sharing process through a Java API or REST;
A data retrieval submodule, for setting retrieval permissions in the access control of system management to impose constraints, wherein the retrieval can return the retrieval data of any request;
A data access submodule, for recording data access logs through the external sharing interface.
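A minimal Python sketch of how a sharing interface might combine the three concerns listed for the data sharing module (request-response query, permission constraint, access logging). The class `SharingInterface`, its in-memory data layout, and the log line format are assumptions made for illustration; the source specifies only that a Java API or REST is used.

```python
from typing import Dict, List, Optional, Set


class SharingInterface:
    """Hypothetical unified query entry point with permission checks and logging."""

    def __init__(self, data: Dict[str, Dict[str, dict]],
                 permissions: Dict[str, Set[str]]) -> None:
        self._data = data                # thematic library name -> rowkey -> row
        self._permissions = permissions  # user -> libraries the user may retrieve
        self.access_log: List[str] = []

    def query(self, user: str, library: str, rowkey: str) -> Optional[dict]:
        """Answer one request, recording every access attempt in the log."""
        allowed = library in self._permissions.get(user, set())
        self.access_log.append(
            f"user={user} library={library} rowkey={rowkey} allowed={allowed}"
        )
        if not allowed:
            raise PermissionError(f"{user} may not retrieve from {library}")
        return self._data.get(library, {}).get(rowkey)
```

Logging before the permission check means that denied requests are also recorded, which is one possible design choice rather than something the source prescribes.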
The device of this embodiment can be used to execute the technical solution of the above method embodiment; its implementation principle and technical effects are similar and are not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A method for implementing multi-source data fusion and sharing based on a big data platform, characterized by comprising:
configuring at least one data source information and a timing rule, and executing a data access operation according to the configured timing rule, wherein the data access operation is extracting data from the at least one acquired data source, or collecting internet data, or converting data, or loading data into the big data platform;
performing a data fusion operation, according to the configured timing rule, on the data accessed in the data access operation;
storing the data resulting from the data fusion operation in layered, library-partitioned storage to form a repository, and constructing a secondary index library on the repository;
performing data sharing by providing a unified data exchange interface in the constructed big data platform.
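The idea of a secondary index library built on top of a rowkey-addressed repository can be sketched in Python. This is a generic illustration of secondary indexing, not the construction used by the patent: plain dictionaries stand in for the repository and the index library, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_secondary_index(rows: Dict[str, Dict[str, str]], field: str) -> Dict[str, List[str]]:
    """Map each value of `field` to the rowkeys of the rows that hold it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for rowkey in sorted(rows):  # sorted for a deterministic index order
        value = rows[rowkey].get(field)
        if value is not None:
            index[value].append(rowkey)
    return dict(index)


def lookup(rows: Dict[str, Dict[str, str]], index: Dict[str, List[str]], value: str):
    """Resolve a non-rowkey query in two steps: index first, then the repository."""
    return [rows[rowkey] for rowkey in index.get(value, [])]
```

The index lets a query on a non-rowkey field avoid scanning every row, at the cost of maintaining a second structure whenever data is loaded.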
2. The method according to claim 1, characterized in that performing the data fusion operation, according to the configured timing rule, on the data accessed in the data access operation comprises:
when the accessed data is record-level data, the fusion operation on the record-level data comprises performing information verification on each record;
when the accessed data is field-level data, the fusion operation on the record-level data comprises field verification or field conversion.
3. The method according to claim 2, characterized in that the data fusion operation processes the data to be fused by an ETL method; wherein,
in the ETL method, the ETL implementation class uses the decorator pattern, and corresponding configuration files are configured so as to carry out, in sequence, a filtering process, a conversion process, and a filtering process.
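The decorator pattern named in claim 3 can be illustrated with a short Python sketch in which each ETL step wraps the previous one and a configuration list decides the order. The class names, the `build_pipeline` helper, and the use of an in-memory list in place of a configuration file are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class EtlStep:
    """Base component: passes records through unchanged."""

    def process(self, records: Iterable[Record]) -> List[Record]:
        return list(records)


class EtlDecorator(EtlStep):
    """Wraps another step and post-processes that step's output."""

    def __init__(self, inner: EtlStep) -> None:
        self.inner = inner

    def process(self, records: Iterable[Record]) -> List[Record]:
        return self.apply(self.inner.process(records))

    def apply(self, records: List[Record]) -> List[Record]:
        raise NotImplementedError


class FilterStep(EtlDecorator):
    def __init__(self, inner: EtlStep, keep: Callable[[Record], bool]) -> None:
        super().__init__(inner)
        self.keep = keep

    def apply(self, records: List[Record]) -> List[Record]:
        return [r for r in records if self.keep(r)]


class ConvertStep(EtlDecorator):
    def __init__(self, inner: EtlStep, convert: Callable[[Record], Record]) -> None:
        super().__init__(inner)
        self.convert = convert

    def apply(self, records: List[Record]) -> List[Record]:
        return [self.convert(r) for r in records]


STEP_TYPES = {"filter": FilterStep, "convert": ConvertStep}


def build_pipeline(config: List[dict]) -> EtlStep:
    """Wrap steps in configuration order, so the first entry runs first."""
    step: EtlStep = EtlStep()
    for entry in config:
        step = STEP_TYPES[entry["type"]](step, entry["fn"])
    return step
```

A configuration with the entries `filter`, `convert`, `filter` would then run the three processes in the sequence the claim lists, and reordering or adding steps requires only a configuration change.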
4. The method according to any one of claims 1 to 3, characterized in that storing the data resulting from the data fusion operation in layered, library-partitioned storage to form a repository, and constructing a secondary index library on the repository, comprises:
inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic library name;
instantiating a connection according to the HBase connection mode and the thematic library name;
reading the most recent load date or time record for the corresponding data type, and calculating the time interval since the last load;
judging whether the time interval is greater than the time period configured in the timing rule;
when the time interval is greater than the time period configured in the timing rule, logging that loading failed in the previous period or in several previous periods, then checking the log and executing a reload operation;
or, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter;
comparing the length of the array obtained by splitting with the passed-in total number of fields, and retaining the data for which the two are equal;
combining fields into a primary key according to the passed-in field subscripts;
putting the data into HBase;
after execution, recording that the load for this time period succeeded.
5. The method according to any one of claims 1 to 4, characterized in that, after storing the data resulting from fusion analysis in layered, library-partitioned storage to form a repository and constructing the secondary index library on the repository, the method further comprises:
setting a configurable script to automate the creation of libraries and tables and the loading of data.
6. The method according to any one of claims 1 to 5, characterized in that performing data sharing by establishing a standard unified data exchange interface in the big data platform comprises:
when the data sharing performed is data query sharing, providing the upper-layer application with a request-response sharing process through a Java API or REST;
when the data sharing performed is data retrieval, setting retrieval permissions in the access control of system management to impose constraints, wherein the retrieval can return the retrieval data of any request;
when the data sharing performed is data access, recording data access logs through the external sharing interface.
7. A system for implementing multi-source data fusion and sharing based on a big data platform, characterized by comprising:
a configuration module, for configuring at least one data source information and a timing rule;
a data access module, for executing a data access operation according to the configured timing rule, wherein the data access operation is extracting data from the at least one acquired data source, or collecting internet data, or converting data, or loading data into the big data platform;
a data fusion module, for performing a data fusion operation, according to the configured timing rule, on the data accessed in the data access operation;
a storage module, for storing the data resulting from the data fusion operation in layered, library-partitioned storage to form a repository, and constructing a secondary index library on the repository;
a data sharing module, for performing data sharing by providing a unified data exchange interface in the constructed big data platform.
8. The system according to claim 7, characterized in that the data fusion module comprises:
a first fusion submodule, configured so that, when the accessed data is record-level data, the fusion operation on the record-level data comprises performing information verification on each record;
a second fusion submodule, configured so that, when the accessed data is field-level data, the fusion operation on the record-level data comprises field verification or field conversion.
9. The system according to claim 7 or 8, characterized in that the storage module comprises:
a parameter input submodule, for inputting one or any combination of the following parameters: data directory, number of data fields, data rowkey field, and thematic library name;
an instantiation connection submodule, for instantiating a connection according to the HBase connection mode and the thematic library name;
a calculation submodule, for reading the most recent load date or time record for the corresponding data type and calculating the time interval since the last load;
a judging submodule, for judging whether the time interval is greater than the time period configured in the timing rule;
a first operation submodule, for, when the time interval is greater than the time period configured in the timing rule, logging that loading failed in the previous period or in several previous periods, then checking the log and executing a reload operation;
a second operation submodule, for, when the time interval is not greater than the time period configured in the timing rule, splitting the records one by one according to the passed-in delimiter; comparing the length of the array obtained by splitting with the passed-in total number of fields and retaining the data for which the two are equal; combining fields into a primary key according to the passed-in field subscripts; putting the data into HBase; and, after execution, recording that the load for this time period succeeded.
10. The system according to any one of claims 7 to 9, characterized in that the data sharing module comprises:
a data query sharing submodule, for providing the upper-layer application with a request-response sharing process through a Java API or REST;
a data retrieval submodule, for setting retrieval permissions in the access control of system management to impose constraints, wherein the retrieval can return the retrieval data of any request;
a data access submodule, for recording data access logs through the external sharing interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811426832.4A CN109710667A (en) | 2018-11-27 | 2018-11-27 | A kind of shared realization method and system of the multisource data fusion based on big data platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811426832.4A CN109710667A (en) | 2018-11-27 | 2018-11-27 | A kind of shared realization method and system of the multisource data fusion based on big data platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109710667A (en) | 2019-05-03 |
Family
ID=66254399
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811426832.4A Pending CN109710667A (en) | 2018-11-27 | 2018-11-27 | A kind of shared realization method and system of the multisource data fusion based on big data platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109710667A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111695000A (en) * | 2020-06-16 | 2020-09-22 | 山东蓝海领航大数据发展有限公司 | Multi-source big data loading method and system |
CN110110234B (en) * | 2019-05-13 | 2020-10-16 | 重庆天蓬网络有限公司 | Big data real-time searching system and method |
CN112732811A (en) * | 2020-12-31 | 2021-04-30 | 广西中科曙光云计算有限公司 | Data open platform |
CN112765183A (en) * | 2021-02-02 | 2021-05-07 | 浙江公共安全技术研究院有限公司 | Multi-source data fusion method and device, storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216962A (en) * | 2014-08-22 | 2014-12-17 | 南京邮电大学 | Mass network management data indexing design method based on HBase |
CN105159951A (en) * | 2015-08-17 | 2015-12-16 | 成都中科大旗软件有限公司 | Open tourism multi-source heterogeneous data fusion method and system |
CN105389402A (en) * | 2015-12-29 | 2016-03-09 | 曙光信息产业(北京)有限公司 | Big-data-oriented ETL (Extraction-Transformation-Loading) method and device |
US20160164924A1 (en) * | 2014-12-05 | 2016-06-09 | Cisco Technology, Inc. | Stack Fusion Software Communication Service |
US20160299959A1 (en) * | 2011-12-19 | 2016-10-13 | Microsoft Corporation | Sensor Fusion Interface for Multiple Sensor Input |
CN106326381A (en) * | 2016-08-16 | 2017-01-11 | 梁猛 | HBase data retrieval method based on MapDB construction |
CN106777227A (en) * | 2016-12-26 | 2017-05-31 | 河南信安通信技术股份有限公司 | Multidimensional data convergence analysis system and method based on cloud platform |
-
2018
- 2018-11-27 CN CN201811426832.4A patent/CN109710667A/en active Pending
Non-Patent Citations (3)
Title |
---|
XIANSHANNAN: ""Html5 Player"", 《HTTPS://GITHUB.COM/DOG-DAYS/HTML5-PLAYER/TREE/B7C6091FDB910EBEFF7F0B57277C36DDB7922095》 * |
孟亚辉; 张党进: "".NET应用系统中超时问题的分析与解决"", 《茂名学院学报》 * |
沐海—化茧成蝶: ""jQuery AJAX timeout 超时问题详解"", 《HTTPS://WWW.JB51.NET/ARTICLE/87003.HTM》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112685385B (en) | Big data platform for smart city construction | |
Khare et al. | Big data in IoT | |
Fiore et al. | An integrated big and fast data analytics platform for smart urban transportation management | |
CN105608203B (en) | A kind of Internet of Things log processing method and device based on Hadoop platform | |
CN109710667A (en) | A kind of shared realization method and system of the multisource data fusion based on big data platform | |
CN103838847B (en) | Data organization method oriented to sea-cloud collaboration network computing network | |
CN112732811A (en) | Data open platform | |
US10970322B2 (en) | Training an artificial intelligence to generate an answer to a query based on an answer table pattern | |
CN109074387A (en) | Versioned hierarchical data structure in Distributed Storage area | |
CN106982150A (en) | A kind of mobile Internet user behavior analysis method based on Hadoop | |
CN111258978B (en) | Data storage method | |
Panda et al. | Optimization of block query response using evolutionary algorithm | |
Walker et al. | Practicing environmental data justice: From DataRescue to data together | |
CN106649602B (en) | Business object data processing method, device and server | |
Scannapieco et al. | Placing big data in official statistics: a big challenge | |
CN109510721A (en) | A kind of network log management method and system based on Syslog | |
CN105893456B (en) | The isolated method and system of the computing basic facility of geography fence perception | |
CN108268468A (en) | The analysis method and system of a kind of big data | |
CN106055546A (en) | Optical disk library full-text retrieval system based on Lucene | |
CN103248511B (en) | A kind of analysis methods, devices and systems of single-point service feature | |
CN116415203A (en) | Government information intelligent fusion system and method based on big data | |
CN111026709A (en) | Data processing method and device based on cluster access | |
Xiong et al. | Data vitalization's perspective towards smart city: a reference model for data service oriented architecture | |
CN112163017B (en) | Knowledge mining system and method | |
US20240127379A1 (en) | Generating actionable information from documents |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190503 |