CN112100179A

CN112100179A - HBASE-based data fusion method, HBASE-based data fusion device, HBASE-based data fusion equipment and computer readable medium

Info

Publication number: CN112100179A
Application number: CN202010953729.6A
Authority: CN
Inventors: 李亚飞; 刘建辉; 乔智; 孙军锋; 汪月
Original assignee: Beijing Minglue Zhaohui Technology Co Ltd
Current assignee: Beijing Minglue Zhaohui Technology Co Ltd
Priority date: 2020-09-11
Filing date: 2020-09-11
Publication date: 2020-12-18

Abstract

The application relates to a data fusion method, a data fusion device, data fusion equipment and a computer readable medium based on HBASE. The method comprises the following steps: acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data; determining a data access module matched with the data type of the data to be fused; generating a data identifier matched with the data to be fused by using a data access module; and forming key value pairs by the data identification and the data to be fused, and storing the key value pairs in a target database, wherein the target database is a distributed open source database. The method integrates and simplifies the fusion process of multi-source heterogeneous data, can reduce the complexity of data fusion and improve the efficiency of data fusion, and meanwhile, by combining a distributed search service cluster (Solr cluster), multi-condition real-time query is realized.

Description

HBASE-based data fusion method, HBASE-based data fusion device, HBASE-based data fusion equipment and computer readable medium

Technical Field

The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a computer readable medium for data fusion based on HBASE.

Background

With the advent of the big data era, data of all kinds is generated continuously. As the data magnitude keeps increasing, the internal relations between data become more difficult to mine and analyze, so that data islands are formed.

At present, data fusion schemes in the related art are implemented either by querying structured data through SQL or by writing a customized program. For example, a plurality of independent tables are associated through SQL to achieve data fusion, or a corresponding Spark/MapReduce program is written to fuse and connect a plurality of kinds of data. However, both SQL query analysis and customized program analysis must be performed by professionals with a certain technical background; the technical requirements and complexity are high, which implicitly raises the entry threshold. Meanwhile, the processing mode is inflexible and inefficient, whereas data in a big data context often has requirements on real-time performance and comprehensiveness. The traditional data fusion mode therefore cannot meet actual requirements.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The application provides a data fusion method, a data fusion device, data fusion equipment and a computer readable medium based on HBASE, and aims to solve the technical problems of high complexity and low efficiency of the existing data fusion scheme.

According to an aspect of the embodiments of the present application, there is provided a data fusion method based on HBASE, including: acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data; determining a data access module matched with the data type of the data to be fused; generating a data identifier matched with the data to be fused by using a data access module; and forming key value pairs by the data identification and the data to be fused, and storing the key value pairs in a target database, wherein the target database is a distributed open source database.

Optionally, the determining a data access module matched with the data type of the data to be fused includes: under the condition that the data type is structured data, calling a structured data processing tool, wherein the data access module comprises the structured data processing tool; under the condition that the data type is semi-structured data, calling a data analysis interface, wherein the data access module comprises a data analysis interface; and in the case that the data type is unstructured data, calling a metadata processing tool, wherein the data access module comprises the metadata processing tool.

Optionally, after determining a data access module matched with the data type of the data to be fused, the method further includes: calling a data warehouse, wherein the data warehouse is created for a target object and is used for converting data to be fused into metadata and then storing the metadata in the target database, and the metadata is data stored with characteristics indicating the data to be fused; extracting data characteristics of data to be fused by using a data warehouse; and generating metadata matched with the data to be fused according to the data characteristics.

Optionally, the generating, by the data access module, a data identifier matched with the data to be fused includes: and carrying out encryption calculation on the data to be fused by using the data access module to obtain a data identifier.

Optionally, after the data identifier and the data to be fused are combined into a key-value pair and the key-value pair is stored in the target database, the method further includes: an index is created for the data identification in the target database, and the index is stored in the distributed search service cluster.

Optionally, after creating an index for the data identifier in the target database, the method further includes: under the condition that a multi-condition query statement is received, acquiring an index set matched with the multi-condition query statement, wherein the index set comprises a plurality of indexes corresponding to the multi-condition; determining an identification set according to the index set, wherein the identification set comprises data identifications matched with all indexes in the index set; and returning the data to be fused corresponding to each data identifier in the identifier set to the query object as a query result.

Optionally, obtaining the index set matched with the multi-conditional query statement comprises: preprocessing a multi-conditional query statement; dividing the preprocessed multi-conditional query statement into an output block, a data table block and a query condition block; generating an index query statement by using the output block, the data table block and the query condition block; and obtaining an index set according to the index query statement.

According to another aspect of the embodiments of the present application, there is provided an HBASE-based data fusion apparatus including: the data acquisition module is used for acquiring data to be fused, and the data to be fused is multi-source heterogeneous data; the type matching module is used for determining a data access module matched with the data type of the data to be fused; the identification generation module is used for generating a data identification matched with the data to be fused by using the data access module; and the data fusion storage module is used for forming key value pairs by the data identification and the data to be fused, and storing the key value pairs in a target database, wherein the target database is a distributed open source database.

According to another aspect of the embodiments of the present application, there is provided a computer device, including a memory and a processor, where a computer program operable on the processor is stored in the memory, and the processor implements the steps of the method when executing the computer program.

According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-mentioned method.

Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:

In the technical scheme, data to be fused is acquired, the data to be fused being multi-source heterogeneous data; a data access module matched with the data type of the data to be fused is determined; a data identifier matched with the data to be fused is generated by using the data access module; and the data identifier and the data to be fused are formed into a key-value pair, which is stored in a target database, wherein the target database is a distributed open source database. The method integrates and simplifies the fusion process of multi-source heterogeneous data, can reduce the complexity of data fusion and improve the efficiency of data fusion, and meanwhile, by combining a distributed search service cluster (Solr cluster), multi-condition real-time query is realized.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to more clearly illustrate the technical solutions in the embodiments or related technologies of the present application, the drawings needed in the description of the embodiments or related technologies are briefly described below. Those skilled in the art can obviously obtain other drawings from these drawings without any creative effort.

FIG. 1 is a schematic diagram of an alternative HBASE-based data fusion method hardware environment according to an embodiment of the present application;

FIG. 2 is a flow chart of an alternative HBASE-based data fusion method according to an embodiment of the present application;

FIG. 3 is a block diagram of an alternative HBASE-based data fusion device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module" and "component" may be used in a mixture.

The data fusion schemes in the related art are implemented either by querying structured data through SQL (Structured Query Language) or by writing a customized program. For example, a plurality of independent tables are associated through SQL to achieve data fusion, or a corresponding Spark/MapReduce program is written to fuse and connect various kinds of data. However, both SQL query analysis and customized program analysis must be carried out by professionals with a certain technical background; the technical requirements and complexity are high, which implicitly raises the entry threshold. Meanwhile, the processing mode is inflexible and inefficient, whereas data in a big data context often has requirements on real-time performance and comprehensiveness, so the traditional data fusion mode cannot meet actual requirements.

To address the problems noted in the background, according to an aspect of embodiments of the present application, an embodiment of an HBASE-based data fusion method is provided.

Alternatively, in the embodiment of the present application, the HBASE-based data fusion method may be applied to a hardware environment formed by the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server or separately from the server, and is used to provide data storage services for the server 103. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes but is not limited to a PC, a cell phone, a tablet computer, and the like.

In the embodiment of the present application, an HBASE-based data fusion method may be executed by the server 103, or may be executed by both the server 103 and the terminal 101, as shown in fig. 2, where the method may include the following steps:

Step S202, acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data.

The multi-source heterogeneous data may be heterogeneous data from different sources, such as data from different enterprises and different organizations, and may also include data from the same source that differs in period, storage mode, management mode or data structure. The data may be heterogeneous in computer architecture, that is, the data is physically stored on computers of different architectures, such as mainframes, minicomputers, workstations, PCs or embedded systems. The data may be heterogeneous in format, that is, the storage management mechanisms of the data differ: the data may reside in a relational database system, such as Oracle, SQL Server or DB2, or may be file-type two-dimensional data, such as TXT, CSV or XLS. The data may also be heterogeneous in its storage logic model, that is, data is stored and maintained under different business logic, so that data with the same meaning is represented differently; for example, department codes are inconsistent between an independent sales system and an independent purchasing system.

In the embodiment of the present application, streaming data may also be processed in combination with a Flume or Kafka module.

Step S204, determining a data access module matched with the data type of the data to be fused.

In the embodiment of the application, the multi-source heterogeneous data can be distributed to different data access modules, so that the corresponding data access modules are utilized to process the multi-source heterogeneous data, and the complexity of fusing the multi-source heterogeneous data is reduced.

Step S206, generating a data identifier matched with the data to be fused by using the data access module.

In the embodiment of the application, a different key can be generated for each piece of data to serve as the unique identifier of that data in the database, so that the data is distinguished from other data and data collision is avoided.

Step S208, forming a key-value pair from the data identifier and the data to be fused, and storing the key-value pair in a target database, wherein the target database is a distributed open source database.

In the embodiment of the application, the data can serve as the value of a data table entry, so that the value and its corresponding key form a unique key-value pair (key/value), which is then put into the data table, that is, stored in the database. A user can then retrieve the data corresponding to a key by means of that key. The target database may be a distributed open source database such as HBASE (Hadoop Database). HBASE uses HDFS (Hadoop Distributed File System) as its file storage system and uses Hadoop MapReduce to process the massive data held in HBASE; it is a highly reliable, high-performance, column-oriented and scalable distributed storage system.

In the big data era, the flexible and various data storage, data management, data structures and data contents lead to the explosive growth of heterogeneous data. By adopting the technical scheme, the fusion process of multi-source heterogeneous data is integrated and simplified, the complexity of data fusion can be reduced, and the efficiency of data fusion is improved.
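Steps S202 to S208 can be illustrated with a minimal sketch. It is not the implementation of the application: the in-memory dictionary `table` stands in for the HBASE table, and the names `detect_type`, `make_identifier` and `fuse`, as well as the type-detection rules, are hypothetical and chosen only for illustration.

```python
import hashlib
import json

# In-memory stand-in for the target key-value table (assumption: the
# application stores the pairs in HBASE; no HBASE client is used here).
table = {}


def detect_type(record):
    """Step S204 helper: classify a record (hypothetical rule set)."""
    if isinstance(record, dict):
        return "structured"
    if isinstance(record, str) and record.lstrip().startswith(("{", "[", "<")):
        return "semi-structured"
    return "unstructured"


def make_identifier(record):
    """Step S206: derive a key from the record content with SHA-256."""
    if isinstance(record, bytes):
        payload = record
    elif isinstance(record, str):
        payload = record.encode("utf-8")
    else:
        # Sort keys so that equal dictionaries always yield the same key.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fuse(records):
    """Steps S202 to S208 for a batch of already acquired records."""
    keys = []
    for record in records:
        data_type = detect_type(record)
        key = make_identifier(record)
        # Step S208: the identifier and the data form one key-value pair.
        table[key] = {"type": data_type, "value": record}
        keys.append(key)
    return keys


keys = fuse([{"name": "Li", "age": 30}, '{"name": "Li"}', b"\x89PNG"])
```

Each returned key can later be used to read the stored record back with `table[key]`, which mirrors looking up a row by its row key.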

Optionally, the step S204 of determining the data access module matching the data type of the data to be fused may include:

under the condition that the data type is structured data, calling a structured data processing tool, wherein the data access module comprises the structured data processing tool;

under the condition that the data type is semi-structured data, calling a data analysis interface, wherein the data access module comprises a data analysis interface;

and in the case that the data type is unstructured data, calling a metadata processing tool, wherein the data access module comprises the metadata processing tool.

In the embodiment of the application, structured data, semi-structured data and unstructured data can be processed, and different data access modules are called.

For structured data, such as data held in Oracle or MySQL, a Sqoop tool can be provided for data migration, after which subsequent processing is performed by a data processing unit; here the data access module comprises the Sqoop tool and the data processing unit. Sqoop is mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, etc.): it can import data from a relational database (such as MySQL, Oracle or Postgres) into the HDFS of Hadoop, and can also export data from HDFS into a relational database.

For semi-structured data, a corresponding data analysis interface can be provided to parse the semi-structured data, so that the data is imported into HDFS and then processed by a data processing unit; here the data access module comprises the data analysis interface and the data processing unit.

For unstructured data, a metadata processing tool, such as a metadata platform, may be provided to encode the unstructured data so that the unstructured data is represented as structured data.
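The selection of a data access module by data type can be sketched as a simple dispatch table. This is an illustrative sketch only: the three handler functions are hypothetical placeholders for the Sqoop-based tool, the data analysis interface and the metadata processing tool, and they do not call any of those tools.

```python
import json


def handle_structured(rows):
    # Placeholder for a migration tool such as Sqoop (not invoked here).
    return {"module": "structured", "rows": list(rows)}


def handle_semi_structured(text):
    # Placeholder for a data analysis (parsing) interface; JSON is assumed.
    return {"module": "semi-structured", "parsed": json.loads(text)}


def handle_unstructured(blob):
    # Placeholder for a metadata processing tool.
    return {"module": "unstructured", "size": len(blob)}


# Hypothetical mapping from data type to data access module.
ACCESS_MODULES = {
    "structured": handle_structured,
    "semi-structured": handle_semi_structured,
    "unstructured": handle_unstructured,
}


def dispatch(data_type, data):
    """Call the data access module registered for the given data type."""
    try:
        handler = ACCESS_MODULES[data_type]
    except KeyError:
        raise ValueError(f"unsupported data type: {data_type}") from None
    return handler(data)
```

Keeping the mapping in one dictionary means that a further data type can be supported by registering one more handler, without changing `dispatch` itself.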

Optionally, after determining the data access module matched with the data type of the data to be fused in step S204, the method may further include the following steps:

Step 1, calling a data warehouse, wherein the data warehouse is created for a target object and is used for converting the data to be fused into metadata and then storing the metadata in the target database, and the metadata is data that records features indicating the data to be fused;

Step 2, extracting data features of the data to be fused by using the data warehouse;

Step 3, generating metadata matched with the data to be fused according to the data features.

In the embodiment of the application, the acquired data can be standardized by using a user-defined data warehouse. For example, structured data is already standardized and needs no processing, whereas for semi-structured data and unstructured data, data features can be extracted and structured data (namely metadata) generated according to those features, so that the semi-structured data and unstructured data are characterized by the metadata and stored in the database.
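As a rough illustration of turning semi-structured or unstructured input into metadata, the following sketch extracts a few simple features. The chosen features (field names, text length, word count, byte size) and the names `extract_features` and `to_metadata` are assumptions made for this example; the application does not specify which features the data warehouse extracts. The input is assumed to be either `str` or `bytes`.

```python
import json


def extract_features(data):
    """Return a flat dictionary of features describing the data."""
    if isinstance(data, bytes):
        # Unstructured binary content: only its size is recorded here.
        return {"kind": "binary", "size_bytes": len(data)}
    try:
        parsed = json.loads(data)
    except ValueError:
        # Not JSON: treat the input as free text.
        return {"kind": "text", "length": len(data), "words": len(data.split())}
    if isinstance(parsed, dict):
        return {"kind": "json", "fields": sorted(parsed)}
    return {"kind": "json", "fields": []}


def to_metadata(source, data):
    """Build a structured metadata record from the extracted features."""
    return {"source": source, **extract_features(data)}
```

For example, `to_metadata("crm", '{"name": "Li", "age": 3}')` yields a record whose `fields` entry lists `age` and `name`, which is structured data that can be stored alongside or instead of the raw input.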

Optionally, the step S206 of generating, by using the data access module, a data identifier matched with the data to be fused may include:

and carrying out encryption calculation on the data to be fused by using the data access module to obtain a data identifier.

In the embodiment of the application, the data identifier can be calculated by an encryption calculation such as an MD5 or SHA256 digest, so that data collision can be effectively avoided. The data can also first be transformed, for example converted to upper case, converted to lower case, truncated or reversed, and then encrypted, so that the possibility of data collision is further reduced.
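The identifier calculation can be sketched with Python's standard `hashlib` module. MD5 and SHA-256 are hash (digest) algorithms, which is how the "encryption calculation" is modelled here. The function names and the order in which the transformations are applied are assumptions for illustration.

```python
import hashlib


def normalize(text, *, case="lower", max_len=None, reverse=False):
    """Optionally change case, truncate and reverse the text before hashing."""
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    if max_len is not None:
        text = text[:max_len]
    if reverse:
        text = text[::-1]
    return text


def data_identifier(text, algorithm="sha256", **normalize_options):
    """Return the hex digest of the normalized text as the data identifier."""
    normalized = normalize(text, **normalize_options)
    digest = hashlib.new(algorithm, normalized.encode("utf-8"))
    return digest.hexdigest()


key = data_identifier("ABC")
```

With the default lower-casing, inputs that differ only in letter case share one identifier, so the choice of transformation should match what counts as "the same data" in the target table. Passing `algorithm="md5"` selects MD5 instead, where the local Python build permits it.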

Optionally, after the step S208 combines the data identifier and the data to be fused into a key-value pair and stores the key-value pair in the target database, the method may further include:

an index is created for the data identification in the target database, and the index is stored in the distributed search service cluster.

In the embodiment of the application, an index can be created for the data in HBASE in combination with a distributed search service cluster, such as a Solr cluster. Solr is a stand-alone enterprise-level search application server that exposes an API similar to a Web service. A user can submit an XML file in a prescribed format to the search engine server through an HTTP request to generate an index, and can also issue a search request through an HTTP GET operation and obtain the returned result in XML format.
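A search request of the kind described above is an ordinary HTTP GET URL. The sketch below only builds such a URL and does not contact any server. The host `http://localhost:8983/solr`, the collection name `person` and the helper name `solr_select_url` are assumptions for illustration; `q`, `fl`, `rows` and `wt` are common Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, fields=("id",), rows=10):
    """Build the GET URL of a Solr select request that returns XML."""
    params = {
        "q": query,               # the search condition
        "fl": ",".join(fields),   # fields to return, e.g. the row key
        "rows": rows,             # maximum number of results
        "wt": "xml",              # response format
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983/solr", "person", "name:Li")
```

Returning only the identifier field keeps the search response small, since the full record is read from the key-value table afterwards.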

The application also provides a way for a user to query the fused data. It supports multi-condition queries, and when only a small amount of data is queried, the query can be carried out through a website, which lowers the threshold for using fused-data queries: people without the relevant technical background can query and use the data through a visual interface. The query method is explained below.

Optionally, after creating an index for the data identifier in the target database, the method further includes:

Step 1, when a multi-condition query statement is received, acquiring an index set matched with the multi-condition query statement, wherein the index set comprises a plurality of indexes corresponding to the multiple conditions;

Step 2, determining an identifier set according to the index set, wherein the identifier set comprises the data identifiers matched with each index in the index set;

Step 3, returning the data to be fused corresponding to each data identifier in the identifier set to the query object as the query result.

In the embodiment of the application, when a multi-condition query is performed, the Solr index can be queried first to return the corresponding key set, and the HBASE data is then queried according to the returned key set, which solves the problem that HBASE does not support multi-condition queries.
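The two-stage lookup (index first, then the key-value table) can be sketched entirely in memory. Here a dictionary-based inverted index stands in for the Solr index and a plain dictionary stands in for the HBASE table; both stand-ins, the sample records and the restriction to equality conditions are assumptions made for this example.

```python
# Stand-in for the HBASE table: row key -> stored record.
table = {
    "k1": {"name": "Li", "age": 30, "city": "Beijing"},
    "k2": {"name": "Wang", "age": 18, "city": "Beijing"},
    "k3": {"name": "Zhao", "age": 42, "city": "Shanghai"},
}


def build_index(table, fields):
    """Stand-in for the search index: (field, value) -> set of row keys."""
    index = {}
    for key, record in table.items():
        for field in fields:
            index.setdefault((field, record[field]), set()).add(key)
    return index


def query(index, table, conditions):
    """Resolve equality conditions to keys, then fetch the rows by key."""
    key_sets = [index.get(item, set()) for item in conditions.items()]
    if not key_sets:
        return []
    # A row must satisfy every condition, hence the intersection.
    keys = set.intersection(*key_sets)
    return [table[key] for key in sorted(keys)]


index = build_index(table, ["age", "city"])
result = query(index, table, {"city": "Beijing", "age": 30})
```

The key-value table is only ever accessed by key, which is the access pattern it handles well, while all condition matching is delegated to the index.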

Optionally, obtaining the index set matching the multi-conditional query statement may include the steps of:

Step 1, preprocessing the multi-condition query statement;

Step 2, dividing the preprocessed multi-condition query statement into an output block, a data table block and a query condition block;

Step 3, generating an index query statement by using the output block, the data table block and the query condition block;

Step 4, obtaining the index set according to the index query statement.

The specific implementation is as follows:

The first step: preprocess the HSQL statement, standardizing it by removing redundant spaces, carriage returns and other special characters.

The second step: split the HSQL statement into an output block, a data table block and a query condition block by means of a regular expression. For example, the HSQL statement "select f:name, f:age, f:generator from test.person where age>20 and generator='role'" can be split by the regular expression (select)(.+)(from)(.+)(where|on|having|group by|order by|ENDOFCSQL)(.+) into the output block: f:name, f:age, f:generator; the data table block: test.person; and the query condition block: age>20 and generator='role'.

The third step: further parse each of the divided blocks to generate the corresponding index query statement. In this example, the query fields are f:name, f:age and f:generator, the database to be queried is test, the table is person, and the filter condition on the data is that age is greater than 20 and generator is 'role'.

The fourth step: query is performed and results are returned.
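The preprocessing and splitting steps can be sketched with Python's `re` module. The pattern below is a simplified variant written for this example, not the regular expression of the application: it handles only an optional `where` clause, and it treats `ENDOFCSQL` as an end-of-statement sentinel appended during preprocessing, which is an assumption about its role. The function names are hypothetical.

```python
import re

# Assumption: ENDOFCSQL marks the end of the statement.
SENTINEL = "ENDOFCSQL"

PATTERN = re.compile(
    r"^select\s+(?P<output>.+?)\s+from\s+(?P<table>\S+)"
    r"(?:\s+where\s+(?P<condition>.+?))?\s+" + SENTINEL + r"$",
    re.IGNORECASE,
)


def preprocess(statement):
    """Collapse runs of spaces, tabs and line breaks into single spaces."""
    return " ".join(statement.split())


def split_statement(statement):
    """Split a statement into output, data table and condition blocks."""
    normalized = preprocess(statement) + " " + SENTINEL
    match = PATTERN.match(normalized)
    if match is None:
        raise ValueError(f"cannot parse statement: {statement!r}")
    fields = [field.strip() for field in match.group("output").split(",")]
    # "test.person" -> database "test", table "person".
    database, _, table_name = match.group("table").rpartition(".")
    return {
        "output": fields,
        "database": database or None,
        "table": table_name,
        "condition": match.group("condition"),
    }


blocks = split_statement("select f:name,  f:age\n from test.person where age>20")
```

The resulting `blocks` dictionary holds the pieces from which an index query, for instance a search condition and a list of fields to return, could then be assembled.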

According to still another aspect of the embodiments of the present application, as shown in fig. 3, there is provided an HBASE-based data fusion apparatus including: the data acquisition module 301 is configured to acquire data to be fused, where the data to be fused is multi-source heterogeneous data; the type matching module 303 is used for determining a data access module matched with the data type of the data to be fused; an identifier generating module 305, configured to generate a data identifier matching the data to be fused by using the data access module; and the data fusion storage module 307 is configured to combine the data identifier and the data to be fused into a key value pair, and store the key value pair in a target database, where the target database is a distributed open source database.

It should be noted that the data obtaining module 301 in this embodiment may be configured to execute step S202 in this embodiment, the type matching module 303 in this embodiment may be configured to execute step S204 in this embodiment, the identifier generating module 305 in this embodiment may be configured to execute step S206 in this embodiment, and the data fusion storage module 307 in this embodiment may be configured to execute step S208 in this embodiment.

It should be noted here that the examples and application scenarios implemented by the modules described above are the same as those of the corresponding steps, but are not limited to the disclosure of the above embodiments. It should also be noted that the modules described above, as a part of the apparatus, may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.

Optionally, the type matching module is specifically configured to: under the condition that the data type is structured data, calling a structured data processing tool, wherein the data access module comprises the structured data processing tool; under the condition that the data type is semi-structured data, calling a data analysis interface, wherein the data access module comprises a data analysis interface; and in the case that the data type is unstructured data, calling a metadata processing tool, wherein the data access module comprises the metadata processing tool.

Optionally, the HBASE-based data fusion apparatus further includes a data warehouse module, configured to: calling a data warehouse, wherein the data warehouse is created for a target object and is used for converting data to be fused into metadata and then storing the metadata in the target database, and the metadata is data stored with characteristics indicating the data to be fused; extracting data characteristics of data to be fused by using a data warehouse; and generating metadata matched with the data to be fused according to the data characteristics.

Optionally, the identifier generating module is specifically configured to: and carrying out encryption calculation on the data to be fused by using the data access module to obtain a data identifier.

Optionally, the HBASE-based data fusion apparatus further includes an indexing module, configured to: an index is created for the data identification in the target database, and the index is stored in the distributed search service cluster.

Optionally, the HBASE-based data fusion apparatus further includes a query module, configured to: under the condition that a multi-condition query statement is received, acquiring an index set matched with the multi-condition query statement, wherein the index set comprises a plurality of indexes corresponding to the multi-condition; determining an identification set according to the index set, wherein the identification set comprises data identifications matched with all indexes in the index set; and returning the data to be fused corresponding to each data identifier in the identifier set to the query object as a query result.

Optionally, the query module is further configured to: preprocessing a multi-conditional query statement; dividing the preprocessed multi-conditional query statement into an output block, a data table block and a query condition block; generating an index query statement by using the output block, the data table block and the query condition block; and obtaining an index set according to the index query statement.

There is also provided, in accordance with yet another aspect of the embodiments of the present application, a computer device, including a memory and a processor, the memory having stored therein a computer program executable on the processor, the processor implementing the steps of the above method when executing the computer program.

The memory and the processor in the computer device communicate with each other through a communication bus and a communication interface. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc.

The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.

Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the following steps:

acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data;

determining a data access module matched with the data type of the data to be fused;

generating a data identifier matched with the data to be fused by using a data access module;

and forming key value pairs by the data identification and the data to be fused, and storing the key value pairs in a target database, wherein the target database is a distributed open source database.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.

When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the modules is merely a logical division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

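1. An HBASE-based data fusion method, characterized by comprising the following steps:

acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data;

determining a data access module matched with the data type of the data to be fused;

generating a data identifier matched with the data to be fused by using the data access module;

and forming a key-value pair from the data identifier and the data to be fused, and storing the key-value pair in a target database, wherein the target database is a distributed open-source database.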

2. The method of claim 1, wherein determining a data access module matching the data type of the data to be fused comprises:

calling a structured data processing tool under the condition that the data type is structured data, wherein the data access module comprises the structured data processing tool;

calling a data analysis interface under the condition that the data type is semi-structured data, wherein the data access module comprises the data analysis interface;

and calling a metadata processing tool under the condition that the data type is unstructured data, wherein the data access module comprises the metadata processing tool.
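As a non-limiting illustration of the type-based selection recited in claim 2, the following Python sketch dispatches on the data type. The three functions are hypothetical stand-ins: the claim does not name a concrete structured data processing tool, data analysis interface or metadata processing tool, and the type labels used as dictionary keys are likewise assumptions.

```python
import json
from typing import Any, Callable, Dict


def structured_tool(row: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a structured data processing tool: copy the columns through."""
    return dict(row)


def parse_interface(text: str) -> Dict[str, Any]:
    """Stand-in for a data analysis (parsing) interface: parse a JSON document."""
    return json.loads(text)


def metadata_tool(blob: bytes) -> Dict[str, Any]:
    """Stand-in for a metadata processing tool: describe the blob instead of parsing it."""
    return {"size_bytes": len(blob), "is_empty": len(blob) == 0}


# Assumed type labels mapped to the tool each data type calls.
ACCESS_MODULES: Dict[str, Callable[[Any], Dict[str, Any]]] = {
    "structured": structured_tool,
    "semi_structured": parse_interface,
    "unstructured": metadata_tool,
}


def select_access_module(data_type: str) -> Callable[[Any], Dict[str, Any]]:
    """Return the tool matching the data type, or fail for an unknown type."""
    try:
        return ACCESS_MODULES[data_type]
    except KeyError:
        raise ValueError(f"unsupported data type: {data_type!r}") from None


parsed = select_access_module("semi_structured")('{"city": "Beijing"}')
described = select_access_module("unstructured")(b"\x00\x01\x02")
```

Keeping the mapping in one table means a further data type can be supported by registering one more entry, without touching the selection logic.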

3. The method according to claim 2, wherein after determining the data access module matching the data type of the data to be fused, the method further comprises:

calling a data warehouse, wherein the data warehouse is created for a target object and is used for converting the data to be fused into metadata and then storing the metadata in the target database, and the metadata is data that stores characteristics describing the data to be fused;

extracting data characteristics of the data to be fused by using the data warehouse;

and generating the metadata matched with the data to be fused according to the data characteristics.

4. The method of claim 1, wherein generating, by the data access module, a data identifier matching the data to be fused comprises:

and carrying out encryption calculation on the data to be fused by utilizing the data access module to obtain the data identifier.

5. The method according to any one of claims 1 to 4, wherein after the data identifier and the data to be fused are combined into a key-value pair and the key-value pair is stored in the target database, the method further comprises:

creating an index for the data identifier in the target database, wherein the index is stored in a distributed search service cluster.

6. The method of claim 5, wherein after creating an index for the data identifier in the target database, the method further comprises:

under the condition that a multi-condition query statement is received, acquiring an index set matched with the multi-condition query statement, wherein the index set comprises a plurality of indexes corresponding to the multiple conditions;

determining an identifier set according to the index set, wherein the identifier set comprises the data identifiers matched with the indexes in the index set;

and returning the data to be fused corresponding to each data identifier in the identifier set to a query object as a query result.
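As a non-limiting illustration of the query flow recited in claim 6, the following Python sketch uses in-memory stand-ins: `store` for the target database and an inverted index built by `build_index` for the distributed search service cluster. The sample records are invented for illustration, and combining the conditions with a logical AND is an assumption, since the claim does not state how multiple conditions are combined.

```python
from typing import Any, Dict, List, Set, Tuple

Condition = Tuple[str, Any]  # (field, value) equality condition

# Stand-in for the target database: data identifier -> stored record.
store: Dict[str, Dict[str, Any]] = {
    "id1": {"city": "Beijing", "type": "order"},
    "id2": {"city": "Beijing", "type": "invoice"},
    "id3": {"city": "Shanghai", "type": "order"},
}


def build_index(db: Dict[str, Dict[str, Any]]) -> Dict[Condition, Set[str]]:
    """Build an inverted index: (field, value) -> set of data identifiers."""
    index: Dict[Condition, Set[str]] = {}
    for data_id, record in db.items():
        for field, value in record.items():
            index.setdefault((field, value), set()).add(data_id)
    return index


def multi_condition_query(
    conditions: List[Condition],
    index: Dict[Condition, Set[str]],
    db: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Resolve conditions to identifiers via the index, then fetch the records."""
    # Index set: one index entry per condition.
    index_set = [index.get(cond, set()) for cond in conditions]
    # Identifier set: identifiers present in every entry (AND semantics assumed).
    id_set = set.intersection(*index_set) if index_set else set()
    # Query result: the stored data for each identifier.
    return [db[data_id] for data_id in sorted(id_set)]


index = build_index(store)
result = multi_condition_query([("city", "Beijing"), ("type", "order")], index, store)
```

Only the identifiers pass between the index and the store, so the store is read by key and never scanned.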

7. The method of claim 6, wherein acquiring an index set matched with the multi-condition query statement comprises:

preprocessing the multi-condition query statement;

dividing the preprocessed multi-condition query statement into an output block, a data table block and a query condition block;

generating an index query statement by using the output block, the data table block and the query condition block;

and obtaining the index set according to the index query statement.
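As a non-limiting illustration of the statement splitting recited in claim 7, the following Python sketch assumes an SQL-like statement of the form SELECT ... FROM ... WHERE ... whose conditions are equalities joined by AND. The claim fixes neither the query syntax nor the form of the index query statement. The function names and the returned dictionary layout are hypothetical; the keys `q` and `fl` are chosen to mirror the query and field-list parameters of a Solr request.

```python
import re
from typing import Dict, Tuple

STATEMENT = re.compile(
    r"^select (?P<output>.+?) from (?P<table>\w+)(?: where (?P<cond>.+))?$",
    re.IGNORECASE,
)


def preprocess(statement: str) -> str:
    """Collapse whitespace and drop a trailing semicolon."""
    return re.sub(r"\s+", " ", statement).strip().rstrip(";").strip()


def split_blocks(statement: str) -> Tuple[str, str, str]:
    """Split into the output block, the data table block and the query condition block."""
    match = STATEMENT.match(preprocess(statement))
    if match is None:
        raise ValueError("unsupported query statement")
    return match.group("output"), match.group("table"), match.group("cond") or ""


def to_index_query(statement: str) -> Dict[str, str]:
    """Generate an index query from the three blocks."""
    output, table, cond = split_blocks(statement)
    fields = [name.strip() for name in output.split(",")]
    terms = []
    for clause in re.split(r"\s+and\s+", cond, flags=re.IGNORECASE):
        if not clause:
            continue  # no WHERE clause was given
        field, value = (part.strip() for part in clause.split("=", 1))
        unquoted = value.strip("'\"")
        terms.append(f"{field}:{unquoted}")
    # An empty condition block matches every document.
    return {"collection": table, "q": " AND ".join(terms) or "*:*", "fl": ",".join(fields)}


query = to_index_query(
    "SELECT name, age  FROM users WHERE city = 'Beijing' AND type = 'order';"
)
```

The sketch handles only equality conditions; range or OR conditions would need a real parser rather than regular expressions.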

8. An HBASE-based data fusion device, comprising:

the data acquisition module is used for acquiring data to be fused, wherein the data to be fused is multi-source heterogeneous data;

the type matching module is used for determining a data access module matched with the data type of the data to be fused;

the identification generation module is used for generating a data identifier matched with the data to be fused by utilizing the data access module;

and the data fusion storage module is used for forming a key-value pair from the data identifier and the data to be fused and storing the key-value pair in a target database, wherein the target database is a distributed open-source database.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 7.