CN110704436A - Hbase-based index generation method and device


Info

Publication number
CN110704436A
Authority
CN
China
Prior art keywords
event
hbase
heat
row key
ordered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910917506.1A
Other languages
Chinese (zh)
Other versions
CN110704436B (en)
Inventor
刘善炎 (Liu Shanyan)
李涛 (Li Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Apas Technology Co Ltd
Original Assignee
Zhengzhou Apas Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Apas Technology Co Ltd
Priority to CN201910917506.1A
Publication of CN110704436A
Application granted
Publication of CN110704436B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose an Hbase-based index generation method and device, which aim to solve the problems in the prior art that the data index structure is unreasonable and the data query rate is low. The method comprises: acquiring basic data generated in a specified time period; determining the dimension heat (popularity) of each event dimension, and reordering each multidimensional event according to the dimension heat to obtain ordered multidimensional events; counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event with its event heat to form the row key data of Hbase; and writing the row key data into Hbase to generate a row key index of Hbase. With this technical solution, the structure of the generated row key index better meets users' needs: the multidimensional events are arranged according to the dimension heat of each event dimension, which can increase the query speed of Hbase data.

Description

Hbase-based index generation method and device
Technical Field
The invention relates to the technical field of data processing, in particular to an Hbase-based index generation method and device.
Background
HBase is built on Hadoop and is a distributed, column-oriented, scalable database for storing massive amounts of data. The biggest difference between it and a general relational database is that HBase is well suited to storing unstructured data; it is often used to store data sets (usually at the TB level or above) that have a simple structure but a very large volume, such as historical order records and log data. HBase uses key-value column storage, in which the RowKey is the key of the key-value pair and uniquely identifies a row. HBase retrieves data by RowKey: the system finds the Region in which a given RowKey (or RowKey range) is located and then routes the query request to that Region.
In traditional big data multidimensional analysis, Apache Kylin is the most commonly used technical framework. Kylin uses an OLAP (online analytical processing) engine: a data model is first established; then, by configuring Cube attributes, a basic fact table is built with MapReduce (a programming model), and the data of all Cuboids (each Cuboid being one combination of dimensions) are precomputed and stored in Hbase. However, when a Cube is built in this way, the row keys of Hbase are formed according to a fixed dimension order, so retrieving Hbase data responds slowly.
Therefore, a more rational RowKey design is needed to make data retrieval queries more efficient.
Disclosure of Invention
The embodiments of the invention provide an Hbase-based index generation method and device, which aim to solve the problems in the prior art that the data index structure is unreasonable and the data query rate is low.
To solve the above technical problem, the embodiment of the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides an Hbase-based index generation method, including:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
determining the dimension heat corresponding to each event dimension, and reordering each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event with its corresponding event heat to form the row key data of Hbase; the event heat comprises a first click count of users for each ordered multidimensional event;
writing the row key data into Hbase to generate a row key index of Hbase; the row key index is used for querying, according to an input first ordered multidimensional event, the first event heat corresponding to that first ordered multidimensional event.
In a second aspect, an embodiment of the present invention further provides an index generating apparatus based on Hbase, including:
the acquisition module is used for acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
the determining module is used for determining the dimension heat corresponding to each event dimension and reordering each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
the execution module is used for counting the event heat of each ordered multidimensional event and combining each ordered multidimensional event with its corresponding event heat to form the row key data of Hbase; the event heat comprises a first click count of users for each ordered multidimensional event;
the generating module is used for writing the row key data into Hbase to generate a row key index of Hbase; the row key index is used for querying, according to an input first ordered multidimensional event, the first event heat corresponding to that first ordered multidimensional event.
In a third aspect, an embodiment of the present invention further provides a network device, including:
a memory storing computer program instructions;
a processor which, when executing the computer program instructions, implements the Hbase-based index generation method of any of the above.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to execute the Hbase-based index generation method according to any one of the above.
In the embodiments of the invention, the multidimensional events are reordered according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the ordered multidimensional events and their event heat, so the ordered multidimensional events in the row key index are arranged based on the dimension heat of each event dimension, for example from high to low. The higher the dimension heat, the more attention users pay to that dimension and to the event heat associated with it; placing such dimensions first makes the structure of the row key index better match user needs. In addition, because a row key index is matched from left to right, arranging the multidimensional events by the dimension heat of each event dimension can increase the query speed of Hbase data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a Hbase-based index generation method in an embodiment of the present invention.
FIG. 2 is a schematic flow chart of a Hbase-based index generation method in another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an index generating apparatus based on Hbase according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a network device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow chart of a Hbase-based index generation method in an embodiment of the present invention. The method of fig. 1 may include:
S102: Acquire the basic data generated in a specified time period.
The basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event includes event content in a plurality of event dimensions.
The specified time period may be one or more months, weeks, days, and so on.
For example, suppose the specified time period is the whole of May 2019, udid denotes the user identifier, and the event dimensions include package (product), country, channel, and version. The basic data then consists of the udids and, for each udid, the event content in at least these 4 event dimensions: package, country, channel, and version.
In addition, the basic data may be cleaned to guarantee its quality: valid records are filtered out of all the acquired data, where a record is valid if its user identifier is not empty, the event content in each event dimension is not empty, and the user identifier has the preset length.
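The cleaning rule can be sketched as a simple record filter. This is a minimal illustration, not the patent's implementation: the record layout (a dict with a udid key plus one key per event dimension), the dimension names, and the preset length UDID_LENGTH = 3 are all assumptions chosen for the example.

```python
from typing import Dict, List

# Hypothetical event dimension names and preset udid length (assumptions for this sketch).
EVENT_DIMENSIONS: List[str] = ["package", "country", "channel", "version"]
UDID_LENGTH = 3


def is_valid(record: Dict[str, str]) -> bool:
    """Valid if the udid is non-empty and has the preset length,
    and every event dimension has non-empty content."""
    udid = record.get("udid", "")
    if not udid or len(udid) != UDID_LENGTH:
        return False
    return all(record.get(dim) for dim in EVENT_DIMENSIONS)


def clean(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the valid records from the acquired data."""
    return [r for r in records if is_valid(r)]


raw = [
    {"udid": "001", "package": "cn.fish", "country": "cn", "channel": "game", "version": "1.1.2"},
    {"udid": "01", "package": "cn.cut", "country": "usa", "channel": "baidu", "version": "2.1.0"},
    {"udid": "002", "package": "cn.map", "country": "", "channel": "google", "version": "1.2.2"},
]
cleaned = clean(raw)
print([r["udid"] for r in cleaned])  # ['001']
```

The second record fails the length check and the third has an empty country, so only the first survives.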
S104: Determine the dimension heat corresponding to each event dimension, and reorder each multidimensional event according to the dimension heat to obtain an ordered multidimensional event.
For example, the event dimensions include package, country, channel, and version, and their dimension heats satisfy package > country > channel > version, where ">" indicates that the former dimension has a higher dimension heat than the latter. The ordered multidimensional event obtained after reordering by dimension heat is therefore arranged as package, country, channel, version.
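The reordering step amounts to reading an event's contents in the dimension-heat order. The sketch below assumes, for illustration only, that a multidimensional event is held as a mapping from dimension name to content and that the dimension order (hottest first) is already known.

```python
from typing import Dict, List, Tuple


def order_event(event: Dict[str, str], dimension_order: List[str]) -> Tuple[str, ...]:
    """Arrange an event's contents by dimension order, hottest dimension first."""
    return tuple(event[dim] for dim in dimension_order)


# package > country > channel > version, as in the example above
dimension_order = ["package", "country", "channel", "version"]
event = {"version": "1.1.2", "channel": "baidu", "package": "cn.cut", "country": "cn"}
print(order_event(event, dimension_order))  # ('cn.cut', 'cn', 'baidu', '1.1.2')
```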
S106: Count the event heat of each ordered multidimensional event, and combine each ordered multidimensional event with its corresponding event heat to form the row key data of Hbase.
The event heat comprises a first click count of users for each ordered multidimensional event; the first click count may be obtained by counting the corresponding records in the basic data.
S108: Write the row key data into Hbase to generate the row key index of Hbase.
The row key index is used for querying, according to an input first ordered multidimensional event, the first event heat corresponding to that first ordered multidimensional event.
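Counting the event heat and forming row key data can be sketched with a Counter over ordered events. This is a hedged sketch under stated assumptions: each cleaned record is taken to represent one click, and the "-" separator is inferred from the row key cn.cut-cn-baidu-1.1.2 in the S209 example; the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Separator inferred from the row key "cn.cut-cn-baidu-1.1.2" in the S209 example;
# this assumes "-" never appears inside event content.
SEPARATOR = "-"


def build_row_key_data(records: Iterable[Dict[str, str]],
                       dimension_order: List[str]) -> Dict[str, int]:
    """Count each ordered multidimensional event (one record = one click, an
    assumption) and map its concatenated row key to that count (the event heat)."""
    heat = Counter(tuple(r[dim] for dim in dimension_order) for r in records)
    return {SEPARATOR.join(event): num for event, num in heat.items()}


records = [  # hypothetical cleaned records
    {"udid": "001", "package": "cn.cut", "country": "cn", "channel": "baidu", "version": "1.1.2"},
    {"udid": "002", "package": "cn.cut", "country": "cn", "channel": "baidu", "version": "1.1.2"},
    {"udid": "003", "package": "cn.fish", "country": "usa", "channel": "google", "version": "2.1.0"},
]
order = ["package", "country", "channel", "version"]
print(build_row_key_data(records, order))
# {'cn.cut-cn-baidu-1.1.2': 2, 'cn.fish-usa-google-2.1.0': 1}
```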
In this embodiment, the multidimensional events are reordered according to the dimension heat of each event dimension before the row key data of Hbase is built from the ordered events and their event heat. Dimensions with higher heat, which users care about more, are placed first, for example in descending order of dimension heat, so the structure of the row key index better matches user needs; and because a row key index is matched from left to right, this arrangement can increase the query speed of Hbase data.
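Why left-to-right order matters can be illustrated with a sorted key list. HBase keeps rows sorted lexicographically by row key, so a query that fixes the leading dimensions can be answered by one contiguous range scan, while a query that fixes only a trailing dimension cannot. The sketch below is an in-memory analogy using bisect, not HBase client code, and the sample keys are hypothetical.

```python
import bisect
from typing import List


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return every key starting with `prefix` from a lexicographically sorted
    list, analogous to a start-row/stop-row range scan over HBase row keys."""
    start = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    end = start
    # Keys sharing a prefix are contiguous in sorted order.
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


keys = sorted([
    "cn.map-usa-baidu-1.3.0",
    "cn.cut-cn-baidu-1.1.2",
    "cn.fish-cn-game-1.1.2",
    "cn.cut-usa-google-2.1.0",
])
print(prefix_scan(keys, "cn.cut-"))  # ['cn.cut-cn-baidu-1.1.2', 'cn.cut-usa-google-2.1.0']
```

Placing the most frequently queried dimension first makes the most common queries of this prefix kind.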
In one embodiment, after generating the row key index of Hbase, when a query instruction for a first ordered multidimensional event input by using a specified query language is received, a first event heat corresponding to the first ordered multidimensional event may be queried according to the row key index.
Optionally, the specified query language may be the structured query language SQL.
In this embodiment, writing query instructions directly against Hbase is complex and slow, whereas SQL's select syntax is simple and fast to write when querying data; accepting the query instruction for the first ordered multidimensional event in SQL can therefore effectively improve the data query speed.
In one embodiment, the specified query language is the structured query language, SQL; when a query instruction for a first ordered multidimensional event input by using a specified query language is received, the first event heat corresponding to the first ordered multidimensional event can be queried by the following steps:
First, a first mapping table of Hbase is created using specified middleware embedded in Hbase.
The first mapping table is used for mapping row key data in Hbase; the specified middleware is used to provide Hbase with a query interface for querying data using the structured query language SQL.
Second, when a query instruction for the first ordered multidimensional event input in SQL is received, the first event heat corresponding to it is queried according to the row key index mapped by the first mapping table.
Optionally, the specified middleware may be Phoenix, and the query interface may be a JDBC interface of Phoenix; the first mapping table may include a dimension field (rk) for mapping the multidimensional event and a heat information field (num) for mapping the heat information.
In this embodiment, the first mapping table of Hbase is created using the specified middleware embedded in Hbase. When the first event heat corresponding to the first ordered multidimensional event is queried, the SQL-capable query interface provided by the middleware can then be used to query the data in Hbase with SQL, which improves the data query speed.
In one embodiment, when determining the dimension heat corresponding to each event dimension, the second click count of users for each event dimension in the specified time period may be counted, and the dimension heat of each event dimension is determined according to the second click count.
The dimension heat is positively correlated with the second click count.
For example, the event dimensions comprise product, country, channel, and version. The number of times users clicked each event dimension on 2019-05-01 is counted and the event dimensions are sorted by that count from high to low; if the result is product, country, channel, version, then the dimension heat order for 2019-05-01 is product, country, channel, version.
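Ranking dimensions by their second click count is a descending sort. The click counts below are hypothetical, and the sketch assumes the counts have already been aggregated per dimension for the chosen day.

```python
from typing import Dict, List


def rank_dimensions(click_counts: Dict[str, int]) -> List[str]:
    """Order event dimensions by second click count, highest first
    (dimension heat is positively correlated with the click count).
    sorted() is stable, so ties keep their input order."""
    return sorted(click_counts, key=lambda dim: click_counts[dim], reverse=True)


clicks_2019_05_01 = {"version": 120, "channel": 340, "package": 910, "country": 560}  # hypothetical
print(rank_dimensions(clicks_2019_05_01))  # ['package', 'country', 'channel', 'version']
```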
Optionally, the sorted event dimensions may be saved to a database. Following the example above, the event dimension order "product, country, channel, version" is saved to a database (e.g., mysql).
In this embodiment, the dimension heat of each event dimension is determined from users' click counts for that dimension within the specified time period, so the statistical result of the dimension heat reflects how interested users are in each event dimension.
In one embodiment, when writing the row key data into the Hbase to generate the row key index of the Hbase, a message digest algorithm value of the row key data may be determined and inserted into a first mapping table of the Hbase created in advance to generate the row key index of the Hbase.
Optionally, the message digest algorithm value may be an MD5 (Message-Digest Algorithm 5) value.
In this embodiment, because a message digest algorithm maps an input string of any length to an output of fixed length, computing the message digest value of the row key data and inserting it into the pre-created first mapping table of Hbase to generate the row key index unifies the data length of the RowKey; it also reduces the data pressure that storing the row keys in plaintext would place on the database.
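The fixed-length property is easy to see with Python's standard hashlib: an MD5 hex digest is always 32 characters regardless of input length. This is only an illustration of the digest step; how the digest is then stored in the mapping table is not shown, and the row keys are taken from the examples in this text.

```python
import hashlib


def row_key_digest(row_key: str) -> str:
    """Return the MD5 digest of a row key as a fixed-length, 32-character hex string."""
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()


for rk in ["cn.cut-cn-baidu-1.1.2", "cn.fish-usa-google-2.1.0"]:
    print(rk, "->", len(row_key_digest(rk)))  # length is always 32
```

Because a digest is not order-preserving, a digested key supports exact-match lookups by the digest of a full row key rather than prefix range scans.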
In one embodiment, a second mapping table for Hbase may be created in a specified database before the row key data is written into Hbase; the row key data can then be written into Hbase by inserting it into the second mapping table, so that the row key data can be operated on with the structured query language SQL.
The second mapping table is used for mapping data in Hbase; the specified database can be operated on with SQL.
Optionally, the specified database is Hive, and the second mapping table is a Hive table.
In one embodiment, the second mapping table may include a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
For example, the second mapping table includes a dimension field (rk) and a heat information field (num), and its structure is: create table result_info (rk string, num int).
In the above embodiment, the second mapping table of Hbase is created in a specified database that can be operated on with SQL, and the row key data is written into Hbase by inserting it into the second mapping table; operating on the row key data in the second mapping table is thus equivalent to operating on the row key data in Hbase. Compared with the conventional approach of inserting or querying Hbase data through Java code, this embodiment writes the row key data into Hbase simply by operating on the Hive table with SQL statements, which reduces development cost.
FIG. 2 is a schematic flow chart of a Hbase-based index generation method in another embodiment of the present invention. The method of fig. 2 may include:
S201: Acquire the basic data generated in a specified time period.
The basic data may include a plurality of udids (user identifiers) and event contents in at least 4 event dimensions corresponding to the udids, such as package, country, channel, and version.
For example, the basic data acquired for the whole of May 2019 includes multiple udids and the event content in 4 event dimensions corresponding to each udid, where:
the values "01", "001", "002", "003", "004", "005", "06", etc. in the udid column represent different user identifiers; "cn.fish", "cn.cut", "cn.map", etc. in the package column represent different products; "cn", "usa", etc. in the country column represent different countries; "game", "baidu", "google", etc. in the channel column represent different channels; and "1.1.2", "2.1.0", "1.2.2", "1.3.0", etc. in the version column represent different versions.
S202: Clean the basic data and store the cleaned basic data in a Hive library.
In this step, invalid data, in which the user identifier is empty, the event content in an event dimension is empty, or the user identifier does not have the preset length, is screened out, and valid data, in which neither the user identifier nor the event content is empty and the user identifier has the preset length, is retained, so as to guarantee the quality of the basic data.
Following the above example, the basic data after cleaning is as follows:
[Table image BDA0002216527250000091: basic data after cleaning; not reproduced]
S203: Create, in Hive, a Hive table that maps Hbase.
Optionally, the Hive table includes a dimension field (rk) and a heat information field (num), and its structure is: create table result_info (rk string, num int).
It should be noted that this embodiment does not limit the execution order of S203. Besides the order listed here, in which the Hive table is created after the basic data is cleaned, S203 may, for example, also be executed after S206 described below.
S204: Determine the dimension heat corresponding to each event dimension, and reorder each multidimensional event according to the dimension heat to obtain an ordered multidimensional event.
Assuming that the specified time period is 2019-05-01, the number of times users clicked each event dimension on that day can be counted, the dimension heat of each event dimension determined from the click counts, and the dimensions arranged from high to low dimension heat to obtain the ordered multidimensional events.
Optionally, the specified time period and the corresponding dimension order may be saved to a mysql database, which provides two fields for this content: dt, which stores the specified time period, and dimension, which stores the order of the event dimensions.
For example, if the event dimension order counted for 2019-05-01 is package, country, channel, version, the contents of the dt and dimension fields in the mysql database are:
dt            dimension
2019-05-01    package, country, channel, version
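Saving and reloading the per-day dimension order can be sketched with Python's built-in sqlite3 as a stand-in for mysql. The table name dimension_order and the comma-joined storage format are assumptions for illustration; only the dt and dimension field names come from the text.

```python
import sqlite3
from typing import List

# In-memory sqlite3 stands in for mysql; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dimension_order (dt TEXT PRIMARY KEY, dimension TEXT NOT NULL)")


def save_order(dt: str, order: List[str]) -> None:
    """Store (or overwrite) the ordered event dimensions for one time period."""
    conn.execute(
        "INSERT OR REPLACE INTO dimension_order (dt, dimension) VALUES (?, ?)",
        (dt, ",".join(order)),
    )


def load_order(dt: str) -> List[str]:
    """Return the stored dimension order for `dt`, or an empty list if none."""
    row = conn.execute("SELECT dimension FROM dimension_order WHERE dt = ?", (dt,)).fetchone()
    return row[0].split(",") if row else []


save_order("2019-05-01", ["package", "country", "channel", "version"])
print(load_order("2019-05-01"))  # ['package', 'country', 'channel', 'version']
```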
S205: Count the event heat of each ordered multidimensional event.
In this step, the number of times users in the basic data clicked each ordered multidimensional event can be counted, and the event heat of each ordered multidimensional event is determined from that count.
For example, the basic data is grouped and counted using Spark (a computing engine) to obtain the event heat (num) corresponding to each multidimensional event. Following the above example, the result is:
[Table image BDA0002216527250000101: event heat (num) of each ordered multidimensional event; not reproduced]
S206: Combine each ordered multidimensional event with its corresponding event heat to form the row key data of Hbase.
Following the above example, the row key data is:
[Table image BDA0002216527250000102: row key data; not reproduced]
S207: Insert the row key data into the Hive table to generate the row key index of Hbase.
The row key index is used for querying, according to an input first ordered multidimensional event, the first event heat corresponding to that first ordered multidimensional event.
This step writes the row key data into Hbase, thereby generating the row key index of Hbase.
Following the above example, the event contents are concatenated in the dimension order package, country, channel, version to form the dimension field, and the result is inserted into the result_info table.
In addition, the message digest algorithm value of the row key data can be determined first and then inserted into the Hive table to generate the row key index of Hbase. Because a message digest algorithm maps an input string of any length to a fixed-length output, doing so unifies the length of the RowKey; it also reduces the data pressure that plaintext storage would place on the database.
S208: Using Phoenix embedded in Hbase, create a table with the same name that maps Hbase.
The same-name table is used for mapping the row key data in Hbase; Phoenix provides Hbase with a JDBC query interface that supports the structured query language SQL.
Optionally, the same-name table may include a dimension field (rk) for mapping the multidimensional event and a heat information field (num) for mapping the heat information.
S209: When a query instruction for a first ordered multidimensional event input in SQL is received, query the first event heat corresponding to it according to the row key index mapped by the same-name table.
For example, when querying product cn.cut in country cn with channel baidu and version 1.1.2 (the second record inserted in S207), Phoenix is called with the SQL statement "select num from result_info where rk = 'cn.cut-cn-baidu-1.1.2'", which returns num = 2, so the query page can show that the number of results for this combination of event dimensions is 2.
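The S209 lookup can be sketched end to end: arrange the user's selections in the stored dimension order, join them into a row key, and run an exact-match select on result_info. In the patent the query goes through Phoenix's JDBC interface; here an in-memory sqlite3 table stands in for the Phoenix table, the rows are illustrative, and build_query_key and query_heat are hypothetical helper names. Binding the key as a parameter avoids splicing user input into the SQL string.

```python
import sqlite3
from typing import Dict, List, Optional


def build_query_key(selection: Dict[str, str], dimension_order: List[str]) -> str:
    """Arrange the user's selections in the stored dimension order and join them into a row key."""
    return "-".join(selection[dim] for dim in dimension_order)


def query_heat(conn: sqlite3.Connection, row_key: str) -> Optional[int]:
    """Exact-match lookup of the event heat for one row key; None if absent."""
    row = conn.execute("SELECT num FROM result_info WHERE rk = ?", (row_key,)).fetchone()
    return row[0] if row else None


# In-memory sqlite3 stands in for the Phoenix table; the rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_info (rk TEXT PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO result_info (rk, num) VALUES (?, ?)",
    [("cn.cut-cn-baidu-1.1.2", 2), ("cn.fish-usa-google-2.1.0", 1)],
)

selection = {"country": "cn", "version": "1.1.2", "package": "cn.cut", "channel": "baidu"}
key = build_query_key(selection, ["package", "country", "channel", "version"])
print(key, query_heat(conn, key))  # cn.cut-cn-baidu-1.1.2 2
```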
In the embodiments of the invention, the multidimensional events are reordered according to the dimension heat of each event dimension, and the row key data of Hbase is built from the ordered multidimensional events and their event heat, so the row key index arranges the events by dimension heat, for example from high to low. Dimensions with higher heat receive more user attention and are placed first, so the structure of the row key index better matches user needs; and because a row key index is matched from left to right, this arrangement can increase the query speed of Hbase data. In addition, because the mapping table of Hbase is created in Hive, which supports SQL, there is no need to insert data through Hbase's Java interface; operating on the Hive table directly with SQL statements reduces development cost.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 3 is a schematic structural diagram of an index generating apparatus based on Hbase according to an embodiment of the present invention. Referring to fig. 3, an Hbase-based index generating apparatus 300 may include:
an obtaining module 310, configured to obtain basic data generated in a specified time period, where the basic data comprises user identifications and the multidimensional events corresponding to each user identification, and each multidimensional event comprises event content in a plurality of event dimensions;
a determining module 320, configured to determine the dimension heat of each event dimension and to reorder each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
an execution module 330, configured to count the event heat of each ordered multidimensional event and to combine each ordered multidimensional event with its event heat to form the row key data of the Hbase, where the event heat comprises the number of user clicks (the first click count) on each ordered multidimensional event;
a generating module 340, configured to write the row key data into the Hbase to generate a row key index of the Hbase, where the row key index is used to query, for an input first ordered multidimensional event, the corresponding first event heat.
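The work of the determining and execution modules can be sketched as follows, under several assumptions not stated in the source: each basic-data record is one click, given as `(user_id, event_dict)`; the event heat is the total click count per ordered event across all users; the row key is the ordered event contents joined with a hypothetical `|` separator; and the dimension order (highest heat first) has already been computed. All function names are illustrative.

```python
from collections import Counter

SEP = "|"  # hypothetical separator; the source does not specify a key layout


def order_event(event, dimension_order):
    """Rearrange one multidimensional event so its dimensions follow heat order."""
    return tuple(event.get(dim, "") for dim in dimension_order)


def build_row_key_data(records, dimension_order):
    """Count event heat (clicks per ordered event) and pair it with a row key."""
    heat = Counter(order_event(event, dimension_order) for _user_id, event in records)
    return {SEP.join(ordered): count for ordered, count in heat.items()}


records = [
    ("u1", {"color": "red", "city": "beijing", "category": "phone"}),
    ("u2", {"category": "phone", "city": "beijing", "color": "red"}),
    ("u3", {"city": "shanghai", "category": "laptop", "color": "black"}),
]
# Dimension order from highest to lowest heat (assumed already computed).
row_key_data = build_row_key_data(records, ["city", "category", "color"])
print(row_key_data)  # {'beijing|phone|red': 2, 'shanghai|laptop|black': 1}
```

Records u1 and u2 list their dimensions in different orders but collapse to the same ordered event, which is why reordering must happen before the heat is counted.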
In one embodiment, the apparatus 300 further comprises:
a query module, configured to query, according to the row key index, the first event heat corresponding to a first ordered multidimensional event when a query instruction for that event, input in a specified query language, is received.
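A minimal sketch of the lookup performed by the query module, assuming the row key layout is the dimension contents joined by a hypothetical `|` separator in heat order. A Python dict stands in for an Hbase point lookup by row key; the function name is illustrative. The sketch normalizes the queried event into heat order first, so the caller does not need to know the key layout.

```python
SEP = "|"  # hypothetical key separator


def query_event_heat(row_key_data, event, dimension_order):
    """Return the event heat for `event`, or 0 if no row key matches.

    The input event may list its dimensions in any order; it is first
    rearranged into the heat order used when the row keys were written.
    """
    row_key = SEP.join(event.get(dim, "") for dim in dimension_order)
    return row_key_data.get(row_key, 0)


row_key_data = {"beijing|phone|red": 2, "shanghai|laptop|black": 1}
order = ["city", "category", "color"]
print(query_event_heat(row_key_data,
                       {"color": "red", "category": "phone", "city": "beijing"},
                       order))  # 2
```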
In one embodiment, the specified query language is the Structured Query Language (SQL), and the query module comprises:
a building unit, configured to build a first mapping table of the Hbase by using specified middleware embedded in the Hbase, where the first mapping table maps the row key data in the Hbase and the specified middleware provides the Hbase with an interface for querying data in SQL;
a query unit, configured to query, when a query instruction for a first ordered multidimensional event input in SQL is received, the first event heat corresponding to that event according to the row key index mapped by the first mapping table.
In one embodiment, the determining module 320 includes:
a counting unit, configured to count the number of user clicks (the second click count) on each event dimension in the specified time period;
a first determining unit, configured to determine the dimension heat of each event dimension according to the second click count, where the dimension heat is positively correlated with the second click count.
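One way to compute dimension heat from the second click count is sketched below. The source requires only that heat be positively correlated with the click count, so this sketch uses the raw count as the heat. It also assumes, hypothetically, that a record counts as one click on every dimension in which it carries non-empty content. Ties are broken by dimension name so the resulting order is deterministic.

```python
from collections import Counter


def dimension_heat(records):
    """Return (dimensions ordered by heat, high to low; click count per dimension)."""
    clicks = Counter()
    for _user_id, event in records:
        for dim, content in event.items():
            # Assumption: a record counts as one click on every dimension it fills.
            if content:
                clicks[dim] += 1
    # Heat is taken to be the raw click count; ties are broken by dimension name.
    order = sorted(clicks, key=lambda dim: (-clicks[dim], dim))
    return order, dict(clicks)


records = [
    ("u1", {"city": "beijing", "category": "phone", "color": "red"}),
    ("u2", {"city": "beijing", "category": "phone", "color": ""}),
    ("u3", {"city": "shanghai", "category": "", "color": ""}),
]
order, clicks = dimension_heat(records)
print(order)   # ['city', 'category', 'color']
print(clicks)  # {'city': 3, 'category': 2, 'color': 1}
```

The returned `order` is what the row key construction consumes: the hottest dimension becomes the leading component of every row key.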
In one embodiment, the generation module 340 includes:
a second determining unit, configured to determine the message digest algorithm value of the row key data;
a generating unit, configured to insert the message digest algorithm value into a pre-created first mapping table of the Hbase to generate the row key index of the Hbase.
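The source names only a "message digest algorithm value". The sketch below assumes MD5 over the UTF-8 bytes of the row key and uses Python's standard `hashlib`. The digest-to-entry dictionary is a hypothetical stand-in for the first mapping table.

```python
import hashlib


def row_key_digest(row_key: str) -> str:
    """Hex MD5 digest (32 characters) of a row key string, UTF-8 encoded."""
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()


# Hypothetical index: digest -> (original row key, event heat).
row_key_data = {"beijing|phone|red": 2, "shanghai|laptop|black": 1}
index = {row_key_digest(key): (key, heat) for key, heat in row_key_data.items()}

lookup = index[row_key_digest("beijing|phone|red")]
print(lookup)  # ('beijing|phone|red', 2)
```

A fixed-length, evenly distributed digest spreads writes across regions. The trade-off is that hashed keys no longer preserve the hot-dimension prefix order, so a range scan on the leading dimension cannot run against the digest column alone.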
In one embodiment, the apparatus 300 further comprises:
a creating module, configured to create a second mapping table of the Hbase in a specified database, where the second mapping table maps data in the Hbase and the specified database can be operated on with SQL;
in this case, writing the row key data into the Hbase includes:
inserting the row key data into the second mapping table so that the row key data can be operated on with SQL.
In one embodiment, the second mapping table comprises a dimension field and a heat information field; the dimension field maps the multidimensional event, and the heat information field maps the heat information.
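The layout of the second mapping table, with dimension fields plus a heat information field, can be sketched with SQL. In a real deployment this might be, for example, a Hive external table declared with Hive's HBase storage handler (`org.apache.hadoop.hive.hbase.HBaseStorageHandler`) and an `hbase.columns.mapping` property binding `:key` to the row key. Here Python's built-in `sqlite3` simulates that table so the sketch runs locally. The table name, the column names, and the one-column-per-dimension layout are all hypothetical.

```python
import sqlite3

# sqlite3 is only a local stand-in for an SQL-queryable table mapped onto Hbase.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_heat (
        row_key  TEXT PRIMARY KEY,  -- ordered dimensions joined with '|'
        city     TEXT,              -- dimension fields, in heat order
        category TEXT,
        color    TEXT,
        heat     INTEGER            -- heat information field (click count)
    )
    """
)
rows = [("beijing|phone|red", "beijing", "phone", "red", 2),
        ("shanghai|laptop|black", "shanghai", "laptop", "black", 1)]
conn.executemany("INSERT INTO event_heat VALUES (?, ?, ?, ?, ?)", rows)

# Plain SQL replaces hand-written Hbase client insert/get code.
heat = conn.execute(
    "SELECT heat FROM event_heat WHERE city = ? AND category = ? AND color = ?",
    ("beijing", "phone", "red"),
).fetchone()[0]
print(heat)  # 2
```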
The Hbase-based index generation apparatus provided in the embodiment of the present invention can implement each process implemented in the above Hbase-based index generation method embodiment, and is not described here again to avoid repetition.
In this embodiment of the invention, the multidimensional events are reordered according to the dimension heat of each event dimension, and the row key data of the Hbase is then built from the ordered multidimensional events and their event heat. The dimensions within each row key are therefore arranged by dimension heat, for example from high to low. A dimension with higher heat is one that users care about more, so placing it first makes the structure of the row key index fit user needs more closely. In addition, because a row key index query matches the key from left to right, arranging the multidimensional events by dimension heat speeds up queries on the Hbase data.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a network device according to an embodiment of the present invention, which can implement details of the Hbase-based index generating method performed by the network device in the above embodiment, and achieve the same effect. As shown in fig. 4, the network device 400 includes: a processor 401, a transceiver 402, a memory 403, a user interface 404, and a bus interface, wherein:
in this embodiment of the present invention, the network device 400 further includes: a computer program stored in a memory 403 and executable on a processor 401, the computer program when executed by the processor 401 performing the steps of:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
determining the dimension heat corresponding to each event dimension, and reordering each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event with its event heat to form the row key data of the Hbase, where the event heat comprises the number of user clicks (the first click count) on each ordered multidimensional event;
writing the row key data into the Hbase to generate a row key index of the Hbase, where the row key index is used to query, for an input first ordered multidimensional event, the corresponding first event heat.
In FIG. 4, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 401, and various circuits, represented by memory 403, being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 402 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 404 may also be an interface capable of interfacing with a desired device, including but not limited to a keypad, display, speaker, microphone, joystick, etc.
The processor 401 is responsible for managing the bus architecture and general processing, and the memory 403 may store data used by the processor 401 in performing operations.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
after the row key data is written into the Hbase to generate the row key index of the Hbase, when a query instruction for a first ordered multidimensional event input in a specified query language is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index.
Optionally, the specified query language is structured query language SQL; the computer program, when executed by the processor 401, may further implement the steps of:
building a first mapping table of the Hbase by using specified middleware embedded in the Hbase, where the first mapping table maps the row key data in the Hbase and the specified middleware provides the Hbase with an interface for querying data in SQL;
when a query instruction of a first ordered multidimensional event input by using Structured Query Language (SQL) is received, the first event heat corresponding to the first ordered multidimensional event is queried according to the row key index mapped by the first mapping table.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
counting the number of user clicks (the second click count) on each event dimension in the specified time period;
determining the dimension heat of each event dimension according to the second click count, where the dimension heat is positively correlated with the second click count.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
determining a message digest algorithm value of the row key data;
the message digest algorithm value is inserted into a first mapping table of the Hbase created in advance to generate a row key index of the Hbase.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
creating a second mapping table of the Hbase in a specified database, where the second mapping table maps data in the Hbase and the specified database can be operated on with SQL;
inserting the row key data into the second mapping table so that the row key data can be operated on with SQL.
Optionally, the second mapping table includes a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
In this embodiment of the invention, the network device likewise reorders the multidimensional events by the dimension heat of each event dimension before building the Hbase row key data. As a result, the row key index places the dimensions users care about most at the front, and because row key queries match from left to right, queries on the Hbase data are faster.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned Hbase-based index generation method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An Hbase-based index generation method is characterized by comprising the following steps:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
determining the dimension heat corresponding to each event dimension, and reordering each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event with its corresponding event heat to form the row key data of the Hbase, where the event heat comprises a first click count of users on each ordered multidimensional event;
writing the row key data into the Hbase to generate a row key index of the Hbase, where the row key index is used to query, according to an input first ordered multidimensional event, the first event heat corresponding to the first ordered multidimensional event.
2. The method of claim 1, wherein after writing the row key data into the Hbase to generate a row key index for the Hbase, further comprising:
when a query instruction for the first ordered multidimensional event input in a specified query language is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index.
3. The method of claim 2, wherein the specified query language is Structured Query Language (SQL); and the querying, when a query instruction for the first ordered multidimensional event input in the specified query language is received, of the first event heat corresponding to the first ordered multidimensional event according to the row key index includes:
creating a first mapping table of the Hbase by using a specified middleware embedded in the Hbase; the first mapping table is used for mapping the row key data in the Hbase; the specified middleware is used for providing the Hbase with a query interface for querying data by using the Structured Query Language (SQL);
when a query instruction of the first ordered multidimensional event input by using the structured query language SQL is received, the first event heat corresponding to the first ordered multidimensional event is queried according to the row key index mapped by the first mapping table.
4. The method according to claim 1, wherein the determining the dimension heat corresponding to each event dimension comprises:
counting a second click count of users on each event dimension in the specified time period;
determining the dimension heat corresponding to each event dimension according to the second click count, where the dimension heat is positively correlated with the second click count.
5. The method of claim 1, wherein said writing said row key data into said Hbase to generate a row key index for said Hbase comprises:
determining a message digest algorithm value of the row key data;
inserting the message digest algorithm value into a first mapping table of the Hbase created in advance to generate a row key index of the Hbase.
6. The method of claim 1, wherein before writing the row key data into the Hbase, further comprising:
creating a second mapping table of the Hbase in a specified database; the second mapping table is used for mapping data in the Hbase; the specified database can be operated by Structured Query Language (SQL);
the writing the row key data into the Hbase includes:
inserting the row key data into the second mapping table so that the row key data can be operated by the structured query language SQL.
7. The method of claim 6, wherein the second mapping table comprises a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
8. An Hbase-based index generation apparatus, comprising:
the acquisition module is used for acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
a determining module, configured to determine the dimension heat corresponding to each event dimension and to reorder each multidimensional event according to the dimension heat to obtain ordered multidimensional events;
an execution module, configured to count the event heat of each ordered multidimensional event and to combine each ordered multidimensional event with its corresponding event heat to form the row key data of the Hbase, where the event heat comprises a first click count of users on each ordered multidimensional event;
a generating module, configured to write the row key data into the Hbase to generate a row key index of the Hbase, where the row key index is used to query, according to an input first ordered multidimensional event, the first event heat corresponding to the first ordered multidimensional event.
9. A network device, comprising:
a memory storing computer program instructions;
a processor that, when executing the computer program instructions, implements the Hbase-based index generation method of any one of claims 1 to 7.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the Hbase-based index generation method of any one of claims 1 to 7.
CN201910917506.1A 2019-09-26 2019-09-26 Hbase-based index generation method and device Active CN110704436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910917506.1A CN110704436B (en) 2019-09-26 2019-09-26 Hbase-based index generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910917506.1A CN110704436B (en) 2019-09-26 2019-09-26 Hbase-based index generation method and device

Publications (2)

Publication Number Publication Date
CN110704436A true CN110704436A (en) 2020-01-17
CN110704436B CN110704436B (en) 2022-07-19

Family

ID=69198137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910917506.1A Active CN110704436B (en) 2019-09-26 2019-09-26 Hbase-based index generation method and device

Country Status (1)

Country Link
CN (1) CN110704436B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306176A (en) * 2011-08-25 2012-01-04 浙江鸿程计算机系统有限公司 On-line analytical processing (OLAP) keyword query method based on intrinsic characteristic of data warehouse
IN2014MU00872A (en) * 2014-03-14 2015-09-25 Tata Consultancy Services Ltd
CN105138592A (en) * 2015-07-31 2015-12-09 武汉虹信技术服务有限责任公司 Distributed framework-based log data storing and retrieving method
CN107220287A (en) * 2017-04-24 2017-09-29 东软集团股份有限公司 Index management method, device, storage medium and equipment for log query
CN107239497A (en) * 2017-05-02 2017-10-10 广东万丈金数信息技术股份有限公司 Hot content searching method and system
CN108595668A (en) * 2018-04-28 2018-09-28 深圳春沐源控股有限公司 A kind of auto ordering method of commodity, device and computer readable storage medium
CN109284351A (en) * 2018-08-14 2019-01-29 青海大学 A kind of data query method based on HBase database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
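陈希林 (Chen Xilin) et al.: "HBase storage structure design for microblog information analysis" (针对微博信息分析的HBase存储结构设计), Netinfo Security (《信息网络安全》) *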

Also Published As

Publication number Publication date
CN110704436B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN108363602B (en) Intelligent UI (user interface) layout method and device, terminal equipment and storage medium
US11281793B2 (en) User permission data query method and apparatus, electronic device and medium
CN110795455B (en) Dependency analysis method, electronic device, computer apparatus, and readable storage medium
KR102097881B1 (en) Method and apparatus for processing a short link, and a short link server
JP7170638B2 (en) Generating, Accessing, and Displaying Lineage Metadata
US9305016B2 (en) Efficient data extraction by a remote application
CN109241159B (en) Partition query method and system for data cube and terminal equipment
US9229961B2 (en) Database management delete efficiency
WO2011103579A2 (en) Operating on time sequences of data
WO2018036549A1 (en) Distributed database query method and device, and management system
CN112434015B (en) Data storage method and device, electronic equipment and medium
CN109299101B (en) Data retrieval method, device, server and storage medium
CN109388659B (en) Data storage method, device and computer readable storage medium
CN108319608A (en) The method, apparatus and system of access log storage inquiry
US20230315727A1 (en) Cost-based query optimization for untyped fields in database systems
CN111221785A (en) Semantic data lake construction method of multi-source heterogeneous data
CN113051460A (en) Elasticissearch-based data retrieval method and system, electronic device and storage medium
WO2019161620A1 (en) Application dependency update method, terminal and device, and storage medium
CN108182204A (en) The processing method and processing device of data query based on house prosperity transaction multi-dimensional data
CN110704472A (en) Data query statistical method and device
CN113297266B (en) Data processing method, device, equipment and computer storage medium
CN104123329A (en) Search method and device
CN112905600A (en) Data query method and device, storage medium and electronic equipment
CN110704436B (en) Hbase-based index generation method and device
CN114443599A (en) Data synchronization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant