CN110704436B - Hbase-based index generation method and device - Google Patents
- Publication number
- CN110704436B (application CN201910917506.1A)
- Authority
- CN
- China
- Prior art keywords
- event
- hbase
- heat
- row key
- ordered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention discloses an Hbase-based index generation method and device, aiming to solve the problems in the prior art that the data index structure is unreasonable and data queries are slow. The method comprises the following steps: acquiring basic data generated in a specified time period; determining the dimension heat corresponding to each event dimension, and adjusting the order of each multidimensional event according to the dimension heat to obtain ordered multidimensional events; counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event with its corresponding event heat to form row key data of Hbase; and writing the row key data into the Hbase to generate a row key index of the Hbase. With this technical scheme, the structure of the generated row key index better meets the needs of users, and because the multidimensional events are arranged according to the dimension heat of each event dimension, the query speed of Hbase data can be increased.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an Hbase-based index generation method and device.
Background
HBase is built on Hadoop and is a distributed, column-oriented, scalable database for mass data storage. The biggest difference between it and a general relational database is that HBase is well suited to storing unstructured data; it is often used to store data files with a simple structure but a very large data volume (usually above the TB level), such as historical order records and log data. HBase uses Key-Value column storage, in which the Rowkey is the Key of the Key-Value pair and uniquely identifies a row. HBase retrieves data by Rowkey: the system finds the Region where a certain Rowkey (or a certain Rowkey range) is located and then routes the data query request to that Region.
In traditional big data multidimensional analysis, the most common technical framework is Apache Kylin. Kylin uses an OLAP (online analytical processing) engine: a Data Model is established first; then, by configuring Cube attributes, a basic fact table is constructed with MapReduce (a programming model), and the data of all Cuboids (dimension combinations) are pre-calculated and stored in Hbase. However, a Cube constructed in this way forms the row keys of Hbase according to a fixed dimension order, so the response speed of retrieving Hbase data is slow.
Therefore, a more reasonable RowKey design is needed so that data can be retrieved faster.
Disclosure of Invention
The embodiment of the invention provides an Hbase-based index generation method and device, aiming to solve the problems in the prior art that the data index structure is unreasonable and data queries are slow.
To solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides an index generation method based on Hbase, including:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
determining dimensionality heat corresponding to each event dimensionality, and sequentially adjusting each multi-dimensional event according to the dimensionality heat to obtain an ordered multi-dimensional event;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event and the event heat corresponding to each ordered multidimensional event to form row key data of Hbase; the event heat comprises a first number of clicks of the user on each ordered multidimensional event;
writing the row key data into the Hbase to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event.
In a second aspect, an embodiment of the present invention further provides an index generating apparatus based on Hbase, including:
the acquisition module is used for acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
the determining module is used for determining the dimensionality heat corresponding to each event dimensionality and sequentially adjusting each multi-dimensional event according to the dimensionality heat to obtain an ordered multi-dimensional event;
the execution module is used for counting the event heat of each ordered multi-dimensional event and combining each ordered multi-dimensional event and the event heat corresponding to each ordered multi-dimensional event to form the row key data of the Hbase; the event heat comprises a first click number of the user aiming at each ordered multi-dimensional event;
the generating module is used for writing the row key data into the Hbase so as to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event.
In a third aspect, an embodiment of the present invention further provides a network device, including:
a memory storing computer program instructions;
a processor that, when executing the computer program instructions, implements the Hbase-based index generation method of any of the above.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the Hbase-based index generation method according to any one of the above-described embodiments.
In the embodiment of the invention, the order of the multidimensional events is adjusted according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the ordered multidimensional events and their event heat, so that each ordered multidimensional event in the row key index is arranged based on the dimension heat of each event dimension, for example from high to low. The higher the dimension heat of a dimension, the more the user cares about the event heat for that dimension, so the dimensions the user cares about most are placed first and the structure of the row key index better meets the needs of the user. In addition, because a row key index query retrieves data in order from left to right, arranging the multidimensional events according to the dimension heat of each event dimension can increase the query speed of Hbase data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a Hbase-based index generation method in an embodiment of the present invention.
FIG. 2 is a schematic flow chart of a Hbase-based index generation method in another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an index generating apparatus based on Hbase according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a network device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow chart of a Hbase-based index generation method in an embodiment of the present invention. The method of fig. 1 may include:
S102, acquiring basic data generated in a specified time period.
The basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event includes event content in multiple event dimensions.
The specified time period may be one or more months, weeks, days, and so on.
For example, if the specified time period is the whole of May 2019, udid denotes the user identifier, and the event dimensions include package (product), country, channel, version, and so on, then the basic data consists of each udid together with the event content in at least the four event dimensions package, country, channel, and version corresponding to that udid.
In addition, data cleaning can be performed on the basic data to guarantee its quality. Valid data is cleaned out of all the acquired data and used as the basic data; valid data is data in which neither the user identifier nor the event content in any event dimension is empty and the user identifier conforms to the preset length.
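The cleaning rule described above can be sketched as a simple filter. This is only an illustrative sketch: the field names, the preset identifier length of 3, and the sample records are assumptions made for the example and are not specified by the embodiment.

```python
from typing import Dict, Iterable, List

# Assumed field names; the embodiment lists these four dimensions only as examples.
DIMENSIONS = ("package", "country", "channel", "version")
# Assumed preset length of a user identifier; the embodiment gives no value.
UDID_LENGTH = 3


def clean(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep records whose udid and dimension contents are all non-empty
    and whose udid has the preset length."""
    valid = []
    for rec in records:
        udid = rec.get("udid") or ""
        if len(udid) != UDID_LENGTH:
            continue  # empty or wrong-length user identifier
        if any(not rec.get(dim) for dim in DIMENSIONS):
            continue  # at least one event dimension has no content
        valid.append(rec)
    return valid


# Hypothetical sample records, for illustration only.
raw = [
    {"udid": "001", "package": "cn.fish", "country": "cn", "channel": "google", "version": "1.3.0"},
    {"udid": "", "package": "cn.cut", "country": "usa", "channel": "baidu", "version": "1.1.2"},
    {"udid": "01", "package": "cn.map", "country": "cn", "channel": "google", "version": "2.1.0"},
    {"udid": "002", "package": "cn.cut", "country": "", "channel": "baidu", "version": "1.2.2"},
]
cleaned = clean(raw)
```

In this sketch only the first record survives: the second has an empty identifier, the third has an identifier of the wrong length, and the fourth has an empty country.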
S104, determining the dimension heat corresponding to each event dimension, and adjusting the order of each multidimensional event according to the dimension heat to obtain ordered multidimensional events.
For example, the event dimensions include package, country, channel, and version, and the dimension heat relationship among the event dimensions is package > country > channel > version, where the symbol ">" indicates that the dimension heat of the former is higher than that of the latter. Therefore, the ordered multidimensional event obtained after adjusting the order according to the dimension heat is package, country, channel, version.
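A minimal sketch of this reordering step is given below, assuming the dimension heat is available as a numeric score per dimension. The function name `order_event` and the heat values are hypothetical; only the resulting order package, country, channel, version comes from the example above.

```python
from typing import Dict, List, Tuple


def order_event(event: Dict[str, str], heat: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return the (dimension, content) pairs of one multidimensional event,
    arranged from the hottest dimension to the coldest."""
    ordered_dims = sorted(event, key=lambda dim: heat[dim], reverse=True)
    return [(dim, event[dim]) for dim in ordered_dims]


# Hypothetical heat scores chosen so that package > country > channel > version.
dimension_heat = {"version": 10, "channel": 40, "package": 90, "country": 70}
event = {"version": "1.1.2", "channel": "baidu", "country": "cn", "package": "cn.cut"}
ordered = order_event(event, dimension_heat)
```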
S106, counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event and the event heat corresponding to each ordered multidimensional event to form the row key data of the Hbase.
The event heat comprises a first number of clicks of the user on each ordered multidimensional event; the first number of clicks may be obtained by counting the records in the basic data.
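The counting and combining step might look like the following sketch, which treats each record in the basic data as one click. The "_" separator used to join dimension contents, the function names, and the sample records are assumptions; the embodiment does not state how the contents are joined.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Dimension order obtained from the dimension heat (taken from the example above).
ORDER = ["package", "country", "channel", "version"]
SEP = "_"  # assumed separator; not specified by the embodiment


def row_key(record: Dict[str, str], order: List[str] = ORDER) -> str:
    """Join the event contents of one record in dimension-heat order."""
    return SEP.join(record[dim] for dim in order)


def row_key_data(records: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Map each ordered multidimensional event (rk) to its event heat (num),
    counting one click per record."""
    return dict(Counter(row_key(rec) for rec in records))


# Hypothetical cleaned records, for illustration only.
records = [
    {"udid": "001", "package": "cn.cut", "country": "cn", "channel": "baidu", "version": "1.1.2"},
    {"udid": "002", "package": "cn.cut", "country": "cn", "channel": "baidu", "version": "1.1.2"},
    {"udid": "003", "package": "cn.fish", "country": "usa", "channel": "google", "version": "1.3.0"},
]
data = row_key_data(records)
```

Each key of `data` plays the role of the dimension field (rk) and each value the role of the heat information field (num).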
S108, writing the row key data into the Hbase to generate a row key index of the Hbase.
The row key index is used for inquiring the first event heat corresponding to the first ordered multi-dimensional event according to the input first ordered multi-dimensional event.
In the embodiment of the invention, the order of the multidimensional events is adjusted according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the ordered multidimensional events and their event heat, so that each ordered multidimensional event in the row key index is arranged based on the dimension heat of each event dimension, for example from high to low. The higher the dimension heat, the more the user cares about that dimension, so the dimensions the user cares about most are placed first and the structure of the row key index better meets the needs of the user. In addition, because a row key index query retrieves data in order from left to right, arranging the multidimensional events according to the dimension heat of each event dimension can increase the query speed of Hbase data.
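The benefit of placing the hottest dimension first can be illustrated with a prefix scan over sorted keys. HBase keeps rows sorted by row key, so a query that fixes the leading dimension touches one contiguous key range. The sketch below imitates that with a sorted Python list and `bisect`; it assumes plaintext row keys joined with a "_" separator, and the keys themselves are hypothetical.

```python
from bisect import bisect_left
from typing import List


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return all keys starting with `prefix` from a sorted key list,
    reading only the contiguous range where such keys can occur."""
    start = bisect_left(sorted_keys, prefix)
    hits = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        hits.append(key)
    return hits


# Hypothetical row keys in the order package, country, channel, version.
keys = sorted([
    "cn.fish_cn_google_1.3.0",
    "cn.cut_usa_google_1.3.0",
    "cn.map_usa_baidu_2.1.0",
    "cn.cut_cn_baidu_1.1.2",
])
hits = prefix_scan(keys, "cn.cut_")
```

A query on the leading dimension (here the package) stops as soon as the prefix no longer matches, whereas a query on a trailing dimension would have to examine every key.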
In one embodiment, after generating the row key index of Hbase, when a query instruction for a first ordered multidimensional event input by using a specified query language is received, a first event heat corresponding to the first ordered multidimensional event may be queried according to the row key index.
Alternatively, the specified query language may be the structured query language SQL.
In this embodiment, query instructions in the native query language of Hbase are complex and slow to write, whereas the select query syntax of SQL is simple and fast to use; therefore, inputting the query instruction for the first ordered multidimensional event in SQL can effectively improve the data query speed.
In one embodiment, the specified query language is the structured query language, SQL; when a query instruction for a first ordered multidimensional event input by using a specified query language is received, the heat degree of the first event corresponding to the first ordered multidimensional event can be queried by the following steps:
First, a first mapping table of Hbase is created using specified middleware embedded in Hbase.
The first mapping table is used for mapping row key data in Hbase; the specified middleware is used to provide the Hbase with a query interface for querying data using structured query language, SQL.
Secondly, when a query instruction of a first ordered multidimensional event input by using Structured Query Language (SQL) is received, the first event heat corresponding to the first ordered multidimensional event is queried according to the row key index mapped by the first mapping table.
Optionally, the specified middleware may be Phoenix, and the query interface may be a JDBC interface of Phoenix; the first mapping table may include a dimension field (rk) for mapping the multidimensional event and a heat information field (num) for mapping the heat information.
In the embodiment, the first mapping table of the Hbase is created by using the specified middleware embedded in the Hbase, so that when the first event heat corresponding to the first ordered multidimensional event is queried, the query interface supporting Structured Query Language (SQL) query data provided by the specified middleware can be used for querying the data in the Hbase by using the SQL language, and the data query speed is improved.
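The shape of such an SQL lookup can be sketched as follows. To keep the example self-contained and runnable, Python's built-in `sqlite3` module is used here as a stand-in for the Phoenix JDBC interface; it is not Phoenix, and the rows inserted are hypothetical. Only the table name `result_info` and the fields rk and num come from the embodiment.

```python
import sqlite3
from typing import Optional

# sqlite3 stands in for the SQL query interface; the table mirrors the
# mapping table with a dimension field (rk) and a heat information field (num).
conn = sqlite3.connect(":memory:")
conn.execute("create table result_info (rk text primary key, num integer)")
conn.executemany(
    "insert into result_info (rk, num) values (?, ?)",
    [
        ("cn.fish_cn_google_1.3.0", 3),  # hypothetical row key data
        ("cn.cut_cn_baidu_1.1.2", 7),
    ],
)


def event_heat(connection: sqlite3.Connection, rk: str) -> Optional[int]:
    """Look up the event heat for one ordered multidimensional event."""
    row = connection.execute(
        "select num from result_info where rk = ?", (rk,)
    ).fetchone()
    return None if row is None else row[0]


heat = event_heat(conn, "cn.cut_cn_baidu_1.1.2")
```

Passing the row key as a bound parameter rather than splicing it into the statement avoids quoting problems when the key contains special characters.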
In an embodiment, when determining the dimension heat corresponding to each event dimension, a second number of clicks of the user on each event dimension in the specified time period may be counted, and the dimension heat corresponding to each event dimension is determined according to the second number of clicks.
The dimension heat is positively correlated with the second number of clicks.
For example, the event dimensions comprise product, country, channel, and version. The number of times users clicked each event dimension on 2019-05-01 is counted, and the event dimensions are arranged in descending order of that number. If the result is product, country, channel, version, then the dimension heat order for 2019-05-01 is product, country, channel, version.
Optionally, the sorted event dimensions may be saved to a database. Following the example above, the event dimension order "product, country, channel, version" is saved to a database (e.g., mysql).
In this embodiment, the dimension heat corresponding to each event dimension is determined from the number of clicks of the user on each event dimension in the specified time period, so that the statistical result of the dimension heat reflects the degree of interest of the user in each event dimension.
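Deriving the dimension order from click counts can be sketched as below. The click log format (one dimension name per click) and the counts are assumptions made for illustration; the embodiment only requires that the dimension heat be positively correlated with the number of clicks.

```python
from collections import Counter
from typing import Iterable, List


def dimension_order(clicked_dimensions: Iterable[str]) -> List[str]:
    """Arrange event dimensions from most clicked to least clicked."""
    counts = Counter(clicked_dimensions)
    return [dim for dim, _ in counts.most_common()]


# Hypothetical click log for one day: each entry names the dimension clicked.
clicks = (
    ["package"] * 4
    + ["version"] * 1
    + ["country"] * 3
    + ["channel"] * 2
)
order = dimension_order(clicks)
```

The resulting list could then be stored alongside the date, as in the optional database step described above.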
In one embodiment, when writing the row key data into the Hbase to generate the row key index of the Hbase, a message digest algorithm value of the row key data may be determined and inserted into a first mapping table of the Hbase created in advance to generate the row key index of the Hbase.
Optionally, the message digest algorithm value may be an MD5 (Message-Digest Algorithm 5, version 5 of the message digest algorithm) value.
In this embodiment, since a message digest algorithm computes a fixed-length output from an input string of any length, calculating the message digest algorithm value of the row key data and inserting that value into the pre-created first mapping table of the Hbase to generate the row key index unifies the data length of the RowKey; it also reduces the storage pressure that long plaintext row keys would place on the database.
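The fixed-length property can be seen in a short sketch using the standard `hashlib` module. The function name `md5_row_key` and the sample keys are hypothetical, and encoding the key as UTF-8 before hashing is an assumption.

```python
import hashlib


def md5_row_key(plain_row_key: str) -> str:
    """Return the MD5 digest of a row key as a 32-character hex string."""
    return hashlib.md5(plain_row_key.encode("utf-8")).hexdigest()


# Hypothetical row keys of different lengths.
short_key = md5_row_key("cn.cut_cn_baidu_1.1.2")
long_key = md5_row_key("cn.some.much.longer.package.name_usa_google_10.21.305")
```

Both digests have the same length regardless of the input. To look up an event later, the same digest would have to be computed from the queried ordered multidimensional event.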
In one embodiment, a second mapping table for Hbase may be created in a designated database prior to writing row key data into Hbase; based on this, when the row key data is written into the Hbase, the row key data may be inserted into the second mapping table so that the row key data may be operated by the structured query language SQL.
The second mapping table is used for mapping data in Hbase; the specified database may be operated by the structured query language SQL.
Optionally, the database is designated as Hive, and the second mapping table is a Hive table.
In one embodiment, the second mapping table may include a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
For example, the second mapping table includes a dimension field (rk) and a heat information field (num), and the structure of the second mapping table is: create table result_info (rk string, num int).
In the above embodiment, the second mapping table of the Hbase is created in a specified database that can be operated by the structured query language SQL, and the row key data is written into the Hbase by inserting it into the second mapping table; operating on the row key data in the second mapping table is therefore equivalent to operating on the row key data in the Hbase. Compared with the conventional method of inserting data into or querying the Hbase with Java syntax, this embodiment operates the Hive table directly with SQL statements to write the row key data into the Hbase, thereby reducing the development cost.
FIG. 2 is a schematic flow chart of a Hbase-based index generation method in another embodiment of the present invention. The method of fig. 2 may include:
S201, acquiring basic data generated in a specified time period.
The basic data may include a plurality of udids (user identifiers) and event contents in at least 4 event dimensions corresponding to the udids, such as package, country, channel, and version.
For example, the basic data generated in the whole of May 2019 is obtained, including multiple udids and the event contents in the 4 event dimensions corresponding to each udid, such as:
In this data, the values under the user identifier column, such as "01", "001", "002", "003", "004", "005", and "06", represent different user identifiers; the values under the event dimension package column, such as "cn.fish", "cn.cut", "cn.fish", "cn.map", and "cn.cut", represent different product contents; the values under the event dimension country column, such as "cn", "usa", "cn", "usa", "cn", and "usa", represent different country contents; the values under the event dimension channel column, such as "google", "baidu", "google", and "baidu", represent different channel contents; and the values under the event dimension version column, such as "1.3.0", "1.1.2", "2.1.0", "1.2.2", "1.3.0", "1.1.2", and "2.1.0", represent different versions.
S202, data cleaning is carried out on the basic data, and the cleaned basic data are stored in a Hive library.
In this step, invalid data in which the user identifier is empty, the event content in an event dimension is empty, or the user identifier does not conform to the preset length is screened out, and valid data in which neither the user identifier nor the event content in any event dimension is empty and the user identifier conforms to the preset length is retained, thereby guaranteeing the quality of the basic data.
Following the above example, the basic data after data washing is as follows:
S203, creating a Hive table for mapping Hbase in Hive.
Optionally, the Hive table includes a dimension field (rk) and a heat information field (num), and the Hive table has the structure: create table result_info (rk string, num int).
It should be noted that the execution order of S203 is not limited in this embodiment. For example, S203 may be executed after S206 described below, in addition to the order in which the Hive table is created after data cleansing is performed on the basic data, which is listed in this embodiment.
S204, determining the dimension heat corresponding to each event dimension, and adjusting the order of each multidimensional event according to the dimension heat to obtain ordered multidimensional events.
Assuming that the designated time period is the day of 2019-05-01, the number of clicks of the user for each event dimension in the day of 2019-05-01 can be counted, the dimension heat corresponding to each event dimension is determined according to the number of clicks, and the dimensions are arranged in the order of the dimension heat from high to low to obtain the ordered multi-dimensional events.
Optionally, the specified time period and its corresponding ordered multidimensional events may be saved to the mysql database; the mysql database can provide two fields, dt and dimension, for storing this content, where dt stores the specified time period and dimension stores the ordered multidimensional event.
For example, if the event dimension order counted for 2019-05-01 is package, country, channel, version, the content of the fields dt and dimension in the corresponding mysql database is:
dt          dimension
2019-05-01  package, country, channel, version
S205, counting the event heat of each ordered multidimensional event.
In the step, the click times of the user in the basic data for each ordered multidimensional event can be counted, and the event heat of each ordered multidimensional event is determined according to the value of the click times.
For example, Spark (a computing engine) is used to group and count the basic data so as to obtain the event heat (num) corresponding to each multidimensional event; following the above example, the following results can be obtained:
S206, combining the ordered multidimensional events and the event heat corresponding to the ordered multidimensional events respectively to form row key data of Hbase.
Following the above example, the row key data is:
S207, inserting the row key data into the Hive table to generate the row key index of the Hbase.
The row key index is used for inquiring the first event heat corresponding to the first ordered multi-dimensional event according to the input first ordered multi-dimensional event.
This step writes the row key data into Hbase to generate the row key index of Hbase.
Following the above example, the event dimension contents are spliced in the order package, country, channel, version and inserted into the dimension field of the result_info table, as follows:
In addition, the message digest algorithm value of the row key data can be determined first, and the message digest algorithm value is then inserted into the Hive table to generate the row key index of Hbase. Because a message digest algorithm computes a fixed-length output from an input string of any length, calculating the message digest algorithm value of the row key data and inserting it into the Hive table unifies the length of the RowKey; it also reduces the storage pressure that plaintext storage would place on the database.
S208, creating a table of the same name for mapping the Hbase by using Phoenix embedded in the Hbase.
The table of the same name is used for mapping row key data in Hbase; Phoenix is used to provide Hbase with a JDBC query interface that supports the structured query language SQL.
Optionally, the table of the same name may include a dimension field (rk) for mapping the multidimensional event and a heat information field (num) for mapping the heat information.
S209, when a query instruction for a first ordered multidimensional event input in SQL is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index mapped by the table of the same name.
For example, when the event with product cn, country cn, channel baidu, and version 1.1.2 is to be queried, that is, the second piece of data in S207, the SQL query statement is "select num from result_info where rk = cn.
In the embodiment of the invention, the order of the multidimensional events is adjusted according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the ordered multidimensional events and their event heat, so that each ordered multidimensional event in the row key index is arranged based on the dimension heat of each event dimension, for example from high to low. The higher the dimension heat of a dimension, the more the user cares about the event heat for that dimension, so the dimensions the user cares about most are placed first and the structure of the row key index better meets the needs of the user. Moreover, because a row key index query retrieves data in order from left to right, arranging the multidimensional events according to the dimension heat of each event dimension can increase the query speed of Hbase data. Meanwhile, because the mapping table of the Hbase is created in Hive, which supports SQL, the step of performing insert operations with the Java syntax of Hbase is omitted and the Hive table is operated directly with SQL statements, which reduces the development cost.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 3 is a schematic structural diagram of an index generating apparatus based on Hbase according to an embodiment of the present invention. Referring to fig. 3, an Hbase-based index generating device 300 may include:
an obtaining module 310, configured to obtain basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content in a plurality of event dimensions;
the determining module 320 is configured to determine dimension heat corresponding to each event dimension, and sequentially adjust each multidimensional event according to the dimension heat to obtain an ordered multidimensional event;
the execution module 330 is configured to count event heats of the ordered multidimensional events, and combine the ordered multidimensional events and the event heats corresponding to the ordered multidimensional events to form row key data of the Hbase; the event heat comprises the first click times of the user aiming at each ordered multidimensional event;
a generating module 340, configured to write the row key data into the Hbase to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event.
In one embodiment, the apparatus 300 further comprises:
a query module, configured to query the first event heat corresponding to the first ordered multidimensional event according to the row key index when a query instruction for the first ordered multidimensional event input in the specified query language is received.
In one embodiment, the specified query language is the structured query language, SQL; the query module comprises:
the building unit is used for building a first mapping table of the Hbase by using the specified middleware embedded in the Hbase; the first mapping table is used for mapping row key data in Hbase; the specified middleware is used for providing a query interface for querying data by using Structured Query Language (SQL) for Hbase;
and the query unit is used for querying the first event heat corresponding to the first ordered multidimensional event according to the row key index mapped by the first mapping table when receiving a query instruction of the first ordered multidimensional event input by using the Structured Query Language (SQL).
In one embodiment, the determining module 320 includes:
the counting unit is used for counting second click times of the user aiming at each event dimension in a specified time period;
the first determining unit is used for determining the dimension heat corresponding to each event dimension according to the second click times; the dimension heat is positively correlated with the second click times.
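One simple way to obtain a dimension heat that is positively correlated with the second click times is to use the click count itself. The sketch below assumes a hypothetical click log of (user identification, event dimension) pairs; the log format and the function name `dimension_heat` are illustrative and do not come from the embodiment.

```python
from collections import Counter

def dimension_heat(click_log):
    """Map each event dimension to its click count within the period."""
    return Counter(dim for _user, dim in click_log)

# Hypothetical clicks recorded in the specified time period.
clicks = [("u1", "category"), ("u2", "category"), ("u1", "city"), ("u3", "category")]
heat = dimension_heat(clicks)

# Dimensions ranked from hottest to coldest; ties are broken by name.
ranked = [dim for dim, _count in sorted(heat.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Any other monotonically increasing function of the click count would satisfy the same positive correlation.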
In one embodiment, the generation module 340 includes:
the second determining unit is used for determining the message digest algorithm value of the row key data;
and the generating unit is used for inserting the message digest algorithm value into a first mapping table of the Hbase created in advance so as to generate the row key index of the Hbase.
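The digest step can be illustrated as below. The embodiment does not name a particular message digest algorithm, so MD5 is an assumption here, and an in-memory dictionary stands in for the pre-created first mapping table; the row key layout is likewise hypothetical.

```python
import hashlib

def row_key_digest(row_key):
    """Return the hex digest of a row key (MD5 is an assumed choice of algorithm)."""
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()

# Hypothetical row key data: an ordered multidimensional event and its event heat.
row_key = "category=phone|city=Beijing|brand=X"
event_heat = 2

# In-memory dict standing in for the pre-created first mapping table.
first_mapping_table = {}
digest = row_key_digest(row_key)
first_mapping_table[digest] = {"row_key": row_key, "heat": event_heat}
```

Because the digest is deterministic, recomputing it from the same ordered multidimensional event locates the same entry at query time.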
In one embodiment, the apparatus 300 further comprises:
the creating module is used for creating a second mapping table of the Hbase in the specified database; the second mapping table is used for mapping data in Hbase; the specified database can be operated by the structured query language, SQL;
writing row key data into Hbase, including:
the row key data is inserted into the second mapping table so that the row key data can be operated by the structured query language SQL.
In one embodiment, the second mapping table comprises a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
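A mapping table with a dimension field and a heat information field, operated through SQL, might look like the following. This sketch uses Python's built-in `sqlite3` purely as a stand-in for the specified database; the table name `event_heat`, the column names, and the helper `query_heat` are assumptions, and no Hbase connection is involved.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL-operable specified database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_heat ("
    "dimensions TEXT PRIMARY KEY, "  # dimension field: the ordered multidimensional event
    "heat INTEGER NOT NULL)"         # heat information field: the event heat
)

# Insert hypothetical row key data into the mapping table.
conn.executemany(
    "INSERT INTO event_heat (dimensions, heat) VALUES (?, ?)",
    [
        ("category=phone|city=Beijing|brand=X", 2),
        ("category=laptop|city=Shanghai|brand=Y", 1),
    ],
)

def query_heat(connection, ordered_event):
    """Look up the event heat for one ordered multidimensional event."""
    row = connection.execute(
        "SELECT heat FROM event_heat WHERE dimensions = ?", (ordered_event,)
    ).fetchone()
    return row[0] if row else None

first_event_heat = query_heat(conn, "category=phone|city=Beijing|brand=X")
```

The same `SELECT` shape would apply to whichever SQL layer actually maps the Hbase data, with only the connection setup changing.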
The Hbase-based index generation apparatus provided in the embodiment of the present invention can implement each process implemented in the above Hbase-based index generation method embodiment, and is not described here again to avoid repetition.
In the embodiment of the invention, the multidimensional events are reordered according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the resulting ordered multidimensional events and their event heats. Each ordered multidimensional event in the row key index is therefore arranged by dimension heat, for example from the highest dimension heat to the lowest. Because a higher dimension heat indicates that users pay more attention to that dimension and to the event heat associated with it, placing the dimensions users care about most at the front makes the structure of the row key index better match user needs. In addition, because data is retrieved through the row key index from left to right, arranging the multidimensional events by the dimension heat of each event dimension can increase the query speed of Hbase data.
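The benefit of left-to-right retrieval can be illustrated with a prefix scan over lexicographically sorted keys, which is how Hbase orders rows by row key. The sketch below is a simplified in-memory model, not Hbase client code; the key layout and the function name `prefix_scan` are assumptions.

```python
from bisect import bisect_left

def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix, stopping at the first non-match."""
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate key
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result

# Hypothetical row keys with the hottest dimension ("category") placed first.
keys = sorted([
    "category=laptop|city=Shanghai|brand=Y",
    "category=phone|city=Beijing|brand=X",
    "category=phone|city=Shanghai|brand=Z",
])
hits = prefix_scan(keys, "category=phone|")
```

A query that filters on the leading dimension touches only a contiguous range of keys, whereas a filter on a trailing dimension would have to examine every key, which is why placing frequently queried dimensions first tends to help.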
Referring to fig. 4, fig. 4 is a schematic structural diagram of a network device according to an embodiment of the present invention, which can implement details of the Hbase-based index generating method performed by the network device in the above embodiment, and achieve the same effect. As shown in fig. 4, the network device 400 includes: a processor 401, a transceiver 402, a memory 403, a user interface 404, and a bus interface, wherein:
in this embodiment of the present invention, the network device 400 further includes: a computer program stored in a memory 403 and executable on a processor 401, the computer program when executed by the processor 401 performing the steps of:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises event content under a plurality of event dimensions;
determining the dimension heat corresponding to each event dimension, and sequentially adjusting each multidimensional event according to the dimension heat to obtain an ordered multidimensional event;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event and the event heat corresponding to each ordered multidimensional event to form row key data of Hbase; the event heat comprises the first click times of the user aiming at each ordered multidimensional event;
writing the row key data into the Hbase to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event.
In FIG. 4, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 401, and various circuits, represented by memory 403, being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 402 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 404 may also be an interface capable of interfacing externally to a desired device, including but not limited to a keypad, display, speaker, microphone, joystick, etc.
The processor 401 is responsible for managing the bus architecture and general processing, and the memory 403 may store data used by the processor 401 in performing operations.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
and after the row key data is written into the Hbase to generate a row key index of the Hbase, when a query instruction of a first ordered multidimensional event input by using a specified query language is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index.
Optionally, the specified query language is structured query language SQL; the computer program, when executed by the processor 401, may further implement the steps of:
establishing a first mapping table of the Hbase by using a specified middleware embedded in the Hbase; the first mapping table is used for mapping row key data in Hbase; the specified middleware is used for providing a query interface for querying data by using Structured Query Language (SQL) for Hbase;
when a query instruction of a first ordered multidimensional event input by using Structured Query Language (SQL) is received, the first event heat corresponding to the first ordered multidimensional event is queried according to the row key index mapped by the first mapping table.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
counting second click times of the user for each event dimension in a specified time period;
determining the dimension heat corresponding to each event dimension according to the second click times; the dimension heat is positively correlated with the second click times.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
determining a message digest algorithm value of the row key data;
the message digest algorithm value is inserted into a first mapping table of the Hbase created in advance to generate a row key index of the Hbase.
Optionally, the computer program when executed by the processor 401 may further implement the steps of:
creating a second mapping table of Hbase in a specified database; the second mapping table is used for mapping data in Hbase; the specified database can be operated by the structured query language, SQL;
the row key data is inserted into the second mapping table so that the row key data can be operated on by the structured query language SQL.
Optionally, the second mapping table includes a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information.
In the embodiment of the invention, the multidimensional events are reordered according to the dimension heat of each event dimension, and the row key data of Hbase is then constructed from the resulting ordered multidimensional events and their event heats. Each ordered multidimensional event in the row key index is therefore arranged by dimension heat, for example from the highest dimension heat to the lowest. Because a higher dimension heat indicates that users pay more attention to that dimension, placing the dimensions users care about most at the front makes the structure of the row key index better match user needs. In addition, because data is retrieved through the row key index from left to right, arranging the multidimensional events by the dimension heat of each event dimension can increase the query speed of Hbase data.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned Hbase-based index generation method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, and may also be implemented by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention, in essence or in the part that contributes beyond the prior art, may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods of the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. An Hbase-based index generation method is characterized by comprising the following steps:
acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises a plurality of event contents, and one event content corresponds to one event dimension;
determining the dimension heat corresponding to each event dimension, and sequentially adjusting each multidimensional event according to the dimension heat to obtain an ordered multidimensional event; the dimension heat is determined according to the second click times of the user aiming at each event dimension in the specified time period;
counting the event heat of each ordered multidimensional event, and combining each ordered multidimensional event and the event heat corresponding to each ordered multidimensional event to form row key data of Hbase; the event heat comprises a first click number of the user aiming at each ordered multidimensional event;
writing the row key data into the Hbase to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event;
before writing the row key data into the Hbase, the method further includes:
creating a second mapping table of the Hbase in a specified database; the second mapping table is used for mapping data in the Hbase; the specified database can be operated by Structured Query Language (SQL); the second mapping table comprises a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information;
the writing the row key data into the Hbase includes:
inserting the row key data into the second mapping table so that the row key data can be operated by the structured query language SQL.
2. The method of claim 1, wherein after writing the row key data to the Hbase to generate a row key index for the Hbase, further comprising:
and when a query instruction of the first ordered multidimensional event input by using a specified query language is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index.
3. The method of claim 2, wherein the specified query language is Structured Query Language (SQL); when a query instruction for the first ordered multidimensional event input by using a specified query language is received, querying the first event heat corresponding to the first ordered multidimensional event according to the row key index, including:
creating a first mapping table of the Hbase by using a specified middleware embedded in the Hbase; the first mapping table is used for mapping the row key data in the Hbase; the specified middleware is used for providing the Hbase with a query interface for querying data by using the Structured Query Language (SQL);
when a query instruction of the first ordered multidimensional event input by using the Structured Query Language (SQL) is received, the first event heat corresponding to the first ordered multidimensional event is queried according to the row key index mapped by the first mapping table.
4. The method according to claim 1, wherein the determining the dimension heat corresponding to each event dimension comprises:
counting second click times of the user for each event dimension in the specified time period;
determining the dimension heat corresponding to each event dimension according to the second click times; the dimension heat is positively correlated with the second click times.
5. The method of claim 1, wherein said writing said row key data into said Hbase to generate a row key index for said Hbase comprises:
determining a message digest algorithm value of the row key data;
inserting the message digest algorithm value into a first mapping table of the Hbase created in advance to generate a row key index of the Hbase.
6. An Hbase-based index generation apparatus, comprising:
the acquisition module is used for acquiring basic data generated in a specified time period; the basic data comprises user identifications and multidimensional events respectively corresponding to the user identifications; the multi-dimensional event comprises a plurality of event contents, and one event content corresponds to one event dimension;
the determining module is used for determining the dimension heat corresponding to each event dimension and sequentially adjusting each multi-dimensional event according to the dimension heat to obtain an ordered multi-dimensional event; the dimension heat degree is determined according to the second click times of the user aiming at each event dimension in the specified time period;
the execution module is used for counting the event heat of each ordered multi-dimensional event and combining each ordered multi-dimensional event and the event heat corresponding to each ordered multi-dimensional event to form the row key data of the Hbase; the event heat comprises a first click number of the user aiming at each ordered multidimensional event;
the generating module is used for writing the row key data into the Hbase to generate a row key index of the Hbase; the row key index is used for inquiring the first event heat corresponding to the first ordered multidimensional event according to the input first ordered multidimensional event;
the creating module is used for creating a second mapping table of the Hbase in a specified database; the second mapping table is used for mapping data in the Hbase; the specified database can be operated by Structured Query Language (SQL); the second mapping table comprises a dimension field and a heat information field; the dimension field is used for mapping the multidimensional event, and the heat information field is used for mapping the heat information;
writing the row key data into the Hbase, including:
inserting the row key data into the second mapping table so that the row key data can be operated by the structured query language SQL.
7. A network device, comprising:
a memory storing computer program instructions;
a processor that, when executing the computer program instructions, implements the Hbase-based index generation method of any one of claims 1 to 5.
8. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the Hbase-based index generation method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910917506.1A CN110704436B (en) | 2019-09-26 | 2019-09-26 | Hbase-based index generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910917506.1A CN110704436B (en) | 2019-09-26 | 2019-09-26 | Hbase-based index generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110704436A (en) | 2020-01-17 |
CN110704436B (en) | 2022-07-19 |
Family
ID=69198137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910917506.1A Active CN110704436B (en) | 2019-09-26 | 2019-09-26 | Hbase-based index generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110704436B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116226202A (en) * | 2023-03-14 | 2023-06-06 | 金蝶软件(中国)有限公司 | Multidimensional database query method, multidimensional database query device, computer equipment and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102306176B (en) * | 2011-08-25 | 2013-09-25 | 浙江鸿程计算机系统有限公司 | On-line analytical processing (OLAP) keyword query method based on intrinsic characteristic of data warehouse |
IN2014MU00872A (en) * | 2014-03-14 | 2015-09-25 | Tata Consultancy Services Ltd | |
CN105138592B (en) * | 2015-07-31 | 2019-03-26 | 武汉虹信技术服务有限责任公司 | A kind of daily record data storage and search method based on distributed structure/architecture |
CN107220287A (en) * | 2017-04-24 | 2017-09-29 | 东软集团股份有限公司 | For the index managing method of log query, device, storage medium and equipment |
CN107239497B (en) * | 2017-05-02 | 2020-11-03 | 广东万丈金数信息技术股份有限公司 | Hot content search method and system |
CN108595668A (en) * | 2018-04-28 | 2018-09-28 | 深圳春沐源控股有限公司 | A kind of auto ordering method of commodity, device and computer readable storage medium |
CN109284351A (en) * | 2018-08-14 | 2019-01-29 | 青海大学 | A kind of data query method based on HBase database |
2019-09-26: application CN201910917506.1A filed in CN (patent CN110704436B); status: Active
Also Published As
Publication number | Publication date |
---|---|
CN110704436A (en) | 2020-01-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11281793B2 (en) | User permission data query method and apparatus, electronic device and medium | |
CN108363602B (en) | Intelligent UI (user interface) layout method and device, terminal equipment and storage medium | |
CN110795455B (en) | Dependency analysis method, electronic device, computer apparatus, and readable storage medium | |
CN104699718B (en) | Method and apparatus for being rapidly introduced into business datum | |
KR102097881B1 (en) | Method and apparatus for processing a short link, and a short link server | |
WO2018036549A1 (en) | Distributed database query method and device, and management system | |
EP2539832A2 (en) | Operating on time sequences of data | |
US9229961B2 (en) | Database management delete efficiency | |
US9305016B2 (en) | Efficient data extraction by a remote application | |
CN111221791A (en) | Method for importing multi-source heterogeneous data into data lake | |
CN110727663A (en) | Data cleaning method, device, equipment and medium | |
CN112434015B (en) | Data storage method and device, electronic equipment and medium | |
CN113094370B (en) | Data index construction method and device, storage medium and electronic equipment | |
CN108319608A (en) | The method, apparatus and system of access log storage inquiry | |
CN109388659B (en) | Data storage method, device and computer readable storage medium | |
CN111221785A (en) | Semantic data lake construction method of multi-source heterogeneous data | |
CN112905600A (en) | Data query method and device, storage medium and electronic equipment | |
CN113051460A (en) | Elasticsearch-based data retrieval method and system, electronic device and storage medium | |
US20210042302A1 (en) | Cost-based optimization for document-oriented database queries | |
CN110704436B (en) | Hbase-based index generation method and device | |
CN108182204A (en) | The processing method and processing device of data query based on house prosperity transaction multi-dimensional data | |
CN112988798B (en) | Log processing method, device, equipment and medium | |
CN110704472A (en) | Data query statistical method and device | |
CN113297266B (en) | Data processing method, device, equipment and computer storage medium | |
CN104123329A (en) | Search method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||