CN117370349A

CN117370349A - Index storage method, query method, apparatus, device, and medium

Info

Publication number: CN117370349A
Application number: CN202311312218.6A
Authority: CN
Inventors: 蒋文伟; 汪磊; 赵荣生; 朱一飞; 傅星楠; 孙梓涵; 李垚周
Original assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Current assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2024-01-09

Abstract

The application provides a storage method, a query method, an apparatus, a device, and a medium for indexes. The storage method comprises the following steps: generating a derived real-time index based on an atomic real-time index; determining a service line identifier and an index storage strategy associated with the derived real-time index; determining a target storage sub-library from a real-time index library based on the service line identifier; and storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage strategy. The application can shorten the query time of real-time indexes and improve their query efficiency.

Description

Index storage method, query method, apparatus, device, and medium

Technical Field

The embodiment of the application relates to the technical field of data processing, in particular to a storage method, a query method, a device, equipment and a medium for indexes.

Background

A data index (also called an index) is a summarized result obtained through statistical analysis of data. It quantifies outcomes in enterprise operation and management, so that business objectives can be described, measured, and broken down, which gives the data practical value. An enterprise can therefore make business decisions based on its indexes.

Currently, indexes fall into two types: offline indexes and real-time indexes. As shown in fig. 1a, offline indexes are managed as follows: an offline computing engine (such as Spark) cleans and computes a batch of data accumulated in a data warehouse table, and the resulting offline indexes are then stored in a wide-table structure in a database that supports columnar storage (such as Hive, ClickHouse, or Impala). The wide table in fig. 1a includes three types of fields: primary key, dimension, and index. The primary key is the user ID, the dimensions are time and gender, and the indexes are access duration, number of accesses, and others. Because columnar storage stores the values of a given column of the offline indexes together, reading a specified column performs better than reading the full records. For example, when the access duration needs to be queried, the index database only needs to return the access-duration column without reading all stored indexes, which improves the speed and efficiency of index queries.
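The difference between reading whole records and reading a single stored column can be illustrated with a minimal sketch. The field names and values below are hypothetical and merely mirror the wide table described for fig. 1a; a real columnar database uses a far more elaborate on-disk layout.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "gender": "F", "access_duration": 120, "access_count": 3},
    {"user_id": 2, "gender": "M", "access_duration": 45, "access_count": 1},
    {"user_id": 3, "gender": "F", "access_duration": 300, "access_count": 7},
]


def to_columns(records):
    """Pivot row-oriented records into one list per field (column layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited to extract one field.
durations_from_rows = [record["access_duration"] for record in rows]
# Column layout: the requested column is read directly.
durations_from_columns = columns["access_duration"]
```

Both reads return the same values; the column layout simply avoids touching the fields that were not requested.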

As shown in fig. 1b, real-time indexes are managed as follows: a real-time computing engine (such as Flink) cleans and computes the data in a message queue to generate real-time indexes, which are then stored in a wide-table structure in a database (such as Kafka). However, real-time indexes are generated differently from offline indexes: real-time computation produces a result for each record as it arrives. Columnar storage is therefore not possible in principle, because it requires a batch of data to accumulate so that the same fields of different records can be merged and stored together. Consequently, when a user queries a real-time index, the real-time computing engine must read all data records related to the index to be queried and then filter the target fields out of everything it has read to obtain the queried real-time index.

Disclosure of Invention

The present application provides a storage method, a query method, an apparatus, a device, and a medium for indexes, which can shorten the query time of real-time indexes and improve their query efficiency.

In a first aspect, the present application provides a method for storing an index, including:

generating derived real-time indicators based on the atomic real-time indicators;

determining service line identifiers and index storage strategies associated with the derived real-time indexes;

determining a target storage sub-library from a real-time index library based on the service line identification;

and storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage strategy.
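The four steps of the first aspect can be sketched as follows. This is only an illustration under stated assumptions: the dictionary-based library, the record fields (`service_line`, `strategy`, `kind`), and the `by_kind` strategy are all hypothetical and are not taken from the application.

```python
def store_indexes(atomic, derived, library, strategies):
    """Store an atomic index and its derived index in one sub-library."""
    # Step 2: service line identifier and storage strategy tied to the derived index.
    service_line = derived["service_line"]
    strategy = strategies[derived["strategy"]]
    # Step 3: pick the target storage sub-library by service line identifier.
    sub_library = library[service_line]
    # Step 4: the strategy decides which storage space each index goes to.
    for index in (atomic, derived):
        sub_library.setdefault(strategy(index), []).append(index)
    return sub_library


# Hypothetical strategy: route by index kind.
strategies = {"by_kind": lambda index: "space_" + index["kind"]}
# Hypothetical real-time index library with one sub-library per service line.
library = {"music": {}, "live": {}}

atomic = {"id": "play_count", "kind": "atomic", "value": 10}
# Step 1: a derived index built from the atomic one (here: an hourly average over 5 hours).
derived = {
    "id": "play_count_per_hour",
    "kind": "derived",
    "value": atomic["value"] / 5,
    "service_line": "music",
    "strategy": "by_kind",
}

result = store_indexes(atomic, derived, library, strategies)
```

Only the sub-library of the matching service line is touched, so a later query for that service line does not need to scan the others.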

In some optional implementations, the storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage policy includes:

obtaining a partition flow table corresponding to the index storage strategy;

determining a conditional expression corresponding to a partition field in the partition flow table;

pushing the conditional expression down to a programming language transcoding module of the real-time computing engine based on a pushdown mechanism of the real-time computing engine, so that the programming language transcoding module converts the conditional expression into storage rule code that the real-time computing engine can recognize;

and storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the storage rule code.
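As a rough illustration of converting a conditional expression into executable storage rule code, the sketch below compiles a declarative description into a Python callable. The table layout (`partition_field`, `cases`, `default`) and the function `compile_rule` are hypothetical; in the application this conversion is performed by the transcoding module of the real-time computing engine, which is not reproduced here.

```python
def compile_rule(partition_field, cases, default):
    """Turn a declarative conditional expression into a callable storage rule."""
    def rule(record):
        value = record[partition_field]
        for expected, space in cases:  # evaluated in order, like WHEN ... THEN ...
            if value == expected:
                return space
        return default  # like ELSE ... END
    return rule


# Hypothetical partition flow table entry.
partition_flow_table = {
    "partition_field": "index_kind",
    "cases": [("atomic", "space_1"), ("derived", "space_2")],
    "default": "space_3",
}
storage_rule = compile_rule(**partition_flow_table)
```

Calling `storage_rule({"index_kind": "atomic"})` then yields the storage space for that record.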

In some optional implementations, the target storage sub-library includes at least one storage space, and the storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the storage rule code includes:

based on the storage rule code, determining a first target storage space corresponding to the atomic real-time index and a second target storage space corresponding to the derived real-time index;

and storing the atomic real-time index into the first target storage space, and storing the derived real-time index into the corresponding second target storage space.

In some alternative implementations, the method further includes:

storing the partition flow table and the mapping relation between the atomic real-time index and the first target storage space as metadata of the atomic real-time index;

and storing the partition flow table and the mapping relation between the derived real-time index and the second target storage space as metadata of the derived real-time index.
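A minimal sketch of such metadata, assuming an in-memory dictionary keyed by index identifier; the store, the field names, and the identifier `pft_music_v1` are hypothetical.

```python
# Hypothetical in-memory metadata store keyed by index identifier.
metadata_store = {}


def save_metadata(index_id, partition_flow_table, storage_space):
    """Record which partition flow table and storage space an index uses."""
    metadata_store[index_id] = {
        "partition_flow_table": partition_flow_table,
        "storage_space": storage_space,
    }


save_metadata("play_count", "pft_music_v1", "space_1")      # atomic index
save_metadata("play_count_1d", "pft_music_v1", "space_2")   # derived index
```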

In some optional implementations, when the index storage policy is the first storage policy, the obtaining a partition flow table corresponding to the index storage policy includes:

acquiring a first partition flow table corresponding to the first storage strategy;

the first partition flow table at least comprises a first partition field and a first conditional expression corresponding to the first partition field, wherein the first conditional expression is determined based on an atomic real-time index identifier, a derived real-time index identifier and a preset storage space, and the preset storage space is at least two partitions in the target storage sub-library.

In some alternative implementations, the first conditional expression is: when the condition matches the atomic real-time index identifier, the returned result is a first preset storage space; when the condition matches the derived real-time index identifier, the returned result is a second preset storage space; and when none of the conditions is met, the returned result is a third preset storage space, and the expression ends.
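The first conditional expression behaves like a three-branch case expression. A sketch under assumed identifiers and storage space names (none of which come from the application):

```python
ATOMIC_IDS = {"play_count"}       # hypothetical atomic real-time index identifiers
DERIVED_IDS = {"play_count_1d"}   # hypothetical derived real-time index identifiers


def first_conditional_expression(index_id):
    """Route an index to a preset storage space by its identifier."""
    if index_id in ATOMIC_IDS:
        return "space_1"   # first preset storage space
    if index_id in DERIVED_IDS:
        return "space_2"   # second preset storage space
    return "space_3"       # no condition met: third preset storage space
```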

In some optional implementations, when the index storage policy is the second storage policy, the obtaining a partition flow table corresponding to the index storage policy includes:

acquiring a second partition flow table corresponding to the second storage strategy;

the second partition flow table at least comprises a second partition field and a second conditional expression corresponding to the second partition field, the second conditional expression is determined based on a first modulus value obtained by carrying out hash processing and modulo operation on the atomic real-time index identifier, a second modulus value obtained by carrying out hash processing and modulo operation on the derived real-time index identifier, and a preset storage space, and the preset storage space is at least two partitions in the target storage sub-library.

In some alternative implementations, the second conditional expression is: when the modulus value equals a first numerical value, the returned result is a first preset storage space; when the modulus value equals a second numerical value, the returned result is a second preset storage space; and when none of the conditions is met, the returned result is a third preset storage space, and the expression ends.
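A sketch of the hash-and-modulo routing, with several assumptions: CRC32 stands in for the unspecified hash function, the modulus is 3, and the first and second numerical values are 0 and 1. The application does not fix any of these choices.

```python
import zlib

NUM_BUCKETS = 3  # assumed modulus


def modulus_value(index_id):
    """Hash the identifier (CRC32 as a stand-in) and take it modulo NUM_BUCKETS."""
    return zlib.crc32(index_id.encode("utf-8")) % NUM_BUCKETS


def second_conditional_expression(index_id, first_value=0, second_value=1):
    """Route an index to a preset storage space by its modulus value."""
    modulus = modulus_value(index_id)
    if modulus == first_value:
        return "space_1"
    if modulus == second_value:
        return "space_2"
    return "space_3"
```

A stable hash is used rather than Python's built-in `hash`, which is randomized per process for strings and would route the same identifier differently across restarts.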

In some optional implementations, when the index storage policy is the third storage policy, the obtaining a partition flow table corresponding to the index storage policy includes:

acquiring a third partition flow table corresponding to the third storage strategy;

the third partition flow table at least comprises a third partition field and a third conditional expression corresponding to the third partition field, wherein the third conditional expression is determined based on a first modulus value obtained by performing hash processing and a modulo operation on the atomic real-time index identifier, a second modulus value obtained by performing hash processing and a modulo operation on the derived real-time index identifier, a preset storage space, and a preset flow value of the preset storage space, and the preset storage space is at least two partitions in the target storage sub-library.

In some alternative implementations, the third conditional expression is: when the modulus value equals a first numerical value, the returned result is a first preset storage space and the preset flow value of the first preset storage space; when the modulus value equals a second numerical value, the returned result is a second preset storage space and the preset flow value of the second preset storage space; and when none of the conditions is met, the returned result is a third preset storage space and the preset flow value of the third preset storage space, and the expression ends.
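The third conditional expression returns a storage space together with its preset flow value. A sketch, taking the already computed modulus value as input; the space names, flow values, and the numerical values 0 and 1 are hypothetical.

```python
# Hypothetical preset flow values per preset storage space.
PRESET_FLOW = {"space_1": 1000, "space_2": 1000, "space_3": 500}


def third_conditional_expression(modulus, first_value=0, second_value=1):
    """Return (storage space, preset flow value) for a given modulus value."""
    if modulus == first_value:
        space = "space_1"
    elif modulus == second_value:
        space = "space_2"
    else:
        space = "space_3"
    return space, PRESET_FLOW[space]
```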

In some optional implementations, before the storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the storage rule code, the method further includes:

detecting a current flow value of each preset storage space in response to the storage rule code corresponding to the third storage strategy;

and expanding the preset flow value of the preset storage space in response to the fact that the current flow value of any preset storage space is larger than the preset flow value.
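The detection and expansion steps can be sketched as below. The doubling factor is an assumption made for illustration; the application only states that the preset flow value is expanded when the current flow value exceeds it.

```python
def expand_if_needed(current_flow, preset_flow, growth_factor=2):
    """Return preset flow values, enlarged for every space whose current flow exceeds them."""
    updated = dict(preset_flow)
    for space, current in current_flow.items():
        if current > updated[space]:
            # Assumed expansion rule: multiply the preset value by growth_factor.
            updated[space] = updated[space] * growth_factor
    return updated


new_presets = expand_if_needed(
    current_flow={"space_1": 1500, "space_2": 10},
    preset_flow={"space_1": 1000, "space_2": 1000},
)
```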

In some optional implementations, the storing the atomic real-time index and the derived real-time index into the target storage sub-library includes:

acquiring a data storage structure corresponding to the target storage sub-library;

and storing the atomic real-time index and the derived real-time index into the target storage sub-library according to the data storage structure.

In some alternative implementations, the generating the derived real-time index based on the atomic real-time index includes:

acquiring processing parameters corresponding to the atomic real-time index;

and generating the derived real-time index based on the atomic real-time index and the processing parameter.
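One possible reading of "processing parameters" is a set of dimension filters plus an aggregation, applied to records of the atomic index. The sketch below follows that reading; the parameter layout (`derived_id`, `filters`, `aggregate`) and the sample records are assumptions, not details from the application.

```python
def derive_index(atomic_records, params):
    """Aggregate the atomic records that match the processing parameters."""
    matching = [
        record["value"]
        for record in atomic_records
        if all(record.get(key) == wanted for key, wanted in params["filters"].items())
    ]
    return {"id": params["derived_id"], "value": params["aggregate"](matching)}


records = [
    {"id": "play_count", "gender": "F", "day": "2023-10-09", "value": 3},
    {"id": "play_count", "gender": "M", "day": "2023-10-09", "value": 5},
    {"id": "play_count", "gender": "F", "day": "2023-10-09", "value": 4},
]
params = {
    "derived_id": "play_count_female_daily",
    "filters": {"gender": "F", "day": "2023-10-09"},
    "aggregate": sum,
}
derived = derive_index(records, params)
```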

In a second aspect, the present application provides a query method for an index, including:

responding to real-time index query operation sent by a user, and acquiring real-time index identification to be queried;

determining a target storage position of the real-time index to be queried in a real-time index library based on the real-time index identifier to be queried;

and acquiring the real-time index to be queried from the real-time index library based on the target storage position, and displaying the real-time index to be queried to the user.

In some optional implementations, the determining, based on the real-time index identifier to be queried, a target storage location of the real-time index to be queried in a real-time index library includes:

determining metadata corresponding to the real-time index identifier to be queried;

and determining a target storage position of the real-time index to be queried in a real-time index library based on the metadata.

In some optional implementations, the determining, based on the metadata, a target storage location of the real-time index to be queried in a real-time index library includes:

determining a target partition flow table based on the metadata;

transcoding a conditional expression corresponding to a partition field in the target partition flow table to convert the conditional expression into storage rule code that the real-time computing engine can recognize;

and determining a target storage position of the real-time index to be queried in a real-time index library based on the storage rule code.

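In some optional implementations, the determining, based on the metadata, a target storage location of the real-time index to be queried in a real-time index library includes:

determining a mapping relation between the real-time index to be queried and a storage space based on the metadata;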

and determining a target storage position of the real-time index to be queried in a real-time index library according to the mapping relation between the real-time index to be queried and the storage space.
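The two ways of resolving a storage position from metadata, through a stored mapping or by re-evaluating the storage rule of the partition flow table, can be sketched together. The metadata layout, the rule table, and the identifier naming convention are all hypothetical.

```python
def locate(index_id, metadata, rules):
    """Return the storage space for an index using its metadata."""
    entry = metadata[index_id]
    # Alternative 1: a stored mapping from the index to its storage space.
    if "storage_space" in entry:
        return entry["storage_space"]
    # Alternative 2: re-evaluate the storage rule of the partition flow table.
    return rules[entry["partition_flow_table"]](index_id)


# Hypothetical storage rule already converted into executable form.
rules = {
    "pft_v1": lambda index_id: "space_1" if index_id.startswith("atomic_") else "space_2",
}
metadata = {
    "atomic_play_count": {"partition_flow_table": "pft_v1"},
    "derived_play_count_1d": {"partition_flow_table": "pft_v1", "storage_space": "space_2"},
}
```

Either way, the query only needs to read the resolved storage space instead of scanning every record in the library.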

In some optional implementations, the acquiring, based on the target storage location, the real-time index to be queried from the real-time index library includes:

acquiring a plurality of discrete target fields from the real-time index library based on the target storage location;

and obtaining the real-time index to be queried based on the plurality of discrete target fields.

In a third aspect, the present application provides a storage device for an index, including:

the index generation module is used for generating derived real-time indexes based on the atomic real-time indexes;

the first determining module is used for determining service line identifiers and index storage strategies associated with the derived real-time indexes;

the second determining module is used for determining a target storage sub-library from the real-time index library based on the service line identification;

and the index storage module is used for storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage strategy.

In some optional implementations, the index storage module includes:

the acquisition unit is used for acquiring the partition flow table corresponding to the index storage strategy;

a first determining unit, configured to determine a conditional expression corresponding to a partition field in the partition flow table;

the conversion unit is used for pushing the conditional expression down to a programming language transcoding module of the real-time computing engine based on a pushing mechanism of the real-time computing engine so as to convert the conditional expression into a storage rule code which can be identified by the real-time computing engine through the programming language transcoding module;

and the storage unit is used for storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the storage rule code.

In some alternative implementations, the target storage sub-library includes at least one storage space, and the storage unit is specifically configured to:

based on the storage rule code, determining a first target storage space corresponding to the atomic real-time index and a second target storage space corresponding to the derived real-time index; and storing the atomic real-time index into the first target storage space, and storing the derived real-time index into the corresponding second target storage space.

In some alternative implementations, the storage unit is further configured to:

In some optional implementations, when the index storage policy is the first storage policy, the acquiring unit is specifically configured to:

In some optional implementations, when the index storage policy is the second storage policy, the obtaining unit is specifically configured to:

In some optional implementations, when the index storage policy is the third storage policy, the obtaining unit is specifically configured to:

In some optional implementations, the index storage module further includes:

the detection unit is used for detecting the current flow value of each preset storage space when the storage rule code corresponds to the third storage strategy;

and the capacity expansion unit is used for expanding the preset flow value of the preset storage space in response to the fact that the current flow value of any preset storage space is larger than the preset flow value.

In some optional implementations, the index storage module is specifically configured to: acquiring a data storage structure corresponding to the target storage sub-library; and storing the atomic real-time index and the derived real-time index into the target storage sub-library according to the data storage structure.

In some optional implementations, the index generating module is specifically configured to: acquiring processing parameters corresponding to the atomic real-time index; and generating the derived real-time index based on the atomic real-time index and the processing parameter.

In a fourth aspect, the present application provides an index query device, including:

the identification acquisition module is used for responding to real-time index query operation sent by a user and acquiring real-time index identification to be queried;

the third determining module is used for determining a target storage position of the real-time index to be queried in a real-time index library based on the real-time index identifier to be queried;

and the index acquisition module is used for acquiring the real-time index to be queried from the real-time index library based on the target storage position and displaying the real-time index to be queried to the user.

In some optional implementations, the third determining module includes:

the second determining unit is used for determining metadata corresponding to the real-time index identifier to be queried;

and the third determining unit is used for determining the target storage position of the real-time index to be queried in the real-time index library based on the metadata.

In some optional implementations, the third determining unit is specifically configured to: determining a target partition flow table based on the metadata; transcoding a conditional expression corresponding to a partition field in the target partition flow table to convert the conditional expression into storage rule code that the real-time computing engine can recognize; and determining a target storage position of the real-time index to be queried in a real-time index library based on the storage rule code.

In some optional implementations, the third determining unit is specifically configured to: determining a mapping relation between the real-time index to be queried and a storage space based on the metadata; and determining a target storage position of the real-time index to be queried in a real-time index library according to the mapping relation between the real-time index to be queried and the storage space.

In some optional implementations, the index obtaining module is specifically configured to: acquiring a plurality of discrete target fields from the real-time index library based on the target storage location; and obtaining the real-time index to be queried based on the plurality of discrete target fields.

In a fifth aspect, the present application provides an electronic device, including:

a processor and a memory, the memory being used for storing a computer program, the processor being used for calling and running the computer program stored in the memory to execute the method for storing the index as described in the first aspect embodiment or the method for querying the index as described in the second aspect embodiment.

In a sixth aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute a method for storing an index as described in an embodiment of the first aspect, or a method for querying an index as described in an embodiment of the second aspect.

In a seventh aspect, the present application provides a computer program product comprising program instructions which, when run on an electronic device, cause the electronic device to perform a method for storing an index as described in an embodiment of the first aspect, or a method for querying an index as described in an embodiment of the second aspect.

The technical scheme disclosed by the embodiment of the application has at least the following beneficial effects:

generating a derived real-time index according to the atomic real-time index, determining a service line identifier and an index storage strategy associated with the derived real-time index, determining a target storage sub-library from the real-time index library based on the service line identifier, and storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage strategy. According to the method and the system, the atomic real-time index and the derived real-time index are stored into the real-time index library based on the service line identification and the index storage strategy associated with the derived index, so that the query time of the real-time index is shortened, and the query efficiency of the real-time index is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1a is a schematic diagram of offline indicator storage in the related art;

FIG. 1b is a schematic diagram of real-time index storage in the related art;

FIG. 2 is a flow chart of a method for storing an index according to an embodiment of the present application;

FIG. 3 is a schematic diagram of generating derived real-time metrics based on atomic real-time metrics and processing parameters according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating another method for storing an index according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of storing real-time metrics provided in an embodiment of the present application;

FIG. 6 is a flowchart of a query method for an index according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a visual query interface provided in an embodiment of the present application;

FIG. 8 is a schematic block diagram of an index storage device provided in an embodiment of the present application;

FIG. 9 is a schematic block diagram of an index query device provided in an embodiment of the present application;

FIG. 10 is a schematic block diagram of an electronic device provided by an embodiment of the present application;

FIG. 11 is a schematic block diagram of a computer-readable storage medium provided in an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

At present, when a user queries a real-time index, the real-time computing engine must read all data records related to the index to be queried and then filter the target fields out of all the data read to obtain the queried real-time index. Querying real-time indexes in this way takes a lot of time, so query efficiency is low.

To solve the above technical problem, the inventive concept of the present application is as follows: the service line identifier and the index storage strategy associated with a real-time index are determined, a target storage sub-library is determined from the real-time index library according to the service line identifier, and the atomic real-time index and the derived real-time index are stored into the target storage sub-library according to the index storage strategy. When a user later queries a real-time index, the query time is shortened and the query efficiency is improved.

Before the embodiments of the present application are described in detail, the terms involved in the embodiments are explained as follows:

Index: data that measures a certain attribute of a specified target and is used for data statistics. For example, in a music scenario, the number of songs a user listens to each day can be defined as an index: daily songs listened to.

Spark: an open-source Apache project, a fast and general-purpose computing engine designed for large-scale data processing. Spark is similar to Hadoop's MapReduce computing framework, but compared with MapReduce it is scalable, computes in memory, and can directly read data of any format on Hadoop, so it is more efficient and has lower latency when performing batch processing.

Hadoop: a distributed system infrastructure that lets users develop distributed programs without knowing the underlying distributed details, making full use of the power of a cluster for high-speed computation and storage. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over massive data.

HDFS: Hadoop Distributed File System. HDFS is a distributed file system designed to run on commodity hardware. It is highly fault tolerant and provides high-throughput data access.

MapReduce: a distributed computing framework that combines the business logic code written by the user with built-in default components into a complete distributed program, which runs on a Hadoop cluster.

Dimension: the perspective along which data statistics are taken, such as per-day statistics or per-hour statistics.

Column storage (also known as columnar storage): a database storage mode. A typical database stores all fields of each record together, whereas column storage stores the data of a given column across records together, so performance is better when specified columns are read.

Flink: an open-source stream processing framework and real-time computing engine. Flink executes arbitrary stream data programs in a data-parallel and pipelined manner, and its pipelined runtime system can execute both batch and stream processing programs. In addition, the Flink runtime itself supports the execution of iterative algorithms.

Kafka: an open-source stream processing platform and a high-throughput distributed publish-subscribe messaging system. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism and to provide real-time messages through a cluster. Kafka features high throughput, supports partitioning messages across Kafka servers and consumer clusters, and supports parallel data loading into Hadoop.

Batch-stream integration: developing a single set of code in data processing that can handle both offline data and real-time data. That is, with one data model and one SQL statement, batch data (offline data) and stream data (real-time data) can be accessed at the same time, and a unified query outlet is provided for data applications. Minute-level pure real-time data analysis or combined batch-stream analysis can then be performed, which supports enterprises in monitoring real-time data or jointly analyzing historical and real-time data.

Atomic index: an index that cannot be split further under the business definition, named by a noun with a clear business meaning, such as number of logins or number of songs listened to. In other words, an atomic index is a basic index without dimensions.

Partition flow table: a real-time flow table that incorporates the partitioning technique of offline data so that the flow table is partitioned. Here, a flow table is a table onto which real-time stream data is mapped, and a partition refers to a partitioning technique on the offline side, such as the directory-level partitions of Hive.

SQL (Structured Query Language): a special-purpose programming language for database querying and programming, used to store data and to query, update, and manage relational database systems. It is also the file extension of database script files.

Calcite: an open-source SQL parsing tool that can parse various SQL statements into abstract syntax trees (Abstract Syntax Tree, AST). By operating on the AST, the algorithms and relations expressed in SQL can then be realized in concrete code.

AST: a tree representation of the abstract syntax structure of source code, in which each node represents a construct in the source code.
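To make the notion of an AST concrete, the following sketch parses a small arithmetic expression with Python's standard `ast` module. It illustrates the tree structure only; it is not Calcite, and SQL parsing produces different node types.

```python
import ast

# Parse the expression into a tree; mode="eval" yields a single expression node.
tree = ast.parse("a + b * 2", mode="eval")
root = tree.body  # top-level node: the addition

# Multiplication binds tighter, so it appears as the right child of the addition.
left_operand = root.left
right_subtree = root.right
```

Here `root` is an `ast.BinOp` whose operator is `ast.Add`, and `right_subtree` is a nested `ast.BinOp` whose operator is `ast.Mult`.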

Kafka topic: a topic is a subject in Kafka, and every message must be written to a topic. A topic is a logical unit of messages, comparable to a mailbox in a post office. It can be regarded as a message queue organized by category; for example, an order system may have an order topic, an inventory topic, and so on. Each topic can be divided into several partitions, and each partition is an ordered sequence of messages.

Partition, partition. Typically a topic can be divided into a number of parts, each of which is an ordered, immutable set of messages and can be distributed over different stokers. Within one topic, the messages for each part are unordered, but the messages inside the part are ordered. The Kafka realizes the load balancing of the messages and improves the throughput by means of the partition.

Metadata: also called intermediate data or relay data, is data that describes data. It mainly describes data attribute information and supports functions such as indicating storage locations, historical data, resource searching and file recording.

Having introduced some concepts related to the embodiments of the present application, a detailed description of a method, an apparatus, a device, and a medium for storing and querying an index provided by the embodiments of the present application is provided below with reference to the accompanying drawings.

Fig. 2 is a flow chart of a method for storing an index according to an embodiment of the present application. This embodiment is applicable to scenarios in which real-time indexes are stored, and the method may be executed by an index storage device. The index storage device may consist of hardware and/or software and may be integrated in an electronic device. In this embodiment of the present application, the electronic device may be any hardware device with a data processing function, for example, a smart phone, a tablet computer, a palmtop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA) or a wearable device; the type of the electronic device is not specifically limited.

As shown in fig. 2, the method may include the steps of:

S101, generating a derived real-time index based on the atomic real-time index.

Alternatively, the present application may obtain multiple atomic indicators from an upstream data source as atomic real-time indicators. Thereafter, at least one derived real-time index is generated based on each of the atomic real-time indexes. The above plural may be understood as two or more.

The atomic real-time index is an atomic index that measures the state of a service activity according to the actual service. An atomic index associated with a service activity may be created through the atomic index configuration function provided by the electronic device. Specifically, the user may create an atomic index in the index factory through a visual index configuration interface, or by inputting an SQL statement that creates the atomic index, and so on; the present application imposes no limitation on this.

For example, assuming that the actual service is a music service, creating an atomic indicator associated with the music service activity may be selected as the number of songs listened to, the number of logins, the duration of the songs listened to, etc.

Also, after the atomic indicator is created, the present application may obtain upstream data from an upstream data source and obtain the atomic indicator based on the upstream data. Optionally, the obtaining the atomic indicator based on the upstream data is specifically selecting the atomic indicator from the upstream data.

The upstream data source may be understood as a device or apparatus that provides index data; accordingly, the upstream data provided by the upstream data source may be selected as log data or the like. Wherein the log data may be, but is not limited to: server side logs, traffic logs, client side logs, etc.

In some optional embodiments, when creating an atomic real-time index the user may also choose to create a derived real-time index corresponding to it. A derived real-time index is a derived index that can reflect the state of a service activity.

When creating the derived real-time index corresponding to the atomic real-time index, the user may select or fill in processing parameters in the visual creation interface. The processing parameters can be understood as the parameter information used to process and clean the atomic real-time index in order to generate the derived real-time index.

In the present application, the above processing parameters include: a time parameter and at least one modifier.

The above time parameter may be understood as a time period, which is used to determine a time range that needs statistics, such as one natural day, or three natural days.

The above modifiers are understood as qualifiers that limit the scope of the business activity, such as male, female, online or offline.

Therefore, when generating at least one derived real-time index based on each obtained atomic real-time index, the present application may first obtain the processing parameters corresponding to each atomic real-time index, and then generate at least one derived real-time index based on each atomic real-time index and its processing parameters.

In some optional embodiments, as shown in fig. 3, the at least one derived real-time index is generated from each atomic real-time index and its processing parameters according to the index processing logic of one atomic real-time index + a time parameter + one or more modifiers. That is, a derived real-time index may include one or more modifiers, and each derived real-time index belongs to exactly one atomic real-time index.

Correspondingly, the SQL corresponding to the index processing logic is expressed as <Group by time, modifier 1, ..., modifier k>, where k is a positive integer greater than or equal to 1.

For example, assuming that a certain atomic real-time index is login times, a time parameter corresponding to the login times is 1 day, and a modifier is male, based on the index processing logic of the atomic real-time index+the time parameter+one or more modifiers, a derived real-time index is generated as daily male login times.

For another example, assuming that an atomic real-time index is the number of songs listened to, the corresponding time parameter is one week, and the modifiers are female and post-90s (born in the 1990s), then the derived real-time index generated based on the index processing logic of the atomic real-time index + the time parameter + one or more modifiers is the weekly number of songs listened to by post-90s female users.

It can be understood that the derived real-time index in the present application is composed of three elements, namely an atomic real-time index, a time period and one or more modifiers. It expresses the value of the atomic real-time index under a specific time range and specific service conditions, and thereby reflects the state of a given service activity. Examples include the daily number of songs listened to by male users and the weekly number of logins by female users.
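The composition of a derived real-time index (atomic index + time period + modifiers) can be illustrated with a minimal Python sketch. The event records, the field names (`ts`, `gender`, `metric`) and the `derive` helper are assumptions made for illustration only; the application itself expresses this logic as a Group by over time and the modifiers.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical upstream events; field names are illustrative assumptions.
events = [
    {"ts": "2023-10-09T08:00:00", "gender": "male", "metric": "login"},
    {"ts": "2023-10-09T21:30:00", "gender": "male", "metric": "login"},
    {"ts": "2023-10-09T10:15:00", "gender": "female", "metric": "login"},
    {"ts": "2023-10-10T09:00:00", "gender": "male", "metric": "login"},
]

def derive(events, atomic, modifiers):
    """Count `atomic` events per natural day, restricted by the modifiers.

    This mirrors GROUP BY time, modifier 1, ..., modifier k.
    """
    counts = defaultdict(int)
    for event in events:
        if event["metric"] != atomic:
            continue  # not the atomic index this derived index belongs to
        if any(event.get(k) != v for k, v in modifiers.items()):
            continue  # outside the scope limited by the modifiers
        day = datetime.fromisoformat(event["ts"]).date().isoformat()
        key = (day,) + tuple(event[k] for k in sorted(modifiers))
        counts[key] += 1
    return dict(counts)

# Derived index: daily male login times.
daily_male_logins = derive(events, "login", {"gender": "male"})
```

Here the time parameter is fixed to one natural day; a weekly period would only change how the grouping key is computed from the timestamp.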

S102, determining service line identifiers and index storage strategies associated with the derived real-time indexes.

Where a service line is understood as a service for a certain class of products. For example, XX music is a business line for music class products. As another example, XX mailboxes are business lines for information delivery type products.

In this application, service line identification is understood as being used for uniquely identifying the corresponding service line. Such as a service line name or a service line ID, etc., to which the present application does not impose any limitation.

Before the derived real-time index is generated based on the atomic real-time index, the user may, through the derived real-time index creation function provided by the electronic device, not only create a derived real-time index that reflects the service activity state but also configure an index storage strategy for that derived real-time index, so that the generated derived real-time index can be stored in the real-time index library according to the configured index storage strategy.

The index storage strategy corresponding to the derived real-time index can be understood as the storage rule or storage mode by which the derived real-time index is stored in the actual underlying storage of the real-time index library; the present application does not limit this storage strategy in any way. When the real-time index library is Kafka, the actual underlying storage of the real-time index library is a Kafka topic.

In this application, creating the derived real-time index specifically includes configuring service logic and configuring basic information. Configuring service logic means configuring the time period, the modifiers and the associated atomic real-time index. Configuring basic information means configuring the data layer and business process to which the derived real-time index belongs, the derived real-time index identifier, and the description information of the derived real-time index.

The business process is the business process to which the derived real-time index belongs and is used for determining the business activity type.

The derived real-time index identifier is understood to be used to uniquely identify the corresponding derived real-time index. In the present application, the derived real-time index identifier may be a derived real-time index name. Wherein, the derived real-time index name can be selected as a Chinese name and/or an English name.

Therefore, the method and the device can obtain the configuration information of the derivative real-time index from the index management module, and obtain the service line identification and the index storage strategy associated with the derivative real-time index from the configuration information.

The index management module may be understood as a device or process responsible for registering and managing the metadata of indexes, the atomic indexes and the derived indexes. The metadata of an index refers to all data related to the index, such as the definition information of the index, the calculation logic of the index and the storage location of the index.

The indexes comprise offline indexes and real-time indexes, so that the atomic indexes in the index management module are atomic real-time indexes and/or atomic offline indexes, and the corresponding derived indexes are derived real-time indexes and/or derived offline indexes.

S103, determining a target storage sub-library from the real-time index library based on the service line identification.

And S104, storing the atomic real-time index and the derived real-time index into a target storage sub-library based on the index storage strategy.

In the present application, the real-time index library is understood as a repository that stores real-time indexes. The real-time index library may be Kafka, a Kafka cluster, or another real-time database, which the present application does not limit in any way.

The service lines associated with different derived real-time indexes may differ. For example, the service line associated with one derived real-time index is service line XX, the service line associated with another derived real-time index is service line YY, and so on.

Therefore, the present application divides the real-time index library into a plurality of sub-libraries that do not share data, based on the service lines associated with the derived real-time indexes, so that one service line corresponds to one storage sub-library. The atomic real-time indexes and derived real-time indexes corresponding to the same service line can then be stored in the same storage sub-library according to the index storage strategy, so that the real-time indexes are stored by partition. When a real-time index is queried later, the storage sub-library in which it is stored can be determined from the metadata of the real-time index, and the required real-time index can be read quickly from that storage sub-library, which reduces the amount of data read and improves the query speed.

In this application, when the real-time index library is Kafka, the plurality of sub-libraries into which Kafka is divided may be Kafka1, Kafka2, ..., KafkaN, where N is a positive integer greater than 1.

In some optional embodiments, the derived real-time indexes corresponding to the same service line differ in their dimension information, such as the time period and/or the modifiers. Therefore, the storage sub-library corresponding to one service line is split into a plurality of storage spaces according to the dimensions of the real-time indexes. For example, a storage sub-library KafkaXX is split into Topic1, Topic2, Topic3, and so on. The derived real-time indexes are then stored in the corresponding storage spaces according to their dimensions, so that they are spread across the different storage spaces of the corresponding sub-library in the real-time index library, which achieves column-style storage of the real-time indexes.

Considering that an atomic real-time index has no dimension information, when the storage sub-library corresponding to one service line is split into a plurality of storage spaces according to dimension, an additional storage space without dimensions is also split off. The atomic real-time indexes corresponding to the service line can then be stored in this dimensionless storage space, so that the user can conveniently query the atomic real-time index associated with any derived real-time index later.
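The two-level layout described above (one sub-library per service line, one storage space per dimension combination, plus a dimensionless storage space for atomic indexes) can be sketched as follows. The sub-library names, the topic naming scheme and the `target_location` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical registry: one storage sub-library per service line.
SUB_LIBRARIES = {
    "music": "KafkaMusic",
    "mail": "KafkaMail",
}

def target_location(service_line, time_period=None, modifiers=()):
    """Return (sub-library, storage space) for a real-time index.

    Atomic indexes carry no dimension information, so they fall into the
    dimensionless storage space; derived indexes are routed by dimension.
    """
    sub_library = SUB_LIBRARIES[service_line]
    parts = [p for p in (time_period, *sorted(modifiers)) if p]
    if not parts:
        return sub_library, "topic_no_dimension"
    return sub_library, "topic_" + "_".join(parts)
```

For instance, `target_location("music", "1d", ["male"])` yields the sub-library for the music service line and a storage space keyed by the daily/male dimension combination, so a later query only needs to read that one storage space.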

In some optional embodiments, when the atomic real-time index and the derived real-time index are stored, the data storage structure corresponding to the target storage sub-library is first obtained, and the atomic real-time index and the derived real-time index are then stored into the target storage sub-library according to that data storage structure. The advantage is that the storage structure of the real-time indexes is optimized: real-time indexes with the same statistical dimensions can be stored in one storage space of the corresponding target storage sub-library, which improves the speed of subsequent real-time index queries.

Alternatively, the data storage structure in the present application may be as shown in table 1 below:

TABLE 1

Based on table 1, it can be seen that real-time indexes with the same statistical dimensions are stored in one storage space of the corresponding target storage sub-library. Specifically, the real-time index and its corresponding dimensions are all stored in Value in map form, and each record is written in JSON format into the storage space Topic corresponding to the partition (Partition).
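As a rough illustration of a record whose Value holds the index and its dimensions as a map and which is serialized as JSON, consider the sketch below. The exact columns of table 1 are not reproduced here, so the key names (`partition`, `value`) and the `build_record` helper are assumptions rather than the structure defined by the application.

```python
import json

def build_record(partition, index_name, index_value, dimensions):
    """Serialize one real-time index record as a JSON string.

    The index and its dimensions share a single map under "value";
    the key names are hypothetical.
    """
    value = {index_name: index_value, **dimensions}
    return json.dumps(
        {"partition": partition, "value": value},
        sort_keys=True,
        ensure_ascii=False,
    )

record = build_record(
    "daily_male_login_count",
    "daily_male_login_count",
    42,
    {"time": "2023-10-09", "gender": "male"},
)
```

Keeping the dimensions next to the index value in one map means a consumer of the storage space can filter on dimension without consulting a second record.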

After the atomic real-time index and the derived real-time index are stored, the user may only need to query the derived real-time indexes. Therefore, the present application may optionally store only the derived real-time index and not the atomic real-time index corresponding to it, which reduces the amount of stored data and lowers the data storage cost.

According to the method for storing the index, the derived real-time index is generated according to the atomic real-time index, then the service line identification and the index storage strategy associated with the derived real-time index are determined, then the target storage sub-library is determined from the real-time index library based on the service line identification, and then the atomic real-time index and the derived real-time index are stored into the target storage sub-library based on the index storage strategy. According to the method and the system, the atomic real-time index and the derived real-time index are stored into the real-time index library based on the service line identification and the index storage strategy associated with the derived index, so that the query time of the real-time index is shortened, and the query efficiency of the real-time index is improved.

On the basis of the above embodiment, in the following, with reference to fig. 4, the storage of the atomic real-time index and the derived real-time index into the target storage sub-library in the present application is further explained.

As shown in fig. 4, the step S104 may include the following steps:

S104-1, obtaining a partition flow table corresponding to the index storage strategy.

In the present application, the partition flow table is a flow table mapped onto a plurality of storage spaces and is exposed externally in the form of a flow table, so that the partition flow table can subsequently be used, read and written through its identification information. The identification information of a partition flow table uniquely identifies the corresponding partition flow table, for example the partition flow table name; the present application imposes no limitation on this.

In some alternative embodiments, when the user configures the atomic real-time index and the derived real-time index, the real-time index is also configured as a flow table. Therefore, the method and the device can automatically execute the partition flow table creation operation according to the configuration information of the user, and set corresponding index storage strategies for partition fields in the created partition flow table.

When the partition flow table is created and the index storage strategy is set for its partition field, this may optionally be done by the user through a generalized SQL syntax, which is not limited in this application.

Illustratively, the partition flow table created is partition flow table a, and the SQL statement that creates the partition flow table a may be selected as:

Here, id represents the identity information of partition flow table A, and its data type is integer; name represents the name information of partition flow table A, and its data type is string; action represents the partition field of partition flow table A, which may be the real-time index identifier; partition by action indicates the partition logic, namely the index storage strategy corresponding to the partition field, which may be implemented by an SQL conditional expression (case when syntax). For example, the partition logic of partition flow table A is when action=play then topic1 when action=view then topic2 else topic3 end. This exemplary partition logic can be understood as writing to topic1 of Kafka when action='play', writing to topic2 of Kafka when action='view', and otherwise writing to topic3.

In addition, kafka indicates the index library in which the index is stored, bootstrap=127.0.0.1 indicates the actual storage location information of the index, and xxx=xxx indicates other storage parameters.

Therefore, after the derived real-time index is generated, the partition flow table corresponding to the index storage strategy can be obtained based on the index storage strategy associated with the derived real-time index.

The index storage strategy in the present application is a first storage strategy, a second storage strategy, or a third storage strategy. The first storage strategy is specifically a specified storage space (Topic) strategy; the second storage strategy is specifically a hash modulo strategy; the third storage strategy is specifically a dynamic storage space (Topic) strategy.

Correspondingly, the obtaining the partition flow table corresponding to the index storage strategy may include the following cases:

Case one:

When the index storage strategy associated with the derived real-time index is determined to be the first storage strategy, acquiring the partition flow table corresponding to the index storage strategy specifically means acquiring the first partition flow table corresponding to the first storage strategy.

The first partition flow table at least comprises a first partition field and a first conditional expression corresponding to the first partition field, wherein the first conditional expression is determined based on an atomic real-time index identifier, a derived real-time index identifier and a preset storage space, and the preset storage space is at least two partitions in a target storage sub-library.

In some optional embodiments, when the first partition flow table is built, the first conditional expression in the SQL language may be when indexName in ('index name 1', 'index name 2') then topic1 when xxx then topic2 else topic3 end. Here, indexName represents the real-time index name of the index, and topic1, topic2 and topic3 represent the preset storage spaces.

The first conditional expression described above can be understood as: when the condition meets the atomic real-time index mark, returning a result to be a first preset storage space; when the condition meets the derived real-time index identification, returning a result to be a second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and ending.

That is, the present application directly specifies to which storage space the real-time index name should be mapped through the above expression, enabling the index storage policy to be configured in an exhaustive manner.
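A minimal sketch of the first storage strategy follows, with the exhaustive mapping written as plain Python rather than SQL. The index names in the two sets are hypothetical examples, not names defined by the application.

```python
# Hypothetical index names, enumerated exhaustively as in the first strategy.
ATOMIC_INDEX_NAMES = {"login_count", "song_play_count"}
DERIVED_INDEX_NAMES = {"daily_male_login_count", "weekly_female_song_play_count"}

def first_policy(index_name):
    """Map a real-time index name to its explicitly specified storage space."""
    if index_name in ATOMIC_INDEX_NAMES:
        return "topic1"  # first preset storage space
    if index_name in DERIVED_INDEX_NAMES:
        return "topic2"  # second preset storage space
    return "topic3"      # fallback when no condition is met
```

Because every name must be listed, this strategy gives full control over placement, but a newly added index lands in the fallback storage space until the mapping is updated.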

Case two:

When the index storage strategy associated with the derived real-time index is determined to be the second storage strategy, acquiring the partition flow table corresponding to the index storage strategy specifically means acquiring the second partition flow table corresponding to the second storage strategy.

The second partition flow table at least comprises a second partition field and a second conditional expression corresponding to the second partition field, wherein the second conditional expression is determined based on a first module value obtained by carrying out hash processing and modulo operation on the atomic real-time index identifier, a second module value obtained by carrying out hash processing and modulo operation on the derived real-time index identifier and a preset storage space, and the preset storage space is at least two partitions in the target storage sub-library.

Illustratively, when the second partition flow table is established, the second conditional expression in the SQL language is when hash(indexName) % 3 = 0 then topic1 when hash(indexName) % 3 = 1 then topic2 else topic3 end. Here, indexName represents the real-time index name of the index, and topic1, topic2 and topic3 represent the preset storage spaces.

The second conditional expression can be understood as follows: each real-time index identifier indexName is hashed by Java's string hash function (String hashCode) to generate a hash value of integer (int) type. A modulo operation is then performed on the hash value corresponding to each real-time index identifier to obtain the modulus value of that real-time index. It should be noted that the modulus used in the present application should equal the configured number of preset storage spaces Topic, which is an adjustable parameter.

Further, when the modulus value is 0, the real-time index corresponding to that modulus value is assigned to topic0; when the modulus value is 1, the real-time index corresponding to that modulus value is assigned to topic1; and so on, until all the real-time indexes are stored in their corresponding preset storage spaces. Here, all the real-time indexes are specifically the atomic real-time indexes and the derived real-time indexes.

That is, the second conditional expression is that when the condition satisfies the modulus value equal to the first numerical value, the return result is the first preset storage space; when the condition meets the condition that the modulus value is equal to the second numerical value, returning a result to be a second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and ending.

The benefit of this arrangement is that, because the hash values are effectively random, the number of real-time indexes allocated to each preset storage space Topic is approximately the same, so that an approximately uniform distribution is achieved.

For example, assuming that 3 preset storage spaces Topic are registered and there are 15 real-time indexes, after hash processing and a modulo-3 operation are performed on each of the 15 real-time index names, each preset storage space Topic is allocated roughly 5 different real-time indexes. In this way, the real-time index data is stored more evenly when the data volume is large.
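The hash modulo strategy can be sketched in Python as below. `java_string_hash` re-implements the well-known `String.hashCode` recurrence (h = 31*h + c with 32-bit wraparound) and assumes the names contain only Basic Multilingual Plane characters. As a further assumption, the sketch uses Python's non-negative modulo, so a negative hash still maps to one of the configured storage spaces; the example index names are hypothetical.

```python
def java_string_hash(s):
    """Signed 32-bit hash matching Java's String.hashCode for BMP text."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, as int overflow does
    return h - 0x100000000 if h >= 0x80000000 else h

def second_policy(index_name, topics):
    """Pick a preset storage space by hash(index name) modulo topic count."""
    # Python's % is non-negative for a positive modulus (assumption: this
    # behaves like a floor modulo rather than Java's signed remainder).
    return topics[java_string_hash(index_name) % len(topics)]

topics = ["topic1", "topic2", "topic3"]
assignment = {
    name: second_policy(name, topics)
    for name in ("login_count", "song_play_count", "daily_male_login_count")
}
```

Since the modulus equals the number of configured storage spaces, changing that number reassigns existing indexes, which is worth keeping in mind when treating it as an adjustable parameter.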

Case three:

When the index storage strategy associated with the derived real-time index is determined to be the third storage strategy, acquiring the partition flow table corresponding to the index storage strategy specifically means acquiring the third partition flow table corresponding to the third storage strategy.

The third partition flow table at least comprises a third partition field and a third conditional expression corresponding to the third partition field, wherein the third conditional expression is determined based on a first modulus value obtained by carrying out hash processing and modular operation on the atomic real-time index identifier, a second modulus value obtained by carrying out hash processing and modular operation on the derived real-time data index identifier, a preset storage space and a preset flow value of the preset storage space, and the preset storage space is at least two partitions in the target storage sub-library.

Illustratively, when the third partition flow table is built, the third conditional expression in the SQL language is when hash(indexName) % 3 = 0 then dynamic(topic1, '10k') when hash(indexName) % 3 = 1 then dynamic(topic2, '10k') else dynamic(topic3, '10k') end. Here, the flow size pre-allocated to each preset storage space is 10 kilobytes (KB), indexName represents the real-time index name of the index, and topic1, topic2 and topic3 represent the preset storage spaces.

The second storage strategy can only distribute the real-time indexes evenly by count across the preset storage spaces Topic. In actual use, however, the flow of the real-time indexes may be unbalanced. For example, among 100 real-time indexes, one may have a flow of 100 records/second while each of the other 99 has a flow of 1 record/second. If the real-time indexes are distributed evenly across the preset storage spaces Topic in the real-time index library in this situation, data skew occurs, and a real-time index may even fail to be stored in its preset storage space Topic.

Therefore, on the basis of the second storage strategy, the present application monitors the flow of each preset storage space Topic. When the flow of any preset storage space Topic exceeds its pre-allocated flow, the flow value of that Topic is expanded automatically, which addresses the problems of skewed storage and failed storage of real-time index data.

In some optional embodiments, when the capacity expansion processing is performed on the flow of the preset storage space Topic exceeding the preset flow size, the capacity expansion processing may be performed according to a preset capacity expansion rule. In the present application, the preset capacity expansion rule may be any strategy or algorithm capable of realizing flow expansion, which is not limited in this application. The preset capacity expansion rule may be, for example, performing an equal capacity expansion according to a preset flow size, or may be performing a preset multiple capacity expansion on the preset flow size. Wherein, the preset multiple can be selected as 2, 3 or 4, etc., and the application does not limit the preset multiple.

That is, the third conditional expression may be understood as that when the condition satisfies the condition that the modulus value is equal to the first value, the return result is the first preset storage space and the preset flow value of the first preset storage space; when the condition meets the condition that the modulus value is equal to the second numerical value, returning the result to the second preset storage space and the preset flow value of the second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and the preset flow value of the third preset storage space and ending.
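The automatic capacity expansion of the third storage strategy can be sketched as follows, assuming the "preset multiple" rule with a multiple of 2. The `DynamicTopic` class, its fields and the unit of flow are hypothetical; the application leaves the concrete expansion rule open.

```python
class DynamicTopic:
    """A preset storage space with a pre-allocated flow value that can grow."""

    def __init__(self, name, quota):
        self.name = name
        self.quota = quota  # pre-allocated flow for the observation window
        self.flow = 0       # flow observed so far in the window

    def record(self, size, multiple=2):
        """Add observed flow and expand the quota while it is exceeded."""
        self.flow += size
        while self.flow > self.quota:
            self.quota *= multiple  # preset-multiple capacity expansion
        return self.quota

hot = DynamicTopic("topic1", quota=10)
hot.record(8)   # within the pre-allocated flow, no expansion
hot.record(5)   # total flow 13 exceeds 10, quota doubles to 20
```

An equal-increment rule would replace the multiplication with an addition of the original preset flow size; either variant keeps the hash-based placement unchanged and only adjusts the capacity of the overloaded storage space.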

In some alternative embodiments, it is contemplated that the partition fields in the partition flow table may be updated, such as by adding fields. Therefore, in order to meet the requirement of storing the real-time index corresponding to the newly added field, the second storage policy or the third storage policy is preferably adopted in the application, so that the real-time index of different fields is stored in a preset storage space Topic, and when any real-time index is subsequently queried, the target storage space Topic for storing the real-time index can be determined based on the metadata of the real-time index. Furthermore, the real-time index to be queried can be read from the target storage space Topic, so that the data reading amount during index query can be reduced, and the query speed of the real-time index is improved.

S104-2, determining a conditional expression corresponding to the partition field in the partition flow table.

After the partition flow table is obtained, the conditional expressions corresponding to the partition fields in the partition flow table can be obtained from the SQL statement code created by the partition flow table.

The partition flow table is a first partition flow table, a second partition flow table, or a third partition flow table. Therefore, the conditional expression determined in the present embodiment is the first conditional expression corresponding to the first partition flow table, the second conditional expression corresponding to the second partition flow table, or the third conditional expression corresponding to the third partition flow table.

S104-3, pushing down the conditional expression to a programming language transcoding module of the real-time computing engine based on a pushing down mechanism of the real-time computing engine so as to convert the conditional expression into a storage rule code which can be identified by the real-time computing engine through the programming language transcoding module.

The real-time computing engine is specifically Flink.

The programming language transcoding module is specifically a component of the open source software Flink that implements SQL transcoding, for example the Flink Kafka sink; the present application imposes no limitation on this.

When storing the atomic real-time index and the derived real-time index into the target storage sub-library, the user can send a storage instruction to the electronic device by means of an SQL statement, or can trigger the real-time index storage operation in the visual interface so that an SQL storage instruction is sent to the electronic device. Considering that the electronic device cannot directly understand the SQL statement input by the user for storing the real-time indexes, the conditional expression obtained from the partition flow table needs to be pushed down to the programming language transcoding module of the real-time computing engine Flink, so that the module converts the conditional expression into a storage rule code that Flink can recognize.

For example, assume that the real-time indexes to be stored are (1, "a", play) and (2, "b", view), and that the user inputs the SQL statement INSERT INTO A SELECT id, name, action FROM data to indicate that (1, "a", play) and (2, "b", view) are to be stored into the partition flow table A. When the electronic device determines, through SQL syntax analysis, that the SQL statement input by the user writes into the partition flow table A, the electronic device obtains the directory information of the partition flow table A from the index management module. It then determines, based on this directory information, that the partition flow table A is a flow table, and determines, based on the flow table information registered by the user, that the partition flow table A has a partition field and that the partition field is action (the behavior field). Further, the conditional expression corresponding to the partition field is acquired: case when action = play then Topic1 when action = view then Topic2 else Topic3 end. Then, this conditional expression is pushed down to the Kafka sink of the real-time computing engine Flink based on the push-down mechanism of Flink, so that the Kafka sink converts the conditional expression into a storage rule code that the real-time computing engine Flink can recognize.
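The routing that such a conditional expression encodes can be illustrated with a minimal Python sketch. It is not the Flink Kafka sink implementation; the names parse_case_expression and route are hypothetical, and the expression is assumed to have the simple form "case when field = value then target ... else target end" used in the example above.

```python
import re


def parse_case_expression(expr):
    """Parse 'case when f = v then T ... else T end' into (field, {value: target}, default)."""
    branches = re.findall(
        r"when\s+(\w+)\s*=\s*'?(\w+)'?\s+then\s+(\w+)", expr, flags=re.IGNORECASE
    )
    default = re.search(r"else\s+(\w+)\s+end", expr, flags=re.IGNORECASE)
    if not branches or default is None:
        raise ValueError("unsupported conditional expression: %r" % expr)
    fields = {field for field, _, _ in branches}
    if len(fields) != 1:
        raise ValueError("all branches must test the same partition field")
    mapping = {value: target for _, value, target in branches}
    return fields.pop(), mapping, default.group(1)


def route(record, rule):
    """Return the storage space (Topic) that the rule selects for one record."""
    field, mapping, default = rule
    return mapping.get(record[field], default)


# Conditional expression of partition flow table A from the example (assumed syntax).
EXPR = "case when action = play then Topic1 when action = view then Topic2 else Topic3 end"
RULE = parse_case_expression(EXPR)
```

Under these assumptions, route({"id": 1, "name": "a", "action": "play"}, RULE) selects Topic1, and a record whose action matches no branch falls through to Topic3.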

S104-4, based on the storage rule codes, storing the atomic real-time index and the derived real-time index into a target storage sub-library.

After the storage rule codes of the real-time indexes are obtained, the method can analyze and determine the partition fields corresponding to the atomic real-time indexes to be stored and the derived real-time indexes. Further, the atomic real-time index is stored in the target storage sub-library based on the partition field corresponding to the atomic real-time index, and the derived real-time index is stored in the target storage sub-library based on the partition field corresponding to the derived real-time index.

In the present application, when the atomic real-time index and the derived real-time index are stored in the target storage sub-library, the storage manner supported by the software development kit (Software Development Kit, SDK) of the real-time index library may be used for storage, and the specific storage process is shown in fig. 5. That is, the method and the device can achieve that the real-time indexes are scattered and stored in different preset storage spaces Topic of the target storage sub-library in the real-time index library.

For example, assuming that the real-time index library is Kafka and the real-time index to be stored is (id=1, name='a', action=play), the partition field value action=play in the real-time index may be determined, and then the preset storage space Topic1 in the target storage sub-library corresponding to play is determined based on the storage rule code. Then, the above-mentioned real-time index can be stored into the preset storage space Topic1 by using the capability of Kafka's SDK to write different messages directly to different Topics, for example: ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello, kafka!");

That is, the method for storing the atomic real-time index and the derived real-time index in the target storage sub-library specifically includes: determining a first target storage space corresponding to the atomic real-time index and a second target storage space corresponding to the derivative real-time index based on the storage rule code; further, the atomic real-time index is stored in a first target storage space, and the derived real-time index is stored in a corresponding second target storage space.
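As a rough illustration of this step, the following Python sketch stores one atomic and one derived real-time index into separate storage spaces. The class InMemorySubLibrary is a hypothetical in-memory stand-in for the target storage sub-library (it is not the Kafka SDK), and example_rule, the "kind" field, and the index names are assumptions made only for the example.

```python
from collections import defaultdict


class InMemorySubLibrary:
    """Hypothetical stand-in for a target storage sub-library; each key plays the role of a Topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, value):
        self.topics[topic].append(value)


def store_indexes(sub_library, rule, atomic_index, derived_index):
    """Resolve the first and second target storage spaces, then write both indexes."""
    first_space = rule(atomic_index)
    second_space = rule(derived_index)
    sub_library.send(first_space, atomic_index)
    sub_library.send(second_space, derived_index)
    return first_space, second_space


def example_rule(index):
    # Assumed storage rule: atomic -> Topic1, derived -> Topic2, anything else -> Topic3.
    return {"atomic": "Topic1", "derived": "Topic2"}.get(index["kind"], "Topic3")


library = InMemorySubLibrary()
spaces = store_indexes(
    library,
    example_rule,
    {"kind": "atomic", "name": "play_count"},
    {"kind": "derived", "name": "play_count_7d"},
)
```

Here spaces holds the pair of storage spaces that were chosen, and library.topics shows which index landed in which space.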

Considering that the storage rule code may be a rule code corresponding to the third storage policy, before the atomic real-time index and the derived real-time index are stored in the target storage sub-library, the method optionally further includes: determining whether the storage rule code corresponds to the third storage policy, and, in response to the storage rule code corresponding to the third storage policy, detecting the current flow value of each preset storage space; and, in response to the current flow value of any preset storage space being larger than its preset flow value, expanding the preset flow value of that preset storage space. In this way, each real-time index can be correctly stored in the corresponding preset storage space, and the problem of data skew is avoided.
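The flow check for the third storage policy might be sketched as follows. The function name expand_overloaded_spaces and the expansion rule (multiplying the preset value by growth_factor, and never ending below the observed flow) are assumptions; the description only states that the preset flow value is expanded.

```python
def expand_overloaded_spaces(current_flow, preset_flow, growth_factor=2):
    """Return updated preset flow values; only spaces whose current flow exceeds the preset are expanded."""
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    updated = dict(preset_flow)
    for space, flow in current_flow.items():
        if flow > preset_flow[space]:
            # Assumed policy: grow by a fixed factor, but at least up to the observed flow.
            updated[space] = max(flow, preset_flow[space] * growth_factor)
    return updated


NEW_PRESETS = expand_overloaded_spaces(
    {"Topic1": 1500, "Topic2": 300},
    {"Topic1": 1000, "Topic2": 1000},
)
```

In this example only Topic1 exceeds its preset flow value, so only its preset value changes.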

In some optional embodiments, after the present application stores the atomic real-time index and the derived real-time index to the target storage sub-library, optionally storing the partition flow table and the mapping relationship between the atomic real-time index and the first target storage space as metadata of the atomic real-time index; and storing the partition flow table and the mapping relation between the derived real-time index and the second target storage space as metadata of the derived real-time index. In the present application, when metadata of an atomic real-time index and metadata of a derived real-time index are stored, specifically, metadata of an atomic real-time index and metadata of a derived real-time index are stored in an index management module. Therefore, when the subsequent user inquires the real-time index, the metadata of the real-time index to be inquired can be obtained from the index management module to determine the specific storage position of the real-time index to be inquired in the real-time index library based on the obtained metadata, and the real-time index to be inquired is quickly obtained from the real-time index library based on the specific storage position, so that the inquiring efficiency of the real-time index is improved.
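One possible shape for this metadata is sketched below in Python. IndexMetadata and IndexManagementModule are hypothetical names that model only the two pieces of information named above (the partition flow table and the index-to-storage-space mapping); the real index management module is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMetadata:
    index_id: str               # identifier of the atomic or derived real-time index
    partition_flow_table: str   # partition flow table the index was written through
    storage_space: str          # target storage space (Topic) holding the index


class IndexManagementModule:
    """Hypothetical registry that keeps one metadata entry per index identifier."""

    def __init__(self):
        self._metadata = {}

    def register(self, metadata):
        self._metadata[metadata.index_id] = metadata

    def lookup(self, index_id):
        return self._metadata[index_id]


module = IndexManagementModule()
module.register(IndexMetadata("play_count", "A", "Topic1"))      # atomic index (assumed name)
module.register(IndexMetadata("play_count_7d", "A", "Topic2"))   # derived index (assumed name)
```

A later query can then call module.lookup with the index identifier to find the storage space without scanning the real-time index library.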

Based on the foregoing storage method of the indexes shown in fig. 2 to 5, the present application further provides a query method of the indexes, see fig. 6.

As shown in fig. 6, the query method of the index may include the following steps:

S301, in response to a real-time index query operation sent by a user, acquiring a real-time index identifier to be queried.

Optionally, the user may input or select the real-time index identifier to be queried in a visual query interface provided by the electronic device, for example, as shown in fig. 7. At least one selectable real-time index identifier and a search box are shown in fig. 7. Alternatively, a user who is proficient in SQL may perform the real-time index query operation by using SQL, and so on; the present application does not limit the real-time index query operation sent by the user.

When a real-time index query operation sent by a user is received, the real-time index query operation is analyzed to determine the real-time index identifier to be queried, that is, the identifier of the real-time index that the user needs to query.

It can be understood that the real-time index identifier to be queried is identity information that uniquely determines the real-time index to be queried, such as the real-time index name, and the present application does not limit this.

S302, determining a target storage position of the real-time index to be queried in a real-time index library based on the real-time index identifier to be queried.

The target storage position is specifically a preset storage space Topic position of any one sub-library in the real-time index library.

Since the index management module in the application registers and manages the metadata of all the indexes, the metadata at least comprises the storage positions of the indexes, the partition flow table identifiers corresponding to the indexes and other information describing the attribute of the indexes. Therefore, the metadata corresponding to the real-time index identifier to be queried can be obtained from the index management module based on the real-time index identifier to be queried. Further, based on the metadata, a target storage position of the real-time index to be queried in the real-time index library is determined.

In some alternative embodiments, determining the target storage location of the real-time index to be queried in the real-time index library based on the metadata may include the following:

Case one: the target partition flow table is determined based on the metadata. Further, the conditional expression corresponding to the partition field in the target partition flow table is transcoded, so as to convert the conditional expression into a storage rule code that the real-time computing engine can recognize. Then, based on the storage rule code, the target storage position of the real-time index to be queried in the real-time index library is determined.

The method for converting the conditional expression into the identifiable storage rule code of the real-time computing engine is the same as the implementation method for converting the conditional expression into the identifiable storage rule code of the real-time computing engine in the storage method of the index, and is not repeated here.

Case two: the mapping relation between the real-time index to be queried and the storage space is determined based on the metadata. Then, the target storage position of the real-time index to be queried in the real-time index library is determined according to this mapping relation.

The storage space can be understood as any one of a plurality of preset storage spaces which are formed by splitting sub-libraries in the real-time index library.
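The two cases can be sketched together in Python as follows. The function resolve_target_location, the dictionary keys, and the choice to try the stored mapping (case two) before evaluating the partition rule (case one) are all assumptions made for illustration; the description presents the two cases as alternatives without fixing an order.

```python
def resolve_target_location(metadata, partition_rules):
    """Return the storage space (Topic) that holds the real-time index to be queried."""
    # Case two: the metadata already records the index-to-storage-space mapping.
    storage_space = metadata.get("storage_space")
    if storage_space is not None:
        return storage_space
    # Case one: evaluate the conditional expression of the target partition flow table.
    mapping, default = partition_rules[metadata["partition_flow_table"]]
    return mapping.get(metadata["partition_value"], default)


# Assumed, already-transcoded rule of partition flow table A: (value -> Topic, default Topic).
PARTITION_RULES = {"A": ({"play": "Topic1", "view": "Topic2"}, "Topic3")}
```

For example, metadata that names partition flow table A and the partition value view resolves to Topic2 through case one.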

S303, acquiring real-time indexes to be queried from a real-time index library based on the target storage position, and displaying the real-time indexes to be queried to a user.

Optionally, when determining the target storage location of the real-time index to be queried, the application can query the real-time index to be queried from the real-time index library based on the target storage location. And displaying the real-time index to be queried to the user through a visual interface so as to facilitate the user to use the real-time index to be queried.

In some alternative embodiments, it is considered that the real-time indexes are stored in the storage spaces in scattered form. Therefore, when acquiring the real-time index to be queried from the real-time index library based on the target storage position, the present application acquires a plurality of discrete target fields from the real-time index library based on the target storage position, and then obtains the real-time index to be queried based on the plurality of discrete target fields.

That is, when displaying the real-time index to be queried to the user, the method and the device need to restore the scattered real-time index data into a form before storage, so that the user can intuitively and clearly view the complete real-time index to be queried.
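A minimal sketch of this restoration step is given below. It assumes, purely for illustration, that each discrete target field is read back as a fragment carrying a record key, a field name, and a value; the function name restore_indexes and this fragment layout are hypothetical.

```python
from collections import defaultdict


def restore_indexes(fragments):
    """Group discrete fields by record key to rebuild each index in its pre-storage form."""
    restored = defaultdict(dict)
    for fragment in fragments:
        restored[fragment["key"]][fragment["field"]] = fragment["value"]
    return dict(restored)


FRAGMENTS = [
    {"key": "1", "field": "name", "value": "a"},
    {"key": "1", "field": "action", "value": "play"},
    {"key": "2", "field": "name", "value": "b"},
]
RESTORED = restore_indexes(FRAGMENTS)
```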

Considering that the indexes include real-time indexes and offline indexes, and that, with the growing demand for stream-batch integration, users often want real-time indexes and offline indexes to be queried in the same way, that is, want the queries of both to be completed through one set of SQL and logic. Therefore, the present application can not only acquire the corresponding real-time index to be queried from the real-time index library based on the determined target storage position, but also meet the user's need to query offline indexes. Specifically, when a user queries any index, which may be a real-time index or an offline index, the query operation may include two parts.

The first part is the query index data generation part. Based on the index identifier queried by the user, this part first acquires the metadata corresponding to the index identifier to be queried from the index management module, so as to determine from the metadata the storage space in which the index to be queried is stored. For example, when the index to be queried is a real-time index, it is determined on which Topic the real-time index is stored; when the index to be queried is an offline index, it is determined in which Hive table the offline index is stored. The index to be queried is then synchronized, through a synchronization task, into a Key/value database with better query performance. For example, assuming that the synchronization task is a real-time task, a Flink task may be generated to read the messages of the preset storage space (for example, Kafka Topic messages) in the corresponding real-time index library. Then, the key in each message is taken as the primary key in the Key/value database, the corresponding dimension information is flattened into a plurality of fields, and the index to be queried is acquired from the index map through the index identifier (such as the name of the index to be queried). If one record stores a plurality of index data at the same time, the one row of records is converted into a plurality of rows according to the identifiers of the index data, and the flattened index to be queried is written into the Key/value database.
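The flattening described for the first part can be sketched in Python as follows. The message layout (key, dimensionMap, indexMap) follows the field names that appear in the SQL of this embodiment, while the function name flatten_message, the output column names index_name and index_value, and the sample values are assumptions.

```python
def flatten_message(message):
    """Turn one message into one row per index, with the dimension map flattened into columns."""
    rows = []
    for index_name, index_value in message["indexMap"].items():
        row = {"key": message["key"]}         # message key becomes the primary key
        row.update(message["dimensionMap"])   # each dimension becomes its own field
        row["index_name"] = index_name        # one output row per index identifier
        row["index_value"] = index_value
        rows.append(row)
    return rows


MESSAGE = {
    "key": "u1",
    "dimensionMap": {"userid": "u1", "gender": "f"},
    "indexMap": {"play_count": 3, "view_count": 5},
}
ROWS = flatten_message(MESSAGE)
```

A record that stores two indexes therefore yields two rows that share the same primary key and dimension fields.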

The second part is a query service providing part, and the part specifically can provide real-time indexes and/or offline indexes to be queried for users through the Key/value database after generating index data to be queried for users in the Key/value database.

The Key/value database may be a non-relational distributed database such as HBase or Redis, which is not limited in this application.

It should be noted that the Key/value database in the present application includes an offline index library and a real-time index library. That is, the Key/value database is one large database, the offline index database and the real-time index database are two small databases in the large database, and the index data in the offline index database and the real-time index database are different in dimension.

In this application, when the synchronization task is a real-time task, the SQL corresponding to the Flink task may be generated as follows:

INSERT INTO Hbase1
SELECT
    -- dimension field
    dimensionMap['userid'] AS userid,
    -- index field
    indexMap['daily life'] AS ua
FROM A
WHERE action IN ('play')  -- partition

Here, SQL syntax optimization is performed: using the predicate push-down function in Flink, the where condition is pushed down to the Kafka source. The Kafka source is modified to parse the pushed-down where condition, and the meta information of the partition flow table A is obtained from the metadata corresponding to the index identifier to be queried. If it is determined that the partition field of the partition flow table A is action and that the where condition contains a filter on action, action=play is located in the where condition to determine which Topic play maps to; by parsing the case when expression, it is found that play in the partition flow table A maps to Topic1. Then, the index management module is called to acquire, based on the metadata corresponding to the index identifier to be queried, the actual Kafka address of the index to be queried, and the actual Kafka query operation is performed. If it is determined that the pushed-down where condition contains no partition information but table A is still a partition flow table, all partitions are read to ensure that no index data queried by the user is missed; in this case there is no performance optimization. In all other cases, the amount of data read is smaller than that of reading all index data.
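The pruning decision described above can be sketched in Python as follows. This is not the modified Kafka source of the embodiment; the function topics_to_read is hypothetical, and the where condition is assumed to have already been parsed into a dictionary from field name to the list of values it filters on.

```python
def topics_to_read(where_filters, partition_field, mapping, default_topic):
    """Return the Topics a query must read, given the pushed-down where condition."""
    all_topics = sorted(set(mapping.values()) | {default_topic})
    values = where_filters.get(partition_field)
    if values is None:
        # No filter on the partition field: read every Topic so that no data is missed.
        return all_topics
    # Filter present: read only the Topics that the filtered values map to.
    return sorted({mapping.get(value, default_topic) for value in values})


# Assumed mapping of partition flow table A, taken from its case when expression.
MAPPING = {"play": "Topic1", "view": "Topic2"}
```

With the filter action in ('play'), only Topic1 is read; with no filter on action, all three Topics are read and there is no saving.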

In some alternative embodiments, it is contemplated that some real-time indexes may rarely be queried by users after being stored. For this situation, after a real-time index is stored in the real-time index library, the present application detects whether the real-time index has not been queried by any user for longer than a preset duration, or whether the number of times it has been queried within the preset duration is smaller than a preset number of times. If either is the case for any real-time index, offline reminding information for that real-time index is output to the user, to remind the user to stop the management operations of the real-time index (that is, to take it offline), thereby reducing the resource cost of data storage. Considering that the user may not receive the offline reminding information in time, while the offline reminding information is output to the user, the data indicating that the index is unused and/or has been used fewer than the preset number of times is also stored in the index management module as metadata of the real-time index, so that the user can take real-time indexes with a low usage rate offline based on the index metadata managed in the index management module, thereby improving the utilization value of the indexes.

The detection can be periodic detection, and the detection period can be flexibly set according to actual detection requirements. For example, if the real-time performance of detection is required to be stronger, the detection period can be set smaller, for example, 5 hours or one day. If the real-time performance of the detection is required to be weaker, the detection period can be set to be larger, such as one week or two weeks, and the application is not limited in any way.

The preset time length can be flexibly set according to the actual requirements of index query, and the application is not limited in any way. For example, the preset time period is 1 day or 7 days, etc.
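One periodic detection pass might look like the following Python sketch. The function find_low_usage_indexes and the query log layout (index identifier mapped to a list of query timestamps) are assumptions; how query usage is actually recorded is not specified in the description.

```python
from datetime import datetime, timedelta


def find_low_usage_indexes(query_log, now, preset_duration, preset_count):
    """Return identifiers of indexes that should trigger an offline reminder."""
    window_start = now - preset_duration
    flagged = []
    for index_id, query_times in query_log.items():
        recent = [t for t in query_times if window_start <= t <= now]
        # Flag when unused for the whole preset duration, or used fewer than preset_count times.
        if not recent or len(recent) < preset_count:
            flagged.append(index_id)
    return sorted(flagged)


NOW = datetime(2024, 1, 8)
QUERY_LOG = {
    "a": [datetime(2024, 1, 6), datetime(2024, 1, 7)],  # used twice within the window
    "b": [datetime(2023, 12, 1)],                        # not used within the window
    "c": [datetime(2024, 1, 5)],                         # used only once within the window
}
FLAGGED = find_low_usage_indexes(QUERY_LOG, NOW, timedelta(days=7), 2)
```

With a preset duration of 7 days and a preset number of 2, indexes b and c are flagged, and their usage data could then be written to the index management module as metadata.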

According to the index query method, the real-time index identifier to be queried is obtained through the real-time index query operation sent by the user, the target storage position of the real-time index to be queried in the real-time index library is determined according to the real-time index identifier to be queried, the real-time index to be queried is obtained from the real-time index library based on the target storage position, and the real-time index to be queried is displayed to the user. Therefore, the query time of the real-time index can be shortened, and the query efficiency of the real-time index can be improved.

A storage device for an index according to an embodiment of the present application will be described below with reference to fig. 8. Fig. 8 is a schematic block diagram of an index storage device according to an embodiment of the present application. As shown in fig. 8, the index storage device 400 includes: the index generating module 410, the first determining module 420, the second determining module 430, and the index storage module 440.

Wherein, the index generating module 410 is configured to generate a derived real-time index based on the atomic real-time index;

a first determining module 420, configured to determine a service line identifier and an index storage policy associated with the derived real-time index;

a second determining module 430, configured to determine a target storage sub-library from the real-time index library based on the service line identifier;

and an index storage module 440, configured to store the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage policy.

In an optional implementation manner of this embodiment of the present application, the index storage module 440 includes:

In an optional implementation manner of this embodiment of the present application, the target storage sub-library includes at least one storage space, and the storage unit is specifically configured to:

In an optional implementation manner of the embodiment of the present application, the storage unit is further configured to:

In an optional implementation manner of this embodiment, when the index storage policy is the first storage policy, the obtaining unit is specifically configured to:

In an optional implementation manner of the embodiment of the present application, the first conditional expression is: when the condition meets the atomic real-time index identifier, returning a result to be a first preset storage space; when the condition meets the derived real-time index identifier, returning a result to be a second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and ending.

In an optional implementation manner of this embodiment, when the index storage policy is the second storage policy, the obtaining unit is specifically configured to:

In an optional implementation manner of the embodiment of the present application, the second conditional expression is: when the condition meets the condition that the modulus value is equal to the first numerical value, returning a result to be a first preset storage space; when the condition meets the condition that the modulus value is equal to the second numerical value, returning a result to be a second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and ending.

In an optional implementation manner of this embodiment, when the index storage policy is the third storage policy, the obtaining unit is specifically configured to:

In an optional implementation manner of the embodiment of the present application, the third conditional expression is: when the condition meets the condition that the modulus value is equal to the first numerical value, returning a result to be a first preset storage space and a preset flow value of the first preset storage space; when the condition meets the condition that the modulus value is equal to a second numerical value, returning a result to a second preset storage space and a preset flow value of the second preset storage space; and when all the conditions are not met, returning the result to the third preset storage space and the preset flow value of the third preset storage space and ending.
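The second and third conditional expressions can be sketched in Python as follows. The quantity on which the modulus is taken is not stated in this passage, so key_modulus assumes a CRC32 hash of a partition key taken modulo 3; the function names, the Topic names, the concrete first and second values (0 and 1), and the flow values are likewise illustrative assumptions.

```python
import zlib


def key_modulus(partition_key, divisor=3):
    """Assumed way to obtain the modulus value: a stable hash of the key modulo the divisor."""
    return zlib.crc32(partition_key.encode("utf-8")) % divisor


def second_expression(modulus, first_value=0, second_value=1):
    """Second conditional expression: choose a preset storage space from the modulus value."""
    if modulus == first_value:
        return "Topic1"
    if modulus == second_value:
        return "Topic2"
    return "Topic3"  # no condition met


def third_expression(modulus, preset_flow, first_value=0, second_value=1):
    """Third conditional expression: also return the preset flow value of the chosen space."""
    space = second_expression(modulus, first_value, second_value)
    return space, preset_flow[space]


PRESET_FLOW = {"Topic1": 1000, "Topic2": 2000, "Topic3": 500}
```

For instance, a modulus value of 1 selects Topic2 under the second expression, and Topic2 together with its preset flow value under the third.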

In an optional implementation manner of this embodiment of the present application, the index storage module 440 further includes:

In an optional implementation manner of this embodiment of the present application, the index storage module 440 is specifically configured to: acquiring a data storage structure corresponding to the target storage sub-library; and storing the atomic real-time index and the derived real-time index into the target storage sub-library according to the data storage structure.

In an optional implementation manner of this embodiment of the present application, the index generating module 410 is specifically configured to: acquire processing parameters corresponding to the atomic real-time index; and generate the derived real-time index based on the atomic real-time index and the processing parameters.

According to the storage device for the index, the derived real-time index is generated according to the atomic real-time index, then the service line identification and the index storage strategy associated with the derived real-time index are determined, then the target storage sub-library is determined from the real-time index library based on the service line identification, and then the atomic real-time index and the derived real-time index are stored into the target storage sub-library based on the index storage strategy. According to the method and the system, the atomic real-time index and the derived real-time index are stored into the real-time index library based on the service line identification and the index storage strategy associated with the derived index, so that the query time of the real-time index is shortened, and the query efficiency of the real-time index is improved.

It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 400 shown in fig. 8 may perform the method embodiment corresponding to fig. 2, and the foregoing and other operations and/or functions of each module in the apparatus 400 are respectively for implementing the corresponding flow in each method in fig. 2, and are not further described herein for brevity.

The apparatus 400 of the embodiments of the present application is described above in terms of functional modules in connection with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in the memory, and the processor reads the information in the memory, and combines the hardware to complete the steps in the method embodiment.

Referring to fig. 9, a description is given below of an index query device according to an embodiment of the present application. Fig. 9 is a schematic block diagram of an index query device provided in an embodiment of the present application. As shown in fig. 9, the query device 500 for the index includes: an identification acquisition module 510, a third determination module 520, and an index acquisition module 530.

The identifier obtaining module 510 is configured to obtain a real-time indicator identifier to be queried in response to a real-time indicator query operation sent by a user;

a third determining module 520, configured to determine, based on the real-time index identifier to be queried, a target storage location of the real-time index to be queried in a real-time index library;

the index obtaining module 530 is configured to obtain, based on the target storage location, a real-time index to be queried from the real-time index library, and display the real-time index to be queried to the user.

In an optional implementation manner of the embodiment of the present application, the third determining module 520 includes:

In an optional implementation manner of this embodiment of the present application, the third determining unit is specifically configured to: determine a target partition flow table based on the metadata; transcode the conditional expression corresponding to the partition field in the target partition flow table, so as to convert the conditional expression into a storage rule code that the real-time computing engine can recognize; and determine the target storage position of the real-time index to be queried in the real-time index library based on the storage rule code.

In an optional implementation manner of this embodiment of the present application, the third determining unit is specifically configured to: determine the mapping relation between the real-time index to be queried and the storage space based on the metadata; and determine the target storage position of the real-time index to be queried in the real-time index library according to the mapping relation between the real-time index to be queried and the storage space.

In an optional implementation manner of this embodiment of the present application, the index obtaining module 530 is specifically configured to: acquiring a plurality of discrete target fields from the real-time index library based on the target storage location; and obtaining the real-time index to be queried based on the plurality of target discrete fields.

According to the query device for the indexes, the real-time index identification to be queried is obtained through the query operation of the real-time indexes sent by the user, and then the target storage position of the real-time indexes to be queried in the real-time index library is determined according to the real-time index identification to be queried, then the real-time indexes to be queried are obtained from the real-time index library based on the target storage position, and the real-time indexes to be queried are displayed to the user. Therefore, the query time of the real-time index can be shortened, and the query efficiency of the real-time index can be improved.

It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 500 shown in fig. 9 may perform the method embodiment corresponding to fig. 6, and the foregoing and other operations and/or functions of each module in the apparatus 500 are respectively for implementing the corresponding flow in each method in fig. 6, and are not further described herein for brevity.

The apparatus 500 of the embodiments of the present application is described above in terms of functional modules in connection with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in the memory, and the processor reads the information in the memory, and combines the hardware to complete the steps in the method embodiment.

Fig. 10 is a schematic block diagram of an electronic device provided in an embodiment of the present application.

As shown in fig. 10, the electronic device 600 may include:

a memory 610 and a processor 620, the memory 610 being adapted to store a computer program and to transfer the program code to the processor 620. In other words, the processor 620 may call and run a computer program from the memory 610 to implement the method of storing the index or the method of querying the index in the embodiment of the present application.

For example, the processor 620 may be configured to perform the method embodiments described above in accordance with instructions in the computer program.

In some embodiments of the present application, the processor 620 may include, but is not limited to:

a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.

In some embodiments of the present application, the memory 610 includes, but is not limited to:

volatile memory and/or nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).

In some embodiments of the present application, the computer program may be partitioned into one or more modules that are stored in the memory 610 and executed by the processor 620 to complete the method of storing the index or the method of querying the index provided herein. The one or more modules may be a series of computer program instruction segments capable of performing specified functions, and these instruction segments describe the execution process of the computer program in the electronic device.
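As an illustration only, the partitioning of the computer program into modules can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the names `RealTimeIndexLibrary`, `StorageModule`, and `QueryModule`, the in-memory dictionaries, and the `derive` callback are all hypothetical, and the index storage strategy is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RealTimeIndexLibrary:
    """Hypothetical in-memory stand-in for the real-time index library.

    Maps a service line identifier to its storage sub-library.
    """
    sub_libraries: Dict[str, Dict[str, float]] = field(default_factory=dict)


class StorageModule:
    """Hypothetical program module covering the index storage steps."""

    def __init__(self, library: RealTimeIndexLibrary) -> None:
        self.library = library

    def store(
        self,
        service_line_id: str,
        atomic: Dict[str, float],
        derive: Callable[[Dict[str, float]], Dict[str, float]],
    ) -> Dict[str, float]:
        # Generate the derived real-time indexes from the atomic ones.
        derived = derive(atomic)
        # Select (or create) the target sub-library for this service line.
        target = self.library.sub_libraries.setdefault(service_line_id, {})
        # Store both atomic and derived indexes in the target sub-library.
        target.update(atomic)
        target.update(derived)
        return derived


class QueryModule:
    """Hypothetical program module covering the index query steps."""

    def __init__(self, library: RealTimeIndexLibrary) -> None:
        self.library = library

    def query(self, service_line_id: str, index_id: str) -> Optional[float]:
        # Look up the sub-library first, then the index inside it.
        sub_library = self.library.sub_libraries.get(service_line_id, {})
        return sub_library.get(index_id)


# Usage: both modules share one library, as they would share the memory 610.
library = RealTimeIndexLibrary()
storage = StorageModule(library)
query = QueryModule(library)
storage.store(
    "music",
    {"play_count": 10.0, "user_count": 4.0},
    lambda a: {"plays_per_user": a["play_count"] / a["user_count"]},
)
print(query.query("music", "plays_per_user"))
```

In this sketch each module is a separate class so that the storage and query functions could be loaded and executed independently, which is one possible reading of "one or more modules"; a single combined module would serve equally well.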

As shown in fig. 10, the electronic device may further include:

a transceiver 630, the transceiver 630 being connectable to the processor 620 or the memory 610.

The processor 620 may control the transceiver 630 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. Transceiver 630 may include a transmitter and a receiver. Transceiver 630 may further include antennas, the number of which may be one or more.

It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.

The embodiments of the present application also provide a computer-readable storage medium for storing a computer program, where the computer program causes a computer to execute the method of storing the index or the method of querying the index according to the above method embodiments, as shown in fig. 11. In fig. 11, the computer-readable storage medium is denoted 700 and the computer program is denoted 710.

The embodiment of the application also provides a computer program product containing program instructions, wherein the program instructions enable the electronic device to execute the method for storing the index or the method for querying the index according to the embodiment of the method when the program instructions are run on the electronic device.

When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.

The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The foregoing describes merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application is intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of storing an index, comprising:

2. The method of claim 1, wherein storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the index storage policy comprises:

obtaining a partition flow table corresponding to the index storage strategy;

3. The method of claim 2, wherein the target storage sub-library comprises at least one storage space, wherein the storing the atomic real-time index and the derived real-time index into the target storage sub-library based on the storage rule code comprises:

4. A method according to claim 3, further comprising:

5. An index query method, comprising:

6. The method of claim 5, wherein determining a target storage location of the real-time index to be queried in a real-time index library based on the real-time index identification to be queried comprises:

7. A storage device for an index, comprising:

8. An index query device, comprising:

9. An electronic device, comprising:

a processor and a memory for storing a computer program, the processor being adapted to call and run the computer program stored in the memory to perform the method of storing an index as claimed in any one of claims 1 to 4 or the method of querying an index as claimed in any one of claims 5 to 6.

10. A computer-readable storage medium storing a computer program for causing a computer to execute the method of storing an index according to any one of claims 1 to 4 or the method of querying an index according to any one of claims 5 to 6.