CN112818017A - Event data processing method and device - Google Patents


Info

Publication number
CN112818017A
Authority
CN
China
Prior art keywords
event
data
query
information
rule information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110093521.6A
Other languages
Chinese (zh)
Inventor
杨世谨
高键城
丘玉秀
刘亚东
黄家健
赵荣生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd
Priority to CN202110093521.6A
Publication of CN112818017A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24564: Applying rules; Deductive queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/2433: Query languages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose an event data processing method and device. The method comprises: acquiring a core model, and determining preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule; parsing the preprocessing rule information and the query rule information, generating relational metadata information based on the parsed preprocessing rule information and the parsed query rule information, and creating a wide table of a distributed columnar database based on the relational metadata information; preprocessing the dotting event data according to the relational metadata information, and storing the preprocessed data into the corresponding wide table of the distributed columnar database; and acquiring an event and query rule information selected at a user side, generating an event query language based on the relational metadata information matching the query rule information of the event, and retrieving the corresponding preprocessed data from the wide table of the distributed columnar database according to the event query language.

Description

Event data processing method and device
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to an event data processing method and device.
Background
With the continuous development of big data, its value has been widely verified; big data has become an important strategic resource for enterprises and society, and a new focus of competition. At present, the construction of big data infrastructure is basically mature, and the related underlying platform components are also maturing, such as storage engines, computing engines, OLAP (Online Analytical Processing) engines, scheduling engines, reporting systems, and the like. The key development direction of big data is gradually shifting from infrastructure to data system construction, and the focus has become how to make data circulate quickly, so as to generate business value and improve the core competitiveness of enterprises, such as the ability to respond quickly to change.
The traditional data acquisition mode is mainly as follows: product or operations staff submit a data requirement to an analyst; the analyst summarizes the data requirements every week and aligns the data indexes, dimensions and priority schedule with the product or operations staff; the analyst develops an SQL script according to the schedule and delivers it to the product or operations staff; and the product or operations staff execute the SQL script on an Ad Hoc query platform (an Ad Hoc query is a query customized by users according to their own requirements) to acquire the related data. With the continuous expansion of business scale, team scale and data scale, data demand gradually increases and the contradiction gradually emerges. The time analysts can spend on data requirements falls far short of the data demand of the business, and it usually takes at least one week from raising a data requirement to obtaining the related data. This brings a number of problems, for example: data acquisition efficiency is extremely low, which seriously affects data usage efficiency and value output; as data demand increases, the impact of communication cost becomes increasingly apparent; data indexes, dimensions and formats are not uniformly managed, which increases iteration cost; and Ad Hoc queries are slow, and repeated queries seriously waste computing resources.
Disclosure of Invention
The embodiment of the application provides an event data processing method and device, which can improve data acquisition efficiency, reduce data acquisition cost and realize rapid data value conversion.
In a first aspect, an embodiment of the present application provides an event data processing method, including:
acquiring a core model, and determining preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule;
parsing the preprocessing rule information and the query rule information, generating relational metadata information based on the parsed preprocessing rule information and the parsed query rule information, creating a wide table of a distributed columnar database based on the relational metadata information, and storing the relational metadata information into a relational database;
obtaining the relational metadata information, preprocessing the dotting event data according to the relational metadata information, and storing the preprocessed data into the corresponding wide table of the distributed columnar database;
acquiring an event and query rule information selected by a user side, generating an event query language based on relational metadata information matched with the query rule information of the event, and correspondingly acquiring preprocessed data in a wide table of the distributed columnar database according to the event query language.
In a second aspect, an embodiment of the present application provides an event data processing apparatus, including:
the event data combing module is configured to acquire a core model, and determine preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule;
the metadata information acquisition module is configured to parse the preprocessing rule information and the query rule information, generate relational metadata information based on the parsed preprocessing rule information and query rule information, create a wide table of a distributed columnar database based on the relational metadata information, and store the relational metadata information into a relational database;
the data preprocessing module is configured to acquire the relational metadata information, preprocess the dotting event data according to the relational metadata information, and store the preprocessed data into a corresponding distributed columnar database wide table;
and the event data acquisition module is configured to acquire an event and query rule information selected by a user side, generate an event query language based on the relational metadata information matched with the query rule information of the event, and correspondingly acquire preprocessed data in the wide table of the distributed columnar database according to the event query language.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and one or more processors;
the memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the event data processing method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions for performing the event data processing method according to the first aspect when executed by a computer processor.
According to the embodiments of the present application, dotting event data is organized based on the core model, and a complex data structure is simplified into a flat relational metadata model through nested parsing, so that the original data is processed in a standardized manner and data processing efficiency is improved. Based on the relational metadata model, the original data is preprocessed into flat, aggregated wide-table data, and the data volume is reduced by an order of magnitude, so that subsequent data query efficiency and query performance are improved. The query rule selected by the user is acquired through a simple interactive page, the data query language is generated automatically, and the corresponding data is queried automatically, so that data acquisition becomes self-service, data acquisition efficiency is improved, and data is converted into value more quickly.
Drawings
FIG. 1 is a flowchart of an event data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an event template;
FIG. 3 is a flowchart of data nesting analysis in the first embodiment of the present application;
FIG. 4 is a diagram of relational metadata information;
FIG. 5 is a flow chart of raw data preprocessing according to an embodiment of the present application;
FIG. 6 is a front-end interactive interface in accordance with an embodiment of the present application;
FIG. 7 is a flowchart illustrating an event query language generation process according to an embodiment of the present application;
FIG. 8 is a flow chart of another event processing method according to the first embodiment of the present application;
FIG. 9 is a schematic structural diagram of an event data processing apparatus according to a second embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The present application provides an event data processing method and device, which aim to organize dotting event data based on a core model and to simplify a complex data structure into a flat relational metadata model through nested parsing, so that the original data is processed in a standardized manner and data processing efficiency is improved. Based on the relational metadata model, the original data is preprocessed into flat, aggregated wide-table data, and the data volume is reduced by an order of magnitude, so that subsequent data query efficiency and query performance are improved. The query rule selected by the user is acquired through a simple interactive page, the data query language is generated automatically, and the corresponding data is queried automatically, so that data acquisition becomes self-service, data acquisition efficiency is improved, and data is converted into value more quickly. By contrast, in the traditional data acquisition mode, analysts and business staff need to communicate data requirements and develop SQL scripts on a regular basis, a process that takes about one week. Most traditional SQL directly queries the original tables of complex-structured data, so execution efficiency is low, cluster resources are seriously wasted, and acquiring data through SQL requires waiting more than ten minutes. The traditional mode therefore incurs high time and labor costs, its data acquisition efficiency is low, and the value of the data is heavily discounted. For this reason, the event data processing method and device of the embodiments of the present application are provided, so as to improve data acquisition efficiency, reduce data acquisition cost, and convert data into value quickly.
The first embodiment is as follows:
FIG. 1 is a flowchart of an event data processing method according to an embodiment of the present application. The event data processing method provided in this embodiment may be executed by an event data processing device, and the event data processing device may be implemented by software and/or hardware.
The following description takes the event data processing device as the entity that executes the event data processing method. Referring to FIG. 1, the event data processing method includes:
S110, acquiring a core model, and determining preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule.
The dotting event data refers to event data collected through tracking points embedded in the product (buried points), and the events include page jump, play, follow, like and the like. Different event data correspond to different processing modes, and the core model is established based on past experience in analyzing and using the different types of dotting event data. An event template is generated based on the core model, and the event template comprises a preprocessing rule and a query rule of the event data. It can be understood that the preprocessing rule is a processing rule derived from experience in processing event data of the same type; for example, event data for page jump, play, follow and like can each be processed according to the corresponding past data processing experience. In this scheme, modeling is performed based on past data processing experience to generate the core model, and the data processing rule is determined through the core model, which replaces the traditional manual data analysis step, improves data analysis efficiency and reduces labor cost. Further, the query rule includes the statistical indexes and dimensions of the event data. The statistical indexes differ between events; for example, the statistical indexes of a play event include play count, play duration and the like, and the statistical indexes of a page jump event include page entry count, dwell duration and the like. The dimensions represent event dimensions, such as country, region, version and the like.
Illustratively, referring to FIG. 2, FIG. 2 is a schematic diagram of an event template. As shown in FIG. 2, the preprocessing rule information and the query rule information are recorded in the event template in a nested structure, wherein the preprocessing rule is recorded in a first field and its related extension fields, and the query rule is recorded in a second field and its related extension fields. It can be understood that the preprocessing rule information and the query rule information are recorded in the event template as text; the event template therefore needs to be converted into a script file that the server can process directly, and the script file of the event template is then imported into the server.
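As a minimal sketch of what such a nested event template might look like once converted for the server, the following Python models a template with a preprocessing part and a query part. All names here (`EventTemplate`, `Metric`, `to_nested`, and the field names such as `play_count`) are hypothetical and chosen for illustration only; the source does not specify the template's concrete fields.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str          # statistical index, e.g. "play_count"
    aggregation: str   # how the index is computed, e.g. "count" or "sum"
    source_field: str  # raw field the index is computed from


@dataclass
class EventTemplate:
    event_name: str
    # Preprocessing rules: how raw fields are handled before aggregation.
    preprocessing: dict = field(default_factory=dict)
    # Query rules: the indexes and dimensions a user may select.
    metrics: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)


def to_nested(template: EventTemplate) -> dict:
    """Render the template as a nested structure suitable for a script file."""
    return {
        "event": template.event_name,
        "preprocess": dict(template.preprocessing),
        "query": {
            "metrics": [
                {"name": m.name, "agg": m.aggregation, "field": m.source_field}
                for m in template.metrics
            ],
            "dimensions": list(template.dimensions),
        },
    }


play_template = EventTemplate(
    event_name="play",
    preprocessing={"extra": "json"},
    metrics=[
        Metric("play_count", "count", "event_id"),
        Metric("play_duration", "sum", "duration_ms"),
    ],
    dimensions=["country", "region", "version"],
)
nested = to_nested(play_template)
```

The nesting (metrics inside the query part, fields inside each metric) is what makes direct line-by-line parsing fragile and motivates the flattening step described in S120.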
S120, parsing the preprocessing rule information and the query rule information, generating relational metadata information based on the parsed preprocessing rule information and the parsed query rule information, creating a wide table of a distributed columnar database based on the relational metadata information, and storing the relational metadata information into a relational database.
Because the event template is information with a complex nested structure, parsing the event template directly line by line may alter its information structure, so that the parsed information deviates from the template and the accuracy of subsequent data processing is affected. The event template is therefore parsed through nested parsing. Specifically, referring to FIG. 3, FIG. 3 is a flowchart of nested data parsing in the first embodiment of the present application. As shown in FIG. 3, the nested data parsing flow includes:
S1201, flattening the preprocessing rule information and the query rule information, so that their nested structure is simplified into a flat structure;
S1202, performing multi-queue hierarchical parsing on the flattened preprocessing rule information and query rule information to obtain the relational metadata information.
Specifically, the server performs deep parsing on the imported event template based on a nested parsing model. The nested parsing model is built on an Excel asynchronous parsing rule and a multi-queue cooperation rule: the nested structure is flattened through the multi-queue cooperation rule, and the flattened event template is parsed line by line through the Excel asynchronous parsing rule. Illustratively, the records returned by callbacks are cached, and a result set is generated according to a preset recording rule. For n levels of nesting, n queues are constructed; records are enqueued, dequeued and merged according to the characteristics of the parsed records, and when a record is enqueued in the outermost queue, the corresponding nested record has been fully parsed.
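The source does not give the exact queue algorithm, so the following is only one possible reading of the multi-queue idea, written as a hedged Python sketch. It assumes that the line-by-line parser emits rows as `(level, name)` pairs in document order, with level 0 the outermost; the function name `parse_rows` and the record layout are hypothetical. One queue is kept per nesting level, and whenever a row arrives at some level, all pending records in deeper queues are dequeued and merged into their parent.

```python
from collections import deque


def parse_rows(rows, depth):
    """Rebuild nested records from flat (level, name) rows using one queue per level."""
    queues = [deque() for _ in range(depth)]

    def fold(down_to):
        # Merge pending records upward, deepest level first, for every
        # level strictly deeper than `down_to`.
        for level in range(depth - 1, down_to, -1):
            children = []
            while queues[level]:
                children.append(queues[level].popleft())
            if children:
                if not queues[level - 1]:
                    raise ValueError(f"level {level} row has no parent")
                queues[level - 1][-1]["children"].extend(children)

    for level, name in rows:
        if not 0 <= level < depth:
            raise ValueError(f"level {level} outside 0..{depth - 1}")
        # A new row at `level` closes every deeper record seen so far.
        fold(level)
        queues[level].append({"name": name, "children": []})

    fold(0)  # end of input: merge everything into the outermost queue
    return list(queues[0])


rows = [
    (0, "play"), (1, "metrics"), (2, "play_count"), (2, "play_duration"),
    (1, "dimensions"), (2, "country"),
    (0, "follow"), (1, "metrics"),
]
records = parse_rows(rows, depth=3)
```

In this sketch, the records left in the outermost queue at the end are the fully assembled nested records, one per event.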
Illustratively, referring to FIG. 4, FIG. 4 is a diagram of relational metadata information. The event template in FIG. 2 is subjected to flattening and multi-queue hierarchical parsing to obtain the relational metadata information in FIG. 4. As can be seen from FIG. 4, nested parsing simplifies the nested structure of the event template into a flat structure, and the preprocessing rule information and the query rule information are both parsed into relational metadata, which makes it convenient to preprocess the corresponding event data based on the event template and improves data processing efficiency.
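To illustrate the effect of flattening, the sketch below turns a nested template into flat `(path, value)` rows that could be stored as relational records. The dotted-path convention and the sample template are assumptions made for this example; the actual layout of the relational metadata is the one shown in FIG. 4 and is not reproduced here.

```python
def flatten(node, parent_path=""):
    """Yield (path, value) rows for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{parent_path}.{key}" if parent_path else key
            yield from flatten(value, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{parent_path}[{index}]")
    else:
        yield parent_path, node


template = {
    "event": "play",
    "query": {
        "metrics": [{"name": "play_count", "agg": "count"}],
        "dimensions": ["country", "version"],
    },
}
rows = list(flatten(template))
for path, value in rows:
    print(path, "=", value)
```

Each row is self-describing, so it can be inserted into a relational table without the reader needing to know the nesting depth in advance.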
Further, the relational metadata information of the event data is persisted into an RDB (relational database), here MySQL (a relational database management system). A Schema (an abstract set of metadata) for creating the table is generated based on the relational metadata information of the event data, a flat CK (ClickHouse, a distributed columnar database management system) wide table is created, and the CK wide table is used as the query target table. Understandably, each event corresponds to one CK wide table, that is, the relational metadata information of the event data is associated with the generated CK wide table.
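A hedged sketch of how a table-creation statement for a CK wide table might be generated from the query-rule metadata is given below. The function name `build_wide_table_ddl`, the `event_date` column, the choice of `String` for every dimension, and the use of the `MergeTree` engine are all assumptions for illustration; the source only states that a Schema is generated and a flat wide table is created.

```python
import re

# Identifiers are restricted to a conservative pattern because they are
# interpolated into the statement text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
_METRIC_TYPES = {"UInt64", "Int64", "Float64"}


def build_wide_table_ddl(table, dimensions, metrics):
    """Build a CREATE TABLE statement; metrics is a list of (name, type) pairs."""
    for name in [table, *dimensions, *(name for name, _ in metrics)]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    for _, ck_type in metrics:
        if ck_type not in _METRIC_TYPES:
            raise ValueError(f"unsupported metric type: {ck_type}")

    columns = ["event_date Date"]
    columns += [f"{name} String" for name in dimensions]
    columns += [f"{name} {ck_type}" for name, ck_type in metrics]
    body = ",\n    ".join(columns)
    order_by = ", ".join(["event_date", *dimensions])
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n    {body}\n) "
        f"ENGINE = MergeTree() ORDER BY ({order_by})"
    )


ddl = build_wide_table_ddl(
    "events.play_wide",
    ["country", "version"],
    [("play_count", "UInt64"), ("play_duration", "UInt64")],
)
print(ddl)
```

Ordering the table by date and dimensions is a design choice of this sketch: it matches the typical query shape (a date range grouped by dimensions), but other sort keys or engines may suit a real deployment better.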
S130, obtaining the relational metadata information, preprocessing the dotting event data according to the relational metadata information, and storing the preprocessed data into the corresponding wide table of the distributed columnar database.
Specifically, referring to FIG. 5, FIG. 5 is a flowchart of raw data preprocessing in the first embodiment of the present application. As shown in FIG. 5, the raw data preprocessing flow includes:
S1301, acquiring the relational metadata information from the relational database at a preset time node;
S1302, aggregating the dotting event data based on the relational metadata information to obtain pre-aggregated data;
S1303, determining the wide table of the distributed columnar database corresponding to the relational metadata information, and storing the pre-aggregated data into that wide table.
Illustratively, Spark (a computing engine) is scheduled at regular intervals to pull the relational metadata information from the RDB and to aggregate the event data based on it, so as to obtain pre-aggregated data. If the event data includes a complex field, the complex field is parsed based on a preset parsing rule. Further, the CK wide table corresponding to the event data is determined from the CK wide table associated with the relational metadata information, and the pre-aggregated data is stored into that CK wide table. Preprocessing with Spark can reduce the data volume by an order of magnitude, which significantly improves subsequent query performance and efficiency. Through scheduled execution, the preprocessing runs in near real time, with a delay of about 30 minutes; this improves subsequent query speed while keeping the data timely, thereby improving the efficiency of converting data into value.
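The source performs this aggregation with Spark. Purely to show the shape of the computation, the sketch below swaps in plain Python: raw events are grouped by the dimensions named in the metadata and reduced to one row per group. The function name `pre_aggregate`, the metric specification format, and the sample fields are assumptions for this example.

```python
from collections import defaultdict


def pre_aggregate(events, dimensions, metrics):
    """Group raw events by dimensions; metrics maps name -> (aggregation, source field)."""
    totals = defaultdict(lambda: defaultdict(int))
    for event in events:
        key = tuple(event[d] for d in dimensions)
        for name, (aggregation, source) in metrics.items():
            if aggregation == "count":
                totals[key][name] += 1
            elif aggregation == "sum":
                totals[key][name] += event[source]
            else:
                raise ValueError(f"unsupported aggregation: {aggregation}")
    # One output row per distinct combination of dimension values.
    return [
        {**dict(zip(dimensions, key)), **dict(totals[key])}
        for key in sorted(totals)
    ]


raw_events = [
    {"country": "SG", "version": "1.0", "duration_ms": 100},
    {"country": "SG", "version": "1.0", "duration_ms": 250},
    {"country": "US", "version": "1.0", "duration_ms": 40},
]
wide_rows = pre_aggregate(
    raw_events,
    dimensions=["country", "version"],
    metrics={"play_count": ("count", None), "play_duration": ("sum", "duration_ms")},
)
```

Three raw events collapse into two wide-table rows here; with realistic event volumes and a small number of dimension combinations, this collapse is where the order-of-magnitude reduction in data volume would come from.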
It can be understood that the relational metadata information includes the relational metadata of the preprocessing rule and the relational metadata of the query rule of the event data. When the event data is preprocessed, it is aggregated according to the relational metadata of the preprocessing rule, so that the resulting pre-aggregated data is consistent with manually processed event data. The core model, template parsing and preprocessing together form a standardized data processing mode that replaces manual data processing, improving data processing efficiency while saving labor cost. Furthermore, the relational metadata of the query rule represents the indexes and dimensions of the event data. When the CK wide table is created, the Schema for creating the table is generated based on the relational metadata of the query rule, so that the corresponding CK wide table can subsequently be queried according to the query rule information selected by the user to obtain the preprocessed data the user requires.
S140, acquiring the event and the query rule information selected by the user side, generating an event query language based on the relational metadata information matched with the query rule information of the event, and correspondingly acquiring the preprocessed data in the wide table of the distributed columnar database according to the event query language.
Specifically, referring to FIG. 6, FIG. 6 shows the front-end interaction platform in the first embodiment of the present application. As shown in FIG. 6, the user may select by date the time range over which to view data content and trends; select by event name the specific dotting events, such as page jump, play, follow and like; select by filtering condition the data to be filtered; select by dimension the dimensions of the dotting events, such as country, region and version; and select by statistical index the statistical type of the dotting events, such as page entry count and dwell duration. When the server has acquired the corresponding event data, it returns the statistical result of the event data to the front end, and the front end displays the statistical result as a table, a line chart, a bar chart, an area chart or the like. Understandably, the event indexes and event dimensions selected by the user correspond to the query rule information of the event data, and the embodiment of the present application queries the corresponding CK wide table based on the selected indexes and dimensions to obtain the corresponding event data.
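The selections described above could be carried to the server as a small request object. The following is a hypothetical sketch: the class name `QueryRequest`, its fields, and the validation against the event's allowed indexes and dimensions are assumptions, since the source does not define the request format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class QueryRequest:
    event: str                    # event name, e.g. "play"
    start: date                   # first day of the selected range
    end: date                     # last day of the selected range
    metrics: list                 # selected statistical indexes
    dimensions: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)

    def validate(self, allowed_metrics, allowed_dimensions):
        """Reject selections that the event's query rule does not define."""
        if self.start > self.end:
            raise ValueError("start date is after end date")
        if not self.metrics:
            raise ValueError("at least one statistical index is required")
        unknown = (set(self.metrics) - set(allowed_metrics)) | (
            set(self.dimensions) - set(allowed_dimensions)
        )
        if unknown:
            raise ValueError(f"not defined for this event: {sorted(unknown)}")
        return self


request = QueryRequest(
    event="play",
    start=date(2021, 1, 1),
    end=date(2021, 1, 7),
    metrics=["play_count"],
    dimensions=["country"],
    filters={"version": "1.0"},
)
request.validate(
    allowed_metrics=["play_count", "play_duration"],
    allowed_dimensions=["country", "region", "version"],
)
```

Validating against the query-rule metadata before generating any query keeps user input from reaching the database as arbitrary column names.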
Furthermore, the front-end interaction platform also provides a combined event query scenario, that is, event data of a plurality of events that occur simultaneously can be acquired. The embodiment of the present application provides a simple front-end interactive page on which the user selects visual options such as time range, event dimension, event index and filtering condition, and the data indexes under the corresponding dimensions are then acquired automatically. In this process the front end collects the query rule information and the back end retrieves the corresponding data, with no manual participation, so that data acquisition is fully self-service, data acquisition cost is greatly reduced, data acquisition efficiency is improved, and data is converted into value quickly.
Further, after the user selects the event and the query rule at the front end, the front end sends the query rule information selected by the user to the server; the server generates an event query language based on the query rule information and queries the preprocessed data in the CK cluster through the event query language. Specifically, referring to FIG. 7, FIG. 7 is a flowchart of event query language generation in the first embodiment of the present application. As shown in FIG. 7, the event query language generation flow includes:
S1401, acquiring, from the relational database, the relational metadata information matching the query rule information of the event;
S1402, splicing the relational metadata information matching the query rule information of the event through a preset event query language splicing component to generate the event query language, wherein the event query language splicing component provides a basic query language generation function and a nested query language generation function.
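Step S1401 can be sketched as a parameterized lookup in the relational database. The example below uses Python's built-in `sqlite3` as a stand-in for the MySQL store mentioned in this embodiment, and the table `event_metadata` with its columns is a hypothetical layout, not the one in FIG. 4.

```python
import sqlite3


def open_metadata_store():
    """Create an in-memory stand-in for the relational metadata store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_metadata ("
        "event TEXT, kind TEXT, name TEXT, column_name TEXT, wide_table TEXT)"
    )
    conn.executemany(
        "INSERT INTO event_metadata VALUES (?, ?, ?, ?, ?)",
        [
            ("play", "dimension", "country", "country", "events.play_wide"),
            ("play", "dimension", "version", "version", "events.play_wide"),
            ("play", "metric", "play_count", "play_count", "events.play_wide"),
            ("follow", "metric", "follow_count", "follow_count", "events.follow_wide"),
        ],
    )
    return conn


def match_metadata(conn, event, selected_names):
    """Return the metadata rows for one event that match the user's selection."""
    if not selected_names:
        return []
    placeholders = ", ".join("?" for _ in selected_names)
    cursor = conn.execute(
        "SELECT kind, name, column_name, wide_table FROM event_metadata "
        f"WHERE event = ? AND name IN ({placeholders}) ORDER BY kind, name",
        [event, *selected_names],
    )
    return cursor.fetchall()


conn = open_metadata_store()
matched = match_metadata(conn, "play", ["country", "play_count"])
```

The matched rows carry both the column names and the target wide table, which is the information the splicing step in S1402 needs.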
Illustratively, the relational metadata in the RDB that matches the event dimensions and event indexes selected by the user is acquired. This relational metadata is spliced by an SQL (Structured Query Language) splicing component to generate SQL that conforms to CK syntax, and the SQL is submitted to the CK cluster to query the corresponding CK wide table. The SQL splicing component provides basic SQL generation functions covering common query syntax, including select (query fields in a table), distinct (remove duplicate rows), from (specify the table to query), where (filter), group by (group by fields), having (filter on aggregates), order by (sort results), limit (limit the number of returned results), exists (sub-query), in (sub-query) and the like. It also provides complex nested SQL functions, such as union (union with duplicate rows removed), union all (union without removing duplicate rows), as (alias), on (filter), and from sub-query. On this basis, aggregation function components and condition generation components compatible with CK syntax are provided as extensions: aggregation function components such as sum, count, maximum, minimum, mean and quantile, and condition generation components such as and, or, equal to zero, not equal to zero, equal to, less than, less than or equal to, greater than, greater than or equal to, interval, absolute time interval, relative time interval, true and false. These can be integrated directly into the basic functions, and an extension interface is provided so that generation components can be added at any time. SQL can be generated quickly with the SQL splicing component, which greatly reduces query complexity and improves query efficiency.
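A minimal sketch of such a splicing component follows. It covers only a small subset of the functions listed above (select, from, where, group by, order by, limit, a few aggregates and comparisons, and a from sub-query), and all function names are hypothetical. String literals are escaped in a simplified way for illustration; a production component would more likely rely on the database driver's parameter binding.

```python
COMPARISONS = {"=", "!=", "<", "<=", ">", ">="}
AGGREGATES = {"sum", "count", "max", "min", "avg"}


def quote(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_condition(column, op, value):
    """Render one filter; 'between' takes a (low, high) pair."""
    if op == "between":
        low, high = value
        return f"{column} BETWEEN {quote(low)} AND {quote(high)}"
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{column} {op} {quote(value)}"


def build_query(table, dimensions, metrics, conditions=(), limit=None):
    """Splice a grouped query; metrics is a list of (alias, aggregate, column)."""
    select = list(dimensions)
    for alias, aggregate, column in metrics:
        if aggregate not in AGGREGATES:
            raise ValueError(f"unsupported aggregate: {aggregate}")
        select.append(f"{aggregate}({column}) AS {alias}")
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(render_condition(*c) for c in conditions)
    if dimensions:
        grouped = ", ".join(dimensions)
        sql += f" GROUP BY {grouped} ORDER BY {grouped}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


inner = build_query(
    "events.play_wide",
    ["country"],
    [("plays", "sum", "play_count")],
    [("event_date", "between", ("2021-01-01", "2021-01-07")), ("version", "=", "1.0")],
    limit=10,
)
# Nested use: the generated query becomes the FROM clause of another query.
outer = build_query(f"({inner}) AS t", [], [("total", "sum", "plays")])
print(inner)
print(outer)
```

Keeping aggregates and operators in allow-lists mirrors the extension interface described above: adding a new generation component amounts to registering one more entry.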
Further, if the user selects a combined event and query rule information on the front-end interactive page, the server receives the combined event and the query rule information selected at the user side, wherein the combined event comprises at least two events. For a combined event and query rule selected by the user, the data query flow is consistent with that of a single event: the matching relational metadata is acquired from the RDB according to the query rule of each event, the relational metadata is spliced by the SQL splicing component to generate the corresponding SQL, and the SQL queries the CK cluster for the preprocessed data that matches the combined event and query rule selected by the user, thereby acquiring the event data of the combined event.
Further, if the user selects a screening condition on the front-end interactive page, the front end sends the screening condition information selected by the user to the server, and the server processes the preprocessed data obtained through the SQL according to the screening condition information. Specifically, the screening condition information selected at the user side is acquired, and the preprocessed data obtained through the event query language is aggregated according to the screening condition information. Illustratively, after the preprocessed data in the corresponding CK wide table is obtained through the SQL, the data that does not meet the user's requirements is removed according to the screening condition, and the remaining data is aggregated.
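The screening step can be sketched as a filter followed by a re-aggregation over the rows returned by the query. The function name `apply_screening`, the predicate form of the screening condition, and the sample rows are assumptions for illustration.

```python
def apply_screening(rows, predicate, group_by, metric):
    """Drop rows that fail the screening condition, then sum the metric per group."""
    totals = {}
    for row in rows:
        if not predicate(row):
            continue  # removed: does not meet the user's screening condition
        key = tuple(row[column] for column in group_by)
        totals[key] = totals.get(key, 0) + row[metric]
    return [
        {**dict(zip(group_by, key)), metric: total}
        for key, total in totals.items()
    ]


queried_rows = [
    {"country": "SG", "version": "1.0", "play_count": 5},
    {"country": "SG", "version": "2.0", "play_count": 3},
    {"country": "US", "version": "1.0", "play_count": 1},
]
screened = apply_screening(
    queried_rows,
    predicate=lambda row: row["play_count"] >= 3,
    group_by=["country"],
    metric="play_count",
)
```

Here the US row is removed by the condition and the two remaining SG rows are merged into a single aggregated row.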
The embodiment of the application provides a simple front-end interactive page and back-end data query to realize self-service data query, so that the user does not need to deal with obscure SQL. Event data is pre-aggregated in near real time in advance and combined with CK wide tables in a high-availability cluster, which significantly improves query efficiency: query latency drops from the ten-minute level to the second level, greatly shortening the time a user waits for data. Both the timeliness and the efficiency of event data acquisition are taken into account, the cost of the data acquisition pipeline is reduced, and data can be converted into value more quickly.
On the other hand, referring to fig. 8, fig. 8 is a flowchart of another event processing method in the first embodiment of the present application. As shown in fig. 8, the management end sorts out the event data based on the core model, generates an event template as a script file, and imports it into the server. The server parses the imported event template, generates the relational metadata of the event template, creates a distributed columnar database wide table based on the relational metadata of the query rules, and persists the relational metadata of the event template into the relational database. A scheduled computing engine preprocesses the event data: it pre-aggregates the event data based on the relational metadata of the corresponding preprocessing rules and writes the resulting pre-aggregated data into the corresponding distributed columnar database wide table based on the relational metadata of the query rules, and the distributed columnar database management system stores the preprocessed data in its cluster. At this point, the preliminary preparation of the event data is complete. Then, a query rule sent by the front end is received, the relational metadata related to the query rule in the relational database is spliced by the event query language splicing component to generate an event query language, and the preprocessed data in the distributed columnar database management system cluster is queried with the event query language. The query result is formatted and output to the front end, which renders it as a table or chart to display the data results of interest to the user.
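The pre-aggregation performed by the scheduled engine can be illustrated with a small in-memory sketch. It assumes raw events are dictionaries, that the preprocessing rule boils down to a list of dimension fields plus a list of numeric metric fields, and that the aggregation is a sum; the function name pre_aggregate and the field names are hypothetical, and a real deployment would run this in a computing engine and write the rows to the wide table.

```python
from collections import defaultdict
from typing import Dict, List


def pre_aggregate(
    events: List[Dict],
    dimensions: List[str],
    metrics: List[str],
) -> List[Dict]:
    """Collapse raw events into one flat row per distinct dimension combination."""
    buckets = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for event in events:
        key = tuple(event[d] for d in dimensions)
        for m in metrics:
            # Missing metric fields are treated as zero (an assumption).
            buckets[key][m] += event.get(m, 0)
    # Each output row has the shape of a wide-table row: dimensions + metrics.
    return [dict(zip(dimensions, key), **vals) for key, vals in buckets.items()]


if __name__ == "__main__":
    raw = [
        {"day": "2021-01-22", "country": "SG", "pv": 1, "dur": 10},
        {"day": "2021-01-22", "country": "SG", "pv": 1, "dur": 20},
        {"day": "2021-01-22", "country": "CN", "pv": 1, "dur": 5},
    ]
    for row in pre_aggregate(raw, ["day", "country"], ["pv", "dur"]):
        print(row)
```

Because many raw events fall into the same dimension combination, the number of stored rows shrinks, which is the effect the text attributes to pre-aggregation.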
In summary, in the embodiment of the application, a core model is obtained, and preprocessing rule information and query rule information of event data are determined based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule; the preprocessing rule information and the query rule information are analyzed, relational metadata information is generated based on the analyzed preprocessing rule information and query rule information, a distributed columnar database wide table is created based on the relational metadata information, and the relational metadata information is stored into a relational database; the relational metadata information is acquired, the event data is preprocessed according to the relational metadata information, and the preprocessed data is stored into the corresponding distributed columnar database wide table; an event and query rule information selected by a user side are acquired, an event query language is generated based on the relational metadata information matched with the query rule information of the event, and the preprocessed data in the distributed columnar database wide table is acquired according to the event query language. With these technical means, event data is sorted out based on the core model, and a complex data structure is simplified into a flat relational metadata model through nested analysis, so that the original data is processed in a standardized way and data processing efficiency is improved. Based on the relational metadata model, the original data is preprocessed into flat, aggregated wide-table data, and the data volume is reduced by an order of magnitude, which improves subsequent query efficiency and query performance. The query rule selected by the user is acquired through a simple interactive page, the data query language is generated automatically, and the corresponding data is queried automatically, realizing self-service data acquisition, which improves data acquisition efficiency and allows data to be converted into value more quickly.
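To make the idea of simplifying a nested rule structure into a flat model concrete, the following Python sketch walks a nested rule level by level with a queue and emits one flat record per leaf value. This is only one plausible reading of the flattening and hierarchical analysis described above, not the algorithm of the application; the function name flatten_rule and the record fields path, level and value are hypothetical.

```python
from collections import deque
from typing import Any, Dict, List


def flatten_rule(rule: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a nested rule into flat records using a level-order traversal."""
    rows: List[Dict[str, Any]] = []
    queue = deque([("", rule, 0)])          # (path so far, node, nesting level)
    while queue:
        path, node, level = queue.popleft()
        if isinstance(node, dict):
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else key
                queue.append((child_path, child, level + 1))
        elif isinstance(node, list):
            for i, child in enumerate(node):
                queue.append((f"{path}[{i}]", child, level + 1))
        else:
            # Leaf value: one flat record, ready to be stored as a table row.
            rows.append({"path": path, "level": level, "value": node})
    return rows


if __name__ == "__main__":
    nested = {"event": {"name": "login", "dims": ["country", "os"]}, "agg": "sum"}
    for record in flatten_rule(nested):
        print(record)
```

Each record carries its full path and nesting level, so the hierarchy can still be reconstructed from rows stored in a relational table.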
Example two:
On the basis of the foregoing embodiment, fig. 9 is a schematic structural diagram of an event data processing apparatus according to a second embodiment of the present application. Referring to fig. 9, the event data processing apparatus provided in this embodiment specifically includes: an event data combing module 21, a metadata information acquisition module 22, a data preprocessing module 23 and an event data acquisition module 24.
The event data combing module 21 is configured to obtain a core model, and determine preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule;
the metadata information acquisition module 22 is configured to analyze the preprocessing rule information and the query rule information, generate relational metadata information based on the analyzed preprocessing rule information and query rule information, create a distributed columnar database wide table based on the relational metadata information, and store the relational metadata information into a relational database;
the data preprocessing module 23 is configured to acquire the relational metadata information, preprocess the dotting event data according to the relational metadata information, and store the preprocessed data into the corresponding distributed columnar database wide table;
and the event data acquisition module 24 is configured to acquire the event and the query rule information selected by the user side, generate an event query language based on the relational metadata information matched with the query rule information of the event, and acquire the preprocessed data in the distributed columnar database wide table according to the event query language.
The event data is sorted out based on the core model, and the complex data structure is simplified into a flat relational metadata model through nested analysis, so that the original data is processed in a standardized way and data processing efficiency is improved. Based on the relational metadata model, the original data is preprocessed into flat, aggregated wide-table data, and the data volume is reduced by an order of magnitude, which improves subsequent query efficiency and query performance. The query rule selected by the user is acquired through a simple interactive page, the data query language is generated automatically, and the corresponding data is queried automatically, realizing self-service data acquisition, which improves data acquisition efficiency and allows data to be converted into value more quickly.
The event data processing device provided by the second embodiment of the present application can be used for executing the event data processing method provided by the first embodiment, and has corresponding functions and beneficial effects.
Example three:
An embodiment of the present application provides an electronic device. Referring to fig. 10, the electronic device includes: a processor 31, a memory 32, a communication module 33, an input device 34, and an output device 35. The number of processors in the electronic device may be one or more, and the number of memories in the electronic device may be one or more. The processor, memory, communication module, input device, and output device of the electronic device may be connected by a bus or other means.
The memory 32 is a computer-readable storage medium and can be used for storing software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the event data processing method according to any embodiment of the present application (for example, the event data combing module 21, the metadata information acquisition module 22, the data preprocessing module 23, and the event data acquisition module 24 in the event data processing apparatus). The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created according to use of the device, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 33 is used for data transmission.
The processor 31 executes various functional applications of the device and data processing by executing software programs, instructions and modules stored in the memory, that is, implements the event data processing method described above.
The input device 34 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 35 may include a display device such as a display screen.
The electronic device provided above can be used to execute the event data processing method provided in the first embodiment above, and has corresponding functions and advantages.
Example four:
Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform an event data processing method, the event data processing method including: acquiring a core model, and determining preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule; analyzing the preprocessing rule information and the query rule information, generating relational metadata information based on the analyzed preprocessing rule information and the analyzed query rule information, creating a distributed columnar database wide table based on the relational metadata information, and storing the relational metadata information into a relational database; obtaining the relational metadata information, preprocessing the dotting event data according to the relational metadata information, and storing the preprocessed data into the corresponding distributed columnar database wide table; acquiring an event and query rule information selected by a user side, generating an event query language based on relational metadata information matched with the query rule information of the event, and acquiring the preprocessed data in the distributed columnar database wide table according to the event query language.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media residing in different locations, e.g., in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the operations of the event data processing method described above, and may also perform related operations in the event data processing method provided in any embodiment of the present application.
The event data processing apparatus, the storage medium, and the electronic device provided in the foregoing embodiments can execute the event data processing method provided in any embodiment of the present application. For technical details not described in detail in the foregoing embodiments, reference may be made to the event data processing method provided in any embodiment of the present application.
The foregoing describes only the preferred embodiments of the present application and the technical principles employed. The present application is not limited to the particular embodiments described herein; those skilled in the art can make various obvious changes, rearrangements and substitutions without departing from the scope of protection of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the present application is determined by the scope of the claims.

Claims (10)

1. An event data processing method, comprising:
acquiring a core model, and determining preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule;
analyzing the preprocessing rule information and the query rule information, generating relational metadata information based on the analyzed preprocessing rule information and the analyzed query rule information, creating a distributed columnar database wide table based on the relational metadata information, and storing the relational metadata information into a relational database;
obtaining the relational metadata information, preprocessing the dotting event data according to the relational metadata information, and storing the preprocessed data into the corresponding distributed columnar database wide table;
acquiring an event and query rule information selected by a user side, generating an event query language based on relational metadata information matched with the query rule information of the event, and correspondingly acquiring preprocessed data in a wide table of the distributed columnar database according to the event query language.
2. The method of claim 1, wherein analyzing the preprocessing rule information and the query rule information and generating the relational metadata information based on the analyzed preprocessing rule information and query rule information comprises:
flattening the preprocessing rule information and the query rule information, and simplifying a nested structure of the preprocessing rule information and the query rule information into a flattened structure;
and performing multi-queue hierarchical analysis on the flat preprocessing rule information and the query rule information to obtain the relational metadata information.
3. The method of claim 1, wherein the obtaining the relational metadata information, pre-processing the dotting event data according to the relational metadata information, and storing the pre-processed dotting event data in a corresponding wide table of a distributed columnar database comprises:
acquiring the relational metadata information from the relational database at a preset time node;
aggregating the dotting event data based on the relational metadata information to obtain pre-aggregated data;
and determining a distributed columnar database wide table corresponding to the relational metadata information, and storing the pre-aggregated data into the distributed columnar database wide table.
4. The method of claim 1, wherein generating an event query language based on the relational metadata information matching the query rule information of the event comprises:
acquiring relational metadata information matched with the query rule information of the event from the relational database based on the query rule information of the event;
and splicing the relational metadata information matched with the query rule information of the event through a preset event query language splicing component to generate the event query language, wherein the event query language splicing component provides a basic query language generation function and a nested query language generation function.
5. The method of claim 1, further comprising, after said obtaining preprocessed data in said distributed columnar database according to said event query language:
and acquiring screening condition information selected by a user side, and aggregating the preprocessed data acquired by the event query language according to the screening condition information.
6. The method of claim 3, further comprising, prior to said aggregating the dotting event data based on the relational metadata information:
and if the dotting event data comprise complex fields, analyzing the complex fields based on a preset analysis rule.
7. The method of claim 4, wherein the obtaining event and query rule information selected by the user side comprises:
acquiring combined events and query rule information selected by a user side, wherein the combined events comprise at least two events.
8. An event data processing apparatus characterized by comprising:
the event data combing module is configured to acquire a core model, and determine preprocessing rule information and query rule information of dotting event data based on the core model, wherein the core model comprises a data processing rule, metadata, a data scattering rule and a data aggregation rule;
the metadata information acquisition module is configured to analyze the preprocessing rule information and the query rule information, generate relational metadata information based on the analyzed preprocessing rule information and query rule information, create a distributed columnar database wide table based on the relational metadata information, and store the relational metadata information into a relational database;
the data preprocessing module is configured to acquire the relational metadata information, preprocess the dotting event data according to the relational metadata information, and store the preprocessed data into a corresponding distributed columnar database wide table;
and the event data acquisition module is configured to acquire an event and query rule information selected by a user side, generate an event query language based on the relational metadata information matched with the query rule information of the event, and correspondingly acquire preprocessed data in the wide table of the distributed columnar database according to the event query language.
9. An electronic device, comprising:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the event data processing method of any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the event data processing method of any one of claims 1 to 7 when executed by a computer processor.
CN202110093521.6A 2021-01-22 2021-01-22 Event data processing method and device Pending CN112818017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110093521.6A CN112818017A (en) 2021-01-22 2021-01-22 Event data processing method and device

Publications (1)

Publication Number Publication Date
CN112818017A true CN112818017A (en) 2021-05-18

Family

ID=75859047

Country Status (1)

Country Link
CN (1) CN112818017A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination