WO2021164253A1

WO2021164253A1 - Method and device for real-time multidimensional analysis of user behaviors, and storage medium

Info

Publication number: WO2021164253A1
Application number: PCT/CN2020/117423
Authority: WO
Inventors: 禹蕾; 张观成; 万书武; 李均
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-02-18
Filing date: 2020-09-24
Publication date: 2021-08-26
Also published as: CN111311326A

Abstract

The present application relates to the technical field of big data and provides a method and device for real-time multidimensional analysis of user behaviors, and a storage medium. The method comprises: extracting user behavior data from a distributed message queue via a real-time processing program and processing it to acquire a corresponding dataset; transmitting the dataset to a user behavior pre-analysis engine, which performs precomputation on the user data and acquires a corresponding precomputation result; saving the precomputation result corresponding to the user dataset in a distributed file system; when a user behavior is analyzed, transmitting a query request to a query engine, which determines, on the basis of the type and content of the request, whether precomputation has been performed on the user dataset; and, depending on whether precomputation has been performed on the user dataset, acquiring a query result by analysis via the corresponding analysis engine and feeding it back. The present application increases query response speed and system throughput, and improves resource utilization and isolation.

Description

Method, device and storage medium for real-time multi-dimensional analysis of user behavior

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 18, 2020, with application number 202010098994.0 and the invention title "User Behavior Real-time Multi-Dimensional Analysis Method, Device and Storage Medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of big data technology, and in particular to a method, device and computer-readable storage medium for real-time multi-dimensional analysis of user behavior.

Background technique

At present, almost all mobile apps constantly collect mobile user behavior data. Marketing, product, operations, and management staff need to conduct real-time, multi-dimensional event analysis, funnel analysis, retention analysis, distribution analysis, user path analysis, and so on. These analysis functions must be built on a distributed, scalable, highly available, real-time computing and storage platform. This is why big data platforms for real-time, multi-dimensional, in-depth analysis of user behavior emerged: they meet the need to analyze massive amounts of user behavior data.

At present, the widely used Shence Data offers a similar big data platform for real-time multi-dimensional analysis of user behavior. The analysis platform of Shence Data is mainly composed of Impala, Kudu, HDFS, and YARN. The inventor realizes that it has the following drawback: every analysis must be performed on a large amount of raw data, which makes some analysis functions frequently used by users very slow and results in a very poor user experience.

Technical problem

This application provides a method, electronic device, and computer-readable storage medium for real-time multi-dimensional analysis of user behavior. Its main purpose is to pre-calculate the analysis functions frequently used by users, which improves query response speed, system throughput, and user experience. In addition, one or more access applications can be processed at the same time, and each application can be allocated resources independently on Druid and YARN, giving high resource utilization and strong isolation.

Technical solutions

To achieve the above objective, the present application provides a method for real-time multi-dimensional analysis of user behavior, applied to an electronic device. The method includes: extracting user behavior data from a distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in a distributed file system; when the user behavior is analyzed, sending a query request to a query engine, which determines, according to the type and content of the request, whether the user data set has been pre-calculated; and, depending on whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result and feeding it back.

In order to achieve the above objective, this application also provides a device for real-time multi-dimensional analysis of user behavior, which includes: a data set acquisition unit, used to extract user behavior data from the distributed message queue through a real-time processing program and process it to obtain a corresponding data set; a pre-calculation result acquisition unit, used to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; a pre-calculation result storage unit, used to save the user data set and the corresponding pre-calculation result in the distributed file system; a judgment unit, used to send a query request to the query engine when the user behavior is analyzed, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and a result feedback unit, used to analyze through the corresponding analysis engine, depending on whether the user data set has been pre-calculated, to obtain a query result and feed it back.

In addition, in order to achieve the above object, the present application also provides an electronic device, comprising: a processor; and a memory for storing program instructions of the processor; wherein the processor is configured to execute the program instructions to perform the method described above. The method includes: extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to the data warehouse and the user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in the distributed file system; when the user behavior is analyzed, sending a query request to the query engine, which determines, according to the type and content of the request, whether the user data set has been pre-calculated; and, depending on whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result and feeding it back.

In addition, in order to achieve the above object, the present application also provides a computer-readable storage medium that includes a user behavior real-time multi-dimensional analysis program. When the user behavior real-time multi-dimensional analysis program is executed by a processor, it implements the following steps of the method for real-time multi-dimensional analysis of user behavior: extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to the data warehouse and the user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in the distributed file system; when the user behavior is analyzed, sending a query request to the query engine, which determines, according to the type and content of the request, whether the user data set has been pre-calculated; and, depending on whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result and feeding it back.

Beneficial effect

The method, electronic device, and computer-readable storage medium for real-time multi-dimensional analysis of user behavior proposed in this application pre-calculate some analysis functions frequently used by users and retain the calculation results. When a user uses such an analysis function, the already calculated result is returned to the user directly, so that pre-calculation improves query response speed, system throughput, and user experience. At the same time, computing resources and storage resources are evaluated and allocated independently for each access application according to its data volume, so resources are managed at a finer, per-application granularity, which improves resource utilization. In addition, since the resources used by each access application are independent, the access applications do not affect each other, which also strengthens resource isolation.

Description of the drawings

FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a method for real-time multi-dimensional analysis of user behavior in this application.

FIG. 2 is a schematic diagram of modules of a preferred embodiment of a real-time multi-dimensional analysis program of user behavior in FIG. 1.

FIG. 3 is a flowchart of a preferred embodiment of a method for real-time multi-dimensional analysis of user behavior in this application.

Figure 4 is a schematic block diagram of the method for real-time multi-dimensional analysis of user behavior in this application.

Figure 5 is a block diagram of a real-time multi-dimensional analysis system for user behavior of the application.

Figure 6 is a structural diagram of a real-time multi-dimensional analysis system for user behavior of the application.

Figure 7 is a schematic diagram of the program for real-time multi-dimensional analysis of user behavior in this application.

Embodiments of the present invention

It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

This application provides a method for real-time multi-dimensional analysis of user behavior, which is applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of this method.

In this embodiment, the electronic device 1 may be a terminal device with a computing function such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.

The electronic device 1 includes a processor 12, a memory 11, a camera device 13, a network interface 14, and a communication bus 15.

The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.

In this embodiment, the readable storage medium of the memory 11 is generally used to store the user behavior real-time multi-dimensional analysis program 10 and the like installed in the electronic device 1. The memory 11 can also be used to temporarily store data that has been output or will be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 11, for example, to execute the user behavior real-time multi-dimensional analysis program 10.

The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.

The communication bus 15 is used to realize the connection and communication between these components.

FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.

Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.

Optionally, the electronic device 1 may also include a display, which may also be referred to as a display screen or a display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, etc. The display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

Optionally, the electronic device 1 further includes a touch sensor. The area provided by the touch sensor for the user to perform a touch operation is called a touch area. In addition, the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. Moreover, the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like. In addition, the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.

In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen. The device detects the touch operation triggered by the user based on the touch screen.

Optionally, the electronic device 1 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.

In the device embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may include an operating system and the user behavior real-time multi-dimensional analysis program 10; when the processor 12 executes the user behavior real-time multi-dimensional analysis program 10 stored in the memory 11, the following steps are implemented.

User behavior data is extracted from the distributed message queue through a real-time processing program and processed to obtain a corresponding data set; the data set is sent to the data warehouse and the user behavior pre-analysis engine, the user data set is stored through the data warehouse, and the user data is pre-calculated through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; the user data set and the corresponding pre-calculation result are saved in the distributed file system; when the user behavior is analyzed, a query request is sent to the query engine, and the query engine determines, according to the type and content of the request, whether the user data set has been pre-calculated; depending on whether the user data set has been pre-calculated, the corresponding analysis engine performs the analysis to obtain the query result, which is then fed back.

Preferably, the request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis; the pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.

Preferably, the step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set includes the following steps.

Log data corresponding to the user behavior data is obtained from the talking data topic of the distributed message queue; data parsing, key information extraction, and information classification are performed on the log data in sequence to obtain five data sets: user equipment information, user event information, user session information, user activity information, and new user information.

Preferably, the step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set further includes the following steps.

User attribute data in the log data is obtained from the device topic of the distributed message queue; data parsing and key information extraction are performed on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.

Preferably, the step of analyzing through the corresponding analysis engine, depending on whether the user data set has been pre-calculated, to obtain the query result and feed it back includes the following steps.

If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid, analyzes it to obtain the query result, and returns that result; if the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which queries the data warehouse through Hive LLAP, analyzes the data to obtain the query result, and returns that result.

In other embodiments, the user behavior real-time multi-dimensional analysis program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to implement this application. A module, as referred to in this application, is a series of computer program instruction segments that can complete a specific function.

FIG. 2 is a program module diagram of a preferred embodiment of the user behavior real-time multi-dimensional analysis program 10 in FIG. 1. The above-mentioned user behavior real-time multi-dimensional analysis program constitutes a device for real-time multi-dimensional analysis of user behavior. The user behavior real-time multi-dimensional analysis program 10 can be divided into: a data set acquisition unit 101, configured to extract user behavior data from the distributed message queue through a real-time processing program and process it to obtain a corresponding data set; a pre-calculation result obtaining unit 102, configured to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; a pre-calculation result storage unit 103, configured to save the user data set and the corresponding pre-calculation result in the distributed file system; a judgment unit 104, configured to send a query request to the query engine when the user behavior is analyzed, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and a result feedback unit 105, configured to analyze through the corresponding analysis engine, depending on whether the user data set has been pre-calculated, to obtain the query result and feed it back.

In addition, this application also provides a real-time multi-dimensional analysis method of user behavior. Referring to FIG. 3, it is a flowchart of a preferred embodiment of a method for real-time multi-dimensional analysis of user behavior in this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.

In this embodiment, the method for real-time multi-dimensional analysis of user behavior includes the following steps.

S110: Extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set.

S120: Send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result.

S130: Save the user data set and the corresponding pre-calculation result in the distributed file system.

The pre-calculation process mainly computes, in real time and along conventional or frequently used dimensions, the user-related indicators, such as the number of active users, the number of new users, and the number of reduced users, and saves the pre-calculation results. When a user queries, the pre-calculated result can be returned directly instead of computing the indicator from the raw data on demand.
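
The pre-calculation described above can be pictured as a rollup: one indicator is computed once per combination of dimension values and stored, so that a later query becomes a lookup. The following is a minimal Python sketch under stated assumptions, not the application's implementation (which delegates this work to Druid). The event fields `day`, `device_id`, `app_version`, and `os`, and the function name `precompute_active_users`, are hypothetical.

```python
from collections import defaultdict

# Dimensions to pre-aggregate on: a hypothetical subset of those listed in the application.
DIMENSIONS = ("app_version", "os")

def precompute_active_users(events):
    """Return {(day, app_version, os): number of distinct active devices}."""
    seen = defaultdict(set)
    for event in events:
        key = (event["day"],) + tuple(event[d] for d in DIMENSIONS)
        seen[key].add(event["device_id"])  # a set removes repeat visits by one device
    return {key: len(device_ids) for key, device_ids in seen.items()}

# Illustrative, made-up events.
events = [
    {"day": "2020-02-18", "device_id": "d1", "app_version": "1.0", "os": "Android"},
    {"day": "2020-02-18", "device_id": "d1", "app_version": "1.0", "os": "Android"},
    {"day": "2020-02-18", "device_id": "d2", "app_version": "1.0", "os": "Android"},
    {"day": "2020-02-18", "device_id": "d3", "app_version": "1.1", "os": "iOS"},
]
active_users = precompute_active_users(events)
```

A query for active users on a given day, App version, and operating system then reads `active_users[(day, app_version, os)]` without touching the raw events.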

S140: When analyzing the user behavior, send a query request to the query engine, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request.

Among them, request types include overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, user path analysis, etc. The pre-calculated dimensions include App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, language, etc.

S150: Based on the query result of whether the user data set has been pre-calculated, analyze and obtain the query result through the corresponding analysis engine and feed it back.

Analyzing through the corresponding analysis engine, depending on whether the user data set has been pre-calculated, to obtain the query result and feed it back covers two situations. If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid (a real-time multi-dimensional analysis system), analyzes it to obtain the query result, and returns that result. If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which uses Hive LLAP (Live Long and Process, a long-running structured query engine) to query the data warehouse, analyzes the data to obtain the query result, and returns that result.
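
The two situations amount to a routing decision inside the query engine. A minimal sketch in Python follows, assuming a request is a dictionary with a `type` and a list of `dimensions`. The application does not state which request types are pre-calculated, so the contents of `PRECOMPUTED_REQUEST_TYPES` are an assumption, as are all names used here.

```python
# Assumption: which request types have pre-calculated results is not stated in the application.
PRECOMPUTED_REQUEST_TYPES = {"overall_indicators", "event_analysis"}

# The pre-calculated dimensions listed in the application, as hypothetical identifiers.
PRECOMPUTED_DIMENSIONS = {
    "app_version", "os", "os_version", "channel", "country", "province", "city",
    "network", "operator", "device_brand", "device_model", "screen_size", "language",
}

def is_precomputed(request):
    """True when both the request type and every requested dimension are pre-calculated."""
    return (
        request["type"] in PRECOMPUTED_REQUEST_TYPES
        and set(request.get("dimensions", [])) <= PRECOMPUTED_DIMENSIONS
    )

def route(request):
    """Pick the engine: the pre-analysis engine (Druid) or the analysis engine (Hive LLAP)."""
    return "pre_analysis_engine" if is_precomputed(request) else "analysis_engine"
```

For example, `route({"type": "event_analysis", "dimensions": ["os", "city"]})` selects the pre-analysis engine, while a request that groups by a dimension outside the pre-calculated set falls back to the analysis engine.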

Further, extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set covers two cases. In the first case, log data corresponding to the user behavior data is obtained from the talking data topic of the distributed message queue; data parsing, key information extraction, and information classification are performed on the log data in sequence to obtain five data sets: user equipment information, user event information, user session information, user activity information, and new user information.

Specifically, the user event information mainly records which button an app user clicks in the app and at what time; for example, user A uses the Taobao app to add a piece of clothing to the shopping cart at 18:00. The activity information records which page an app user clicks on in the app and at what time. The session information records the events and activities that a user triggers within one app session. The new user information records when an app gains a new user, together with that user's basic information.
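
The parse, extract, and classify sequence can be sketched as follows. The application does not disclose the log format, so this Python sketch assumes one JSON object per log line with a hypothetical `kind` field naming the target data set; the field name and the data set keys are illustrative only.

```python
import json

# The five data sets named in the application, as hypothetical keys.
DATASETS = ("device", "event", "session", "activity", "new_user")

def classify(log_lines):
    """Parse each log line and append the record to the data set it belongs to."""
    datasets = {name: [] for name in DATASETS}
    for line in log_lines:
        try:
            record = json.loads(line)       # data parsing
        except json.JSONDecodeError:
            continue                        # skip malformed lines
        if not isinstance(record, dict):
            continue
        kind = record.get("kind")           # key information extraction
        if kind in datasets:
            datasets[kind].append(record)   # information classification
    return datasets

# Illustrative, made-up log lines.
sample = [
    '{"kind": "event", "user": "A", "action": "add_to_cart", "time": "18:00"}',
    '{"kind": "session", "user": "A", "events": 3}',
    'not json',
    '{"kind": "unknown"}',
]
datasets = classify(sample)
```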

Furthermore, the real-time batch processing program turtle sends the five data sets to the data warehouse for storage through Spark Streaming (a general-purpose big data stream processing system). The real-time processing program flyfish uses Kafka Streams (the stream processing engine of the distributed message queue) to send the five data sets to the user behavior pre-analysis engine for pre-calculation and to obtain the corresponding pre-calculation results.

In the second case, the user attribute data in the log data is obtained from the device topic of the distributed message queue; data parsing and key information extraction are performed on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.

Further, the data set corresponding to the user attribute data is sent to the distributed real-time database. This is done by the real-time batch processing program boxfish, which is implemented using Spark Streaming. The user attribute data in the distributed real-time database is written into the data warehouse at a certain time interval, which can be done by a purpose-written Spark program.
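
The flow from the real-time database to the data warehouse can be illustrated with an in-memory stand-in. This Python sketch is an assumption-laden illustration, not the Spark program the application refers to: `AttributeStore`, `upsert`, and `flush` are hypothetical names, and copying only the rows changed since the last flush is an assumed policy that the application does not specify.

```python
class AttributeStore:
    """In-memory stand-in for the distributed real-time database, keyed by device ID."""

    def __init__(self):
        self._rows = {}
        self._dirty = set()

    def upsert(self, device_id, attributes):
        """Merge new attribute values into the row of one device."""
        self._rows.setdefault(device_id, {}).update(attributes)
        self._dirty.add(device_id)

    def flush(self, warehouse):
        """Append rows changed since the last flush to the warehouse; return the row count."""
        count = len(self._dirty)
        for device_id in sorted(self._dirty):
            warehouse.append({"device_id": device_id, **self._rows[device_id]})
        self._dirty.clear()
        return count

store = AttributeStore()
warehouse = []  # stand-in for a data warehouse table
store.upsert("d1", {"brand": "X", "language": "zh"})
store.upsert("d1", {"language": "en"})  # a later value overwrites the earlier one
store.upsert("d2", {"brand": "Y"})
first_flush = store.flush(warehouse)
second_flush = store.flush(warehouse)   # nothing changed in between
```

In a real deployment `flush` would be triggered by a scheduler at the chosen time interval.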

Spark Streaming is an extension of Spark's core API that enables high-throughput, fault-tolerant processing of real-time data streams. It supports ingesting data from a variety of sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets. After data is obtained from a source, it can be processed with complex algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processing results can be pushed to file systems, databases, and live dashboards.

In addition, a queue of computing resources is independently allocated on the resource manager of the user behavior analysis engine, and storage resources and computing resources are independently allocated on the resource manager of the user behavior pre-analysis engine.

Specifically, the user behavior analysis engine parses the ID corresponding to the user behavior from the query request and selects the corresponding resource queue according to the ID to run the analysis function. The user behavior pre-analysis engine likewise parses the ID corresponding to the user behavior from the query request, stores the analysis data on the storage resource corresponding to that ID, and selects the corresponding resource according to the ID to run the analysis function.

As a specific example, in the real-time multi-dimensional analysis method of user behavior of the present application, a large amount of user behavior data is transmitted from the web server to the talking data topic and the device topic of the distributed message queue through a data transmission tool. Among them, the data transmission tool can be Flume, and the distributed message queue can be Kafka, as shown in Figure 4.

In Figure 4, the data warehouse is Hive, the distributed real-time database is HBase, and the user behavior pre-analysis engine uses Druid for pre-calculation. The data in the data warehouse, the data in the distributed real-time database, and the pre-calculation results are all ultimately stored in the distributed file system HDFS. When a platform user performs multi-dimensional analysis, a query request is sent to the query engine. The query engine determines, according to the type and content of the request, whether pre-calculation has already been performed. If it has, the request is forwarded to the user behavior pre-analysis engine, which uses Druid to query the pre-calculated analysis results, applies simple processing, and returns the final result to the user. Otherwise, the request is forwarded to the user behavior analysis engine, which uses Hive LLAP to query the user behavior data, analyzes it, and returns the analysis result to the user.
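
To make the two query paths concrete, the sketch below builds one request for each engine in Python. The first function returns a Druid native `groupBy` query as a dictionary, using the standard fields of Druid's native JSON query format; the second returns a SQL string of the kind that could be submitted to Hive LLAP. The datasource, table, metric, and column names are hypothetical, and neither query is taken from the application.

```python
import json

def druid_groupby(datasource, dimensions, metric, interval):
    """Build a Druid native groupBy query that sums an already rolled-up metric."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "dimensions": list(dimensions),
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
    }

def hive_sql(table, dimensions, start, end):
    """Build a SQL statement that counts distinct devices from raw data.

    Identifiers are interpolated for brevity only; production code should validate
    them and pass the dates as bound parameters.
    """
    columns = ", ".join(dimensions)
    return (
        f"SELECT {columns}, COUNT(DISTINCT device_id) AS users "
        f"FROM {table} WHERE day >= '{start}' AND day < '{end}' "
        f"GROUP BY {columns}"
    )

fast_path = druid_groupby("user_events", ["os", "city"], "active_users",
                          "2020-02-18/2020-02-19")
slow_path = hive_sql("user_events_raw", ["os", "city"], "2020-02-18", "2020-02-19")
payload = json.dumps(fast_path)  # the JSON body that would be posted to Druid
```

The contrast shows why the pre-calculated path is faster: the Druid query sums a metric that was aggregated at ingestion time, whereas the SQL statement must scan and deduplicate raw rows.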

In order to achieve the above functions, a product ID is assigned to each application that accesses data. According to the product ID, the resource manager of the user behavior analysis engine (YARN) independently allocates a computing resource queue, and the resource manager of the user behavior pre-analysis engine independently allocates storage resources and computing resources, so that different products do not affect each other and resource isolation is strengthened. The user behavior analysis engine parses the product ID out of the query request and selects the resource queue in which to run the analysis function according to that product ID. The user behavior pre-analysis engine stores the analysis data in the storage resource dedicated to the product ID, so that storage resources are completely isolated between products, and it likewise selects the resources on which to run the analysis function according to the product ID.
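
Per-product isolation boils down to deriving every resource name from the product ID. The Python sketch below illustrates the idea; the naming scheme (`root.product_<id>`, `storage_<id>`, `compute_<id>`) and the names `ResourcePlan` and `plan_for` are assumptions made for illustration, since the application does not describe how queues or Druid resources are named.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourcePlan:
    yarn_queue: str      # computing resource queue used by the analysis engine
    druid_storage: str   # storage resource used by the pre-analysis engine
    druid_compute: str   # computing resource used by the pre-analysis engine

def plan_for(product_id):
    """Derive the dedicated resources of one product from its ID (hypothetical naming)."""
    if not product_id.isalnum():
        raise ValueError(f"invalid product ID: {product_id!r}")
    return ResourcePlan(
        yarn_queue=f"root.product_{product_id}",
        druid_storage=f"storage_{product_id}",
        druid_compute=f"compute_{product_id}",
    )

def plan_for_request(request):
    """Parse the product ID out of a query request and look up its resources."""
    return plan_for(request["product_id"])
```

Because no two product IDs map to the same names, a heavy query from one product can only exhaust its own queue and storage.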

Corresponding to the above-mentioned real-time multi-dimensional analysis method of user behavior, this application also provides a real-time multi-dimensional analysis system of user behavior. The system structure block diagram is shown in FIG. 5.

In Figure 5, the connection interface includes general-purpose interfaces such as JDBC (Java Database Connectivity), Thrift (an interface description language and binary communication protocol), and REST (representational state transfer) for connecting the system with outer applications. The data transmission tool Flume obtains massive user behavior data from the log server and sends it to the distributed message queue Kafka. Kafka is in turn connected to the data warehouse Hive and the distributed real-time database HBase through Spark Streaming. During user behavior analysis, the data in HBase is written into the data warehouse at a certain time interval, and the data warehouse is connected to the general-purpose interface to return query results to the outer application. In addition, Kafka and Druid are connected through the real-time processing program flyfish, Druid is also connected to the general-purpose interface, and the final user behavior analysis results are output from Druid or the data warehouse through that interface.

Further, FIG. 6 shows a schematic structure of a real-time multi-dimensional analysis system of user behavior.

It can be seen from Figure 6 that the real-time multi-dimensional analysis system of user behavior is divided into storage layer, resource management layer, and computing layer from bottom to top.

The storage layer includes: HDFS, Hive, HBase, Kafka, Druid.

The resource management layer includes: YARN and Druid.

The computing layer includes: Spark, Hive LLAP, Spark Streaming, Kafka Streams, Druid.

Others include ZooKeeper (reliable coordination system) and Flume.

The original data of user behavior is stored in Hive for detailed query. Druid provides real-time multi-dimensional analysis across three layers.

As shown in Figure 7, the system further includes three real-time stream processing programs, two batch processing programs and a tool program.

Among them, the real-time processing program flyfish is implemented using Kafka Streams. It parses log data from Kafka's talking data topic in real time, extracts event, session, chunk, activity, and newdevice information, and stores that information in the related Kafka topics, from which it is ingested by the Druid Index Service.

In addition, the real-time batch processing program turtle is implemented using Spark Streaming. It parses log data from Kafka's talking data topic in real time, extracts event, session, chunk, activity, and newdevice information, and stores that information in the related Hive tables.
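The parse-and-classify step shared by flyfish and turtle can be sketched as follows. The sketch assumes, purely for illustration, that each log line is a JSON object carrying a `type` field; the actual log format and the field names are not specified in this application, and the sketch does not use Kafka Streams or Spark Streaming.

```python
import json

# The five kinds of information named in the text.
KINDS = ("event", "session", "chunk", "activity", "newdevice")


def classify(log_lines):
    """Group parsed log records by kind; malformed or unknown lines are skipped."""
    datasets = {kind: [] for kind in KINDS}
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON: skip the line
        if not isinstance(record, dict):
            continue  # valid JSON but not an object: skip the line
        kind = record.get("type")  # "type" is an assumed field name
        if kind in datasets:
            datasets[kind].append(record)
    return datasets
```

In a real deployment each of the five lists would instead be written to its own Kafka topic or Hive table, as the text describes.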

In addition, the real-time batch processing program boxfish is implemented using Spark Streaming. It parses device information data from Kafka's device topic in real time and stores it in the device information table of HBase.

In addition, the batch processing program sardine belongs to the user behavior pre-analysis engine and is responsible for counting the cumulative number of devices.
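This application does not describe how sardine performs the count. A minimal sketch, assuming that "cumulative number of devices" means the number of distinct device IDs seen up to and including each batch, could look like this; the function name and the input shape are hypothetical.

```python
def cumulative_device_counts(daily_device_ids):
    """Return (day, distinct devices seen so far) for each batch, in order.

    daily_device_ids: iterable of (day, iterable of device IDs) pairs,
    assumed to be in chronological order.
    """
    seen = set()
    counts = []
    for day, device_ids in daily_device_ids:
        seen.update(device_ids)  # a device seen on several days counts once
        counts.append((day, len(seen)))
    return counts
```

For example, with devices `a` and `b` on the first day and `b` and `c` on the second, the cumulative counts are 2 and then 3, because `b` is counted only once.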

In addition, Druid's tool program nemo is responsible for merging, taking offline, and deleting Druid's segment files in batches, and is intended for use by Druid maintenance personnel.

The real-time multi-dimensional analysis method of user behavior of this application pre-calculates analysis functions that users use frequently and retains the calculation results. When a user invokes such an analysis function, the retained results are returned directly. This pre-calculation improves query response speed and system throughput, and thereby the user experience. At the same time, computing resources and storage resources are evaluated and allocated independently for each accessing application according to its data volume, so that resources are managed at the finer granularity of the application, which can improve resource utilization. In addition, since the resources used by each accessing application are independent, the accessing applications do not affect each other, which enhances resource isolation.
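The routing decision behind the pre-calculation can be sketched as follows. The dimension names mirror the pre-calculated dimensions listed in the claims; which request types are pre-calculated, the identifier spellings, and the rule that every requested dimension must be pre-calculated are assumptions made for illustration only.

```python
# Pre-calculated dimensions, as listed in the claims (identifier
# spellings are illustrative).
PRECOMPUTED_DIMENSIONS = {
    "app_version", "operating_system", "operating_system_version",
    "channel", "country", "province", "city", "network", "operator",
    "device_brand", "device_model", "screen_size", "language",
}

# Hypothetical: the request types assumed here to have pre-calculated results.
PRECOMPUTED_TYPES = {"overall_indicators", "event_analysis"}


def route(request_type, dimensions):
    """Pick an engine from the type and content of a query request."""
    if (request_type in PRECOMPUTED_TYPES
            and set(dimensions) <= PRECOMPUTED_DIMENSIONS):
        return "pre_analysis_engine"  # answered from pre-calculated results
    return "analysis_engine"          # computed on demand from the warehouse
```

Under these assumptions, a request for overall indicators by country and city is served from the pre-calculated results, while a request involving any dimension outside the list falls back to on-demand analysis.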

In addition, the embodiment of the present application also proposes a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium includes a user behavior real-time multi-dimensional analysis program, and the following operations are implemented when the user behavior real-time multi-dimensional analysis program is executed by a processor.

Preferably, the user behavior data in the distributed message queue is extracted and processed through a real-time processing program, and the step of obtaining the corresponding data set includes the following operations.

Preferably, the user behavior data in the distributed message queue is extracted and processed through a real-time processing program, and the step of obtaining the corresponding data set further includes the following operations.

Preferably, the step of obtaining and feeding back the query result through analysis by the corresponding analysis engine, based on whether the user data set has been pre-calculated, includes the following operations.

The specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned real-time multi-dimensional analysis method of user behavior and the electronic device, and will not be repeated here.

The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Any changes or substitutions that a person skilled in the art can easily think of within the technical scope disclosed in this application should be covered by the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A real-time multi-dimensional analysis method of user behavior, applied to an electronic device, wherein the method includes:

Extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;

Send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, pre-calculate the user data through the user behavior pre-analysis engine, and obtain the corresponding pre-calculation result;

Save the user data set and the corresponding pre-calculation result in the distributed file system;

When analyzing the user behavior, send a query request to a query engine, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request;

Based on the determination of whether the user data set has been pre-calculated, obtain the query result through analysis by the corresponding analysis engine and feed it back.
The method for real-time multi-dimensional analysis of user behavior according to claim 1, wherein:

The request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;

The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
The method for real-time multi-dimensional analysis of user behavior according to claim 1, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining a corresponding data set includes:

Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue;

The log data is sequentially subjected to data analysis, key information extraction, and information classification processing to obtain five data sets corresponding to user equipment information, user event information, user session information, user activity information, and new user information.
The method for real-time multi-dimensional analysis of user behavior according to claim 3, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining the corresponding data set further includes:

Obtain user attribute data in the log data from the device topic of the distributed message queue;

Data analysis and key information extraction processing are sequentially performed on the user attribute data to obtain a data set corresponding to the user attribute data.
The method for real-time multi-dimensional analysis of user behavior according to claim 1, wherein:

The step of obtaining and feeding back the query result through analysis by the corresponding analysis engine, based on the determination of whether the user data set has been pre-calculated, includes:

If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, and the user behavior pre-analysis engine queries the pre-calculation result in the distributed file system through Druid, analyzes it, and obtains and returns the query result;

If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, and the user behavior analysis engine accesses the data warehouse through a Hive LLAP query, analyzes the data, and obtains and returns the query result.
A real-time multi-dimensional analysis device for user behavior, which includes:

A data set acquisition unit, configured to extract and process user behavior data in the distributed message queue through a real-time processing program to acquire a corresponding data set;

The pre-calculation result acquisition unit is used to send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, pre-calculate the user data through the user behavior pre-analysis engine, and obtain the corresponding pre-calculation result;

A pre-calculation result storage unit, configured to save the user data set and the corresponding pre-calculation result in the distributed file system;

The judging unit is configured to send a query request to the query engine when analyzing the user behavior, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request;

The result feedback unit is configured to obtain the query result through analysis by the corresponding analysis engine, based on the determination of whether the user data set has been pre-calculated, and to feed it back.
The device for real-time multi-dimensional analysis of user behavior according to claim 6, wherein:

The data set obtaining unit is specifically used for:

Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue;

The log data is sequentially subjected to data analysis, key information extraction, and information classification processing to obtain five data sets corresponding to user equipment information, user event information, user session information, user activity information, and new user information.
The device for real-time multi-dimensional analysis of user behavior according to claim 7, wherein:

The data set obtaining unit is specifically further used for:

Obtain user attribute data in the log data from the device topic of the distributed message queue;

Data analysis and key information extraction processing are sequentially performed on the user attribute data to obtain a data set corresponding to the user attribute data.
The device for real-time multi-dimensional analysis of user behavior according to claim 6, wherein:

The result feedback unit is specifically used for:

If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, and the user behavior pre-analysis engine queries the pre-calculation result in the distributed file system through Druid, analyzes it, and obtains and returns the query result;

If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, and the user behavior analysis engine accesses the data warehouse through a Hive LLAP query, analyzes the data, and obtains and returns the query result.
The device for real-time multi-dimensional analysis of user behavior according to claim 6, wherein the request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;

The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
An electronic device, wherein the electronic device includes a memory and a processor, the memory and the processor are connected to each other, the memory is used to store a computer program configured to be executed by the processor, and the computer program is configured to execute a real-time multi-dimensional analysis method of user behavior:

Wherein, the method includes:

Extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;

Send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, pre-calculate the user data through the user behavior pre-analysis engine, and obtain the corresponding pre-calculation result;

Save the user data set and the corresponding pre-calculation result in the distributed file system;

When analyzing the user behavior, send a query request to a query engine, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request;

Based on the determination of whether the user data set has been pre-calculated, obtain the query result through analysis by the corresponding analysis engine and feed it back.
The electronic device according to claim 11, wherein:

The request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;

The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
The electronic device according to claim 11, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining a corresponding data set includes:

Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue;

The log data is sequentially subjected to data analysis, key information extraction, and information classification processing to obtain five data sets corresponding to user equipment information, user event information, user session information, user activity information, and new user information.
The electronic device according to claim 13, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining the corresponding data set further includes:

Obtain user attribute data in the log data from the device topic of the distributed message queue;

Data analysis and key information extraction processing are sequentially performed on the user attribute data to obtain a data set corresponding to the user attribute data.
The electronic device according to claim 11, wherein:

The step of obtaining and feeding back the query result through analysis by the corresponding analysis engine, based on the determination of whether the user data set has been pre-calculated, includes:

If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, and the user behavior pre-analysis engine queries the pre-calculation result in the distributed file system through Druid, analyzes it, and obtains and returns the query result;

If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, and the user behavior analysis engine accesses the data warehouse through a Hive LLAP query, analyzes the data, and obtains and returns the query result.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it is used to implement a real-time multi-dimensional analysis method of user behavior, the method comprising the following steps:

Extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;

Send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, pre-calculate the user data through the user behavior pre-analysis engine, and obtain the corresponding pre-calculation result;

Save the user data set and the corresponding pre-calculation result in the distributed file system;

When analyzing the user behavior, send a query request to a query engine, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request;

Based on the determination of whether the user data set has been pre-calculated, obtain the query result through analysis by the corresponding analysis engine and feed it back.
The computer-readable storage medium according to claim 16, wherein:

The request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;

The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
The computer-readable storage medium according to claim 16, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining a corresponding data set includes:

Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue;

The log data is sequentially subjected to data analysis, key information extraction, and information classification processing to obtain five data sets corresponding to user equipment information, user event information, user session information, user activity information, and new user information.
The computer-readable storage medium according to claim 18, wherein:

The step of extracting and processing user behavior data in the distributed message queue through a real-time processing program, and obtaining the corresponding data set further includes:

Obtain user attribute data in the log data from the device topic of the distributed message queue;

Data analysis and key information extraction processing are sequentially performed on the user attribute data to obtain a data set corresponding to the user attribute data.
The computer-readable storage medium according to claim 16, wherein:

The step of obtaining and feeding back the query result through analysis by the corresponding analysis engine, based on the determination of whether the user data set has been pre-calculated, includes:

If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, and the user behavior pre-analysis engine queries the pre-calculation result in the distributed file system through Druid, analyzes it, and obtains and returns the query result;

If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, and the user behavior analysis engine accesses the data warehouse through a Hive LLAP query, analyzes the data, and obtains and returns the query result.