WO2021164253A1 - Method and device for real-time multidimensional analysis of user behaviors, and storage medium - Google Patents

Publication number
WO2021164253A1
WO2021164253A1 PCT/CN2020/117423 CN2020117423W WO2021164253A1 WO 2021164253 A1 WO2021164253 A1 WO 2021164253A1 CN 2020117423 W CN2020117423 W CN 2020117423W WO 2021164253 A1 WO2021164253 A1 WO 2021164253A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
data
analysis
user behavior
data set
Application number
PCT/CN2020/117423
Other languages
French (fr)
Chinese (zh)
Inventor
禹蕾
张观成
万书武
李均
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021164253A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device and computer-readable storage medium for real-time multi-dimensional analysis of user behavior.
  • Shence Data offers a similar big data platform for real-time multi-dimensional analysis of user behavior.
  • The Shence Data analysis platform is mainly composed of Impala, Kudu, HDFS, and YARN.
  • The inventor realized that this platform has the following drawback: every analysis must be performed on a large amount of raw data, so some analysis functions that users call frequently are very slow and the user experience is poor.
  • This application provides a method, electronic device, and computer-readable storage medium for real-time multi-dimensional analysis of user behavior. Its main purpose is to pre-calculate the analysis functions that users call frequently, which can improve query response speed, system throughput, and user experience.
  • In addition, one or more accessing applications can be served at the same time, and each application can be allocated resources independently on Druid and YARN, giving high resource utilization and strong isolation.
  • The present application provides a method for real-time multi-dimensional analysis of user behavior, applied to an electronic device. The method includes: extracting and processing user behavior data in a distributed message queue through a real-time processing program to obtain a corresponding data set; sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in a distributed file system; when the user behavior is analyzed, sending a query request to a query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; and, based on whether the user data set has been pre-calculated, obtaining the query result through the corresponding analysis engine and feeding it back.
  • This application also provides a device for real-time multi-dimensional analysis of user behavior, which includes the following units.
  • The data set acquisition unit is used to extract and process user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set.
  • The pre-calculation result acquisition unit is used to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result.
  • The pre-calculation result storage unit is used to save the user data set and the corresponding pre-calculation result in the distributed file system.
  • The judgment unit is used to send a query request to the query engine when the user behavior is analyzed; the query engine determines from the type and content of the request whether the user data set has been pre-calculated.
  • The result feedback unit is used to obtain the query result through the corresponding analysis engine, based on whether the user data set has been pre-calculated, and feed it back.
  • The present application also provides an electronic device, comprising: a processor; and a memory for storing program instructions of the processor; wherein the processor is configured to execute the program instructions to perform the method of any one of the foregoing.
  • The method includes: extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set; sending the data set to the data warehouse and the user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in the distributed file system; when the user behavior is analyzed, sending a query request to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; and, based on whether the user data set has been pre-calculated, obtaining the query result through the corresponding analysis engine and feeding it back.
  • The present application also provides a computer-readable storage medium that includes a user behavior real-time multi-dimensional analysis program. When the program is executed by a processor, the steps of the method described above are implemented.
  • The user data set is stored through the data warehouse, the user data is pre-calculated through the user behavior pre-analysis engine, and the corresponding pre-calculation result is obtained.
  • The user data set and the corresponding pre-calculation result are saved in the distributed file system; when the user behavior is analyzed, a query request is sent to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; based on whether the user data set has been pre-calculated, the query result is obtained through the corresponding analysis engine and fed back.
  • The method, electronic device, and computer-readable storage medium for real-time multi-dimensional analysis of user behavior proposed in this application pre-calculate some analysis functions frequently used by users and retain the calculation results.
  • When a user calls such an analysis function, the already calculated result is returned to the user directly, so that pre-calculation improves query response speed, system throughput, and user experience.
  • the computing resources and storage resources are independently evaluated and allocated for each access application, and the resources are managed in a more fine-grained manner based on the application, which can improve resource utilization.
  • the resources used by each access application are independent, the access applications do not affect each other, and the isolation of resources can also be enhanced.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a method for real-time multi-dimensional analysis of user behavior in this application.
  • FIG. 2 is a schematic diagram of modules of a preferred embodiment of a real-time multi-dimensional analysis program of user behavior in FIG. 1.
  • FIG. 3 is a flowchart of a preferred embodiment of a method for real-time multi-dimensional analysis of user behavior in this application.
  • Figure 4 is a schematic block diagram of the method for real-time multi-dimensional analysis of user behavior in this application.
  • Figure 5 is a block diagram of a real-time multi-dimensional analysis system for user behavior of the application.
  • Figure 6 is a structural diagram of a real-time multi-dimensional analysis system for user behavior of the application.
  • Figure 7 is a schematic diagram of the program for real-time multi-dimensional analysis of user behavior in this application.
  • This application provides a real-time multi-dimensional analysis method of user behavior, which is applied to an electronic device 1.
  • Referring to FIG. 1, a schematic diagram of the application environment of the preferred embodiment of the method for real-time multi-dimensional analysis of user behavior of this application is shown.
  • the electronic device 1 may be a terminal device with a computing function such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 1 includes a processor 12, a memory 11, a camera device 13, a network interface 14, and a communication bus 15.
  • the memory 11 includes at least one type of readable storage medium.
  • The at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, or card-type memory.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • The readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the user behavior real-time multi-dimensional analysis program 10 and the like installed in the electronic device 1.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the user behavior real-time multi-dimensional analysis program 10.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the communication bus 15 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface.
  • The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 1 may also include a display, and the display may also be referred to as a display screen or a display unit.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like.
  • the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the electronic device 1 further includes a touch sensor.
  • the area provided by the touch sensor for the user to perform a touch operation is called a touch area.
  • the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor.
  • the display and the touch sensor are stacked to form a touch display screen. The device detects the touch operation triggered by the user based on the touch screen.
  • the electronic device 1 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • The memory 11, as a computer storage medium, may include an operating system and the user behavior real-time multi-dimensional analysis program 10. When the processor 12 executes the user behavior real-time multi-dimensional analysis program 10 stored in the memory 11, the following steps are implemented.
  • The user behavior data in the distributed message queue is extracted and processed through a real-time processing program to obtain the corresponding data set; the data set is sent to the data warehouse and the user behavior pre-analysis engine; the user data set is stored through the data warehouse, and the user data is pre-calculated through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result; the user data set and the corresponding pre-calculation result are saved in the distributed file system; when the user behavior is analyzed, a query request is sent to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; based on whether the user data set has been pre-calculated, the query result is obtained through the corresponding analysis engine and fed back.
  • the request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis; the pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, language.
  • The step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set includes the following steps.
  • Obtain the log data corresponding to the user behavior data from the talking data topic of the distributed message queue; perform data parsing, key information extraction, and information classification on the log data in sequence to obtain five data sets: user device information, user event information, user session information, user activity information, and new user information.
  • The step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set further includes the following steps.
  • The step of obtaining the query result through the corresponding analysis engine, based on whether the user data set has been pre-calculated, and feeding it back includes the following steps.
  • If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid, derives the query result from the pre-calculation result, and returns it; if the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which queries the data warehouse through Hive LLAP, analyzes the data to obtain the query result, and returns it.
  • the user behavior real-time multi-dimensional analysis program 10 may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by the processor 12 to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • Referring to FIG. 2, a program module diagram of a preferred embodiment of the user behavior real-time multi-dimensional analysis program 10 in FIG. 1 is shown.
  • the above-mentioned real-time multi-dimensional analysis program of user behavior is a real-time multi-dimensional analysis device of user behavior.
  • The user behavior real-time multi-dimensional analysis program 10 can be divided into the following units.
  • The data set acquisition unit 101 is configured to extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set.
  • The pre-calculation result acquisition unit 102 is configured to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result.
  • The pre-calculation result storage unit 103 is configured to save the user data set and the corresponding pre-calculation result in the distributed file system.
  • The judgment unit 104 is configured to send a query request to the query engine when the user behavior is analyzed; the query engine determines from the type and content of the request whether the user data set has been pre-calculated.
  • The result feedback unit 105 is configured to obtain the query result through the corresponding analysis engine, based on whether the user data set has been pre-calculated, and feed it back.
  • this application also provides a real-time multi-dimensional analysis method of user behavior.
  • Referring to FIG. 3, a flowchart of a preferred embodiment of the method for real-time multi-dimensional analysis of user behavior in this application is shown.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the method for real-time multi-dimensional analysis of user behavior includes the following steps.
  • S110 Extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set.
  • S120 Send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, pre-calculate the user data through the user behavior pre-analysis engine, and obtain the corresponding pre-calculation result.
  • The pre-calculation process mainly computes, in real time and along the conventional or frequently used dimensions, the indicators that users care about, such as the number of active users, the number of new users, and the number of reduced users, and saves the pre-calculation results.
  • When a customer queries such an indicator, the pre-calculated result can be returned directly instead of computing the indicator from the original data on demand.
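The pre-calculation idea can be illustrated with a small Python sketch. It is not the Druid-based implementation the application describes: the record fields (`user_id`, `os`, `is_new`) and the function names are assumptions chosen for illustration. The sketch aggregates distinct active and new users per dimension value once, so that later lookups read the stored result rather than rescanning raw events.

```python
from collections import defaultdict

# Hypothetical raw event records; the field names are assumptions.
EVENTS = [
    {"user_id": "u1", "os": "Android", "is_new": True},
    {"user_id": "u2", "os": "iOS", "is_new": False},
    {"user_id": "u1", "os": "Android", "is_new": True},
    {"user_id": "u3", "os": "Android", "is_new": False},
]


def precompute(events, dimensions):
    """Count distinct active and new users per combination of dimension values."""
    active = defaultdict(set)
    new = defaultdict(set)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        active[key].add(event["user_id"])
        if event["is_new"]:
            new[key].add(event["user_id"])
    return {
        key: {"active_users": len(users), "new_users": len(new[key])}
        for key, users in active.items()
    }


# Computed once, ahead of any query.
PRECOMPUTED = precompute(EVENTS, ("os",))


def lookup(os_name):
    """Answer a query from the stored result; None means not pre-calculated."""
    return PRECOMPUTED.get((os_name,))
```

A lookup such as `lookup("Android")` touches only the stored aggregate, which is the property that makes frequently used analyses fast.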
  • S140 When analyzing the user behavior, send a query request to the query engine, and the query engine determines whether the user data set has been pre-calculated according to the type and content of the request.
  • request types include overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, user path analysis, etc.
  • the pre-calculated dimensions include App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, language, etc.
  • Obtaining the query result through the corresponding analysis engine and feeding it back covers two situations. If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid (a real-time multi-dimensional analysis system), derives the query result from the pre-calculation result, and returns it. If the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which queries the data warehouse through Hive LLAP (Live Long and Process, Hive's long-lived query execution layer), analyzes the data to obtain the query result, and returns it.
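The two-way routing decision can be sketched in Python as follows. The application does not say which request types are pre-calculated, so the `PRECOMPUTED_TYPES` set below is an assumption, as are the class and function names; the dimension names mirror the list given in the description. The returned strings merely stand in for forwarding to the Druid-backed pre-analysis engine or the Hive LLAP-backed analysis engine.

```python
from dataclasses import dataclass

# Assumption: which request types have pre-calculated results is not stated
# in the source; these two are picked only for illustration.
PRECOMPUTED_TYPES = {"overall_indicators", "retention_analysis"}

# Dimensions listed in the description as pre-calculated.
PRECOMPUTED_DIMENSIONS = {
    "app_version", "os", "os_version", "channel", "country", "province",
    "city", "network", "operator", "device_brand", "device_model",
    "screen_size", "language",
}


@dataclass(frozen=True)
class QueryRequest:
    request_type: str
    dimensions: tuple


def is_precomputed(request):
    """Decide from the request's type and content whether results already exist."""
    return (
        request.request_type in PRECOMPUTED_TYPES
        and set(request.dimensions) <= PRECOMPUTED_DIMENSIONS
    )


def route(request):
    """Return the name of the engine that should serve the request."""
    if is_precomputed(request):
        return "pre_analysis_engine"  # would query Druid
    return "analysis_engine"  # would query the warehouse via Hive LLAP
```

Keeping the decision in one small predicate means the set of pre-calculated types and dimensions can grow without touching the forwarding logic.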
  • Extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set covers two cases.
  • In the first case, the log data corresponding to the user behavior data is obtained from the talking data topic of the distributed message queue; data parsing, key information extraction, and information classification are performed on the log data in sequence to obtain five data sets: user device information, user event information, user session information, user activity information, and new user information.
  • the user event information mainly records what button a certain app user clicks on the app at what time, for example, user A uses the Taobao app to add a piece of clothing to the shopping cart at 18:00.
  • The activity information records which page of an app a user opens and at what time.
  • the session information records the events and activities that a certain user triggers in an app session.
  • New user information records when an app adds a new user and its basic user information.
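The parse, extract, and classify sequence described above can be sketched in Python. The log format is not given in the application, so the JSON layout with `type` and `payload` fields, the data set names, and the function name are all assumptions for illustration; a real deployment would read from the Kafka topic rather than from a list.

```python
import json

# The five data sets named in the description; the keys are assumed labels.
DATASETS = ("device", "event", "session", "activity", "new_user")

# Hypothetical log lines; the JSON layout is an assumption.
LOG_LINES = [
    '{"type": "event", "payload": {"user": "A", "action": "add_to_cart", "time": "18:00"}}',
    '{"type": "session", "payload": {"user": "A", "session_id": "s1"}}',
    "not json",
    '{"type": "unknown", "payload": {}}',
]


def classify(log_lines):
    """Parse each line, keep its payload, and file it under its data set."""
    datasets = {name: [] for name in DATASETS}
    for line in log_lines:
        try:
            record = json.loads(line)  # parsing step
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(record, dict):
            continue
        kind = record.get("type")
        if kind in datasets:  # classification step
            datasets[kind].append(record.get("payload", {}))  # key information
    return datasets
```

Malformed lines and unknown record types are dropped here for brevity; a production pipeline would more likely route them to a separate topic for inspection.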
  • The real-time batch processing program turtle sends the five data sets to the data warehouse for storage through Spark Streaming (a general-purpose big data stream processing system).
  • The real-time processing program flyfish uses Kafka Streams (the stream processing engine of the distributed message queue) to send the five data sets to the user behavior pre-analysis engine for pre-calculation, obtaining the corresponding pre-calculation results.
  • In the second case, the user attribute data in the log data is obtained from the device topic of the distributed message queue; data parsing and key information extraction are performed on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.
  • The data set corresponding to the user attribute data is sent to the distributed real-time database.
  • This process is completed by the real-time batch processing program boxfish, which is implemented using Spark Streaming.
  • the user attribute data in the distributed real-time database will be written into the data warehouse at a certain time interval. This process can be realized by writing a Spark program.
  • Spark Streaming is an extension of Spark's core API that enables high-throughput, fault-tolerant processing of real-time data streams.
  • It supports ingesting data from a variety of sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets.
  • After data is ingested, complex algorithms can be expressed with high-level functions such as map, reduce, join, and window.
  • The processing results can be stored in file systems, databases, and live dashboards.
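To make the window operation concrete, here is a plain-Python sketch of a tumbling-window count. It does not use Spark Streaming's API; it only illustrates the kind of computation a window function performs on a stream of timestamped records. The function name and the `(timestamp, key)` record shape are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs. The result
    maps each window start time to a Counter of keys seen in that window.
    """
    counts = {}
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts.setdefault(window_start, Counter())[key] += 1
    return counts


# Example: four events grouped into 10-second windows.
SAMPLE = [(0, "click"), (5, "click"), (12, "view"), (19, "click")]
WINDOWED = tumbling_window_counts(SAMPLE, 10)
```

In a stream processor the same grouping is applied incrementally as records arrive, instead of over a finished list.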
  • a queue of computing resources is independently allocated on the resource manager of the user behavior analysis engine, and storage resources and computing resources are independently allocated on the resource manager of the user behavior pre-analysis engine.
  • The user behavior analysis engine parses the ID corresponding to the user behavior from the query request and selects the corresponding resource queue according to the ID to run the analysis function.
  • The user behavior pre-analysis engine parses the ID corresponding to the user behavior from the query request, stores the analysis data on the storage resource corresponding to that ID, and selects the corresponding resource according to the ID to run the analysis function.
  • a large amount of user behavior data is transmitted from the web server to the talking data topic and the device topic of the distributed message queue through a data transmission tool.
  • the data transmission tool can be Flume
  • the distributed message queue can be Kafka, as shown in Figure 4.
  • the data warehouse is Hive
  • the distributed real-time database is HBase
  • the user behavior pre-analysis engine uses Druid for pre-calculation.
  • the data in the data warehouse, the data in the distributed real-time database and the pre-calculation results are finally stored in the distributed file system HDFS.
  • When a platform user performs multi-dimensional analysis, a query request is sent to the query engine.
  • The query engine determines from the type and content of the request whether pre-calculation has already been performed. If so, the request is forwarded to the user behavior pre-analysis engine, which uses Druid to query the pre-calculated analysis results, performs simple processing, and returns the final result to the user.
  • Otherwise, the request is forwarded to the user behavior analysis engine, which uses Hive LLAP to query the user behavior data, analyzes it, and returns the analysis result to the user.
  • a product ID is assigned to each application that accesses data.
  • The resource manager of the user behavior analysis engine (YARN) independently allocates a computing resource queue, and the resource manager of the user behavior pre-analysis engine independently allocates storage resources and computing resources, so that different products do not affect each other and resource isolation is enhanced.
  • the user behavior analysis engine parses out the product ID of the query request, and selects different resource queues to run the analysis function according to the product ID.
  • the user behavior pre-analysis engine will store the analysis data in the storage resource dedicated to the product ID according to the product ID. The storage resources between products are completely isolated, and at the same time, different resources will be selected to run the analysis function according to the product ID.
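Per-product resource selection can be sketched as a simple lookup keyed by product ID. The product IDs, queue names, and storage names below are hypothetical, and the mapping stands in for whatever configuration YARN and Druid would actually hold; the sketch only shows that each request resolves to resources dedicated to its own product.

```python
# Hypothetical per-product resource assignments; all names are assumptions.
ISOLATION = {
    "product_a": {"compute_queue": "queue_product_a", "storage": "store_product_a"},
    "product_b": {"compute_queue": "queue_product_b", "storage": "store_product_b"},
}


def select_resources(request):
    """Resolve the dedicated queue and storage for the request's product ID."""
    product_id = request["product_id"]
    try:
        return ISOLATION[product_id]
    except KeyError:
        # Refuse to fall back to a shared resource, which would break isolation.
        raise ValueError(f"unknown product ID: {product_id!r}") from None
```

Raising on an unknown ID, rather than defaulting to a shared queue, keeps one product's load from spilling onto another's resources.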
  • this application also provides a real-time multi-dimensional analysis system of user behavior.
  • the system structure block diagram is shown in FIG. 5.
  • The connection interface includes the general interfaces JDBC (Java Database Connectivity), Thrift (an interface description language and binary communication protocol), REST (representational state transfer), etc., which connect the system to outer applications.
  • The data transmission tool Flume obtains massive user behavior data from the log server and sends it to the distributed message queue Kafka.
  • The distributed message queue Kafka is further connected to the data warehouse Hive and the distributed real-time database HBase through Spark Streaming.
  • the data in the distributed real-time database HBase will be written into the data warehouse at a certain time interval, and the data warehouse is connected with a general interface to return query results to the outer application.
  • Kafka and Druid are connected through the real-time processing program flyfish, and Druid is connected with the general interface, and the final user behavior analysis results are output from Druid or the data warehouse (with the general interface).
  • FIG. 6 shows a schematic structure of a real-time multi-dimensional analysis system of user behavior.
  • the storage layer includes: HDFS, Hive, HBase, Kafka, Druid.
  • the resource management layer includes: YARN and Druid.
  • the computing layer includes: Spark, Hive LLAP, Spark Streaming, Kafka Stream, Druid.
  • Druid provides real-time multi-dimensional analysis across three layers.
  • the system further includes three real-time stream processing programs, two batch processing programs and a tool program.
  • The real-time processing program flyfish is implemented using Kafka Streams. It parses log data from Kafka's talking data topic in real time, extracts the event, session, chunk, activity, and newdevice information, and stores it in the related Kafka topics, from which it is ingested by Druid's indexing service.
  • The real-time batch processing program turtle is implemented using Spark Streaming. It parses log data from Kafka's talking data topic in real time, extracts the event, session, chunk, activity, and newdevice information, and stores it in the related Hive tables.
  • The real-time batch processing program boxfish is implemented using Spark Streaming. It parses device information data from Kafka's device topic in real time and stores it in the device information table of HBase.
  • the batch processing program sardine belongs to the user behavior pre-analysis engine and is responsible for counting the cumulative number of devices.
  • Druid's tool program nemo is responsible for merging, taking offline, and deleting Druid's segment files in batches; it is used by Druid maintenance personnel.
  • the real-time multi-dimensional analysis method of user behavior of this application pre-calculates some analysis functions frequently used by users and retains the calculation results. When the user uses this analysis function, the calculated results are directly returned to the user.
  • the pre-computing technology improves query response speed and system throughput, and improves user experience.
  • the computing resources and storage resources are independently evaluated and allocated for each access application, and the resources are managed in a more fine-grained manner based on the application, which can improve resource utilization.
  • Because the resources used by each accessing application are independent, the applications do not affect each other, which enhances the isolation of resources.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium includes a user behavior real-time multi-dimensional analysis program, and the following operations are implemented when the user behavior real-time multi-dimensional analysis program is executed by a processor.
  • The user behavior data in the distributed message queue is extracted and processed through a real-time processing program to obtain the corresponding data set; the data set is sent to the data warehouse and the user behavior pre-analysis engine, the user data set is stored in the data warehouse, and the user data is pre-calculated through the user behavior pre-analysis engine to obtain the corresponding pre-calculation result; the user data set and the corresponding pre-calculation result are saved to the distributed file system; when the user behavior is analyzed, a query request is sent to the query engine, and the query engine determines from the type and content of the request whether the user data set has been pre-calculated; based on the result of that determination, the corresponding analysis engine performs the analysis to obtain the query result and returns it.
  • The request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis. The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
  • The step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set includes the following operations.
  • Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue; perform data parsing, key-information extraction, and information classification on the log data in sequence to obtain five data sets: user equipment information, user event information, user session information, user activity information, and new user information.
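  • The step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set further includes the following operations.
  • Obtain the user attribute data in the log data from the device topic of the distributed message queue; perform data parsing and key-information extraction on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.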
  • The step of obtaining the query result through the corresponding analysis engine and returning it, based on the determination of whether the user data set has been pre-calculated, includes the following operations.
  • If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid, derives the query result from the pre-calculation result, and returns it; if the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which accesses the data warehouse through a Hive LLAP query, derives the query result, and returns it.
  • The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the above-mentioned real-time multi-dimensional analysis method of user behavior and of the electronic device, and is not repeated here.

Abstract

The present application relates to the technical field of big data and provides a method and device for real-time multidimensional analysis of user behaviors, and a storage medium. The method comprises: extracting user behavior data from a distributed message queue via a real-time processing program and processing it to acquire a corresponding dataset; transmitting the dataset to a user behavior pre-analysis engine, which performs precomputation on the user data and acquires a corresponding precomputation result; saving the precomputation result corresponding to the user dataset in a distributed file system; when a user behavior is analyzed, transmitting a query request to a query engine, which determines, on the basis of the type and content of the request, whether precomputation has been performed on the user dataset; and, on the basis of that determination, acquiring a query result through analysis by the corresponding analysis engine and feeding it back. The present application improves query response speed, system throughput, resource utilization, and resource isolation.

Description

Method, device and storage medium for real-time multi-dimensional analysis of user behavior
This application claims priority to the Chinese patent application No. 202010098994.0, filed with the Chinese Patent Office on February 18, 2020 and entitled "User Behavior Real-time Multi-Dimensional Analysis Method, Device and Storage Medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of big data technology, and in particular to a method, a device, and a computer-readable storage medium for real-time multi-dimensional analysis of user behavior.
Background
At present, almost all mobile apps continuously collect behavior data from mobile users. Marketing, product, operations, and management staff all need to run real-time, multi-dimensional event analysis, funnel analysis, retention analysis, distribution analysis, user path analysis, and similar analyses on massive volumes of user behavior data. These analysis functions must be built on a distributed, scalable, highly available, real-time computing and storage platform. Big data platforms for real-time, multi-dimensional, in-depth analysis of user behavior have therefore emerged to meet the demand for analyzing massive volumes of user behavior data.
Currently, the widely used Sensors Data (神策数据) offers a similar big data platform for real-time multi-dimensional analysis of user behavior; its analysis platform consists mainly of Impala, Kudu, HDFS, and YARN. The inventor realized that it has the following drawback: every analysis must run against the massive raw data, so some analysis functions that users invoke frequently are very slow and the user experience is very poor.
Technical problem
This application provides a real-time multi-dimensional analysis method of user behavior, an electronic device, and a computer-readable storage medium. Its main purpose is to pre-calculate the analysis functions that users invoke frequently, which can improve query response speed and system throughput and improve the user experience. In addition, one or more access applications can be processed at the same time, and each program can be allocated resources independently on Druid and YARN, giving high resource utilization and strong isolation.
Technical solution
To achieve the above objective, the present application provides a real-time multi-dimensional analysis method of user behavior, applied to an electronic device. The method includes: extracting user behavior data from a distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set in the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result to a distributed file system; when the user behavior is analyzed, sending a query request to a query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; and, based on the result of that determination, obtaining the query result through the corresponding analysis engine and returning it.
To achieve the above objective, the present application also provides a real-time multi-dimensional analysis device for user behavior, including: a data set acquisition unit, configured to extract user behavior data from the distributed message queue through a real-time processing program and process it to obtain a corresponding data set; a pre-calculation result acquisition unit, configured to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set in the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; a pre-calculation result storage unit, configured to save the user data set and the corresponding pre-calculation result to the distributed file system; a judgment unit, configured to send a query request to the query engine when the user behavior is analyzed, the query engine determining from the type and content of the request whether the user data set has been pre-calculated; and a result feedback unit, configured to obtain the query result through the corresponding analysis engine, based on the result of that determination, and return it.
In addition, to achieve the above objective, the present application also provides an electronic device, including: a processor; and a memory for storing program instructions of the processor; wherein the processor is configured to perform, by executing the program instructions, the method described above. The method includes: extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to the data warehouse and the user behavior pre-analysis engine, storing the user data set in the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result to the distributed file system; when the user behavior is analyzed, sending a query request to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; and, based on the result of that determination, obtaining the query result through the corresponding analysis engine and returning it.
In addition, to achieve the above objective, the present application also provides a computer-readable storage medium that includes a user behavior real-time multi-dimensional analysis program. When the program is executed by a processor, it implements the following steps of the real-time multi-dimensional analysis method of user behavior: extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set; sending the data set to the data warehouse and the user behavior pre-analysis engine, storing the user data set in the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result to the distributed file system; when the user behavior is analyzed, sending a query request to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; and, based on the result of that determination, obtaining the query result through the corresponding analysis engine and returning it.
Beneficial effects
The real-time multi-dimensional analysis method of user behavior, electronic device, and computer-readable storage medium proposed in this application pre-calculate some analysis functions frequently used by users and retain the calculation results. When a user invokes such an analysis function, the already calculated result is returned directly, so that pre-calculation improves query response speed and system throughput and improves the user experience. At the same time, computing resources and storage resources are independently evaluated and allocated for each access application according to its data volume, so that resources are managed at the finer granularity of the application, which can improve resource utilization. In addition, because the resources used by each access application are independent, the access applications do not affect each other, which also enhances resource isolation.
Description of the drawings
FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the real-time multi-dimensional analysis method of user behavior of this application.
FIG. 2 is a schematic diagram of the modules of a preferred embodiment of the user behavior real-time multi-dimensional analysis program in FIG. 1.
FIG. 3 is a flowchart of a preferred embodiment of the real-time multi-dimensional analysis method of user behavior of this application.
FIG. 4 is a schematic block diagram of the real-time multi-dimensional analysis method of user behavior of this application.
FIG. 5 is a block diagram of the real-time multi-dimensional analysis system for user behavior of this application.
FIG. 6 is a structural diagram of the real-time multi-dimensional analysis system for user behavior of this application.
FIG. 7 is a schematic diagram of the programs used for real-time multi-dimensional analysis of user behavior in this application.
Embodiments of the invention
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
This application provides a real-time multi-dimensional analysis method of user behavior, which is applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of this method.
In this embodiment, the electronic device 1 may be a terminal device with a computing function, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 includes a processor 12, a memory 11, a camera device 13, a network interface 14, and a communication bus 15.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, or card-type memory 11. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be the external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used to store the user behavior real-time multi-dimensional analysis program 10 installed on the electronic device 1, among other things. The memory 11 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the user behavior real-time multi-dimensional analysis program 10.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
The communication bus 15 is used to realize connection and communication between these components.
FIG. 1 only shows the electronic device 1 with components 11-15, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a speech recognition function, and a voice output device such as a speaker or earphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may also include a display, which may also be referred to as a display screen or a display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display is used to show information processed in the electronic device 1 and to present a visualized user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area that the touch sensor provides for the user to perform touch operations is called the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. It may be a contact-type touch sensor or a proximity-type touch sensor. In addition, the touch sensor may be a single sensor or, for example, a plurality of sensors arranged in an array.
In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen, through which the device detects touch operations triggered by the user.
Optionally, the electronic device 1 may also include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described in detail here.
In the device embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may include an operating system and the user behavior real-time multi-dimensional analysis program 10. The processor 12 implements the following steps when executing the user behavior real-time multi-dimensional analysis program 10 stored in the memory 11.
User behavior data is extracted from the distributed message queue through a real-time processing program and processed to obtain a corresponding data set; the data set is sent to the data warehouse and the user behavior pre-analysis engine, the user data set is stored in the data warehouse, and the user data is pre-calculated through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; the user data set and the corresponding pre-calculation result are saved to the distributed file system; when the user behavior is analyzed, a query request is sent to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated; based on the result of that determination, the corresponding analysis engine performs the analysis to obtain the query result and returns it.
Preferably, the request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis. The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
Preferably, the step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set includes the following steps.
Obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue; perform data parsing, key-information extraction, and information classification on the log data in sequence to obtain five data sets: user equipment information, user event information, user session information, user activity information, and new user information.
Preferably, the step of extracting and processing the user behavior data in the distributed message queue through a real-time processing program to obtain the corresponding data set further includes the following steps.
Obtain the user attribute data in the log data from the device topic of the distributed message queue; perform data parsing and key-information extraction on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.
Preferably, the step of obtaining the query result through the corresponding analysis engine and returning it, based on the determination of whether the user data set has been pre-calculated, includes the following steps.
If the user data set has been pre-calculated, the request is sent to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid, derives the query result from the pre-calculation result, and returns it; if the user data set has not been pre-calculated, the request is sent to the user behavior analysis engine, which accesses the data warehouse through a Hive LLAP query, derives the query result, and returns it.
In other embodiments, the user behavior real-time multi-dimensional analysis program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to carry out the present application. A module, as referred to in this application, is a series of computer program instruction segments that can perform a specific function.
FIG. 2 is a program module diagram of a preferred embodiment of the user behavior real-time multi-dimensional analysis program 10 in FIG. 1. The above-mentioned program constitutes the real-time multi-dimensional analysis device for user behavior. The user behavior real-time multi-dimensional analysis program 10 can be divided into: a data set acquisition unit 101, configured to extract user behavior data from the distributed message queue through a real-time processing program and process it to obtain a corresponding data set; a pre-calculation result acquisition unit 102, configured to send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set in the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; a pre-calculation result storage unit 103, configured to save the user data set and the corresponding pre-calculation result to the distributed file system; a judgment unit 104, configured to send a query request to the query engine when the user behavior is analyzed, the query engine determining from the type and content of the request whether the user data set has been pre-calculated; and a result feedback unit 105, configured to obtain the query result through the corresponding analysis engine, based on the result of that determination, and return it.
In addition, this application also provides a real-time multi-dimensional analysis method of user behavior. FIG. 3 is a flowchart of a preferred embodiment of this method. The method can be executed by a device, and the device can be implemented in software and/or hardware.
In this embodiment, the real-time multi-dimensional analysis method of user behavior includes the following steps.
S110: Extract user behavior data from the distributed message queue through a real-time processing program and process it to obtain a corresponding data set.
S120: Send the data set to the data warehouse and the user behavior pre-analysis engine, store the user data set in the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result.
S130: Save the user data set and the corresponding pre-calculation result to the distributed file system.
Here, pre-calculation mainly targets the indicators that users care about most, such as the number of active users, the number of new users, and the number of lost users. These indicators are calculated in advance, in real time, along conventional or frequently used dimensions, and the pre-calculation results are saved. When a user issues a query, the pre-calculated result can be returned directly instead of calculating the indicator on the fly from the raw data.
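The pre-calculation idea described above can be illustrated with a minimal Python sketch. It is not the application's implementation (which relies on Druid); the field names `date`, `device_id`, `app_version`, `os`, and `channel`, the choice of three dimensions, and both function names are assumptions made for illustration, and a device identifier stands in for a user.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

# Hypothetical subset of the pre-calculated dimensions (illustrative names).
DIMENSIONS = ("app_version", "os", "channel")

CubeKey = Tuple[str, ...]


def precalculate_active_devices(
    events: Iterable[Mapping[str, str]],
) -> Dict[CubeKey, int]:
    """Count distinct active devices per (date, dimension values) cell."""
    devices_per_cell: Dict[CubeKey, Set[str]] = defaultdict(set)
    for event in events:
        key = (event["date"],) + tuple(event[d] for d in DIMENSIONS)
        devices_per_cell[key].add(event["device_id"])
    # Only the counts are retained as the pre-calculation result.
    return {key: len(ids) for key, ids in devices_per_cell.items()}


def lookup_active_devices(
    cube: Mapping[CubeKey, int], date: str, **dims: str
) -> int:
    """Answer a query from the saved result without touching raw events."""
    key = (date,) + tuple(dims[d] for d in DIMENSIONS)
    return cube.get(key, 0)
```

Because only distinct counts are stored, this sketch answers queries that fix every dimension; rolling up across cells would require keeping the identifier sets or an approximate distinct-count structure.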
S140: When the user behavior is analyzed, send a query request to the query engine, which determines from the type and content of the request whether the user data set has been pre-calculated.
The request types include overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, user path analysis, and so on. The pre-calculated dimensions include App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, language, and so on.
S150:基于所述用户数据集是否已进行预计算的查询结果,通过对应的分析引擎分析获取查询结果并反馈。S150: Based on the query result of whether the user data set has been pre-calculated, analyze and obtain the query result through the corresponding analysis engine and feed it back.
其中,基于所述用户数据集是否已进行预计算的查询结果,通过对应的分析引擎分析获取查询结果并反馈包括两种情况:若所述用户数据集已经执行预计算,则将所述请求发送至所述用户行为预分析引擎,所述用户行为预分析引擎通过Druid(实时多维度分析系统)在所述分布式文件系统内查询所述预计算结果,并基于所述预计算结果分析获取查询结果并返回;若所述用户数据集没有执行预计算,则将所述请求发送至用户行为分析引擎,所述用户行为分析引擎通过Hive LLAP(长生命周期的结构化查询引擎)查询访问所述数据仓库,分析获取查询结果并返回。Wherein, based on the query result of whether the user data set has been pre-calculated, the corresponding analysis engine is used to analyze and obtain the query result and feedback includes two situations: if the user data set has been pre-calculated, the request is sent To the user behavior pre-analysis engine, the user behavior pre-analysis engine queries the pre-calculation result in the distributed file system through Druid (real-time multi-dimensional analysis system), and obtains the query based on the pre-calculation result analysis The result is returned; if the user data set is not pre-calculated, the request is sent to the user behavior analysis engine, and the user behavior analysis engine uses Hive LLAP (Long Life Cycle Structured Query Engine) to query and access the Data warehouse, analyze and obtain query results and return them.
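The two-way dispatch in S140 and S150 can be sketched as follows. This is an illustrative sketch only: the `PRECOMPUTED` set, the `QueryRequest` type, and the engine labels are assumptions, and the application does not specify how the query engine records which combinations were pre-calculated.

```python
from dataclasses import dataclass

# Assumed for illustration: the (request type, dimension) pairs that were pre-calculated.
PRECOMPUTED = {
    ("overall_metrics", "os"),
    ("overall_metrics", "channel"),
    ("retention_analysis", "app_version"),
}


@dataclass(frozen=True)
class QueryRequest:
    request_type: str  # e.g. "overall_metrics", "funnel_analysis"
    dimension: str     # e.g. "os", "channel"


def route(request: QueryRequest) -> str:
    """Return the label of the engine that should serve the request."""
    if (request.request_type, request.dimension) in PRECOMPUTED:
        # Served from stored pre-calculation results (Druid in the text).
        return "pre_analysis_engine"
    # Falls back to querying the data warehouse (Hive LLAP in the text).
    return "analysis_engine"
```

Keeping the routing decision in one function means the caller never needs to know which backend produced the answer.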
Further, extracting and processing the user behavior data in the distributed message queue through real-time processing programs to obtain the corresponding data sets also covers two cases. In the first case, log data corresponding to the user behavior data is obtained from the talking data topic of the distributed message queue; the log data is then parsed, its key information is extracted, and the information is classified, in that order, yielding five data sets: user device information, user event information, user session information, user activity information, and new user information.
Specifically, user event information records which button a user of an app tapped and when; for example, user A used the Taobao app at 18:00 to add a piece of clothing to the shopping cart. Activity information records which page a user of an app opened and when. Session information records which events and activities a user triggered during one app session. New user information records when an app gained a new user, together with that user's basic information.
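The parse, extract, and classify sequence can be sketched in Python. The application does not describe the log format, so the JSON encoding, the `type` field, and the key fields `user_id`, `ts`, and `detail` below are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# The five data sets named in the text; the string labels are hypothetical.
CATEGORIES = ("device", "event", "session", "activity", "new_user")
# Assumed key fields to keep from each record.
KEY_FIELDS = ("user_id", "ts", "detail")


def classify(log_lines):
    """Parse JSON log lines, keep the key fields, and group records by category."""
    datasets = defaultdict(list)
    for line in log_lines:
        try:
            record = json.loads(line)  # step 1: parse
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(record, dict) or record.get("type") not in CATEGORIES:
            continue
        # Step 2: extract key information; step 3: classify by record type.
        kept = {k: record[k] for k in KEY_FIELDS if k in record}
        datasets[record["type"]].append(kept)
    return datasets


SAMPLE = [
    '{"type": "event", "user_id": "A", "ts": "18:00", "detail": "add_to_cart", "noise": 1}',
    '{"type": "new_user", "user_id": "B", "ts": "18:05"}',
    "not json",
    '{"type": "unknown", "user_id": "C"}',
]
DATASETS = classify(SAMPLE)
```

Malformed lines and unknown record types are dropped silently here; a production pipeline would more likely route them to a dead-letter destination.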
Further, the real-time batch processing program turtle uses Spark Streaming (a general-purpose big data stream processing system) to send the five data sets to the data warehouse for storage. The real-time processing program flyfish uses Kafka Streams (the stream processing engine of the distributed message queue) to send the five data sets to the user behavior pre-analysis engine for pre-calculation and obtains the corresponding pre-calculation results.
In the second case, the user attribute data in the log data is obtained from the device topic of the distributed message queue; the user attribute data is then parsed and its key information extracted, in that order, yielding the data set corresponding to the user attribute data.
Further, the data set corresponding to the user attribute data is sent to a distributed real-time database. This step is handled by the real-time batch processing program boxfish, which is implemented with Spark Streaming. The user attribute data in the distributed real-time database is written to the data warehouse at regular intervals; this can be implemented with a Spark program.
Spark Streaming is an extension of the core Spark API that provides high-throughput, fault-tolerant processing of live data streams. It can ingest data from many sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets. Once ingested, the data can be processed with complex algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processed results can be written to file systems, databases, and live dashboards.
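To make the window operation concrete, the following plain-Python sketch counts events per fixed (tumbling) time window. It deliberately does not use the Spark Streaming API; the function name and the `(timestamp, name)` tuple layout are assumptions for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, event name) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, event_name) tuples.
    """
    counts = Counter()
    for ts, name in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        counts[(window_start, name)] += 1
    return dict(counts)


SAMPLE_EVENTS = [(0, "click"), (5, "click"), (12, "view"), (19, "click")]
WINDOWED = tumbling_window_counts(SAMPLE_EVENTS, 10)
```

In a streaming engine the same grouping is applied incrementally as micro-batches arrive, rather than over a complete list as here.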
In addition, computing resource queues are allocated independently on the resource manager of the user behavior analysis engine, and storage resources and computing resources are allocated independently on the resource manager of the user behavior pre-analysis engine.
Specifically, the user behavior analysis engine parses from the query request the ID corresponding to the user behavior and selects the resource queue for that ID to run the analysis. The user behavior pre-analysis engine likewise parses the ID from the query request, stores the analysis data on the storage resources corresponding to that ID, and selects the corresponding storage resources to run the analysis.
As a specific example, in the real-time multi-dimensional user behavior analysis method of this application, massive volumes of user behavior data are transferred by a data transfer tool from web servers to the talking data topic and the device topic of the distributed message queue. The data transfer tool may be Flume and the distributed message queue may be Kafka, as shown in FIG. 4.
In FIG. 4, the data warehouse is Hive, the distributed real-time database is HBase, and the user behavior pre-analysis engine uses Druid for pre-calculation. The data in the data warehouse, the data in the distributed real-time database, and the pre-calculation results are all ultimately stored in the distributed file system HDFS. When a platform user performs multi-dimensional analysis, a query request is sent to the query engine, which determines from the type and content of the request whether pre-calculation has already been performed. If so, the request is forwarded to the user behavior pre-analysis engine, which uses Druid to query the pre-calculated results, applies simple post-processing, and returns the final result to the user. Otherwise, the request is forwarded to the user behavior analysis engine, which uses Hive LLAP to query and analyze the user behavior data and returns the analysis result to the user.
To achieve this, each application that feeds data into the system is assigned a product ID. Based on the product ID, computing resource queues are allocated independently on the resource manager (YARN) of the user behavior analysis engine, and storage and computing resources are allocated independently on the resource manager of the user behavior pre-analysis engine, so that products do not affect one another and resource isolation is strengthened. The user behavior analysis engine parses the product ID from the query request and selects the resource queue for that product ID to run the analysis. The user behavior pre-analysis engine stores analysis data on storage resources dedicated to the product ID, so storage is fully isolated between products, and it likewise selects resources by product ID to run the analysis.
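The per-product isolation described above amounts to a lookup from product ID to dedicated resources. The sketch below is a hypothetical illustration: the product IDs, queue names, and storage labels are invented, and it does not call YARN or Druid.

```python
# Hypothetical allocation table: one dedicated queue and storage area per product ID.
ALLOCATIONS = {
    "1001": {"queue": "root.product_1001", "storage": "storage_product_1001"},
    "1002": {"queue": "root.product_1002", "storage": "storage_product_1002"},
}


def resources_for(request: dict) -> dict:
    """Parse the product ID from a query request and return its dedicated resources."""
    product_id = request["product_id"]
    try:
        return ALLOCATIONS[product_id]
    except KeyError:
        # An unknown product must not silently share another product's resources.
        raise LookupError(f"no resources allocated for product {product_id}") from None
```

Raising on an unknown product ID, rather than falling back to a shared default, keeps the isolation property intact in this sketch.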
Corresponding to the above real-time multi-dimensional user behavior analysis method, this application also provides a real-time multi-dimensional user behavior analysis system, whose structural block diagram is shown in FIG. 5.
In FIG. 5, the connection interfaces, which connect the system to outer applications, include general-purpose interfaces such as JDBC (database connectivity), Thrift (an interface description language and binary communication protocol), and REST (representational state transfer). The data transfer tool Flume obtains massive volumes of user behavior data from the log server and sends it to the distributed message queue Kafka, which is in turn connected through Spark Streaming to the data warehouse Hive and the distributed real-time database HBase. During user behavior analysis, data in HBase is written to the data warehouse at regular intervals; the data warehouse is connected to the general-purpose interfaces so that query results can be returned to outer applications. In addition, Kafka is connected to Druid through the real-time processing program flyfish, and Druid is connected to the general-purpose interfaces. The final user behavior analysis results are output from Druid or from the data warehouse through the general-purpose interfaces.
Further, FIG. 6 shows a schematic structure of the real-time multi-dimensional user behavior analysis system.
As FIG. 6 shows, the system is divided, from bottom to top, into a storage layer, a resource management layer, and a computing layer.
The storage layer includes HDFS, Hive, HBase, Kafka, and Druid.
The resource management layer includes YARN and Druid.
The computing layer includes Spark, Hive LLAP, Spark Streaming, Kafka Streams, and Druid.
Other components include ZooKeeper (a reliable coordination system) and Flume.
The raw user behavior data is stored in Hive to support detail-level queries, while Druid spans all three layers to provide real-time multi-dimensional analysis.
As shown in FIG. 7, the system further includes three real-time stream processing programs, two batch processing programs, and one utility program.
The real-time processing program flyfish is implemented with Kafka Streams. In real time, it parses log data from Kafka's talking data topic, extracts event, session, chunk, activity, and newdevice information, and writes it to the corresponding Kafka topics, from which it is ingested by the Druid Index Service.
The real-time batch processing program turtle is implemented with Spark Streaming. In real time, it parses log data from Kafka's talking data topic, extracts event, session, chunk, activity, and newdevice information, and writes it to the corresponding Hive tables.
The real-time batch processing program boxfish is implemented with Spark Streaming. In real time, it parses device information data from Kafka's device topic and writes it to the device information table in HBase.
The batch processing program sardine belongs to the user behavior pre-analysis engine and counts the cumulative number of devices.
Druid's utility program nemo handles batch merging, batch taking offline, and batch deletion of Druid segment files, and is used by Druid maintainers.
The real-time multi-dimensional user behavior analysis method of this application pre-calculates the analysis functions that users invoke frequently and retains the results; when a user invokes such a function, the already calculated result is returned directly. Pre-calculation improves query response time and system throughput and thereby improves the user experience. At the same time, computing and storage resources are assessed and allocated independently for each connected application according to its data volume. Managing resources at the finer granularity of individual applications raises resource utilization, and because each connected application uses its own resources, applications do not affect one another, which strengthens resource isolation.
In addition, an embodiment of this application provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium contains a real-time multi-dimensional user behavior analysis program which, when executed by a processor, implements the following operations.
Extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set; sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result; saving the user data set and the corresponding pre-calculation result in a distributed file system; when analyzing the user behavior, sending a query request to a query engine, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and, based on the result of determining whether the user data set has been pre-calculated, obtaining a query result through analysis by the corresponding analysis engine and feeding it back.
Preferably, the request types include overall metrics, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis; the pre-calculated dimensions include app version, operating system, operating system version, channel, country, province, city, network, carrier, device brand, device model, screen size, and language.
Preferably, the step of extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set includes the following operations.
Obtaining log data corresponding to the user behavior data from the talking data topic of the distributed message queue; and performing data parsing, key information extraction, and information classification on the log data in sequence to obtain five data sets: user device information, user event information, user session information, user activity information, and new user information.
Preferably, the step of extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set further includes the following operations.
Obtaining the user attribute data in the log data from the device topic of the distributed message queue; and performing data parsing and key information extraction on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.
Preferably, the step of obtaining a query result through analysis by the corresponding analysis engine and feeding it back, based on the result of determining whether the user data set has been pre-calculated, includes the following operations.
If the user data set has been pre-calculated, sending the request to the user behavior pre-analysis engine, which queries the pre-calculation result in the distributed file system through Druid, obtains the query result based on the pre-calculation result, and returns it; if the user data set has not been pre-calculated, sending the request to the user behavior analysis engine, which queries the data warehouse through Hive LLAP, obtains the query result through analysis, and returns it.
The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the real-time multi-dimensional user behavior analysis method and the electronic device described above, and is not repeated here.
The above are only specific embodiments of this application, and the scope of protection of this application is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. The scope of protection of this application shall therefore be determined by the scope of protection of the claims.

Claims (20)

  1. A real-time multi-dimensional analysis method for user behavior, applied to an electronic device, wherein the method comprises:
    extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;
    sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result;
    saving the user data set and the corresponding pre-calculation result in a distributed file system;
    when analyzing the user behavior, sending a query request to a query engine, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and
    based on the result of determining whether the user data set has been pre-calculated, obtaining a query result through analysis by the corresponding analysis engine and feeding it back.
  2. The real-time multi-dimensional analysis method for user behavior according to claim 1, wherein:
    the request types include overall metrics, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis; and
    the pre-calculated dimensions include app version, operating system, operating system version, channel, country, province, city, network, carrier, device brand, device model, screen size, and language.
  3. The real-time multi-dimensional analysis method for user behavior according to claim 1, wherein:
    the step of extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set comprises:
    obtaining log data corresponding to the user behavior data from the talking data topic of the distributed message queue; and
    performing data parsing, key information extraction, and information classification on the log data in sequence to obtain five data sets: user device information, user event information, user session information, user activity information, and new user information.
  4. The real-time multi-dimensional analysis method for user behavior according to claim 3, wherein:
    the step of extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set further comprises:
    obtaining the user attribute data in the log data from the device topic of the distributed message queue; and
    performing data parsing and key information extraction on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.
  5. The real-time multi-dimensional analysis method for user behavior according to claim 1, wherein:
    the step of obtaining a query result through analysis by the corresponding analysis engine and feeding it back, based on the result of determining whether the user data set has been pre-calculated, comprises:
    if the user data set has been pre-calculated, sending the request to the user behavior pre-analysis engine, the user behavior pre-analysis engine querying the pre-calculation result in the distributed file system through Druid, obtaining the query result based on the pre-calculation result, and returning it; and
    if the user data set has not been pre-calculated, sending the request to a user behavior analysis engine, the user behavior analysis engine querying the data warehouse through Hive LLAP, obtaining the query result through analysis, and returning it.
  6. A real-time multi-dimensional analysis apparatus for user behavior, comprising:
    a data set acquisition unit, configured to extract and process user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;
    a pre-calculation result acquisition unit, configured to send the data set to a data warehouse and a user behavior pre-analysis engine, store the user data set through the data warehouse, and pre-calculate the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result;
    a pre-calculation result storage unit, configured to save the user data set and the corresponding pre-calculation result in a distributed file system;
    a determination unit, configured to send a query request to a query engine when the user behavior is analyzed, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and
    a result feedback unit, configured to obtain a query result through analysis by the corresponding analysis engine and feed it back, based on the result of determining whether the user data set has been pre-calculated.
  7. The real-time multi-dimensional analysis apparatus for user behavior according to claim 6, wherein:
    the data set acquisition unit is specifically configured to:
    obtain log data corresponding to the user behavior data from the talking data topic of the distributed message queue; and
    perform data parsing, key information extraction, and information classification on the log data in sequence to obtain five data sets: user device information, user event information, user session information, user activity information, and new user information.
  8. The real-time multi-dimensional analysis apparatus for user behavior according to claim 7, wherein:
    the data set acquisition unit is further specifically configured to:
    obtain the user attribute data in the log data from the device topic of the distributed message queue; and
    perform data parsing and key information extraction on the user attribute data in sequence to obtain a data set corresponding to the user attribute data.
  9. The real-time multi-dimensional analysis apparatus for user behavior according to claim 6, wherein:
    the result feedback unit is specifically configured to:
    if the user data set has been pre-calculated, send the request to the user behavior pre-analysis engine, the user behavior pre-analysis engine querying the pre-calculation result in the distributed file system through Druid, obtaining the query result based on the pre-calculation result, and returning it; and
    if the user data set has not been pre-calculated, send the request to a user behavior analysis engine, the user behavior analysis engine querying the data warehouse through Hive LLAP, obtaining the query result through analysis, and returning it.
  10. The real-time multi-dimensional analysis apparatus for user behavior according to claim 6, wherein the request types include overall metrics, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis; and
    the pre-calculated dimensions include app version, operating system, operating system version, channel, country, province, city, network, carrier, device brand, device model, screen size, and language.
  11. An electronic device, wherein the electronic device comprises a memory and a processor connected to each other, the memory is configured to store a computer program, the computer program is configured to be executed by the processor, and the computer program is configured to carry out a real-time multi-dimensional analysis method for user behavior,
    wherein the method comprises:
    extracting and processing user behavior data in the distributed message queue through a real-time processing program to obtain a corresponding data set;
    sending the data set to a data warehouse and a user behavior pre-analysis engine, storing the user data set through the data warehouse, and pre-calculating the user data through the user behavior pre-analysis engine to obtain a corresponding pre-calculation result;
    saving the user data set and the corresponding pre-calculation result in a distributed file system;
    when analyzing the user behavior, sending a query request to a query engine, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated; and
    based on the result of determining whether the user data set has been pre-calculated, obtaining a query result through analysis by the corresponding analysis engine and feeding it back.
  12. 根据权利要求11所述的电子装置,其中,The electronic device according to claim 11, wherein:
    所述请求类型包括:总体指标、事件分析、漏斗分析、留存分析、分布分析、用户路径分析;The request types include: overall indicators, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;
    所述预计算的维度包括:App版本、操作系统、操作系统版本、渠道、国家、省份、城市、网络、运营商、设备品牌、设备型号、屏幕大小、语言。The pre-calculated dimensions include: App version, operating system, operating system version, channel, country, province, city, network, operator, device brand, device model, screen size, and language.
  13. The electronic device according to claim 11, wherein:
    the step of extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set comprises:
    obtaining log data corresponding to the user behavior data from the talking data topic of the distributed message queue;
    sequentially performing data parsing, key information extraction, and information classification on the log data to obtain five corresponding data sets: user device information, user event information, user session information, user activity information, and new user information.
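The parse, extract, and classify sequence in the claim above might look like the following sketch. It assumes, purely for illustration, that each log line is a JSON object with a `type` field naming one of the five data sets and with `user_id` and `ts` as the key information; the actual log format is not given in the claims, so all field names here are hypothetical.

```python
import json

# Hypothetical labels for the five claimed data sets.
DATA_SETS = ("device", "event", "session", "activity", "new_user")


def classify_logs(lines):
    """Parse each log line, keep key fields, and file it under its data set."""
    buckets = {name: [] for name in DATA_SETS}
    for line in lines:
        try:
            record = json.loads(line)                 # data parsing
        except json.JSONDecodeError:
            continue                                  # skip malformed lines
        if not isinstance(record, dict) or record.get("type") not in buckets:
            continue                                  # skip unknown record kinds
        key_info = {"user_id": record.get("user_id"), "ts": record.get("ts")}  # key information
        buckets[record["type"]].append(key_info)      # classification
    return buckets


# Hypothetical sample log lines, including one malformed and one unknown kind.
logs = [
    '{"type": "event", "user_id": "u1", "ts": 1, "name": "click"}',
    '{"type": "session", "user_id": "u1", "ts": 2}',
    'not json',
    '{"type": "other", "user_id": "u2", "ts": 3}',
]
buckets = classify_logs(logs)
```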
  14. The electronic device according to claim 13, wherein:
    the step of extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set further comprises:
    obtaining user attribute data in the log data from the device topic of the distributed message queue;
    sequentially performing data parsing and key information extraction on the user attribute data to obtain a data set corresponding to the user attribute data.
  15. The electronic device according to claim 11, wherein:
    the step of, based on the result of determining whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result and feeding the query result back comprises:
    if the user data set has been pre-calculated, sending the request to the user behavior pre-analysis engine, the user behavior pre-analysis engine querying the pre-calculation result in the distributed file system through Druid, analyzing based on the pre-calculation result to obtain a query result, and returning the query result;
    if the user data set has not been pre-calculated, sending the request to a user behavior analysis engine, the user behavior analysis engine accessing the data warehouse through a Hive LLAP query, analyzing to obtain a query result, and returning the query result.
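The two branches of the claim above amount to a routing decision. The sketch below shows one way such a decision could be made from the type and content of a request. It is an assumption-laden illustration: the `PRECOMPUTED` catalogue, the request fields, and both engine stubs are hypothetical, and the stubs only return labels rather than calling Druid or Hive LLAP.

```python
# Hypothetical catalogue of pre-calculated data: request type -> dimensions covered.
PRECOMPUTED = {
    "overall_metrics": {"os", "app_version", "channel"},
    "event_analysis": {"os", "channel"},
}


def is_precomputed(request, catalogue=PRECOMPUTED):
    """True if the request type is covered and all requested dimensions were pre-calculated."""
    covered = catalogue.get(request["type"])
    return covered is not None and set(request.get("dimensions", ())) <= covered


def route_query(request, pre_analysis_engine, analysis_engine):
    """Send the request to the pre-analysis engine when possible, else to the ad hoc engine."""
    if is_precomputed(request):
        return pre_analysis_engine(request)
    return analysis_engine(request)


def pre_analysis_stub(request):
    # Stand-in for the pre-analysis engine (Druid in the claim); returns a label only.
    return ("pre-analysis", request["type"])


def analysis_stub(request):
    # Stand-in for the analysis engine (Hive LLAP in the claim); returns a label only.
    return ("ad-hoc", request["type"])


fast = route_query({"type": "overall_metrics", "dimensions": ["os"]},
                   pre_analysis_stub, analysis_stub)
slow = route_query({"type": "funnel_analysis", "dimensions": ["os"]},
                   pre_analysis_stub, analysis_stub)
uncovered = route_query({"type": "event_analysis", "dimensions": ["city"]},
                        pre_analysis_stub, analysis_stub)
```

Checking the requested dimensions as well as the request type reflects the claim wording that the decision depends on both the type and the content of the request; the exact rule used in the application is not stated.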
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements a method for real-time multidimensional analysis of user behavior, the method comprising the following steps:
    extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set;
    sending the data set to a data warehouse and to a user behavior pre-analysis engine, storing the user data set in the data warehouse, pre-calculating the user data through the user behavior pre-analysis engine, and obtaining a corresponding pre-calculation result;
    saving the user data set and the corresponding pre-calculation result in a distributed file system;
    when the user behavior is to be analyzed, sending a query request to a query engine, the query engine determining, according to the type and content of the request, whether the user data set has been pre-calculated;
    based on the result of determining whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result, and feeding the query result back.
  17. The computer-readable storage medium according to claim 16, wherein:
    the request types include: overall metrics, event analysis, funnel analysis, retention analysis, distribution analysis, and user path analysis;
    the dimensions of the pre-calculation include: App version, operating system, operating system version, channel, country, province, city, network, carrier, device brand, device model, screen size, and language.
  18. The computer-readable storage medium according to claim 16, wherein:
    the step of extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set comprises:
    obtaining log data corresponding to the user behavior data from the talking data topic of the distributed message queue;
    sequentially performing data parsing, key information extraction, and information classification on the log data to obtain five corresponding data sets: user device information, user event information, user session information, user activity information, and new user information.
  19. The computer-readable storage medium according to claim 18, wherein:
    the step of extracting user behavior data from the distributed message queue through a real-time processing program and processing it to obtain a corresponding data set further comprises:
    obtaining user attribute data in the log data from the device topic of the distributed message queue;
    sequentially performing data parsing and key information extraction on the user attribute data to obtain a data set corresponding to the user attribute data.
  20. The computer-readable storage medium according to claim 16, wherein:
    the step of, based on the result of determining whether the user data set has been pre-calculated, analyzing through the corresponding analysis engine to obtain a query result and feeding the query result back comprises:
    if the user data set has been pre-calculated, sending the request to the user behavior pre-analysis engine, the user behavior pre-analysis engine querying the pre-calculation result in the distributed file system through Druid, analyzing based on the pre-calculation result to obtain a query result, and returning the query result;
    if the user data set has not been pre-calculated, sending the request to a user behavior analysis engine, the user behavior analysis engine accessing the data warehouse through a Hive LLAP query, analyzing to obtain a query result, and returning the query result.
PCT/CN2020/117423 2020-02-18 2020-09-24 Method and device for real-time multidimensional analysis of user behaviors, and storage medium WO2021164253A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010098994.0A CN111311326A (en) 2020-02-18 2020-02-18 User behavior real-time multidimensional analysis method and device and storage medium
CN202010098994.0 2020-02-18

Publications (1)

Publication Number Publication Date
WO2021164253A1 (en) 2021-08-26

Family

ID=71151051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117423 WO2021164253A1 (en) 2020-02-18 2020-09-24 Method and device for real-time multidimensional analysis of user behaviors, and storage medium

Country Status (2)

Country Link
CN (1) CN111311326A (en)
WO (1) WO2021164253A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417817A (en) * 2021-12-30 2022-04-29 中国电信股份有限公司 Session information cutting method and device
CN114996306A (en) * 2022-08-04 2022-09-02 北京首信科技股份有限公司 Data management method and system based on multiple dimensions
CN115689844A (en) * 2023-01-04 2023-02-03 成都中轨轨道设备有限公司 Intelligent data management platform based on multidimensional engine, construction method and application

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311326A (en) * 2020-02-18 2020-06-19 平安科技(深圳)有限公司 User behavior real-time multidimensional analysis method and device and storage medium
CN111930857A (en) * 2020-07-08 2020-11-13 成都双链科技有限责任公司 Real-time online data analysis processing method based on graph calculation
CN112182031B (en) * 2020-10-12 2023-06-13 浙江大华技术股份有限公司 Data query method and device, storage medium and electronic device
CN112269808B (en) * 2020-11-17 2024-03-19 携程旅游网络技术(上海)有限公司 Engine query control method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302831A (en) * 2014-07-18 2016-02-03 上海星红桉数据科技有限公司 High-speed calculation analysis method based on mass user behavior data
CN107944059A (en) * 2017-12-29 2018-04-20 深圳市中润四方信息技术有限公司西安分公司 A kind of user behavior analysis method and system based on stream calculation
CN110399395A (en) * 2018-04-18 2019-11-01 福建天泉教育科技有限公司 Speedup query method, storage medium based on precomputation
US20200014768A1 (en) * 2018-07-03 2020-01-09 Naver Corporation Apparatus for analysing online user behavior and method for the same
CN111311326A (en) * 2020-02-18 2020-06-19 平安科技(深圳)有限公司 User behavior real-time multidimensional analysis method and device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684352B (en) * 2018-12-29 2020-12-01 江苏满运软件科技有限公司 Data analysis system, data analysis method, storage medium, and electronic device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417817A (en) * 2021-12-30 2022-04-29 中国电信股份有限公司 Session information cutting method and device
CN114417817B (en) * 2021-12-30 2023-05-16 中国电信股份有限公司 Session information cutting method and device
CN114996306A (en) * 2022-08-04 2022-09-02 北京首信科技股份有限公司 Data management method and system based on multiple dimensions
CN115689844A (en) * 2023-01-04 2023-02-03 成都中轨轨道设备有限公司 Intelligent data management platform based on multidimensional engine, construction method and application
CN115689844B (en) * 2023-01-04 2023-03-28 成都中轨轨道设备有限公司 Intelligent data management platform based on multidimensional engine and construction method

Also Published As

Publication number Publication date
CN111311326A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2021164253A1 (en) Method and device for real-time multidimensional analysis of user behaviors, and storage medium
CN106649670B (en) Data monitoring method and device based on stream computing
CN108009236B (en) Big data query method, system, computer and storage medium
CN107102941B (en) Test case generation method and device
US11775501B2 (en) Trace and span sampling and analysis for instrumented software
EP3436984A1 (en) Managed function execution for processing data streams in real time
WO2021169268A1 (en) Data processing method, apparatus and device, and storage medium
CN112162965B (en) Log data processing method, device, computer equipment and storage medium
CN107341033A (en) A kind of data statistical approach, device, electronic equipment and storage medium
CN107967347B (en) Batch data processing method, server, system and storage medium
US20190205342A1 (en) Identifying and structuring related data
CN108228322B (en) Distributed link tracking and analyzing method, server and global scheduler
WO2020155651A1 (en) Method and device for storing and querying log information
WO2022048422A1 (en) Data processing method and apparatus, device, and storage medium
CN111694866A (en) Data searching and storing method, data searching system, data searching device, data searching equipment and data searching medium
US10171606B2 (en) System and method for providing data as a service (DaaS) in real-time
WO2021232292A1 (en) Log data processing method and related product
US10324933B2 (en) Technique for processing query in database management system
CN109426597A (en) Application performance monitoring method, device, equipment, system and storage medium
CN113010542B (en) Service data processing method, device, computer equipment and storage medium
CN107888445B (en) Method and device for analyzing performance state, computer equipment and storage medium
CN113821514A (en) Data splitting method and device, electronic equipment and readable storage medium
CN113190237A (en) Data processing method, system and device
WO2019096207A1 (en) Image processing method and computer device, and computer readable storage medium
WO2022184079A1 (en) Document processing method and apparatus, and device and medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20919710; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/10/2022)
122 Ep: PCT application non-entry in European phase
    Ref document number: 20919710; Country of ref document: EP; Kind code of ref document: A1