CN117763060A

CN117763060A - Data processing method, device, equipment and storage medium based on user behavior

Info

Publication number: CN117763060A
Application number: CN202410051939.4A
Authority: CN
Inventors: 单宣峰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2024-01-12
Filing date: 2024-01-12
Publication date: 2024-03-26

Abstract

The disclosure provides a data processing method, device, equipment and storage medium based on user behavior, and relates to the field of artificial intelligence, in particular to the field of big data. The specific implementation is as follows: acquiring a plurality of pieces of user behavior data, where the user behavior data represents behavior performed by a user on a preset application and comprises at least two behavior fields that represent the scene in which the user performs the behavior; aggregating the plurality of pieces of user behavior data according to the at least two behavior fields to obtain aggregated data, where the aggregated data characterizes the behavior performed in any scene; and storing the aggregated data based on a pre-constructed query database. Because a scene is represented by a combination of different behavior fields, the scene is described accurately; because the user behavior data is aggregated according to the behavior fields, the data volume is reduced and data processing efficiency is improved.

Description

Data processing method, device, equipment and storage medium based on user behavior

Technical Field

The present disclosure relates to the field of big data in the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for processing data based on user behavior.

Background

While a user uses an application program, user behavior data can be generated from the user's behavior. The user behavior data represents operations performed on the application program in various scenes; for example, the operation of starting or closing the application program can be recorded.

Each user has corresponding user behavior data, so the total data volume is large. Massive user behavior data reduces the efficiency of data processing and analysis, and in turn affects the performance analysis of the application program.

Disclosure of Invention

The disclosure provides a data processing method, device, equipment and storage medium based on user behaviors.

According to a first aspect of the present disclosure, there is provided a data processing method based on user behavior, including:

acquiring a plurality of pieces of user behavior data; the user behavior data represents the behavior of the user on the preset application, and comprises at least two behavior fields which represent the scene of the behavior of the user;

according to the at least two behavior fields, carrying out aggregation processing on the plurality of pieces of user behavior data to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario;

and storing the aggregated data based on a pre-constructed query database.

According to a second aspect of the present disclosure, there is provided a data processing apparatus based on user behavior, comprising:

an acquisition unit configured to acquire a plurality of pieces of user behavior data; the user behavior data represents the behavior of the user on the preset application, and comprises at least two behavior fields which represent the scene of the behavior of the user;

the aggregation unit is used for carrying out aggregation processing on the plurality of pieces of user behavior data according to the at least two behavior fields to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario;

and the storage unit is used for storing the aggregated data based on a pre-constructed query database.

According to a third aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor;

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the first aspect.

According to a fifth aspect of the present disclosure, there is provided a computer program product comprising: a computer program which, when executed by a processor, implements the method of the first aspect.

According to the technology of the present disclosure, data processing efficiency is improved, the data volume is reduced, and storage space is saved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow diagram of a data processing method based on user behavior according to an embodiment of the present disclosure;

FIG. 2 is a flow diagram of a data processing method based on user behavior according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of data aggregation provided in accordance with an embodiment of the present disclosure;

FIG. 4 is a flow chart of a data processing method based on user behavior according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of a data processing apparatus based on user behavior provided in accordance with an embodiment of the present disclosure;

FIG. 6 is a block diagram of a data processing apparatus based on user behavior provided in accordance with an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device for implementing a user behavior-based data processing method of an embodiment of the present disclosure;

FIG. 8 is a block diagram of an electronic device for implementing a user behavior-based data processing method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

A user may use an APP (application program) according to actual needs, and while the user uses the APP, a log collection tool can be used to collect the user's behavior log. The behavior log may include user behavior data, which represents the user's behavior with respect to the APP. By querying and analyzing the user behavior data in the behavior log, the user's behavior can be analyzed to support product decisions, thereby improving the user's experience of the APP.

In actual business, user behavior occurs in many scenes, the data volume is large, a large amount of storage space is required, and analyzing and processing the data is inefficient. In a typical ad hoc query scenario, ad hoc queries may be implemented with a query engine such as Hive or Spark. Hive is a data warehouse built on Hadoop that provides a mechanism for processing structured data and provides HiveQL, a query language similar to SQL (Structured Query Language). Hive can convert HiveQL statements into MapReduce tasks for execution, so that non-programmers can also conveniently perform large-scale data processing and analysis. However, MapReduce is an offline computation model, so Hive runs inefficiently and with high latency, which affects the efficiency of data query and processing.

Spark ad hoc queries are generally based on Hive's data subject modeling, in which a corresponding subject data warehouse table is built for each specific analysis subject. Spark provides an SQL-like query language, supports a plurality of data sources and data formats, and converts SQL statements into corresponding tasks for execution. However, this requires some knowledge of the SQL language and of databases. In addition, query operations on large-scale data consume a long time and substantial computing resources, placing an excessive burden on the system.

The disclosure provides a data processing method, a device, equipment and a storage medium based on user behaviors, which are applied to the field of artificial intelligence, and are particularly applied to the field of big data so as to realize aggregation processing of the data, reduce the data quantity and save the storage space.

It should be noted that the data in this embodiment is not specific to any particular user and cannot reflect the personal information of any particular user. The data in this embodiment comes from a public data set.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.

In order for the reader to more fully understand the principles of the implementations of the present disclosure, the embodiments are now further refined in conjunction with the following fig. 1-8.

Fig. 1 is a flowchart of a data processing method based on user behavior according to an embodiment of the present disclosure; the method may be executed by a data processing apparatus based on user behavior. As shown in fig. 1, the method comprises the following steps:

S101, acquiring a plurality of pieces of user behavior data; the user behavior data represents behaviors of the user on the preset application, and the user behavior data comprises at least two behavior fields which represent scenes of the behaviors of the user.

For example, user behavior data can be generated in real time according to the behavior of the user on the preset application, where the preset application is a preset APP. The user behavior data may characterize the behavior the user performs with respect to the APP; for example, the user may start or close the APP. The user behavior data may include a plurality of behavior fields that together characterize the scene in which the user performs the behavior. For example, starting the APP is a scene: when the user clicks the icon of the APP, the click is the behavior performed by the user, and the scene is the start scene.

The behaviors performed by the user differ between scenes, that is, the field values of the behavior fields may differ between scenes. For example, the user behavior data includes five behavior fields, denoted bhv_from, bhv_page, bhv_type, bhv_source and bhv_value. If the field value of bhv_from is home, the field value of bhv_page is home_page, the field value of bhv_type is clk, the field value of bhv_source is abc, and the field value of bhv_value is 100, the user behavior data may be identified as belonging to a hot start scene.
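As an illustration of how a combination of behavior fields can identify a scene, the following minimal Python sketch builds a scene key from the five behavior fields named above. The `SceneKey` type and the `scene_key` helper are assumptions introduced for illustration and are not part of the disclosure; the field values are those of the hot start example.

```python
from typing import NamedTuple, Optional

# Field names are taken from the text; everything else here is illustrative.
BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


class SceneKey(NamedTuple):
    """Hypothetical container for the field-value combination that identifies a scene."""
    bhv_from: Optional[str]
    bhv_page: Optional[str]
    bhv_type: Optional[str]
    bhv_source: Optional[str]
    bhv_value: Optional[str]


def scene_key(record: dict) -> SceneKey:
    """Build the scene key of one record; a missing field becomes None (a null value)."""
    return SceneKey(*(record.get(name) for name in BEHAVIOR_FIELDS))


# Field values of the hot start example in the text.
hot_start = {
    "bhv_from": "home",
    "bhv_page": "home_page",
    "bhv_type": "clk",
    "bhv_source": "abc",
    "bhv_value": "100",
}
hot_start_key = scene_key(hot_start)
```

Two records belong to the same subdivided scene exactly when their keys compare equal, which is the property the later de-duplication and aggregation steps rely on.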

User behavior data of different users can be acquired in real time or regularly, namely, a plurality of pieces of user behavior data of a plurality of users can be obtained. For example, in the same time period, a plurality of users use the APP, and the behaviors made by the users to the APP are different, so that a plurality of pieces of user behavior data of the users can be obtained.

In this embodiment, acquiring a plurality of pieces of user behavior data includes: acquiring a preset number of user behavior data from a preset data warehouse table according to a preset data acquisition period; the preset data warehouse table is a data table constructed based on a preset data management platform and is used for storing user behavior data.

Specifically, the user behavior data may be stored in real time in a data warehouse table, which may be a data table built on the UDW (User Data Warehouse) platform. The UDW is a data management platform that uses flattened storage management, data construction process management and metadata management to provide comprehensive, consistent, high-quality and analysis-oriented data about user behavior.

The data warehouse table stores a plurality of pieces of user behavior data, for example, one row in the data warehouse table may be one piece of user behavior data. A data acquisition period is preset, and a plurality of pieces of user behavior data can be acquired from a data warehouse table according to the preset data acquisition period. The number of user behavior data acquired each time may be preset, that is, the preset number of user behavior data may be acquired periodically. Columns in the data warehouse table may be represented as fields, and a plurality of fields may be provided in the data warehouse table, i.e., a plurality of fields may be included in one piece of user behavior data. Table 1 illustrates the fields in the data warehouse table.
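The periodic acquisition of a preset number of records can be sketched as follows. This is a minimal illustration only: an in-memory iterable stands in for the data warehouse table, and the function name `acquire_periodically` and its parameters are assumptions, not an interface of the UDW platform.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def acquire_periodically(
    rows: Iterable[dict],
    batch_size: int,
    period_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[dict]]:
    """Yield up to batch_size rows per acquisition period until the source is exhausted.

    rows stands in for the data warehouse table (one dict per table row).
    """
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
        # Wait for the next acquisition period before reading again.
        sleep(period_seconds)


# Usage: five rows, a preset number of two per period, no real waiting.
table_rows = [{"row": n} for n in range(5)]
batches = list(acquire_periodically(table_rows, 2, 0.0, sleep=lambda _: None))
```

Passing the `sleep` function as a parameter is a design choice of this sketch that keeps the period configurable and lets the loop be exercised without waiting.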

Table 1. Schematic of the fields in the data warehouse table

The first column in Table 1 lists the fields that the data warehouse table may contain, and each piece of acquired user behavior data may contain these fields. In different pieces of user behavior data, the field values of these fields may differ, and null values may exist. For example, of two acquired pieces of user behavior data, one may have a field value for every field while the bhv_page field of the other is empty. In Table 1, bhv_id may represent the scene ID of a behavior; bhv_from, bhv_page, bhv_type, bhv_source and bhv_value are all behavior fields, and their combination may represent the scene of a behavior. bhv_id may be used for a coarse division of scenes, and the combination of bhv_from, bhv_page, bhv_type, bhv_source and bhv_value may be used for a finer division. For example, a bhv_id of 691 represents a start scene, which can be further divided into hot start and cold start; the two are distinguished by different combinations of field values of the five behavior fields. That is, scenes can be coarsely divided by bhv_id and subdivided by the combination of field values of the five behavior fields. Every subdivided scene may contain these five behavior fields, whose field values may differ or may be null.

The benefit of this arrangement is that a data warehouse table is constructed in advance and the user behavior data of different users is acquired periodically, so that the usage behavior of the APP can be analyzed on the basis of the user behavior data of different users, improving the accuracy and efficiency of data analysis. In addition, each scene can be represented by a different combination of behavior fields, so the scene of a user behavior can be described accurately, and behavior operations such as clicks, impressions and dwell time of users in a specific scene can be conveniently queried, further improving the efficiency and accuracy of data processing.
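The two-level division described above, coarse by bhv_id and fine by the combination of the five behavior fields, can be sketched in Python as follows. The helper `split_scenes` is an assumption for illustration, and the field values of the second record are hypothetical, since the text gives no values for a cold start.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


def split_scenes(records: Iterable[dict]) -> Dict[int, Dict[Tuple, List[dict]]]:
    """Group records coarsely by bhv_id, then finely by the five behavior field values."""
    coarse = defaultdict(lambda: defaultdict(list))
    for record in records:
        # A field that is absent from the record is treated as a null value.
        fine_key = tuple(record.get(name) for name in BEHAVIOR_FIELDS)
        coarse[record["bhv_id"]][fine_key].append(record)
    return {bhv_id: dict(fine) for bhv_id, fine in coarse.items()}


records = [
    # Hot start example from the text.
    {"bhv_id": 691, "bhv_from": "home", "bhv_page": "home_page",
     "bhv_type": "clk", "bhv_source": "abc", "bhv_value": "100"},
    # Hypothetical second combination under the same bhv_id, with a null bhv_page.
    {"bhv_id": 691, "bhv_from": "launcher", "bhv_page": None,
     "bhv_type": "clk", "bhv_source": "abc", "bhv_value": "100"},
]
scenes = split_scenes(records)
```

Both records share the coarse scene 691 but fall into two different subdivided scenes because their field-value combinations differ.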

S102, according to at least two behavior fields, aggregating a plurality of pieces of user behavior data to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario.

For example, after the plurality of pieces of user behavior data are obtained, they may be aggregated. For example, two pieces of user behavior data may be aggregated into one piece of data, which reduces the number of pieces of data; the data obtained after aggregation is determined as the aggregated data. Multiple pieces of aggregated data can be obtained, and the number of pieces of aggregated data is smaller than or equal to the number of pieces of user behavior data.

The user behavior data may be aggregated according to the behavior fields in the user behavior data. Because the plurality of behavior fields jointly represent the scene of a behavior, aggregating according to the plurality of behavior fields means aggregating the user behavior data in the scene dimension: a plurality of pieces of user behavior data in the same scene can be aggregated into one piece of aggregated data. For example, if two pieces of user behavior data have the same field values for bhv_from, bhv_page, bhv_type, bhv_source and bhv_value respectively, the two pieces may be aggregated. The aggregated data includes bhv_from, bhv_page, bhv_type, bhv_source and bhv_value, and the field value of each of these fields in the aggregated data is the field value of the same field in the original user behavior data.

User behavior data of different users in the same scene can be aggregated, that is, the aggregated data can characterize the behavior performed in a given scene. In the user behavior data of different users, the field values of fields such as the user identifier and the user login ID differ. The purpose of processing the user behavior data is to improve the functions of the APP, so the focus is on the behavior performed by users, and fields such as the user identifier and the user login ID need not be considered when generating the aggregated data. For example, the aggregated data may omit the fields representing the user identifier and the user login ID. Since one user identifier represents one user, the number of users who performed a behavior in a given scene can be determined from the number of user identifiers, and this number of users can be added to the aggregated data as a new field.
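A minimal Python sketch of scene-level aggregation with a user count is given below. It assumes each record carries a `user_id` field holding the user identifier; that field name, the output field `user_count` and the helper `aggregate_by_scene` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, List

BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


def aggregate_by_scene(records: Iterable[dict]) -> List[dict]:
    """Aggregate records sharing all five behavior field values into one row per scene.

    The user identifier is dropped; only the number of distinct users is kept.
    """
    users_per_scene = defaultdict(set)
    for record in records:
        key = tuple(record.get(name) for name in BEHAVIOR_FIELDS)
        users_per_scene[key].add(record["user_id"])
    return [
        dict(zip(BEHAVIOR_FIELDS, key), user_count=len(users))
        for key, users in users_per_scene.items()
    ]


scene_a = {"bhv_from": "home", "bhv_page": "home_page", "bhv_type": "clk",
           "bhv_source": "abc", "bhv_value": "100"}
scene_b = dict(scene_a, bhv_type="show")  # hypothetical second scene
records = [
    dict(scene_a, user_id="u1"),
    dict(scene_a, user_id="u2"),
    dict(scene_a, user_id="u1"),  # same user again in the same scene
    dict(scene_b, user_id="u3"),
]
aggregated = aggregate_by_scene(records)
```

Four input records collapse into two aggregated rows, which illustrates why the number of pieces of aggregated data never exceeds the number of pieces of user behavior data.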

S103, storing the aggregated data based on a pre-constructed query database.

For example, a query database may be pre-constructed to store the aggregated data, and after the aggregated data is obtained, the aggregated data is stored in the query database, so that a worker can find the data conveniently, thereby improving the APP.

In this embodiment, the volume of the aggregated data is typically at the gigabyte (GB) level or smaller. Data of this volume can be stored directly in an OLAP (Online Analytical Processing) database such as Doris or ClickHouse, and millisecond-level queries can be achieved using the indexing mechanism of the database itself. For example, Doris can complete the analysis of TB-level data in millisecond-level time, and Doris supports the MySQL (a relational database management system) client protocol, so the cost of learning and use is low.
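The storage and lookup of aggregated data can be sketched as follows. This sketch deliberately swaps in Python's built-in `sqlite3` module as a stand-in so that it is self-contained; SQLite is not an OLAP database and the sketch makes no claim about performance. The table name `scene_agg`, the column `user_count` and both helper functions are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, Optional

BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


def store_aggregated(conn: sqlite3.Connection, rows: Iterable[dict]) -> None:
    """Create the (hypothetical) scene_agg table if needed and insert aggregated rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scene_agg ("
        "bhv_from TEXT, bhv_page TEXT, bhv_type TEXT, "
        "bhv_source TEXT, bhv_value TEXT, user_count INTEGER)"
    )
    # Index on the scene columns so that lookups by scene do not scan the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_scene ON scene_agg "
        "(bhv_from, bhv_page, bhv_type, bhv_source, bhv_value)"
    )
    conn.executemany(
        "INSERT INTO scene_agg VALUES "
        "(:bhv_from, :bhv_page, :bhv_type, :bhv_source, :bhv_value, :user_count)",
        list(rows),
    )
    conn.commit()


def user_count_for_scene(conn: sqlite3.Connection, scene: dict) -> Optional[int]:
    """Return user_count for one scene, or None if the scene is not stored.

    SQLite's IS operator is used so that null field values also match.
    """
    where = " AND ".join(f"{name} IS :{name}" for name in BEHAVIOR_FIELDS)
    row = conn.execute(
        f"SELECT user_count FROM scene_agg WHERE {where}", scene
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
scene = {"bhv_from": "home", "bhv_page": None, "bhv_type": "clk",
         "bhv_source": "abc", "bhv_value": "100"}
store_aggregated(conn, [dict(scene, user_count=2)])
```

With a database that speaks the MySQL client protocol, as the text states for Doris, comparable statements could be issued through a MySQL client library, although the DDL and null-handling syntax would differ from this SQLite stand-in.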

In the embodiment of the disclosure, a plurality of pieces of user behavior data are acquired, and the user behavior data adopts a plurality of behavior fields to represent the scenes of the behaviors, so that the scenes are described in detail through the plurality of behavior fields, and a plurality of scene combinations are covered, thereby facilitating the accurate data search. And aggregating the user behavior data according to the plurality of behavior fields, namely aggregating the user behavior data in the dimension of the scene to obtain aggregated data, thereby greatly reducing the data quantity required to be processed and improving the data processing efficiency. And the aggregated data is stored in a preset query database, so that the storage space is effectively saved, the subsequent query instruction is responded quickly, and the data processing and analyzing efficiency is improved.

Fig. 2 is a flow chart of a data processing method based on user behavior according to an embodiment of the disclosure.

In this embodiment, the user behavior data includes a user field, where the user field characterizes the identifier of the user. Aggregating the plurality of pieces of user behavior data according to the at least two behavior fields to obtain the aggregated data may be refined into: performing de-duplication processing on the user behavior data according to the user field and the behavior fields of the user behavior data to obtain de-duplicated data; and aggregating the de-duplicated data according to the behavior fields in the de-duplicated data to obtain the aggregated data.

As shown in fig. 2, the method comprises the steps of:

S201, acquiring a plurality of pieces of user behavior data; the user behavior data represents behaviors of the user on the preset application, and the user behavior data comprises at least two behavior fields which represent scenes of the behaviors of the user.

For example, this step may refer to step S101, and will not be described in detail.

S202, performing duplication removal processing on the user behavior data according to the user field and the behavior field of the user behavior data to obtain duplication-removed data.

Illustratively, each piece of user behavior data may include a user field, which may represent the unique identifier of the user. After the plurality of pieces of user behavior data are obtained, the field values of the user field and of the behavior fields in each piece of user behavior data may be determined, and the user behavior data is de-duplicated according to these field values. The de-duplication processing may determine whether different pieces of user behavior data have the same field value for the user field and the same field values for the behavior fields. If so, the pieces of user behavior data are duplicates, and only one of them is retained, yielding the de-duplicated data. The field values in the de-duplicated data are identical to those in the user behavior data, and the number of pieces of de-duplicated data is smaller than or equal to the number of pieces of user behavior data. If the behavior fields or the user fields in different pieces of user behavior data differ, those pieces of user behavior data are directly determined to be de-duplicated data.

In this embodiment, the user behavior data includes an access duration field. Performing de-duplication processing on the user behavior data according to the user field and the behavior fields of the user behavior data to obtain the de-duplicated data includes: if the user fields in at least two pieces of user behavior data are consistent and the corresponding behavior fields are consistent, performing de-duplication processing on the at least two pieces of user behavior data; numerically adding the access duration fields in the at least two pieces of user behavior data to obtain a first access duration, and determining the number of the at least two pieces of user behavior data as a first access count; and obtaining one piece of de-duplicated data corresponding to the at least two pieces of user behavior data according to the user field, the behavior fields, the first access duration and the first access count of the at least two pieces of user behavior data.

Specifically, the user behavior data may further include an access duration field, where the access duration field indicates a time that the user stays in the APP page after completing the behavior. For example, the user clicks on a page, the action performed is clicking on the page, and the browsing time of the user on the page is recorded as the field value of the access duration field, for example, the field value of the access duration field may be 5 minutes.

Judging whether the field values of the user fields in different pieces of user behavior data are consistent, and judging whether the field values of the corresponding behavior fields are consistent. Judging whether the field values of the corresponding behavior fields are consistent or not refers to comparing the field value of one behavior field in one piece of user behavior data with the field value of the behavior field in the other piece of user behavior data, and if the two field values are the same, determining that the field values of the behavior fields in the two pieces of user behavior data are consistent. If the user fields in at least two pieces of user behavior data are consistent and the corresponding behavior fields are consistent, the at least two pieces of user behavior data can be subjected to de-duplication processing, and one piece of the at least two pieces of user behavior data can be reserved.

If the field values of the user fields of the plurality of pieces of user behavior data are determined to be consistent, and the field values of the corresponding behavior fields are determined to be consistent, the field values of the respective access duration fields in the plurality of pieces of user behavior data are determined. The access duration field in the user behavior data is added numerically, i.e. the field values of the access duration field are added. And determining the added result as a first access duration, namely, the first access duration is the total duration of the access of the same user in the same scene.

One piece of user behavior data indicates that the user performed one access, so the number of pieces of user behavior data is determined as the first access count. That is, the first access count represents the number of accesses made by the same user in the same scene.

One piece of de-duplicated data corresponding to the pieces of user behavior data is obtained from the field value of the user field and the field values of the behavior fields of those pieces, together with the calculated first access duration and first access count. That is, the fields in a piece of de-duplicated data may include the user field, the behavior fields, the first access duration and the first access count, where the field values of the user field and the behavior fields are consistent with the field values in the user behavior data before de-duplication.

The benefit of this arrangement is that the user behavior data is preprocessed before aggregation. Preprocessing here means determining, for each user, the user behavior data corresponding to each scene, and de-duplicating that user's data within the scene. That is, for each user, multiple pieces of user behavior data in the same scene are treated as duplicates, and only one piece needs to be retained. The number of pieces of data in the scene before de-duplication is determined as the first access count, and the access durations of those pieces are added to obtain the first access duration. As a result, the number of pieces of data decreases after de-duplication while the number of metric fields in each piece increases. The data volume is effectively reduced, storage space is saved, and data processing efficiency is improved.
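The de-duplication step with its two derived values can be sketched in Python as follows. The input field names `user_id` and `access_duration` and the output names `first_access_duration` and `first_access_count` are assumptions chosen to mirror the terms used above, and durations are taken to be plain numbers in one unit.

```python
from typing import Dict, Iterable, List, Tuple

BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


def deduplicate(records: Iterable[dict]) -> List[dict]:
    """Keep one row per (user, scene), summing durations and counting the merged records."""
    merged: Dict[Tuple, dict] = {}
    for record in records:
        scene = tuple(record.get(name) for name in BEHAVIOR_FIELDS)
        key = (record["user_id"],) + scene
        if key not in merged:
            # First record of this user in this scene: copy user and behavior fields as-is.
            merged[key] = {
                "user_id": record["user_id"],
                **dict(zip(BEHAVIOR_FIELDS, scene)),
                "first_access_duration": 0,
                "first_access_count": 0,
            }
        merged[key]["first_access_duration"] += record["access_duration"]
        merged[key]["first_access_count"] += 1
    return list(merged.values())


scene = {"bhv_from": "home", "bhv_page": "home_page", "bhv_type": "clk",
         "bhv_source": "abc", "bhv_value": "100"}
records = [
    dict(scene, user_id="u1", access_duration=5),
    dict(scene, user_id="u1", access_duration=7),
    dict(scene, user_id="u2", access_duration=3),
]
deduplicated = deduplicate(records)
```

A record that has no duplicate passes through with a first access count of 1 and its own duration, so every output row has the same set of fields.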

S203, according to the behavior field in the de-duplicated data, aggregation processing is carried out on the de-duplicated data to obtain aggregated data.

Illustratively, the field value of each behavior field in the de-duplicated data is determined, and the de-duplicated data is aggregated according to these field values to obtain the aggregated data, which further reduces the data volume. For example, de-duplicated data of different users in the same scene can be aggregated into one piece of aggregated data, that is, aggregation is performed with the scene as the dimension. The user field can be discarded from the aggregated data, or the field values of the user field can be appended to the aggregated data in sequence. That is, it may be determined whether the field values of the corresponding behavior fields in multiple pieces of de-duplicated data are consistent. If not, the pieces of de-duplicated data are directly determined to be aggregated data. If so, the pieces of de-duplicated data can be combined into one piece of aggregated data: the first access durations in the pieces are added, the first access counts are added, and the two sums are recorded in the aggregated data.
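A minimal Python sketch of this aggregation over de-duplicated data follows. It takes the option of appending the user identifiers to the aggregated row rather than discarding them. The input field names match a hypothetical de-duplicated row (`user_id`, `first_access_duration`, `first_access_count`), and the output names `user_ids`, `total_access_duration` and `total_access_count` are likewise assumptions.

```python
from typing import Dict, Iterable, List, Tuple

BEHAVIOR_FIELDS = ("bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value")


def aggregate_deduplicated(rows: Iterable[dict]) -> List[dict]:
    """Merge de-duplicated rows of different users that share all behavior field values."""
    merged: Dict[Tuple, dict] = {}
    for row in rows:
        key = tuple(row.get(name) for name in BEHAVIOR_FIELDS)
        if key not in merged:
            merged[key] = {
                **dict(zip(BEHAVIOR_FIELDS, key)),
                "user_ids": [],
                "total_access_duration": 0,
                "total_access_count": 0,
            }
        target = merged[key]
        # Append the user identifier in arrival order; it could equally be discarded.
        target["user_ids"].append(row["user_id"])
        target["total_access_duration"] += row["first_access_duration"]
        target["total_access_count"] += row["first_access_count"]
    return list(merged.values())


scene = {"bhv_from": "home", "bhv_page": "home_page", "bhv_type": "clk",
         "bhv_source": "abc", "bhv_value": "100"}
rows = [
    dict(scene, user_id="u1", first_access_duration=12, first_access_count=2),
    dict(scene, user_id="u2", first_access_duration=3, first_access_count=1),
]
aggregated = aggregate_deduplicated(rows)
```

Because the input has at most one row per user and scene, the length of `user_ids` equals the number of distinct users in that scene.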

And the data quantity is further reduced and the storage space is saved by aggregating the data after the duplication removal. The aggregated data can cover the user behavior data of all users under each scene combination, so that the analysis of the user behavior data by subsequent staff based on the scenes is facilitated, the data processing efficiency is improved, and the improvement of preset application is realized.

In this embodiment, according to a behavior field in the deduplicated data, aggregation processing is performed on the deduplicated data to obtain aggregated data, including: determining a target field from at least two behavior fields, and determining the behavior fields other than the target field as other fields; and if the corresponding other fields in the at least two pieces of de-duplicated data are consistent, aggregating the at least two pieces of de-duplicated data to obtain one piece of aggregated data corresponding to the at least two pieces of de-duplicated data.

Specifically, a piece of de-duplicated data includes a plurality of behavior fields. One behavior field at a time is selected from the plurality of behavior fields as the target field, and the behavior fields other than the target field are determined as the other fields. For example, bhv_from may first be used as the target field, with bhv_page, bhv_type, bhv_source and bhv_value as the other fields for aggregation; then bhv_page is used as the target field, with bhv_from, bhv_type, bhv_source and bhv_value as the other fields for aggregation, and so on until every behavior field has been used as the target field.

After the target field and the other fields are determined, it is judged whether the field values of the corresponding other fields in multiple pieces of de-duplicated data are consistent. If not, the pieces of de-duplicated data are directly determined to be aggregated data. If so, the pieces of de-duplicated data can be aggregated into one piece of aggregated data. In the aggregated data, the field values of the other fields are the same as those in the de-duplicated data, and the field value of the target field may be a preset value. In other words, the result is the user behavior data for the scene when the target field is disregarded. Each piece of aggregated data can include data such as the number of users who performed behaviors in the corresponding scene and the total access duration, which shows how the APP is used in that scene.

In this embodiment, a first target field may be determined from the plurality of behavior fields, and the behavior fields other than the first target field may be determined as other fields. If the corresponding other fields in multiple pieces of de-duplicated data are consistent, the multiple pieces of de-duplicated data are aggregated to obtain one piece of aggregated data. A second target field is then determined from the other fields, and the behavior fields other than the first target field and the second target field are determined as new other fields. If the corresponding new other fields in multiple pieces of de-duplicated data are consistent, the multiple pieces of de-duplicated data are aggregated to obtain one piece of aggregated data. A third target field is then determined from the new other fields, and so on until all behavior fields have been aggregated as target fields.

The beneficial effect of this arrangement is that data is aggregated for all scenes formed by different combinations of behavior fields, which enables subsequent analysis across multiple scenes, solves the problem that massive data cannot be processed and analyzed in multi-dimensional scenes, and improves the efficiency and accuracy of data processing.

In this embodiment, aggregating the at least two pieces of de-duplicated data to obtain one piece of aggregated data corresponding to the at least two pieces of de-duplicated data includes: representing the target field by a preset identifier, and determining a first user number corresponding to the user field in the at least two pieces of de-duplicated data; performing numerical addition on the first access durations in the at least two pieces of de-duplicated data to obtain a second access duration, and performing numerical addition on the first access times in the at least two pieces of de-duplicated data to obtain a second access times; and obtaining one piece of aggregated data corresponding to the at least two pieces of de-duplicated data according to the other fields, the first user number, the second access duration, the second access times and the target field represented by the preset identifier.

Specifically, the target field is represented by a preset identifier. For example, the preset identifier is "all", that is, "all" is used as the field value of the target field in the aggregated data. If the corresponding other fields in multiple pieces of de-duplicated data are consistent, the number of users corresponding to the user fields in the multiple pieces of de-duplicated data is determined as the first user number. The number of users is the number of distinct user identifiers represented by the user field; if the field values of the user fields in the multiple pieces of de-duplicated data are all the same, the first user number is 1.

The field value of the first access duration in each of the multiple pieces of de-duplicated data is determined, the first access durations are added numerically, and the sum is used as the second access duration. Likewise, the field value of the first access times in each of the multiple pieces of de-duplicated data is determined, the first access times are added numerically, and the sum is used as the second access times.

One piece of aggregated data corresponding to the multiple pieces of de-duplicated data is then obtained according to the other fields in the de-duplicated data, the first user number, the second access duration, the second access times and the target field represented by the preset identifier. That is, the aggregated data may omit the user field and instead add a field holding the number of users.
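The aggregation with a preset identifier can be sketched as below. This is an illustrative reading rather than the disclosed implementation: the function name `rollup_target`, the key names `uid`, `duration`, `pv` and `user_count`, and the sample rows are all assumptions, and for simplicity a group containing a single row is rolled up in the same way as a larger group.

```python
def rollup_target(rows, target, others, preset="all"):
    """Merge de-duplicated rows whose `others` fields match, masking `target`.

    Assumed row layout: a dict with a "uid" key, the behavior fields,
    "duration" (first access duration) and "pv" (first access times).
    """
    groups = {}
    for row in rows:
        key = tuple(row[f] for f in others)
        groups.setdefault(key, []).append(row)

    result = []
    for key, members in groups.items():
        agg = dict(zip(others, key))
        agg[target] = preset  # target field replaced by the preset identifier
        agg["user_count"] = len({m["uid"] for m in members})   # first user number
        agg["duration"] = sum(m["duration"] for m in members)  # second access duration
        agg["pv"] = sum(m["pv"] for m in members)              # second access times
        result.append(agg)
    return result


# Hypothetical de-duplicated rows with two behavior fields for brevity.
rows = [
    {"uid": "A", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 10, "pv": 2},
    {"uid": "B", "bhv_from": "search", "bhv_type": "tab_clk_video", "duration": 5, "pv": 1},
    {"uid": "A", "bhv_from": "search", "bhv_type": "tab_clk_video", "duration": 3, "pv": 1},
]
aggregated = rollup_target(rows, target="bhv_from", others=["bhv_type"])
print(aggregated)
```

User A appears twice but is counted once, so the single aggregated row carries a user count of 2 while the durations and access times are summed over all three rows.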

The beneficial effect of this arrangement is that, based on how different users use the APP, data such as the number of users and the access duration of the APP in various scenes is obtained, so that the usage of the APP can be analyzed efficiently from the aggregated data.

In this embodiment, the method further includes: if the target fields in the at least two pieces of de-duplicated data are consistent, determining the at least two pieces of de-duplicated data as data to be aggregated; and if the other corresponding fields in the data to be aggregated are consistent, aggregating the data to be aggregated to obtain aggregated data corresponding to the data to be aggregated.

Specifically, after the target field and the other fields are determined, it may first be judged whether the target fields in multiple pieces of de-duplicated data are consistent. If so, the pieces of de-duplicated data with consistent target fields are determined as data to be aggregated; if not, it is further judged whether the corresponding other fields in the multiple pieces of de-duplicated data are consistent.

If the target fields in the multiple pieces of de-duplicated data are consistent, those pieces of de-duplicated data are taken as the same number of pieces of data to be aggregated. It is then judged whether the corresponding other fields in the pieces of data to be aggregated are consistent. If not, the pieces of data to be aggregated are directly determined as aggregated data; if so, the data to be aggregated is aggregated to obtain one piece of aggregated data corresponding to the data to be aggregated.

The beneficial effect of this arrangement is that, by judging whether the target fields in the de-duplicated data are consistent, at least one piece of aggregated data can be obtained for each different value of the target field, which achieves full coverage of scenes and ensures that data of the APP can be analyzed and processed in different scenes.

In this embodiment, aggregating the data to be aggregated to obtain aggregated data corresponding to the data to be aggregated includes: determining a second user number corresponding to the user field in the data to be aggregated; performing numerical addition on the first access durations in the data to be aggregated to obtain a third access duration, and performing numerical addition on the first access times in the data to be aggregated to obtain a third access times; and obtaining aggregated data corresponding to the data to be aggregated according to the other fields, the second user number, the third access duration, the third access times and the target field.

Specifically, after it is determined that the corresponding other fields in the data to be aggregated are consistent, the number of users corresponding to the user fields in the data to be aggregated is determined as the second user number, that is, it is determined how many users are represented in the pieces of data to be aggregated. The number of users is the number of distinct user identifiers represented by the user field; if the field values of the user fields in the pieces of data to be aggregated are all the same, the second user number is 1.

The field value of the first access duration in each piece of data to be aggregated is determined, the first access durations are added numerically, and the sum is used as the third access duration. Likewise, the field value of the first access times in each piece of data to be aggregated is determined, the first access times are added numerically, and the sum is used as the third access times.

Aggregated data corresponding to the data to be aggregated is then obtained according to the other fields, the second user number, the third access duration, the third access times and the target field. That is, the aggregated data may include the field values of the other fields and the field value of the target field in the data to be aggregated, and further include the second user number, the third access duration, and the third access times.
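When the target field value is retained rather than replaced by the preset identifier, the aggregation amounts to grouping on all behavior fields. The sketch below illustrates this under assumptions: `aggregate_exact`, the key names `uid`, `duration`, `pv`, `user_count`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def aggregate_exact(rows, behavior_fields):
    """Group de-duplicated rows whose behavior fields (target included) all match."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in behavior_fields)].append(row)

    out = []
    for key, members in groups.items():
        agg = dict(zip(behavior_fields, key))  # target field keeps its own value
        agg["user_count"] = len({m["uid"] for m in members})   # second user number
        agg["duration"] = sum(m["duration"] for m in members)  # third access duration
        agg["pv"] = sum(m["pv"] for m in members)              # third access times
        out.append(agg)
    return out


# Hypothetical de-duplicated rows.
rows = [
    {"uid": "A", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 10, "pv": 2},
    {"uid": "B", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 5, "pv": 1},
    {"uid": "C", "bhv_from": "search", "bhv_type": "tab_clk_video", "duration": 3, "pv": 1},
]
per_scene = aggregate_exact(rows, ["bhv_from", "bhv_type"])
print(per_scene)
```

Here the rows for bhv_from equal to home are merged into one record and the row for search stays separate, so each concrete value of the target field yields its own aggregated record.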

The beneficial effect of this arrangement is that the number of scene combinations is further increased, full coverage of scenes is achieved, the data volume is reduced through aggregation, and the precision and efficiency of subsequent data processing are improved.

S204, storing the aggregated data based on a pre-constructed query database.

For example, this step may refer to step S103, and will not be described in detail.

Fig. 3 is a schematic diagram of data aggregation provided in this embodiment. In Fig. 3, the document indicates the identity of the user, and there are four users, A, B, C, D. bhv_from has a field value of home, bhv_type has a field value of tab_clk_video, bhv_source has a field value of abc, bhv_page has a field value of null, and bhv_value has a field value of refresh. pv represents the number of accesses, and duration represents the access duration. With bhv_from as the target field, the field values of the remaining four fields are consistent across the different pieces of data, so one piece of aggregated data is obtained for bhv_from equal to home, one for bhv_from equal to search, and one for bhv_from equal to all, giving three pieces of aggregated data in total.

Fig. 4 is a flowchart of a data processing method based on user behavior according to an embodiment of the present disclosure.

In this embodiment, the method further includes: responding to a data query instruction sent by a query person on a visual interface, and determining a query condition in the data query instruction; the data query instruction is used for indicating to query the data meeting the query condition; and determining the data meeting the query conditions from the query database.

As shown in fig. 4, the method comprises the steps of:

S401, acquiring a plurality of pieces of user behavior data; the user behavior data represents behaviors of the user on the preset application, and the user behavior data comprises at least two behavior fields which represent scenes of the behaviors of the user.

S402, according to at least two behavior fields, aggregating a plurality of pieces of user behavior data to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario.

For example, this step may refer to step S102, and will not be described in detail.

S403, storing the aggregated data based on a pre-constructed query database.

S404, responding to a data query instruction sent by a query person on a visual interface, and determining a query condition in the data query instruction; the data query instruction is used for indicating to query the data meeting the query condition.

Illustratively, a query person can query the data stored in the query database. A visual query platform is constructed in advance, and the query person can issue a data query instruction through the visual interface of the platform. The data query instruction contains a query condition; for example, the query condition is that data between January 4 and January 14 is to be queried. The query person can select or input the required query conditions on the visual interface. For example, the query person can fill in the desired time range in the date position and select the desired field value through a drop-down box in the behavior field position, so as to query the data in a certain scene.

In response to the data query instruction issued by the query person on the visual interface, the query condition is acquired from the data query instruction, so that the data meeting the query condition can be queried.

S405, determining data meeting the query conditions from the query database.

Illustratively, the visual query platform may connect to the query database via the MySQL protocol and, after the query conditions are determined, retrieve the desired data from the query database based on the query conditions. For example, if the query condition specifies bhv_type with field value tab_clk_video, bhv_source with field value abc, bhv_page with field value null, and bhv_value with field value refresh, and does not specify a field value for bhv_from, then the data whose bhv_from is the preset identifier all, bhv_type is tab_clk_video, bhv_source is abc, bhv_page is null, and bhv_value is refresh may be returned as the data queried by the query person.
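The fallback to the preset identifier for unspecified fields can be sketched as follows. This is only an illustration under assumptions: an in-memory SQLite database stands in for the query database (the disclosure mentions the MySQL protocol), the table name `aggregated`, the column names `user_count`, `pv`, `duration`, the helper `build_query`, and the stored numbers are hypothetical, and the null value of bhv_page is assumed to be stored as the literal string "null".

```python
import sqlite3

BEHAVIOR_FIELDS = ["bhv_from", "bhv_page", "bhv_type", "bhv_source", "bhv_value"]


def build_query(conditions, table="aggregated"):
    """Build a parameterized query; unspecified fields default to 'all'."""
    where = " AND ".join(f"{field} = ?" for field in BEHAVIOR_FIELDS)
    params = [conditions.get(field, "all") for field in BEHAVIOR_FIELDS]
    sql = f"SELECT user_count, pv, duration FROM {table} WHERE {where}"
    return sql, params


# Stand-in database with two hypothetical aggregated rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aggregated (bhv_from TEXT, bhv_page TEXT, bhv_type TEXT, "
    "bhv_source TEXT, bhv_value TEXT, user_count INTEGER, pv INTEGER, duration INTEGER)"
)
conn.executemany(
    "INSERT INTO aggregated VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("all", "null", "tab_clk_video", "abc", "refresh", 4, 7, 120),
        ("home", "null", "tab_clk_video", "abc", "refresh", 2, 3, 50),
    ],
)

# bhv_from is deliberately left out of the query condition.
conditions = {
    "bhv_type": "tab_clk_video",
    "bhv_source": "abc",
    "bhv_page": "null",
    "bhv_value": "refresh",
}
sql, params = build_query(conditions)
matches = conn.execute(sql, params).fetchall()
print(matches)
```

Because the rolled-up record is stored ahead of time, a query that omits a field becomes a plain equality lookup instead of an aggregation at query time.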

By building a query platform with a visual interface in advance and querying data under multi-dimensional query conditions, the operation of query personnel is facilitated, actual business requirements can be met, and both the efficiency and the experience of data query are improved.

In this embodiment, the method further includes: responding to a data viewing instruction sent by a query person on a visual interface, and determining a data expression form in the data viewing instruction; the data viewing instruction is used for indicating to display the data; and displaying the data meeting the query conditions on the visual interface according to the data expression form.

Specifically, the query person can issue a data viewing instruction on the visual interface of the visual query platform. The data viewing instruction contains a data expression form designated by the query person; for example, the data expression form can be a graph, a bar graph, a pie chart, a dashboard and the like.

The query person can select the desired data expression form through a drop-down box at the data expression form position on the visual interface and issue the data viewing instruction. In response to the data viewing instruction issued by the query person on the visual interface, the data expression form is acquired from the data viewing instruction, and the data meeting the query conditions is displayed on the visual interface in the data expression form designated by the query person. For example, the queried data may be displayed in the form of a graph.

The beneficial effect of this arrangement is that multiple chart display modes are provided, which makes the analysis result more intuitive and easier to understand and improves the experience of data query.

Fig. 5 is a block diagram of a data processing apparatus based on user behavior according to an embodiment of the present disclosure. For ease of illustration, only portions relevant to embodiments of the present disclosure are shown. Referring to fig. 5, a data processing apparatus 500 based on user behavior includes: an acquisition unit 501, an aggregation unit 502, and a storage unit 503.

An acquisition unit 501, configured to acquire a plurality of pieces of user behavior data; the user behavior data represents the behavior of the user on the preset application, and comprises at least two behavior fields which represent the scene of the behavior of the user;

An aggregation unit 502, configured to aggregate the plurality of pieces of user behavior data according to the at least two behavior fields, to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario;

and the storage unit 503 is configured to store the aggregated data based on a query database that is constructed in advance.

Fig. 6 is a block diagram of a data processing apparatus based on user behavior according to an embodiment of the present disclosure, and as shown in fig. 6, a data processing apparatus 600 based on user behavior includes an obtaining unit 601, an aggregation unit 602, and a storage unit 603, where user behavior data includes a user field, and the user field characterizes an identifier of a user, and the aggregation unit 602 includes a data deduplication module 6021 and a data aggregation module 6022.

The data deduplication module 6021 is configured to perform deduplication processing on the user behavior data according to the user field and the behavior field of the user behavior data, so as to obtain deduplicated data;

and the data aggregation module 6022 is configured to aggregate the de-duplicated data according to the behavior field in the de-duplicated data, so as to obtain the aggregated data.

In one example, the user behavior data includes an access duration field; a data deduplication module 6021, comprising:

The de-duplication processing sub-module is used for performing de-duplication processing on at least two pieces of user behavior data if the user fields in the at least two pieces of user behavior data are consistent and the corresponding behavior fields are consistent;

the adding sub-module is used for carrying out numerical addition on the access duration fields in the at least two pieces of user behavior data to obtain a first access duration, and determining the number of the at least two pieces of user behavior data as a first access frequency;

the data obtaining sub-module is used for obtaining one piece of de-duplicated data corresponding to the at least two pieces of user behavior data according to the user field, the behavior field, the first access duration and the first access times of the at least two pieces of user behavior data.
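The three sub-modules above can be sketched as a single function. This is a minimal illustration under assumptions: `deduplicate`, the key names `uid`, `duration` and `pv`, and the sample records are hypothetical and not part of the disclosure.

```python
def deduplicate(records, behavior_fields):
    """Collapse records sharing the same user and the same behavior fields.

    Durations are summed into the first access duration, and the number of
    merged records becomes the first access times (stored here as "pv").
    """
    merged = {}
    for rec in records:
        key = (rec["uid"],) + tuple(rec[f] for f in behavior_fields)
        if key not in merged:
            row = {"uid": rec["uid"]}
            row.update({f: rec[f] for f in behavior_fields})
            row["duration"] = 0
            row["pv"] = 0
            merged[key] = row
        merged[key]["duration"] += rec["duration"]  # numerical addition of durations
        merged[key]["pv"] += 1                      # count of merged records
    return list(merged.values())


# Hypothetical raw user behavior records.
records = [
    {"uid": "A", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 4},
    {"uid": "A", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 6},
    {"uid": "B", "bhv_from": "home", "bhv_type": "tab_clk_video", "duration": 5},
]
deduped = deduplicate(records, ["bhv_from", "bhv_type"])
print(deduped)
```

The two records of user A collapse into one row with a duration of 10 and an access count of 2, while user B keeps a single row.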

In one example, the data aggregation module 6022 comprises:

a field determining sub-module, configured to determine a target field from the at least two behavior fields, and determine behavior fields other than the target field as other fields;

and the first aggregation sub-module is used for aggregating the at least two pieces of de-duplicated data if the corresponding other fields in the at least two pieces of de-duplicated data are consistent, so as to obtain one piece of aggregated data corresponding to the at least two pieces of de-duplicated data.

In one example, the first aggregation sub-module is specifically configured to:

representing the target field by a preset identifier, and determining a first user number corresponding to the user field in the at least two pieces of de-duplicated data;

performing numerical addition on the first access time length in the at least two pieces of de-duplicated data to obtain a second access time length, and performing numerical addition on the first access times in the at least two pieces of de-duplicated data to obtain a second access times;

and obtaining one piece of aggregated data corresponding to the at least two pieces of de-duplicated data according to the other fields, the first user number, the second access duration, the second access times and the target field of the preset identifier.

In one example, the data aggregation module 6022 further comprises:

the data determination submodule is used for determining at least two pieces of de-duplicated data as data to be aggregated if target fields in the at least two pieces of de-duplicated data are consistent;

and the second aggregation sub-module is used for carrying out aggregation treatment on the data to be aggregated if other corresponding fields in the data to be aggregated are consistent, so as to obtain aggregated data corresponding to the data to be aggregated.

In one example, the second aggregation sub-module is specifically configured to:

determining a second user number corresponding to a user field in the data to be aggregated;

performing numerical addition on the first access time length in the data to be aggregated to obtain a third access time length, and performing numerical addition on the first access times in the data to be aggregated to obtain a third access times;

and obtaining aggregated data corresponding to the data to be aggregated according to the other fields, the second user number, the third access duration, the third access times and the target field.

In one example, the obtaining unit 601 includes:

the data acquisition module is used for acquiring a preset number of user behavior data from a preset data warehouse table according to a preset data acquisition period; the preset data warehouse table is a data table constructed based on a preset data management platform and is used for storing user behavior data.

In one example, further comprising:

the query unit is used for responding to a data query instruction sent by a query person on the visual interface and determining a query condition in the data query instruction; the data query instruction is used for indicating to query the data meeting the query condition;

and the determining unit is used for determining the data meeting the query condition from the query database.

In one example, further comprising:

the viewing unit is used for responding to a data viewing instruction sent by a query person on the visual interface and determining a data expression form in the data query instruction; the data viewing instruction is used for indicating to display data;

and the display unit is used for displaying the data meeting the query conditions on the visual interface according to the data expression form.

According to an embodiment of the disclosure, the disclosure further provides an electronic device.

Fig. 7 is a block diagram of an electronic device according to an embodiment of the present disclosure; the electronic device may be a terminal device or a server. As shown in Fig. 7, an electronic device 700 includes: at least one processor 702; and a memory 701 communicatively coupled to the at least one processor 702; wherein the memory 701 stores instructions executable by the at least one processor 702, and the instructions are executed by the at least one processor 702 to enable the at least one processor 702 to perform the user-behavior-based data processing method of the present disclosure.

The electronic device 700 further comprises a receiver 703 and a transmitter 704. The receiver 703 is configured to receive instructions and data transmitted from other devices, and the transmitter 704 is configured to transmit instructions and data to external devices.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.

Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as a data processing method based on user behavior. For example, in some embodiments, the user behavior based data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the data processing method based on user behavior described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the user behavior based data processing method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A data processing method based on user behavior, comprising:

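acquiring a plurality of pieces of user behavior data; wherein the user behavior data represents behaviors of a user on a preset application, and the user behavior data comprises at least two behavior fields which represent scenes of the behaviors of the user;

according to the at least two behavior fields, aggregating the plurality of pieces of user behavior data to obtain aggregated data; wherein the aggregated data characterizes behavior done in any scenario;

and storing the aggregated data based on a pre-constructed query database.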

2. The method of claim 1, wherein the user behavior data includes a user field, the user field characterizing an identity of a user; and wherein aggregating the plurality of pieces of user behavior data according to the at least two behavior fields to obtain aggregated data comprises:

performing de-duplication processing on the user behavior data according to the user field and the behavior fields of the user behavior data to obtain de-duplicated data;

and aggregating the de-duplicated data according to the behavior fields in the de-duplicated data to obtain the aggregated data.

3. The method of claim 2, wherein the user behavior data includes an access duration field; and wherein performing de-duplication processing on the user behavior data according to the user field and the behavior fields of the user behavior data to obtain de-duplicated data comprises:

if the user fields in at least two pieces of user behavior data are consistent and the corresponding behavior fields are consistent, performing de-duplication processing on the at least two pieces of user behavior data;

summing the access duration fields in the at least two pieces of user behavior data to obtain a first access duration, and determining the number of the at least two pieces of user behavior data as a first access count;

and obtaining one piece of de-duplicated data corresponding to the at least two pieces of user behavior data according to the user field, the behavior fields, the first access duration and the first access count of the at least two pieces of user behavior data.
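
By way of non-limiting illustration, the de-duplication recited in claim 3 may be sketched as follows. The field names `user`, `page`, `action`, `access_duration` and `access_count`, and the function name `deduplicate`, are hypothetical and are chosen only for this example; the disclosure does not prescribe them.

```python
from collections import defaultdict


def deduplicate(records, behavior_fields):
    """Merge records whose user field and behavior fields are all consistent."""
    groups = defaultdict(list)
    for rec in records:
        # The key combines the user field with every behavior field.
        key = (rec["user"],) + tuple(rec[f] for f in behavior_fields)
        groups[key].append(rec)

    deduplicated = []
    for key, recs in groups.items():
        merged = {"user": key[0]}
        merged.update(zip(behavior_fields, key[1:]))
        # First access duration: sum of the access duration fields.
        merged["access_duration"] = sum(r["access_duration"] for r in recs)
        # First access count: number of merged records.
        merged["access_count"] = len(recs)
        deduplicated.append(merged)
    return deduplicated


# Hypothetical sample data: two behavior fields, "page" and "action".
records = [
    {"user": "u1", "page": "home", "action": "open", "access_duration": 5},
    {"user": "u1", "page": "home", "action": "open", "access_duration": 7},
    {"user": "u2", "page": "home", "action": "open", "access_duration": 3},
]
deduplicated = deduplicate(records, ["page", "action"])
```

In this sketch the two records of user `u1` collapse into a single piece of de-duplicated data carrying the summed duration and a count of two, while the record of user `u2` is kept unchanged with a count of one.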

4. The method of claim 2 or 3, wherein aggregating the de-duplicated data according to the behavior fields in the de-duplicated data to obtain the aggregated data comprises:

determining a target field from the at least two behavior fields, and determining the behavior fields other than the target field as other fields;

and if the corresponding other fields in at least two pieces of de-duplicated data are consistent, aggregating the at least two pieces of de-duplicated data to obtain one piece of aggregated data corresponding to the at least two pieces of de-duplicated data.
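
By way of non-limiting illustration, the aggregation recited in claim 4 may be sketched as grouping the de-duplicated data by every behavior field except the target field. The statistics computed for each group (a distinct user count and summed durations and counts) are assumptions made only for this example, as are the field names and the function name `aggregate_ignoring_target`.

```python
from collections import defaultdict


def aggregate_ignoring_target(deduplicated, behavior_fields, target_field):
    """Aggregate records whose other fields (all but the target) are consistent."""
    other_fields = [f for f in behavior_fields if f != target_field]
    groups = defaultdict(list)
    for rec in deduplicated:
        groups[tuple(rec[f] for f in other_fields)].append(rec)

    aggregated = []
    for key, recs in groups.items():
        row = dict(zip(other_fields, key))
        # Assumed statistics; the disclosure may compute different ones.
        row["user_count"] = len({r["user"] for r in recs})
        row["access_duration"] = sum(r["access_duration"] for r in recs)
        row["access_count"] = sum(r["access_count"] for r in recs)
        aggregated.append(row)
    return aggregated


# Hypothetical de-duplicated data with three behavior fields.
deduplicated = [
    {"user": "u1", "page": "home", "action": "open", "channel": "app",
     "access_duration": 12, "access_count": 2},
    {"user": "u1", "page": "home", "action": "open", "channel": "web",
     "access_duration": 4, "access_count": 1},
    {"user": "u2", "page": "home", "action": "open", "channel": "app",
     "access_duration": 3, "access_count": 1},
    {"user": "u2", "page": "search", "action": "open", "channel": "app",
     "access_duration": 6, "access_count": 1},
]
aggregated = aggregate_ignoring_target(
    deduplicated, ["page", "action", "channel"], target_field="channel"
)
```

Because the hypothetical target field `channel` is left out of the grouping key, the first three records fall into one piece of aggregated data regardless of channel, and the fourth record forms its own.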

5. The method of claim 4, wherein aggregating the at least two pieces of de-duplicated data to obtain one piece of aggregated data corresponding to the at least two pieces of de-duplicated data, comprises:

6. The method of claim 4 or 5, further comprising:

if the target fields in at least two pieces of de-duplicated data are consistent, determining the at least two pieces of de-duplicated data as data to be aggregated;

and if the corresponding other fields in the data to be aggregated are consistent, aggregating the data to be aggregated to obtain aggregated data corresponding to the data to be aggregated.
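
By way of non-limiting illustration, the two-stage aggregation recited in claim 6 may be sketched as follows: records are first collected by the value of the target field, and are then grouped by the other fields within each collection. The distinct user count is an assumed statistic, and the field names and the function name `aggregate_by_target` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_target(deduplicated, behavior_fields, target_field):
    """Group by target field first, then by the remaining behavior fields."""
    other_fields = [f for f in behavior_fields if f != target_field]

    # Stage 1: records with a consistent target field form the data to be aggregated.
    to_aggregate = defaultdict(list)
    for rec in deduplicated:
        to_aggregate[rec[target_field]].append(rec)

    aggregated = []
    for target_value, candidates in to_aggregate.items():
        # Stage 2: within each candidate set, group by the other fields.
        groups = defaultdict(list)
        for rec in candidates:
            groups[tuple(rec[f] for f in other_fields)].append(rec)
        for key, recs in groups.items():
            row = {target_field: target_value}
            row.update(zip(other_fields, key))
            # Assumed statistic: number of distinct users in the group.
            row["user_count"] = len({r["user"] for r in recs})
            aggregated.append(row)
    return aggregated


# Hypothetical de-duplicated data; "channel" plays the role of the target field.
deduplicated = [
    {"user": "u1", "page": "home", "action": "open", "channel": "app"},
    {"user": "u2", "page": "home", "action": "open", "channel": "app"},
    {"user": "u1", "page": "home", "action": "open", "channel": "web"},
]
per_target = aggregate_by_target(
    deduplicated, ["page", "action", "channel"], target_field="channel"
)
```

Unlike grouping by the other fields alone, this sketch keeps one piece of aggregated data per target field value, so the statistics remain separated by the hypothetical `channel` field.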

7. The method of claim 6, wherein the aggregating the data to be aggregated to obtain aggregated data corresponding to the data to be aggregated, comprises:

determining the number of users on the second day corresponding to the user field in the data to be aggregated;

8. The method of any of claims 1-7, wherein acquiring a plurality of pieces of user behavior data comprises:

acquiring a preset number of pieces of user behavior data from a preset data warehouse table according to a preset data acquisition period; wherein the preset data warehouse table is a data table constructed based on a preset data management platform and is used for storing user behavior data.
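
By way of non-limiting illustration, the periodic acquisition recited in claim 8 may be sketched as follows. An in-memory SQLite table named `behavior_log` stands in for the preset data warehouse table; the table name, its columns, and the function name `acquire_batches` are hypothetical, and an actual data management platform would expose its own access interface.

```python
import sqlite3
import time


def acquire_batches(conn, batch_size, period_seconds, max_periods, sleep=time.sleep):
    """Yield at most batch_size rows per acquisition period."""
    last_id = 0
    for period in range(max_periods):
        rows = conn.execute(
            "SELECT id, user, page, action FROM behavior_log "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if rows:
            # Remember the last row read so the next period continues after it.
            last_id = rows[-1][0]
        yield rows
        if period < max_periods - 1:
            sleep(period_seconds)


# Stand-in warehouse table holding five hypothetical records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE behavior_log (id INTEGER PRIMARY KEY, user TEXT, page TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO behavior_log (user, page, action) VALUES (?, ?, ?)",
    [("u1", "home", "open"), ("u1", "home", "close"), ("u2", "home", "open"),
     ("u2", "search", "open"), ("u3", "home", "open")],
)

# A period of zero seconds keeps the example fast; a real period would be longer.
batches = list(acquire_batches(conn, batch_size=2, period_seconds=0, max_periods=3))
```

Bounding each read by a preset number keeps the amount of data handled per period predictable; in this sketch the three periods return two, two and one records respectively.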

9. The method of any of claims 1-8, further comprising:

in response to a data query instruction sent by an inquirer on a visual interface, determining a query condition in the data query instruction; wherein the data query instruction instructs that data meeting the query condition be queried;

and determining the data meeting the query condition from the query database.
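
By way of non-limiting illustration, determining the data meeting a query condition, as recited in claim 9, may be sketched as follows. An in-memory SQLite database stands in for the pre-constructed query database; the table name `aggregated`, its columns, and the function name `query_aggregated` are hypothetical.

```python
import sqlite3

# Only these hypothetical columns may appear in a query condition.
ALLOWED_FIELDS = {"page", "action"}


def query_aggregated(conn, conditions):
    """Return the aggregated rows whose fields equal the given condition values."""
    unknown = set(conditions) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported query fields: {sorted(unknown)}")
    fields = sorted(conditions)
    sql = "SELECT page, action, user_count FROM aggregated"
    if fields:
        # Column names come from the allow-list; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{f} = ?" for f in fields)
    sql += " ORDER BY page, action"
    return conn.execute(sql, [conditions[f] for f in fields]).fetchall()


# Stand-in query database holding three hypothetical pieces of aggregated data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregated (page TEXT, action TEXT, user_count INTEGER)")
conn.executemany(
    "INSERT INTO aggregated VALUES (?, ?, ?)",
    [("home", "open", 2), ("home", "close", 1), ("search", "open", 1)],
)

home_rows = query_aggregated(conn, {"page": "home"})
```

Because the query runs against the already aggregated data rather than the raw user behavior data, the amount of data scanned for each query condition is reduced.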

10. The method of claim 9, further comprising:

in response to a data viewing instruction sent by the inquirer on the visual interface, determining a data expression form in the data viewing instruction; wherein the data viewing instruction instructs that data be displayed;

and displaying the data meeting the query condition on the visual interface according to the data expression form.

11. A data processing apparatus based on user behavior, comprising:

12. The apparatus of claim 11, wherein the user behavior data includes a user field, the user field characterizing an identity of a user; and wherein the aggregation unit comprises:

a data de-duplication module, configured to perform de-duplication processing on the user behavior data according to the user field and the behavior fields of the user behavior data to obtain de-duplicated data;

and a data aggregation module, configured to aggregate the de-duplicated data according to the behavior fields in the de-duplicated data to obtain the aggregated data.

13. The apparatus of claim 12, wherein the user behavior data includes an access duration field; the data de-duplication module comprises:

14. The apparatus of claim 12 or 13, wherein the data aggregation module comprises:

15. The apparatus of claim 14, wherein the first aggregation sub-module is specifically configured to:

16. The apparatus of claim 14 or 15, wherein the data aggregation module further comprises:

17. The apparatus of claim 16, wherein the second aggregation sub-module is specifically configured to:

18. The apparatus according to any one of claims 11-17, wherein the acquisition unit comprises:

19. The apparatus of any of claims 11-18, further comprising:

20. The apparatus of claim 19, further comprising:

21. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.

22. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10.

23. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-10.