CN112417267A - User behavior analysis method and device, computer equipment and storage medium - Google Patents

User behavior analysis method and device, computer equipment and storage medium

Info

Publication number
CN112417267A
Authority
CN
China
Prior art keywords
user behavior
user
data
service
target
Legal status
Pending
Application number
CN202011078653.3A
Other languages
Chinese (zh)
Inventor
王欢
胡仲旻
张殿鹏
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011078653.3A
Publication of CN112417267A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a user behavior analysis method, a user behavior analysis device, computer equipment and a storage medium. A target user behavior, whose influence on a service index of an internet product is to be analyzed, is determined from a plurality of user behaviors; at least one piece of user behavior data matching the target user behavior is acquired from the user data of the internet product; a service result prediction model to be trained is trained on the at least one piece of user behavior data to generate a service result prediction model; and association information between the target user behavior and the service index, which represents the degree of influence of the target user behavior on the service index, is obtained from the service result prediction model. The invention enables automatic analysis of the relationship between user behaviors and service indexes, which reduces analysis cost and improves both analysis efficiency and the accuracy of the analysis results.

Description

User behavior analysis method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for user behavior analysis, a computer device, and a storage medium.
Background
With the rapid development of communication technology, people's functional requirements for terminals such as mobile phones and computers keep increasing, and the various internet products running on these terminals have gradually become essential tools in people's daily work and life. The service indexes of an internet product (such as its retention rate) reflect the product's market prospects and provide a basis for decisions about its development.
When analyzing a service index, the user behaviors that produce the "magic numbers" influencing the service index must be identified from a plurality of user behaviors based on the business experience of service personnel. The essence of a magic number is to find, through analysis and research, the behavioral differences between active and inactive users, so that product design and operations can let as many new users as possible experience the value of the product. For example, if a user has added 7 friends within one week, the user has by then experienced the true value of the product, so "adding 7 friends in 1 week" is a magic number.
This way of analyzing service indexes relies on service personnel manually analyzing user behaviors, so the analysis cost is high and the efficiency is low. Moreover, it is limited by human experience: when that experience is insufficient, the analysis results are often inaccurate.
Disclosure of Invention
In view of the above problems, the present invention provides a user behavior analysis method and device, a computer device, and a storage medium, which automatically analyze the relationship between user behaviors and service indexes, reduce analysis cost, and improve analysis efficiency and the accuracy of analysis results. The technical solution is as follows:
A user behavior analysis method, comprising:
determining, from a plurality of user behaviors, a target user behavior whose influence on a service index of an internet product is to be analyzed;
acquiring at least one piece of user behavior data matching the target user behavior from the user data of the internet product, wherein the user behavior data represent the frequency with which a user performs the target user behavior within the observation period indicated by the service index, and the service result of the user after the observation period, the service result affecting the index information of the service index;
training a service result prediction model to be trained based on the at least one piece of user behavior data to generate a service result prediction model; and
obtaining association information between the target user behavior and the service index from the service result prediction model, wherein the association information represents the degree of influence of the target user behavior on the service index.
A user behavior analysis device, comprising:
a first determining unit, configured to determine, from a plurality of user behaviors, a target user behavior whose influence on a service index of an internet product is to be analyzed;
a first obtaining unit, configured to obtain at least one piece of user behavior data matching the target user behavior from the user data of the internet product, wherein the user behavior data represent the frequency with which a user performs the target user behavior within the observation period indicated by the service index, and the service result of the user after the observation period, the service result affecting the index information of the service index;
a first generating unit, configured to train a service result prediction model to be trained based on the at least one piece of user behavior data to generate a service result prediction model; and
a second obtaining unit, configured to obtain association information between the target user behavior and the service index from the service result prediction model, wherein the association information represents the degree of influence of the target user behavior on the service index.
A computer device, comprising a processor and a memory connected through a communication bus, wherein the memory is configured to store a program for implementing the user behavior analysis method, and the processor is configured to call and execute the program stored in the memory.
A computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the user behavior analysis method.
In the user behavior analysis method, device, computer equipment and storage medium provided by the application, a target user behavior whose influence on a service index of an internet product is to be analyzed is determined from a plurality of user behaviors; at least one piece of user behavior data matching the target user behavior is acquired from the user data of the internet product; a service result prediction model to be trained is trained on the at least one piece of user behavior data to generate a service result prediction model; and association information between the target user behavior and the service index is obtained from the service result prediction model. This automates the analysis of how strongly user behaviors influence the service index, provides a basis for identifying magic numbers, reduces analysis cost, and improves analysis efficiency and the accuracy of the analysis results.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the UI of an unsupervised user behavior analysis method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a detail interface of the unsupervised user behavior analysis method according to an embodiment of the present application;
Fig. 3 is a flowchart of a user behavior analysis method according to an embodiment of the present application;
Figs. 4(a)-(d) are schematic diagrams of a user behavior analysis method according to an embodiment of the present application;
Fig. 4(e) is a schematic diagram of a first display interface according to an embodiment of the present application;
Fig. 4(f) is a schematic diagram of another first display interface according to an embodiment of the present application;
Fig. 5 is a flowchart of another user behavior analysis method according to an embodiment of the present application;
Fig. 6(a) is a schematic diagram of an HDFS page according to an embodiment of the present application;
Fig. 6(b) is a schematic diagram of an analysis result according to an embodiment of the present application;
Fig. 7 is an architecture diagram of a user behavior analysis tool according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of grouping users according to service understanding according to an embodiment of the present application;
Fig. 9 is a schematic diagram of data information according to an embodiment of the present application;
Fig. 10(a) is a schematic diagram of retention analysis results for new users of a browser according to an embodiment of the present application;
Fig. 10(b) is a schematic diagram of a browser webpage according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a user behavior analysis apparatus according to an embodiment of the present application;
Fig. 12 is a structural diagram of an implementation of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When analyzing a service index, the user behaviors that produce the magic numbers influencing the service index must be identified from a plurality of user behaviors based on the business experience of service personnel. Because this approach relies on manual analysis of user behaviors, the analysis cost is high and the efficiency is low; moreover, it is limited by human experience, and when that experience is insufficient the analysis results are often inaccurate.
Artificial Intelligence (AI) is the theory, methods, technologies and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, with technologies at both the hardware level and the software level. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the technologies such as machine learning of artificial intelligence and the like, and is specifically explained by the following embodiment:
First, the inventors of the present application provide an unsupervised user behavior analysis method, which automatically analyzes the correlation coefficients between user behaviors and service indexes. This makes it easier to identify the user behaviors that produce magic numbers influencing the service indexes, reduces analysis cost, and improves analysis efficiency and the accuracy of the analysis results.
The unsupervised user behavior analysis method mainly uses a correlation coefficient method, and its UI may be as shown in Fig. 1. The first row of Fig. 1 shows that, among new users who clicked [iOS-Home-Bottom Personal Center] at least 3 times within seven days of first logging in to the internet product, the next-week retention rate is 71.2%.
Illustratively, the user behavior [iOS-Home-Bottom Personal Center] has a strong positive correlation with the next-week retention rate of the internet product. The higher the correlation coefficient, the stronger the correlation between the user behavior and the next-week retention rate, and the more the next-week retention rate may be improved by promoting that user behavior.
Clicking a row in Fig. 1 shows, for the user behavior indicated by that row, the user share, the next-week retention rate and the correlation coefficient in different frequency ranges. For example, clicking the first row in Fig. 1 shows these values for the user behavior [iOS-Home-Bottom Personal Center] indicated by that row.
Illustratively, clicking the first row in Fig. 1 may display the detail interface shown in Fig. 2, which contains the following:
First, the correlation coefficient between the user behavior [iOS-Home-Bottom Personal Center] and the next-week retention rate in each frequency range.
For example, these correlation coefficients may include: the correlation coefficient between the next-week retention rate and performing [iOS-Home-Bottom Personal Center] at least once; the correlation coefficient when the behavior is performed at least 2 times; the correlation coefficient when it is performed at least 3 times; and so on.
Second, the user share of the user behavior [iOS-Home-Bottom Personal Center] in each frequency range. The user share in a frequency range may be the number of users whose frequency of performing [iOS-Home-Bottom Personal Center] falls within that range, as a proportion of the total number of users analyzed.
Exemplarily, the user shares in each frequency range may include: the share of users who performed [iOS-Home-Bottom Personal Center] at least once, the share who performed it at least 2 times, the share who performed it at least 3 times, and so on.
Third, the next-week retention rate of the user behavior [iOS-Home-Bottom Personal Center] in each frequency range. The next-week retention rate in a frequency range may be the next-week retention rate of the users whose frequency of performing [iOS-Home-Bottom Personal Center] falls within that range.
For example, these retention rates may include: the next-week retention rate of users who performed [iOS-Home-Bottom Personal Center] at least once, of users who performed it at least 2 times, of users who performed it at least 3 times, and so on.
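The three quantities in the detail interface can be sketched as follows, assuming each new user has already been reduced to a hypothetical (frequency, retained) pair: how often the user performed the behavior in the first 7 days, and 1 if the user was retained the next week, else 0. The phi coefficient is a natural choice for each "at least k times" row because both the indicator and retention are binary; the function names and data are illustrative only, not the patent's implementation.

```python
from math import sqrt


def phi_coefficient(xs, ys):
    """Phi coefficient of two equal-length 0/1 sequences (Pearson on binary data)."""
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def threshold_table(records, max_k):
    """Summarize "performed the behavior at least k times" for k = 1..max_k.

    records: list of (frequency, retained) pairs, one per new user, where
    frequency counts the behavior in the observation period and retained is
    1 if the user came back the following week, else 0.
    """
    total = len(records)
    retained = [r for _, r in records]
    rows = []
    for k in range(1, max_k + 1):
        hit = [1 if f >= k else 0 for f, _ in records]
        n_hit = sum(hit)
        n_hit_retained = sum(r for h, r in zip(hit, retained) if h)
        rows.append({
            "at_least": k,
            "user_share": n_hit / total,                            # user share
            "retention": n_hit_retained / n_hit if n_hit else 0.0,  # next-week retention
            "phi": phi_coefficient(hit, retained),                  # correlation coefficient
        })
    return rows


if __name__ == "__main__":
    records = [(0, 0), (0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (4, 1), (0, 1)]
    for row in threshold_table(records, 3):
        print(row)
```

On these made-up records the phi values are about 0.47, 0.60 and 0.45 for k = 1, 2, 3, so "at least 2 times" would stand out as the candidate magic number.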
The correlation coefficient represents the strength of the relationship between performing the user behavior within a frequency range and next-week retention, and it ranges from -1 to +1. A correlation coefficient of +1 would mean that every new user who performed the user behavior within that frequency range during the first 7 days necessarily visits the internet product in the second week, which is unlikely to occur in practice. Generally, a correlation coefficient greater than 0.4 indicates a relatively strong correlation, while one less than 0.2 indicates a very weak correlation or none. The correlation coefficient is sometimes negative, indicating that users who performed the user behavior are less likely to revisit in the second week, which is of course undesirable.
There are many ways to calculate correlation coefficients, including the Pearson correlation coefficient, Spearman's rho, the point-biserial correlation coefficient and the phi coefficient. In the unsupervised user behavior analysis method, the point-biserial correlation coefficient and the phi coefficient are used most often.
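The point-biserial coefficient relates a continuous variable (here, how often a user performed the behavior) to a binary one (retained or not). The sketch below, with made-up data and hypothetical names, computes it from its textbook formula and checks it against a plain Pearson coefficient with retention coded as 0/1; the two are mathematically identical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def point_biserial(freqs, retained):
    """Point-biserial r = (M1 - M0) / s_n * sqrt(p * q).

    M1, M0: mean frequency of retained / non-retained users;
    s_n: population standard deviation of all frequencies;
    p, q: shares of retained / non-retained users.
    """
    n = len(freqs)
    g1 = [f for f, r in zip(freqs, retained) if r]
    g0 = [f for f, r in zip(freqs, retained) if not r]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(freqs) / n
    s_n = sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    p = len(g1) / n
    return (m1 - m0) / s_n * sqrt(p * (1 - p))


if __name__ == "__main__":
    freqs = [0, 0, 1, 1, 2, 3, 4, 0]      # hypothetical 7-day behavior counts
    retained = [0, 0, 0, 1, 1, 1, 1, 1]   # 1 = retained next week
    print(point_biserial(freqs, retained), pearson(freqs, retained))
```

Both values agree (about 0.57 here). The point-biserial form suits a behavior measured as a raw count, while the phi coefficient suits a behavior first thresholded into performed / not performed.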
Here, correlation analysis is an unsupervised statistical model that only captures linear correlation between a user behavior and the analysis target. If a user behavior is important to the analysis target but not linearly correlated with it, its relationship to the analysis target is hard to determine with a correlation coefficient. The analysis target is the object of analysis, which may be a service index such as the next-week retention rate or the next-day retention rate. In addition, when there are many user behaviors, the unsupervised method must compute the correlation coefficient between each user behavior and the service index one by one, which is computationally cumbersome.
In view of this, the inventors further provide a supervised user behavior analysis method, which suits both linear and nonlinear correlation scenarios. It places multiple user behaviors together in one service result prediction model; the model takes into account the frequency of each user behavior and can output the association information between multiple user behaviors and the service index at once. The association information between a user behavior and the service index represents the degree of influence of the user behavior on the service index (which can also be regarded as the importance of the user behavior to the service index). This makes it easier to identify the user behaviors that produce magic numbers influencing the service index, reduces analysis cost, and improves analysis efficiency and the accuracy of the analysis results.
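To illustrate why a supervised model can catch what a linear correlation misses, here is a minimal sketch under stated assumptions. This section does not specify the model type, so a small regression tree with impurity-decrease (variance-reduction) importance is used as a stand-in, similar in spirit to the feature importances reported by tree ensembles. The toy data are made up: retention is 1 only when behavior 0 is performed 2 to 4 times (a non-monotonic relation), and behavior 1 is noise. All names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation (captures linear association only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def _sse(ys):
    """Sum of squared errors around the mean: node impurity times node size."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def _best_split(X, y, idx):
    """Return (gain, feature, threshold) of the split of rows idx that most reduces SSE."""
    parent = _sse([y[i] for i in idx])
    best = (0.0, None, None)
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx}):
            left = [y[i] for i in idx if X[i][j] <= t]
            right = [y[i] for i in idx if X[i][j] > t]
            if not left or not right:
                continue
            gain = parent - _sse(left) - _sse(right)
            if gain > best[0] + 1e-12:  # ignore floating-point noise
                best = (gain, j, t)
    return best


def tree_importance(X, y, max_depth=3):
    """Grow one regression tree on all behaviors at once; return normalized importances."""
    importance = defaultdict(float)

    def grow(idx, depth):
        if depth == max_depth or len(idx) < 2:
            return
        gain, j, t = _best_split(X, y, idx)
        if j is None:  # no split reduces the error
            return
        importance[j] += gain
        grow([i for i in idx if X[i][j] <= t], depth + 1)
        grow([i for i in idx if X[i][j] > t], depth + 1)

    grow(list(range(len(y))), 0)
    total = sum(importance.values()) or 1.0
    return [importance[j] / total for j in range(len(X[0]))]


if __name__ == "__main__":
    # Hypothetical rows: [frequency of behavior 0, frequency of behavior 1].
    X = [[f0, f1] for f0 in range(7) for f1 in (0, 5)]
    y = [1 if 2 <= f0 <= 4 else 0 for f0, _ in X]
    print("Pearson(behavior 0, retained):", pearson([r[0] for r in X], y))
    print("tree importances:", tree_importance(X, y))
```

On this toy data the Pearson coefficient for behavior 0 is essentially zero, yet the tree assigns it nearly all of the importance. A real deployment would more likely rely on an established tree-ensemble library; the hand-rolled tree only keeps the sketch dependency-free.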
In order to make the above objects, features and advantages of the present invention more comprehensible, a supervised user behavior analysis method provided by the present invention is described in detail below with reference to the accompanying drawings and the detailed description.
Fig. 3 is a flowchart of a user behavior analysis method according to an embodiment of the present application, where the user behavior analysis method shown in fig. 3 is a supervised user behavior analysis method, and the method includes:
S301, determining, from a plurality of user behaviors, a target user behavior whose influence on a service index of an internet product is to be analyzed;
Illustratively, when service personnel perform user behavior analysis, they first determine the internet product to be analyzed and obtain its service index, and then select a target user behavior from a plurality of user behaviors to analyze the influence of the target user behavior on that service index. In the embodiments of the present application, this influence is reflected by the association information between the target user behavior and the service index of the internet product.
Fig. 4(a) is a schematic diagram of a first information input interface according to an embodiment of the present application. Referring to Fig. 4(a), the user can select the date to be analyzed via "1, select date", and the internet product to be analyzed (also referred to as a product line) via "2, select products".
Taking a service index related to retention rate as an example, the selected date is the date on which new users first logged in to the selected internet product. For example, if service personnel select September 3, 2020 and internet product 1, user behavior analysis is performed on the user data of the new users who first logged in to internet product 1 on September 3, 2020.
Illustratively, each user behavior may be referred to as a feature. Referring to Fig. 4(a), through step "3, click to obtain feature list", service personnel can display a second information input interface, shown in Fig. 4(b). This interface displays a plurality of preset user behaviors, which form a user behavior list (i.e., a feature list).
Illustratively, the date selected in Fig. 4(b) is "Sep 4th, 19" and the selected internet product is "tone rabbit". The new users who first logged in to "tone rabbit" on "Sep 4th, 19" are determined, and for each of the user behaviors, the number of those new users who performed it is queried. Referring to Fig. 4(b), the user behavior list displays, for each user behavior, the number of users who performed it (the user amount), i.e., the number of people corresponding to that user behavior.
Furthermore, the user behavior list provides a check box for each user behavior, so that service personnel, guided by the displayed user behaviors and their user counts, can check the target user behaviors (i.e., the features) they want to analyze.
Referring to Fig. 4(b), service personnel can also sort the user behaviors in the list in descending or ascending order of user count. For example, sorting in descending order of user count lets service personnel pick out the user behaviors performed by many users for analysis.
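A minimal sketch of how the user behavior list could be populated and sorted, assuming the cohort's raw logs are available as hypothetical (user_id, behavior) event pairs. Each user is counted once per behavior regardless of how often they performed it, since the list shows the number of users rather than the number of events. The behavior names in the usage block are invented.

```python
from collections import defaultdict


def behavior_user_counts(events):
    """Map each behavior to the number of distinct users who performed it.

    events: iterable of (user_id, behavior) pairs for the selected cohort
    (new users who first logged in on the selected date).
    """
    users = defaultdict(set)
    for user_id, behavior in events:
        users[behavior].add(user_id)
    return {behavior: len(ids) for behavior, ids in users.items()}


def sorted_feature_list(counts, descending=True):
    """Return (behavior, user_count) rows sorted by user count."""
    return sorted(counts.items(), key=lambda row: row[1], reverse=descending)


if __name__ == "__main__":
    events = [(1, "open_feed"), (2, "open_feed"), (1, "open_feed"),
              (3, "share_page"), (1, "search"), (2, "search"), (3, "search")]
    for behavior, n_users in sorted_feature_list(behavior_user_counts(events)):
        print(behavior, n_users)
```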
Referring to Fig. 4(b), the "filter data" box also accepts a mathematical expression; in Fig. 4(b), it is used to keep only the user behaviors performed by more than 50000 users.
Further, taking a service index related to retention rate as an example, service personnel can also enter the number of retention days they want to examine. For example, 7 retention days makes the service index the next-week retention rate, and 1 retention day makes it the next-day retention rate.
The above is only a preferred way of setting the service index to be analyzed provided in the embodiments of the present application; the specific setting can be chosen as required and is not limited herein.
Furthermore, service personnel can enter the number of users to sample. If the field is left empty or a negative number is entered, the full population is used, that is, user behavior analysis is performed on all new users who first logged in to the internet product "tone rabbit" on "Sep 4th, 19". If a non-negative number is entered, that number of new users is drawn from the full population, and user behavior analysis is performed on the sampled users.
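The sampling rule can be sketched as below. The function name and the optional seed parameter are hypothetical, uniform random sampling without replacement is an assumption (the text does not say how users are drawn), and so is the fallback to the full population when the requested size exceeds it.

```python
import random


def sample_users(user_ids, sample_size=None, seed=None):
    """Pick the new users to analyze.

    sample_size None (field left empty) or negative -> use the full population.
    Otherwise draw sample_size distinct users uniformly at random. If the
    request exceeds the population, this sketch falls back to the full
    population (an assumption; the text does not cover that case).
    """
    users = list(user_ids)
    if sample_size is None or sample_size < 0 or sample_size >= len(users):
        return users
    return random.Random(seed).sample(users, sample_size)


if __name__ == "__main__":
    cohort = range(1, 1001)  # hypothetical user IDs
    print(len(sample_users(cohort)), len(sample_users(cohort, 50, seed=7)))
```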
For example, the user behaviors checked in the user behavior list may be referred to as target user behaviors. When service personnel click the button "analyze importance of features in left selection" in Fig. 4(b), user behavior analysis is performed on the target user behaviors to obtain their association information with the service index.
It should be noted that the user behavior list also supports fuzzy string matching for screening user behaviors: when a user enters a character string, every user behavior that fuzzily matches the string is displayed in the list. Referring to the detailed user behavior list interface shown in Fig. 4(c), when the user enters "rqd" and "> 10000", the list displays the user behaviors that match "rqd" and are performed by more than 10000 users.
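The text does not say which fuzzy-matching algorithm the list uses. The sketch below assumes case-insensitive subsequence matching (the query's characters must appear in the behavior name in order), a common choice for search boxes, combined with a small count filter parsed from expressions such as "> 10000". The function names and behavior names are hypothetical.

```python
import operator
import re

# Comparison operators accepted by the hypothetical count filter, e.g. "> 10000".
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


def fuzzy_match(query, name):
    """True if every character of query occurs in name, in order (case-insensitive)."""
    remaining = iter(name.lower())
    # `ch in remaining` advances the iterator, so later characters must come later.
    return all(ch in remaining for ch in query.lower())


def parse_count_filter(expr):
    """Turn an expression such as '> 10000' into a predicate on user counts."""
    m = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*(\d+)\s*", expr)
    if m is None:
        raise ValueError(f"unsupported filter expression: {expr!r}")
    op, limit = _OPS[m.group(1)], int(m.group(2))
    return lambda count: op(count, limit)


def screen_behaviors(rows, query="", count_expr=None):
    """Keep (behavior, user_count) rows that fuzzily match query and satisfy count_expr."""
    keep_count = parse_count_filter(count_expr) if count_expr else (lambda count: True)
    return [(name, count) for name, count in rows
            if fuzzy_match(query, name) and keep_count(count)]


if __name__ == "__main__":
    rows = [("click_rqd_banner", 12000), ("report_quick_download", 8000),
            ("open_settings", 50000)]
    print(screen_behaviors(rows, "rqd", "> 10000"))
```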
Illustratively, each user behavior displayed in the user behavior list shown in fig. 4(b)-(c) is described in English. A Chinese description of each user behavior may also be added to the list, or only the Chinese description may be displayed.
The language used to describe user behaviors in the user behavior list can be set as needed and is not limited herein.
S302, at least one user behavior data matched with the target user behavior in the user data of the Internet product is obtained;
In the embodiment of the application, the user behavior data represents how many times the user produced the target user behavior within the observation period indicated by the service index, and the user's business result after the observation period that affects the index information of the service index.
Illustratively, the user data of the internet product may come from a lighthouse data source, an offline data processing platform (e.g., TDW), or a local file. The lighthouse data source may be stored in a server cluster, such as an Impala or ClickHouse server cluster. User data retrieved from the lighthouse data source may be referred to as lighthouse data, and user data retrieved from TDW may be referred to as hdfs data.
For example, at least one target user behavior to be subjected to the internet product service index influence analysis may be determined from the plurality of user behaviors.
In the embodiment of the present application, obtaining at least one piece of user behavior data matching the target user behavior from the user data of the internet product includes: acquiring user data of at least one user of the internet product; detecting whether the user data of a user indicates that the user produced at least one target user behavior within the observation period indicated by the service index; if so, acquiring from the user data first data representing the at least one target user behavior produced by the user within the observation period, and second data representing the information, produced by the user after the observation period, that affects the index information of the service index, the first data and the second data together forming the user behavior data of the user; if not, determining that the user data of the user contains no user behavior data matching the target user behavior.
Taking a retention-related service index as an example, the user data of the internet product, which includes the user data of each user of the product, can be obtained, and the user data of new users whose first login date is the selected date are screened out. It is then detected whether the user data of each new user indicates that the new user produced at least one target user behavior within the observation period indicated by the service index.
If so, first data representing the at least one target user behavior produced by the new user within the observation period is acquired from the new user's user data, and second data representing the information, produced by the new user after the observation period, that affects the index information of the service index is acquired from the same user data. The first data and the second data constitute the user behavior data of the new user, which can be regarded as one piece of user behavior data in the user data of the internet product that matches the target user behavior.
The user behavior data of a user thus comprises first data and second data: the first data represents how many times the user produced the target user behavior within the observation period indicated by the service index, and the second data is the user's business result after the observation period that affects the index information of the service index.
For example, when the service index is the next-week retention rate and the target user behaviors are user behavior 1 and user behavior 2, the matching user behavior data includes first data representing how many times the new user performed user behavior 1 and user behavior 2 within 7 days of first logging in to the internet product, and second data representing the new user's actual retention on day 8, which affects the index information of the service index, i.e., the next-week retention rate. The actual retention on day 8 is either retained or not retained: if the new user logs in to the internet product on day 8, the new user is retained; otherwise, the new user is not retained.
For example, the observation period indicated by the service index may be the waiting time before the user produces the second data affecting the index information of the service index. For a retention-related service index, the observation period may be the number of retention days the user fills in on the interface shown in fig. 4(a)-(b).
Further, if the user data of the new user does not indicate that the new user produced at least one target user behavior within the observation period indicated by the service index, it is determined that the user data of the new user contains no user behavior data matching the target user behavior.
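The extraction of first and second data for one new user can be sketched as follows, using the next-week retention example. This is a minimal sketch under stated assumptions: raw user data is modeled as hypothetical `(day, behavior)` event tuples plus a set of login days, and day 1 is taken to be the first login day. The record type and function names are illustrative, not part of the described method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple


@dataclass
class UserBehaviorData:
    # First data: times each target behavior occurred in the observation period.
    frequencies: Dict[str, int]
    # Second data: the business result after the period (retained or not).
    retained: bool


def build_user_behavior_data(events: Iterable[Tuple[int, str]],
                             login_days: Set[int],
                             targets: Sequence[str],
                             observation_days: int) -> Optional[UserBehaviorData]:
    """Assumed convention: day 1 is the user's first login day.

    Returns None when the user produced none of the target behaviors during
    days 1..observation_days, i.e. no matching user behavior data exists.
    """
    target_set = set(targets)
    counts = Counter(behavior for day, behavior in events
                     if 1 <= day <= observation_days and behavior in target_set)
    if not counts:
        return None
    frequencies = {t: counts[t] for t in targets}
    # With observation_days=7 (next-week retention), retention is judged on day 8.
    return UserBehaviorData(frequencies, (observation_days + 1) in login_days)
```

Events after the observation period, such as a behavior on day 9, are ignored when counting. They can still matter only through the retention flag.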
S303, training the to-be-trained business result prediction model based on at least one user behavior data to generate a business result prediction model;
In the embodiment of the application, the user behavior data is input into the business result prediction model to be trained, which predicts a business result from the first data in the user behavior data. The model is trained with the objective of making the predicted business result approach the second data in the user behavior data, which yields the business result prediction model.
For example, the business result prediction model may be a supervised model, such as a random forest model or an XGBoost model.
S304, obtaining the associated information of the target user behavior and the service index from the service result prediction model, wherein the associated information represents the influence degree of the target user behavior on the service index.
Taking a random forest model as the business result prediction model as an example, the output of the Gini impurity algorithm in the model can be obtained. This output includes the associated information between each of the at least one target user behavior and the service index.
For example, the at least one piece of user behavior data may be used as the input samples of the random forest model. The process of training the random forest model to obtain the associated information between the target user behaviors and the service index is as follows:
Select n samples from the sample set as a training set by bootstrap sampling (sampling with replacement);
Generate a decision tree from the training set, as follows:
a. randomly select d features without repetition;
b. split the training set on each of the d features and find the optimal splitting feature (by Gini impurity);
Repeat the previous two steps once per decision tree. If the number of decision trees in the random forest is preset to 8, the steps are repeated 8 times.
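The sampling and splitting steps above can be sketched in plain Python. This is a simplified illustration, not the random forest the text uses. Each "tree" is reduced to a single split (a decision stump) to keep the example short. Labels are assumed binary (retained or not). Splits are chosen by the size-weighted Gini decrease used in standard CART. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One sample: (frequencies of the target behaviors, retained label 0/1).
Sample = Tuple[Sequence[float], int]
Split = Tuple[int, float, float]  # (feature index, threshold, impurity decrease)


def gini(labels: Sequence[int]) -> float:
    """Gini index 1 - p0^2 - p1^2 for binary labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(data: Sequence[Sample], features: Sequence[int]) -> Optional[Split]:
    """Step b: try every threshold of each candidate feature and keep the one
    with the largest (size-weighted) Gini impurity decrease."""
    parent = gini([y for _, y in data])
    best: Optional[Split] = None
    for j in features:
        for t in sorted({x[j] for x, _ in data}):
            left = [y for x, y in data if x[j] <= t]
            right = [y for x, y in data if x[j] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or parent - child > best[2]:
                best = (j, t, parent - child)
    return best


def train_stump_forest(samples: Sequence[Sample], n_trees: int = 8,
                       d: int = 2, seed: int = 0) -> List[Optional[Split]]:
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0][0])
    forest: List[Optional[Split]] = []
    for _ in range(n_trees):
        # Bootstrap: n draws with replacement.
        boot = [samples[rng.randrange(n)] for _ in range(n)]
        # Step a: d distinct features chosen at random.
        features = rng.sample(range(n_features), d)
        # Simplification: each "tree" is a single split (a decision stump).
        forest.append(best_split(boot, features))
    return forest
```

A full implementation would recurse on the left and right subsets until a stopping rule is met. The sketch stops after the first split.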
Note: the user behavior analysis method provided by the embodiment of the application only needs the training module of the random forest model (the Gini impurity algorithm) to obtain the feature importance results. It does not need the prediction module of the random forest model, i.e., the module that predicts business results from the output of the Gini impurity algorithm.
For example, a user behavior may be referred to as a feature. The associated information between a feature and the service index characterizes the degree of influence of the feature on the service index, i.e., the importance of the feature to the service index. Feature importance is calculated with the Gini impurity algorithm as follows:
Feature importance is denoted FI and the Gini index GI. Assume there are c features X_1, X_2, X_3, ..., X_c. The Gini importance FI(j) of each feature X_j is calculated as the average change in node-splitting impurity contributed by the jth feature over all decision trees of the random forest.
The Gini index of a node m is calculated as:
$$GI_m = \sum_{k=1}^{K} p_{mk}\,(1 - p_{mk}) = 1 - \sum_{k=1}^{K} p_{mk}^{2}$$
Here K denotes the number of categories (for a retention index, retained and not retained), and p_{mk} denotes the proportion of category k among the users at node m. The importance of feature X_j at node m is:
$$FI_{jm} = GI_m - GI_l - GI_r$$
Here GI_l and GI_r denote the Gini indexes of the two new nodes produced by the split.
The Gini importance FI(j) of feature X_j is obtained by summing FI_{jm} over the nodes split on X_j in each of the 8 decision trees, then averaging over the trees:
$$FI(j) = \frac{1}{8} \sum_{i=1}^{8} \sum_{m \in M_j^{(i)}} FI_{jm}$$
Here M_j^{(i)} is the set of nodes of the ith tree that split on X_j. FI(j) can be regarded as the associated information between feature X_j and the service index.
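The importance formulas above can be checked with a small sketch. It takes hypothetical per-tree lists of split nodes, each given as class counts at the parent and at the two children. It then computes the Gini index of each node, the node importance (Gini of the split node minus the Gini of each child, in the literal unweighted form written in the text), and the per-feature average over trees. Names and the input layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def gini_index(class_counts: Sequence[int]) -> float:
    """GI_m = sum_k p_mk (1 - p_mk) = 1 - sum_k p_mk^2 for one node."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


# One split node: (feature j, class counts at node m, left child, right child).
SplitNode = Tuple[int, Sequence[int], Sequence[int], Sequence[int]]


def feature_importance(trees: Sequence[Sequence[SplitNode]]) -> Dict[int, float]:
    """FI(j): sum of GI_m - GI_l - GI_r over the nodes split on X_j,
    averaged over the trees of the forest."""
    totals: Dict[int, float] = defaultdict(float)
    for nodes in trees:
        for j, parent, left, right in nodes:
            totals[j] += gini_index(parent) - gini_index(left) - gini_index(right)
    return {j: total / len(trees) for j, total in totals.items()}
```

Libraries such as scikit-learn instead weight each node's impurity decrease by the node's share of samples and normalize the importances to sum to 1. Their values will therefore differ in scale from this literal form.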
Referring to the analysis result display interface shown in fig. 4(d): after the user clicks the "analyze importance of features in left selection" button in fig. 4(b), the interface displays the importance results of the user behavior analysis for the target user behaviors selected in fig. 4(b). The results include the associated information between each of the at least one target user behavior and the service index.
Illustratively, the associated information between a target user behavior and the service index represents the degree of influence of the target user behavior on the service index, i.e., the importance of the target user behavior to the service index.
The greater this degree of influence, the more important the target user behavior is to the service index.
With the user behavior analysis method provided by the embodiment of the application, the importance of user behaviors to the service index can be analyzed automatically. Magic numbers (critical behavior frequencies that influence the service index) are more likely to arise from user behaviors that are more important to the service index. Automatically analyzing behavior importance therefore makes it easier to identify, among a large number of user behaviors, those that produce magic numbers. Unlike the prior art, this does not rely entirely on manual experience, which reduces the analysis cost and improves both analysis efficiency and the accuracy of the analysis results.
Fig. 5 is a flowchart of another user behavior analysis method provided in the embodiment of the present application.
As shown in fig. 5, the method includes:
S501, determining a target user behavior to be subjected to internet product service index influence analysis from a plurality of user behaviors;
S502, obtaining at least one user behavior data matched with a target user behavior in the user data of the Internet product;
In an embodiment of the present application, acquiring user data of at least one user of the internet product includes: determining a data acquisition condition related to the source channel of the user data, the category of the user who generated the user data, and/or the function of the internet product to which the user data belongs; and acquiring the user data of at least one user of the internet product that satisfies the data acquisition condition.
Accordingly, at least one piece of user behavior data matching the target user behavior may be acquired from the user data of the at least one user, so that the user behavior analysis is performed on user behavior data that satisfies the data acquisition condition.
In this case, the number of users shown for each user behavior in the feature list is the number of pieces of user data, among the user data of the at least one user, indicating that the user produced that user behavior.
Illustratively, users can be grouped according to business understanding. Common groupings include new versus old users, channels, and function modules. User behavior analysis is performed in the same way for each group to find the feature importance and the inflection points.
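Applying a data acquisition condition and grouping users could be sketched as below. This is an illustration under stated assumptions: user data is modeled as flat records, and the field names (`channel`, `user_type`, `module`) and channel values are hypothetical. Each resulting group would then go through the same analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

Record = Mapping[str, str]


def acquire_user_data(records: Iterable[Record], **conditions: str) -> List[Record]:
    """Keep records whose fields equal every given condition
    (e.g. channel="app_store"); field names are hypothetical."""
    return [r for r in records
            if all(r.get(field) == value for field, value in conditions.items())]


def group_users(records: Iterable[Record], key: str) -> Dict[str, List[Record]]:
    """Group records by one field, e.g. new/old user, channel or module."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return dict(groups)
```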
S503, training the to-be-trained business result prediction model based on at least one user behavior data to generate a business result prediction model;
S504, obtaining the associated information of the target user behavior and the service index from the service result prediction model, wherein the associated information represents the influence degree of the target user behavior on the service index;
The execution of steps S501-S504 shown in fig. 5 is the same as that of steps S301-S304 described above. For the specific execution of steps S501-S504, refer to the description above, which is not repeated here.
S505, dividing at least one user behavior data into a plurality of user behavior data sets, wherein one user behavior data only belongs to one user behavior data set, and different user behavior data sets correspond to different frequency ranges;
The embodiment of the application can analyze the target user behaviors and determine their associated information with the service index. When there is at least one target user behavior, the association between each of them and the service index is determined.
With reference to fig. 4(d), the service staff may select one of the at least one target user behavior, for example the target user behavior "rqd_played", and then generate data information for the selected behavior. If the user behavior for which the service staff want to generate data information is not a target user behavior, it may instead be selected from the feature list.
For example, the target user behavior may be referred to as a first user behavior, and the user behavior for which data information is to be generated may be referred to as a second user behavior. When the service index is related to the retention rate, the data information of the second user behavior may include a retention curve and a penetration rate.
Further, fig. 4(d) shows two ways of dividing the user behavior data: dividing into linear (equal-width) segments, and dividing into segments by number of users.
For example, the service staff may select either division mode; in fig. 4(d), the service staff have selected division into segments by number of users.
Furthermore, the service staff may also select a frequency interval. The feature interval range shown in fig. 4(d) can be regarded as the selected frequency interval, within which the data information of the second user behavior is generated.
For example, when the service staff click the "click to view retention curve and penetration rate" button shown in fig. 4(d), the data information of the second user behavior within the frequency interval is generated.
In the embodiment of the present application, if the division mode is linear segmentation, the at least one piece of user behavior data may be divided into a plurality of user behavior data sets as follows. First, determine at least one frequency range, with no overlap between different frequency ranges. Then, for each frequency range, assign to the corresponding user behavior data set every piece of user behavior data whose frequency of the second user behavior lies within that range.
For example, the frequency interval may be divided into a plurality of frequency ranges using a fixed step. Each frequency range consists of a start frequency and an end frequency, and the difference between them (the step width) is the same for all frequency ranges.
Correspondingly, the following is done for each piece of the at least one user behavior data: determine the frequency range containing the frequency of the second user behavior that the data represents; that range is the frequency range to which the user behavior data belongs. In this way, for each frequency range, all user behavior data belonging to it constitute the user behavior data set corresponding to that range.
For example, suppose the frequency interval is [10, 20), divided into two frequency ranges [10, 15) and [15, 20). Suppose also that the at least one user behavior data consists of 5 pieces, user behavior data 1 to 5, representing that the second user behavior was performed 10, 16, 12, 8 and 18 times, respectively. Then the user behavior data set for the [10, 15) range contains user behavior data 1 and 3, and the set for the [15, 20) range contains user behavior data 2 and 5. User behavior data 4 belongs to neither set.
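Linear segmentation can be sketched as follows; the example reproduces the worked case of user behavior data 1 to 5. This is a minimal sketch: it assumes half-open ranges and integer frequencies, and the function name and record identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def linear_segments(frequencies: Dict[str, int], lo: int, hi: int,
                    width: int) -> Dict[Tuple[int, int], List[str]]:
    """Split [lo, hi) into non-overlapping half-open ranges of equal width and
    put each record into the range containing its frequency of the second
    user behavior; records outside [lo, hi) belong to no set."""
    ranges = [(start, min(start + width, hi)) for start in range(lo, hi, width)]
    sets: Dict[Tuple[int, int], List[str]] = {r: [] for r in ranges}
    for record_id, freq in frequencies.items():
        if lo <= freq < hi:
            sets[ranges[(freq - lo) // width]].append(record_id)
    return sets


freqs = {"data_1": 10, "data_2": 16, "data_3": 12, "data_4": 8, "data_5": 18}
print(linear_segments(freqs, 10, 20, 5))
# {(10, 15): ['data_1', 'data_3'], (15, 20): ['data_2', 'data_5']}
```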
In the embodiment of the application, if the division mode is segmentation by number of users, the at least one piece of user behavior data may be divided into a plurality of user behavior data sets as follows. First, sort the user behavior data by the frequency of the second user behavior they represent to generate a first data sequence. Then cut the first data sequence into at least one second data sequence, each containing the same number of pieces of user behavior data.
For example, the user behavior data whose frequency of the second user behavior lies within the frequency interval may first be selected from the at least one user behavior data and sorted by that frequency. This gives a data sequence, referred to as the first data sequence or first user behavior data sequence. The first data sequence is then cut, in order, into at least one sub-sequence (each referred to as a second data sequence or second user behavior data sequence) containing the same number of pieces of user behavior data. The user behavior data in each second data sequence form one user behavior data set. Because different user behavior data sets contain different user behavior data, they correspond to different frequency ranges.
For example, suppose the at least one user behavior data consists of 5 pieces, user behavior data 6 to 10, representing that the second user behavior was performed 10, 16, 12, 8 and 18 times, respectively, and that the frequency interval is [10, 20). The frequency represented by user behavior data 9 lies outside the interval, so user behavior data 9 is ignored and only user behavior data 6, 7, 8 and 10 are sorted. Sorting in ascending order of frequency gives a first data sequence consisting of user behavior data 6, 8, 7 and 10, in that order.
If each user behavior data set contains 2 pieces of user behavior data, the first data sequence is cut into two second data sequences. Second data sequence 1 consists of user behavior data 6 and 8, and second data sequence 2 consists of user behavior data 7 and 10. User behavior data 6 and 8 form one user behavior data set, and user behavior data 7 and 10 form another.
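Segmentation by number of users can be sketched as follows; the example reproduces the worked case of user behavior data 6 to 10. The function name and record identifiers are hypothetical. The text requires equal-sized sets, so how a remainder is handled is an assumption: here any leftover records simply form a shorter last set. Records with equal frequencies may also end up in adjacent sets.

```python
from typing import Dict, List


def equal_count_segments(frequencies: Dict[str, int], lo: int, hi: int,
                         per_set: int) -> List[List[str]]:
    """Sort the in-interval records by frequency (the first data sequence) and
    cut the result into consecutive chunks of `per_set` records (the second
    data sequences); a shorter last chunk holds any remainder (assumption)."""
    in_interval = [rid for rid, freq in frequencies.items() if lo <= freq < hi]
    first_sequence = sorted(in_interval, key=lambda rid: frequencies[rid])
    return [first_sequence[i:i + per_set]
            for i in range(0, len(first_sequence), per_set)]


freqs = {"data_6": 10, "data_7": 16, "data_8": 12, "data_9": 8, "data_10": 18}
print(equal_count_segments(freqs, 10, 20, 2))
# [['data_6', 'data_8'], ['data_7', 'data_10']]
```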
S506, generating data information related to the user behavior data set according to the user behavior data in the user behavior data set, wherein the data information comprises index information of the service index.
For example, the data information related to the user behavior data set may include not only the index information of the service index calculated from the user behavior data set, but also the number of pieces of user behavior data in the user behavior data set.
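For a retention-related service index, the data information of each set could be computed as in the following sketch. It assumes each piece of user behavior data has already been reduced to its second data (retained or not). The function name and set labels are hypothetical, and returning `None` for an empty set is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def set_statistics(sets: Dict[str, List[bool]]) -> Dict[str, Tuple[int, Optional[float]]]:
    """For each user behavior data set, return (number of records, retention
    rate); each record is reduced to its second data, i.e. retained or not."""
    stats: Dict[str, Tuple[int, Optional[float]]] = {}
    for label, retained_flags in sets.items():
        n = len(retained_flags)
        rate = sum(retained_flags) / n if n else None  # empty set: undefined
        stats[label] = (n, rate)
    return stats
```

For example, a set with one retained and one non-retained user yields `(2, 0.5)`.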
Further, the user behavior analysis method provided by the embodiment of the application may also include displaying a data information image in a first display interface, the data information image representing the data information related to each of the at least one user behavior data set.
Illustratively, the first display interface responds to a selection of a first image area in the data information image by triggering an enlarged display of the target image within the first image area in a second display interface. The second display interface responds to a drag operation on the target image by displaying the image details of a second image area of the target image that matches the drag operation.
Fig. 4(e) is a schematic diagram of a first display interface, which may also be referred to as a data information display interface. The data information image displayed in it represents the data information related to each of the at least one user behavior data set. Taking a retention-related service index as an example, the data information of a user behavior data set includes the retention rate calculated from the set and the number of pieces of user behavior data in the set. This number can be regarded as the number of users corresponding to the set and is also referred to as the penetration rate of the set. Connecting the retention rates of the user behavior data sets forms a curve that can be regarded as the retention curve.
Referring to fig. 4(e), the data information image includes a service index curve and/or a histogram. The service index curve represents the index information of the service index for each user behavior data set; for a retention-related service index, it may be a retention curve. The histogram represents the number of pieces of user behavior data in each user behavior data set, which can also be regarded as the number of users corresponding to the set.
Retention curves can assist decision-making in two ways. a. Find the point where retention rises fastest and, taking the user penetration rate at that point into account, guide users toward it to bring a large business improvement. b. Find the inflection point of the retention curve and guide users toward it; the ROI there is generally best, because pushing users beyond the inflection point yields diminishing marginal returns.
Referring to another schematic diagram of a first display interface shown in fig. 4(f), an area of the data information image can be selected directly and viewed enlarged. Illustratively, in response to the service staff selecting a first image area in the data information image, the target image within the first image area is displayed enlarged.
As a preferred implementation of the embodiment of the present application, the target image may be displayed in the second display interface.
Illustratively, the second display interface supports dragging with the mouse up, down, left and right to view details of different areas of the enlarged target image. For example, the service staff may drag the enlarged target image with the mouse, and the image details of the area related to the drag operation, i.e., the second image area of the target image matching the drag operation, are displayed in the second display interface.
When a drag operation is performed on the target image in the second display interface, the target pixel of the target image located at the center point of the second display interface is determined. An image area centered on the target pixel is taken from the target image as the second image area matching the drag operation, and the image details of the second image area are displayed enlarged. The area of this image region may be a preset area, whose specific size can be set as needed and is not limited herein.
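The region computation can be sketched as follows. The coordinate model is an assumption: the viewport's top-left scroll offset is taken to be in enlarged (zoomed) pixels and to be changed by the drag. The choice to shift rather than shrink a box that would cross the image border is also an assumption. Both function names are hypothetical.

```python
from typing import Tuple


def target_pixel(scroll_x: float, scroll_y: float, view_w: int, view_h: int,
                 zoom: float) -> Tuple[int, int]:
    """Image pixel under the centre of the second display interface.

    scroll_x/scroll_y: top-left offset of the viewport in zoomed coordinates,
    which a drag operation changes; zoom: enlargement factor.
    """
    return int((scroll_x + view_w / 2) / zoom), int((scroll_y + view_h / 2) / zoom)


def second_image_region(cx: int, cy: int, region_w: int, region_h: int,
                        img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Preset-size box (left, top, right, bottom) centred on (cx, cy),
    shifted so it stays inside the image (assumes region fits in image)."""
    left = max(0, min(cx - region_w // 2, img_w - region_w))
    top = max(0, min(cy - region_h // 2, img_h - region_h))
    return left, top, left + region_w, top + region_h
```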
Furthermore, the user behavior analysis method provided by the embodiment of the application can also determine an inflection point of the service index curve, i.e., the curve with frequency range as the abscissa and index information as the ordinate, and take the frequency range corresponding to the inflection point as the magic number of the target user behavior.
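The text does not say how the inflection point is located. The sketch below substitutes one simple heuristic: it picks the point where the slope of the service index curve drops the most, i.e., where the gain in retention per frequency range flattens most sharply. This matches the "diminishing marginal returns" reading of the inflection point. The function name and the `min_drop` tolerance are hypothetical.

```python
from typing import Optional, Sequence


def inflection_index(rates: Sequence[float], min_drop: float = 1e-9) -> Optional[int]:
    """Index of the point where the curve's slope drops the most, i.e. where the
    gain in index information per frequency range flattens most sharply.
    Returns None if the curve never flattens (no positive slope drop)."""
    if len(rates) < 3:
        return None
    slopes = [b - a for a, b in zip(rates, rates[1:])]
    # drops[i] compares the slope entering point i+1 with the slope leaving it.
    drops = [slopes[i] - slopes[i + 1] for i in range(len(slopes) - 1)]
    best = max(range(len(drops)), key=drops.__getitem__)
    return best + 1 if drops[best] > min_drop else None


ranges = ["[1, 2)", "[2, 3)", "[3, 4)", "[4, 5)", "[5, 6)"]
rates = [0.20, 0.30, 0.40, 0.42, 0.43]
k = inflection_index(rates)
print(ranges[k])  # [3, 4)
```

Knee-detection methods such as maximum distance from the chord between the curve's endpoints would be equally reasonable substitutes.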
Illustratively, after the associated information between each of the at least one target user behavior and the service index has been determined, a second user behavior is determined from the at least one target user behavior. Its data information, which includes a service index curve and a penetration rate, is then generated.
The greater the degree of influence on the service index represented by the associated information of a target user behavior, the more likely that behavior is to produce a magic number influencing the service index.
Illustratively, the associated information is positively correlated with the degree of influence: the larger the associated information, the greater the degree of influence it represents. Target user behaviors whose associated information exceeds preset target associated information can be selected from the at least one target user behavior. Each selected behavior is determined as a second user behavior, and the data information of each second user behavior is generated.
Illustratively, each target user behavior selected by the service staff from the at least one target user behavior can also be determined as a second user behavior in response to the selection operation.
Further, the service person may also select another user behavior other than the at least one target user behavior as the second user behavior, which is not limited herein.
For example, an inflection point of a service index curve in the data information of the second user behavior may be determined, where the service index curve is a curve with a frequency range as an abscissa and index information as an ordinate; and determining the frequency range corresponding to the inflection point in the service index curve as the magic number of the second user behavior.
Further, it may first be determined whether the associated information between the second user behavior and the service index exceeds the target associated information; if it does, an inflection point in the service index curve included in the data information of the second user behavior is determined, and the frequency range corresponding to the inflection point in the service index curve is determined as the magic number of the second user behavior.
On top of analyzing the associated information between target user behaviors and the service index, the user behavior analysis method provided by the embodiment of the present application can also automatically analyze the magic numbers of target user behaviors, further reducing the analysis cost and improving analysis efficiency and the accuracy of the analysis results.
For example, if the user data of the internet product comes from TDW, the user data obtained from TDW can be treated as HDFS data; the way to obtain the HDFS data is described below.
In short, the feature importance analysis and retention curve analysis described above can be completed simply by specifying the HDFS address where the user data of the internet product is saved. Often, however, the known export address is only a top-level path; in that case the path of the required data file can be found with "show file path list" on the HDFS page, as shown in fig. 6(a). If the HDFS address of the data file is already known, this step can be skipped and the analysis started directly.
The copied data file address is pasted into the input box and analyzed. Because files exported by TDW export tasks have no header row, the feature headers of the data file must be filled in. Note that the data file must contain a column named retain_label, in which 1 and 0 indicate whether the user was retained. The service personnel can then preview the data, or skip the preview and start the subsequent analysis directly; the analysis result is shown in fig. 6(b). The subsequent analysis process is identical to the one described above; the only difference is that, for retention curve analysis, the feature name is typed in rather than selected from a drop-down box.
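The application does not say how the inflection point is located. The sketch below assumes one common knee heuristic (the point farthest from the straight chord joining the first and last points of the curve, after scaling both axes to [0, 1]) and gates it on the associated-information threshold as described above; the function names and the example numbers are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def knee_index(values: Sequence[float]) -> Optional[int]:
    """Index of the point farthest from the chord between the curve's endpoints.

    Both axes are scaled to [0, 1] so the result does not depend on units.
    Returns None when the curve is too short or completely flat.
    """
    n = len(values)
    lo, hi = min(values, default=0.0), max(values, default=0.0)
    if n < 3 or hi == lo:
        return None
    xs = [i / (n - 1) for i in range(n)]
    ys = [(v - lo) / (hi - lo) for v in values]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    best, best_dist = None, 0.0
    for i in range(1, n - 1):
        # Perpendicular distance from (xs[i], ys[i]) to the chord.
        dist = abs(dy * (xs[i] - xs[0]) - dx * (ys[i] - ys[0])) / norm
        if dist > best_dist:
            best, best_dist = i, dist
    return best


def magic_number(ranges: List[str], index_values: List[float],
                 assoc: float, target_assoc: float) -> Optional[str]:
    """Frequency range at the knee, or None if the behavior is not selected."""
    if assoc <= target_assoc:
        return None
    i = knee_index(index_values)
    return None if i is None else ranges[i]
```

With intervals `1-5` through `21-25` and retention rates `[0.10, 0.30, 0.42, 0.44, 0.45]`, the curve flattens after the third interval and the sketch returns `"11-15"`. Other knee definitions (for example, the most negative second difference) can give different answers on noisy curves, so visual confirmation as in fig. 9 remains useful.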
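Because the exported files carry no header, the column names typed in by the service personnel have to be attached before analysis. A minimal stdlib sketch, assuming a tab-delimited export (the delimiter and the function name are assumptions, not documented TDW behavior), attaches the header and checks the retain_label requirement:

```python
import csv
from typing import Dict, Iterable, List


def load_headerless(lines: Iterable[str], header: List[str],
                    delimiter: str = "\t") -> List[Dict[str, str]]:
    """Attach a user-supplied header to a headerless export and validate it.

    lines: an open text file or any iterable of text lines.
    """
    if "retain_label" not in header:
        raise ValueError("header must contain a retain_label column")
    rows = []
    for lineno, fields in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} fields, got {len(fields)}")
        row = dict(zip(header, fields))
        # retain_label: 1 = user retained, 0 = user not retained
        if row["retain_label"] not in ("0", "1"):
            raise ValueError(f"line {lineno}: retain_label must be 0 or 1")
        rows.append(row)
    return rows
```

Failing fast on a wrong field count catches the most common mistake, a header that does not match the exported columns, before any model is trained.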
Fig. 7 is a user behavior analysis tool architecture diagram according to an embodiment of the present application.
As shown in fig. 7, the user behavior analysis tool supports multiple input data sources, including Lighthouse data sources, TDW, and local files, and performs user grouping and feature construction on the data. Taking the correlation between a service index and the retention rate as an example, the tool outputs feature importance and two-dimensional frequency versus retention rate curves using random forests, the XGBoost algorithm, decision trees, and the like, and the analysis conclusions are then applied online to close the action loop.
Illustratively, the user behavior analysis tool performs the user behavior analysis process as follows:
1. User data input
Raw user behavior configuration: specify the raw data (the user data of the internet product) and the feature data.
2. Preparing data: user grouping and feature construction according to specified original data and feature data
For example, referring to fig. 8, users are grouped according to business understanding; common grouping dimensions include new versus old users, channel, and function module, and feature importance and inflection points are searched for in each group in the same way. The 0/1 label of the model is defined according to the analysis target: if the analysis target is next-day retention of newly added users, users retained on the next day are positive samples (1) and users not retained are negative samples (0). The user data is then converted into the feature format required by the service result prediction model, a wide-table format of: user id, action id1-value, action id2-value, …, where value is the frequency.
3. Train the service result prediction model and output feature importance
With the data required by the service result prediction model prepared in the preceding steps, a random forest model is constructed; the number of trees may be specified as, for example, 8, and each tree stops growing automatically once the impurity stopping criterion is met. Feature importance is obtained by calculating the Gini impurity.
4. Determine the inflection point from the two-dimensional frequency versus retention rate curve
The retention rate and user quantity corresponding to different frequencies of the important user behaviors are counted, and the frequency segments can be specified by the user. For example, if viewing duration (in minutes) is the feature and the segment width is 5, every 5 minutes forms one interval: 1-5, 6-10, 11-15, …. The user quantity and the retention rate of the users in each interval are counted and visualized as shown in fig. 9. The curve is inspected for an inflection point, and the feature frequency on the horizontal axis corresponding to that point is the magic number: once a user performs the behavior at that frequency, retention improves markedly. Note that this is a correlation, not a causal relationship.
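The grouping, labeling, and wide-table conversion in this step can be sketched as below, assuming raw events arrive as (user id, action id, day offset) tuples in which day 0 is the day the user was added and any activity on day 1 counts as next-day retention. The event layout, the group mapping, and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (user_id, action_id, day offset since the user was added)
Event = Tuple[str, str, int]


def build_wide_rows(events: Iterable[Event],
                    group_of: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group users and emit wide rows: user_id, action_id -> frequency, label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    retained = set()
    users = set()
    for user, action, day in events:
        users.add(user)
        if day == 0:
            counts[user][action] += 1  # frequency on the day the user was added
        elif day == 1:
            retained.add(user)  # any next-day activity counts as retained
    groups: Dict[str, List[dict]] = defaultdict(list)
    for user in sorted(users):
        row = {"user_id": user, **counts[user], "label": 1 if user in retained else 0}
        groups[group_of.get(user, "unknown")].append(row)
    return dict(groups)
```

Each group's rows can then be analyzed independently, matching the per-group search for feature importance and inflection points described above.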
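A library route would be, for example, scikit-learn's RandomForestClassifier with n_estimators=8 and criterion="gini", whose feature_importances_ attribute reports impurity-based importance. To show what the Gini quantity measures without that dependency, the stdlib sketch below scores each feature by the largest weighted Gini impurity decrease of a single threshold split. This is a one-split simplification, not a random forest (a forest sums such decreases over all nodes of all trees), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest weighted Gini impurity decrease over thresholds on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, gain)
    return best


def stump_importance(rows: List[dict], features: List[str],
                     label_key: str = "label") -> Dict[str, float]:
    """Normalized single-split Gini importance per feature (missing = 0)."""
    labels = [int(r[label_key]) for r in rows]
    gains = {f: best_split_gain([r.get(f, 0) for r in rows], labels) for f in features}
    total = sum(gains.values())
    return {f: (g / total if total else 0.0) for f, g in gains.items()}
```

Missing action columns are treated as frequency 0, which matches the wide-table layout where a user who never performed an action simply has no entry for it.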
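The interval statistics described in this step can be sketched as follows, assuming each row carries one user's frequency of the behavior and a 0/1 retention flag. Users with frequency 0 are left out because the example intervals start at 1; the function name and row layout are hypothetical.

```python
from typing import Dict, List, Tuple


def bucket_retention(rows: List[Tuple[int, int]],
                     width: int = 5) -> List[Tuple[str, int, float]]:
    """rows: (frequency, retained 0/1).

    Buckets are 1..width, width+1..2*width, ...; returns
    (range label, user quantity, retention rate) for non-empty buckets.
    """
    stats: Dict[int, List[int]] = {}
    for freq, retained in rows:
        if freq < 1:
            continue  # outside the 1-5, 6-10, ... intervals
        k = (freq - 1) // width
        s = stats.setdefault(k, [0, 0])
        s[0] += 1          # user quantity
        s[1] += retained   # retained users
    out = []
    for k in sorted(stats):
        users, kept = stats[k]
        out.append((f"{k * width + 1}-{(k + 1) * width}", users, kept / users))
    return out
```

Reporting the user quantity next to each rate matters: an interval with only a handful of users can show an extreme retention rate that should not be read as an inflection point.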
Referring to fig. 10(a), a schematic diagram of the retention analysis result for newly added users of a browser provided in an embodiment of the present application is shown; it displays the important user behaviors affecting new-user retention of the browser as identified by the user behavior analysis tool. Based on this analysis result, mark information was added to the small-house icon on the browser webpage (for example, a small dot added to the icon as shown in fig. 10(b)) to guide users to click it. Statistical analysis showed that adding this mark information increased the feeds top rate by 1.4% and information retention by 0.4%. Moreover, the user behavior analysis tool provided by the embodiment of the present application automates user behavior analysis and effectively shortens the cycle of purely manual analysis (for example, from 1 week to 1.5 days).
Fig. 11 is a schematic structural diagram of a user behavior analysis apparatus according to an embodiment of the present application.
As shown in fig. 11, the apparatus includes:
a first determining unit 1101, configured to determine a target user behavior to be subjected to internet product service index impact analysis from a plurality of user behaviors;
a first obtaining unit 1102, configured to obtain at least one piece of user behavior data matching the target user behavior from the user data of the internet product, where the user behavior data represents the frequency with which a user generates the target user behavior in the observation period indicated by the service indicator, and the service result of the user after the observation period that affects the indicator information of the service indicator;
a first training unit 1103, configured to train a to-be-trained business result prediction model based on at least one user behavior data to generate a business result prediction model;
a second obtaining unit 1104, configured to obtain, from the service result prediction model, associated information between the target user behavior and the service index, where the associated information represents a degree of influence of the target user behavior on the service index.
Further, the user behavior analysis apparatus provided in the embodiment of the present application further includes:
the dividing unit is used for dividing at least one user behavior data into a plurality of user behavior data sets, wherein one user behavior data only belongs to one user behavior data set, and different user behavior data sets correspond to different frequency ranges;
and the generating unit is used for generating data information related to the user behavior data set according to the user behavior data in the user behavior data set, wherein the data information comprises index information of the service index.
In this embodiment, preferably, the dividing unit includes:
the sorting unit is used for sorting the at least one piece of user behavior data by the represented frequency of the target user behavior to generate a first data sequence;
the segmentation unit is used for segmenting the first data sequence into at least one second data sequence, and the number of pieces of user behavior data in different second data sequences is the same;
alternatively,
the second determining unit is used for determining at least one frequency range, and frequency overlapping does not exist between different frequency ranges;
and a third determining unit, configured to determine, for each of the at least one frequency range, the user behavior data among the at least one piece of user behavior data whose represented frequency of the target user behavior lies in that frequency range as the user behavior data set corresponding to that frequency range.
In this embodiment, it is preferable that the data information related to the user behavior data set further includes the number of pieces of user behavior data in the user behavior data set.
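The first division strategy (sort, then cut into equally sized second data sequences), together with the piece count kept in each set's data information, can be sketched as below. When the number of records is not divisible by the number of segments, this sketch assumes sizes may differ by one; the application does not specify that case. Frequencies stand in for full user behavior records, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def equal_count_segments(freqs: Sequence[int], k: int) -> List[List[int]]:
    """Sort frequencies (the first data sequence) and cut it into k segments
    whose sizes differ by at most one."""
    if k < 1:
        raise ValueError("k must be positive")
    ordered = sorted(freqs)
    base, extra = divmod(len(ordered), k)
    segments, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        segments.append(ordered[start:start + size])
        start += size
    return segments


def describe(segments: List[List[int]]) -> List[Tuple[int, int, int]]:
    """(lowest frequency, highest frequency, number of pieces) per segment."""
    return [(seg[0], seg[-1], len(seg)) for seg in segments if seg]
```

Equal-count cutting keeps every point on the curve backed by the same number of users, at the cost that one frequency value may straddle two adjacent segments; the fixed-range alternative avoids that but can leave sparse intervals.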
Further, the user behavior analysis apparatus provided in the embodiment of the present application further includes:
the display unit is used for displaying a data information image in the first display interface, and the data information image represents data information related to each user behavior data set in at least one user behavior data set;
the first display interface is configured to respond to a selection operation on a first image area in the data information image by triggering enlarged display, in a second display interface, of the target image in that first image area; the second display interface is configured to respond to a dragging operation on the target image by displaying the image details of a second image area of the target image matched with the dragging operation.
In the embodiment of the present application, preferably, the data information image includes a service index curve and/or a histogram;
the service index curve represents the index information of each user behavior data set in the service index; the histogram characterizes the number of user behavior data pieces in each user behavior data set.
Further, the user behavior analysis apparatus provided in the embodiment of the present application further includes:
the fourth determining unit is used for determining an inflection point of a service index curve, wherein the service index curve is a curve taking a frequency range as an abscissa and index information as an ordinate;
and the fifth determining unit is used for determining the frequency range corresponding to the inflection point in the service index curve as the magic number of the target user behavior.
In this embodiment of the application, preferably, the first obtaining unit includes:
a third obtaining unit, configured to obtain user data of at least one user in the internet product;
the detection unit is used for detecting whether the user data of the user represents that the user generates at least one target user behavior in the observation period indicated by the service index;
a fourth obtaining unit, configured to, if user data of a user indicates that the user generates at least one target user behavior in an observation period indicated by a service indicator, obtain, from the user data, first data that indicates the at least one target user behavior generated by the user in the observation period indicated by the service indicator;
a fifth obtaining unit, configured to obtain, from user data of the user, second data representing index information, which is generated by the user after an observation period and used for affecting a service index; the first data and the second data form user behavior data of the user;
and a sixth determining unit, configured to determine that user behavior data matching the target user behavior does not exist in the user data of the user if the user data of the user does not characterize that the user generates at least one target user behavior in the observation period indicated by the service indicator.
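The per-user extraction performed by these units might look like the sketch below. It assumes a user's data is a list of (action id, date) events, the observation period is a closed date range, and the second data is whether the user was active within result_days after the period ends; all of these, and the function name, are illustrative assumptions.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Set, Tuple


def extract_user_behavior_data(
    events: Iterable[Tuple[str, date]],
    targets: Set[str],
    obs_start: date,
    obs_end: date,
    result_days: int = 1,
) -> Optional[Dict[str, object]]:
    """Return {"first": target frequencies, "second": 0/1 result}, or None
    when the user generated no target behavior in the observation period."""
    events = list(events)
    # First data: frequency of each target behavior inside the observation period.
    first = Counter(a for a, d in events if a in targets and obs_start <= d <= obs_end)
    if not first:
        return None  # no user behavior data matching the target behavior
    # Second data: any activity in the result window after the observation period.
    horizon = obs_end + timedelta(days=result_days)
    retained = any(obs_end < d <= horizon for _, d in events)
    return {"first": dict(first), "second": int(retained)}
```

Returning None instead of a zero-frequency record mirrors the sixth determining unit: users without the target behavior contribute no user behavior data.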
In the embodiment of the present application, preferably, the first training unit includes:
the seventh determining unit is used for inputting the user behavior data into the business result prediction model to be trained to obtain the business result predicted by that model for the first data in the user behavior data;
and the second training unit is used for training the business result prediction model to be trained, with the objective of making the predicted business result approach the second data in the user behavior data, to generate the business result prediction model.
In an embodiment of the present application, the second obtaining unit includes:
the sixth obtaining unit is used for obtaining an output result of the Gini impurity algorithm in the service result prediction model, wherein the output result comprises the associated information between each of the at least one target user behavior and the service index.
In this embodiment of the application, preferably, the third obtaining unit includes:
an eighth determining unit, configured to determine a data obtaining condition, where the data obtaining condition is related to a source channel of the user data, a user category to which a user generating the user data belongs, and/or a function in an internet product to which the user data belongs;
and a seventh acquiring unit, configured to acquire user data of at least one user in the internet product, where the user data satisfies the data acquisition condition.
In the embodiment of the present application, preferably, the first determining unit includes:
the display unit is used for displaying a plurality of preset user behaviors and the user quantity of each user behavior, wherein the user quantity of the user behaviors indicates the number of pieces of user data which meet the data acquisition condition and represent the user behaviors;
and the ninth determining unit is used for performing user behavior selection operation in the plurality of user behaviors based on the user quantity of the user behaviors and determining the selected user behavior as the target user behavior to be subjected to the internet product service index influence analysis.
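Filtering by the data acquisition condition and counting the user quantity shown next to each preset behavior can be sketched as below. The record fields, the "None means no restriction" convention, and all names are hypothetical illustrations of the channel, user-category, and function dimensions listed above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class UserRecord:
    user_id: str
    channel: str      # source channel of the user data
    category: str     # user category, e.g. "new" or "old"
    function: str     # product function the data belongs to
    behaviors: Set[str] = field(default_factory=set)


@dataclass
class AcquisitionCondition:
    # None means "no restriction" on that dimension.
    channels: Optional[Set[str]] = None
    categories: Optional[Set[str]] = None
    functions: Optional[Set[str]] = None

    def matches(self, r: UserRecord) -> bool:
        checks = ((self.channels, r.channel),
                  (self.categories, r.category),
                  (self.functions, r.function))
        return all(allowed is None or value in allowed for allowed, value in checks)


def behavior_user_counts(records: Iterable[UserRecord],
                         cond: AcquisitionCondition) -> List[Tuple[str, int]]:
    """User quantity per behavior among records meeting the condition, largest first."""
    counts: Counter = Counter()
    for r in records:
        if cond.matches(r):
            counts.update(r.behaviors)  # each user counts at most once per behavior
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Sorting by user quantity puts widely performed behaviors first, which helps service personnel avoid choosing a target behavior with too few users to support a reliable model.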
As shown in fig. 12, a block diagram of an implementation manner of a computer device provided in an embodiment of the present application is shown, where the computer device includes:
a memory 1201 for storing a program;
a processor 1202 for executing a program, the program specifically for:
determining a target user behavior to be subjected to internet product service index influence analysis from a plurality of user behaviors;
obtaining at least one piece of user behavior data matching the target user behavior from the user data of the internet product, where the user behavior data represents the frequency with which a user generates the target user behavior in the observation period indicated by the service index and the service result of the user after the observation period that affects the index information of the service index;
training a to-be-trained business result prediction model based on at least one user behavior data to generate a business result prediction model;
and acquiring the associated information of the target user behavior and the service index from the service result prediction model, wherein the associated information represents the influence degree of the target user behavior on the service index.
The processor 1202 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC).
The computer device may further include a communication interface 1203 and a communication bus 1204, wherein the memory 1201, the processor 1202, and the communication interface 1203 communicate with each other through the communication bus 1204.
The embodiment of the present application further provides a readable storage medium, where a computer program is stored, and the computer program is loaded and executed by a processor to implement each step of the user behavior analysis method, where a specific implementation process may refer to descriptions of corresponding parts in the foregoing embodiment, and details are not repeated in this embodiment.
The present application also proposes a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and executes the computer instruction, so that the computer device executes the methods provided in the various optional implementation manners in the aspect of the user behavior analysis method or the aspect of the user behavior analysis apparatus, and the specific implementation process may refer to the description of the corresponding embodiments, which is not described in detail.
The present application provides a user behavior analysis method and apparatus, a computer device, and a storage medium. A target user behavior to be subjected to internet product service index influence analysis is determined from a plurality of user behaviors; at least one piece of user behavior data matching the target user behavior is obtained from the user data of the internet product; a business result prediction model to be trained is trained on the at least one piece of user behavior data to generate a business result prediction model; and the associated information between the target user behavior and the service index is obtained from the business result prediction model. This automates analysis of the degree to which user behaviors influence the service index, provides a basis for magic number analysis, reduces analysis cost, and improves analysis efficiency and the accuracy of the analysis results.
The user behavior analysis method, the user behavior analysis device, the computer device and the storage medium provided by the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the above embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A user behavior analysis method is characterized by comprising the following steps:
determining a target user behavior to be subjected to internet product service index influence analysis from a plurality of user behaviors;
acquiring at least one piece of user behavior data matched with the target user behavior in the user data of the Internet product, wherein the user behavior data represent the frequency of the target user behavior generated by a user in an observation period indicated by the service index and the service result of the user for influencing the index information of the service index after the observation period;
training a to-be-trained business result prediction model based on the at least one user behavior data to generate a business result prediction model;
and acquiring the associated information of the target user behavior and the service index from the service result prediction model, wherein the associated information represents the influence degree of the target user behavior on the service index.
2. The method of claim 1, further comprising:
dividing the at least one user behavior data into a plurality of user behavior data sets, wherein one user behavior data only belongs to one user behavior data set, and different user behavior data sets correspond to different frequency ranges;
and generating data information related to the user behavior data set according to the user behavior data in the user behavior data set, wherein the data information comprises index information of the service index.
3. The method of claim 2, wherein the partitioning of the at least one user behavior data into a plurality of user behavior data sets comprises:
sequencing the at least one user behavior data according to the represented frequency of the target user behavior to generate a first data sequence;
the first data sequence is cut into at least one second data sequence, and the number of pieces of user behavior data in different second data sequences is the same;
alternatively,
determining at least one frequency range, wherein no frequency overlap exists between different frequency ranges;
and for each frequency range in the at least one frequency range, determining the user behavior data, which represents the frequency of the target user behavior and is located in the frequency range, in the at least one user behavior data as a user behavior data set corresponding to the frequency range.
4. The method of claim 2, wherein the data information associated with the user behavior data set further comprises a number of user behavior data items in the user behavior data set.
5. The method of claim 1, further comprising:
determining an inflection point of the service index curve according to the index information of each user behavior data set in the at least one user behavior data set in the service index;
and determining the frequency range corresponding to the inflection point in the service index curve as the magic number of the target user behavior.
6. The method according to claim 1, wherein the number of the target user behaviors is at least one, and the obtaining at least one user behavior data matching the target user behavior from among the user data of the internet product comprises:
obtaining user data of at least one user in the internet product;
detecting whether the user data of the user represents that the user generates the at least one target user behavior in the observation period indicated by the service index;
if the user data of the user represents that the user generates the at least one target user behavior in the observation period indicated by the service index, acquiring first data representing the at least one target user behavior generated by the user in the observation period indicated by the service index from the user data;
obtaining second data representing index information which is generated by the user after the observation period and used for influencing the service index from the user data of the user; the first data and the second data constitute user behavior data of the user;
and if the user data of the user does not represent that the user generates the at least one target user behavior in the observation period indicated by the service index, determining that the user behavior data matched with the target user behavior does not exist in the user data of the user.
7. The method of claim 6, wherein training the business result prediction model to be trained based on the at least one user behavior data generates a business result prediction model, comprising:
inputting the user behavior data into a business result prediction model to be trained to obtain a business result predicted by the business result prediction model to be trained on first data in the user behavior data;
and training the service result prediction model to be trained, with the objective of making the predicted service result approach the second data in the user behavior data, to generate the service result prediction model.
8. The method as claimed in claim 7, wherein the business result prediction model is a random forest model, and the obtaining of the correlation information between the target user behavior and the business index from the business result prediction model includes:
acquiring an output result of the Gini impurity algorithm in the service result prediction model, wherein the output result comprises the associated information between each target user behavior in the at least one target user behavior and the service index.
9. The method of claim 6, wherein said obtaining user data for at least one user of said internet product comprises:
determining data acquisition conditions, wherein the data acquisition conditions are related to a user data source channel, a user category to which a user generating user data belongs, and/or functions in an internet product to which the user data belongs;
and acquiring user data of at least one user meeting the data acquisition condition in the Internet product.
10. The method of claim 9, wherein determining a target user behavior to be analyzed for internet product business indicator impact from a plurality of user behaviors comprises:
displaying a plurality of preset user behaviors and a user quantity of each user behavior, wherein the user quantity of the user behaviors indicates the number of pieces of user data which meet the data acquisition condition and represent the user behaviors;
and performing user behavior selection operation in the user behaviors based on the user quantity of the user behaviors, and determining the selected user behavior as a target user behavior to be subjected to internet product service index influence analysis.
11. A user behavior analysis apparatus, comprising:
the first determining unit is used for determining a target user behavior to be subjected to internet product service index influence analysis from a plurality of user behaviors;
a first obtaining unit, configured to obtain at least one piece of user behavior data that matches the target user behavior in the user data of the internet product, where the user behavior data represents a frequency of a user generating the target user behavior in an observation period indicated by the service indicator and a service result of the user after the observation period, where the service result is used to affect indicator information of the service indicator;
the first training unit is used for training the business result prediction model to be trained based on the at least one user behavior data to generate a business result prediction model;
and the second obtaining unit is used for obtaining the associated information of the target user behavior and the service index from the service result prediction model, and the associated information represents the influence degree of the target user behavior on the service index.
12. A computer device, comprising a processor and a memory connected through a communication bus, wherein the processor is configured to call and execute the program stored in the memory, and the memory is configured to store a program for implementing the user behavior analysis method according to any one of claims 1 to 10.
13. A computer-readable storage medium, having stored thereon a computer program which, when loaded and executed by a processor, carries out the steps of the user behavior analysis method according to any one of claims 1 to 10.
CN202011078653.3A 2020-10-10 2020-10-10 User behavior analysis method and device, computer equipment and storage medium Pending CN112417267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011078653.3A CN112417267A (en) 2020-10-10 2020-10-10 User behavior analysis method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011078653.3A CN112417267A (en) 2020-10-10 2020-10-10 User behavior analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112417267A 2021-02-26

Family

ID=74853954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011078653.3A Pending CN112417267A (en) 2020-10-10 2020-10-10 User behavior analysis method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112417267A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010579A (en) * 2021-03-24 2021-06-22 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN114186025A (en) * 2021-12-14 2022-03-15 中国建设银行股份有限公司 User portrait index heat prediction method, device, equipment and storage medium
CN114283502A (en) * 2021-12-08 2022-04-05 福建省特种设备检验研究院泉州分院 Special equipment sensor node data analysis method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109509017A (en) * 2018-09-27 2019-03-22 Ping An Life Insurance Company of China, Ltd. User retention rate prediction method and device based on big data analysis
CN109615128A (en) * 2018-12-05 2019-04-12 Chongqing Ruiyun Technology Co., Ltd. Real estate customer deal-closing probability prediction method, device and server
CN109711860A (en) * 2018-11-12 2019-05-03 Ping An Technology (Shenzhen) Co., Ltd. User behavior prediction method and device, storage medium, and computer equipment
US20190147356A1 (en) * 2017-11-14 2019-05-16 Adobe Systems Incorporated Generating a predictive behavior model for predicting user behavior using unsupervised feature learning and a recurrent neural network
CN110634030A (en) * 2019-09-24 2019-12-31 Alibaba Group Holding Limited Application service index mining method, device and equipment
CN110956296A (en) * 2018-09-26 2020-04-03 Beijing Didi Infinity Technology and Development Co., Ltd. User churn probability prediction method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010579A (en) * 2021-03-24 2021-06-22 Tencent Technology (Shenzhen) Co., Ltd. Data processing method, device, equipment and storage medium
CN113010579B (en) * 2021-03-24 2024-05-14 Tencent Technology (Shenzhen) Co., Ltd. Data processing method, device, equipment and storage medium
CN114283502A (en) * 2021-12-08 2022-04-05 Quanzhou Branch of Fujian Special Equipment Inspection and Research Institute Special equipment sensor node data analysis method
CN114283502B (en) * 2021-12-08 2023-06-23 Quanzhou Branch of Fujian Special Equipment Inspection and Research Institute Special equipment sensor node data analysis method
CN114186025A (en) * 2021-12-14 2022-03-15 China Construction Bank Corporation User portrait index heat prediction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN108334605B (en) Text classification method and device, computer equipment and storage medium
CN112365171B (en) Knowledge graph-based risk prediction method, device, equipment and storage medium
CN112417267A (en) User behavior analysis method and device, computer equipment and storage medium
CN109376844A (en) Automatic neural network training method and device based on cloud platform and model recommendation
CN110781406B (en) Social network user multi-attribute inference method based on variational automatic encoder
CN110489755A (en) Document creation method and device
CN110175628A (en) Compression algorithm for neural network pruning based on automatic search and knowledge distillation
CN107871166B (en) Feature processing method and feature processing system for machine learning
CN107220296A (en) Method for generating a question-and-answer knowledge base, neural network training method, and equipment
CN106445915A (en) New word discovery method and device
CN103150383A (en) Event evolution analysis method of short text data
CN111737576A (en) Application function personalized recommendation method and device
CN110147552A (en) Educational resource quality evaluation mining method and system based on natural language processing
CN111625715A (en) Information extraction method and device, electronic equipment and storage medium
CN110245310A (en) Object behavior analysis method, device and storage medium
CN109949175B (en) User attribute inference method based on collaborative filtering and similarity measurement
CN110807676A (en) Long-tail user mining method and device, electronic equipment and storage medium
CN114496099A (en) Cell function annotation method, device, equipment and medium
Aziz et al. Social network analytics: natural disaster analysis through Twitter
CN113569018A (en) Question and answer pair mining method and device
CN113590771A (en) Data mining method, device, equipment and storage medium
CN115587616A (en) Network model training method and device, storage medium and computer equipment
CN115579069A (en) Construction method and device of scRNA-Seq cell type annotation database and electronic equipment
CN115587192A (en) Relationship information extraction method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40038318

Country of ref document: HK

SE01 Entry into force of request for substantive examination