CN115757963A - User behavior image drawing method based on distributed log analysis - Google Patents
- Publication number: CN115757963A
- Application number: CN202211492836.9A
- Authority: CN (China)
- Prior art keywords: log, module, user behavior, analysis, user
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a user behavior portrait method based on distributed log analysis, which involves a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module. The log collection module is in communication connection with the log server; it collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs. The acquisition module is in communication connection with the log server and acquires user behavior characteristic data from the log server. The beneficial effects of the invention are as follows: log data is collected by the log collection module and transmitted to the log server, which performs distributed analysis on the collected logs; this helps to find differentiated characteristics and improves the accuracy of the analysis; the distributed-analysis log data and the labels to which users belong are viewed through the visual display module, which improves ease of use.
Description
Technical Field
The invention belongs to the technical field of user behavior portraits, and particularly relates to a user behavior portrayal method based on distributed log analysis.
Background
Logs mainly comprise system logs, application logs and security logs. Through the logs, operations and development staff can learn the software and hardware information of a server and check errors in the configuration process together with their causes; analyzing the logs regularly reveals the load, performance and security of the server, so that measures can be taken in time to correct errors.
Typically, logs are stored scattered across different devices. Once the logs are managed centrally, gathering statistics on them and searching them becomes relatively troublesome; in general, such searching and statistics can be done with Linux commands such as grep, awk and wc.
Log analysis is an indispensable part of the business flow of every internet company: user behavior can be analyzed from massive data and then applied to intelligent prediction or anomaly detection. Compared with traditional big data analysis, log analysis has several characteristics. First, the data is dynamic: traditional big data analysis usually processes existing data that is fixed and unchanging, whereas in log analysis logs keep being generated as long as the product is running, so it is difficult to designate a fixed point at which to perform static analysis; batch-oriented distributed systems represented by Hadoop are therefore unsuitable, and a streaming system is the first choice for log analysis. Second, the data is diverse: a company often has more than one product, and the same product may be split into web, Android and iOS versions, so the logs to be processed are often generated from multiple client platforms at the same time.
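The streaming style of processing described above can be illustrated with a minimal Python sketch. It is not part of the invention: the event tuples, the platform names and the `stream_counts` function are assumptions chosen for illustration. The point is only that counts are updated one event at a time instead of waiting for a complete, fixed data set.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (platform that produced the log, log message).
Event = Tuple[str, str]


def stream_counts(events: Iterable[Event]) -> Iterator[Counter]:
    """Yield a snapshot of the per-platform event counts after each event."""
    counts: Counter = Counter()
    for platform, _message in events:
        counts[platform] += 1
        # Copy so that earlier snapshots are not changed by later events.
        yield counts.copy()


events = [
    ("web", "GET /home"),
    ("android", "open app"),
    ("web", "GET /cart"),
    ("ios", "open app"),
]
snapshots = list(stream_counts(events))
final_counts = snapshots[-1]
```

Because `stream_counts` is a generator, it works equally well on an unbounded source of events; the list above only stands in for such a source.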
Personalized recommendation, advertising systems, campaign marketing, content recommendation and interest preferences are all applications based on user portraits. When an operator wants to select a certain group of users for fine-grained operations, the specific group is screened out using user portraits. A user portrait is a complex system: as a product matures, different labels can be designed for different business scenarios, user roles are refined and generalized, and the user portrait becomes more complete. In essence, a user portrait is an accurate description of any single user.
Because there are many users, existing user behavior portrait methods cannot produce a specific portrait for each user, which makes it hard to find differentiated characteristics within a group and reduces the accuracy of the analysis.
Disclosure of Invention
The invention aims to provide a user behavior portrait method based on distributed log analysis that helps to find differentiated characteristics and improves the accuracy of the analysis.
In order to achieve the purpose, the invention provides the following technical scheme: a user behavior portrait method based on distributed log analysis comprises a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module;
the log collection module is in communication connection with the log server, and is used for collecting log data and transmitting the collected log data to the log server, and the log server is used for performing distributed analysis on the collected logs;
the acquisition module is in communication connection with the log server and acquires user behavior characteristic data from the log server through the acquisition module;
the processing module is in communication connection with the acquisition module and is used for processing the acquired user behavior characteristic data;
the extraction determining module is in communication connection with the processing module, extracts the user characteristics through the extraction determining module, and determines the label to which the user belongs by using a clustering algorithm;
the visual display module is respectively in communication connection with the log server and the extraction determination module, and the visual display module is used for checking the log data of the distributed analysis and the tags of the users;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: collecting user behavior characteristic data from a log server through a collection module;
step three: processing the collected user behavior characteristic data through a processing module;
step four: extracting user characteristics through an extraction determining module, and determining a label to which a user belongs by using a clustering algorithm;
step five: and viewing the log data of the distributed analysis and the label of the user through a visual display module.
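The five steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the log line format, the function names and the sample data are hypothetical, and step four uses a simple "most visited URL" rule as a stand-in for the clustering algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical log format: "<timestamp> <user> <url>".
RAW_LOGS = [
    "2022-11-01T09:00:00 alice /orders",
    "2022-11-01T09:01:00 alice /orders",
    "2022-11-01T09:02:00 bob /reports",
    "2022-11-01T09:03:00 alice /reports",
    "2022-11-01T09:04:00 bob /reports",
]


def collect_logs(lines: List[str]) -> List[Dict[str, str]]:
    """Step one: collect raw log lines and parse them into records."""
    records = []
    for line in lines:
        timestamp, user, url = line.split()
        records.append({"time": timestamp, "user": user, "url": url})
    return records


def acquire_behavior(records: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Step two: acquire per-user behavior data (URL visit counts)."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        per_user[record["user"]][record["url"]] += 1
    return per_user


def process_features(per_user: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Step three: normalize the counts into visit shares per URL."""
    features = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        features[user] = {url: n / total for url, n in counts.items()}
    return features


def assign_labels(features: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Step four (stand-in for clustering): label by the dominant URL."""
    return {user: "mostly " + max(shares, key=shares.get)
            for user, shares in features.items()}


def show(labels: Dict[str, str]) -> List[str]:
    """Step five: produce a viewable report of users and their labels."""
    return [f"{user}: {label}" for user, label in sorted(labels.items())]


labels = assign_labels(process_features(acquire_behavior(collect_logs(RAW_LOGS))))
report = show(labels)
```

Each function corresponds to one module, so a real clustering routine could replace `assign_labels` without touching the other steps.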
As a preferred technical scheme of the invention, the behavior characteristic data comprises a system login class, a page access class, a sensitive data query class and a key business link.
As a preferred technical solution of the present invention, the user behavior characteristics are expressed as: the n URLs that a user accesses most frequently within a certain time period, together with their access counts, wherein the time period is divided into m sub-periods;
monthly cycle behavior characteristics: for each month, counting the top n URLs that a user accesses most frequently and their access counts;
ten-day cycle behavior characteristics: dividing a month into an upper, a middle and a lower ten-day period, counting for each of them the top n URLs that a user accesses most frequently and their access counts (n URLs may be counted for each ten-day period), and combining the results of the three ten-day periods to form an overall characteristic.
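The monthly and ten-day characteristics can be computed as in the following minimal Python sketch. The visit records, the function names and the choice of days 1 to 10, 11 to 20 and 21 to month end as the three ten-day periods are assumptions made for illustration; here m = 3 sub-periods and n = 2.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

# One user's visits within a single month: (visit date, URL). Hypothetical data.
Visit = Tuple[date, str]


def ten_day_period(day: date) -> str:
    """Map a date to the upper, middle or lower ten-day period of its month."""
    if day.day <= 10:
        return "upper"
    if day.day <= 20:
        return "middle"
    return "lower"


def monthly_top_n(visits: List[Visit], n: int) -> List[Tuple[str, int]]:
    """Top n URLs and their access counts over the whole month."""
    return Counter(url for _day, url in visits).most_common(n)


def ten_day_top_n(visits: List[Visit], n: int) -> Dict[str, List[Tuple[str, int]]]:
    """Top n URLs per ten-day period, combined into one overall feature."""
    buckets = {"upper": Counter(), "middle": Counter(), "lower": Counter()}
    for day, url in visits:
        buckets[ten_day_period(day)][url] += 1
    return {period: counts.most_common(n) for period, counts in buckets.items()}


visits = [
    (date(2022, 11, 2), "/home"),
    (date(2022, 11, 3), "/home"),
    (date(2022, 11, 4), "/home"),
    (date(2022, 11, 5), "/orders"),
    (date(2022, 11, 12), "/orders"),
    (date(2022, 11, 15), "/orders"),
    (date(2022, 11, 18), "/home"),
    (date(2022, 11, 25), "/reports"),
]
monthly = monthly_top_n(visits, 2)
by_period = ten_day_top_n(visits, 2)
```

A period with fewer than n distinct URLs simply yields a shorter list, as the lower ten-day period does in this sample.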
As a preferable technical scheme of the invention, the clustering algorithm comprises a hierarchical clustering algorithm, a division type clustering algorithm and a clustering algorithm based on density and network.
As a preferred technical solution of the present invention, the labels have two important characteristics: they are semantic and they are short texts; being semantic lets people understand the labels, and being short texts reduces preprocessing and makes it easy for a computer to extract, aggregate and analyze the labels.
As a preferred technical scheme of the invention, the storage of the user behavior portraits includes a relational database and a non-relational database.
As a preferred technical scheme of the invention, the non-relational database comprises a key-value store, a column store, a document database and a graph database; the column store is better suited to batch data processing and ad hoc queries of user portraits, and has great advantages when processing massive data.
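The two storage styles can be contrasted with a small Python sketch. It uses the standard-library `sqlite3` module as a relational example and a plain dictionary with JSON values to imitate a key-value store; the table name, column names, key format and sample labels are all hypothetical, and no particular database product is implied by the text.

```python
import json
import sqlite3

# Hypothetical portrait data: (user account, short semantic label).
rows = [
    ("alice", "frequent buyer"),
    ("alice", "night visitor"),
    ("bob", "report reader"),
]

# Relational style: one row per (user, label) pair in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE user_portrait ("
    "user_account TEXT, label TEXT, PRIMARY KEY (user_account, label))"
)
connection.executemany("INSERT INTO user_portrait VALUES (?, ?)", rows)
connection.commit()


def labels_for(user_account: str) -> list:
    """Return the labels stored for one user, sorted alphabetically."""
    cursor = connection.execute(
        "SELECT label FROM user_portrait WHERE user_account = ? ORDER BY label",
        (user_account,),
    )
    return [row[0] for row in cursor]


# Key-value style: one key per user, with the whole label list as the value.
key_value_store = {}
for user_account, label in rows:
    key_value_store.setdefault(f"portrait:{user_account}", []).append(label)
serialized = json.dumps(key_value_store["portrait:alice"])
```

The relational layout makes it easy to query across users by label, while the key-value layout returns a whole portrait with a single lookup.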
Compared with the prior art, the invention has the beneficial effects that:
log data is collected by the log collection module and transmitted to the log server, and the log server performs distributed analysis on the collected logs; this helps to find differentiated characteristics and improves the accuracy of the analysis;
the distributed-analysis log data and the labels to which users belong are viewed through the visual display module, which improves ease of use.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Please refer to fig. 1, which is a first embodiment of the present invention, the embodiment provides a user behavior representation method based on distributed log analysis, including a log collection module, a log server, an acquisition module, a processing module, an extraction determination module, and a visualization presentation module;
the log collection module is in communication connection with the log server, the log collection module collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs;
the acquisition module is in communication connection with the log server and acquires user behavior characteristic data from the log server through the acquisition module;
the processing module is in communication connection with the acquisition module and is used for processing the acquired user behavior characteristic data;
the extraction determining module is in communication connection with the processing module, extracts user characteristics through the extraction determining module, and determines the label to which the user belongs by using a clustering algorithm;
the visual display module is respectively in communication connection with the log server and the extraction determination module, and is used for checking the log data of the distributed analysis and the tags of the users;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: collecting user behavior characteristic data from a log server through a collection module;
step three: processing the collected user behavior characteristic data through a processing module;
step four: extracting user characteristics through an extraction determining module, and determining a label to which a user belongs by using a clustering algorithm;
step five: and viewing the log data of the distributed analysis and the labels of the users through a visual display module.
In this embodiment, preferably, the clustering algorithm includes a hierarchical clustering algorithm, a partitional clustering algorithm, and a density and network-based clustering algorithm.
In this embodiment, preferably, the labels have two important characteristics: they are semantic and they are short texts; being semantic lets people understand the labels, and being short texts reduces preprocessing and makes it easy for a computer to extract and aggregate the labels.
In this embodiment, preferably, the storage of the user behavior portraits includes a relational database and a non-relational database; the non-relational database comprises a key-value store, a column store, a document database and a graph database; the column store is better suited to batch data processing and ad hoc queries of user portraits, and has great advantages when processing massive data.
Example 2
Referring to fig. 1, a second embodiment of the present invention is shown, which is based on the previous embodiment except that:
the behavior characteristic data comprises system login type, page access type, sensitive data query type and key business links.
System login class:
Multiple accounts logged in from the same IP: the number of accounts that log in to the system from the same IP address within a certain period of time (for example, 4 accounts log in from the same IP address within 1 day). Recorded fields: login time period, IP address, number of login accounts and list of login accounts;
Login failure count: the number of times one account fails to log in within a certain period of time (for example, 5 login failures). Recorded fields: login time period, user account, user name and number of login failures;
Abnormal account login: an account that has not logged in for a long time suddenly logs in (for example, an account that has not logged in for 6 months suddenly logs in). Recorded fields: login time, user account, user name and time since the last login;
Page access class:
High-frequency user access: the number of times a single user or a single IP accesses URLs within a certain period of time (for example, within 1 minute, the total number of accesses by one user or one IP to all URLs of the system exceeds 30, or the number of accesses to a single URL exceeds 30). Recorded fields: access time period, user account, IP, URL and access count;
Wide-range user access: the number of different URLs a single user or a single IP accesses within a certain period of time (for example, one user or one IP accesses more than 20 URLs of the system within 10 minutes). Recorded fields: access time period, user account, IP and number of different URLs accessed;
Sudden increase in menu access volume: the change in the access volume of a certain URL within a certain period of time (for example, the access volume of a URL within one day increases by more than 20% compared with the same period of the previous month, or by 40% compared with the previous day). Recorded fields: access time period, URL, system to which the URL belongs, access volume, increase over the same period of the previous month and increase over the previous day;
Sudden increase in user access volume: the change in the number of times a certain user accesses a certain URL within a certain period of time (for example, the number of times a user accesses a URL within one day increases by more than 20% compared with the same period of the previous month). Recorded fields: access time period, user account, URL, system to which the URL belongs, access volume and increase over the same period of the previous month;
Sensitive data query class:
Frequent access to sensitive data: the number of times a single user or a single IP accesses sensitive data within a certain period of time (for example, one user or one IP accesses certain sensitive data more than 10 times within 1 day). Recorded fields: access time period, user account, IP, URL, sensitive data label, access mode and access count;
Sensitive data access regularity: the number of times a single user accesses sensitive data within a certain period of time (for example, the number of times a user accesses certain sensitive data within 1 day increases by 20% compared with the same period of the previous month). Recorded fields: access time period, user account, URL, sensitive data label, access mode, access count and increase over the same period of the previous month;
Sensitive data access period: the number of times sensitive data is accessed outside working hours (for example, a user accesses sensitive data more than 5 times outside working hours). Recorded fields: access time period, user account, IP, URL, sensitive data label, access mode and access count;
Key business link class:
Sudden increase in key business link operations: the number of key business link confirmations performed by a single user within a certain period of time (for example, a user performs more than 10 key business link confirmations within 1 day). Recorded fields: access time period, user account, key business link name and count.
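A few of the behavior characteristics listed above can be derived from login events as in the following minimal Python sketch. The event tuple layout, the function names and the use of "greater than or equal to" for the thresholds are assumptions; the default thresholds of 4 accounts and 5 failures follow the examples given in the text, and all events are assumed to fall within one observation period.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical login event within one period: (IP address, account, success).
LoginEvent = Tuple[str, str, bool]


def ips_with_many_accounts(events: List[LoginEvent],
                           min_accounts: int = 4) -> Dict[str, int]:
    """IPs from which at least min_accounts distinct accounts logged in."""
    accounts_by_ip = defaultdict(set)
    for ip, account, _success in events:
        accounts_by_ip[ip].add(account)
    return {ip: len(accounts) for ip, accounts in accounts_by_ip.items()
            if len(accounts) >= min_accounts}


def accounts_with_failures(events: List[LoginEvent],
                           min_failures: int = 5) -> Dict[str, int]:
    """Accounts with at least min_failures failed logins."""
    failures = Counter(account for _ip, account, success in events if not success)
    return {account: n for account, n in failures.items() if n >= min_failures}


def growth_rate(current: int, previous: int) -> float:
    """Relative increase of current over previous; 0.25 means +25%."""
    if previous == 0:
        raise ValueError("previous period has no accesses to compare against")
    return (current - previous) / previous


events = (
    [("10.0.0.1", account, True) for account in ("a", "b", "c", "d")]
    + [("10.0.0.2", "e", False)] * 5
    + [("10.0.0.3", "f", True)]
)
shared_ips = ips_with_many_accounts(events)
failing_accounts = accounts_with_failures(events)
# Access volume today versus the same day of the previous month (sample numbers).
menu_surge = growth_rate(130, 100) > 0.20
```

The same `growth_rate` helper covers both the comparison with the same period of the previous month and the comparison with the previous day; only the `previous` value differs.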
The user behavior characteristics are expressed as: the n URLs that a user accesses most frequently within a certain time period, together with their access counts, wherein the time period is divided into m sub-periods;
monthly cycle behavior characteristics: for each month, counting the top n URLs that a user accesses most frequently and their access counts;
ten-day cycle behavior characteristics: dividing a month into an upper, a middle and a lower ten-day period, counting for each of them the top n URLs that a user accesses most frequently and their access counts (n URLs may be counted for each ten-day period), and combining the results of the three ten-day periods to form an overall characteristic.
Although embodiments of the present invention have been shown and described, with particular reference to the foregoing detailed description, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (8)
1. A user behavior image drawing method based on distributed log analysis is characterized by comprising the following steps: the system comprises a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module;
the log collection module is in communication connection with the log server, the log collection module collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs;
the acquisition module is in communication connection with the log server and acquires user behavior characteristic data from the log server through the acquisition module;
the processing module is in communication connection with the acquisition module and is used for processing the acquired user behavior characteristic data;
the extraction determination module is in communication connection with the processing module, extracts user characteristics through the extraction determination module, and determines the label to which the user belongs by using a clustering algorithm;
the visual display module is respectively in communication connection with the log server and the extraction determination module, and the visual display module is used for checking the log data of the distributed analysis and the tags of the users;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: collecting user behavior characteristic data from a log server through a collection module;
step three: processing the collected user behavior characteristic data through a processing module;
step four: extracting user characteristics through an extraction determining module, and determining a label to which a user belongs by using a clustering algorithm;
step five: and viewing the log data of the distributed analysis and the label of the user through a visual display module.
2. The method for user behavior imaging based on distributed log analysis as claimed in claim 1, wherein: the behavior characteristic data comprises system login types, page access types, sensitive data query types and key business links.
3. The method for user behavior imaging based on distributed log analysis as claimed in claim 1, wherein: the user behavior characteristics are expressed as: in a certain time period, a user accesses the first n most frequent URLs and the access amount, and the time period is divided into m time periods.
4. The method for user behavior imaging based on distributed log analysis as claimed in claim 3, wherein: the certain time period includes a monthly period and a ten-day period.
5. The user behavior imaging method based on distributed log analysis as claimed in claim 1, wherein: the clustering algorithm comprises a hierarchical clustering algorithm, a divided clustering algorithm and a clustering algorithm based on density and network.
6. The user behavior imaging method based on distributed log analysis as claimed in claim 1, wherein: two important features of the tag presentation: semantization and short text.
7. The user behavior imaging method based on distributed log analysis as claimed in claim 1, wherein: the user behavior portraits are stored in a manner including a relational database and a non-relational database.
8. The method for user behavior imaging based on distributed log analysis as claimed in claim 7, wherein: the non-relational database comprises a key value storage database, a column storage database, a document type database and a graph database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211492836.9A CN115757963A (en) | 2022-11-25 | 2022-11-25 | User behavior image drawing method based on distributed log analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211492836.9A CN115757963A (en) | 2022-11-25 | 2022-11-25 | User behavior image drawing method based on distributed log analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115757963A (en) | 2023-03-07 |
Family
ID=85338240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211492836.9A Pending CN115757963A (en) | 2022-11-25 | 2022-11-25 | User behavior image drawing method based on distributed log analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115757963A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435449A (en) * | 2023-11-06 | 2024-01-23 | 广州丰石科技有限公司 | User portrait analysis method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11670021B1 (en) | Enhanced graphical user interface for representing events | |
WO2022117063A1 (en) | Method and apparatus for training isolation forest, and method and apparatus for recognizing web crawler | |
CN108550068B (en) | Personalized commodity recommendation method and system based on user behavior analysis | |
US10650316B2 (en) | Issue-manage-style internet public opinion information evaluation management system and method thereof | |
KR101463974B1 (en) | Big data analysis system for marketing and method thereof | |
Das et al. | Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method | |
US6934687B1 (en) | Computer architecture and method for supporting and analyzing electronic commerce over the world wide web for commerce service providers and/or internet service providers | |
CN111614690B (en) | Abnormal behavior detection method and device | |
CN108776671A (en) | A kind of network public sentiment monitoring system and method | |
CN106021583B (en) | Statistical method and system for page flow data | |
CN104077407B (en) | A kind of intelligent data search system and method | |
CN106682686A (en) | User gender prediction method based on mobile phone Internet-surfing behavior | |
CN102254265A (en) | Rich media internet advertisement content matching and effect evaluation method | |
CN102637178A (en) | Music recommending method, music recommending device and music recommending system | |
TW201327451A (en) | Providing information recommendations based on determined user groups | |
US10467255B2 (en) | Methods and systems for analyzing reading logs and documents thereof | |
CN111447575B (en) | Short message pushing method, device, equipment and storage medium | |
CN113360566A (en) | Information content monitoring method and system | |
CN115757963A (en) | User behavior image drawing method based on distributed log analysis | |
CN112950359B (en) | User identification method and device | |
CN117132226A (en) | User behavior auditing and managing system | |
CN107729206A (en) | Real-time analysis method, system and the computer-processing equipment of alarm log | |
CN116089490A (en) | Data analysis method, device, terminal and storage medium | |
CN115062013A (en) | Information recommendation method, device, equipment and storage medium | |
CN112506800B (en) | Method, apparatus, device, medium and program product for testing code |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||