CN115757963A - User behavior image drawing method based on distributed log analysis - Google Patents

User behavior image drawing method based on distributed log analysis

Info

Publication number
CN115757963A
CN115757963A CN202211492836.9A CN202211492836A CN115757963A CN 115757963 A CN115757963 A CN 115757963A CN 202211492836 A CN202211492836 A CN 202211492836A CN 115757963 A CN115757963 A CN 115757963A
Authority
CN
China
Prior art keywords
log
module
user behavior
analysis
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211492836.9A
Other languages
Chinese (zh)
Inventor
陈然
宝君维
赵伟华
王荣欣
杨怡静
高航
王帮灿
王睿琛
王吉飞
孙恒一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Electric Power Transaction Center Co ltd
Original Assignee
Kunming Electric Power Transaction Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming Electric Power Transaction Center Co ltd filed Critical Kunming Electric Power Transaction Center Co ltd
Priority to CN202211492836.9A priority Critical patent/CN115757963A/en
Publication of CN115757963A publication Critical patent/CN115757963A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a user behavior portrait method based on distributed log analysis, which uses a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module. The log collection module is communicatively connected to the log server; it collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs. The acquisition module is communicatively connected to the log server and acquires user behavior characteristic data from it. The beneficial effects of the invention are as follows: log data is collected by the log collection module and transmitted to the log server, which performs distributed analysis on the collected logs, and this helps to find differentiating features and improves analysis accuracy; the log data from the distributed analysis and the labels to which users belong can be viewed through the visual display module, which improves ease of use.

Description

User behavior image drawing method based on distributed log analysis
Technical Field
The invention belongs to the technical field of user behavior portraits, and particularly relates to a user behavior portrait method based on distributed log analysis.
Background
Logs mainly comprise system logs, application logs and security logs. Through the logs, system operations and development personnel can learn the software and hardware information of a server and check for errors in the configuration process and their causes. Analyzing the logs regularly reveals the load, performance and security of the server, so that measures can be taken in time to correct errors.
Typically, logs are stored scattered across different devices. Once the logs are managed centrally, log statistics and retrieval become relatively troublesome; retrieval and statistics can generally be done with Linux commands such as grep, awk and wc.
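As an illustration of the kind of retrieval and statistics that grep, awk and wc provide, the following is a minimal Python sketch. The sample lines and their "date time LEVEL message" layout are assumptions made for this example and are not taken from the patent.

```python
import re
from collections import Counter

# Hypothetical sample lines; the "date time LEVEL message" layout is an assumption.
SAMPLE_LOG = [
    "2022-11-25 10:00:01 ERROR disk full",
    "2022-11-25 10:00:02 INFO user login",
    "2022-11-25 10:00:03 ERROR timeout",
]


def count_matches(lines, pattern):
    """Count the lines that match a regular expression, like `grep -c`."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def count_by_field(lines, index):
    """Tally one whitespace-separated field (0-based index), similar to
    printing a column with awk and counting the distinct values."""
    fields = (line.split() for line in lines)
    return Counter(parts[index] for parts in fields if len(parts) > index)


error_count = count_matches(SAMPLE_LOG, r"\bERROR\b")
level_counts = count_by_field(SAMPLE_LOG, 2)
```

On a single machine this is enough; the difficulty described above arises when the same questions must be answered over logs gathered from many devices.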
Log analysis is an indispensable part of the business flow of every internet company: user behavior can be analyzed from massive data and applied to intelligent prediction or anomaly detection. Compared with traditional big data analysis, log analysis has several characteristics. First, the data is dynamic. Traditional big data analysis usually processes existing data that is fixed and unchanging, whereas logs are generated continuously as long as the product is still running, so it is difficult to pick a point at which to perform static processing and analysis. Batch-oriented distributed systems represented by Hadoop therefore cannot be used, and a streaming system is the first choice for log analysis. Second, the data is diverse. A company often has more than one product, and the same product may be split into web, Android and iOS versions, so the logs to be processed are often generated from multiple ends simultaneously.
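The contrast between batch and streaming processing can be sketched as follows. This is only an illustration under assumed names: the event fields `user` and `platform`, the `stream_events` generator and the `RunningCounts` class are hypothetical, and a production system would use a real streaming framework rather than in-memory lists.

```python
from collections import Counter, defaultdict


def stream_events(*sources):
    """Yield events from several sources in round-robin order, simulating
    logs that arrive concurrently from web, Android and iOS clients."""
    iterators = [iter(source) for source in sources]
    while iterators:
        for iterator in list(iterators):
            try:
                yield next(iterator)
            except StopIteration:
                iterators.remove(iterator)


class RunningCounts:
    """Statistics that are updated per event, so no fixed snapshot is needed."""

    def __init__(self):
        self.by_user = defaultdict(Counter)

    def update(self, event):
        self.by_user[event["user"]][event["platform"]] += 1


web = [{"user": "u1", "platform": "web"}, {"user": "u2", "platform": "web"}]
android = [{"user": "u1", "platform": "android"}]
ios = [{"user": "u1", "platform": "ios"}]

running = RunningCounts()
for event in stream_events(web, android, ios):
    running.update(event)  # the result is current after every single event
```

The counts are usable at any moment while events keep arriving, which is the property the background attributes to streaming systems.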
Personalized recommendation, advertising systems, campaign marketing, content recommendation and interest preferences are all applications based on user portraits. When a certain user group needs to be selected for fine-grained operation, user portraits are used to screen out that group. A user portrait is a complex system: as a product matures, different labels can be designed for different business scenarios, user roles are refined and generalized, and the user portrait becomes more complete. In essence, a user portrait is an accurate description of an individual user.
Because there are many users, existing user behavior portraits cannot provide a specific portrait for each user, which makes it hard to find differentiating features within a group and reduces the accuracy of analysis.
Disclosure of Invention
The invention aims to provide a user behavior portrait method based on distributed log analysis that helps to find differentiating features and improves analysis accuracy.
To achieve this purpose, the invention provides the following technical scheme: a user behavior portrait method based on distributed log analysis, involving a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module;
the log collection module is communicatively connected to the log server; the log collection module collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs;
the acquisition module is communicatively connected to the log server and acquires user behavior characteristic data from the log server;
the processing module is communicatively connected to the acquisition module and processes the acquired user behavior characteristic data;
the extraction determination module is communicatively connected to the processing module; it extracts user characteristics and determines the label to which a user belongs by using a clustering algorithm;
the visual display module is communicatively connected to both the log server and the extraction determination module, and is used for viewing the log data from the distributed analysis and the labels to which users belong;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: acquiring user behavior characteristic data from the log server through the acquisition module;
step three: processing the acquired user behavior characteristic data through the processing module;
step four: extracting user characteristics through the extraction determination module, and determining the label to which a user belongs by using a clustering algorithm;
step five: viewing the log data from the distributed analysis and the labels to which users belong through the visual display module.
As a preferred technical scheme of the invention, the behavior characteristic data comprises a system login class, a page access class, a sensitive data query class and a key business link.
As a preferred technical scheme of the invention, the user behavior characteristics are expressed as the top n URLs that a user accesses most frequently within a certain time period, together with their access counts, wherein the time period is divided into m sub-periods;
monthly period behavior characteristics: for each month, counting the top n URLs that a user accesses most frequently and their access counts;
ten-day period behavior characteristics: dividing a month into an early, a middle and a late ten-day period, counting for each of them the top n URLs that a user accesses most frequently and their access counts (n URLs can be counted for each ten-day period), and combining the results of the ten-day periods into an overall characteristic.
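The monthly and ten-day characteristics can be sketched in Python as below. The record layout (a list of `(date, url)` pairs for one user), the function names and the sample URLs are assumptions for illustration; the patent does not prescribe a data structure. Days 21 through 31 are all treated as the last ten-day period.

```python
from collections import Counter
from datetime import date


def ten_day_period(day):
    """Map a day of the month to 0 (days 1-10), 1 (days 11-20) or 2 (day 21 onward)."""
    return min((day - 1) // 10, 2)


def top_n_urls(records, n):
    """Return the n most frequently accessed URLs with their access counts."""
    return Counter(url for _, url in records).most_common(n)


def monthly_feature(records, year, month, n):
    """Top n URLs of one user for a whole month."""
    in_month = [(d, u) for d, u in records if (d.year, d.month) == (year, month)]
    return top_n_urls(in_month, n)


def ten_day_feature(records, year, month, n):
    """Top n URLs per ten-day period (m = 3 sub-periods), combined into one list."""
    feature = []
    for period in range(3):
        part = [
            (d, u) for d, u in records
            if (d.year, d.month) == (year, month) and ten_day_period(d.day) == period
        ]
        feature.append(top_n_urls(part, n))
    return feature


# Hypothetical access records of a single user in November 2022.
records = [(date(2022, 11, day), "/home") for day in (1, 2, 4, 12)]
records += [(date(2022, 11, day), "/trade") for day in (3, 15, 16)]
records += [(date(2022, 11, 25), "/report")]

month_top2 = monthly_feature(records, 2022, 11, n=2)
ten_day_top1 = ten_day_feature(records, 2022, 11, n=1)
```

The ten-day variant keeps information that the monthly total hides: here `/trade` dominates the middle of the month even though `/home` leads overall.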
As a preferred technical scheme of the invention, the clustering algorithm includes hierarchical clustering algorithms, partitional clustering algorithms, and clustering algorithms based on density and network.
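As one example of a partitional clustering algorithm, the sketch below implements plain k-means in Python. The patent does not name a specific algorithm, so k-means, the two-dimensional feature vectors, the fixed initial centroids and the label names are all assumptions chosen for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid when a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical per-user vectors: (logins per day, sensitive queries per day).
user_vectors = [(1, 0), (2, 1), (1, 1), (20, 15), (22, 14)]
assignment, centroids = kmeans(user_vectors, [user_vectors[0], user_vectors[3]])

# Hypothetical mapping from cluster index to a short, readable label.
LABELS = {0: "light user", 1: "heavy user"}
user_labels = [LABELS[a] for a in assignment]
```

Initial centroids are passed in explicitly so that the result is reproducible; a real deployment would need to choose the number of clusters and the initialization with more care.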
As a preferred technical scheme of the invention, the label has two important features: semantization and short text; semantization makes the label understandable to people, and short text reduces preprocessing and makes it convenient for a computer to extract, aggregate and analyze the labels.
As a preferred technical scheme of the invention, the storage modes for the user behavior portrait include relational databases and non-relational databases.
As a preferred technical scheme of the invention, the non-relational databases include key-value databases, column-store databases, document databases and graph databases; column-store databases are better suited to batch data processing and instant queries of user portraits, and have great advantages when processing massive data.
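The two storage styles can be contrasted with a small Python sketch. The table name `user_portrait`, its columns and the sample labels are hypothetical; the in-memory SQLite database stands in for a relational store, and the dictionary of lists only imitates the layout of a column store rather than any particular product.

```python
import sqlite3
from collections import Counter

# Hypothetical portrait records: (user id, label).
PORTRAITS = [
    ("u1", "frequent trader"),
    ("u2", "occasional visitor"),
    ("u3", "frequent trader"),
]

# Relational storage: one row per (user, label) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_portrait ("
    "user_id TEXT NOT NULL, label TEXT NOT NULL, "
    "PRIMARY KEY (user_id, label))"
)
conn.executemany(
    "INSERT INTO user_portrait (user_id, label) VALUES (?, ?)", PORTRAITS
)
frequent_traders = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM user_portrait WHERE label = ? ORDER BY user_id",
        ("frequent trader",),
    )
]
conn.close()

# Column-style layout: each attribute is stored contiguously, so an
# aggregate over labels reads only the "label" column.
columns = {
    "user_id": [user for user, _ in PORTRAITS],
    "label": [label for _, label in PORTRAITS],
}
label_counts = Counter(columns["label"])
```

Aggregations that touch one attribute across all users, such as counting users per label, are the kind of batch workload for which the text above favors column storage.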
Compared with the prior art, the invention has the beneficial effects that:
log data is collected by the log collection module and transmitted to the log server, and the log server performs distributed analysis on the collected logs; this helps to find differentiating features and improves analysis accuracy;
the log data from the distributed analysis and the labels to which users belong can be viewed through the visual display module, which improves ease of use.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1, a first embodiment of the invention provides a user behavior portrait method based on distributed log analysis, involving a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module;
the log collection module is communicatively connected to the log server; the log collection module collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs;
the acquisition module is communicatively connected to the log server and acquires user behavior characteristic data from the log server;
the processing module is communicatively connected to the acquisition module and processes the acquired user behavior characteristic data;
the extraction determination module is communicatively connected to the processing module; it extracts user characteristics and determines the label to which a user belongs by using a clustering algorithm;
the visual display module is communicatively connected to both the log server and the extraction determination module, and is used for viewing the log data from the distributed analysis and the labels to which users belong;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: acquiring user behavior characteristic data from the log server through the acquisition module;
step three: processing the acquired user behavior characteristic data through the processing module;
step four: extracting user characteristics through the extraction determination module, and determining the label to which a user belongs by using a clustering algorithm;
step five: viewing the log data from the distributed analysis and the labels to which users belong through the visual display module.
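The data flow through the five steps can be sketched as a chain of Python functions, one per module. Everything here is an assumption made for illustration: the `user,action` line format, the function names, the normalization used as "processing", and in particular the threshold rule in `extract_and_label`, which is only a placeholder for the clustering algorithm of step four.

```python
from collections import Counter


def collect_logs(raw_lines):
    """Log collection module: parse raw lines into (user, action) records."""
    return [tuple(line.split(",")) for line in raw_lines]


def analyze_logs(records):
    """Log server: per-user action counts (a stand-in for distributed analysis)."""
    per_user = {}
    for user, action in records:
        per_user.setdefault(user, Counter())[action] += 1
    return per_user


def acquire_features(per_user, actions):
    """Acquisition module: read one count per action for every user."""
    return {user: [counts[a] for a in actions] for user, counts in per_user.items()}


def process_features(features):
    """Processing module: scale each vector so that its entries sum to 1."""
    return {
        user: [value / (sum(vector) or 1) for value in vector]
        for user, vector in features.items()
    }


def extract_and_label(features, threshold=0.5):
    """Extraction determination module. A threshold on the first feature is a
    placeholder here; the method itself assigns labels by clustering."""
    return {
        user: "query-heavy" if vector[0] >= threshold else "login-heavy"
        for user, vector in features.items()
    }


def display(labels):
    """Visual display module: one text line per user."""
    return [f"{user}: {label}" for user, label in sorted(labels.items())]


raw = ["u1,query", "u1,query", "u1,login", "u2,login", "u2,login", "u2,query"]
per_user = analyze_logs(collect_logs(raw))
features = process_features(acquire_features(per_user, ["query", "login"]))
report = display(extract_and_label(features))
```

Each function consumes only the output of the previous one, mirroring the module-to-module connections described in this embodiment.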
In this embodiment, preferably, the clustering algorithm includes hierarchical clustering algorithms, partitional clustering algorithms, and clustering algorithms based on density and network.
In this embodiment, preferably, the label has two important features: semantization and short text; semantization makes the label understandable to people, and short text reduces preprocessing and makes it convenient for a computer to extract and aggregate the labels.
In this embodiment, preferably, the storage modes for the user behavior portrait include relational databases and non-relational databases; the non-relational databases include key-value databases, column-store databases, document databases and graph databases; column-store databases are better suited to batch data processing and instant queries of user portraits, and have great advantages when processing massive data.
Example 2
Referring to FIG. 1, a second embodiment of the invention is shown, which differs from the previous embodiment in that:
the behavior characteristic data comprises system login type, page access type, sensitive data query type and key business links.
System login class:
multiple accounts logging in from the same IP: the number of accounts that log in to the system from the same IP address within a certain period of time (for example, 4 accounts log in to the system from the same IP address within 1 day); recorded fields: login time period, IP address, number of login accounts and list of login accounts;
login failure count: the number of failed logins of one account within a certain period of time (for example, 5 failed logins); recorded fields: login time period, user account, user name and number of login failures;
abnormal account login: an account that has not logged in for a long time suddenly logs in (for example, an account that has not logged in for 6 months suddenly logs in); recorded fields: login time, user account, user name and time since the last login;
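The three login-class characteristics can be computed as in the following Python sketch. The event dictionary layout (`time`, `ip`, `account`, `ok`), the 180-day reading of "6 months", and the decision to count only successful logins per IP are assumptions for illustration; the events are assumed to be already restricted to the time period of interest.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def accounts_per_ip(logins):
    """Map each IP address to the set of accounts that logged in from it."""
    seen = defaultdict(set)
    for event in logins:
        if event["ok"]:
            seen[event["ip"]].add(event["account"])
    return seen


def failure_counts(logins):
    """Count failed login attempts per account."""
    counts = defaultdict(int)
    for event in logins:
        if not event["ok"]:
            counts[event["account"]] += 1
    return counts


def dormant_logins(logins, last_login, max_gap=timedelta(days=180)):
    """Successful logins that come more than max_gap after the previous login."""
    return [
        event for event in logins
        if event["ok"]
        and event["account"] in last_login
        and event["time"] - last_login[event["account"]] > max_gap
    ]


# Hypothetical events within one day.
t0 = datetime(2022, 11, 25, 9, 0)
logins = [
    {"time": t0, "ip": "10.0.0.1", "account": name, "ok": True}
    for name in ("a1", "a2", "a3", "a4")
]
logins += [
    {"time": t0, "ip": "10.0.0.2", "account": "a5", "ok": False} for _ in range(5)
]
logins.append({"time": t0, "ip": "10.0.0.3", "account": "old", "ok": True})
last_login = {"old": t0 - timedelta(days=200), "a1": t0 - timedelta(days=1)}

shared_ip_accounts = accounts_per_ip(logins)["10.0.0.1"]
failures = failure_counts(logins)
dormant = [event["account"] for event in dormant_logins(logins, last_login)]
```

The sample data reproduces the examples in the text: 4 accounts from one IP in a day, 5 failed logins for one account, and a login after more than 6 months of inactivity.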
Page access class:
high-frequency access by a user: the number of times that a single user or a single IP accesses URLs within a certain period of time (for example, within 1 minute, the total number of accesses by one user or one IP to the URLs of the system exceeds 30, or the number of accesses to a single URL exceeds 30); recorded fields: access time period, user account, IP, URL and number of accesses;
large-scale access by a user: the number of different URLs that a single user or a single IP accesses within a certain period of time (for example, one user or one IP accesses more than 20 different URLs of the system within 10 minutes); recorded fields: access time period, user account, IP and number of different URLs accessed;
sudden increase in menu access volume: the change in the access volume of a certain URL within a certain period of time (for example, the access volume of a certain URL within a day increases by more than 20% compared with the same period of the previous month, or by 40% compared with the previous day); recorded fields: access time period, URL, system to which the URL belongs, access volume, increase over the same period of the previous month and increase over the previous day;
sudden increase in user access volume: the change in the volume of a certain user's accesses to a certain URL within a certain period of time (for example, the volume of a certain user's accesses to a certain URL within a day increases by more than 20% compared with the same period of the previous month); recorded fields: access time period, user account, URL, system to which the URL belongs, access volume and increase over the same period of the previous month;
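The page-access characteristics reduce to window counts and growth rates, as in this Python sketch. The access tuples `(second, user, url)`, the function names and the default limits (which mirror the examples in the text) are assumptions for illustration; both growth comparisons use a strict "greater than" test, which is a choice made here rather than something the text specifies.

```python
from collections import Counter


def high_frequency_users(accesses, window_start, window_seconds=60, limit=30):
    """Users whose total number of accesses inside one window exceeds the limit."""
    end = window_start + window_seconds
    counts = Counter(user for ts, user, _ in accesses if window_start <= ts < end)
    return {user for user, count in counts.items() if count > limit}


def distinct_url_counts(accesses):
    """Number of different URLs accessed by each user."""
    urls = {}
    for _, user, url in accesses:
        urls.setdefault(user, set()).add(url)
    return {user: len(seen) for user, seen in urls.items()}


def growth_rate(current, baseline):
    """Relative increase of current over baseline; None when the baseline is 0."""
    if baseline == 0:
        return None
    return (current - baseline) / baseline


def is_surge(today, same_day_last_month, yesterday, month_limit=0.20, day_limit=0.40):
    """Flag a surge against the same period last month or against yesterday."""
    vs_month = growth_rate(today, same_day_last_month)
    vs_day = growth_rate(today, yesterday)
    return (vs_month is not None and vs_month > month_limit) or (
        vs_day is not None and vs_day > day_limit
    )


# Hypothetical accesses: u1 hits one URL 31 times within a minute.
accesses = [(second, "u1", "/menu/a") for second in range(31)]
accesses += [(5, "u2", "/menu/a"), (6, "u2", "/menu/b")]

busy_users = high_frequency_users(accesses, window_start=0)
url_spread = distinct_url_counts(accesses)
```

For example, `is_surge(130, 100, 125)` is true because the volume is 30% above the same period of the previous month, even though it is only 4% above the previous day.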
Sensitive data query class:
frequent access to sensitive data: the number of times that a single user or a single IP accesses sensitive data within a certain period of time (for example, one user or one IP accesses certain sensitive data more than 10 times within 1 day); recorded fields: access time period, user account, IP, URL, sensitive data label, access mode and number of accesses;
sensitive data access regularity: the number of times that a single user accesses sensitive data within a certain period of time (for example, the number of times that a user accesses certain sensitive data within 1 day increases by 20% compared with the same period of the previous month); recorded fields: access time period, user account, URL, sensitive data label, access mode, number of accesses and increase over the same period of the previous month;
sensitive data access period: the number of times that sensitive data is accessed during non-working hours (for example, a user accesses sensitive data more than 5 times during non-working hours); recorded fields: access time period, user account, IP, URL, sensitive data label, access mode and number of accesses;
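The non-working-hours characteristic can be sketched in Python as follows. The working hours (09:00 to 18:00, Monday to Friday), the `(timestamp, user)` record layout and the function names are assumptions for illustration; the patent does not define what counts as non-working time.

```python
from collections import Counter
from datetime import datetime, time

# Assumed working hours: 09:00 to 18:00, Monday to Friday.
WORK_START, WORK_END = time(9, 0), time(18, 0)


def is_off_hours(ts):
    """True on weekends or outside the assumed working hours."""
    return ts.weekday() >= 5 or not (WORK_START <= ts.time() < WORK_END)


def off_hours_counts(accesses):
    """Count sensitive-data accesses per user that fall in non-working time."""
    return Counter(user for ts, user in accesses if is_off_hours(ts))


def flagged_users(accesses, limit=5):
    """Users with more than `limit` off-hours accesses, in sorted order."""
    return sorted(u for u, c in off_hours_counts(accesses).items() if c > limit)


# Hypothetical sensitive-data accesses; 2022-11-25 is a Friday.
accesses = [(datetime(2022, 11, 25, 23, minute), "u1") for minute in range(6)]
accesses += [(datetime(2022, 11, 25, 10, minute), "u2") for minute in range(10)]
accesses += [(datetime(2022, 11, 26, 10, 0), "u3")]  # Saturday morning

night_counts = off_hours_counts(accesses)
flagged = flagged_users(accesses)
```

Only `u1` is flagged: `u2` is active but within working hours, and `u3` has a single weekend access, below the example limit of 5.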
Key business link class:
sudden increase in the number of key business links: the number of key business link confirmations performed by a single user within a certain period of time (for example, a user performs more than 10 key business link confirmations within 1 day); recorded fields: access time period, user account, key business link name and number of confirmations.
The user behavior characteristics are expressed as the top n URLs that a user accesses most frequently within a certain time period, together with their access counts, wherein the time period is divided into m sub-periods;
monthly period behavior characteristics: for each month, counting the top n URLs that a user accesses most frequently and their access counts;
ten-day period behavior characteristics: dividing a month into an early, a middle and a late ten-day period, counting for each of them the top n URLs that a user accesses most frequently and their access counts (n URLs can be counted for each ten-day period), and combining the results of the ten-day periods into an overall characteristic.
Although embodiments of the present invention have been shown and described, with particular reference to the foregoing detailed description, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A user behavior portrait method based on distributed log analysis, characterized in that it involves a log collection module, a log server, an acquisition module, a processing module, an extraction determination module and a visual display module;
the log collection module is communicatively connected to the log server; the log collection module collects log data and transmits the collected log data to the log server, and the log server performs distributed analysis on the collected logs;
the acquisition module is communicatively connected to the log server and acquires user behavior characteristic data from the log server;
the processing module is communicatively connected to the acquisition module and processes the acquired user behavior characteristic data;
the extraction determination module is communicatively connected to the processing module; it extracts user characteristics and determines the label to which a user belongs by using a clustering algorithm;
the visual display module is communicatively connected to both the log server and the extraction determination module, and is used for viewing the log data from the distributed analysis and the labels to which users belong;
the method comprises the following steps:
step one: collecting log data through the log collection module, transmitting the collected log data to the log server, and performing distributed analysis on the collected logs by the log server;
step two: acquiring user behavior characteristic data from the log server through the acquisition module;
step three: processing the acquired user behavior characteristic data through the processing module;
step four: extracting user characteristics through the extraction determination module, and determining the label to which a user belongs by using a clustering algorithm;
step five: viewing the log data from the distributed analysis and the labels to which users belong through the visual display module.
2. The user behavior portrait method based on distributed log analysis as claimed in claim 1, wherein: the behavior characteristic data comprises a system login class, a page access class, a sensitive data query class and key business links.
3. The user behavior portrait method based on distributed log analysis as claimed in claim 1, wherein: the user behavior characteristics are expressed as the top n URLs that a user accesses most frequently within a certain time period, together with their access counts, the time period being divided into m sub-periods.
4. The user behavior portrait method based on distributed log analysis as claimed in claim 3, wherein: the certain time period includes a monthly period and a ten-day period.
5. The user behavior portrait method based on distributed log analysis as claimed in claim 1, wherein: the clustering algorithm includes hierarchical clustering algorithms, partitional clustering algorithms, and clustering algorithms based on density and network.
6. The user behavior portrait method based on distributed log analysis as claimed in claim 1, wherein: the label has two important features: semantization and short text.
7. The user behavior portrait method based on distributed log analysis as claimed in claim 1, wherein: the storage modes for the user behavior portrait include relational databases and non-relational databases.
8. The user behavior portrait method based on distributed log analysis as claimed in claim 7, wherein: the non-relational databases include key-value databases, column-store databases, document databases and graph databases.
CN202211492836.9A 2022-11-25 2022-11-25 User behavior image drawing method based on distributed log analysis Pending CN115757963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211492836.9A CN115757963A (en) 2022-11-25 2022-11-25 User behavior image drawing method based on distributed log analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211492836.9A CN115757963A (en) 2022-11-25 2022-11-25 User behavior image drawing method based on distributed log analysis

Publications (1)

Publication Number Publication Date
CN115757963A true CN115757963A (en) 2023-03-07

Family

ID=85338240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211492836.9A Pending CN115757963A (en) 2022-11-25 2022-11-25 User behavior image drawing method based on distributed log analysis

Country Status (1)

Country Link
CN (1) CN115757963A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435449A (en) * 2023-11-06 2024-01-23 广州丰石科技有限公司 User portrait analysis method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11670021B1 (en) Enhanced graphical user interface for representing events
WO2022117063A1 (en) Method and apparatus for training isolation forest, and method and apparatus for recognizing web crawler
CN108550068B (en) Personalized commodity recommendation method and system based on user behavior analysis
US10650316B2 (en) Issue-manage-style internet public opinion information evaluation management system and method thereof
KR101463974B1 (en) Big data analysis system for marketing and method thereof
Das et al. Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method
US6934687B1 (en) Computer architecture and method for supporting and analyzing electronic commerce over the world wide web for commerce service providers and/or internet service providers
CN111614690B (en) Abnormal behavior detection method and device
CN108776671A (en) A kind of network public sentiment monitoring system and method
CN106021583B (en) Statistical method and system for page flow data
CN104077407B (en) A kind of intelligent data search system and method
CN106682686A (en) User gender prediction method based on mobile phone Internet-surfing behavior
CN102254265A (en) Rich media internet advertisement content matching and effect evaluation method
CN102637178A (en) Music recommending method, music recommending device and music recommending system
TW201327451A (en) Providing information recommendations based on determined user groups
US10467255B2 (en) Methods and systems for analyzing reading logs and documents thereof
CN111447575B (en) Short message pushing method, device, equipment and storage medium
CN113360566A (en) Information content monitoring method and system
CN115757963A (en) User behavior image drawing method based on distributed log analysis
CN112950359B (en) User identification method and device
CN117132226A (en) User behavior auditing and managing system
CN107729206A (en) Real-time analysis method, system and the computer-processing equipment of alarm log
CN116089490A (en) Data analysis method, device, terminal and storage medium
CN115062013A (en) Information recommendation method, device, equipment and storage medium
CN112506800B (en) Method, apparatus, device, medium and program product for testing code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination