CN114372090A - User reading behavior analysis and prediction system under big data environment - Google Patents

User reading behavior analysis and prediction system under big data environment

Info

Publication number
CN114372090A
Authority
CN
China
Prior art keywords
user
distribution
data
text
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111662826.0A
Other languages
Chinese (zh)
Inventor
李丹丹
段娟
肖创柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202111662826.0A
Publication of CN114372090A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system for analyzing and predicting user reading behavior in a big data environment. The system comprises a text data correlation analysis unit, a user data correlation analysis unit, a data anomaly analysis unit and a user behavior prediction unit, and is divided into a user data storage layer, a user data processing layer, a user data analysis and modeling layer, a service layer and a display layer. The user data processing layer covers source data acquisition, source data cleaning, data storage, and data management and maintenance. The user data analysis and modeling layer contains the code for text data correlation analysis, user data correlation analysis, data anomaly analysis and user behavior prediction. The service layer comprises a data service, a behavior service, a user service, a portrait service and a forecast service. The display layer is mainly responsible for displaying the results of the statistical analysis on the interface. The layered design improves code maintainability, readability and flexibility, and eases system management and maintenance.

Description

User reading behavior analysis and prediction system under big data environment
Technical Field
The invention belongs to the technical field of computer and big data application analysis, and particularly relates to a system for analyzing and predicting reading behaviors of a user.
Background
In the context of big data, analyzing user behavior is of great significance; user portraits, user behavior anomaly detection and user behavior prediction are three important parts of user behavior analysis. Analyzing and predicting on this data brings out its full value, supports the rapid development of enterprises and provides them with more valuable information. The technical subject of the invention is the construction of a user behavior analysis and prediction system that collects and analyzes the behavior data of users in a search application. The system quickly and efficiently discovers the relationships among users, behaviors and data, from which user, keyword and data portraits are then built. A user portrait is a label model of the user, covering basic attributes, behavior characteristics, social networks, psychological characteristics, interests and hobbies and the like, obtained by analyzing user behavior data. Based on the characteristics of user behavior, a profile of the user's normal behavior is established, and the degree to which the user's actual activity deviates from this normal profile is measured to judge whether the behavior is anomalous. User behavior data and portrait data are then used to predict user behavior, optimize the user experience and provide better personalized search services.
Disclosure of Invention
The invention aims at analyzing and predicting user behavior; a functional diagram of the invention is shown in FIG. 1.
The technical scheme adopted by the invention is a user reading behavior analysis and prediction system under a big data environment, comprising a text data correlation analysis unit, a user data correlation analysis unit, a data anomaly analysis unit and a user behavior prediction unit, wherein:
the user reading behavior analysis and prediction system under the big data environment can be divided into a user data storage layer, a user data processing layer, a user data analysis and modeling layer, a service layer and a display layer. The user data storage layer holds the information stored in MySQL. The user data processing layer comprises source data acquisition, source data cleaning, data storage, and data management and maintenance. The user data analysis and modeling layer contains the code for text data correlation analysis, user data correlation analysis, data anomaly analysis and user behavior prediction. The service layer comprises a data service, a behavior service, a user service, a portrait service and a forecast service. The display layer is mainly responsible for displaying the results of the statistical analysis on the interface.
The text data correlation analysis unit performs multi-dimensional mining and study of the large amount of text data on a website, so that users can be served better. The text data analysis includes text basic information, text portraits and text statistical information.
The text basic information comprises a title, an author, a year, a brief introduction, keywords, a price, a label, adding time and article classification.
The text portrait comprises search quantity, click quantity, reading quantity, comment quantity, praise quantity, collection quantity and exposure quantity.
The text statistical information comprises text search quantity ranking distribution, text search conversion rate distribution, text click quantity distribution, text reading quantity ranking distribution, text comment quantity ranking distribution, text praise quantity ranking distribution, text collection quantity ranking distribution, text exposure quantity ranking distribution, text reading user quantity distribution, text reading time distribution, text related keywords distribution, text label distribution, text classification distribution, keyword search quantity distribution, keyword search conversion rate distribution, keyword click quantity distribution, keyword affiliated classification distribution, keyword hit article distribution, search user ranking distribution and article classification distribution.
The user data correlation analysis unit first performs a preliminary statistical analysis of the log information generated as users browse the site, then studies user behavior in depth with data mining in light of the actual needs of the project, identifies the usage preferences and behavior patterns of users visiting the website, and combines these patterns with the website's marketing strategy to fix the website's problems.
The user data analysis includes user basic information, user portraits and user statistical information.
The user basic information comprises a user name, a name, an age, a gender, a contact address, a registered IP, a login place, an operator, adding time and latest operation time.
The user portrait includes successful search volume, failed search volume, unchecked search volume, total click volume, total reading volume, comment volume, approval volume and collection volume.
The user statistical information comprises user search volume ranking distribution, user search conversion rate statistics, user click volume ranking distribution, user reading time period distribution, user comment volume ranking distribution, user approval volume ranking distribution, user collection volume ranking distribution, user registration time distribution, user access time distribution, user affiliated region distribution, user use operator distribution, user use time interval time distribution, user browsing conversion rate statistics, search click rate statistics and user label distribution.
The data anomaly analysis unit rests on the observation that user behavior, while showing a certain regularity on the whole under the user's normal behavior profile, also shows local contingency. This contingent part is treated as anomalous data because it deviates from the user's general behavior.
The data anomaly analysis comprises data anomaly basic information and data anomaly statistical information.
The basic information of the data exception comprises a serial number, a name, a content introduction, a keyword, a type, an exception time, a user, a place and a search IP.
The data anomaly statistical information comprises distribution of violation hit keywords, distribution of user ip anomalies, distribution of comment content violations, distribution of user search vocabulary anomalies, distribution of user search quantity anomalies, distribution of user click quantity anomalies, distribution of user reading time period anomalies, distribution of user comment quantity anomalies, distribution of user approval quantity anomalies, distribution of user collection quantities anomalies and distribution of user access time period anomalies.
The user behavior prediction unit performs statistical analysis on the various factors that influence the user and carries out modeling research on the analyzed characteristics. Finally, user behavior features are selected to construct a user behavior prediction model. The main prediction indexes are user search word prediction, user search word anomaly prediction, user search behavior frequency anomaly prediction, user search article anomaly prediction, user article click anomaly prediction, user article reading prediction and user article reading anomaly prediction.
Most existing systems are written in Java, whereas the present invention uses PHP with the Laravel framework. Laravel's concise, elegant style simplifies the code of the system. Its good support for RESTful interfaces greatly helps in separating the front end from the back end. Laravel's design ideas, such as the IoC container and dependency injection, are among the most advanced of the current mainstream PHP frameworks and suit a wide range of development styles. Its good support for Composer makes managing project dependencies simpler and more convenient, which plays a vital role throughout development. The system adopts the Model-View-Controller (MVC) architecture pattern, which is divided into three components: Model, View and Controller. The Model layer is responsible for modeling the data. The View layer is responsible for generating the user interface, that is, for presenting data obtained from the Model layer to the terminal and providing interaction. The Controller layer connects the Model layer and the View layer: on one side it passes the data source of a request to the Model for processing, on the other it hands the processing result to the View in a suitable form, and the steps in between are the Controller's responsibility. The pattern decouples the three components so that they do not depend on one another, which improves code maintainability, readability and flexibility and eases system management and maintenance.
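The division of labor between Model, View and Controller can be illustrated with a minimal sketch. It is written in Python rather than PHP/Laravel purely for brevity, and every class name, method name and data value in it is a hypothetical stand-in, not part of the described system.

```python
class ReadingModel:
    """Model: owns the data and how it is queried (hypothetical example)."""

    def __init__(self, read_counts):
        self._read_counts = dict(read_counts)

    def top_texts(self, n):
        # Sort by reading volume, highest first.
        ranked = sorted(self._read_counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


class RankingView:
    """View: turns rows into something the user can read."""

    def render(self, rows):
        return "\n".join(
            f"{rank}. {title} ({count} reads)"
            for rank, (title, count) in enumerate(rows, start=1)
        )


class RankingController:
    """Controller: asks the Model for data and hands the result to the View."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def show_top(self, n):
        return self.view.render(self.model.top_texts(n))


# Made-up reading volumes, for illustration only.
controller = RankingController(
    ReadingModel({"Text 1": 3, "Text 2": 5, "Text 3": 1}), RankingView()
)
print(controller.show_top(2))
```

Because the View only receives rows and the Model only returns them, either can be replaced without touching the other, which is the decoupling the paragraph above describes.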
Drawings
FIG. 1 is an overall functional diagram of the present invention.
FIG. 2 is a system architecture diagram of the present invention.
FIG. 3 is a flow chart of the operation of the present invention.
FIG. 4 is a flow chart of the K-Means algorithm of the present invention.
FIG. 5 is a diagram of the prediction mechanism of the collaborative filtering technique of the present invention.
Detailed Description
The system architecture of the invention is shown in FIG. 2. When a user operates the system, the behavior logs the user generates are collected, analyzed and displayed on the interface. The system can be divided into a user data storage layer, a user data processing layer, a user data analysis and modeling layer, a service layer and a display layer.
The user data storage layer stores the text information, the user information, the behavior logs of the users and the results of the statistical analysis of texts and users in MySQL and Redis.
The user data processing layer comprises source data acquisition, source data cleaning, data storage, and data management and maintenance. The source data comes mainly from user operation logs. Data is collected through tracking points embedded in the code: each user request is processed and supplemented with additional information to form a user log entry, which PHP code then stores in MySQL and Redis. Alternatively, historical user behavior data in JSON format is imported in batches, parsed and written into the user log.
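The batch import path can be sketched as follows. This is only an illustration in Python: the field names (`user`, `action`, `target`, `time`), the cleaning rule and the choice of the import time as the supplemented information are all assumptions, since the description does not specify the log schema, and the returned list stands in for the MySQL/Redis write.

```python
import json
from datetime import datetime, timezone

# Hypothetical set of fields a record must carry to be kept.
REQUIRED = ("user", "action", "target")


def parse_history(raw_json):
    """Turn a JSON array of behavior records into cleaned log entries."""
    entries = []
    for record in json.loads(raw_json):
        # Cleaning step: skip records with a missing or empty required field.
        if not all(record.get(key) for key in REQUIRED):
            continue
        entries.append({
            "user": record["user"],
            "action": record["action"],
            "target": record["target"],
            # Supplement step: fall back to the import time if none was recorded.
            "logged_at": record.get("time") or datetime.now(timezone.utc).isoformat(),
        })
    return entries


# Made-up sample batch; the second record lacks a user and is dropped.
sample = json.dumps([
    {"user": "u1", "action": "search", "target": "big data", "time": "2021-12-31T08:00:00"},
    {"action": "click", "target": "text 7"},
    {"user": "u2", "action": "read", "target": "text 7"},
])
logs = parse_history(sample)
print(len(logs))
```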
The user data analysis and modeling layer contains the code for text data correlation analysis, user data correlation analysis, data anomaly analysis and user behavior prediction. It is implemented mainly as PHP code in controllers under the Laravel 5.5 framework. The request life cycle of Laravel is shown in FIG. 3. After receiving a user request (Request), Laravel assigns it to a route (Route); the route's URI and method determine which controller (Controller) will process the request data, but the data first passes through middleware (Middleware) before being handed to the controller. Once the controller receives the request data, a validator (Validator) checks its correctness; a job (Job) places the data on the to-do list (Redis), and a queue (Queue) is asked to process the background work; the data in the database (MySQL) is retrieved through a model (Eloquent Model), and the data interface is output to the user through a template (Blade). Abnormal data is determined by the K-Means clustering algorithm. The flow chart of the K-Means clustering algorithm is shown in FIG. 4, and the algorithm steps are as follows:
(1) selecting an initial cluster center for each cluster;
(2) distributing the sample set to the nearest cluster according to the minimum distance principle;
(3) updating the cluster center using the sample mean of each cluster;
(4) repeating steps (2) and (3) until the cluster centers no longer change;
(5) outputting the final cluster centers and the k cluster partitions;
Values that lie far from the cluster centers are taken as abnormal data.
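Steps (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration, not the system's PHP implementation. The description does not say how "far" is measured, so the Euclidean distance and the fixed `threshold` are assumptions, as are the sample features (daily searches, daily clicks) and all data values.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step (1): pick k samples as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step (2): assign every sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step (3): move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step (4): stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Step (5): return the final centers and the k cluster partitions.
    return centers, clusters


def find_anomalies(points, centers, threshold):
    """Flag samples whose distance to the nearest center exceeds threshold."""
    return [p for p in points
            if min(math.dist(p, c) for c in centers) > threshold]


# Hypothetical (daily searches, daily clicks) samples of normal behavior.
normal = [(10, 8), (12, 9), (11, 10), (50, 40), (52, 41), (49, 39)]
centers, clusters = kmeans(normal, k=2)

# New observations are checked against the learned normal profile.
flagged = find_anomalies(normal + [(300, 2)], centers, threshold=20)
print(flagged)
```

In this sketch the centers are fitted on behavior assumed to be normal and new samples are compared against them. If an extreme sample were included in the fitting data, it could end up as its own cluster center and escape detection, so the choice of fitting data and threshold matters in practice.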
Prediction uses a collaborative filtering algorithm. The algorithm relies mainly on the degree of similarity between abnormal behaviors: when an abnormal behavior a user already exhibits is highly similar to another abnormal behavior, the algorithm predicts that the user may exhibit that other behavior too. The prediction mechanism of the algorithm is shown in FIG. 5. User A has abnormal behaviors A and C, user B has abnormal behaviors A, B and C, and user C has only abnormal behavior A. Across these users, abnormal behaviors A and C are therefore relatively similar, while abnormal behavior B is relatively dissimilar to A and C. Since user C has abnormal behavior A, and A and C are highly similar, it can be predicted that user C may also exhibit abnormal behavior C. The main flow of the algorithm is broadly consistent with content-based collaborative filtering.
User information is collected in the same way as in user-based collaborative filtering. The nearest-neighbor search targets the user's abnormal behaviors: a correlation measure is used to find the behaviors closest to those the user has exhibited. The prediction list is then generated mainly from the resulting set of most similar behaviors.
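The example with users A, B and C can be reproduced with a small Python sketch. The description only refers to "a correlation calculation method", so the cosine similarity over co-occurring users used here is an assumption, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def behavior_similarity(user_behaviors):
    """Cosine similarity between behaviors, based on which users show them."""
    users_of = defaultdict(set)
    for user, behaviors in user_behaviors.items():
        for behavior in behaviors:
            users_of[behavior].add(user)
    sim = {}
    for a in users_of:
        for b in users_of:
            if a != b:
                shared = len(users_of[a] & users_of[b])
                sim[(a, b)] = shared / math.sqrt(len(users_of[a]) * len(users_of[b]))
    return sim


def predict(user, user_behaviors, top_n=1):
    """Rank behaviors the user has not shown yet by similarity to those shown."""
    sim = behavior_similarity(user_behaviors)
    seen = user_behaviors[user]
    scores = defaultdict(float)
    for (a, b), s in sim.items():
        if a in seen and b not in seen:
            scores[b] += s
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# The abnormal behaviors from the example: user C has only behavior A.
observed = {
    "user A": {"A", "C"},
    "user B": {"A", "B", "C"},
    "user C": {"A"},
}
prediction = predict("user C", observed)
print(prediction)
```

With this measure, behavior C scores higher than behavior B for user C, because A and C co-occur in two users while A and B co-occur in only one, which matches the conclusion drawn in the text.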
The service layer comprises a data service, a behavior service, a user service, a portrait service and a forecast service. The main function of the service layer is to provide services to the outside world; every request must be routed correctly before it can be served. A handler defined by a route is reached through the specified URI, HTTP request method and route parameters. For example, when a client requests a URI with HTTP GET, Laravel ultimately dispatches the request to the index method of the corresponding class, which processes it and returns a response to the client.
The display layer is mainly responsible for displaying the results of the statistical analysis on the interface. The display layer is the bridge between the user and the system: on the one hand it gives the user a tool for interaction, and on the other hand it implements some logic for displaying and submitting data so as to coordinate the user's operations with the system. The front-end interface is developed with HTML, CSS and JavaScript, and jQuery Ajax is used to communicate with the controller (Controller) to read and write data.
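How a URI, an HTTP method and route parameters select a handler can be sketched as a simple route table. This Python sketch only mimics the idea and is not Laravel's router; the `/texts` URIs, the `TextController` class and the response format are hypothetical.

```python
class TextController:
    """Hypothetical controller with an index and a show action."""

    def index(self, params):
        return {"status": 200, "body": "text list"}

    def show(self, params):
        return {"status": 200, "body": f"text {params['id']}"}


# Route table: (HTTP method, URI pattern, controller class, action name).
ROUTES = [
    ("GET", "/texts", TextController, "index"),
    ("GET", "/texts/{id}", TextController, "show"),
]


def match(pattern, path):
    """Return the route parameters if path fits pattern, else None."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return None
    params = {}
    for expected, actual in zip(pattern_parts, path_parts):
        if expected.startswith("{") and expected.endswith("}"):
            params[expected[1:-1]] = actual  # capture a route parameter
        elif expected != actual:
            return None
    return params


def dispatch(method, path):
    """Find the first matching route and call its controller action."""
    for route_method, pattern, controller, action in ROUTES:
        if route_method == method:
            params = match(pattern, path)
            if params is not None:
                return getattr(controller(), action)(params)
    return {"status": 404, "body": "not found"}


print(dispatch("GET", "/texts"))
print(dispatch("GET", "/texts/7"))
```

A request whose method or URI matches no entry falls through to the 404 response, which corresponds to the statement that a request cannot be served until it is routed correctly.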

Claims (10)

1. A system for analyzing and predicting user reading behavior in a big data environment, characterized in that it comprises: a text data correlation analysis unit, a user data correlation analysis unit, a data anomaly analysis unit and a user behavior prediction unit; wherein:
the user reading behavior analysis and prediction system under the big data environment can be divided into a user data storage layer, a user data processing layer, a user data analysis and modeling layer, a service layer and a display layer; the user data storage layer holds the information stored in MySQL; the user data processing layer comprises source data acquisition, source data cleaning, data storage, and data management and maintenance; the user data analysis and modeling layer comprises the code for text data correlation analysis, user data correlation analysis, data anomaly analysis and user behavior prediction; the service layer comprises a data service, a behavior service, a user service, a portrait service and a forecast service; the display layer is mainly responsible for displaying the results of the statistical analysis on an interface;
the text data correlation analysis unit performs multi-dimensional mining and study of the large amount of text data on a website, so that users can be served better; the text data analysis comprises text basic information, text portraits and text statistical information;
the basic information of the text comprises a title, an author, a year, a brief introduction, keywords, a price, a label, adding time and article classification;
the text portrait comprises search quantity, click quantity, reading quantity, comment quantity, praise quantity, collection quantity and exposure quantity;
the text statistical information comprises text search quantity ranking distribution, text search conversion rate distribution, text click quantity distribution, text reading quantity ranking distribution, text comment quantity ranking distribution, text praise quantity ranking distribution, text collection quantity ranking distribution, text exposure quantity ranking distribution, text reading user quantity distribution, text reading time distribution, text related keywords distribution, text label distribution, text classification distribution, keyword search quantity distribution, keyword search conversion rate distribution, keyword click quantity distribution, keyword affiliated classification distribution, keyword hit article distribution, search user ranking distribution and article classification distribution;
the user data correlation analysis unit first performs a preliminary statistical analysis of the log information generated as users browse the site, then studies user behavior in depth with data mining in light of the actual needs of the project, identifies the usage preferences and behavior patterns of users visiting the website, and combines these patterns with the website's marketing strategy to fix the website's problems.
2. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the user data analysis includes user basic information, user portraits and user statistical information.
3. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the user basic information comprises a user name, a name, an age, a gender, a contact address, a registered IP, a login place, an operator, adding time and latest operation time.
4. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the user portrait includes successful search volume, failed search volume, unchecked search volume, total click volume, total reading volume, comment volume, approval volume and collection volume.
5. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the user statistical information comprises user search volume ranking distribution, user search conversion rate statistics, user click volume ranking distribution, user reading time period distribution, user comment volume ranking distribution, user approval volume ranking distribution, user collection volume ranking distribution, user registration time distribution, user access time distribution, user affiliated region distribution, user use operator distribution, user use time interval time distribution, user browsing conversion rate statistics, search click rate statistics and user label distribution.
6. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the data anomaly analysis unit rests on the observation that user behavior, while showing a certain regularity on the whole under the user's normal behavior profile, also shows local contingency; this contingent part is treated as anomalous data because it deviates from the user's general behavior.
7. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the data anomaly analysis comprises data anomaly basic information and data anomaly statistical information.
8. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the basic information of the data exception comprises a serial number, a name, a content introduction, a keyword, a type, an exception time, a user, a place and a search IP.
9. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the data anomaly statistical information comprises distribution of violation hit keywords, distribution of user ip anomalies, distribution of comment content violations, distribution of user search vocabulary anomalies, distribution of user search quantity anomalies, distribution of user click quantity anomalies, distribution of user reading time period anomalies, distribution of user comment quantity anomalies, distribution of user approval quantity anomalies, distribution of user collection quantities anomalies and distribution of user access time period anomalies.
10. The big data environment user reading behavior analysis and prediction system of claim 1, wherein: the user behavior prediction unit performs statistical analysis on the various factors that influence the user and carries out modeling research on the analyzed characteristics; finally, user behavior features are selected to construct a user behavior prediction model; the main prediction indexes are user search word prediction, user search word anomaly prediction, user search behavior frequency anomaly prediction, user search article anomaly prediction, user article click anomaly prediction, user article reading prediction and user article reading anomaly prediction.
CN202111662826.0A 2021-12-31 2021-12-31 User reading behavior analysis and prediction system under big data environment Pending CN114372090A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111662826.0A CN114372090A (en) 2021-12-31 2021-12-31 User reading behavior analysis and prediction system under big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111662826.0A CN114372090A (en) 2021-12-31 2021-12-31 User reading behavior analysis and prediction system under big data environment

Publications (1)

Publication Number Publication Date
CN114372090A (en) 2022-04-19

Family

ID=81142073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111662826.0A Pending CN114372090A (en) 2021-12-31 2021-12-31 User reading behavior analysis and prediction system under big data environment

Country Status (1)

Country Link
CN (1) CN114372090A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182605A (en) * 2018-01-11 2018-06-19 厦门快商通信息技术有限公司 A kind of user's behavior prediction method and system based on user's portrait
CN111078994A (en) * 2019-11-06 2020-04-28 珠海健康云科技有限公司 Portrait-based medical science popularization article recommendation method and system
CN111460333A (en) * 2020-03-30 2020-07-28 北京工业大学 Real-time search data analysis system
CN112256755A (en) * 2020-10-20 2021-01-22 中电科新型智慧城市研究院有限公司福州分公司 Student abnormal behavior analysis method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination