CN115840844B - Internet platform user behavior analysis system based on big data - Google Patents

Internet platform user behavior analysis system based on big data

Publication number
CN115840844B
Authority
CN
China
Prior art keywords
data
user
abnormal
module
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211627028.9A
Other languages
Chinese (zh)
Other versions
CN115840844A (en
Inventor
张伟
彭海坤
袁环
张连霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinlianxin Network Technology Co ltd
Original Assignee
Shenzhen Xinlianxin Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinlianxin Network Technology Co ltd filed Critical Shenzhen Xinlianxin Network Technology Co ltd
Priority to CN202211627028.9A priority Critical patent/CN115840844B/en
Publication of CN115840844A publication Critical patent/CN115840844A/en
Application granted granted Critical
Publication of CN115840844B publication Critical patent/CN115840844B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a big-data-based internet platform user behavior analysis system and relates to the technical field of big data analysis. An abnormal language identification module of the system performs the following steps. Step S01, input text training: data samples to be tested, which are comment texts, are obtained from a data storage module; each of the t comment texts is expressed as a sequence vector, each character is converted into a number that the model can process, and the numbers are finally mapped to 128-dimensional character vectors. Step S02, feature extraction: bidirectional sequence extraction is performed on the character vectors obtained in step S01 to obtain X1; X1 is input to the pooling layer of a convolutional neural network, where a convolution operation yields the feature vector of the comment text, and the feature vector is passed to the fully connected layer of the convolutional neural network. Step S03, result classification: an activation function F(x) is applied in the fully connected layer, and a classifier is then connected to output the classification result.

Description

Internet platform user behavior analysis system based on big data
Technical Field
The application relates to the technical field of big data analysis, in particular to an Internet platform user behavior analysis system based on big data.
Background
An internet platform is infrastructure that provides technical support and services to users. Interconnected internet platforms fall into three types: internet content platforms, internet service platforms and internet e-commerce platforms. A new round of technological and industrial revolution is accelerating, and digital technology innovation is achieving breakthroughs on multiple fronts. New information infrastructure, represented by 5G, artificial intelligence, the Internet of Things, the industrial internet and the satellite internet, is gradually becoming a new driver of global economic growth. People spend more and more time online and are active on different internet platforms every day, and users generate a large number of comments on internet content platforms.
Network security bears on real-world security. Abnormal activity such as reactionary remarks and abusive remarks is increasingly common on network platforms. By analyzing user behavior on an internet platform, abnormal users can be identified and abnormal traces located, which in turn helps guarantee security.
At present, internet platforms all analyze user language and identify abnormal language by extracting features in order to create a good network environment, but the aspects of user behavior they focus on differ from platform to platform. The monitoring methods in existing modules for monitoring abnormal user behavior are too limited: they depend on user reports and lack real-time monitoring and early warning. Information is also not shared among platforms, which wastes resources.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the application provide a big-data-based internet platform user behavior analysis system. An abnormal language identification model identifies abnormal language in the big data to obtain a list of abnormal users; an influence evaluation unit then evaluates the influence of those users; a list of users that need real-time monitoring is derived from the comprehensive influence; and finally a blacklist is obtained from the monitoring results and shared with other internet platforms. This achieves effective supervision and solves the problems noted in the background: untimely monitoring of user behavior, low monitoring efficiency and lack of data sharing.
In order to achieve the above purpose, the application provides the following technical solution: a big-data-based internet platform user behavior analysis system comprising a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module, wherein:
the data acquisition module acquires user data from the internet platform and transmits the acquired data to the data preprocessing module; the data preprocessing module cleans, perturbs and marks the data and transmits the preprocessed data to the data storage module; the data storage module stores the preprocessed data until it is called by the abnormal language identification module; the abnormal language identification module calls the data in the data storage module, identifies abnormal language of users in the data, and transmits the result to the user influence evaluation module; the user influence evaluation module evaluates the influence of abnormal users and transmits the list of abnormal users whose influence exceeds a preset value to the abnormal monitoring module; the abnormal monitoring module monitors the behavior of the abnormal users, obtains real-time data, strengthens detection of those users, marks users with multiple abnormal behaviors as blacklisted, and transmits the blacklisted users to the data sharing unit in the data storage module, and the data sharing unit transmits the blacklist to each internet platform;
the abnormal language identification module constructs an abnormal language identification model based on a deep learning algorithm and comprises a text training unit, a feature identification unit and a result classification unit; the abnormal language identification module performs the following steps:
step S01, input text training: data samples to be tested are obtained from the data storage module. The data samples are comment texts, each expressed as a sequence vector. There are t comment texts, each containing z characters, and the i-th comment is expressed as $D_i = \{w_{i1}, w_{i2}, \ldots, w_{iz}\}$, where w denotes a character. Each character is converted into a number that the model can process, giving $S_i = \{s_{i1}, s_{i2}, \ldots, s_{iz}\}$, which is finally mapped to 128-dimensional character vectors;
step S02, feature extraction: bidirectional sequence extraction is performed on the character vector $S_i$ obtained in step S01 to obtain $X_1$, which contains the time-sequence information, semantic-sequence information and other sequence information of $S_i$. $X_1$ is input to the pooling layer of the convolutional neural network, where a convolution operation yields the feature vector of the comment text, and the feature vector is passed to the fully connected layer of the convolutional neural network, where W denotes the weight of a feature character and b denotes the bias parameter;
step S03, result classification: an activation function F(x) is applied in the fully connected layer to add a nonlinear factor to the model, and a classifier is then connected to output the classification result, yielding a list of users of abnormal language and the corresponding abnormal language.
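The character-to-number conversion and 128-dimensional mapping of step S01 can be sketched as follows. This is a minimal illustration only: the helper names (`build_vocab`, `encode`, `init_embeddings`, `embed`), the padding and unknown tokens, and the random initialization of the lookup table are all assumptions, since the text does not say how the mapping is learned. The bidirectional extraction and convolutional layers of steps S02 and S03 would normally come from a deep learning framework and are not shown.

```python
import random

EMBED_DIM = 128              # dimension stated in step S01
PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens

def build_vocab(comments):
    """Assign each distinct character an integer id (0 = padding, 1 = unknown)."""
    vocab = {PAD: 0, UNK: 1}
    for text in comments:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab

def encode(text, vocab, z):
    """Convert the characters of D_i into the ids of S_i, truncated or padded to z."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:z]]
    return ids + [vocab[PAD]] * (z - len(ids))

def init_embeddings(vocab_size, dim=EMBED_DIM, seed=0):
    """Random lookup table (assumption); in a trained model these rows are learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(ids, table):
    """Map each id in S_i to its 128-dimensional character vector."""
    return [table[i] for i in ids]

comments = ["good post", "bad bad post!"]
vocab = build_vocab(comments)
table = init_embeddings(len(vocab))
s_1 = encode(comments[0], vocab, z=12)   # ids for the first comment
x_1 = embed(s_1, table)                  # 12 vectors of 128 floats each
```

Fixing the length z by padding keeps every comment the same shape, which is what lets a batch of comments be fed to the later layers together.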
In a preferred embodiment, the activation function F(x) satisfies a formula [formula not reproduced in the source text], where x denotes the input of the pooling layer of the convolutional neural network, a denotes the output of the pooling layer, and b denotes the bias parameter.
In a preferred embodiment, the weight W is calculated by a model [formula not reproduced in the source text], where W denotes the weight of a character, $F_{in}$ denotes the number of times the n-th character appears in the i-th sample, M denotes the total number of samples, $M_n$ denotes the number of times the n-th character occurs, and α denotes a coefficient parameter.
In a preferred embodiment, the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data perturbation unit for privacy protection, and a data marking unit. The data perturbation unit protects user privacy through a perturbation algorithm comprising the following steps: let I denote the set of all attributes in the data set, including the confidential attributes; compute the variance of every attribute in I and let j' be the attribute with the largest variance; compute the mean of the data in attribute j' and divide the data into 2 subsets according to that mean; repeat the variance and mean steps on each of the two child nodes; and when a child node contains fewer records than a pre-specified number, replace the private data in it with the obtained mean, thereby perturbing the data.
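The perturbation steps above can be sketched as a recursive partition. This is a minimal illustration under stated assumptions: records are dictionaries of numeric attributes, the function and parameter names (`perturb`, `min_records`, `confidential`) are hypothetical, and "the obtained mean" is read as the mean of each confidential attribute within the leaf, which the text does not spell out.

```python
from statistics import mean, pvariance

def _replace_with_means(records, confidential):
    """Leaf step: overwrite each confidential value with the leaf's mean (assumed reading)."""
    means = {a: mean(r[a] for r in records) for a in confidential}
    return [{**r, **means} for r in records]

def perturb(records, attrs, confidential, min_records):
    """Split on the max-variance attribute at its mean until a node is small enough.

    Returns new records grouped by leaf; the input records are not modified.
    """
    if not records:
        return []
    if len(records) < min_records:
        return _replace_with_means(records, confidential)
    # attribute j' with the largest variance among all attributes in I
    j = max(attrs, key=lambda a: pvariance([r[a] for r in records]))
    m = mean(r[j] for r in records)
    left = [r for r in records if r[j] <= m]
    right = [r for r in records if r[j] > m]
    if not left or not right:  # all values equal, so no further split is possible
        return _replace_with_means(records, confidential)
    return (perturb(left, attrs, confidential, min_records)
            + perturb(right, attrs, confidential, min_records))

records = [
    {"age": 20, "income": 100},
    {"age": 22, "income": 300},
    {"age": 60, "income": 900},
    {"age": 64, "income": 1100},
]
out = perturb(records, attrs=["age", "income"], confidential=["income"], min_records=3)
```

Because each leaf replaces values with their own mean, the total of a confidential attribute over the data set is unchanged, which is one reason mean replacement keeps aggregate statistics usable after perturbation.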
In a preferred embodiment, the influence evaluation module comprises a data crawling unit, a classification scoring unit, an evaluation model construction unit and a result judging unit. It first scores the user's number of fans, number of likes, liveness and social network, then calculates the user's comprehensive influence score by a formula, and monitors the abnormal user in real time when the influence exceeds a preset value. It comprises the following steps:
step S11, data crawling: the detailed information of each user on the abnormal user list is crawled to obtain a data set;
step S12, classification and scoring: the data are classified by number of fans, number of likes, liveness and social network value to obtain four groups of data, and scoring rules are then set to evaluate the user in these four aspects, giving four groups of scores;
step S13, constructing an evaluation model: the four groups of scores are substituted into the model [formula not reproduced in the source text] to calculate the user's comprehensive influence score, where A denotes the user's fan-count score, graded 1-5 according to the number of fans, and $A_i$ denotes the user's fan-count score on each internet platform; B denotes the user's like-count score, graded 1-5 according to the number of likes, and $B_i$ denotes the user's like-count score on each internet platform;
step S14, result judgment: whether the user's comprehensive influence score exceeds a preset value is judged, and the list of users exceeding the preset value is transmitted to the abnormal monitoring module.
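Steps S12 to S14 can be sketched as grading, aggregating and thresholding. The evaluation model itself is not reproduced in this text, so the equal-weight sum in `influence` below is an assumption standing in for it, not the patent's formula. The grade boundaries, field names and function names are likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical grade boundaries: the text only says scores are graded 1-5.
FAN_BOUNDS = [1_000, 10_000, 100_000, 1_000_000]
LIKE_BOUNDS = [500, 5_000, 50_000, 500_000]

def grade(value, bounds):
    """Map a raw count to a grade from 1 to 5 using four ascending boundaries."""
    return bisect_right(bounds, value) + 1

def influence(a, b, h, p, weights=(0.25, 0.25, 0.25, 0.25)):
    """Assumed aggregation: weighted sum of the four scores A, B, H and P."""
    return sum(w * s for w, s in zip(weights, (a, b, h, p)))

def users_to_monitor(users, preset):
    """Step S14: keep the ids of users whose comprehensive score exceeds the preset value."""
    selected = []
    for u in users:
        score = influence(
            grade(u["fans"], FAN_BOUNDS),
            grade(u["likes"], LIKE_BOUNDS),
            u["h"],  # liveness score, assumed already on a 1-5 scale
            u["p"],  # social network score, assumed already on a 1-5 scale
        )
        if score > preset:
            selected.append(u["id"])
    return selected

users = [
    {"id": "u1", "fans": 2_000_000, "likes": 600_000, "h": 4, "p": 4},
    {"id": "u2", "fans": 50, "likes": 10, "h": 1, "p": 2},
]
watchlist = users_to_monitor(users, preset=3.0)
```

Thresholding on the aggregate score is what limits real-time monitoring to a subset of the abnormal users rather than all of them.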
In a preferred embodiment, the liveness H comprises a posting output liveness h1 and a comment liveness h2, and $H_i$ denotes the liveness on the i-th internet platform; the social network value P comprises the richness p1 of the social network and an intimate social relationship value score p2, and $P_i$ denotes the social network value on the i-th internet platform. The liveness is calculated by a formula [formula not reproduced in the source text], where c1 denotes a constant and w1 denotes the weight of the posting output liveness h1; the social network value P is calculated by a formula [formula not reproduced in the source text], where c2 denotes a constant and w2 denotes the weight of the richness p1.
In a preferred embodiment, the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit. The original data storage unit stores the data produced by the preprocessing module; the analysis result storage unit stores the data generated by the abnormal language identification module, the user influence evaluation module and the abnormal monitoring module; and the temporary cache unit stores frequently accessed analysis data and data collected in real time so as to speed up the system's response.
In order to achieve the above purpose, the application further provides the following technical solution: a method for the big-data-based internet platform user behavior analysis system, the method comprising the following steps:
step S21, data preprocessing: the acquired user data are processed and then transmitted to the data storage module, where they wait to be called;
step S22, constructing an abnormal user language identification model: input text training, feature extraction and result classification are performed by the model to obtain a list of abnormal users and the corresponding abnormal language, and the abnormal user list is passed to the user influence evaluation step;
step S23, user influence evaluation: each user's influence score is calculated from the number of fans, the number of likes, the liveness and the social network score, and the list of abnormal users with large influence is passed to the real-time monitoring step;
step S24, real-time monitoring: the real-time behavior of the abnormal users is monitored, a blacklist of abnormal users is obtained in a timely manner, and the blacklist is transmitted to the data storage module;
step S25, data sharing: the characteristics of the blacklisted users are extracted and the blacklisted users are transmitted to the data sharing unit in the data storage module; the data sharing unit transmits the blacklist to each internet platform; and after receiving the blacklist, each internet platform takes measures according to the user's retention rate on that platform.
In a preferred embodiment, the user retention rate is the frequency with which the user appears within a time d. The user retention rate over a period of time is plotted as a line graph with the longest time span set to 60 days, and the next-day retention and the retention after 2, 3, 4, 5, 6, 7, 15, 30 and 60 days within those 60 days are drawn as lines in 9 different colors.
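A retention calculation over the listed day offsets can be sketched as follows. This is an illustration under assumptions: the text defines retention only as the frequency of a user's appearance within time d, so the cohort-style definition used here (share of users active on a start day who are active again d days later), the `activity` data layout and the function name are hypothetical. Plotting is left out; the returned dictionary could be fed to any line-chart library.

```python
from datetime import date, timedelta

OFFSETS = [1, 2, 3, 4, 5, 6, 7, 15, 30, 60]  # day offsets listed in the text

def retention(activity, start, offsets=OFFSETS):
    """Return {offset: share of the start-day cohort that is active on start + offset}.

    activity maps a user id to the set of dates on which that user was active.
    """
    cohort = [u for u, days in activity.items() if start in days]
    if not cohort:
        return {d: 0.0 for d in offsets}
    return {
        d: sum(start + timedelta(days=d) in activity[u] for u in cohort) / len(cohort)
        for d in offsets
    }

start = date(2023, 1, 1)
activity = {
    "a": {start, start + timedelta(days=1), start + timedelta(days=7)},
    "b": {start, start + timedelta(days=1)},
    "c": {start + timedelta(days=1)},  # not active on the start day, so not in the cohort
}
rates = retention(activity, start)
```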
The technical effects and advantages of the application are as follows:
the application identifies abnormal language in big data with an abnormal language identification model to obtain a list of abnormal users, evaluates the influence of those users with the influence evaluation unit, derives from the comprehensive influence a list of users that need real-time monitoring, and finally obtains a blacklist from the monitoring results and shares it with other internet platforms, thereby improving supervision efficiency and saving computing resources. A perturbation algorithm is adopted in data processing: noise is added to the related attributes and the non-related attributes are perturbed, which widens the scope of privacy protection and reduces the deviation in data mining.
Drawings
Fig. 1 is a block diagram of a system architecture of the present application.
Fig. 2 is a flowchart of an abnormal language identification module according to the present application.
FIG. 3 is a flowchart of an influence evaluation method according to the present application.
Fig. 4 is a flow chart of a method of the system of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "module," "system," and the like as used herein are intended to include a computer-related entity, such as, but not limited to, hardware, firmware, a combination of hardware and software, or software in execution. For example, a module may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a module. One or more modules may be located in one process and/or thread of execution, and one module may be located on one computer and/or distributed between two or more computers.
The interconnected network platforms in the application realize mutual compatibility by providing interfaces for the other party, and users can enter one platform from the other platform through the interfaces.
Example 1
This embodiment provides a big-data-based internet platform user behavior analysis system which, as shown in Fig. 1, comprises a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module, wherein:
the data acquisition module acquires user data from the internet platform and transmits the acquired data to the data preprocessing module; the data preprocessing module cleans, perturbs and marks the data and transmits the preprocessed data to the data storage module; the data storage module stores the preprocessed data until it is called by the abnormal language identification module; the abnormal language identification module calls the data in the data storage module, identifies abnormal language of users in the data, and transmits the result to the user influence evaluation module; the user influence evaluation module evaluates the influence of abnormal users and transmits the list of abnormal users whose influence exceeds a preset value to the abnormal monitoring module; the abnormal monitoring module monitors the behavior of the abnormal users, obtains real-time data, strengthens detection of those users, marks users with multiple abnormal behaviors as blacklisted, and transmits the blacklisted users to the data sharing unit in the data storage module, and the data sharing unit transmits the blacklist to each internet platform;
as shown in Fig. 2, the abnormal language identification module constructs an abnormal language identification model based on a deep learning algorithm and comprises a text training unit, a feature identification unit and a result classification unit; the abnormal language identification module performs the following steps:
step S01, input text training: data samples to be tested are obtained from the data storage module. The data samples are comment texts, each expressed as a sequence vector. There are t comment texts, each containing z characters, and the i-th comment is expressed as $D_i = \{w_{i1}, w_{i2}, \ldots, w_{iz}\}$, where w denotes a character. Each character is converted into a number that the model can process, giving $S_i = \{s_{i1}, s_{i2}, \ldots, s_{iz}\}$, which is finally mapped to 128-dimensional character vectors;
step S02, feature extraction: bidirectional sequence extraction is performed on the character vector $S_i$ obtained in step S01 to obtain $X_1$, which contains the time-sequence information, semantic-sequence information and other sequence information of $S_i$. $X_1$ is input to the pooling layer of the convolutional neural network, where a convolution operation yields the feature vector of the comment text, and the feature vector is passed to the fully connected layer of the convolutional neural network, where W denotes the weight of a feature character and b denotes the bias parameter;
step S03, result classification: an activation function F(x) is applied in the fully connected layer to add a nonlinear factor to the model, and a classifier is then connected to output the classification result, yielding a list of users of abnormal language and the corresponding abnormal language.
Further, the activation function F(x) satisfies a formula [formula not reproduced in the source text], where x denotes the input of the pooling layer of the convolutional neural network, a denotes the output of the pooling layer, and b denotes the bias parameter.
Further, the weight W is calculated by a model [formula not reproduced in the source text], where W denotes the weight of a character, $F_{in}$ denotes the number of times the n-th character appears in the i-th sample, M denotes the total number of samples, $M_n$ denotes the number of times the n-th character occurs, and α denotes a coefficient parameter.
Further, the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data perturbation unit for privacy protection, and a data marking unit. The data perturbation unit protects user privacy through a perturbation algorithm comprising the following steps: let I denote the set of all attributes in the data set, including the confidential attributes; compute the variance of every attribute in I and let j' be the attribute with the largest variance; compute the mean of the data in attribute j' and divide the data into 2 subsets according to that mean; repeat the variance and mean steps on each of the two child nodes; and when a child node contains fewer records than a pre-specified number, replace the private data in it with the obtained mean, thereby perturbing the data.
Further, the influence evaluation module comprises a data crawling unit, a classification scoring unit, an evaluation model construction unit and a result judging unit. It scores the user's number of fans, number of likes, liveness and social network, calculates the user's comprehensive influence score by a formula, and monitors the user in real time when the influence exceeds a preset value. As shown in Fig. 3, it comprises the following steps:
step S11, data crawling: the detailed information of each user on the abnormal user list is crawled to obtain a data set;
step S12, classification and scoring: the data are classified by number of fans, number of likes, liveness and social network value to obtain four groups of data, and scoring rules are then set to evaluate the user in these four aspects, giving four groups of scores;
step S13, constructing an evaluation model: the four groups of scores are substituted into the model [formula not reproduced in the source text] to calculate the user's comprehensive influence score, where A denotes the user's fan-count score, graded 1-5 according to the number of fans, and $A_i$ denotes the user's fan-count score on each internet platform; B denotes the user's like-count score, graded 1-5 according to the number of likes, and $B_i$ denotes the user's like-count score on each internet platform;
step S14, result judgment: whether the user's comprehensive influence score exceeds a preset value is judged, and the list of users exceeding the preset value is transmitted to the abnormal monitoring module.
Further, the liveness H comprises a posting output liveness h1 and a comment liveness h2, and $H_i$ denotes the liveness on the i-th internet platform; the social network value P comprises the richness p1 of the social network and an intimate social relationship value score p2, and $P_i$ denotes the social network value on the i-th internet platform. The liveness is calculated by a formula [formula not reproduced in the source text], where c1 denotes a constant and w1 denotes the weight of the posting output liveness h1; the social network value P is calculated by a formula [formula not reproduced in the source text], where c2 denotes a constant and w2 denotes the weight of the richness p1.
Further, the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit. The original data storage unit stores the data produced by the preprocessing module; the analysis result storage unit stores the data generated by the abnormal language identification module, the user influence evaluation module and the abnormal monitoring module; and the temporary cache unit stores frequently accessed analysis data and data collected in real time so as to speed up the system's response.
In order to achieve the above purpose, the application further provides the following technical solution: a method for the big-data-based internet platform user behavior analysis system which, as shown in Fig. 4, comprises the following steps:
step S21, data preprocessing: the acquired user data are processed and then transmitted to the data storage module, where they wait to be called;
step S22, constructing an abnormal user language identification model: input text training, feature extraction and result classification are performed by the model to obtain a list of abnormal users and the corresponding abnormal language, and the abnormal user list is passed to the user influence evaluation step;
step S23, user influence evaluation: each user's influence score is calculated from the number of fans, the number of likes, the liveness and the social network score, and the list of abnormal users with large influence is passed to the real-time monitoring step;
step S24, real-time monitoring: the real-time behavior of the abnormal users is monitored, a blacklist of abnormal users is obtained in a timely manner, and the blacklist is transmitted to the data storage module;
step S25, data sharing: the characteristics of the blacklisted users are extracted and the blacklisted users are transmitted to the data sharing unit in the data storage module; the data sharing unit transmits the blacklist to each internet platform; and after receiving the blacklist, each internet platform takes measures according to the user's retention rate on that platform.
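The monitoring and sharing steps S24 and S25 can be sketched as a per-user counter with a threshold. This is an illustration only: the text says users with multiple abnormal behaviors are blacklisted but gives no count, so `max_strikes` is a hypothetical parameter, as are the class and function names and the idea that sharing means handing each platform a copy of the list.

```python
from collections import Counter

class AbnormalMonitor:
    """Counts abnormal events for watched users and blacklists repeat offenders."""

    def __init__(self, watchlist, max_strikes=3):  # threshold value is an assumption
        self.watchlist = set(watchlist)
        self.max_strikes = max_strikes
        self.strikes = Counter()
        self.blacklist = set()

    def observe(self, user_id, is_abnormal):
        """Record one real-time event; return True only when the user is newly blacklisted."""
        if user_id not in self.watchlist or not is_abnormal:
            return False
        self.strikes[user_id] += 1
        if self.strikes[user_id] >= self.max_strikes and user_id not in self.blacklist:
            self.blacklist.add(user_id)
            return True
        return False

def share(blacklist, platforms):
    """Step S25: give every platform its own sorted copy of the blacklist."""
    return {p: sorted(blacklist) for p in platforms}

monitor = AbnormalMonitor(["u1"], max_strikes=2)
events = [("u1", True), ("u2", True), ("u1", False), ("u1", True)]
newly_listed = [monitor.observe(user, flag) for user, flag in events]
shared = share(monitor.blacklist, ["platform_a", "platform_b"])
```

Only users on the watchlist from step S23 are counted, which mirrors the text's point that real-time monitoring is restricted to high-influence abnormal users.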
Further, the user retention rate is the frequency with which the user appears within a time d. The user retention rate over a period of time is plotted as a line graph with the longest time span set to 60 days, and the next-day retention and the retention after 2, 3, 4, 5, 6, 7, 15, 30 and 60 days within those 60 days are drawn as lines in 9 different colors.
In summary: the abnormal language identification model identifies abnormal language in the big data to obtain a list of abnormal users; the influence evaluation unit evaluates the influence of those users; a list of users that need real-time monitoring is derived from the comprehensive influence; and a blacklist is obtained from the monitoring results and shared with other internet platforms. Because the users to be monitored in real time are screened out of the abnormal user list by the comprehensive influence score, real-time monitoring of a large number of users is avoided, which improves supervision efficiency and saves computing resources.
The present embodiment provides only one implementation and does not specifically limit the protection scope of the present application.
Finally: the foregoing description of the preferred embodiments is not intended to limit the application; any modifications, equivalents and alternatives falling within the spirit and principles of the application are intended to be included within its scope.

Claims (7)

1. An internet platform user behavior analysis system based on big data, comprising a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module, characterized in that: the data acquisition module acquires user data from the internet platform and transmits the acquired data to the data preprocessing module; the data preprocessing module cleans, perturbs and marks the data and transmits the preprocessed data to the data storage module; the data storage module stores the preprocessed data until it is called by the abnormal language identification module; the abnormal language identification module calls the data in the data storage module, identifies abnormal language of users in the data, and transmits the result to the user influence evaluation module; the user influence evaluation module evaluates the influence of abnormal users and transmits the list of abnormal users whose influence exceeds a preset value to the abnormal monitoring module; the abnormal monitoring module monitors the behavior of the abnormal users, obtains real-time data, strengthens detection of those users, marks users with multiple abnormal behaviors as blacklisted, and transmits the blacklisted users to the data sharing unit in the data storage module, and the data sharing unit transmits the blacklist to each internet platform;
the abnormal language identification module constructs an abnormal language identification model based on a deep learning algorithm and comprises a text training unit, a feature identification unit and a result classification unit; the abnormal language identification module performs the following steps:
step S01, input text training: data samples to be tested are obtained from the data storage module, the data samples being comment texts, each expressed as a sequence vector; there are t comment texts, each containing z characters, and the i-th comment is expressed as $D_i = \{w_{i1}, w_{i2}, \ldots, w_{iz}\}$, where w denotes a character; each character is converted into a number that the model can process, giving $S_i = \{s_{i1}, s_{i2}, \ldots, s_{iz}\}$, which is finally mapped to 128-dimensional character vectors;
step S02, feature extraction: bidirectional sequence extraction is performed on the character vector $S_i$ obtained in step S01 to obtain $X_1$, which contains the time-sequence information, semantic-sequence information and other sequence information of $S_i$; $X_1$ is input to the pooling layer of the convolutional neural network, where a convolution operation yields the feature vector of the comment text, and the feature vector is passed to the fully connected layer of the convolutional neural network, where W denotes the weight of a feature character and b denotes the bias parameter;
step S03, result classification: an activation function F(x) is applied in the fully connected layer to add a nonlinear factor to the model, and a classifier is then connected to output the classification result, yielding a list of users of abnormal language and the corresponding abnormal language;
the user influence evaluation module comprises a data crawling unit, a classification scoring unit, an evaluation model building unit and a result judging unit; the number of fans, the number of likes, the liveness and the social network score of a user are calculated, a comprehensive influence score of the user is obtained by formula, and when the influence exceeds a preset value the abnormal user is monitored in real time; the user influence evaluation module operates according to the following steps:
step S11, data crawling: the detailed information of the users in the abnormal user list is crawled to obtain a data set;
step S12, classification and scoring: the user's number of fans, number of likes, liveness and social network value are classified to obtain four data sets, and scoring rules are then set to evaluate the user in these four aspects, yielding four groups of scores;
step S13, constructing an evaluation model: the four groups of scores are substituted into the evaluation model to calculate the comprehensive influence score of the user, wherein A represents the user's fan number score, graded 1-5 according to the number of fans, and $A_i$ represents the user's fan number score on each internet platform; B represents the user's like number score, graded 1-5 according to the number of likes, and $B_i$ represents the user's like number on each internet platform; the liveness H comprises the posting output liveness H1 and the comment liveness H2, and $H_i$ represents the posting liveness on the ith internet platform; the social network value P comprises the richness P1 of the social network and the social relationship affinity score P2, and $P_i$ represents the posting liveness on the ith internet platform; in the calculation formula of the liveness H, c1 represents a constant and w1 represents the weight of the posting output liveness H1; in the calculation formula of the social network value P, c2 represents a constant and w2 represents the weight of the posting output liveness P1;
step S14, result judgment: whether the user's comprehensive influence score exceeds the preset value is judged, and the list of users exceeding the preset value is transmitted to the abnormality monitoring module.
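The grading in step S12 and the threshold check in step S14 can be illustrated with a minimal Python sketch. The text above does not reproduce the evaluation model of step S13, so the unweighted mean in `composite_score` is only a stand-in assumption, and the grade boundaries in `FAN_THRESHOLDS`, like all function names here, are hypothetical.

```python
from bisect import bisect_right

# Hypothetical grade boundaries; the claim only states that a metric is graded 1-5.
FAN_THRESHOLDS = [1_000, 10_000, 100_000, 1_000_000]


def grade(value, thresholds):
    """Map a raw count to a grade from 1 to 5 using four ascending boundaries."""
    return bisect_right(thresholds, value) + 1


def composite_score(scores):
    """Stand-in for the evaluation model: unweighted mean of the four scores.

    This is an assumption; the actual formula of step S13 is not available here.
    """
    return sum(scores) / len(scores)


def users_to_monitor(user_scores, preset):
    """Step S14: keep the users whose composite score exceeds the preset value."""
    return [user for user, scores in user_scores.items()
            if composite_score(scores) > preset]
```

For instance, `users_to_monitor({"u1": (5, 4, 4, 5), "u2": (1, 2, 1, 1)}, 3.0)` keeps only `"u1"`, whose mean score of 4.5 exceeds the preset value.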
2. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the activation function F(x) satisfies a formula in which x represents the input of the pooling layer of the convolutional neural network, a represents the output of the pooling layer, and b represents the bias parameter.
3. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the weight W is calculated by a model in which W represents the weight of the character, $F_{in}$ represents the number of times the nth character appears in the ith sample, M represents the total number of samples, $M_n$ represents the number of times the nth character occurs, and α represents a coefficient parameter.
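The character encoding described in step S01 of claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, index 0 is reserved for padding and unknown characters, and the 128-dimensional table is filled with random values, whereas a trained model would learn them.

```python
import random

EMBED_DIM = 128  # character vector dimension stated in step S01


def build_vocab(comments):
    """Assign each distinct character an integer index; 0 is reserved for padding."""
    vocab = {}
    for text in comments:
        for ch in text:
            vocab.setdefault(ch, len(vocab) + 1)
    return vocab


def encode(text, vocab, z):
    """Convert a comment d_i into the index sequence S_i of fixed length z.

    Longer comments are truncated, shorter ones padded with 0; unknown
    characters are also mapped to 0 (an assumption of this sketch).
    """
    ids = [vocab.get(ch, 0) for ch in text[:z]]
    return ids + [0] * (z - len(ids))


def make_embedding_table(vocab_size, dim=EMBED_DIM, seed=0):
    """Randomly initialised lookup table with one row per index, including 0."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size + 1)]


def embed(ids, table):
    """Look up the 128-dimensional vector of every index in S_i."""
    return [table[i] for i in ids]
```

The resulting list of z vectors is what a bidirectional sequence layer, as in step S02, would take as input.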
4. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data disturbance unit for privacy protection and a data marking unit, wherein the data disturbance unit protects user privacy through a disturbance algorithm comprising the following steps: in the data set, let the total number of attributes be I, where I includes the confidential attributes of the data set; calculate the variance of each of the I attributes, and let j' be the attribute with the largest variance; calculate the mean of the data in attribute j' and divide the data into 2 subsets according to this mean; repeat the variance and mean calculations for each of the two child nodes, and when a child node contains fewer records than a pre-specified number, replace the private data with the obtained mean so as to disturb the data.
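The disturbance algorithm of claim 4 can be sketched in Python as a recursive partition. The sketch makes several assumptions that the claim does not fix: every attribute is numeric and treated as confidential, the population variance is used, and a node whose values are all identical is treated as a leaf. The names `disturb` and `min_records` are hypothetical.

```python
from statistics import mean, pvariance


def disturb(records, min_records):
    """Return a disturbed copy of `records`, a list of equal-length numeric rows."""
    result = [None] * len(records)

    def replace_with_mean(indices):
        # Leaf node: every record takes the per-attribute mean of the node.
        columns = zip(*(records[i] for i in indices))
        centroid = [mean(col) for col in columns]
        for i in indices:
            result[i] = list(centroid)

    def split(indices):
        if len(indices) < min_records:
            replace_with_mean(indices)
            return
        columns = list(zip(*(records[i] for i in indices)))
        # Attribute j' with the largest variance within this node.
        j = max(range(len(columns)), key=lambda k: pvariance(columns[k]))
        threshold = mean(columns[j])
        left = [i for i in indices if records[i][j] <= threshold]
        right = [i for i in indices if records[i][j] > threshold]
        if not right:  # all values identical, so the node cannot be divided
            replace_with_mean(indices)
            return
        split(left)
        split(right)

    if records:
        split(list(range(len(records))))
    return result
```

Because each leaf is replaced by its own mean, the overall mean of every attribute is preserved while individual records are no longer recoverable from the output.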
5. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit, wherein the original data storage unit is used for storing the data obtained by the data preprocessing module; the analysis result storage unit is used for storing the data generated by the abnormal language identification module, the user influence evaluation module and the abnormality monitoring module; and the temporary cache unit is used for storing frequently accessed analysis data and data collected in real time, so as to speed up the response of the system.
6. An internet platform user behavior analysis method based on big data, which is used for implementing the internet platform user behavior analysis system based on big data according to any one of claims 1-5, characterized in that the method comprises the following steps:
step S21, data preprocessing: the acquired user data is processed and then transmitted to the data storage unit, where it waits to be called;
step S22, constructing an abnormal user language identification model: the model performs input text training, feature extraction and result classification to obtain an abnormal user list and the corresponding abnormal language, and the abnormal user list is passed to the user influence evaluation step;
step S23, user influence evaluation: the influence score of each user is calculated from the user's number of fans, number of likes, liveness and social network score, and the list of abnormal users with large influence is passed to the real-time monitoring step;
step S24, real-time monitoring: the real-time behavior of abnormal users is monitored, and an abnormal user blacklist is obtained in a timely manner and transmitted to the data storage module;
step S25, data sharing: the characteristics of the blacklisted users are extracted and the blacklisted users are transmitted to the sharing unit in the data storage module; the data sharing unit transmits the blacklist to each internet platform, and after receiving the blacklist each internet platform takes measures according to the retention rate of the users on that platform.
7. The internet platform user behavior analysis method based on big data according to claim 6, wherein: the user retention rate is the occurrence frequency of the user within time d; the user retention rate over a period of time is depicted by a line graph with the longest time span set to 60 days, and the next-day, 2-day, 3-day, 4-day, 5-day, 6-day, 7-day, 15-day, 30-day and 60-day retention within the 60 days are represented by broken lines in 9 different colors.
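One possible reading of the retention rate in claim 7 is sketched below in Python. The claim defines the rate only as the occurrence frequency of the user within time d, so the interpretation used here (the fraction of the d days following a reference date on which the user was active) is an assumption, as are the names `retention_rate`, `retention_curve` and `WINDOWS`.

```python
from datetime import date, timedelta

# Day spans listed in claim 7, with next-day retention taken as d = 1.
WINDOWS = [1, 2, 3, 4, 5, 6, 7, 15, 30, 60]


def retention_rate(active_days, start, d):
    """Fraction of the d days following `start` on which the user was active."""
    window = {start + timedelta(days=k) for k in range(1, d + 1)}
    return len(window & set(active_days)) / d


def retention_curve(active_days, start):
    """Retention rate for every day span in WINDOWS, keyed by d."""
    return {d: retention_rate(active_days, start, d) for d in WINDOWS}
```

The dictionary returned by `retention_curve` holds one value per day span, which is the data a line graph of the kind described in the claim would plot.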
CN202211627028.9A 2022-12-17 2022-12-17 Internet platform user behavior analysis system based on big data Active CN115840844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211627028.9A CN115840844B (en) 2022-12-17 2022-12-17 Internet platform user behavior analysis system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211627028.9A CN115840844B (en) 2022-12-17 2022-12-17 Internet platform user behavior analysis system based on big data

Publications (2)

Publication Number Publication Date
CN115840844A (en) 2023-03-24
CN115840844B (en) 2023-08-15

Family

ID=85578773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211627028.9A Active CN115840844B (en) 2022-12-17 2022-12-17 Internet platform user behavior analysis system based on big data

Country Status (1)

Country Link
CN (1) CN115840844B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662769B (en) * 2023-08-02 2023-10-13 北京数字悦动科技有限公司 User behavior analysis system and method based on deep learning model

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106940732A (en) * 2016-05-30 2017-07-11 国家计算机网络与信息安全管理中心 Method for discovering suspected water army accounts on microblogs
CN106980692A (en) * 2016-05-30 2017-07-25 国家计算机网络与信息安全管理中心 Influence calculation method based on specific microblog events
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Social network rumor detection method based on content and user multiplicity
CN111950717A (en) * 2020-08-27 2020-11-17 桂林电子科技大学 Public opinion quantification method based on neural network
US11206277B1 (en) * 2020-11-24 2021-12-21 Korea Internet & Security Agency Method and apparatus for detecting abnormal behavior in network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN110704715B (en) * 2019-10-18 2022-05-17 南京航空航天大学 Cyberbullying detection method and system

Non-Patent Citations (1)

Title
Scalping Anomaly Detection Based on Mobile Internet Traffic Data; Wu, Chuting et al.; Proceedings of the 2018 2nd International Conference on Telecommunications and Communication Engineering (ICTCE 2018); pp. 237-244 *

Also Published As

Publication number Publication date
CN115840844A (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN111798312B (en) Financial transaction system anomaly identification method based on isolated forest algorithm
WO2021217855A1 (en) Abnormal root cause positioning method and apparatus, and electronic device and storage medium
CN110929145B (en) Public opinion analysis method, public opinion analysis device, computer device and storage medium
US20200193092A1 (en) Perceptual associative memory for a neuro-linguistic behavior recognition system
CN115840844B (en) Internet platform user behavior analysis system based on big data
CN114780727A (en) Text classification method and device based on reinforcement learning, computer equipment and medium
CN110636066B (en) Network security threat situation assessment method based on unsupervised generative reasoning
CN111538931A (en) Big data-based public opinion monitoring method and device, computer equipment and medium
CN115941322B (en) Attack detection method, device, equipment and storage medium based on artificial intelligence
CN115296933B (en) Industrial production data risk level assessment method and system
CN115454706A (en) System abnormity determining method and device, electronic equipment and storage medium
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN113674846A (en) Hospital intelligent service public opinion monitoring platform based on LSTM network
CN113746790A (en) Abnormal flow management method, electronic device and storage medium
CN116383645A (en) Intelligent system health degree monitoring and evaluating method based on anomaly detection
CN110796565A (en) Analysis method and analysis system for supervision logs
CN113612777B (en) Training method, flow classification method, device, electronic equipment and storage medium
CN116319033A (en) Network intrusion attack detection method, device, equipment and storage medium
CN111209158B (en) Mining monitoring method and cluster monitoring system for server cluster
CN111209391A (en) Information identification model establishing method and system and interception method and system
CN114461763A (en) Network security event extraction method based on burst word clustering
CN111783843A (en) Feature selection method and device and computer system
US12032909B2 (en) Perceptual associative memory for a neuro-linguistic behavior recognition system
CN115374883A (en) Abnormal value processing method and system for time series data
CN114548076A (en) Intelligent scoring method for content file and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Big Data Based User Behavior Analysis System for Internet Platforms

Effective date of registration: 20231225

Granted publication date: 20230815

Pledgee: Guangdong Development Bank Limited by Share Ltd. Shenzhen branch

Pledgor: Shenzhen Xinlianxin Network Technology Co.,Ltd.

Registration number: Y2023980073837
