CN115840844B

CN115840844B - Internet platform user behavior analysis system based on big data

Info

Publication number: CN115840844B
Application number: CN202211627028.9A
Authority: CN
Inventors: 张伟; 彭海坤; 袁环; 张连霞
Original assignee: Shenzhen Xinlianxin Network Technology Co ltd
Current assignee: Shenzhen Xinlianxin Network Technology Co ltd
Priority date: 2022-12-17
Filing date: 2022-12-17
Publication date: 2023-08-15
Anticipated expiration: 2042-12-17
Also published as: CN115840844A

Abstract

The application discloses an internet platform user behavior analysis system based on big data and relates to the technical field of big data analysis. The abnormal language identification module of the system performs the following steps. Step S01, input text training: data samples to be tested are obtained from the data storage module; each data sample is a comment text expressed as a sequence vector; there are t comment texts; each character is converted into a number that the model can process and is finally mapped to a 128-dimensional character vector. Step S02, feature extraction: bidirectional sequence extraction is performed on the character vectors obtained in step S01 to obtain X¹; X¹ is input to the pooling layer of the convolutional neural network and a convolution operation is performed to obtain the feature vector of the comment text, which is transmitted to the fully connected layer of the convolutional neural network. Step S03, result classification: an activation function F(x) is applied in the fully connected layer, and the output is passed to a classifier that outputs the classification result.

Description

Internet platform user behavior analysis system based on big data

Technical Field

The application relates to the technical field of big data analysis, in particular to an Internet platform user behavior analysis system based on big data.

Background

An internet platform is infrastructure that provides technical support and services to users. Interconnected internet platforms fall into three types: internet content platforms, internet service platforms and internet e-commerce platforms. A new round of technological and industrial revolution is accelerating, and digital technology innovation is achieving breakthroughs on many fronts. New information infrastructure, represented by 5G, artificial intelligence, the Internet of Things, the industrial internet and the satellite internet, is gradually becoming a new driver of global economic growth. People spend more and more time online and are active on different internet platforms every day, and users generate a large number of comments on internet content platforms.

Network security bears on real-world security. Abnormal speech, such as reactionary remarks and abusive remarks, is increasingly common on network platforms. By analyzing user behavior on an internet platform, abnormal users can be identified and traces of abnormal activity located, which helps to safeguard security.

At present, different internet platforms all analyze user language and identify abnormal language by extracting features in order to create a healthy network environment, but each platform focuses on different aspects of user behavior. The monitoring methods in existing abnormal user behavior monitoring modules are too limited: they depend on user reports and lack real-time monitoring and early-warning functions. In addition, information is not shared among platforms, which wastes resources.

Disclosure of Invention

To overcome the defects of the prior art, the embodiments of the application provide an internet platform user behavior analysis system based on big data. An abnormal language identification model identifies abnormal language in big data to obtain a list of abnormal users; an influence evaluation unit then evaluates the influence of each abnormal user, and a list of users requiring real-time monitoring is obtained according to the comprehensive influence score; finally, a blacklist is obtained from the monitoring results and shared with other internet platforms. This achieves effective supervision and addresses the problems identified in the Background: untimely monitoring of user behavior, low monitoring efficiency and the lack of shared data.

In order to achieve the above purpose, the present application provides the following technical solutions: an internet platform user behavior analysis system based on big data comprises a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module,

the data acquisition module is used for acquiring user data from the internet platform and transmitting the acquired data to the data preprocessing module; the data preprocessing module is used for cleaning, perturbing and marking the data and transmitting the preprocessed data to the data storage module; the data storage module is used for storing the preprocessed data to await calls from the abnormal language identification module; the abnormal language identification module calls the data in the data storage module, identifies abnormal language of users in the data and transmits the result to the user influence evaluation module; the user influence evaluation module is used for evaluating the influence of abnormal users and transmitting the list of abnormal users whose influence exceeds a preset value to the abnormal monitoring module; the abnormal monitoring module is used for monitoring the behavior of those abnormal users, obtaining real-time data, strengthening detection of those users, marking users with multiple abnormal behaviors as blacklisted, and transmitting the blacklisted users to the data sharing unit in the data storage module, and the data sharing unit transmits the blacklist to each internet platform;

the abnormal language identification module constructs an abnormal language identification model based on a deep learning algorithm and comprises a text training unit, a feature identification unit and a result classification unit, wherein the abnormal language identification module comprises the following steps:

step S01, training of input text: data samples to be tested are obtained from the data storage module, wherein each data sample is a comment text expressed as a sequence vector; there are t comment texts, each containing z characters, and the i-th comment is expressed as $d_i = \{w_{i1}, w_{i2}, \ldots, w_{iz}\}$, where w represents a character; each character is converted into a number that the model can process, denoted $s_i = \{s_{i1}, s_{i2}, \ldots, s_{iz}\}$, and is finally mapped into a 128-dimensional character vector;

step S02, feature extraction: bidirectional sequence extraction is performed on the character vector $s_i$ obtained in step S01 to obtain $X^1$, which contains the time-sequence information, semantic-sequence information and other sequence information of $s_i$; $X^1$ is input to the pooling layer of the convolutional neural network, and a convolution operation is performed to obtain the feature vector of the comment text; the feature vector is transmitted to the fully connected layer of the convolutional neural network, wherein W represents the weight of a feature character and b represents the bias parameter;

step S03, classifying results: an activation function F(x) is applied in the fully connected layer to add a nonlinear factor to the model, and the output is then passed to a classifier that outputs the classification result, thereby obtaining a list of abnormal-language users and the corresponding abnormal language.
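As an illustration only, the encoding in step S01 can be sketched in Python as follows. The vocabulary construction, the reserved padding and unknown ids, the fixed length z used for truncation and padding, and the randomly initialised lookup table are all assumptions made for this sketch and are not details given in the application; a trained model would learn the embedding values rather than draw them at random.

```python
import random

EMBED_DIM = 128          # dimension stated in step S01
PAD_ID, UNK_ID = 0, 1    # hypothetical reserved ids (not from the source)


def build_vocab(comments):
    """Assign each distinct character an integer id, starting after the reserved ids."""
    vocab = {}
    for text in comments:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2
    return vocab


def encode(text, vocab, z):
    """Turn one comment d_i into s_i: exactly z ids, truncated or padded."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:z]]
    return ids + [PAD_ID] * (z - len(ids))


def make_embedding_table(size, dim=EMBED_DIM, seed=0):
    """Random lookup table standing in for a learned embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]


def embed(ids, table):
    """Map each id in s_i to its 128-dimensional character vector."""
    return [table[i] for i in ids]


comments = ["good post", "bad!"]          # t = 2 toy comment texts
vocab = build_vocab(comments)
z = 6                                     # characters kept per comment
s = [encode(c, vocab, z) for c in comments]
table = make_embedding_table(len(vocab) + 2)
vectors = [embed(ids, table) for ids in s]
```

Each entry of `vectors` is a list of z character vectors of length 128, which is the shape that the bidirectional sequence extraction of step S02 would consume.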

In a preferred embodiment, the activation function F(x) satisfies the formula [formula missing in source], where x represents the input of the pooling layer of the convolutional neural network, a represents the output of the pooling layer, and b represents the bias parameter.

In a preferred embodiment, the calculation model of the weight W is [formula missing in source], wherein W represents the weight of the character, $F_{in}$ represents the number of times the n-th character appears in the i-th sample, M represents the total number of samples, $M_n$ represents the number of times the n-th character occurs, and α represents a coefficient parameter.

In a preferred embodiment, the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data perturbation unit for privacy protection and a data marking unit, wherein the data perturbation unit protects user privacy through a perturbation algorithm comprising the following steps: assume the data set has I attributes in total, including its confidential attributes; compute the variance of each of the I attributes and let j' be the attribute with the largest variance; compute the mean of the data in attribute j' and divide the data into 2 subsets according to that mean; repeat the variance and mean steps on each of the two child nodes; when a child node contains fewer records than a pre-specified number, replace the private data in that node with the obtained mean, thereby perturbing the data.
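The perturbation steps above can be sketched in Python as follows. This is one possible reading, not the applicant's implementation: it assumes that all attributes are numeric, that the "obtained mean" is the mean of each confidential attribute within the final child node, and that a node whose records cannot be separated by the mean is also treated as final. The function and parameter names are hypothetical.

```python
from statistics import mean, pvariance


def perturb(records, confidential, min_records):
    """Return a copy of `records` (a non-empty list of dicts of numbers)
    with the confidential attributes replaced by node-level means."""
    result = [dict(r) for r in records]
    attrs = list(records[0].keys())

    def split(idx):
        if len(idx) >= min_records:
            # Attribute j' with the largest variance inside this node.
            j = max(attrs, key=lambda a: pvariance([records[i][a] for i in idx]))
            m = mean([records[i][j] for i in idx])
            low = [i for i in idx if records[i][j] <= m]
            high = [i for i in idx if records[i][j] > m]
            if low and high:
                split(low)
                split(high)
                return
        # Final node: fewer than min_records, or no further split possible.
        for a in confidential:
            node_mean = mean([records[i][a] for i in idx])
            for i in idx:
                result[i][a] = node_mean

    split(list(range(len(records))))
    return result


records = [
    {"age": 20, "income": 100},
    {"age": 22, "income": 110},
    {"age": 40, "income": 300},
    {"age": 42, "income": 320},
]
perturbed = perturb(records, confidential=["income"], min_records=3)
```

In this toy data set, `income` has the largest variance, so the four records are divided at its mean into two nodes of two records each; both nodes fall below `min_records`, and each record's `income` is replaced by its node mean while `age` is left unchanged.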

In a preferred embodiment, the influence evaluation module includes a data crawling unit, a classification scoring unit, an evaluation model construction unit and a result judgment unit; the module first scores the user's number of fans, number of praises, liveness and social network value, then calculates the user's comprehensive influence score through a formula, and monitors the abnormal user in real time when the influence exceeds a preset value; the module performs the following steps:

step S11, data crawling: according to the list of the abnormal users, crawling the detailed information of the users to obtain a data set;

step S12, classifying and scoring: classifying the number of the user fans, the number of the praise points, the liveness and the social network value to obtain four groups of data sets, and then setting scoring rules to evaluate four aspects of the user to obtain four groups of scores;

step S13, constructing an evaluation model: the four sets of scores are substituted into the model [formula missing in source] to calculate the comprehensive influence score of the user, wherein A represents the user's fan-number score, graded 1-5 according to the number of fans, and Ai represents the user's fan-number score on each internet platform; B represents the user's praise-number score, graded 1-5 according to the number of praises, and Bi represents the user's praise-number score on each internet platform;

step S14, result judgment: it is judged whether the user's comprehensive influence score exceeds a preset value, and the list of users exceeding the preset value is transmitted to the abnormal monitoring module.
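Steps S12 to S14 can be illustrated with the following Python sketch. The evaluation model itself is not reproduced in this text, so the equal-weight combination used here is purely an assumption standing in for it; the grade boundaries, the weights and all function names are likewise hypothetical. Only the 1-5 grading and the comparison against a preset value come from the description.

```python
from bisect import bisect_right

# Hypothetical boundaries between grades 1..5 for a raw fan count.
FAN_BOUNDS = [100, 1_000, 10_000, 100_000]


def grade(value, bounds):
    """Map a raw count to a grade from 1 to 5 using four ascending boundaries."""
    return bisect_right(bounds, value) + 1


def composite_score(a, b, h, p, weights=(0.25, 0.25, 0.25, 0.25)):
    """Assumed stand-in for the evaluation model: a weighted sum of the
    fan score A, praise score B, liveness H and social network value P."""
    return sum(w * s for w, s in zip(weights, (a, b, h, p)))


def users_to_monitor(scores, preset):
    """Step S14: keep the users whose composite score exceeds the preset value."""
    return [user for user, score in scores.items() if score > preset]


scores = {
    "u1": composite_score(grade(250_000, FAN_BOUNDS), 5, 4, 2),
    "u2": composite_score(grade(50, FAN_BOUNDS), 2, 3, 4),
}
watchlist = users_to_monitor(scores, preset=3.0)
```

The sketch does not model the per-platform scores Ai and Bi; how they are aggregated into A and B depends on the formula that is missing from this text.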

In a preferred embodiment, the liveness H includes a posting output liveness h1 and a comment (leave-evaluation) liveness h2, and Hi represents the posting liveness of the i-th internet platform; the social network value P includes the richness p1 of the social network and an affinity social relationship value score p2, and Pi represents the posting liveness of the i-th internet platform; the calculation formula of the liveness is [formula missing in source], wherein c1 represents a constant and w1 represents the weight of the posting output liveness h1; the calculation formula of the social network value P is [formula missing in source], wherein c2 represents a constant and w2 represents the weight of p1.

In a preferred embodiment, the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit, wherein the original data storage unit is used for storing the data obtained by the preprocessing module; the analysis result storage unit is used for storing data generated by the abnormal language identification module, the user influence evaluation module and the abnormal monitoring module; and the temporary cache unit is used for storing frequently accessed analysis data and data collected in real time, so as to speed up the response of the system.

To achieve the above purpose, the present application further provides the following technical solution: a method for the big-data-based internet platform user behavior analysis system, the method comprising the following steps:

step S21, data preprocessing: the acquired user data is processed and then transmitted to the data storage module to await calls;

step S22, constructing an abnormal user language identification model: the model performs input text training, feature extraction and result classification to obtain a list of abnormal users and the corresponding abnormal language, and the abnormal user list is passed to the user influence evaluation step;

step S23, influence evaluation of the user: the influence score of each user is calculated from the user's number of fans, number of praises, liveness and social network score, and the list of abnormal users with large influence is passed to the real-time monitoring step;

step S24, monitoring in real time: the real-time behavior of the abnormal users is monitored, a blacklist of abnormal users is obtained in a timely manner, and the blacklist is transmitted to the data storage module;

step S25, data sharing: the characteristics of the blacklisted users are extracted and the blacklisted users are transmitted to the data sharing unit in the data storage module; the data sharing unit transmits the blacklist to each internet platform, and after receiving the blacklist each internet platform takes measures according to the retention rate of the user on that platform.
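Steps S24 and S25 can be sketched in Python as follows. The description only says that users with "multiple" abnormal behaviors are blacklisted, so the threshold of two observations is an assumption, as are the class name, the method names and the representation of each platform as a set of received user ids.

```python
from collections import Counter


class AbnormalMonitor:
    """Counts abnormal behaviors of watched users and blacklists repeat offenders."""

    def __init__(self, watchlist, max_strikes=2):
        self.watchlist = set(watchlist)
        self.max_strikes = max_strikes   # hypothetical reading of "multiple"
        self.strikes = Counter()
        self.blacklist = set()

    def observe(self, user_id, is_abnormal):
        """Record one monitored behavior; users not on the watchlist are ignored."""
        if user_id not in self.watchlist or not is_abnormal:
            return
        self.strikes[user_id] += 1
        if self.strikes[user_id] >= self.max_strikes:
            self.blacklist.add(user_id)


def share_blacklist(blacklist, platforms):
    """Step S25: push the blacklist to every platform's received set."""
    for received in platforms.values():
        received.update(blacklist)


monitor = AbnormalMonitor(watchlist=["u1", "u2"])
for user, abnormal in [("u1", True), ("u2", True), ("u1", True), ("u3", True), ("u3", True)]:
    monitor.observe(user, abnormal)

platforms = {"platform_a": set(), "platform_b": set()}
share_blacklist(monitor.blacklist, platforms)
```

Restricting `observe` to the watchlist reflects the design choice stated in the description: only users screened by the influence evaluation are monitored in real time, which keeps the monitored population small.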

In a preferred embodiment, the user retention rate is the frequency with which the user appears within a time d; the user retention rate over a period of time is plotted as a line graph with the longest time span set to 60 days, and the next-day, 2-day, 3-day, 4-day, 5-day, 6-day, 7-day, 15-day, 30-day and 60-day retention within those 60 days are represented by polylines in 9 different colors.
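A minimal Python sketch of the retention calculation is given below. It assumes one particular reading of "frequency of occurrence within the time d", namely the fraction of the d days following a reference date on which the user was active; the reference date, the function names and the use of calendar dates are assumptions, and plotting is omitted.

```python
from datetime import date, timedelta

# Day offsets listed in the description.
OFFSETS = [1, 2, 3, 4, 5, 6, 7, 15, 30, 60]


def retention(active_days, start, d):
    """Fraction of the d days after `start` on which the user appeared."""
    window = {start + timedelta(days=k) for k in range(1, d + 1)}
    return len(window & set(active_days)) / d


def retention_curve(active_days, start, offsets=OFFSETS):
    """Retention value for each offset, ready to be drawn as one polyline point each."""
    return {d: retention(active_days, start, d) for d in offsets}


start = date(2023, 1, 1)   # e.g. the day the platform received the blacklist
active = [date(2023, 1, 2), date(2023, 1, 4)]
curve = retention_curve(active, start)
```

A platform could then compare the values in `curve` against its own thresholds when deciding what measures to take for a blacklisted user.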

The application has the technical effects and advantages that:

By providing an abnormal language identification model, the application identifies abnormal language in big data to obtain a list of abnormal users; the influence evaluation unit then evaluates the influence of each abnormal user, and a list of users requiring real-time monitoring is obtained according to the comprehensive influence score; finally, a blacklist is obtained from the monitoring results and shared with other internet platforms, thereby improving supervision efficiency and saving computing resources. In data processing, a perturbation algorithm is used: noise is added to related attributes and non-related attributes are perturbed, which widens the scope of privacy protection and reduces the deviation in data mining.

Drawings

Fig. 1 is a block diagram of a system architecture of the present application.

Fig. 2 is a flowchart of an abnormal language identification module according to the present application.

Fig. 3 is a flowchart of an influence evaluation method according to the present application.

Fig. 4 is a flow chart of a method of the system of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The terms "module," "system," and the like as used herein are intended to include a computer-related entity, such as, but not limited to, hardware, firmware, a combination of hardware and software, or software in execution. For example, a module may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a module. One or more modules may be located in one process and/or thread of execution, and one module may be located on one computer and/or distributed between two or more computers.

The interconnected network platforms in the application realize mutual compatibility by providing interfaces for the other party, and users can enter one platform from the other platform through the interfaces.

Example 1

This embodiment provides an internet platform user behavior analysis system based on big data, as shown in Fig. 1, comprising a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module.

As shown in Fig. 2, the abnormal language identification module builds an abnormal language identification model based on a deep learning algorithm and comprises a text training unit, a feature identification unit and a result classification unit; it performs the following steps:

step S02, feature extraction: bidirectional sequence extraction is performed on the character vector $s_i$ obtained in step S01 to obtain $X^1$, which contains the time-sequence information, semantic-sequence information and other sequence information of $s_i$; $X^1$ is input to the pooling layer of the convolutional neural network, and a convolution operation is performed to obtain the feature vector of the comment text; the feature vector is transmitted to the fully connected layer of the convolutional neural network, wherein W represents the weight of a feature character and b represents the bias parameter;

Further, the activation function F(x) satisfies the formula [formula missing in source], where x represents the input of the pooling layer of the convolutional neural network, a represents the output of the pooling layer, and b represents the bias parameter.

Further, the calculation model of the weight W is [formula missing in source], wherein W represents the weight of the character, $F_{in}$ represents the number of times the n-th character appears in the i-th sample, M represents the total number of samples, $M_n$ represents the number of times the n-th character occurs, and α represents a coefficient parameter.

Further, the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data perturbation unit for privacy protection and a data marking unit, wherein the data perturbation unit protects user privacy through a perturbation algorithm comprising the following steps: assume the data set has I attributes in total, including its confidential attributes; compute the variance of each of the I attributes and let j' be the attribute with the largest variance; compute the mean of the data in attribute j' and divide the data into 2 subsets according to that mean; repeat the variance and mean steps on each of the two child nodes; when a child node contains fewer records than a pre-specified number, replace the private data in that node with the obtained mean, thereby perturbing the data.

Further, the influence evaluation module includes a data crawling unit, a classification scoring unit, an evaluation model construction unit and a result judgment unit; the module scores the user's number of fans, number of praises, liveness and social network value, calculates the user's comprehensive influence score through a formula, and monitors the user in real time when the influence exceeds a preset value; as shown in Fig. 3, the module performs the following steps:

Further, the liveness H includes a posting output liveness h1 and a comment (leave-evaluation) liveness h2, and Hi represents the posting liveness of the i-th internet platform; the social network value P includes the richness p1 of the social network and an affinity social relationship value score p2, and Pi represents the posting liveness of the i-th internet platform; the calculation formula of the liveness is [formula missing in source], wherein c1 represents a constant and w1 represents the weight of the posting output liveness h1; the calculation formula of the social network value P is [formula missing in source], wherein c2 represents a constant and w2 represents the weight of p1.

Further, the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit, wherein the original data storage unit is used for storing the data obtained by the preprocessing module; the analysis result storage unit is used for storing data generated by the abnormal language identification module, the user influence evaluation module and the abnormal monitoring module; and the temporary cache unit is used for storing frequently accessed analysis data and data collected in real time, so as to speed up the response of the system.

To achieve the above purpose, the present application further provides the following technical solution: a method for the big-data-based internet platform user behavior analysis system; as shown in Fig. 4, the method comprises the following steps:

Further, the user retention rate is the frequency with which the user appears within a time d; the user retention rate over a period of time is plotted as a line graph with the longest time span set to 60 days, and the next-day, 2-day, 3-day, 4-day, 5-day, 6-day, 7-day, 15-day, 30-day and 60-day retention within those 60 days are represented by polylines in 9 different colors.

To sum up: the abnormal language identification model identifies abnormal language in big data to obtain a list of abnormal users; the influence evaluation unit evaluates the influence of each abnormal user; a list of users requiring real-time monitoring is screened out of the abnormal user list according to the comprehensive influence score; and a blacklist is obtained from the monitoring results and shared with other internet platforms. Because only the screened users are monitored in real time, real-time monitoring of a large number of users is avoided, supervision efficiency is improved and computing resources are saved.

The present embodiment provides only one implementation and does not specifically limit the protection scope of the present application.

Finally: the foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.

Claims

1. An internet platform user behavior analysis system based on big data, comprising a data acquisition module, a data preprocessing module, a data storage module, an abnormal language identification module, a user influence evaluation module and an abnormal monitoring module, characterized in that: the data acquisition module is used for acquiring user data from the internet platform and transmitting the acquired data to the data preprocessing module; the data preprocessing module is used for cleaning, perturbing and marking the data and transmitting the preprocessed data to the data storage module; the data storage module is used for storing the preprocessed data to await calls from the abnormal language identification module; the abnormal language identification module calls the data in the data storage module, identifies abnormal language of users in the data and transmits the result to the user influence evaluation module; the user influence evaluation module is used for evaluating the influence of abnormal users and transmitting the list of abnormal users whose influence exceeds a preset value to the abnormal monitoring module; the abnormal monitoring module is used for monitoring the behavior of those abnormal users, obtaining real-time data, strengthening detection of those users, marking users with multiple abnormal behaviors as blacklisted, and transmitting the blacklisted users to the data sharing unit in the data storage module, and the data sharing unit transmits the blacklist to each internet platform;

step S03, classifying results: an activation function F(x) is applied in the fully connected layer to add a nonlinear factor to the model, and the output is then passed to a classifier that outputs the classification result, thereby obtaining a list of abnormal-language users and the corresponding abnormal language;

the influence evaluation module comprises a data crawling unit, a classification scoring unit, an evaluation model construction unit and a result judgment unit; the module scores the user's number of fans, number of praises, liveness and social network value, calculates the user's comprehensive influence score through a formula, and monitors the abnormal user in real time when the influence exceeds a preset value; the influence evaluation module performs the following steps:

step S13, constructing an evaluation model: the four sets of scores are substituted into the model [formula missing in source] to calculate the comprehensive influence score of the user, wherein A represents the user's fan-number score, graded 1-5 according to the number of fans, and Ai represents the user's fan-number score on each internet platform; B represents the user's praise-number score, graded 1-5 according to the number of praises, and Bi represents the user's praise-number score on each internet platform; the liveness H comprises a posting output liveness h1 and a comment (leave-evaluation) liveness h2, and Hi represents the posting liveness of the i-th internet platform; the social network value P comprises the richness p1 of the social network and an affinity social relationship value score p2, and Pi represents the posting liveness of the i-th internet platform; the calculation formula of the liveness is [formula missing in source], wherein c1 represents a constant and w1 represents the weight of the posting output liveness h1; the calculation formula of the social network value P is [formula missing in source], wherein c2 represents a constant and w2 represents the weight of p1.

2. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the activation function F(x) satisfies the formula [formula missing in source], where x represents the input of the pooling layer of the convolutional neural network, a represents the output of the pooling layer, and b represents the bias parameter.

3. The internet platform user behavior analysis system based on big data according to claim 1, characterized in that: the calculation model of the weight W is [formula missing in source], wherein W represents the weight of the character, $F_{in}$ represents the number of times the n-th character appears in the i-th sample, M represents the total number of samples, $M_n$ represents the number of times the n-th character occurs, and α represents a coefficient parameter.

4. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the data preprocessing module comprises a data cleaning unit for removing irrelevant words, a data perturbation unit for privacy protection and a data marking unit, wherein the data perturbation unit protects user privacy through a perturbation algorithm comprising the following steps: assume the data set has I attributes in total, including its confidential attributes; compute the variance of each of the I attributes and let j' be the attribute with the largest variance; compute the mean of the data in attribute j' and divide the data into 2 subsets according to that mean; repeat the variance and mean steps on each of the two child nodes; when a child node contains fewer records than a pre-specified number, replace the private data in that node with the obtained mean, thereby perturbing the data.

5. The internet platform user behavior analysis system based on big data according to claim 1, wherein: the data storage module comprises an original data storage unit, an analysis result storage unit and a temporary cache unit, wherein the original data storage unit is used for storing the data obtained by the preprocessing module; the analysis result storage unit is used for storing data generated by the abnormal language identification module, the user influence evaluation module and the abnormal monitoring module; and the temporary cache unit is used for storing frequently accessed analysis data and data collected in real time, so as to speed up the response of the system.

6. An internet platform user behavior analysis method based on big data, which is used for implementing the internet platform user behavior analysis system based on big data according to any one of claims 1-5, and is characterized in that: the method comprises the following steps:

7. The internet platform user behavior analysis method based on big data according to claim 6, wherein: the user retention rate is the frequency with which the user appears within a time d; the user retention rate over a period of time is plotted as a line graph with the longest time span set to 60 days, and the next-day, 2-day, 3-day, 4-day, 5-day, 6-day, 7-day, 15-day, 30-day and 60-day retention within those 60 days are represented by polylines in 9 different colors.