CN116662769B - User behavior analysis system and method based on deep learning model - Google Patents
- Publication number: CN116662769B (Application CN202310961231.8A)
- Authority: CN (China)
- Prior art keywords: malicious, reporting, language, user, formula
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/20 — Pattern recognition; Analysing
- G06F16/355 — Information retrieval of unstructured textual data; Clustering; Classification; Class or cluster creation or modification
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F40/205 — Handling natural language data; Natural language analysis; Parsing
- G06N3/08 — Neural networks; Learning methods
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a user behavior analysis system and method based on a deep learning model, relating in particular to the field of deep learning. The system comprises a task acquisition module, a priority calculation module and a user speech analysis module. The task acquisition module acquires reporting tasks from a network platform. The priority calculation module obtains a priority index for each reporting task and sorts the reporting tasks accordingly: it analyzes each reporting task to obtain the reporter's information and the importance of the task, and adjusts the processing order of the reporting tasks according to that importance. The user speech analysis module identifies whether the reported content is malicious speech and evaluates its degree of maliciousness. The system solves the problems that existing user behavior analysis systems process reporting tasks slowly and cannot handle reported content in time.
Description
Technical Field
The application relates to the technical field of deep learning, and in particular to a user behavior analysis system and method based on a deep learning model.
Background
With the development of the internet, more and more internet platforms are in use, and people publish speech expressing their views on them. Because of the virtual and anonymous nature of network platforms, what people say online can differ greatly from what they would say in person, and users may well publish malicious speech. Owing to the speed of network transmission, malicious speech spreads rapidly, making it harder to build a good network environment; internet platforms therefore need to monitor user speech.
At present, supervision of an internet platform relies mainly on user reports: a platform administrator analyzes the user speech targeted by a report and judges whether it is malicious. Relying on manual processing makes handling reporting tasks slow, and existing user behavior analysis systems are likewise slow to process reporting tasks and cannot accurately identify improper user behavior. Moreover, because internet platforms have vast numbers of users, the volume of reports is massive, so platforms struggle to process reported content in time, which is not conducive to maintaining a good network environment.
Disclosure of Invention
To overcome the above defects in the prior art, the application provides a user behavior analysis system and method based on a deep learning model, which obtains the importance of each reporting task by analyzing it, adjusts the processing order of reporting tasks according to that importance, and establishes a malicious speech model through deep learning, so as to solve the problems described in the background art.
To achieve the above purpose, the application provides the following technical solution: a user behavior analysis system based on a deep learning model, comprising a task acquisition module, a priority calculation module, a user speech analysis module, a reporting behavior analysis module and a user management module, wherein:
the task acquisition module is used to acquire reporting tasks from the network platform: a user sends a reporting task request to the management end; the information of a reporting task comprises the reporter's information, the reported person's information and the text of the reporting target; the reporting task is transmitted to the priority calculation module;
the priority computing module is used for acquiring the priority index of the reporting task, sequencing the reporting task according to the priority index of the reporting task, and comprises an activity parameter computing unit, a quality score computing unit, a concern degree parameter computing unit and a task priority index computing unit aiming at a reporter;
the user speech analysis module is used to identify whether the reported content is malicious speech and to evaluate its degree of maliciousness; it comprises a speech-text preprocessing unit, a malicious speech recognition model and a speech malicious-degree evaluation unit, and transmits the user speech analysis result to the reporting behavior analysis module;
the reporting behavior analysis module judges, based on the result of the user speech analysis module, whether the reporter's reporting behavior is malicious, evaluates the degree of harm of malicious reporting, calculates a malicious-reporting score loss value, and calculates a successful-reporting score reward value;
the priority index is obtained as follows: suppose m users report user speech; the activity parameter HY_i, quality score SZ_i and attention-degree parameter SG_i of each reporter are obtained, normalized, and input into the priority-index calculation formula, in which λ1 and λ2 are preset constants with values in the range (0.1, 1.0], t_i denotes the time at which the priority is calculated, and t_0 denotes the time of the earliest reporting-task request; the priority index of each reporting task is thus calculated, and reporting tasks with a high priority index are processed first.
Preferably, the normalization process is one of linear normalization, nonlinear normalization, or zero-mean normalization.
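The exact priority formula is rendered only as an image in the original; as a minimal sketch of just the normalization step it names, the two simplest variants (linear min-max and zero-mean) might look like this:

```python
import numpy as np

def zero_mean_normalize(x):
    """Zero-mean (z-score) normalization: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def linear_normalize(x):
    """Linear (min-max) normalization into [0, 1]."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
```

Either function would put HY_i, SZ_i and SG_i on a common scale before they enter the priority-index calculation.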
Preferably, the user activity parameter HY_i is calculated from ta, the reporter's online time in the current month, tb, the reporter's account usage time, and sa, the number of comments the user posted in the current month.
Preferably, the user quality score SZ_i is calculated from SZ_0, the user's initial quality score, and YE_i, the malicious-speech score loss value, where YE_i depends on ey, the number of malicious utterances by the user in the current month, sa, the number of comments the user posted in the current month, and σ1, the degree of maliciousness of the malicious speech; XE_i denotes the malicious-reporting score loss value and CE_i the successful-reporting score reward value, and the initial values of YE_i, XE_i and CE_i are all 0.
Preferably, the user attention-degree parameter SG_i is calculated from SFen, the user's number of followers, and SDia, the user's cumulative number of likes, where μ1 and μ2 are preset coefficients with 0 ≤ μ1 ≤ 1, 0 ≤ μ2 ≤ 1 and μ1² + μ2² = 1.
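The exact SG_i formula is likewise an image in the original; the sketch below shows one *hypothetical* weighted combination of SFen and SDia that respects the stated constraint μ1² + μ2² = 1 (the log-compression of the raw counts is an assumption, not part of the patent):

```python
import math

def attention_parameter(sfen, sdia, mu1=0.6, mu2=0.8):
    """Hypothetical attention-degree parameter SG_i: a weighted mix of the
    reporter's follower count (SFen) and cumulative likes (SDia).
    mu1, mu2 are preset coefficients constrained by mu1**2 + mu2**2 == 1."""
    assert abs(mu1**2 + mu2**2 - 1.0) < 1e-9
    # log-compress the raw counts so a single viral post does not dominate
    return mu1 * math.log1p(sfen) + mu2 * math.log1p(sdia)
```

Under this choice, SG_i grows monotonically with both followers and likes, which matches the intuitive role the parameter plays in the priority index.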
Preferably, the speech-text preprocessing unit is used to obtain the keywords of the speech text: the sentences of the speech text are split into vocabulary items by regular expressions, stop words are removed, and meaningless words are filtered out; the operations include, but are not limited to, converting letters to lower case and converting emoticons, and the application is not specifically limited in this respect.
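The preprocessing steps above (regex tokenization, lowercasing, stop-word removal) can be sketched in a few lines; the stop-word set here is an illustrative assumption:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is"}  # assumed stop-word list

def extract_keywords(text):
    """Split a comment into lowercase word tokens with a regular expression,
    then drop stop words, mirroring the preprocessing unit's steps."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `extract_keywords("The quick FOX!")` yields `["quick", "fox"]`.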
Preferably, the user speech analysis module judges whether speech is malicious through the malicious speech recognition model. The speech text processed by the preprocessing unit is input into the model, which comprises a first channel and a second channel: the first channel acquires the spatial features of the speech, and the second channel acquires the vector feature space of the speech text through a first-order Markov chain algorithm. The features extracted by the two channels are fused and spliced by a cross-attention mechanism, passed through a fully connected layer, and binary-classified by a softmax classifier: an output value close to 1 indicates malicious speech, and a value close to 0 indicates non-malicious speech. The malicious speech recognition model is built on a deep learning framework: the neural-network weight parameters, bias parameters and activation functions of the first and second channels are initialized separately; a loss function is obtained by forward propagation; the weight and bias parameters are updated from the loss; and back-propagation proceeds with the updated parameters until the model meets the required accuracy and precision.
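The patent describes the two-channel architecture only at a high level; the NumPy sketch below illustrates the fusion-and-classify path it names (cross-attention over the two channels' features, a full-connection layer, then softmax binary classification). All dimensions and the random "features" are hypothetical stand-ins for the two channels' outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention_fuse(feat_a, feat_b):
    """Cross-attention in miniature: channel-A features attend over
    channel-B features; the attended result is spliced onto A."""
    scores = feat_a @ feat_b.T / np.sqrt(feat_b.shape[1])  # (na, nb)
    attended = softmax(scores) @ feat_b                    # (na, d)
    return np.concatenate([feat_a, attended], axis=1)      # (na, 2d)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))   # stand-in for channel-1 (spatial) features
b = rng.normal(size=(6, 8))   # stand-in for channel-2 (Markov-chain) features
fused = cross_attention_fuse(a, b)
W = rng.normal(size=(16, 2))  # full-connection layer into 2 classes
probs = softmax(fused @ W)    # per row: [P(not malicious), P(malicious)]
```

Each row of `probs` sums to 1, and the second column plays the role of the output value that the patent compares against 1 (malicious) and 0 (non-malicious).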
Preferably, the speech malicious-degree evaluation unit evaluates the degree of maliciousness σ1 of malicious speech by standardizing the original malicious degree Ye against μ, the mean of the original malicious degrees, and δ, their standard deviation, so that σ1 takes a value in (0, 1). The original malicious degree depends on wai(Aj), the weight corresponding to the topic A_j to which the malicious speech belongs, L, the cumulative keyword length of the malicious speech, L_0, the unit keyword length, and M, the number of reported persons. The topic category of the malicious speech is obtained by setting the malicious speech to belong to one of m topics, denoted A = {A1, A2, …, Aj, …, Am}, with the weight coefficient of each topic denoted Wai; the malicious-speech topic classification model is built on a deep learning framework comprising an input layer, a hidden layer and an output layer, with m neurons in the hidden layer.
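The patent states only that σ1 is a standardization of Ye (against mean μ and standard deviation δ) that lands in (0, 1); one way to realize that, sketched below under the assumption of a sigmoid squash, is to z-score Ye and pass it through a logistic function:

```python
import math

def malicious_degree(ye, mu, delta):
    """Map the raw malicious degree Ye into (0, 1): standardize against the
    population mean mu and standard deviation delta, then squash with a
    sigmoid. The sigmoid is an assumption; the patent only requires that
    sigma_1 be a standardization of Ye lying in (0, 1)."""
    z = (ye - mu) / delta
    return 1.0 / (1.0 + math.exp(-z))
```

A speech with average raw maliciousness maps to 0.5, and more malicious speech maps monotonically toward 1.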
Preferably, the malicious-speech topic classification model is obtained through the following steps:
step S01, model initialization: define the initial parameters for deep learning, including the inter-layer weight parameters W_ij, the bias parameters b_i and the activation function f(·); the output satisfies P_ij = f(Σ w_ij·E_i + b_i), where E_i denotes the input word vector, w_ij the connection weight between the i-th and j-th neurons, and P_ij the probability that the malicious speech E_i belongs to topic A_j;
step S02, forward propagation: input the malicious speech E_i, comprising n keywords and denoted E_i = {S_m1, S_m2, …, S_mn}; output the probability P_ij that the malicious speech belongs to topic A_j; obtain the probability of the malicious speech belonging to each topic, recorded as the set P = {P_i1, P_i2, …, P_ij, …, P_im}; take the topic corresponding to max(P) as the model's inferred topic A_i, with probability P_max;
step S03, loss calculation: let the actual topic of malicious speech E_i be A_j, and let the model's inferred probability that E_i belongs to topic A_j be P_ij; calculate the loss function from these probabilities;
step S04, back-propagation: update the weight and bias parameters according to the loss value obtained from the loss function, propagate the input information backwards, and repeat until the loss function meets the threshold requirement, completing the training of the model and yielding the malicious-speech topic classification model.
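Steps S01–S04 can be sketched end to end for a single-layer softmax topic classifier; the toy dimensions, random word vector, and cross-entropy loss are illustrative assumptions standing in for the patent's unspecified network:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 4                             # word-vector size, number of topics
W = rng.normal(scale=0.1, size=(d, m))   # S01: weight parameters W_ij
b = np.zeros(m)                          # S01: bias parameters b_i
lr = 0.05                                # learning rate alpha

def forward(E):
    """S02: forward propagation — P = softmax(E @ W + b)."""
    z = E @ W + b
    e = np.exp(z - z.max())
    return e / e.sum()

E, true_j = rng.normal(size=d), 2        # toy sample: word vector, true topic
for _ in range(200):
    P = forward(E)
    loss = -np.log(P[true_j])            # S03: cross-entropy loss
    grad = P.copy(); grad[true_j] -= 1.0 # S04: gradient of loss w.r.t. logits
    W -= lr * np.outer(E, grad)          # S04: update weights ...
    b -= lr * grad                       # ... and biases, then repeat
```

After the loop, `forward(E).argmax()` recovers the true topic index, i.e. the inferred topic A_i matches the labeled topic A_j on the training sample.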
Preferably, the updated weight parameter W'_ij and the updated bias parameter are obtained from the gradient of the loss with learning rate α, where α decays as epoch_num, the number of completed forward- and backward-propagation passes, grows, starting from an initially set learning-rate constant α_0 with value in the range (0.01, 0.05].
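The patent's exact decay schedule for α is an image and is not reproduced here; a common inverse-time decay with the stated α_0 range, shown purely as an assumption, would be:

```python
def decayed_learning_rate(alpha0, epoch_num):
    """Assumed inverse-time decay: alpha shrinks as more forward/backward
    passes (epoch_num) complete. alpha0 is the initial learning-rate
    constant, constrained to (0.01, 0.05] as the patent states."""
    assert 0.01 < alpha0 <= 0.05
    return alpha0 / (1.0 + epoch_num)
```

Any monotonically decreasing schedule in epoch_num would serve the same role in the update rule.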
Preferably, the reporting behavior analysis module comprises a malicious-reporting-behavior analysis unit and a successful-reporting-behavior analysis unit. The malicious-reporting-behavior analysis unit calculates the malicious-reporting score loss value from ZJ, the reporter's number of reports, CJ, the reporter's number of successful reports, and HY_i, the reporter's activity; the successful-reporting-behavior analysis unit calculates the successful-reporting score reward value.
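The score-loss and reward formulas are images in the original; the sketch below is a *hypothetical* realization using only the named inputs (ZJ, CJ, HY_i), chosen so that failed reports are penalized and successful ones rewarded:

```python
def malicious_report_loss(zj, cj, hy):
    """Hypothetical malicious-reporting score loss XE_i: grows with the
    number of failed reports (ZJ - CJ), tempered by activity HY_i.
    The patent's exact formula is not reproduced here."""
    failed = zj - cj
    return failed / (1.0 + hy) if failed > 0 else 0.0

def success_report_reward(cj, hy):
    """Hypothetical successful-reporting score reward CE_i: grows with
    successful reports CJ, boosted by activity HY_i."""
    return cj * (1.0 + hy)
```

A reporter whose reports all succeed incurs no loss under this sketch, matching the intent that only malicious or failed reporting is penalized.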
Preferably, the user management module processes the reporting task based on the analysis results of the user speech analysis module and the reporting behavior analysis module, and updates the user's quality score: the obtained malicious-reporting score loss value XE_i, successful-reporting score reward value CE_i and malicious-speech score loss value YE_i are substituted into the calculation formula of the user quality score SZ_i, completing the update of the user quality score.
To achieve the above purpose, the application further provides the following technical solution: a user behavior analysis method based on a deep learning model, comprising the following steps:
step S001, acquire the reporting task: on the internet platform, a user set A reports to the management platform that the speech of a user B is malicious; the reporting behavior generates a reporter A, a reported person B and reporting target content C, where the number of users in set A is greater than or equal to 1;
step S002, calculate task priority: based on the acquired activity parameter HY_i, quality score SZ_i and attention-degree parameter SG_i of each reporter, normalize the parameters and input them into the priority-index calculation formula to obtain the priority index of each reporting task, then sort the reporting tasks by priority index;
step S003, user speech analysis: based on a deep learning framework, initialize the neural-network weight parameters, bias parameters and activation functions of the first and second channels separately; obtain a loss function by forward propagation; update the weight and bias parameters once based on the loss; back-propagate with the updated parameters to obtain the malicious speech recognition model; analyze through this model whether the user speech is malicious; obtain the topic of the malicious speech from the malicious-speech topic classification model; obtain the weight coefficient corresponding to the topic A_j to which the malicious speech belongs; and calculate the degree of maliciousness of the malicious speech;
step S004, reporting behavior analysis: the reporting behavior analysis module judges, based on the malicious-speech analysis result, whether the reporter's reporting behavior is malicious, evaluates the degree of harm of malicious reporting, calculates the malicious-reporting score loss value, and calculates the successful-reporting score reward value;
step S005, user management: based on the analysis results of the user speech analysis module and the reporting behavior analysis module, process the reporting task and update the users' quality scores.
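Steps S001–S005 can be tied together in a short orchestration sketch; the data class, the priority and maliciousness callbacks, and their signatures are all hypothetical illustrations of the flow, not the patent's concrete interfaces:

```python
from dataclasses import dataclass

@dataclass
class ReportTask:
    reporter: str          # S001: reporter A
    reported: str          # S001: reported person B
    content: str           # S001: reporting target content C
    priority: float = 0.0  # S002: filled in by the priority module

def process_reports(tasks, priority_fn, is_malicious_fn):
    """Hypothetical orchestration of the method: score each task's priority
    (S002), sort descending so high-priority tasks go first, then analyze
    the reported speech in order (S003-S004)."""
    for t in tasks:
        t.priority = priority_fn(t)
    tasks.sort(key=lambda t: t.priority, reverse=True)
    return [(t.reporter, is_malicious_fn(t.content)) for t in tasks]
```

For example, with a priority function based on content length and a trivial maliciousness check, the longer report is processed first.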
The application has the following technical effects and advantages:
By analyzing each reporting task, the application obtains its importance and adjusts the processing order accordingly. The task acquisition module and the priority calculation module are measures by which the platform management end improves the efficiency of processing reporting tasks: by calculating task priorities, the more important reporting tasks are identified, and limited computing resources are used to process high-priority tasks first, which helps maintain a good network environment. A user speech model is established through deep learning to judge whether a reporting task involves malicious speech; the reporting behavior analysis module evaluates the harm and effect of users' reporting behavior; malicious speech and malicious reporting behavior are identified from the analysis of reporting tasks; and the users' quality scores are updated with the analysis results. This solves the problems noted in the background art: existing user behavior analysis systems process reporting tasks slowly and cannot accurately identify improper user behavior, and existing platforms find it difficult to process reporting tasks in time, which is not conducive to maintaining a good network environment.
Drawings
Fig. 1 is a block diagram of the overall structure of the system of the present application.
Fig. 2 is a flowchart for constructing a malicious language topic classification model according to the present application.
Fig. 3 is a flow chart of the method of the present application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings; evidently, the embodiments described are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
The terms "module," "system," and the like as used herein are intended to include a computer-related entity, such as, but not limited to, hardware, firmware, a combination of hardware and software, or software in execution. For example, a module may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a module. One or more modules may be located in one process and/or thread of execution, and one module may be located on one computer and/or distributed between two or more computers.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
Example 1
As shown in fig. 1, the application provides a user behavior analysis system based on a deep learning model, comprising a task acquisition module, a priority calculation module, a user speech analysis module, a reporting behavior analysis module and a user management module, wherein:
the task acquisition module is used to acquire reporting tasks from the network platform: a user sends a reporting task request to the management end; the information of a reporting task comprises the reporter's information, the reported person's information and the text of the reporting target; the reporting task is transmitted to the priority calculation module;
the priority calculation module is used to obtain the priority index of each reporting task and sort the reporting tasks by priority index; it comprises an activity-parameter calculation unit, a quality-score calculation unit, an attention-degree-parameter calculation unit and a task-priority-index calculation unit;
the user speech analysis module is used to identify whether the reported content is malicious speech and to evaluate its degree of maliciousness; it comprises a speech-text preprocessing unit, a malicious speech recognition model and a speech malicious-degree evaluation unit, and transmits the user speech analysis result to the reporting behavior analysis module and the user management module;
the reporting behavior analysis module judges, based on the result of the user speech analysis module, whether the reporter's reporting behavior is malicious, evaluates the degree of harm of malicious reporting, calculates a malicious-reporting score loss value, calculates a successful-reporting score reward value, and transmits the reporting behavior analysis result to the user management module;
the user management module processes the reporting task based on the analysis results of the user speech analysis module and the reporting behavior analysis module, and updates the users' quality scores.
Further, the priority index is obtained as follows: suppose m users report user speech; obtain the activity parameter HY_i, quality score SZ_i and attention-degree parameter SG_i of each reporter; normalize the parameters and input them into the priority-index calculation formula, in which λ1 and λ2 are preset constants with values in the range (0.1, 1.0], t_i denotes the time at which the priority is calculated, and t_0 denotes the time of the earliest reporting-task request; the priority index of each reporting task is thus calculated, and reporting tasks with a high priority index are processed first.
Further, the normalization process is one of linear normalization, nonlinear normalization or zero-mean normalization; in this embodiment of the application, zero-mean normalization is adopted.
Further, the user activity parameter HY_i is calculated from ta, the reporter's online time in the current month, tb, the reporter's account usage time, and sa, the number of comments the user posted in the current month.
Further, the user quality score SZ_i is calculated from SZ_0, the user's initial quality score, and YE_i, the malicious-speech score loss value, where YE_i depends on ey, the number of malicious utterances by the user in the current month, sa, the number of comments the user posted in the current month, and σ1, the degree of maliciousness of the malicious speech; XE_i denotes the malicious-reporting score loss value and CE_i the successful-reporting score reward value, and the initial values of YE_i, XE_i and CE_i are all 0.
Further, the user attention-degree parameter SG_i is calculated from SFen, the user's number of followers, and SDia, the user's cumulative number of likes, where μ1 and μ2 are preset coefficients with 0 ≤ μ1 ≤ 1, 0 ≤ μ2 ≤ 1 and μ1² + μ2² = 1.
Further, the speech-text preprocessing unit is used to obtain the keywords of the speech text: the sentences of the speech text are split into vocabulary items by regular expressions, stop words are removed, and meaningless words are filtered out; the operations include converting letters to lower case and converting emoticons to characters.
Further, the user speech analysis module judges whether speech is malicious through the malicious speech recognition model. The speech text processed by the preprocessing unit is input into the model, which comprises a first channel and a second channel: the first channel acquires the spatial features of the speech, and the second channel acquires the vector feature space of the speech text through a first-order Markov chain algorithm. The features extracted by the two channels are fused and spliced by a cross-attention mechanism, passed through a fully connected layer, and binary-classified by a softmax classifier: an output value close to 1 indicates malicious speech, and a value close to 0 indicates non-malicious speech. The malicious speech recognition model is built on a deep learning framework: the neural-network weight parameters, bias parameters and activation functions of the first and second channels are initialized separately; a loss function is obtained by forward propagation; the weight and bias parameters are updated from the loss; and back-propagation proceeds with the updated parameters until the model meets the required accuracy and precision.
Further, the speech malicious-degree evaluation unit evaluates the degree of maliciousness σ1 of malicious speech by standardizing the original malicious degree Ye against μ, the mean of the original malicious degrees, and δ, their standard deviation, so that σ1 takes a value in (0, 1). The original malicious degree depends on wai(Aj), the weight corresponding to the topic A_j to which the malicious speech belongs, L, the cumulative keyword length of the malicious speech, L_0, the unit keyword length, and M, the number of reported persons. The topic category of the malicious speech is obtained by setting the malicious speech to belong to one of m topics, denoted A = {A1, A2, …, Aj, …, Am}, with the weight coefficient of each topic denoted Wai.
Further, the language malicious degree evaluation unit includes a malicious language topic classification model built on a deep learning framework comprising an input layer, a hidden layer, and an output layer, with m neurons in the hidden layer; as shown in fig. 2, the model is obtained by the following steps:
step S01, initializing the model: initial parameters of the deep learning are defined, with P_ij representing the probability that the malicious language E_i belongs to topic A_j;
step S02, forward propagation: the malicious language E_i is input, and the probability P_ij that it belongs to topic A_j is output; the probabilities of the malicious language belonging to each topic are collected into a set P, the topic corresponding to max(P) is taken as the model's inferred topic A_i, and the probability of belonging to topic A_i is P_max;
step S03, calculating the loss function: the actual topic of the malicious language E_i is set to A_j, i.e. the manually labelled topic of the malicious language, and the model's estimated probability that E_i belongs to topic A_j is set to P_ij; the probability P_max and the probability P_ij are input into the loss function;
step S04, back propagation: the weight parameters and bias parameters are updated according to the loss value obtained from the loss function, the input information is propagated backwards, and the parameters are updated repeatedly until the loss function meets the threshold requirement; training of the model is then complete, yielding the malicious language topic classification model.
Further, in step S01, the weight parameters W_ij between the neural networks, the bias parameters b_i, and the activation function f(·) are defined, and the output result satisfies P_ij = f(Σ_i w_ij·E_i + b_i), where E_i represents the input word vector, w_ij represents the connection weight between the i-th and j-th neurons, and P_ij represents the probability that the malicious language E_i belongs to topic A_j.
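The per-topic forward pass of step S01 can be sketched like this; the sigmoid stand-in for f(·) and the toy word vector, weights, and biases are assumptions for illustration only.

```python
import math

def forward(word_vec, weights, bias,
            f=lambda z: 1.0 / (1.0 + math.exp(-z))):
    # P_ij = f(sum_i w_ij * E_i + b_j) for each topic index j;
    # f is an assumed sigmoid activation.
    probs = []
    for j in range(len(bias)):
        z = sum(weights[i][j] * word_vec[i]
                for i in range(len(word_vec))) + bias[j]
        probs.append(f(z))
    return probs

E = [0.5, -1.0, 2.0]                         # toy input word vector E_i
W = [[0.2, -0.3], [0.1, 0.4], [-0.5, 0.6]]   # hypothetical w_ij
b = [0.0, 0.1]                               # hypothetical biases
P = forward(E, W, b)
topic = P.index(max(P))   # model's inferred topic index (max over P)
```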
Further, in step S02, the topic A_j refers to the topic to which the manually labelled malicious language belongs.
Further, in step S03, the loss function is calculated from the probability P_max and the probability P_ij obtained in step S02.
Further, the updated weight parameter W′_ij satisfies W′_ij = W_ij − α·∂Loss/∂W_ij, and the updated bias parameter satisfies b′_i = b_i − α·∂Loss/∂b_i, where α is the learning rate of the deep learning and satisfies α = α₀/(1 + epoch_num), where α₀ is the initially set learning-rate constant with a value in [0.01, 0.05], and epoch_num is the number of completed forward- and backward-propagation passes.
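A sketch of the parameter updates; the inverse-time decay α = α₀/(1 + epoch_num) is an assumption consistent with the stated constant α₀ and pass counter, not a formula reproduced from the patent text, and the gradients here are toy values.

```python
def learning_rate(alpha0, epoch_num):
    # Assumed decay schedule: alpha shrinks as more forward/backward
    # passes complete, starting from the constant alpha0 in [0.01, 0.05].
    return alpha0 / (1 + epoch_num)

def sgd_step(w, b, grad_w, grad_b, alpha):
    # W'_ij = W_ij - alpha * dLoss/dW_ij ; b'_i = b_i - alpha * dLoss/db_i
    w_new = w - alpha * grad_w
    b_new = b - alpha * grad_b
    return w_new, b_new

alpha = learning_rate(alpha0=0.05, epoch_num=4)   # 0.05 / 5 = 0.01
w1, b1 = sgd_step(w=0.3, b=0.1, grad_w=2.0, grad_b=-1.0, alpha=alpha)
```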
Further, the reporting behavior analysis module comprises a malicious reporting behavior analysis unit and a successful reporting behavior analysis unit. The malicious reporting behavior analysis unit calculates the malicious reporting behavior score loss value from the reporter's number of reports ZJ, number of successful reports CJ, and activity HY_i; the successful reporting behavior analysis unit calculates the successful reporting behavior score reward value.
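Since the formula images for the two scores are not reproduced in this text, the sketch below uses placeholder forms: a failed-report ratio scaled by activity for the loss XE_i, and a success ratio for the reward CE_i. Only the inputs ZJ, CJ, and HY_i come from the document; the combining forms are assumptions.

```python
def malicious_report_loss(zj, cj, hy):
    # Assumed form of XE: the larger the share of failed reports
    # (zj - cj) / zj, scaled by reporter activity hy, the larger the loss.
    if zj == 0:
        return 0.0
    return hy * (zj - cj) / zj

def successful_report_reward(zj, cj):
    # Assumed form of CE: share of successful reports.
    if zj == 0:
        return 0.0
    return cj / zj

xe = malicious_report_loss(zj=10, cj=2, hy=0.5)
ce = successful_report_reward(zj=10, cj=2)
```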
Further, the user management module obtains from the malicious language identification model whether the reported content is a malicious language. If so, it deletes and hides the malicious language, reduces the quality score of the reported person, and increases the quality score of the reporter; if not, it evaluates whether the reporter's reporting behavior is malicious and reduces the reporter's quality score according to the malicious reporting behavior. The obtained malicious reporting behavior score loss value XE_i, successful reporting behavior score reward value CE_i, and malicious language score loss value YE_i are substituted into the calculation formula of the user quality score SZ_i to complete the update of the user quality score.
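The quality-score update can be sketched with an assumed additive combination of the three terms around the initial score SZ0; the actual formula image is not reproduced in this text, so the combination below is a placeholder.

```python
def quality_score(sz0, ye, xe, ce):
    # Assumed combination: start from the initial score, subtract the two
    # loss terms (malicious language YE, malicious reporting XE), and add
    # the successful-reporting reward CE.
    return sz0 - ye - xe + ce

def update_user(scores):
    # scores: dict with sz0, ye, xe, ce for one user; deleting/hiding the
    # offending content is handled upstream, only the score moves here.
    return quality_score(scores["sz0"], scores["ye"],
                         scores["xe"], scores["ce"])

sz = update_user({"sz0": 100.0, "ye": 5.0, "xe": 0.0, "ce": 2.0})
```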
Example 2
As shown in fig. 3, the present application provides a user behavior analysis method based on a deep learning model, which includes the following steps:
step S001, acquiring a reporting task: on an internet platform, a user set A reports to the management platform that an utterance of user B is a malicious language; the reporting action produces a reporter A, a reported person B, and reported target content C, and the number of users in set A is greater than or equal to 1;
step S002, calculating task priority: the activity parameter HY_i, quality score SZ_i, and attention degree parameter SG_i of each reporter are acquired, normalized, and input into the priority index calculation formula to obtain the priority index of the reporting task; the reporting tasks are then sorted by priority index;
step S003, user speaking analysis: analyzing whether the user language is the malicious language or not through the malicious language identification model, acquiring the theme of the malicious language based on the malicious language theme classification model, acquiring the weight coefficient corresponding to the malicious language theme based on the theme Aj to which the malicious language belongs, and calculating the malicious degree of the malicious language;
step S004, reporting behavior analysis: the reporting behavior analysis module judges whether reporting behaviors of a reporter are malicious or not based on a malicious language analysis result, evaluates malicious reporting hazard degree, calculates malicious reporting behavior scoring loss values and calculates successful reporting behavior scoring rewarding values;
step S005, user management: based on the analysis results of the user language analysis module and the reporting behavior analysis module, processing reporting tasks and updating the quality scores of the users.
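Steps S001–S005 can be strung together in a minimal sketch. The weighted-sum priority form and λ values are assumptions (the patent's priority formula image is not reproduced in this text), and the three lambdas stand in for the trained recognition, topic, and degree models.

```python
def priority_index(hy, sz, sg, wait, lam1=0.5, lam2=0.5):
    # Assumed form for step S002: lam1 weights the averaged (already
    # normalized) reporter parameters, lam2 weights the normalized
    # waiting time t_i - t_0 of the reporting task.
    return lam1 * (hy + sz + sg) / 3.0 + lam2 * wait

def analyze_report(report, identify, topic_of, degree_of):
    # Steps S003-S005 in miniature; identify/topic_of/degree_of stand in
    # for the trained recognition, topic, and degree models.
    if not identify(report["text"]):
        return {"action": "keep", "malicious": False}
    topic = topic_of(report["text"])
    return {"action": "delete_and_hide", "malicious": True,
            "topic": topic, "degree": degree_of(report["text"], topic)}

tasks = [
    {"text": "some toxic text", "p": priority_index(0.9, 0.8, 0.7, wait=0.6)},
    {"text": "harmless remark", "p": priority_index(0.2, 0.3, 0.1, wait=0.9)},
]
tasks.sort(key=lambda t: t["p"], reverse=True)   # high-priority tasks first
results = [analyze_report(t,
                          identify=lambda s: "toxic" in s,
                          topic_of=lambda s: "abuse",
                          degree_of=lambda s, topic: 0.8)
           for t in tasks]
```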
Further, in step S003, the neural network weight parameters, bias parameters and activation functions of the first channel and the second channel are initialized based on the deep learning framework, the loss function is obtained by forward propagation, the weight parameters and bias parameters are updated once based on the loss function, and backward propagation is performed based on the updated weight parameters and bias parameters, so as to obtain the malicious language identification model.
Finally, it should be noted that the foregoing describes only the preferred embodiments of the application and is not intended to limit the application to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be included within its scope.
Claims (8)
1. A user behavior analysis system based on a deep learning model is characterized in that: comprises a task acquisition module, a priority calculation module, a user language analysis module and a reporting behavior analysis module,
the task acquisition module is used for acquiring a reporting task in the network platform, wherein the information of the reporting task comprises the reporter's information, the reported person's information, and the text of the reported utterance;
the priority calculating module is used for acquiring the priority indexes of the reporting tasks and sequencing the reporting tasks according to the priority indexes of the reporting tasks;
the user language analysis module is used for identifying whether the reported content is a malicious language and evaluating its malicious degree, and comprises a language malicious degree evaluation unit; the language malicious degree evaluation unit evaluates the malicious degree σ1 of the malicious language, which satisfies a normalization formula in which Ye is the original malicious degree, μ is the mean of the original malicious degrees, and δ is the standard deviation of the original malicious degrees; σ1 takes a value in (0, 1); the original malicious degree satisfies a formula in which W_Aj represents the weight corresponding to the topic A_j to which the malicious language belongs, L represents the cumulative keyword length of the malicious language, L0 represents the unit keyword length, and m represents the number of reporters; the topic category of the malicious language is obtained;
the reporting behavior analysis module judges whether reporting behaviors of the reporting person are malicious or not based on the result of the malicious language analysis module, evaluates malicious reporting hazard degree, calculates a malicious reporting behavior scoring loss value, and calculates a successful reporting behavior scoring rewarding value;
the priority index is obtained when m users report a user utterance: the activity parameter HY_i, quality score SZ_i, and attention degree parameter SG_i of each reporter are acquired, normalized, and input into the priority index calculation formula, where λ1 and λ2 are preset constants with values in [0.1, 1.0], t_i represents the real time at which the priority is calculated, and t0 is set as the earliest time of the reporting task request; the priority index of each reporting task is calculated, and reporting tasks with a high priority index are processed first;
the calculation of the reporter's activity parameter HY_i satisfies a formula in which ta represents the reporter's online time in the current month, tb represents the usage time of the reporter's account, and Sa represents the number of comments posted by the user in the current month;
the calculation of the user quality score SZ_i satisfies a formula in which SZ0 represents the user's initial quality score and YE_i represents the malicious language score loss value, which satisfies a formula in which ey represents the number of malicious utterances posted by the user in the current month, Sa represents the number of comments posted by the user in the current month, and σ1 represents the malicious degree of the malicious language; XE_i represents the malicious reporting behavior score loss value, CE_i represents the successful reporting behavior score reward value, and the initial values of YE_i, XE_i, and CE_i are 0;
the reporter's attention degree parameter SG_i satisfies a formula in which SFen represents the user's number of followers and SDia represents the user's cumulative number of likes, where μ1 and μ2 are preset coefficients with 0 ≤ μ1 ≤ 1, 0 ≤ μ2 ≤ 1, and μ1² + μ2² = 1.
2. A deep learning model based user behavior analysis system as claimed in claim 1, wherein: the user language analysis module judges whether an utterance is a malicious language through a malicious language recognition model and comprises a language text preprocessing unit, the malicious language recognition model, and a language malicious degree evaluation unit; the language text processed by the language text preprocessing unit is input into the malicious language recognition model, which comprises a first channel and a second channel; the first channel acquires the spatial features of the language, and the second channel acquires the vector feature space of the language text through a first-order Markov chain algorithm; the features extracted by the two channels are fused and spliced based on a cross-attention mechanism and passed through a fully connected layer to a softmax classifier for binary classification; when the value of the output layer is close to 1 the language is malicious, and when it is close to 0 the language is not malicious, thereby distinguishing malicious from non-malicious language; the malicious language recognition model is based on a deep learning framework: the neural network weight parameters, bias parameters, and activation functions of the first and second channels are initialized, the loss function is obtained through forward propagation, the weight and bias parameters are updated based on the loss function, and backward propagation is performed with the updated parameters until the model's accuracy and precision requirements are met.
3. A deep learning model based user behavior analysis system according to claim 1, wherein: the malicious language is set to belong to m topics, denoted A = {A1, A2, …, Aj, …, Am}, with the weight coefficient of each topic denoted W_Ai; the language malicious degree evaluation unit comprises a malicious language topic classification model built on a deep learning framework comprising an input layer, a hidden layer, and an output layer, with m neurons in the hidden layer.
4. A deep learning model based user behavior analysis system according to claim 3, wherein: the acquisition of the malicious language topic classification model comprises the following steps:
step S01, initializing the model: initial parameters of the deep learning are defined, including the weight parameters W_ij between the neural networks, the bias parameters b_i, and the activation function f(·); the output result satisfies P_ij = f(Σ_i w_ij·E_i + b_i), where E_i represents the input word vector, w_ij represents the connection weight between the i-th and j-th neurons, and P_ij represents the probability that the malicious language E_i belongs to topic A_j;
step S02, inputting the malicious language E_i, which contains n keywords and is denoted E_i = {S_m1, S_m2, …, S_mn}; the probability P_ij that the malicious language belongs to topic A_j is output, and the probabilities of the malicious language belonging to each topic are collected into a set P = {P_i1, P_i2, …, P_ij, …, P_im}; the topic corresponding to max(P) is taken as the model's inferred topic A_i, with probability P_max;
step S03, calculating the loss function: the actual topic of the malicious language E_i is set to A_j, the model's estimated probability that E_i belongs to topic A_j is set to P_ij, and the loss function is calculated from the probability P_max and the probability P_ij;
step S04, back propagation: the weight parameters and bias parameters are updated according to the loss value obtained from the loss function, the input information is propagated backwards, and the parameters are updated repeatedly until the loss function meets the threshold requirement; training of the model is then complete, yielding the malicious language topic classification model.
5. The user behavior analysis system based on a deep learning model according to claim 4, wherein: the updated weight parameter W′_ij satisfies W′_ij = W_ij − α·∂Loss/∂W_ij, and the updated bias parameter satisfies b′_i = b_i − α·∂Loss/∂b_i, where α is the learning rate of the deep learning and satisfies α = α₀/(1 + epoch_num), where α₀ is the initially set learning-rate constant with a value in [0.01, 0.05], and epoch_num is the number of completed forward- and backward-propagation passes.
6. A deep learning model based user behavior analysis system as claimed in claim 1, wherein: the reporting behavior analysis module comprises a malicious reporting behavior analysis unit and a successful reporting behavior analysis unit; the malicious reporting behavior analysis unit calculates the malicious reporting behavior score loss value from the reporter's number of reports ZJ, number of successful reports CJ, and activity HY_i; the successful reporting behavior analysis unit calculates the successful reporting behavior score reward value.
7. The deep learning model based user behavior analysis system of claim 6, wherein: the system comprises a user management module which processes reporting tasks based on the analysis results of the user language analysis module and the reporting behavior analysis module and updates the users' quality scores; the obtained malicious reporting behavior score loss value XE_i, successful reporting behavior score reward value CE_i, and malicious language score loss value YE_i are substituted into the calculation formula of the user quality score SZ_i to complete the update of the user quality score.
8. A method for analyzing user behavior based on a deep learning model, for implementing the user behavior analysis system based on a deep learning model as set forth in any one of claims 1 to 7, characterized in that: comprises the following steps:
step S001, acquiring a reporting task: on an internet platform, a user set A reports to the management platform that an utterance of user B is a malicious language; the reporting action produces a reporter A, a reported person B, and reported target content C, and the number of users in set A is greater than or equal to 1;
step S002, calculating task priority: based on the acquired activity parameter HY_i, quality score SZ_i, and attention degree parameter SG_i of each reporter, the parameters are normalized and input into the priority index calculation formula to obtain the priority index of the reporting task, and the reporting tasks are sorted by priority index,
the calculation of the reporter's activity parameter HY_i satisfies a formula in which ta represents the reporter's online time in the current month, tb represents the usage time of the reporter's account, and Sa represents the number of comments posted by the user in the current month;
the calculation of the user quality score SZ_i satisfies a formula in which SZ0 represents the user's initial quality score and YE_i represents the malicious language score loss value, which satisfies a formula in which ey represents the number of malicious utterances posted by the user in the current month, Sa represents the number of comments posted by the user in the current month, and σ1 represents the malicious degree of the malicious language; XE_i represents the malicious reporting behavior score loss value, CE_i represents the successful reporting behavior score reward value, and the initial values of YE_i, XE_i, and CE_i are 0;
the reporter's attention degree parameter SG_i satisfies a formula in which SFen represents the user's number of followers and SDia represents the user's cumulative number of likes, where μ1 and μ2 are preset coefficients with 0 ≤ μ1 ≤ 1, 0 ≤ μ2 ≤ 1, and μ1² + μ2² = 1;
step S003, user language analysis: the neural network weight parameters, bias parameters, and activation functions of the first and second channels are initialized based on the deep learning framework; the loss function is obtained through forward propagation, the weight and bias parameters are updated once based on the loss function, and backward propagation is performed with the updated parameters to obtain the malicious language identification model; whether the user's language is a malicious language is analyzed through the malicious language identification model; the topic of the malicious language is obtained from the malicious language topic classification model, the weight coefficient corresponding to the malicious language topic is obtained from the topic A_j to which the malicious language belongs, and the malicious degree of the malicious language is calculated; the malicious degree σ1 of the malicious language satisfies a normalization formula in which Ye is the original malicious degree, μ is the mean of the original malicious degrees, and δ is the standard deviation of the original malicious degrees, and σ1 takes a value in (0, 1); the original malicious degree satisfies a formula in which W_Aj represents the weight corresponding to the topic A_j to which the malicious language belongs, L represents the cumulative keyword length of the malicious language, L0 represents the unit keyword length, and m represents the number of reporters;
step S004, reporting behavior analysis: the reporting behavior analysis module judges whether reporting behaviors of a reporter are malicious or not based on a malicious language analysis result, evaluates malicious reporting hazard degree, calculates malicious reporting behavior scoring loss values and calculates successful reporting behavior scoring rewarding values;
step S005, user management: based on the analysis results of the user language analysis module and the reporting behavior analysis module, processing reporting tasks and updating the quality scores of the users.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961231.8A CN116662769B (en) | 2023-08-02 | 2023-08-02 | User behavior analysis system and method based on deep learning model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116662769A CN116662769A (en) | 2023-08-29 |
CN116662769B true CN116662769B (en) | 2023-10-13 |
Family
ID=87724688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310961231.8A Active CN116662769B (en) | 2023-08-02 | 2023-08-02 | User behavior analysis system and method based on deep learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116662769B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678331A (en) * | 2012-09-05 | 2014-03-26 | 阿里巴巴集团控股有限公司 | Reported message processing method and device |
CN105681257A (en) * | 2014-11-19 | 2016-06-15 | 腾讯科技(深圳)有限公司 | Information reporting method and system based on instant messaging interactive platform |
CN105704005A (en) * | 2014-11-28 | 2016-06-22 | 深圳市腾讯计算机系统有限公司 | Malicious user reporting method and device, and reporting information processing method and device |
CN106157119A (en) * | 2016-07-11 | 2016-11-23 | 广东聚联电子商务股份有限公司 | A kind of method that e-commerce purchases system platform report processes |
KR20180116560A (en) * | 2017-04-17 | 2018-10-25 | 이세진 | Monitoring method about media reporting current issues and the system for the same |
CN115840844A (en) * | 2022-12-17 | 2023-03-24 | 深圳市新联鑫网络科技有限公司 | Internet platform user behavior analysis system based on big data |
CN116244441A (en) * | 2023-03-16 | 2023-06-09 | 四川大学 | Social network offensiveness language detection method based on multitasking learning |
Also Published As
Publication number | Publication date |
---|---|
CN116662769A (en) | 2023-08-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||