CN103795592A - Online water navy detection method and device - Google Patents

Online water navy detection method and device

Info

Publication number
CN103795592A
Authority
CN
China
Prior art keywords
training
model
dbn model
data
dbn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410027720.7A
Other languages
Chinese (zh)
Other versions
CN103795592B (en)
Inventor
孙卫强
牛温佳
赵卫中
管洋洋
黄超
李倩
胡玥
刘萍
郭莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410027720.7A
Publication of CN103795592A
Application granted
Publication of CN103795592B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an online water navy detection method and device. The online water navy detection method includes: a first step of expressing original user description information as normalized user description vectors, screening out the classified data from the user description vectors, using a% of the classified data as training data for a DBN model, and using b% of the classified data as detection data for the DBN model; a second step of training the DBN model with the training data and outputting the trained DBN model; a third step of verifying the convergence and judgment accuracy of the output DBN model and adjusting the relevant parameters of the first and second steps according to the verification result until the output DBN model satisfies a preset convergence condition or an end condition; and a fourth step of detecting online water navies with the final DBN model. The method and device not only improve the convergence and accuracy of the online water navy detection algorithm but also shorten model training time on massive sample data.

Description

Method and device for detecting network waterborne troops
Technical field
The present invention relates to the field of network technology, and in particular to a method and device for detecting "network waterborne troops".
Background
With the development of information technology, cyberspace has become the fifth domain of human activity after land, sea, air and outer space. Especially since Web 2.0 technology was applied to the Internet, social applications such as forums and microblogs have grown explosively. As cyberspace develops rapidly, however, the security problems it brings are increasingly prominent, and those originating from "network waterborne troops" are among the most common. "Network waterborne troops" are network users hired by online public relations companies to post and reply on particular topics in order to build momentum. According to surveys by relevant organizations, the number of promoters engaged in online marketing in China has reached the millions, and hired "network waterborne troops" have become increasingly large-scale, open and interest-driven. From the "Ban Wanglaoji" marketing campaign to the "cat-abusing woman" incident and the "Mengniu frame-up" scandal, "network waterborne troops" can be said to have crossed the bottom line of the law. Some are even operated by particular overseas organizations and post attacks, rumors and inflammatory statements on major domestic forums, creating conflict, carrying out malicious cultural infiltration over the Internet and endangering national security. Regulating "network waterborne troops" is therefore urgent.
Unlike the physical environment, the virtual environment of an online forum is inherently open and has its own patterns of information propagation, which poses great challenges to the regulation of "network waterborne troops", mainly in the following two respects:
First, popular information spreads explosively in online forums, so deleting posts after the fact cannot fully undo the harm. Deletion can even be exploited by waterborne troops, because to some extent it appears to "confirm" the authenticity of the message.
Second, online forums contain massive amounts of data, and constructing an effective algorithm that extracts usable information from large volumes of disordered data is the biggest obstacle to regulating "network waterborne troops".
Therefore, regulating "network waterborne troops" requires more than legal and institutional measures such as improving the relevant laws and regulations and promptly publishing government affairs and developments in public incidents. It also requires, at the technical level, taking the characteristics of online forums into account, improving the ability to process large-scale user data, and researching and improving algorithms suited to detecting "network waterborne troops", so that such users in a forum can be identified and their posts stopped at the source.
Detecting "network waterborne troops" is essentially a classification problem. A common approach is to analyze the profile information and historical behavior of users whose class is already known, extract the features that distinguish waterborne-troop users from normal users, and then analyze the information of unclassified users to judge which of them are most likely "network waterborne troops". Algorithms commonly used for classification at present include Bayesian networks, support vector machines (SVM), k-nearest neighbors (KNN) and neural networks. A Bayesian network classifies using probability and statistics, predicting the class of a sample through Bayes' theorem; however, the Bayesian approach itself rests on a strong conditional independence assumption, which often does not hold in practice, so its classification accuracy can drop sharply. An SVM requires the space vector of each sample to be computed in advance and a weight to be set for the influence of each vector dimension on the final result; setting these weights depends heavily on historical experience and case analysis, and the quality of the weights directly affects the judgment accuracy of the algorithm. KNN is a lazy learning method: it stores the samples and runs the learning algorithm only when classification is required, so a complex sample set may cause a very large computational overhead and impair the real-time performance of classification. Neural networks are the algorithms most commonly used for classification. They determine the model parameters by training, can objectively reflect the degree to which each factor influences the final result, and are trained before classification, so training adds no extra time to the classification process. However, a basic neural network model is complex: when the training set is large, training takes too long, and an unsuitable choice of initial network weights easily traps the network in a local optimum. Consequently, using a basic neural network algorithm to discover "network waterborne troops" suffers from poor convergence, low accuracy and long running time.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for detecting network waterborne troops that improve the convergence and accuracy of the detection algorithm and shorten model training time on massive sample data.
To solve the above technical problem, the present invention proposes a method for detecting network waterborne troops, comprising:
Step 1: express the original user description information as normalized user profile vectors, and select the classified data from the user profile vectors; use a% of the classified data as training data for a deep belief network (DBN) model and b% of the classified data as detection data for the DBN model, where a is greater than b and a and b sum to 100; the types of user description information are selected in advance by the user, and the classified data are the user data that have been labeled as to whether or not the user belongs to network waterborne troops;
Step 2: train the DBN model with the training data and output the trained DBN model, referred to as the output DBN model;
Step 3: check the convergence and judgment accuracy of the output DBN model, and adjust the relevant parameters in step 1 and step 2 according to the check result until the output DBN model reaches a preset convergence condition or termination condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step 4: detect network waterborne troops with the final DBN model, the final DBN model being the output DBN model that has reached the preset convergence condition or termination condition.
Further, the above method for detecting network waterborne troops may also have the following feature: the initial value of a is 60.
Further, the above method may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step 2 meet the expected requirements.
Further, the above method may also have the following feature: in step 2, the training process of the DBN model comprises a model pre-training process and a model fine-tuning process; the pre-training process uses the Downpour SGD algorithm to train the RBMs in parallel, and the fine-tuning process uses the MapReduce algorithm to train the PSO-BP neural network in parallel.
Further, the above method may also have the following feature: the user description information comprises registration duration, login frequency, online duration, username length, password length, posting ratio, reply ratio, relative time spent browsing posts, relative time spent posting, number of fans and number of followed users.
To solve the above technical problem, the present invention further proposes a device for detecting network waterborne troops, comprising a user data preprocessing module, a DBN model training module, a cooperative module and a detection module. The DBN model training module is connected to the user data preprocessing module, the cooperative module and the detection module respectively, and the cooperative module is also connected to the user data preprocessing module, wherein:
The user data preprocessing module is configured to express the original user description information as normalized user profile vectors, select the classified data from the user profile vectors, use a% of the classified data as training data for a deep belief network (DBN) model and b% of the classified data as detection data for the DBN model, where a is greater than b and a and b sum to 100; the types of user description information are selected in advance by the user, and the classified data are the user data that have been labeled as to whether or not the user belongs to network waterborne troops;
The DBN model training module is configured to train the DBN model with the training data and output the trained DBN model, referred to as the output DBN model;
The cooperative module is configured to check the convergence and judgment accuracy of the output DBN model and adjust the relevant parameters in step 1 and step 2 according to the check result until the output DBN model reaches a preset convergence condition or termination condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
The detection module is configured to detect network waterborne troops with the final DBN model, the final DBN model being the output DBN model that has reached the preset convergence condition or termination condition.
Further, the above device for detecting network waterborne troops may also have the following feature: the initial value of a is 60.
Further, the above device may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step 2 meet the expected requirements.
Further, the above device may also have the following feature: the training process of the DBN model comprises a model pre-training process and a model fine-tuning process, and the DBN model training module comprises a pre-training unit and a fine-tuning unit; the pre-training unit uses the Downpour SGD algorithm to train the RBMs in parallel, and the fine-tuning unit uses the MapReduce algorithm to train the PSO-BP neural network in parallel.
Further, the above device may also have the following feature: the user description information comprises registration duration, login frequency, online duration, username length, password length, posting ratio, reply ratio, relative time spent browsing posts, relative time spent posting, number of fans and number of followed users.
The method and device of the present invention improve the convergence and accuracy of the network waterborne troops detection algorithm and also shorten model training time on massive sample data, solving the problem of excessively long model training on such data.
Brief description of the drawings
Fig. 1 is a flow chart of the method for detecting network waterborne troops in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the parallel processing of user profile vectors in an embodiment of the present invention;
Fig. 3 is a flow chart for determining the value range of each dimension of the user profile vectors based on the MapReduce algorithm in an embodiment of the present invention;
Fig. 4 is a schematic diagram of a basic DBN model;
Fig. 5 is a schematic diagram of the Downpour SGD model;
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD;
Fig. 7 is a structural diagram of a single-layer BP neural network;
Fig. 8 is a flow chart of the PSO-BP neural network training algorithm for a single particle;
Fig. 9 is a schematic diagram of the workflow-based multilayer cooperation mechanism;
Fig. 10 is a structural block diagram of the device for detecting network waterborne troops in an embodiment of the present invention.
Embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the present invention and are not intended to limit its scope.
Fig. 1 is a flow chart of the method for detecting network waterborne troops in an embodiment of the present invention. As shown in Fig. 1, in this embodiment the method may comprise the following steps:
Step S101: express the original user description information as normalized user profile vectors, and select the classified data from the user profile vectors; use a% of the classified data as training data for a DBN (Deep Belief Network) model and b% of the classified data as detection data for the DBN model, where a is greater than b and a and b sum to 100; the types of user description information are selected in advance by the user, and the classified data are the user data that have been labeled as to whether or not the user belongs to network waterborne troops;
Step S102: train the DBN model with the training data and output the trained DBN model, referred to as the output DBN model;
The training process of the DBN model comprises a model pre-training process and a model fine-tuning process. The pre-training process uses the Downpour SGD algorithm to train the RBMs in parallel, and the fine-tuning process uses the MapReduce algorithm to train the PSO-BP neural network in parallel.
The Downpour SGD algorithm and the MapReduce algorithm are prior art and are not described in detail here.
Step S103: check the convergence and judgment accuracy of the output DBN model, and adjust the relevant parameters in step S101 and step S102 according to the check result until the output DBN model reaches a preset convergence condition or termination condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step S104: detect network waterborne troops with the final DBN model, the final DBN model being the output DBN model that has reached the preset convergence condition or termination condition.
The above steps are described in further detail below.
In step S101, the user description information is converted into a mathematical representation. Objectively, a forum user has many descriptors, such as registration time, past login times, username, password, login IP addresses, browsing history, posting history, reply history, forum friends, fans, followed users and so on. The present invention selects the more representative of these (the items listed in Table 1) as the basis for classifying users, and accordingly proposes a multi-attribute description framework for user information, whose structure is shown in Table 1.
Table 1: Multi-attribute description framework for user information
Attribute name      Explanation                 Calculation method
RegisterPeriod      Registration duration       Length of time since registering on the forum
LoginFrequency      Login frequency             Number of logins / registration duration
OnlinePeriod        Online duration             Length of time online on the forum
UsernameLength      Username length             Username length
PasswordLength      Password length             Password length
PostRate            Posting ratio               Number of new posts / total number of posts
ReplyRate           Reply ratio                 Number of replies / total number of posts
SurfingFrequency    Relative browsing time      Time spent browsing posts / online duration
EditingFrequency    Relative posting time       Time spent posting / online duration
FansNumber          Number of fans              Number of fans
ConsiderNumber      Number of followed users    Number of followed users
As Table 1 shows, in the embodiment of the present invention the user description information may comprise registration duration, login frequency, online duration, username length, password length, posting ratio, reply ratio, relative time spent browsing posts, relative time spent posting, number of fans and number of followed users.
With the multi-attribute description framework of Table 1, user description information can be converted into a list of numerical values. For example, the description information of a user A, once abstracted through the framework, can be expressed as shown in Table 2.
Table 2: Example of a user profile attribute list
Attribute name      Value
RegisterPeriod      792 days
LoginFrequency      100 logins / 792 days
OnlinePeriod        89 hours
UsernameLength      6
PasswordLength      6
PostRate            20 posts / 20 posts
ReplyRate           0 posts / 20 posts
SurfingFrequency    83 hours / 89 hours
EditingFrequency    6 hours / 89 hours
FansNumber          20
ConsiderNumber      3
The model of Table 1 thus gives a quantitative representation of user description information. For example, user A in Table 2 can be represented by the vector [792 days, 100 logins/792 days, 89 hours, 6, 6, 20 posts/20 posts, 0 posts/20 posts, 83 hours/89 hours, 6 hours/89 hours, 20, 3], which is called a user profile vector. In the same way, the description information of every user in the forum can be converted into a user profile vector, giving a mathematical representation of the user information.
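The mapping from raw statistics to a user profile vector can be sketched as follows. This is a minimal illustration only: the `RawUser` record, its field names, the `ratio` helper and the convention that the total number of posts equals new posts plus replies (consistent with the 20/20 and 0/20 values in Table 2) are assumptions, not details given in the patent text.

```python
from dataclasses import dataclass


@dataclass
class RawUser:
    """Raw forum statistics for one user (field names are illustrative)."""
    register_days: float
    login_count: int
    online_hours: float
    username: str
    password_length: int
    posts: int
    replies: int
    browse_hours: float
    edit_hours: float
    fans: int
    following: int


def ratio(num, den):
    """Division that returns 0.0 for a zero denominator (an assumption)."""
    return num / den if den else 0.0


def profile_vector(u):
    """Build the 11-dimensional vector in the attribute order of Table 1."""
    total = u.posts + u.replies  # assumed meaning of "total number of posts"
    return [
        u.register_days,                        # RegisterPeriod
        ratio(u.login_count, u.register_days),  # LoginFrequency
        u.online_hours,                         # OnlinePeriod
        float(len(u.username)),                 # UsernameLength
        float(u.password_length),               # PasswordLength
        ratio(u.posts, total),                  # PostRate
        ratio(u.replies, total),                # ReplyRate
        ratio(u.browse_hours, u.online_hours),  # SurfingFrequency
        ratio(u.edit_hours, u.online_hours),    # EditingFrequency
        float(u.fans),                          # FansNumber
        float(u.following),                     # ConsiderNumber
    ]


# User A from Table 2 (the username itself is made up; only its length matters).
user_a = RawUser(792, 100, 89, "abcdef", 6, 20, 0, 83, 6, 20, 3)
vec = profile_vector(user_a)
```

The sketch stores only the password length rather than the password itself, which is all the framework of Table 1 needs.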
In addition, to simplify the setting of initial weights in the subsequent DBN model training, the value of every dimension of a user profile vector must lie in [-1, 1]. The present invention therefore normalizes each dimension of the user profile vectors: it first extracts the value range of each dimension over all user profile vectors in the forum, and then normalizes the dimensions whose value range exceeds [-1, 1].
Fig. 2 is a schematic diagram of the parallel processing of user profile vectors in an embodiment of the present invention. As shown in Fig. 2, both the generation and the normalization of user profile vectors can be computed with a parallel model. In the generation stage, all user description information can be divided at random into m groups that are processed in parallel; each group converts the description information of all its users into user profile vectors. The user profile vectors of all groups are then assigned ID numbers in sequence, yielding a set of (user ID, user profile vector) pairs.
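The generation stage just described might look like the sketch below. It is an assumption-laden illustration: a thread pool stands in for whatever parallel platform the embodiment uses, `describe_to_vector` is a hypothetical converter that keeps only two attributes, and the function and parameter names are invented for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def describe_to_vector(desc):
    """Hypothetical converter; keeps only two of the Table 1 attributes."""
    return [float(desc["fans"]), float(desc["following"])]


def convert_group(group):
    """Work done by one group: convert every description it holds."""
    return [describe_to_vector(d) for d in group]


def build_id_vector_pairs(descriptions, m=3, seed=0):
    """Randomly split into m groups, convert in parallel, then assign IDs."""
    shuffled = list(descriptions)
    random.Random(seed).shuffle(shuffled)
    groups = [shuffled[i::m] for i in range(m)]
    with ThreadPoolExecutor(max_workers=m) as pool:
        converted = list(pool.map(convert_group, groups))
    pairs = {}
    for group_vectors in converted:
        for vec in group_vectors:
            pairs[len(pairs)] = vec  # IDs are handed out in sequence
    return pairs


descs = [{"fans": i, "following": 2 * i} for i in range(7)]
pairs = build_id_vector_pairs(descs, m=3)
```

Assigning IDs only after all groups have finished keeps the numbering sequential without any coordination between groups.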
Fig. 3 is a flow chart for determining the value range of each dimension of the user profile vectors based on the MapReduce algorithm in an embodiment of the present invention. In the normalization process, as shown in Fig. 3, the MapReduce algorithm is first used to find the value range of each dimension of the user profile vectors and to determine which dimensions have a range that is not within [-1, 1]. For each such dimension, the maximum absolute value among its values is found, and the dimension is normalized by that value. The normalization can likewise be carried out in parallel by dividing all user profile vectors into m groups.
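A minimal in-process sketch of this range-finding and normalization follows. The map and reduce functions run sequentially here and merely mimic the MapReduce structure; the function names and the grouping scheme are assumptions, since the patent refers to Fig. 3 for the actual flow.

```python
from functools import reduce


def map_group(group):
    """Map: per-dimension maximum absolute value within one group."""
    dims = len(group[0])
    return [max(abs(v[d]) for v in group) for d in range(dims)]


def reduce_max(a, b):
    """Reduce: element-wise maximum of two partial results."""
    return [max(x, y) for x, y in zip(a, b)]


def normalize(vectors, m=2):
    """Scale every dimension whose range exceeds [-1, 1] by its max |value|."""
    groups = [g for g in (vectors[i::m] for i in range(m)) if g]
    max_abs = reduce(reduce_max, (map_group(g) for g in groups))
    # Dimensions already inside [-1, 1] are left untouched (scale of 1).
    scale = [x if x > 1.0 else 1.0 for x in max_abs]
    return [[v[d] / scale[d] for d in range(len(scale))] for v in vectors]


vectors = [[792.0, 0.126, 6.0], [30.0, 0.9, 12.0], [365.0, -0.5, 8.0]]
normalized = normalize(vectors, m=2)
```

In this toy input the second dimension already lies within [-1, 1], so only the first and third dimensions are rescaled.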
The above processing yields the set of normalized user profile vectors. Some of these data are already classified, that is, the corresponding users have been labeled as to whether or not they are "network waterborne troops". Such data are called classified data, and their set is called the "classified data set". For the subsequent DBN model training, the "classified data set" must be divided into two parts: the "training data set", used to train the parameters of the DBN model, and the "test data set", used to measure the judgment accuracy of the resulting DBN model. As for how many samples to assign to each, a DBN model can only fit the rules hidden in the samples after learning from enough of them, so the "training data set" generally holds more samples; too many training samples, however, increase the amount of computation. To address this, the present invention first takes 60% of the samples in the "classified data set" as the "training data set", and then adjusts this proportion (the share of the "classified data set" taken by the "training data set") according to whether the convergence and judgment accuracy of the resulting DBN model meet the expected requirements. How is the convergence of the DBN model to be judged?
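The split can be sketched as below, assuming (the text does not say so) that samples are shuffled before being divided and that a must stay strictly between 50 and 100 so that a > b and the test set is never empty. `split_classified` and its parameters are illustrative names.

```python
import random


def split_classified(data, a=60, seed=0):
    """Split labeled samples into a% training and (100 - a)% test data."""
    if not 50 < a < 100:
        raise ValueError("a must exceed b = 100 - a and leave a test set")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    cut = len(shuffled) * a // 100
    return shuffled[:cut], shuffled[cut:]


# Ten toy (vector, label) pairs; label 1 marks a waterborne-troop user.
samples = [([0.1 * i, -0.1 * i], i % 2) for i in range(10)]
train, test = split_classified(samples, a=60)
```

Keeping `a` as a parameter lets a later feedback step raise or lower the training share without touching the rest of the pipeline.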
In step S102, the DBN (Deep Belief Network) model is a kind of deep neural network: a probabilistic generative model composed of multiple layers of random-variable nodes. Fig. 4 is a schematic diagram of a basic DBN model. As shown in Fig. 4, the basic DBN model consists of two layers of RBMs (Restricted Boltzmann Machines) and one layer of BP (Back Propagation) neural network. Training a DBN model is divided into two processes: model pre-training and model fine-tuning. The pre-training process uses the Downpour SGD algorithm to train the RBMs in parallel, and the fine-tuning process uses the MapReduce algorithm to train the PSO-BP neural network in parallel.
Referring to Fig. 4, the pre-training process trains the two RBM layers of the model by greedy layer-wise unsupervised learning. First, the input data X, as visible layer V_0, and the first hidden layer H_0 are treated as one RBM, and training yields the parameters of this RBM (the weight matrix W_0 connecting V_0 and H_0, and the biases a and b of the nodes of V_0 and H_0). The parameters of this RBM are then fixed, H_0 is treated as the visible layer and H_1 as the hidden layer, and the second RBM is trained to obtain its parameters. This completes the pre-training of the DBN model and determines the initial parameters of the two RBM layers. Because each RBM layer is learned independently in this process, the training of the model is greatly simplified.
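A compact sketch of greedy layer-wise pre-training is given below. The patent does not state how each RBM is trained, so one-step contrastive divergence (CD-1) is used here purely as an assumption; the sketch also assumes binary visible units in [0, 1], whereas the normalized profile vectors lie in [-1, 1] and would need rescaling or a different visible-unit type. All class and function names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_step(self, v0, lr):
        """One CD-1 update: positive phase, one Gibbs step, negative phase."""
        h0 = self.hidden_probs(v0)
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b[j] += lr * (h0[j] - h1[j])


def pretrain(data, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Train one RBM per layer; each layer's hidden output feeds the next."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_step(v, lr)
        rbms.append(rbm)  # parameters of this layer are now fixed
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 1.0]]
rbms = pretrain(data, layer_sizes=[3, 2])
```

The second RBM never sees the raw input: it is trained only on the hidden activations of the first, which is what makes the two layers independent training problems.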
After pre-training, the whole network can be treated as a BP neural network with two layers of hidden nodes. The network parameters between the input layer and the first hidden layer, and between the two hidden layers, have already been initialized; only the parameters between the second hidden layer and the output nodes need to be initialized at random. The network can then be trained by error back-propagation in the same way as an ordinary BP neural network until the model reaches the convergence or termination condition. This process is called model fine-tuning.
In the pre-training of the DBN model, the two RBM layers are trained separately by greedy layer-wise unsupervised learning. Compared with the traditional approach of training a multilayer feedback model as a whole, this simplifies the training process and speeds up training to some extent. With a massive training data set, however, training even a single RBM layer still takes a long time. The present invention therefore parallelizes the training of a single RBM layer, which speeds up the pre-training of the DBN model and shortens the time the pre-training stage requires.
The present invention parallelizes the RBM training process with the Downpour SGD algorithm. Fig. 5 is a schematic diagram of the Downpour SGD model. As shown in Fig. 5, the basic idea of the parallel RBM implementation based on Downpour SGD is as follows: the training data are divided into several subsets distributed over multiple Worker servers, each Worker server runs a replica of the RBM model, and a Worker server only needs to communicate with the parameter server. Model parameters are updated by the parameter server, which stores the parameters and holds the current state of all of them. During training, each Worker fetches the current model parameters from the parameter server, runs mini-batches with those parameters, computes the gradient, and pushes the result back to the parameter server. In a simple implementation of Downpour SGD, a Worker can be set to fetch the updated parameters from the parameter server every n_fetch mini-batch operations and to push a gradient update to the parameter server every n_push mini-batch operations.
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD. In Fig. 6, η denotes the rate at which parameters are updated along the gradient, and n_fetch and n_push denote the periods, respectively, of synchronizing parameters from the parameter server and of uploading gradients to the parameter server.
In Downpour SGD the gradient updates of the parameters are asynchronous, so even if one Worker server goes down, the other Worker servers are not affected. Although asynchronous updating causes the parameters held by different Workers to differ slightly, in existing implementations the algorithm as a whole remains stable.
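The fetch/push cadence can be illustrated with a single-process simulation. This is a sketch under stated assumptions: a one-parameter least-squares model replaces the RBM, round-robin scheduling stands in for genuinely asynchronous Workers, and the class names `ParameterServer` and `Worker` are invented for the example.

```python
class ParameterServer:
    """Holds the current parameter and applies pushed gradients."""

    def __init__(self, w, eta):
        self.w, self.eta = w, eta

    def fetch(self):
        return self.w

    def push(self, grad):
        self.w -= self.eta * grad


class Worker:
    """Runs mini-batches on its own data shard with a local parameter copy."""

    def __init__(self, server, shard, n_fetch, n_push):
        self.server, self.shard = server, shard
        self.n_fetch, self.n_push = n_fetch, n_push
        self.w = server.fetch()
        self.acc = 0.0   # gradient accumulated since the last push
        self.step = 0

    def run_minibatch(self):
        if self.step % self.n_fetch == 0:
            self.w = self.server.fetch()       # refresh local parameters
        x, y = self.shard[self.step % len(self.shard)]
        self.acc += 2.0 * (self.w * x - y) * x  # d/dw of (w*x - y)^2
        self.step += 1
        if self.step % self.n_push == 0:
            self.server.push(self.acc)          # upload accumulated gradient
            self.acc = 0.0


# Toy data generated by y = 3x, split into two shards.
data = [(x, 3.0 * x) for x in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
server = ParameterServer(w=0.0, eta=0.1)
workers = [Worker(server, data[i::2], n_fetch=2, n_push=2) for i in range(2)]
for _ in range(500):
    for wk in workers:  # round-robin stands in for asynchronous execution
        wk.run_minibatch()
```

With n_fetch = n_push = 2, each Worker computes two gradients on slightly stale parameters before synchronizing, yet the server's parameter still approaches the true value 3 in this toy problem.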
Once the parameters of the two RBM layers have been trained, the pre-training of the DBN model is complete. The DBN model can now be treated as a four-layer BP neural network in which the parameters between the lower three layers have already been initialized. Next, the parameters between the top two layers are initialized at random and the BP neural network is trained with the training data set; this is the fine-tuning process of the DBN model.
The fine-tuning process uses the MapReduce algorithm to train the PSO-BP neural network in parallel. A BP neural network is a multilayer feedforward neural network trained by the error back-propagation algorithm. Fig. 7 is a structural diagram of a single-layer BP neural network. As shown in Fig. 7, training a BP neural network consists of two processes, forward propagation of information and back-propagation of error. When the result of forward propagation does not match the expected output, the difference between the output value and the expected value is computed and the connection weights are corrected by gradient descent. This continues until the error of the network output falls to an acceptable level.
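For reference, the gradient-descent correction of the connection weights is conventionally written as below. The patent text gives no formula, so the squared-error loss E and the use of η for the learning rate are assumptions made for illustration.

```latex
E = \frac{1}{2} \sum_{k} \left( y_k - t_k \right)^2,
\qquad
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
```

Here y_k is the k-th network output, t_k the corresponding expected output, and w_{ij} a connection weight; the partial derivative is obtained by propagating the output error backwards through the layers.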
The training process of BP neural net is the optimum combination of finding network weight by successively iteration and backpropagation in essence, thereby minimize the difference of network output and expection output, but in training process, the process of network weight being adjusted by error back propagation is very slow.PSO-BP neural network algorithm is the optimization to BP neural net error back propagation process, by PSO(Particle Swarm Optimization, particle cluster algorithm) in multi-dimensional search space, the process of iteration searching optimal location has replaced error back propagation process, thus accelerate the convergence rate of BP neural net.
In PSO-BP neural network algorithm, the Definition of Vector that network parameter is formed is the particle position vector in population, certain parameter vector drag output and the error amount of expection output are defined to the good and bad measurement index of position for this reason, apparently, this index is less, representation parameter more approaches optimized parameter, and particle position is better.The particle of first initialization some when algorithm starts, each particle is preserved the memory of the historical optimal location of its current location, historical optimal location, present speed and population.Every evolution generation, particle utilizes current information and recall info to adjust oneself position and speed, and upgrades memory.Particle is constantly adjusted position in multi-dimensional search space, until population arrives poised state.The optimum particle position now obtaining, has just represented the neural net optimized parameter that training obtains.
Because the volume of training sample data for the PSO-BP neural network is very large, the present invention uses the MapReduce algorithm to parallelize the PSO-BP neural network training process and thereby accelerate convergence. The iterative process of each particle runs on one PSO-BP-Worker. The management server stores the global best position and the quality index corresponding to that position. After each particle completes a round of iteration, it synchronizes its best-position information with the management server, and this continues until the specified number of iterations or the convergence condition is reached.
The flowchart of the PSO-BP neural network training process executed by each particle is shown in Fig. 8. In Fig. 8, N denotes the maximum number of iterations; x_i, x_l, and x_g denote, respectively, the current position vector of particle i, the historical best position vector of particle i, and the global best position vector; max_i, max_l, and max_g denote, respectively, the quality index of the current position of particle i, the quality index of the historical best position of particle i, and the quality index of the global best position; ω denotes the inertia weight of the PSO algorithm; and c_1 and c_2 denote the learning factors of the PSO algorithm.
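Fig. 8 is not reproduced in this text. For orientation, the standard PSO update rule written with the symbols listed above is given below. The velocity v_i and the random numbers r_1 and r_2 (drawn uniformly from [0, 1] at each step) are not named in the text and are assumptions taken from the textbook form of the algorithm; the flowchart may differ in detail.

```latex
\begin{aligned}
v_i &\leftarrow \omega\, v_i + c_1 r_1 \,(x_l - x_i) + c_2 r_2 \,(x_g - x_i) \\
x_i &\leftarrow x_i + v_i
\end{aligned}
```

After each update, max_i is recomputed for the new x_i; x_l and max_l are replaced when the new position is better than the particle's own record, and x_g and max_g when it is better than the global record.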
In the PSO-BP neural network training process based on the MapReduce model, the iterative process of each particle runs on a separate PSO-BP-Worker. Each PSO-BP-Worker communicates with the management server, which maintains the global best position and its quality index. This arrangement scales well: the search of the swarm can easily be accelerated by adding particles, each on its own worker, which speeds up the convergence of the algorithm.
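The worker and management server roles can be illustrated with a small sketch. It deliberately swaps in Python threads and a lock for a real MapReduce cluster, so it shows only the synchronization pattern; the class and function names (`ManagementServer`, `pso_bp_worker`, `run_swarm`), the `sphere` test loss, and all default parameter values are hypothetical.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class ManagementServer:
    """Holds the global best position and its quality index (lower is better)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.best_position = None
        self.best_error = float("inf")

    def sync(self, position, error):
        """Report a candidate and get back the current global best."""
        with self._lock:
            if error < self.best_error:
                self.best_position, self.best_error = list(position), error
            return list(self.best_position), self.best_error


def pso_bp_worker(server, loss, dim, max_iter, seed, omega=0.7, c1=1.5, c2=1.5):
    """Iterate one particle, synchronizing with the server after every round."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(dim)]
    v = [0.0] * dim
    local_x, local_err = x[:], loss(x)
    global_x, _ = server.sync(local_x, local_err)
    for _ in range(max_iter):
        for d in range(dim):
            v[d] = (omega * v[d]
                    + c1 * rng.random() * (local_x[d] - x[d])
                    + c2 * rng.random() * (global_x[d] - x[d]))
            x[d] += v[d]
        err = loss(x)
        if err < local_err:
            local_x, local_err = x[:], err
        global_x, _ = server.sync(local_x, local_err)
    return local_err


def sphere(x):
    """Toy loss standing in for the network error."""
    return sum(t * t for t in x)


def run_swarm(loss, dim, n_workers=8, max_iter=100):
    """Run one particle per worker thread and return the server's final state."""
    server = ManagementServer()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(pso_bp_worker, server, loss, dim, max_iter, seed)
                   for seed in range(n_workers)]
        local_errors = [f.result() for f in futures]
    return server.best_position, server.best_error, local_errors
```

Because the only shared state is the global best held by the server, adding workers requires no change to the worker code, which is the scalability property the paragraph above describes.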
During DBN model training, different parameter settings can affect the subsequent output and, in turn, the judgment accuracy of the final DBN model. For example, if the proportion of training data selected in the user data preprocessing module is too low, the features of water army users cannot be extracted well, and the final DBN model has low judgment accuracy. If the maximum number of iterations chosen for RBM training is too low, the RBM network is undertrained, so the initial weights of the subsequent PSO-BP neural network are set poorly; the DBN model may then fall into a local optimum and fail to reach the expected judgment accuracy. If the number of particles in the swarm is set too small in PSO-BP neural network training, the network converges slowly and may not converge within the specified maximum number of iterations. If the maximum number of iterations is set too small in PSO-BP neural network training, training may end early, before the DBN model has converged. Therefore, in step S103, the relevant parameters in steps S101 and S102 need to be adjusted in reverse according to the convergence of the DBN model and the preset judgment accuracy.
In step S103 of the present invention, based on the relationships among the above parameters and drawing on the idea of workflow, a feedback process is defined from the final DBN model to the user data preprocessing module and the DBN model training module. The relevant parameters in these two modules are adjusted in reverse according to the convergence and judgment accuracy of the DBN model, which improves the performance of the final DBN model.
A workflow is a business process that can be executed fully or partly automatically; according to a set of process rules, documents, information, or tasks are passed between different executors and acted on. The WfMC (Workflow Management Coalition) defines four basic workflow models: the sequence model, the parallel model, the selection model, and the loop model. This patent combines the sequence, selection, and loop models to define a workflow-based multilayer cooperation mechanism.
From the description above, the workflow contains three sequence models: after the user data preprocessing module finishes, the DBN model pre-training stage begins; after the DBN model pre-training stage, the DBN model fine-tuning stage begins; and after the DBN model fine-tuning stage, the DBN model detection stage begins. The workflow also contains two decision models: whether the PSO-BP model has converged, and whether the DBN model has reached the judgment accuracy threshold. In the first decision model, if the condition holds, the action to execute is "enter the DBN model fine-tuning stage"; if it does not hold, the action is "increase the number of PSO iterations, increase the number of particles in the PSO swarm, and enter the DBN model pre-training stage". In the second decision model, if the condition holds, the workflow ends; if it does not hold, the action is "increase the number of RBM iterations, increase the proportion of training data in the data preprocessing module, and enter the user data preprocessing module". The resulting workflow model is shown in Fig. 9, which is a schematic diagram of the workflow-based multilayer cooperation mechanism.
In the workflow defined above, the original user data passes through the user data preprocessing module to produce a set of normalized user description vectors, the DBN pre-training process initializes the network weight parameters, and the DBN fine-tuning stage then begins. If the PSO-BP model fails to converge during fine-tuning, the number of PSO iterations and the number of particles in the PSO swarm are increased until the PSO-BP model converges; the trained DBN model is obtained at that point. The DBN model is then checked with the test data set. If its judgment accuracy does not reach the expected threshold, the proportion of the classified data set used as training data in the user data preprocessing module is increased, the number of RBM iterations in the DBN pre-training stage is increased, and the DBN model is trained again, until the trained DBN model reaches the expected judgment accuracy.
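The feedback loop of this workflow can be sketched as a driver function. The sketch below is an illustration under stated assumptions, not the patented mechanism: the four stages are passed in as callables (`preprocess`, `pretrain`, `finetune`, `evaluate`) whose signatures are invented here, and the starting values, step sizes, and `max_rounds` limit in the code are hypothetical because the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Params:
    train_ratio: float = 60.0   # a: percent of classified data used for training
    rbm_iterations: int = 50    # hypothetical starting value
    pso_iterations: int = 100   # hypothetical starting value
    pso_particles: int = 20     # hypothetical starting value


def run_workflow(preprocess, pretrain, finetune, evaluate,
                 accuracy_threshold, max_rounds=10):
    """Run preprocessing, pre-training, fine-tuning and detection with feedback."""
    p = Params()
    model = None
    for _ in range(max_rounds):  # max_rounds acts as the termination condition
        train, test = preprocess(p.train_ratio)
        weights = pretrain(train, p.rbm_iterations)
        model, converged = finetune(train, weights, p.pso_iterations, p.pso_particles)
        retries = 0
        # Decision 1: PSO-BP not converged -> enlarge the PSO search and retry.
        while not converged and retries < max_rounds:
            p.pso_iterations *= 2
            p.pso_particles += 10
            model, converged = finetune(train, weights,
                                        p.pso_iterations, p.pso_particles)
            retries += 1
        # Decision 2: accuracy reached -> finish; otherwise feed back to earlier stages.
        if evaluate(model, test) >= accuracy_threshold:
            return model, p
        p.rbm_iterations *= 2
        p.train_ratio = min(p.train_ratio + 5.0, 95.0)
    return model, p
```

Passing the stages in as callables keeps the feedback logic separate from the training code, so the same driver can be exercised with stub stages before real training is wired in.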
The online water army detection method of the present invention is a layered cooperative DBN method for water army detection. The method identifies the online water army with an improved DBN model implemented in parallel and defines a cooperation mechanism among the parts of the DBN model. It improves both the convergence and the accuracy of the water army detection algorithm, and it shortens model training time on massive sample data, which solves the problem of excessively long training time under massive sample data.
The present invention also provides an online water army detection device for implementing the above detection method. All of the foregoing description and explanation of the detection method of the present invention applies equally to the detection device of the present invention.
Fig. 10 is a structural block diagram of the online water army detection device in an embodiment of the present invention. As shown in Fig. 10, the detection device in this embodiment comprises a user data preprocessing module 100, a DBN model training module 200, a cooperation module 300, and a detection module 400. The DBN model training module 200 is connected to the user data preprocessing module 100, the cooperation module 300, and the detection module 400 respectively, and the cooperation module 300 is also connected to the user data preprocessing module 100. The user data preprocessing module 100 is configured to represent original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for the deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100. The types of user description information are selected in advance by the user, and the classified data refers to user data that has already been labeled as belonging or not belonging to the online water army. The DBN model training module 200 is configured to train the DBN model with the training data and output the trained DBN model, which is referred to as the output DBN model. The cooperation module 300 is configured to check the convergence and judgment accuracy of the output DBN model and adjust the relevant parameters in said step 1 and step 2 according to the check result, until the output DBN model reaches the preset convergence condition or termination condition; the judgment accuracy is obtained by testing the output DBN model with the detection data. The detection module 400 is configured to detect the online water army with the final DBN model, which is the output DBN model that has reached the preset convergence condition or termination condition.
In an embodiment of the present invention, the training process of the DBN model comprises a model pre-training process and a model fine-tuning process, and the DBN model training module 200 may comprise a pre-training unit and a fine-tuning unit. The pre-training unit is configured to perform RBM training in parallel using the Downpour SGD algorithm, and the fine-tuning unit is configured to perform PSO-BP neural network training in parallel using the MapReduce algorithm.
In an embodiment of the present invention, the user description information may comprise registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
In an embodiment of the present invention, the initial value of a may be set to 60. The value of a can then be adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by the DBN model training module meet the expected requirements.
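The preprocessing step can be illustrated with a short sketch that normalizes user description vectors and splits the classified data with a = 60. Min-max scaling and random shuffling are assumptions made for the example; this passage does not state which normalization or sampling scheme is used, and the function names are hypothetical.

```python
import random


def min_max_normalize(vectors):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in vectors]


def split_classified(records, a=60, seed=0):
    """Return (training data, detection data) using a% and b% = (100 - a)%."""
    if not 50 < a < 100:
        raise ValueError("a must be greater than b = 100 - a")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * a / 100)
    return shuffled[:cut], shuffled[cut:]
```

With ten labeled records and the default a = 60, `split_classified` returns six training records and four detection records; raising `a` in later rounds corresponds to the feedback adjustment described for the cooperation module.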
The online water army detection device of the present invention adopts a layered cooperative DBN method for water army detection. The method identifies the online water army with an improved DBN model implemented in parallel and defines a cooperation mechanism among the parts of the DBN model. It improves both the convergence and the accuracy of the online water army detection algorithm, and it shortens model training time on massive sample data, which solves the problem of excessively long training time under massive sample data.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for detecting an online water army, characterized by comprising:
Step 1: representing original user description information as normalized user description vectors, screening out classified data from the user description vectors, using a% of the classified data as training data for a deep belief network (DBN) model, and using b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100, the types of the user description information are selected in advance by the user, and the classified data refers to user data that has already been labeled as belonging or not belonging to the online water army;
Step 2: training the DBN model with the training data and outputting the trained DBN model, the DBN model so output being referred to as the output DBN model;
Step 3: checking the convergence and judgment accuracy of the output DBN model, and adjusting the relevant parameters in step 1 and step 2 according to the check result until the output DBN model reaches a preset convergence condition or termination condition, wherein the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step 4: detecting the online water army with the final DBN model, the final DBN model being the output DBN model that has reached the preset convergence condition or termination condition.
2. The method for detecting an online water army according to claim 1, characterized in that the initial value of a is 60.
3. The method for detecting an online water army according to claim 2, characterized in that the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step 2 meet the expected requirements.
4. The method for detecting an online water army according to claim 1, characterized in that, in step 2, the training process of the DBN model comprises a model pre-training process and a model fine-tuning process, the model pre-training process performs RBM training in parallel using the Downpour SGD algorithm, and the model fine-tuning process performs PSO-BP neural network training in parallel using the MapReduce algorithm.
5. The method for detecting an online water army according to claim 1, characterized in that the user description information comprises registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
6. A device for detecting an online water army, characterized by comprising a user data preprocessing module, a DBN model training module, a cooperation module, and a detection module, the DBN model training module being connected to the user data preprocessing module, the cooperation module, and the detection module respectively, and the cooperation module also being connected to the user data preprocessing module, wherein:
the user data preprocessing module is configured to represent original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100, the types of the user description information are selected in advance by the user, and the classified data refers to user data that has already been labeled as belonging or not belonging to the online water army;
the DBN model training module is configured to train the DBN model with the training data and output the trained DBN model, the DBN model so output being referred to as the output DBN model;
the cooperation module is configured to check the convergence and judgment accuracy of the output DBN model and adjust the relevant parameters in said step 1 and step 2 according to the check result until the output DBN model reaches a preset convergence condition or termination condition, wherein the judgment accuracy is obtained by testing the output DBN model with the detection data;
the detection module is configured to detect the online water army with the final DBN model, the final DBN model being the output DBN model that has reached the preset convergence condition or termination condition.
7. The device for detecting an online water army according to claim 6, characterized in that the initial value of a is 60.
8. The device for detecting an online water army according to claim 7, characterized in that the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by the DBN model training module meet the expected requirements.
9. The device for detecting an online water army according to claim 6, characterized in that the training process of the DBN model comprises a model pre-training process and a model fine-tuning process, the DBN model training module comprises a pre-training unit and a fine-tuning unit, the pre-training unit is configured to perform RBM training in parallel using the Downpour SGD algorithm, and the fine-tuning unit is configured to perform PSO-BP neural network training in parallel using the MapReduce algorithm.
10. The device for detecting an online water army according to claim 6, characterized in that the user description information comprises registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
CN201410027720.7A 2014-01-21 2014-01-21 Online water navy detection method and device Active CN103795592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410027720.7A CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410027720.7A CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Publications (2)

Publication Number Publication Date
CN103795592A true CN103795592A (en) 2014-05-14
CN103795592B CN103795592B (en) 2017-01-25

Family

ID=50670914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410027720.7A Active CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Country Status (1)

Country Link
CN (1) CN103795592B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090018897A1 (en) * 2007-07-13 2009-01-15 Breiter Hans C System and method for determining relative preferences for marketing, financial, internet, and other commercial applications
CN102571484A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for detecting and finding online water army
CN102629904A (en) * 2012-02-24 2012-08-08 安徽博约信息科技有限责任公司 Detection and determination method of network navy
CN103198161A (en) * 2013-04-28 2013-07-10 中国科学院计算技术研究所 Microblog ghostwriter identifying method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977397A (en) * 2017-09-08 2018-05-01 华瑞新智科技(北京)有限公司 Internet user's notice index calculation method and system based on deep learning
CN107862785A (en) * 2017-10-16 2018-03-30 深圳市中钞信达金融科技有限公司 Bill authentication method and device
CN108197696A (en) * 2018-01-31 2018-06-22 湖北工业大学 A kind of network navy account recognition methods and system
CN108449295A (en) * 2018-02-05 2018-08-24 西安电子科技大学昆山创新研究院 Combined modulation recognition methods based on RBM networks and BP neural network
CN110362818A (en) * 2019-06-06 2019-10-22 中国科学院信息工程研究所 Microblogging rumour detection method and system based on customer relationship structure feature
CN110457630A (en) * 2019-07-30 2019-11-15 北京航空航天大学 A kind of open source community thumbs up the recognition methods and system of user extremely
CN110457630B (en) * 2019-07-30 2022-03-29 北京航空航天大学 Method and system for identifying abnormal praise user in open source community

Also Published As

Publication number Publication date
CN103795592B (en) 2017-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant