CN103795592B - Online water navy detection method and device - Google Patents
- Publication number
- CN103795592B (CN201410027720.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to an online water navy detection method and device. The online water navy detection method comprises: a first step of expressing original user description information as a normalized user description vector, screening out classified data from the user description vectors, using a% of the classified data as training data for a DBN model, and using b% of the classified data as detection data for the DBN model; a second step of training the DBN model with the training data and outputting the trained DBN model; a third step of checking the convergence and the judgment accuracy of the output DBN model and adjusting the relevant parameters of the first and second steps according to the check result until the output DBN model satisfies a preset convergence condition or end condition; and a fourth step of detecting the online water navy with the final DBN model. The method and device not only improve the convergence and accuracy of the online water navy detection algorithm, but also shorten the model training time on massive sample data.
Description
Technical field
The present invention relates to the field of network technology, and more particularly to a method and device for detecting the network navy.
Background art
With the development and progress of information technology, cyberspace has become the fifth dimension of human activity, after land, sea, air, and outer space. Especially since Web 2.0 technology was applied to the Internet, social networking applications such as forums and microblogs have flourished. But while cyberspace has developed rapidly, the security problems it brings have become increasingly prominent, and those arising from the "network navy" are the most common. The "network navy" consists of network users hired by network public relations companies to post and reply on specific topics in order to build momentum. According to investigations by relevant organizations, millions of people in China are engaged in network marketing promotion, and the hired "network navy" is becoming increasingly large-scale, open, and profit-driven. From the "ban Wanglaoji" marketing scheme, to the Mengniu "frame-up gate" scandal, to the "cat abuse woman" incident on Mop, the "network navy" can be said to have touched the bottom line of the law again and again. Some "network navy" groups, operated by overseas organizations with ulterior motives, even post attacks, rumors, and inciting language on major domestic forums, manufacture conflict, carry out malicious cultural infiltration over the Internet, and endanger national security. Clearly, supervision of the "network navy" is urgently needed.
Unlike the physical environment, the virtual environment of online forums is inherently open and has its own rules of information dissemination. This poses a great challenge for supervising the "network navy", mainly in the following two respects:
First, popular information spreads explosively in online forums, so deleting posts after the fact cannot fully remedy the harm. Worse, the deletion itself can be exploited by the network navy, because to some extent it appears to "confirm" the authenticity of the message.
Second, online forums contain massive amounts of data. How to construct an effective algorithm that extracts usable information from a large volume of disordered data has become the greatest obstacle to supervising the "network navy".
Therefore, supervising the "network navy" requires not only improving the relevant laws and regulations and promptly publishing government affairs and developments in public incidents, but also, on the technical side, improving the capacity to process large-scale user data in light of the characteristics of online forums, and studying and improving algorithms suited to "network navy" detection, so that "network navy" users in a forum can be identified and network navy posts stopped at the source.
"Network navy" detection is essentially a classification problem. A common approach is to analyze the information and historical behavior of users whose class is already known, extract the features that distinguish network navy users from normal users, and then analyze the information of users whose class is unknown to judge which of them are most likely to be "network navy". Algorithms commonly used for classification at present include Bayesian networks, support vector machines (SVM), k-nearest neighbors (kNN), and neural networks. A Bayesian network classifies with probability and statistics and predicts the class of a sample through Bayes' theorem, but the validity of the approach itself presupposes a strong conditional independence assumption, which often fails in practice, so classification accuracy drops sharply. A support vector machine requires the space vector of each sample to be computed in advance and a weight to be set for the influence of each vector dimension on the final result; setting the weights depends largely on historical experience and case analysis, and the quality of the weights directly affects the judgment accuracy of the algorithm. kNN is a lazy learning method: it stores the samples and runs the learning algorithm only when classification is required, so a complex sample set can cause a very large computing cost and impair the real-time performance of classification. Neural networks are the algorithms most commonly used for classification. They determine the model parameters through training, can objectively reflect how strongly each influencing factor affects the final result, and are trained before classification, so training adds no time overhead to the classification process. However, the basic neural network model is complex: when the training set is large, training takes too long, and the network easily falls into a local optimum because of improperly set initial weights. Detecting the "network navy" with a basic neural network algorithm therefore suffers from poor convergence, low accuracy, and long running time.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for detecting the network navy that improve the convergence and accuracy of the network navy detection algorithm and shorten the model training time on massive sample data.
To solve the above technical problem, the present invention proposes a method for detecting the network navy, comprising:
Step one: express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data is user data that has already been labeled as network navy or not;
Step two: train the DBN model with the training data and output the trained DBN model, which is referred to as the output DBN model;
Step three: check the convergence and judgment accuracy of the output DBN model, and adjust the relevant parameters in step one and step two according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step four: detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
Further, the above method for detecting the network navy may also have the following feature: the initial value of a is 60.
Further, the above method for detecting the network navy may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step two meet the expected requirements.
Further, the above method for detecting the network navy may also have the following feature: in step two, the training process of the DBN model includes a model pre-training process and a model fine-tuning process; the model pre-training process performs parallel RBM training with the Downpour SGD algorithm, and the model fine-tuning process performs parallel PSO-BP neural network training with the MapReduce algorithm.
Further, the above method for detecting the network navy may also have the following feature: the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of fans, and number of followed users.
To solve the above technical problem, the present invention also proposes a device for detecting the network navy, comprising a user data preprocessing module, a DBN model training module, a cooperative module, and a detection module. The DBN model training module is connected to the user data preprocessing module, the cooperative module, and the detection module respectively, and the cooperative module is also connected to the user data preprocessing module, wherein:
The user data preprocessing module is configured to express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data is user data that has already been labeled as network navy or not;
The DBN model training module is configured to train the DBN model with the training data and output the trained DBN model, which is referred to as the output DBN model;
The cooperative module is configured to check the convergence and judgment accuracy of the output DBN model, and adjust the relevant parameters in step one and step two according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
The detection module is configured to detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
Further, the above device for detecting the network navy may also have the following feature: the initial value of a is 60.
Further, the above device for detecting the network navy may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step two meet the expected requirements.
Further, the above device for detecting the network navy may also have the following feature: the training process of the DBN model includes a model pre-training process and a model fine-tuning process, and the DBN model training module includes a pre-training unit and a fine-tuning unit; the pre-training unit is configured to perform parallel RBM training with the Downpour SGD algorithm, and the fine-tuning unit is configured to perform parallel PSO-BP neural network training with the MapReduce algorithm.
Further, the above device for detecting the network navy may also have the following feature: the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of fans, and number of followed users.
The method and device for detecting the network navy of the present invention improve the convergence and accuracy of the network navy detection algorithm and shorten the model training time on massive sample data, thereby solving the problem that model training on massive sample data takes too long.
Brief description of the drawings
Fig. 1 is a flow chart of the method for detecting the network navy in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the parallel processing of user description vectors in an embodiment of the present invention;
Fig. 3 is a flow chart for determining the value range of each dimension of the user description vectors based on the MapReduce algorithm in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the basic DBN model;
Fig. 5 is a schematic diagram of the Downpour SGD model;
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD;
Fig. 7 is a structural diagram of a single-layer BP neural network;
Fig. 8 is a flow chart of the single-particle PSO-BP neural network training algorithm;
Fig. 9 is a schematic diagram of the workflow-based multilayer cooperation mechanism;
Fig. 10 is a structural block diagram of the device for detecting the network navy in an embodiment of the present invention.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the present invention and are not intended to limit its scope.
Fig. 1 is a flow chart of the method for detecting the network navy in an embodiment of the present invention. As shown in Fig. 1, in this embodiment the method may include the following steps:
Step S101: express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a DBN (deep belief network) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data is user data that has already been labeled as network navy or not;
Step S102: train the DBN model with the training data and output the trained DBN model, which is referred to as the output DBN model;
The training process of the DBN model includes a model pre-training process and a model fine-tuning process. The model pre-training process performs parallel RBM training with the Downpour SGD algorithm, and the model fine-tuning process performs parallel PSO-BP neural network training with the MapReduce algorithm.
The Downpour SGD algorithm and the MapReduce algorithm are prior art and are not described in detail here.
Step S103: check the convergence and judgment accuracy of the output DBN model, and adjust the relevant parameters in step S101 and step S102 according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step S104: detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
The above steps are described in further detail below.
In step S101, user description information is converted into a mathematical form. Objectively, a forum user has a great deal of description information, such as registration time, the times of past logins, user name, password, login IP, browsing history, posting history, reply history, forum friend records, fan records, and followed-user records. The present invention selects the more representative items (those listed in Table 1) as the basis for classifying users, and accordingly proposes a multi-attribute description framework for user information, whose structure is shown in Table 1.
Table 1. Multi-attribute description framework for user information
Attribute name | Explanation | Calculation method |
registerperiod | Registration duration | Length of time since registering on the forum |
loginfrequency | Login frequency | Number of logins / registration duration |
onlineperiod | Online duration | Length of time online on the forum |
usernamelength | User name length | Length of the user name |
passwordlength | Password length | Length of the password |
postrate | Posting ratio | Number of posts / total number of posts |
replyrate | Reply ratio | Number of replies / total number of posts |
surfingfrequency | Relative post-browsing time | Time spent browsing posts / online duration |
editingfrequency | Relative posting time | Time spent posting / online duration |
fansnumber | Number of fans | Number of fans |
considernumber | Number of followed users | Number of followed users |
As Table 1 shows, in the embodiment of the present invention the user description information may include registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of fans, and number of followed users.
With the multi-attribute description framework of Table 1, user description information can be converted into a list in numerical form. For example, after abstraction through the framework, the description information of a user a can be expressed as shown in Table 2.
Table 2. Example of a user information attribute list
Attribute-name | Value |
registerperiod | 792 days |
loginfrequency | 100 times/792 days |
onlineperiod | 89 hours |
usernamelength | 6 |
passwordlength | 6 |
postrate | 20/20 |
replyrate | 0/20 |
surfingfrequency | 83 hours/89 hours |
editingfrequency | 6 hours/89 hours |
fansnumber | 20 |
considernumber | 3 |
With the model of Table 1, user description information can be represented quantitatively. For example, user a in Table 2 can be represented by the vector [792 days, 100 times/792 days, 89 hours/792 days, 6, 6, 20/20, 0, 83 hours/89 hours, 6 hours/89 hours, 20, 3], which is called a user description vector. In the same way, the description information of all users in the forum can be converted into user description vectors, which gives a mathematical representation of user information.
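The conversion described above can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the `RawUser` field names are hypothetical, the ratios follow the calculation methods of Table 1, and the total number of posts is assumed to be posts plus replies, which agrees with the 20/20 and 0/20 values of Table 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawUser:
    """Raw per-user counters; all field names are illustrative assumptions."""
    register_days: float    # registration duration, in days
    login_count: int        # number of logins
    online_hours: float     # total time online, in hours
    username: str
    password_length: int    # only the length is used, never the password
    posts: int              # posts started by the user
    replies: int            # replies written by the user
    browse_hours: float     # time spent browsing posts
    edit_hours: float       # time spent writing posts
    fans: int
    following: int


def ratio(numerator: float, denominator: float) -> float:
    """Divide safely; a zero denominator yields 0.0 (an assumption)."""
    return numerator / denominator if denominator else 0.0


def describe(u: RawUser) -> List[float]:
    """Build the 11-dimensional user description vector of Table 1."""
    total_posts = u.posts + u.replies  # assumed meaning of "total number of posts"
    return [
        u.register_days,                        # registerperiod
        ratio(u.login_count, u.register_days),  # loginfrequency
        u.online_hours,                         # onlineperiod
        float(len(u.username)),                 # usernamelength
        float(u.password_length),               # passwordlength
        ratio(u.posts, total_posts),            # postrate
        ratio(u.replies, total_posts),          # replyrate
        ratio(u.browse_hours, u.online_hours),  # surfingfrequency
        ratio(u.edit_hours, u.online_hours),    # editingfrequency
        float(u.fans),                          # fansnumber
        float(u.following),                     # considernumber
    ]


# User a from Table 2; the user name itself is made up, only its length matters.
user_a = RawUser(792, 100, 89, "abcdef", 6, 20, 0, 83, 6, 20, 3)
vector_a = describe(user_a)
```

The vector produced here is still unnormalized; several of its dimensions (for example the registration duration) lie outside [-1, 1].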
In addition, to make it easier to set the initial weights in the subsequent DBN model training, the value of every dimension of the user description vectors must lie within [-1, 1]. The present invention therefore normalizes each dimension of the user description vectors: it first extracts the value range of each dimension over all user description vectors in the forum, and then normalizes the dimensions whose value range exceeds [-1, 1].
Fig. 2 is a schematic diagram of the parallel processing of user description vectors in an embodiment of the present invention. As shown in Fig. 2, both the generation and the normalization of user description vectors can be computed with a parallel model. In the generation stage, all user description information can be randomly divided into m groups that are processed in parallel. Each group converts the description information of all users in the group into user description vectors, and the user description vectors of all groups are then assigned ID numbers in sequence, which yields a set of (user ID, user description vector) pairs.
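The grouping and ID assignment described above can be sketched as follows. The function names and the `to_vector` callback are hypothetical, and a thread pool merely stands in for whatever parallel workers a real deployment would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split_into_groups(items: Sequence, m: int, seed: int = 0) -> List[list]:
    """Randomly partition items into m groups of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::m] for i in range(m)]


def build_vectors_parallel(raw_users: Sequence,
                           to_vector: Callable[[object], List[float]],
                           m: int = 4) -> Dict[int, List[float]]:
    """Convert users group by group, then assign consecutive IDs."""
    groups = split_into_groups(raw_users, m)
    with ThreadPoolExecutor(max_workers=m) as pool:
        # Each worker converts every user of its own group.
        converted = list(pool.map(lambda g: [to_vector(u) for u in g], groups))
    # IDs are assigned in sequence over the concatenated group results.
    vectors = [v for group in converted for v in group]
    return dict(enumerate(vectors, start=1))


# Toy input: each raw user is a dict with two hypothetical counters.
raw = [{"posts": i, "replies": 10 - i} for i in range(10)]
pairs = build_vectors_parallel(raw, lambda u: [u["posts"] / 10.0], m=3)
```

Because the users are shuffled before grouping, the ID a given user receives depends on the random seed; only the set of vectors is stable.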
Fig. 3 is a flow chart for determining the value range of each dimension of the user description vectors based on the MapReduce algorithm in an embodiment of the present invention. During normalization, as shown in Fig. 3, the MapReduce algorithm is first used to find the value range of each dimension of the user description vectors and to determine which dimensions have values outside [-1, 1]. For each such dimension, the value with the largest absolute value is then found, and the dimension is normalized by that absolute value. Normalization can also be carried out in parallel by dividing all user description vectors into m groups.
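A minimal in-process sketch of this normalization is shown below. It imitates the map and reduce phases with Python's built-in `map` and `functools.reduce` instead of a real MapReduce cluster, and the function names are illustrative.

```python
from functools import reduce
from typing import List

Vector = List[float]


def map_abs(vector: Vector) -> Vector:
    """Map step: emit the absolute value of every dimension."""
    return [abs(x) for x in vector]


def reduce_max(left: Vector, right: Vector) -> Vector:
    """Reduce step: keep the per-dimension maximum of two partial results."""
    return [max(a, b) for a, b in zip(left, right)]


def normalize(vectors: List[Vector]) -> List[Vector]:
    """Divide by the largest absolute value, but only where values leave [-1, 1]."""
    max_abs = reduce(reduce_max, map(map_abs, vectors))
    scale = [m if m > 1.0 else 1.0 for m in max_abs]
    return [[x / s for x, s in zip(v, scale)] for v in vectors]


# Three toy description vectors with three dimensions each.
vectors = [[792.0, 0.126, 6.0], [30.0, 0.9, 12.0], [400.0, -0.5, 3.0]]
normalized = normalize(vectors)
```

In this example the second dimension already lies within [-1, 1] and is left unchanged, while the first and third are divided by 792 and 12 respectively.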
The above processing yields a set of normalized user description vectors. Some of these data are already classified, that is, some users have been labeled as "network navy" or not. Such data are called classified data, and their set is called the "classified data set". For the subsequent DBN model training, the "classified data set" must be divided into two parts: one part, the "training data set", is used to train the DBN model parameters, and the other part, the "test data set", is used to check the judgment accuracy of the resulting DBN model. As to how the samples are distributed between the two data sets, the DBN model must learn from enough samples to capture the rules hidden in them, so the "training data set" generally holds more samples; but too many samples in the "training data set" increase the amount of computation. To address this, the present invention first takes 60% of the samples in the "classified data set" as the "training data set", and then adjusts this ratio (that is, the share of the "classified data set" taken by the "training data set") according to whether the convergence and judgment accuracy of the resulting DBN model meet the expected requirements. How is the convergence of the DBN model judged?
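The split into a% training data and b% test data can be sketched as below. The shuffle before cutting is an assumption (the text does not say how samples are chosen), and `split_classified` is a hypothetical name.

```python
import random
from typing import Sequence, Tuple


def split_classified(data: Sequence, a: int = 60, seed: int = 0) -> Tuple[list, list]:
    """Return (training set, test set) holding a% and (100 - a)% of the data."""
    if not 50 < a < 100:
        # a must be greater than b, and a + b must equal 100.
        raise ValueError("a must satisfy 50 < a < 100")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # random selection is an assumption
    cut = len(shuffled) * a // 100
    return shuffled[:cut], shuffled[cut:]


# Toy classified data: (description vector, label) pairs.
labelled = [([i / 100.0], i % 2) for i in range(100)]
train, test = split_classified(labelled, a=60)
```

Raising or lowering `a` between training runs corresponds to the ratio adjustment described above.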
In step S102, the DBN (deep belief network) model is a kind of deep neural network: a probabilistic generative model composed of multiple layers of random variable nodes. Fig. 4 is a schematic diagram of the basic DBN model. As shown in Fig. 4, the basic DBN model consists of two layers of RBMs (restricted Boltzmann machines) and one layer of BP neural network (back-propagation neural network). The training process of the DBN model is divided into two processes: model pre-training and model fine-tuning. The model pre-training process performs parallel RBM training with the Downpour SGD algorithm, and the model fine-tuning process performs parallel PSO-BP neural network training with the MapReduce algorithm.
Referring to Fig. 4, the model pre-training process trains the two RBM layers of the model by greedy layer-wise unsupervised learning. First, the input data x (the visible layer v0) and the first hidden layer h0 are treated as one RBM, and training yields the parameters of this RBM: the weight matrix w0 connecting v0 and h0, and the biases a and b of the nodes of v0 and h0. The parameters of this RBM are then fixed, h0 is treated as the visible layer and h1 as the hidden layer, and the second RBM is trained to obtain its parameters. This completes the pre-training of the DBN model and determines the initial parameters of the two RBM layers. In this process each RBM layer is learned independently, which greatly simplifies the training of the model.
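The greedy layer-wise procedure can be sketched as follows. The text does not name the RBM learning rule, so this sketch assumes single-step contrastive divergence (CD-1) with sigmoid units and inputs in [0, 1]; the class and function names are illustrative.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Restricted Boltzmann machine with weights w, visible biases a, hidden biases b."""

    def __init__(self, n_visible: int, n_hidden: int, rng: random.Random):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible
        self.b = [0.0] * n_hidden

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.b[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.a[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_step(self, v0: List[float], lr: float) -> None:
        """One CD-1 update for a single sample (assumed learning rule)."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b[j] += lr * (h0[j] - h1[j])


def pretrain(data: List[List[float]], hidden_sizes: List[int],
             epochs: int = 5, lr: float = 0.1, seed: int = 0) -> List[RBM]:
    """Train one RBM per hidden layer, feeding each layer's output to the next."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v, lr)
        layers.append(rbm)
        # The trained RBM is now fixed; its hidden layer is the next visible layer.
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers


data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 5
rbm0, rbm1 = pretrain(data, hidden_sizes=[3, 2])
```

Each RBM is trained without reference to the layers above it, which is the independence between layers that the paragraph above describes.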
After pre-training, the whole network can be treated as a BP neural network with two layers of hidden nodes, in which the network parameters between the input layer and the first layer of hidden nodes, and between the two layers of hidden nodes, are already initialized. Only the network parameters between the second layer of hidden nodes and the output nodes need to be randomly initialized. The network can then be trained by error back-propagation in the normal manner of a BP neural network until the model reaches the convergence or end condition. This process is called the model fine-tuning process.
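How the pre-trained parameters might seed the BP network can be sketched as below. The dictionary layout of a layer and the function names are assumptions made for illustration; only the top layer is initialized randomly.

```python
import math
import random
from typing import Dict, List

Layer = Dict[str, list]  # {"w": weights[n_in][n_out], "b": biases[n_out]}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def build_bp_network(rbm_layers: List[Layer], n_outputs: int, seed: int = 0) -> List[Layer]:
    """Copy the pre-trained RBM parameters and add a randomly initialized output layer."""
    rng = random.Random(seed)
    network = [{"w": [row[:] for row in rbm["w"]], "b": rbm["b"][:]}
               for rbm in rbm_layers]
    n_top = len(network[-1]["b"])
    network.append({
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(n_outputs)] for _ in range(n_top)],
        "b": [0.0] * n_outputs,
    })
    return network


def forward(network: List[Layer], x: List[float]) -> List[float]:
    """Forward propagation through all layers with sigmoid activations."""
    activation = list(x)
    for layer in network:
        activation = [
            sigmoid(layer["b"][j] + sum(a * layer["w"][i][j] for i, a in enumerate(activation)))
            for j in range(len(layer["b"]))
        ]
    return activation


# Hypothetical pre-trained parameters: 2 inputs -> 3 hidden -> 2 hidden.
rbm0 = {"w": [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]], "b": [0.0, 0.1, -0.1]}
rbm1 = {"w": [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.3]], "b": [0.0, 0.0]}
network = build_bp_network([rbm0, rbm1], n_outputs=1)
score = forward(network, [0.5, -0.25])
```

The resulting network has four layers of nodes and three weight layers, of which the lower two come from pre-training.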
During DBN model pre-training, the two RBM layers are trained separately by greedy layer-wise unsupervised learning. Compared with the traditional multilayer feedback training model, this simplifies the training process and speeds up training to some extent. With a massive training data set, however, training even a single RBM layer still takes a long time. The present invention therefore parallelizes the training of a single RBM layer, which speeds up DBN model pre-training and shortens the time required for the pre-training stage.
The present invention parallelizes the RBM training process with the Downpour SGD algorithm. Fig. 5 is a schematic diagram of the Downpour SGD model. As shown in Fig. 5, the basic idea of the parallel RBM implementation based on Downpour SGD is as follows. The training data is divided into several subsets that are distributed over multiple worker servers, each worker server runs a copy of the RBM model, and a worker server only needs to communicate with the parameter server. Model parameters are updated by the parameter server, which stores the current state of all parameters of the model. In the training stage, each worker obtains the current parameters of the model from the parameter server, runs a mini-batch with these parameters, computes the update gradient, and pushes the result back to the parameter server. In a simple implementation of Downpour SGD, a worker can be set to fetch the updated parameters from the parameter server every n_fetch mini-batch operations and to push a gradient update to the parameter server every n_push mini-batch operations.
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD. In Fig. 6, η denotes the rate at which the parameters are updated along the gradient, and n_fetch and n_push denote the periods for synchronizing parameters from the parameter server and for uploading gradients to the parameter server, respectively.
In Downpour SGD the gradient updates of the parameters are carried out asynchronously, so even if one worker server goes down, the other worker servers are not affected. Although asynchronous updating can leave the parameters in different workers slightly different, in existing implementations the algorithm as a whole remains stable.
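The fetch and push cycle can be sketched with a small simulation. To stay short, a toy linear model with a squared-error gradient stands in for the RBM gradient, each mini-batch holds one sample, and a round-robin loop stands in for truly asynchronous workers; all class names are illustrative.

```python
from typing import List, Tuple

Sample = Tuple[float, float]


class ParameterServer:
    """Holds the current state of all model parameters."""

    def __init__(self, params: List[float], eta: float):
        self.params = list(params)
        self.eta = eta  # η: rate at which parameters move along the gradient

    def fetch(self) -> List[float]:
        return list(self.params)

    def push(self, gradient: List[float]) -> None:
        self.params = [p - self.eta * g for p, g in zip(self.params, gradient)]


class Worker:
    """One model replica that trains on its own shard of the data."""

    def __init__(self, server: ParameterServer, shard: List[Sample],
                 n_fetch: int, n_push: int):
        self.server, self.shard = server, shard
        self.n_fetch, self.n_push = n_fetch, n_push
        self.params = server.fetch()
        self.accrued = [0.0] * len(self.params)
        self.step = 0

    def gradient(self, sample: Sample) -> List[float]:
        # Toy model y = w * x + c with squared error (stand-in for the RBM gradient).
        x, y = sample
        w, c = self.params
        err = w * x + c - y
        return [err * x, err]

    def run_mini_batch(self) -> None:
        if self.step % self.n_fetch == 0:
            self.params = self.server.fetch()      # synchronize the replica
        grad = self.gradient(self.shard[self.step % len(self.shard)])
        self.accrued = [a + g for a, g in zip(self.accrued, grad)]
        # The replica also moves locally between two fetches.
        self.params = [p - self.server.eta * g for p, g in zip(self.params, grad)]
        self.step += 1
        if self.step % self.n_push == 0:
            self.server.push(self.accrued)         # upload accumulated gradient
            self.accrued = [0.0] * len(self.accrued)


# Noise-free samples of y = 2x + 1, split into three shards.
data = [(x / 10.0, 2.0 * x / 10.0 + 1.0) for x in range(-10, 11)]
server = ParameterServer([0.0, 0.0], eta=0.02)
workers = [Worker(server, data[i::3], n_fetch=3, n_push=2) for i in range(3)]
for _ in range(1500):
    for worker in workers:  # round-robin stands in for asynchronous execution
        worker.run_mini_batch()
w, c = server.params
```

Because each worker only talks to the parameter server, removing one worker from the list leaves the others able to continue, which mirrors the fault tolerance noted above.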
Once the parameters of the two RBM layers have been trained, the pre-training of the DBN model is complete. The DBN model can now be treated as a four-layer BP neural network in which the parameters between the lower three layers are already initialized. Next, the parameters between the top two layers are randomly initialized and the BP neural network is trained with the training data set, which is the fine-tuning process of the DBN model.
The model fine-tuning process performs parallel PSO-BP neural network training with the MapReduce algorithm. A BP neural network is a multilayer feedforward neural network trained by the error back-propagation algorithm. Fig. 7 is a structural diagram of a single-layer BP neural network. As shown in Fig. 7, training a BP neural network consists of two processes, forward propagation of information and back-propagation of error. When the result of forward propagation does not match the expected output, the difference between the output value and the expected value is computed and the connection weights are corrected by gradient descent. This process continues until the output error of the network falls to an acceptable level.
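A minimal sketch of this forward and backward cycle is given below for a network with one hidden layer. The network size, learning rate, and the logical-OR training data are assumptions chosen only to keep the example small.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: List[float], w1: List[List[float]], w2: List[float]) -> Tuple[List[float], float]:
    """Forward propagation; the last weight of every row is the bias."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x + [1.0], row))) for row in w1]
    output = sigmoid(sum(h * w for h, w in zip(hidden + [1.0], w2)))
    return hidden, output


def train_step(x: List[float], target: float, w1: List[List[float]],
               w2: List[float], lr: float) -> None:
    """One gradient-descent correction of the weights by error back-propagation."""
    hidden, output = forward(x, w1, w2)
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden deltas use the output weights before they are updated.
    delta_hidden = [delta_out * w2[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    for j, h in enumerate(hidden + [1.0]):
        w2[j] -= lr * delta_out * h
    for j, row in enumerate(w1):
        for i, xi in enumerate(x + [1.0]):
            row[i] -= lr * delta_hidden[j] * xi


def mean_error(data, w1, w2) -> float:
    return sum(0.5 * (forward(x, w1, w2)[1] - t) ** 2 for x, t in data) / len(data)


rng = random.Random(0)
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]  # 2 inputs + bias
w2 = [rng.uniform(-0.5, 0.5) for _ in range(4)]                      # 3 hidden + bias
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

initial_error = mean_error(data, w1, w2)
for _ in range(2000):
    for x, target in data:
        train_step(x, target, w1, w2, lr=0.5)
final_error = mean_error(data, w1, w2)
```

In practice the loop would stop once the error drops below a chosen threshold rather than after a fixed number of passes.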
Training a BP neural network essentially means finding the optimal combination of network weights through layer-by-layer iteration and back-propagation so as to minimize the difference between the network output and the expected output. During training, however, adjusting the network weights through error back-propagation is very slow. The PSO-BP neural network algorithm optimizes the error back-propagation process of the BP neural network: the process by which PSO (particle swarm optimization) iteratively searches for the optimal position in a multidimensional search space replaces the error back-propagation process, which speeds up the convergence of the BP neural network.
In the PSO-BP neural network algorithm, the vector formed by the network parameters is defined as the position vector of a particle in the swarm, and the error between the model output under a given parameter vector and the expected output is defined as the quality index of that position. Clearly, the smaller this index, the closer the parameters are to the optimum, that is, the better the particle position. When the algorithm starts, a certain number of particles are initialized, and each particle keeps a memory of its current position, its historical best position, its current velocity, and the historical best position of the swarm. In each generation, a particle adjusts its position and velocity using the current information and the remembered information, and updates its memory. The particles keep adjusting their positions in the multidimensional search space until the swarm reaches an equilibrium state. The best particle position obtained at that point represents the optimal neural network parameters obtained by training.
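The idea can be sketched as follows. The text does not spell out the update rule, so this sketch assumes the standard PSO velocity and position update with inertia weight `omega` and learning factors `c1` and `c2`; the tiny 2-2-1 network and the XOR data are illustrative choices.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    # Numerically safe form, since particle positions are unbounded.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(position: List[float], x: List[float]) -> float:
    """2-2-1 network whose 9 weights and biases are the particle position."""
    hidden = [sigmoid(position[3 * j] * x[0] + position[3 * j + 1] * x[1] + position[3 * j + 2])
              for j in range(2)]
    return sigmoid(position[6] * hidden[0] + position[7] * hidden[1] + position[8])


def network_error(position: List[float], data: List[Sample]) -> float:
    """Quality index of a position: mean squared output error (lower is better)."""
    return sum((predict(position, x) - t) ** 2 for x, t in data) / len(data)


def pso_train(data: List[Sample], n_particles: int = 10, n_iter: int = 100,
              omega: float = 0.7, c1: float = 1.5, c2: float = 1.5, seed: int = 0):
    rng = random.Random(seed)
    dim = 9
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    best_xs = [list(x) for x in xs]                      # historical best per particle
    best_errs = [network_error(x, data) for x in xs]
    g = min(range(n_particles), key=best_errs.__getitem__)
    g_x, g_err = list(best_xs[g]), best_errs[g]          # swarm best
    initial_err = g_err
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (omega * vs[i][d]
                            + c1 * r1 * (best_xs[i][d] - xs[i][d])
                            + c2 * r2 * (g_x[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            err = network_error(xs[i], data)
            if err < best_errs[i]:
                best_xs[i], best_errs[i] = list(xs[i]), err
            if err < g_err:
                g_x, g_err = list(xs[i]), err
    return g_x, g_err, initial_err


xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
best_position, best_error, initial_error = pso_train(xor)
```

No gradient is computed anywhere in this loop: the search over positions takes the place of error back-propagation, and the swarm best can only improve or stay the same from one generation to the next.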
Because the amount of sample data for PSO-BP neural network training is very large, the present invention uses the MapReduce algorithm to parallelize the PSO-BP neural network training process and thereby speed up convergence. The iterative process of each particle runs on one PSO-BP-worker, and the management server stores the global best position and the position quality index that corresponds to it. After each particle completes a round of iteration, it synchronizes the best position information with the management server, until the specified number of iterations or the convergence condition is reached.
The flow chart of the PSO-BP neural network training process executed by each particle is shown in
Fig. 8. In Fig. 8, n denotes the maximum number of iterations; x_i, x_l and x_g denote, respectively, the
current position vector of particle i, the historical best position vector of particle i, and the global
best position vector; max_i, max_l and max_g denote, respectively, the position quality measure of
particle i's current position, of particle i's historical best position, and of the global best
position; ω denotes the inertia weight of the PSO algorithm; and c1 and c2 denote the learning factors of
the PSO algorithm.
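The per-particle update can be illustrated with a minimal sketch. Fig. 8 is not reproduced here, so the code assumes the canonical PSO velocity and position update with inertia weight ω (`w`) and learning factors c1 and c2. The function name `pso_minimize`, the default parameter values, and the toy error function `sphere` are illustrative assumptions, not part of the patent; in the patent's setting the error function would be the BP network's output error for a given weight vector.

```python
import random


def pso_minimize(error_fn, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise error_fn over dim-dimensional vectors with canonical PSO."""
    rng = random.Random(seed)
    # Per-particle memory: current position, velocity, own best position and error.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_err = [error_fn(p) for p in pos]
    # Swarm-wide best position and its quality measure (lower is better).
    g = min(range(n_particles), key=lambda i: best_err[i])
    g_pos, g_err = best_pos[g][:], best_err[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = error_fn(pos[i])
            if err < best_err[i]:          # update the particle's own memory
                best_pos[i], best_err[i] = pos[i][:], err
                if err < g_err:            # update the swarm's memory
                    g_pos, g_err = pos[i][:], err
    return g_pos, g_err


def sphere(x):
    """Toy error function standing in for the network's output error."""
    return sum(v * v for v in x)


best, err = pso_minimize(sphere, dim=3)
```

The returned position plays the role of the trained parameter vector, and the returned error is the quality measure of that position.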
In the MapReduce-based PSO-BP neural network training process, the iterative process of each particle
runs on a separate pso-bp-worker, and each pso-bp-worker communicates with the management server to
maintain the global best position and the global best position quality measure. This approach scales
well: the swarm's search can readily be accelerated by adding more hosts, which in turn speeds up
convergence of the algorithm.
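The worker and management server interaction can be sketched as follows. This is a single-machine illustration under stated assumptions: threads stand in for the pso-bp-worker processes, a lock-protected object stands in for the management server, and no actual MapReduce framework is used. The names `ManagementServer`, `worker`, `run_swarm` and `sphere` are hypothetical.

```python
import random
import threading


class ManagementServer:
    """Holds the global best position and its quality measure (lower is better)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.best_pos = None
        self.best_err = float("inf")

    def sync(self, pos, err):
        """Offer a candidate and return the global best after merging it."""
        with self._lock:
            if err < self.best_err:
                self.best_pos, self.best_err = list(pos), err
            return list(self.best_pos), self.best_err


def worker(server, error_fn, dim, n_iters, seed, w=0.7, c1=1.5, c2=1.5):
    """Iterate one particle, synchronising with the server after every round."""
    rng = random.Random(seed)
    pos = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    vel = [0.0] * dim
    own_pos, own_err = pos[:], error_fn(pos)
    g_pos, _ = server.sync(own_pos, own_err)
    for _ in range(n_iters):
        for d in range(dim):
            vel[d] = (w * vel[d]
                      + c1 * rng.random() * (own_pos[d] - pos[d])
                      + c2 * rng.random() * (g_pos[d] - pos[d]))
            pos[d] += vel[d]
        err = error_fn(pos)
        if err < own_err:
            own_pos, own_err = pos[:], err
        g_pos, _ = server.sync(own_pos, own_err)


def run_swarm(error_fn, dim=3, n_workers=8, n_iters=50):
    """Start one worker per particle and return the server's final global best."""
    server = ManagementServer()
    threads = [threading.Thread(target=worker,
                                args=(server, error_fn, dim, n_iters, seed))
               for seed in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.best_pos, server.best_err


def sphere(x):
    """Toy error function standing in for the network's output error."""
    return sum(v * v for v in x)
```

Because the workers share only the global best, adding workers changes nothing in the worker code itself, which is the scalability property the paragraph above describes.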
During DBN model training, different parameter settings may affect subsequent outputs and, in turn,
the judgment accuracy of the final DBN model. For example, if the proportion of the training data set
chosen in the user data preprocessing module is too low, extraction of network navy user features
suffers, and the final DBN model has low judgment accuracy. If the maximum number of iterations in RBM
training is set too low, the RBM network is under-trained, so the initial weights of the subsequent
PSO-BP neural network are set improperly, which may cause the DBN model to fall into a local optimum and
fail to reach the expected judgment accuracy. If the number of particles in the swarm is set too small in
PSO-BP neural network training, the network converges slowly and may not converge within the specified
maximum number of iterations. If the maximum number of iterations in PSO-BP neural network training is
set too small, training may end early while the DBN model has not yet converged. Therefore, in step S103,
the relevant parameters in step S101 and step S102 need to be adjusted in reverse according to the
convergence of the DBN model and the preset judgment accuracy.
In step S103 of the present invention, based on the relationships among the above parameters and
borrowing the idea of workflow, a feedback process is defined from the finally obtained DBN model to the
user data preprocessing module and the DBN model training module. The relevant parameters in these two
modules are thereby adjusted in reverse according to the convergence and judgment accuracy of the DBN
model, improving the performance of the final DBN model.
A workflow is a class of business process that can be executed fully or partly automatically;
according to a set of process rules, documents, information or tasks are passed between different
executors and carried out. The Workflow Management Coalition (WfMC) defines four basic workflow models:
the sequence model, the parallel model, the selection model, and the loop model. This patent combines
the sequence model, the selection model and the loop model to define a workflow-based multi-layer
collaboration mechanism.
From the preceding description, the workflow contains three sequence models: the DBN model
pre-training stage is entered after the user data preprocessing module finishes; the DBN model
fine-tuning stage is entered after the DBN model pre-training stage; and the DBN model detection stage
is entered after the DBN model fine-tuning stage. The workflow also contains two decision models:
whether the PSO-BP model converges, and whether the DBN model reaches the judgment accuracy threshold.
In the first decision model, if the condition holds, the action to execute is "enter the DBN model
fine-tuning stage"; if it does not hold, the action is "increase the number of PSO iterations, increase
the number of particles in the PSO swarm, and enter the DBN model pre-training stage". In the second
decision model, if the condition holds, the flow ends; if it does not hold, the action is "increase the
number of RBM iterations, increase the training data set proportion in the data preprocessing module,
and enter the user data preprocessing module". The resulting workflow model is shown in Fig. 9, which is
a schematic diagram of the workflow-based multi-layer collaboration mechanism.
In the workflow determined above, the original user data passes through the user data preprocessing
module to produce a set of normalized user description vectors, the network weight parameters are
initialized by the DBN pre-training process, and the DBN fine-tuning stage is entered. If the PSO-BP
model fails to converge in the fine-tuning stage, the number of PSO iterations and the number of
particles in the PSO swarm are increased until the PSO-BP model converges, at which point the trained DBN
model is obtained. After the DBN model is tested with the detection data set, if its judgment accuracy
has not reached the expected threshold, the proportion of the classified data set used as the training
data set in the user data preprocessing module is increased, the number of RBM iterations in the DBN
pre-training stage is increased, and the DBN model is retrained, until the trained DBN model reaches the
expected judgment accuracy.
The network navy detection method of the present invention is a DBN layered collaboration method for
network navy detection. The method performs network navy identification with an improved DBN model
implemented in parallel, and defines a collaboration mechanism among the parts of the DBN model. It both
improves the convergence and accuracy of the network navy detection algorithm and shortens model
training time under massive sample data, solving the problem of excessively long model training time
under massive sample data.
The present invention also provides a network navy detection device for implementing the above
network navy detection method. The foregoing description of the network navy detection method of the
invention applies equally to the network navy detection device of the invention.
Fig. 10 is a structural block diagram of the network navy detection device in an embodiment of the
present invention. As shown in Fig. 10, in this embodiment the network navy detection device includes a
user data preprocessing module 100, a DBN model training module 200, a collaboration module 300 and a
detection module 400. The DBN model training module 200 is connected to the user data preprocessing
module 100, the collaboration module 300 and the detection module 400, and the collaboration module 300
is also connected to the user data preprocessing module 100. The user data preprocessing module 100 is
used to represent original user description information as normalized user description vectors, screen
classified data out of the user description vectors, use a% of the classified data as training data for
the deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN
model, where a is greater than b and the sum of a and b equals 100. The type of the user description
information is preselected by the user, and the classified data refers to user data that has been
labeled as being or not being a network navy. The DBN model training module 200 is used to train the
DBN model with the training data and output the DBN model obtained by training; this output DBN model is
referred to as the output DBN model. The collaboration module 300 is used to check the convergence and
judgment accuracy of the output DBN model and to adjust the relevant parameters in Step 1 and Step 2
according to the check result, until the output DBN model reaches a preset convergence condition or end
condition, where the judgment accuracy is obtained by testing the output DBN model with the detection
data. The detection module 400 is used to detect the network navy using the final DBN model, where the
final DBN model is the output DBN model that reaches the preset convergence condition or end condition.
In embodiments of the present invention, the training process of the DBN model includes a model
pre-training process and a model fine-tuning process, and the DBN model training module 200 may include
a pre-training unit and a fine-tuning unit. The pre-training unit is used to perform parallel RBM
training with the Downpour SGD algorithm, and the fine-tuning unit is used to perform parallel PSO-BP
neural network training with the MapReduce algorithm.
In embodiments of the present invention, the user description information may include registration
duration, login frequency, online duration, user name length, password length, posting ratio, reply
ratio, relative post browsing time, relative posting time, number of followers, and number of accounts
followed.
In embodiments of the present invention, the initial value of a may be set to 60. The value of a can
then be adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by
the DBN model training module meet the expected requirements.
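The preprocessing step can be sketched as below. The patent does not state which normalization is used, so min-max scaling to [0, 1] is an assumption; the function names `normalize` and `split_labeled`, the shuffling, and the seed are likewise illustrative. Each sample is assumed to be a (feature vector, label) pair.

```python
import random


def normalize(vectors):
    """Min-max scale every feature column to [0, 1] (assumed normalization)."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in vectors]


def split_labeled(samples, a=60, seed=0):
    """Split labeled samples into a% training data and b% = (100 - a)% detection data."""
    if not 50 < a < 100:
        # a must exceed b, and b must stay above zero for detection data to exist.
        raise ValueError("a must satisfy 50 < a < 100 so that a > b and a + b = 100")
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * a // 100
    return shuffled[:cut], shuffled[cut:]
```

Raising a, as the collaboration mechanism does when judgment accuracy falls short, simply moves the cut point so that more labeled samples go to training and fewer to detection.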
The network navy detection device of the present invention uses a DBN layered collaboration method
for network navy detection. The method performs network navy identification with an improved DBN model
implemented in parallel, and defines a collaboration mechanism among the parts of the DBN model. It both
improves the convergence and accuracy of the network navy detection algorithm and shortens model
training time under massive sample data, solving the problem of excessively long model training time
under massive sample data.
The foregoing describes only preferred embodiments of the present invention and is not intended to
limit it. Any modification, equivalent substitution or improvement made within the spirit and principles
of the present invention shall fall within its scope of protection.
Claims (10)
1. A network navy detection method, characterized by comprising:
Step 1: representing original user description information as normalized user description vectors,
screening classified data out of the user description vectors, using a% of the classified data as
training data for a deep belief network (DBN) model, and using b% of the classified data as detection
data for the DBN model, where a is greater than b and the sum of a and b equals 100, the type of the
user description information is preselected by the user, and the classified data refers to user data
that has been labeled as being or not being a network navy;
Step 2: training the DBN model with the training data and outputting the DBN model obtained by
training, the output DBN model being referred to as the output DBN model;
Step 3: checking the convergence and judgment accuracy of the output DBN model, and adjusting the
training data in Step 1 and Step 2 according to the check result until the output DBN model reaches a
preset convergence condition or end condition, wherein the judgment accuracy is obtained by testing the
output DBN model with the detection data;
Step 4: detecting the network navy using a final DBN model, the final DBN model being the output DBN
model that reaches the preset convergence condition or end condition.
2. The network navy detection method according to claim 1, characterized in that the initial value of
a is 60.
3. The network navy detection method according to claim 2, characterized in that the value of a is
adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in Step 2
meet the expected requirements.
4. The network navy detection method according to claim 1, characterized in that, in Step 2, the
training process of the DBN model includes a model pre-training process and a model fine-tuning process,
the model pre-training process performs parallel RBM training with the Downpour SGD algorithm, and the
model fine-tuning process performs parallel PSO-BP neural network training with the MapReduce algorithm.
5. The network navy detection method according to claim 1, characterized in that the user description
information includes registration duration, login frequency, online duration, user name length, password
length, posting ratio, reply ratio, relative post browsing time, relative posting time, number of
followers, and number of accounts followed.
6. A network navy detection device, characterized by comprising a user data preprocessing module, a
DBN model training module, a collaboration module and a detection module, the DBN model training module
being connected to the user data preprocessing module, the collaboration module and the detection
module, and the collaboration module also being connected to the user data preprocessing module,
wherein:
the user data preprocessing module is used to represent original user description information as
normalized user description vectors, screen classified data out of the user description vectors, use a%
of the classified data as training data for a deep belief network (DBN) model, and use b% of the
classified data as detection data for the DBN model, where a is greater than b and the sum of a and b
equals 100, the type of the user description information is preselected by the user, and the classified
data refers to user data that has been labeled as being or not being a network navy;
the DBN model training module is used to train the DBN model with the training data and output the
DBN model obtained by training, the output DBN model being referred to as the output DBN model;
the collaboration module is used to check the convergence and judgment accuracy of the output DBN
model, and to adjust the training data in the user data preprocessing module and the DBN model training
module according to the check result until the output DBN model reaches a preset convergence condition
or end condition, wherein the judgment accuracy is obtained by testing the output DBN model with the
detection data;
the detection module is used to detect the network navy using a final DBN model, the final DBN model
being the output DBN model that reaches the preset convergence condition or end condition.
7. The network navy detection device according to claim 6, characterized in that the initial value of
a is 60.
8. The network navy detection device according to claim 7, characterized in that the value of a is
adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by the DBN
model training module meet the expected requirements.
9. The network navy detection device according to claim 6, characterized in that the training process
of the DBN model includes a model pre-training process and a model fine-tuning process, the DBN model
training module includes a pre-training unit and a fine-tuning unit, the pre-training unit is used to
perform parallel RBM training with the Downpour SGD algorithm, and the fine-tuning unit is used to
perform parallel PSO-BP neural network training with the MapReduce algorithm.
10. The network navy detection device according to claim 6, characterized in that the user description
information includes registration duration, login frequency, online duration, user name length, password
length, posting ratio, reply ratio, relative post browsing time, relative posting time, number of
followers, and number of accounts followed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410027720.7A CN103795592B (en) | 2014-01-21 | 2014-01-21 | Online water navy detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103795592A CN103795592A (en) | 2014-05-14 |
CN103795592B true CN103795592B (en) | 2017-01-25 |
Family
ID=50670914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410027720.7A Active CN103795592B (en) | 2014-01-21 | 2014-01-21 | Online water navy detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103795592B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107977397A (en) * | 2017-09-08 | 2018-05-01 | 华瑞新智科技(北京)有限公司 | Internet user's notice index calculation method and system based on deep learning |
CN107862785A (en) * | 2017-10-16 | 2018-03-30 | 深圳市中钞信达金融科技有限公司 | Bill authentication method and device |
CN108197696A (en) * | 2018-01-31 | 2018-06-22 | 湖北工业大学 | A kind of network navy account recognition methods and system |
CN108449295A (en) * | 2018-02-05 | 2018-08-24 | 西安电子科技大学昆山创新研究院 | Combined modulation recognition methods based on RBM networks and BP neural network |
CN110362818A (en) * | 2019-06-06 | 2019-10-22 | 中国科学院信息工程研究所 | Microblogging rumour detection method and system based on customer relationship structure feature |
CN110457630B (en) * | 2019-07-30 | 2022-03-29 | 北京航空航天大学 | Method and system for identifying abnormal praise user in open source community |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102571484A (en) * | 2011-12-14 | 2012-07-11 | 上海交通大学 | Method for detecting and finding online water army |
CN102629904A (en) * | 2012-02-24 | 2012-08-08 | 安徽博约信息科技有限责任公司 | Detection and determination method of network navy |
CN103198161A (en) * | 2013-04-28 | 2013-07-10 | 中国科学院计算技术研究所 | Microblog ghostwriter identifying method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009011820A2 (en) * | 2007-07-13 | 2009-01-22 | Wahrheit, Llc | System and method for determining relative preferences for marketing, financial, internet, and other commercial applications |
Also Published As
Publication number | Publication date |
---|---|
CN103795592A (en) | 2014-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103795592B (en) | Online water navy detection method and device | |
CN105975573B (en) | A kind of file classification method based on KNN | |
CN106650789B (en) | Image description generation method based on depth LSTM network | |
CN102651088B (en) | Classification method for malicious code based on A_Kohonen neural network | |
CN103729678A (en) | Navy detection method and system based on improved DBN model | |
CN109034194B (en) | Transaction fraud behavior deep detection method based on feature differentiation | |
CN110880019B (en) | Method for adaptively training target domain classification model through unsupervised domain | |
CN105022754B (en) | Object classification method and device based on social network | |
CN103745002B (en) | Method and system for recognizing hidden paid posters on basis of fusion of behavior characteristic and content characteristic | |
CN103324939B (en) | Skewed popularity classification and parameter optimization method based on least square method supporting vector machine technology | |
CN114092832A (en) | High-resolution remote sensing image classification method based on parallel hybrid convolutional network | |
CN105844334B (en) | A kind of temperature interpolation method based on radial base neural net | |
CN103391317B (en) | A kind of system technology maturity appraisal procedure and device | |
CN107506350A (en) | A kind of method and apparatus of identification information | |
CN108491226A (en) | Spark based on cluster scaling configures parameter automated tuning method | |
CN110363230A (en) | Stacking integrated sewage handling failure diagnostic method based on weighting base classifier | |
CN106339718A (en) | Classification method based on neural network and classification device thereof | |
CN109242522A (en) | The foundation of target user's identification model, target user's recognition methods and device | |
Huang et al. | Research on urban modern architectural art based on artificial intelligence and GIS image recognition system | |
CN103440352A (en) | Method and device for analyzing correlation among objects based on deep learning | |
CN107402859A (en) | Software function verification system and verification method thereof | |
CN110909230A (en) | Network hotspot analysis method and system | |
CN110309907A (en) | It is a kind of based on go tracking self-encoding encoder dynamic missing values complementing method | |
CN106970981A (en) | A kind of method that Relation extraction model is built based on transfer matrix | |
AU2021102006A4 (en) | A system and method for identifying online rumors based on propagation influence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |