CN103795592B - Online water navy detection method and device - Google Patents

Online water navy detection method and device

Info

Publication number
CN103795592B
Authority
CN
China
Prior art keywords
training
model
dbn model
data
dbn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410027720.7A
Other languages
Chinese (zh)
Other versions
CN103795592A (en)
Inventor
孙卫强
牛温佳
赵卫中
管洋洋
黄超
李倩
胡玥
刘萍
郭莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201410027720.7A priority Critical patent/CN103795592B/en
Publication of CN103795592A publication Critical patent/CN103795592A/en
Application granted granted Critical
Publication of CN103795592B publication Critical patent/CN103795592B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to an online water navy detection method and device. The online water navy detection method includes: a first step of expressing original user description information as normalized user description vectors, screening out classified data from the user description vectors, using a% of the classified data as training data for a DBN model, and using b% of the classified data as detection data for the DBN model; a second step of training the DBN model with the training data and outputting the DBN model obtained through training; a third step of verifying the convergence and the judgment accuracy of the output DBN model and adjusting the related parameters of the first step and the second step according to the verification result until the output DBN model satisfies a preset convergence condition or an end condition; and a fourth step of detecting the online water navy with the final DBN model. The online water navy detection method and device not only improve the convergence and accuracy of the online water navy detection algorithm, but also shorten the model training time under massive sample data.

Description

Detection method and device for network navy
Technical field
The present invention relates to the field of network technology, and more particularly to a detection method and device for the network navy.
Background art
With the development and progress of information technology, cyberspace has become the fifth dimension of human activity, after land, sea, air, and outer space. In particular, since Web 2.0 technology was applied to the Internet, social networking applications such as forums and microblogs have flourished. However, as cyberspace develops rapidly, the security problems it brings are becoming increasingly prominent, and those arising from the "network navy" are the most common. The "network navy" refers to network users who are hired by online public relations companies to post and reply on specific topics in order to build momentum. According to investigations by relevant organizations, millions of people in China work as promoters in online marketing activities, and the hired "network navy" is becoming increasingly organized, large-scale, open, and driven by group interests. From the "marketing plan to shut down Wanglaoji" to the Mengniu "frame-up gate" scandal and the "cat abuse woman incident" on Mop, the "network navy" has repeatedly touched the bottom line of the law. Some "network navy" groups, operated by overseas organizations with ulterior motives, even post attacks, rumors, and incitement on major domestic forums, creating conflict, carrying out malicious Internet cultural infiltration, and endangering national security. Supervising the "network navy" is therefore urgent.
Unlike the physical environment, the virtual environment of a network forum is inherently open and has its own rules of information dissemination, which poses great challenges to supervising the "network navy". These challenges lie mainly in the following two aspects:
First, popular information in a network forum spreads explosively, so deleting posts after the fact cannot fully remedy the harm. Worse, the deletion itself can be exploited by the network navy and, to some extent, "confirm" the authenticity of the message content.
Second, a network forum contains massive amounts of data, and constructing effective algorithms that extract useful information from large amounts of disorderly data has become the biggest obstacle to supervising the "network navy".
Therefore, supervising the "network navy" requires more than improving the relevant network legislation and promptly publishing government affairs and the progress of public incidents at the legal and institutional level. It also requires, at the technical level and in light of the characteristics of network forums, improving the capacity to process large-scale user data and researching and improving the algorithms applied to "network navy" detection, so that "network navy" users in a forum can be identified and their posts stopped at the source.
"Network navy" detection is essentially a classification problem. A common approach is to analyze the related information and historical behavior of users whose class is known, extract the features that distinguish network navy users from normal users, and then analyze the information of users whose class is unknown to judge which of them are most likely to be "network navy". Algorithms commonly used for classification problems include Bayesian networks, support vector machines (SVM), k-nearest neighbors (kNN), and neural networks. A Bayesian network classifies with probability and statistics, predicting the class of a sample through Bayes' theorem; however, the Bayesian approach itself rests on a strong conditional independence assumption that often does not hold in practice, so its classification accuracy can drop considerably. An SVM requires the spatial vectors of the samples to be computed in advance and a weight to be set for the influence of each vector dimension on the final result; setting these weights depends largely on historical experience and case analysis, and the quality of the weights directly affects the judgment accuracy of the algorithm. kNN is a lazy learning method: it stores the samples and runs the learning algorithm only when classification is needed, so a complex sample set may incur a very large computational overhead and impair the real-time performance of classification. Neural networks are the algorithms most commonly used for classification problems. They determine model parameters through training, can objectively reflect how strongly each factor influences the final result, and are trained before classification, so training adds no time overhead to the classification process. However, a basic neural network model is complex: when the training set is large, training takes too long, and the network easily falls into a local optimum if its initial weights are set improperly. Detecting the "network navy" with a basic neural network algorithm therefore suffers from poor convergence, low accuracy, and long running time.
Summary of the invention
The technical problem to be solved by the present invention is to provide a detection method and device for the network navy that improve the convergence and accuracy of the network navy detection algorithm and shorten the model training time under massive sample data.
To solve the above technical problem, the present invention proposes a detection method for the network navy, comprising:
Step one: express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data are user data that have already been labeled as network navy or not;
Step two: train the DBN model with the training data and output the DBN model obtained through training; this output DBN model is referred to as the output DBN model;
Step three: check the convergence and the judgment accuracy of the output DBN model, and adjust the relevant parameters in step one and step two according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step four: detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
Further, the above detection method for the network navy may also have the following feature: the initial value of a is 60.
Further, the above detection method for the network navy may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step two meet the expected requirements.
Further, the above detection method for the network navy may also have the following feature: in step two, the training process of the DBN model includes a model pre-training process and a model fine-tuning process; the model pre-training process uses the Downpour SGD algorithm to perform parallel RBM training, and the model fine-tuning process uses the MapReduce algorithm to perform parallel PSO-BP neural network training.
Further, the above detection method for the network navy may also have the following feature: the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative browsing time, relative posting time, number of fans, and number of followed users.
To solve the above technical problem, the present invention also proposes a detection device for the network navy, including a user data preprocessing module, a DBN model training module, a cooperative module, and a detection module. The DBN model training module is connected to the user data preprocessing module, the cooperative module, and the detection module, and the cooperative module is also connected to the user data preprocessing module, wherein:
the user data preprocessing module is configured to express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data are user data that have already been labeled as network navy or not;
the DBN model training module is configured to train the DBN model with the training data and output the DBN model obtained through training; this output DBN model is referred to as the output DBN model;
the cooperative module is configured to check the convergence and the judgment accuracy of the output DBN model, and adjust the relevant parameters in step one and step two according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
the detection module is configured to detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
Further, the above detection device for the network navy may also have the following feature: the initial value of a is 60.
Further, the above detection device for the network navy may also have the following feature: the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step two meet the expected requirements.
Further, the above detection device for the network navy may also have the following feature: the training process of the DBN model includes a model pre-training process and a model fine-tuning process, and the DBN model training module includes a pre-training unit and a fine-tuning unit; the pre-training unit is configured to perform parallel RBM training with the Downpour SGD algorithm, and the fine-tuning unit is configured to perform parallel PSO-BP neural network training with the MapReduce algorithm.
Further, the above detection device for the network navy may also have the following feature: the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative browsing time, relative posting time, number of fans, and number of followed users.
The detection method and device for the network navy of the present invention both improve the convergence and accuracy of the network navy detection algorithm and shorten the model training time under massive sample data, solving the problem that model training takes too long under massive sample data.
Brief description of the drawings
Fig. 1 is a flow chart of the detection method for the network navy in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the parallel processing of user description vectors in an embodiment of the present invention;
Fig. 3 is a flow chart of determining the value range of each dimension of the user description vectors based on the MapReduce algorithm in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the basic DBN model;
Fig. 5 is a schematic diagram of the Downpour SGD model;
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD;
Fig. 7 is a structure diagram of a single-layer BP neural network;
Fig. 8 is a flow chart of the single-particle PSO-BP neural network training algorithm;
Fig. 9 is a schematic diagram of the workflow-based multilayer cooperative mechanism;
Fig. 10 is a structural block diagram of the detection device for the network navy in an embodiment of the present invention.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the present invention and are not intended to limit its scope.
Fig. 1 is a flow chart of the detection method for the network navy in an embodiment of the present invention. As shown in Fig. 1, in this embodiment the detection method for the network navy may include the following steps:
Step S101: express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a DBN (deep belief network) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100; the types of user description information are preselected by the user, and the classified data are user data that have already been labeled as network navy or not;
Step S102: train the DBN model with the training data and output the DBN model obtained through training; this output DBN model is referred to as the output DBN model;
The training process of the DBN model includes a model pre-training process and a model fine-tuning process. The model pre-training process uses the Downpour SGD algorithm to perform parallel RBM training, and the model fine-tuning process uses the MapReduce algorithm to perform parallel PSO-BP neural network training.
The Downpour SGD algorithm and the MapReduce algorithm are prior art and are not described in detail here.
Step S103: check the convergence and the judgment accuracy of the output DBN model, and adjust the relevant parameters in step S101 and step S102 according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step S104: detect the network navy with the final DBN model, where the final DBN model is the output DBN model that has reached the preset convergence condition or end condition.
The above steps are described in further detail below.
In step S101, user description information is converted into a mathematical form. Objectively, a network forum user has many pieces of description information, such as registration time, the times of past logins, user name, password, login IP, browsing history, posting history, reply history, forum friend records, fan records, and followed-user records. The present invention selects the more representative of these (listed in the first column of Table 1) as the basis for classifying users, and accordingly proposes a multi-attribute description framework for user information, whose structure is shown in Table 1.
Table 1 Multi-attribute description framework for user information
Attribute name | Description | Calculation method
registerperiod | Registration duration | Length of time since registering on the forum
loginfrequency | Login frequency | Number of logins / registration duration
onlineperiod | Online duration | Length of time online on the forum
usernamelength | User name length | Length of the user name
passwordlength | Password length | Length of the password
postrate | Posting ratio | Number of posts / total number of posts
replyrate | Reply ratio | Number of replies / total number of posts
surfingfrequency | Relative browsing time | Time spent browsing posts / online duration
editingfrequency | Relative posting time | Time spent posting / online duration
fansnumber | Number of fans | Number of fans
considernumber | Number of followed users | Number of followed users
As Table 1 shows, in the embodiment of the present invention the user description information may include registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative browsing time, relative posting time, number of fans, and number of followed users.
With the multi-attribute description framework of Table 1, user description information can be converted into a list in numerical form. For example, the description information of a user a, once abstracted through the framework, can be expressed as shown in Table 2.
Table 2 Example of a user information attribute list
Attribute name | Value
registerperiod | 792 days
loginfrequency | 100 times / 792 days
onlineperiod | 89 hours
usernamelength | 6
passwordlength | 6
postrate | 20/20
replyrate | 0/20
surfingfrequency | 83 hours / 89 hours
editingfrequency | 6 hours / 89 hours
fansnumber | 20
considernumber | 3
With the model of Table 1, user description information can be quantified. For example, user a in Table 2 can be represented by the vector [792 days, 100 times/792 days, 89 hours/792 days, 6, 6, 20/20, 0, 83 hours/89 hours, 6 hours/89 hours, 20, 3], which is called a user description vector. Likewise, the description information of all users in the forum can be converted into user description vectors, which gives a mathematical representation of user information.
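The conversion from Table 1 to a user description vector can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the `RawUser` record and its field names are hypothetical, the total number of posts is assumed to be posts plus replies, and onlineperiod is taken as the plain online duration listed in Tables 1 and 2.

```python
from dataclasses import dataclass


@dataclass
class RawUser:
    # Hypothetical raw record; the field names are illustrative only.
    register_days: float
    login_count: int
    online_hours: float
    username_length: int
    password_length: int
    posts: int
    replies: int
    browse_hours: float
    edit_hours: float
    fans: int
    following: int


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def describe(u):
    """Return the 11 attributes of Table 1, in table order, as plain numbers."""
    total_posts = u.posts + u.replies  # assumption: total = posts + replies
    return [
        float(u.register_days),                   # registerperiod
        ratio(u.login_count, u.register_days),    # loginfrequency
        float(u.online_hours),                    # onlineperiod
        float(u.username_length),                 # usernamelength
        float(u.password_length),                 # passwordlength
        ratio(u.posts, total_posts),              # postrate
        ratio(u.replies, total_posts),            # replyrate
        ratio(u.browse_hours, u.online_hours),    # surfingfrequency
        ratio(u.edit_hours, u.online_hours),      # editingfrequency
        float(u.fans),                            # fansnumber
        float(u.following),                       # considernumber
    ]


# User a from Table 2.
user_a = RawUser(792, 100, 89, 6, 6, 20, 0, 83, 6, 20, 3)
vec = describe(user_a)
```

The resulting vector is not yet normalized; dimensions such as registerperiod still exceed [-1, 1].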
In addition, to make it easier to set the initial weights in subsequent DBN model training, the value of every dimension of a user description vector must lie in [-1, 1]. The present invention therefore normalizes each dimension of the user description vectors: it first extracts the value range of each dimension over all user description vectors in the forum, and then normalizes the dimensions whose value range exceeds [-1, 1].
Fig. 2 is a schematic diagram of the parallel processing of user description vectors in an embodiment of the present invention. As shown in Fig. 2, both the generation and the normalization of user description vectors can be computed with a parallel model. In the generation stage, all user description information can be randomly divided into m groups that are processed in parallel. Each group converts the description information of all its users into user description vectors, and the user description vectors of all groups are then assigned IDs in sequence, which yields a set of (user ID, user description vector) pairs.
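The grouping and ID assignment described above might look like the following sketch. The function names are hypothetical, the groups are processed one after another in a single process rather than on separate workers, and the toy records stand in for real user description information.

```python
import random


def split_into_groups(records, m, seed=0):
    """Randomly divide the records into m groups of near-equal size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::m] for i in range(m)]


def vectorize_groups(groups, to_vector):
    """Convert every group to vectors, then assign sequential IDs across groups."""
    # Each inner list could be produced by a separate worker; here they run in turn.
    per_group = [[to_vector(r) for r in group] for group in groups]
    pairs = {}
    next_id = 0
    for vectors in per_group:
        for vector in vectors:
            pairs[next_id] = vector
            next_id += 1
    return pairs


# Toy records with only two attributes, for illustration.
records = [{"posts": p, "replies": 10 - p} for p in range(10)]
groups = split_into_groups(records, m=3)
pairs = vectorize_groups(groups, lambda r: [r["posts"] / 10, r["replies"] / 10])
```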
Fig. 3 is a flow chart of determining the value range of each dimension of the user description vectors based on the MapReduce algorithm in an embodiment of the present invention. As shown in Fig. 3, normalization first uses the MapReduce algorithm to find the value range of each dimension of the user description vectors and determine which dimensions have values outside [-1, 1]. For each such dimension, the maximum absolute value is found and the dimension is normalized by it. The normalization can also be carried out in parallel, again by dividing all user description vectors into m groups.
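A minimal sketch of this normalization, assuming a map phase that computes the per-dimension maximum absolute value inside each group and a reduce phase that merges those maxima. The helper names are hypothetical and everything runs in one process; a real deployment would hand the map calls to a MapReduce framework.

```python
from functools import reduce


def map_max_abs(group):
    """Map phase: per-dimension maximum absolute value within one group."""
    dims = len(group[0])
    return [max(abs(v[d]) for v in group) for d in range(dims)]


def reduce_max_abs(a, b):
    """Reduce phase: merge two per-dimension maxima element-wise."""
    return [max(x, y) for x, y in zip(a, b)]


def normalize(groups):
    """Scale only the dimensions whose range exceeds [-1, 1]."""
    scale = reduce(reduce_max_abs, (map_max_abs(g) for g in groups if g))
    divisors = [s if s > 1.0 else 1.0 for s in scale]
    return [[[x / d for x, d in zip(v, divisors)] for v in g] for g in groups]


groups = [
    [[792.0, 0.5, -3.0], [10.0, 0.2, 1.0]],
    [[396.0, -0.9, 6.0]],
]
normalized = normalize(groups)
```

Here the first and third dimensions are divided by 792 and 6 respectively, while the second dimension is left unchanged because it already lies in [-1, 1].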
The above processing yields a set of normalized user description vectors. Some of these data are already classified, that is, some users have been labeled as "network navy" or not. Such data are called classified data, and their set is called the "classified data set". To carry out subsequent DBN model training, the "classified data set" must be divided into two parts: one part, the "training data set", is used to train the DBN model parameters, and the other part, the "test data set", is used to test the judgment accuracy of the resulting DBN model. As for how samples are distributed between the two data sets, the DBN model must learn from enough samples to capture the rules hidden in them, so the "training data set" generally holds more samples; too many samples in the "training data set", however, increase the amount of computation. To address this, the present invention first takes 60% of the samples in the "classified data set" as the "training data set", and then adjusts this ratio (the proportion of the "classified data set" taken up by the "training data set") according to whether the convergence and judgment accuracy of the resulting DBN model meet the expected requirements. How is the convergence of the DBN model judged?
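The split of the classified data set can be illustrated with the short sketch below. The function name and the random shuffle are assumptions; the source only fixes the initial ratio a = 60 and the constraints a > b and a + b = 100. The upper bound a < 100 is added here so that the test data set is never empty.

```python
import random


def split_classified(samples, a=60, seed=0):
    """Split labeled samples into a% training data and (100 - a)% test data."""
    if not 50 < a < 100:
        # a > b with a + b = 100 implies a > 50; a < 100 is an added assumption.
        raise ValueError("a must be greater than 50 and less than 100")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * a // 100
    return shuffled[:cut], shuffled[cut:]


# Toy labeled samples: (normalized vector, is_network_navy).
labelled = [([i / 10.0], i % 2 == 0) for i in range(10)]
train, test = split_classified(labelled)
```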
In step S102, the DBN (deep belief network) model is a kind of deep neural network, a generative probabilistic model made up of multiple layers of random-variable nodes. Fig. 4 is a schematic diagram of the basic DBN model. As shown in Fig. 4, the basic DBN model consists of two layers of RBMs (restricted Boltzmann machines) and one layer of BP neural network (back propagation neural network). The training process of the DBN model is divided into two processes: model pre-training and model fine-tuning. The model pre-training process uses the Downpour SGD algorithm to perform parallel RBM training, and the model fine-tuning process uses the MapReduce algorithm to perform parallel PSO-BP neural network training.
Referring to Fig. 4, the model pre-training process trains the two RBM layers of the model by layer-wise unsupervised greedy learning. First, the input data x and the first hidden layer h0 are treated as one RBM, and training yields the parameters of this RBM (the weight matrix w0 connecting v0 and h0, and the biases a and b of the nodes of v0 and h0). The parameters of this RBM are then fixed, h0 is treated as the visible layer and h1 as the hidden layer, and the second RBM is trained to obtain its parameters. This completes the pre-training process of the DBN model and determines the initial parameters of the two RBM layers. Throughout this process the learning of each RBM layer is independent of the other, which greatly simplifies model training.
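The layer-wise greedy procedure can be sketched as follows. Several points are assumptions rather than statements from the source: the RBMs use binary units trained by one-step contrastive divergence (CD-1), which the patent does not name; the inputs are taken to lie in [0, 1]; and the layer sizes, learning rate, and epoch count are arbitrary. The sketch runs on a single machine without the Downpour SGD parallelization.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible-layer biases
        self.b = [0.0] * n_hidden   # hidden-layer biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update (assumed training rule, not named in the source)."""
        h0 = self.hidden_probs(v0)
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        self.a = [a + lr * (x0 - x1) for a, x0, x1 in zip(self.a, v0, v1)]
        self.b = [b + lr * (p0 - p1) for b, p0, p1 in zip(self.b, h0, h1)]


def pretrain(data, layer_sizes, epochs=5, seed=0):
    """Train one RBM per layer; each trained RBM is fixed and feeds the next."""
    rng = random.Random(seed)
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms


data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
rbms = pretrain(data, layer_sizes=[3, 2])
```

The two returned RBMs correspond to the (v0, h0) and (h0, h1) pairs of Fig. 4; their weights would serve as the initial parameters of the lower layers during fine-tuning.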
After pre-training, the whole network can be treated as equivalent to a BP neural network with two layers of hidden nodes. The network parameters between the input layer and the first layer of hidden nodes, and between the two layers of hidden nodes, are already initialized. Only the network parameters between the second layer of hidden nodes and the output nodes need to be randomly initialized, after which the network can be trained by error back propagation in the same way as an ordinary BP neural network until the model reaches the convergence or end condition. This process is called the model fine-tuning process.
During DBN model pre-training, the two RBM layers are trained separately by layer-wise unsupervised greedy learning. Compared with a traditional multilayer feedback training model, this simplifies model training and speeds it up to some extent. With a massive training data set, however, training a single RBM layer still takes a long time. The present invention therefore parallelizes the training of a single RBM layer, which speeds up DBN model pre-training and shortens the time the pre-training stage requires.
The present invention uses the Downpour SGD algorithm to parallelize the RBM training process. Fig. 5 is a schematic diagram of the Downpour SGD model. As shown in Fig. 5, the basic idea of the parallel RBM implementation based on Downpour SGD is as follows: the training data are divided into several subsets that are distributed over multiple worker servers, each worker server runs a copy of the RBM model, and a worker server only needs to communicate with the parameter server. Model parameters are updated by the parameter server that stores them, and this parameter server holds the current state of all model parameters. In the training stage, each worker obtains the current model parameters from the parameter server, executes a mini-batch with these parameters, computes the update gradient, and pushes the result back to the parameter server. In a simple implementation of Downpour SGD, a worker can be set to fetch the updated parameters from the parameter server once every n_fetch mini-batch operations and to push a gradient update to the parameter server once every n_push mini-batch operations.
Fig. 6 is a flow chart of the parallel RBM training algorithm based on Downpour SGD. In Fig. 6, η denotes the rate at which parameters are updated along the gradient, and n_fetch and n_push denote the periods for synchronizing parameters from the parameter server and for uploading gradients to the parameter server, respectively.
In Downpour SGD the gradient updates of the parameters are carried out asynchronously. In this mode, even if one worker server goes down, the work of the other worker servers is not affected. Although asynchronous updating can leave the parameters in different workers slightly out of step, existing implementations show that the algorithm as a whole still has good stability.
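The interplay of η, n_fetch, and n_push can be illustrated with a small simulation. This is only a sketch under stated assumptions: a one-parameter linear model y = w * x with squared error stands in for the RBM, the workers take turns in a single process instead of running asynchronously on separate servers, and the `ParameterServer` and `Worker` classes are hypothetical names.

```python
def gradient(params, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    w = params[0]
    return [sum(2 * (w * x - y) * x for x, y in batch) / len(batch)]


class ParameterServer:
    def __init__(self, params, eta):
        self.params = list(params)
        self.eta = eta  # update rate along the gradient

    def fetch(self):
        return list(self.params)

    def push(self, grad):
        self.params = [p - self.eta * g for p, g in zip(self.params, grad)]


class Worker:
    def __init__(self, server, batches, n_fetch, n_push):
        self.server, self.batches = server, batches
        self.n_fetch, self.n_push = n_fetch, n_push
        self.local = server.fetch()
        self.accum = [0.0] * len(self.local)
        self.step = 0

    def run_step(self):
        if self.step % self.n_fetch == 0:
            self.local = self.server.fetch()       # refresh the local replica
        batch = self.batches[self.step % len(self.batches)]
        grad = gradient(self.local, batch)
        self.accum = [a + g for a, g in zip(self.accum, grad)]
        # Keep training locally between synchronizations.
        self.local = [p - self.server.eta * g for p, g in zip(self.local, grad)]
        self.step += 1
        if self.step % self.n_push == 0:
            self.server.push(self.accum)           # upload accumulated gradient
            self.accum = [0.0] * len(self.accum)


server = ParameterServer([0.0], eta=0.02)
workers = [
    Worker(server, [[(1.0, 3.0), (2.0, 6.0)]], n_fetch=2, n_push=2),
    Worker(server, [[(0.5, 1.5), (1.5, 4.5)]], n_fetch=2, n_push=2),
]
for _ in range(200):
    for worker in workers:
        worker.run_step()
learned_w = server.fetch()[0]
```

Both data subsets follow y = 3x, so the parameter held by the server should approach 3 even though each worker only sees its own subset and synchronizes every second mini-batch.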
Once the parameters of the two RBM layers have been trained, the pre-training process of the DBN model is complete. The DBN model can now be treated as equivalent to a four-layer BP neural network in which the parameters among the lower three layers are already initialized. Next, the parameters between the top two layers are randomly initialized and this BP neural network is trained with the training data set, which is the fine-tuning process of the DBN model.
The model fine-tuning process uses the MapReduce algorithm to perform parallel PSO-BP neural network training. A BP neural network is a multilayer feedforward neural network trained with the error back propagation algorithm. Fig. 7 is a structure diagram of a single-layer BP neural network. As shown in Fig. 7, training a BP neural network consists of two processes, forward propagation of information and back propagation of error. When the forward propagation result does not match the expected output, the difference between the output value and the expected value is computed and the connection weights are corrected by gradient descent. This process continues until the network output error falls to an acceptable level.
Training a BP neural network essentially means finding the best combination of network weights through repeated iteration and back propagation so as to minimize the difference between the network output and the expected output. During training, however, adjusting the network weights by error back propagation is very slow. The PSO-BP neural network algorithm optimizes the error back propagation process of the BP neural network: PSO (particle swarm optimization) iteratively searches for the best position in a multidimensional search space, and this search replaces error back propagation, which speeds up the convergence of the BP neural network.
In the PSO-BP neural network algorithm, the vector formed by the network parameters is defined as the position vector of a particle in the swarm, and the error between the model output under a given parameter vector and the expected output is defined as the quality measure of that position. Clearly, the smaller this measure, the closer the parameters are to the optimum, that is, the better the particle position. When the algorithm starts, a certain number of particles are initialized. Each particle keeps a memory of its current position, its historical best position, its current velocity, and the historical best position of the swarm. In every generation, a particle adjusts its position and velocity using the current information and its memory, and then updates the memory. Particles keep adjusting their positions in the multidimensional search space until the swarm reaches equilibrium. The best particle position obtained at that point represents the optimal neural network parameters found by training.
Because the volume of PSO-BP training sample data is very large, the present invention uses the MapReduce algorithm to implement the PSO-BP neural network training process in parallel, thereby accelerating the convergence of the algorithm. The iterative process of each particle runs on one pso-bp-worker, and the management server stores the global best position and the quality measure corresponding to the global best position. Each time a particle completes a round of iteration, it synchronizes the best position information with the management server. This continues until the specified number of iterations is reached or the convergence condition is met.
The flow chart of the PSO-BP neural network training process executed by each particle is shown in Fig. 8. In Fig. 8, n denotes the maximum number of iterations; x_i, x_l, and x_g denote, respectively, the current position vector of particle i, the historical best position vector of particle i, and the global best position vector; max_i, max_l, and max_g denote, respectively, the quality measure of the current position of particle i, the quality measure of the historical best position of particle i, and the quality measure of the global best position; ω denotes the inertia weight of the PSO algorithm; and c1 and c2 denote the learning factors of the PSO algorithm.
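The particle update that these symbols refer to can be sketched as below. Fig. 8 is not reproduced here, so the velocity rule used is the standard PSO form, v = ω·v + c1·r1·(x_l − x_i) + c2·r2·(x_g − x_i), which is an assumption rather than a transcription of the figure. The function name `pso_minimize`, the default values of ω, c1, and c2, and the quadratic stand-in for the network error are likewise hypothetical.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iter=100,
                 omega=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for the position with the smallest quality measure.

    fitness plays the role of the error between model output and expected
    output; n_iter corresponds to the maximum number of iterations n.
    """
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    x_l = [p[:] for p in x]                 # historical best position per particle
    f_l = [fitness(p) for p in x]           # quality measure of each x_l
    g = min(range(n_particles), key=lambda i: f_l[i])
    x_g, f_g = x_l[g][:], f_l[g]            # global best position and its measure
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (omega * v[i][d]
                           + c1 * r1 * (x_l[i][d] - x[i][d])
                           + c2 * r2 * (x_g[d] - x[i][d]))
                x[i][d] += v[i][d]
            f = fitness(x[i])
            if f < f_l[i]:                  # update the particle's memory
                x_l[i], f_l[i] = x[i][:], f
                if f < f_g:                 # update the swarm's memory
                    x_g, f_g = x[i][:], f
    return x_g, f_g


def stand_in_error(position):
    """Hypothetical stand-in for the network error; minimum at 0.3 per dimension."""
    return sum((p - 0.3) ** 2 for p in position)


best_position, best_error = pso_minimize(stand_in_error, dim=3)
```

In a PSO-BP setting, `fitness` would evaluate the network on the training data with the weights given by the particle position; the quadratic function is used here only to keep the sketch self-contained.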
In the PSO-BP neural network training process based on the MapReduce model, the iterative process of each particle runs on a separate pso-bp-worker. Each pso-bp-worker communicates with the management server, which maintains the global best position and the quality measure of the global best position. This approach is highly scalable: the optimization process of the swarm can easily be accelerated by increasing the number of particles, thereby accelerating the convergence of the algorithm.
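The division of work between the pso-bp-workers and the management server can be illustrated with a round-synchronized sketch. It uses a local thread pool in place of a real MapReduce cluster, which is a deliberate simplification: `worker_step` stands for the per-particle map step, the selection of the minimum stands for the reduce step performed by the management server, and `network_error` is a hypothetical stand-in for the PSO-BP error. None of these names come from the patent.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def network_error(position):
    """Hypothetical stand-in for the PSO-BP network error."""
    return sum((p - 0.3) ** 2 for p in position)


def worker_step(particle, x_g, omega=0.7, c1=1.5, c2=1.5):
    """Map step: one worker advances its particle by one iteration."""
    rng = particle["rng"]
    for d in range(len(particle["x"])):
        r1, r2 = rng.random(), rng.random()
        particle["v"][d] = (omega * particle["v"][d]
                            + c1 * r1 * (particle["x_l"][d] - particle["x"][d])
                            + c2 * r2 * (x_g[d] - particle["x"][d]))
        particle["x"][d] += particle["v"][d]
    f = network_error(particle["x"])
    if f < particle["f_l"]:
        particle["x_l"], particle["f_l"] = particle["x"][:], f
    return particle


def run_rounds(n_particles=8, dim=3, n_rounds=60, tol=1e-8):
    particles = []
    for i in range(n_particles):
        rng = random.Random(i)              # one generator per particle
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        particles.append({"rng": rng, "x": x, "v": [0.0] * dim,
                          "x_l": x[:], "f_l": network_error(x)})
    best = min(particles, key=lambda p: p["f_l"])
    x_g, f_g = best["x_l"][:], best["f_l"]  # state held by the management server
    history = [f_g]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(n_rounds):
            # Map: every particle iterates once against the same global best.
            particles = list(pool.map(worker_step, particles,
                                      [x_g] * len(particles)))
            # Reduce: the server keeps the best reported position.
            best = min(particles, key=lambda p: p["f_l"])
            if best["f_l"] < f_g:
                x_g, f_g = best["x_l"][:], best["f_l"]
            history.append(f_g)
            if f_g < tol:                   # convergence condition
                break
    return x_g, f_g, history


x_g, f_g, history = run_rounds()
```

Because every particle only needs the current global best to take its next step, adding particles adds independent map tasks, which is the scalability property described above.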
During DBN model training, differences in parameter settings may affect subsequent outputs and, in turn, the judgment accuracy of the final DBN model. For example, if the proportion of the training data set chosen in the user data preprocessing module is too low, extraction of water navy user features is hindered, and the judgment accuracy of the final DBN model is low. If the maximum number of iterations in the RBM training process is set too low, the RBM network is insufficiently trained, so the initial weights of the subsequent PSO-BP neural network are set improperly; this may cause the DBN model to fall into a local optimum and fail to reach the expected judgment accuracy. If the number of particles in the swarm is set too small in the PSO-BP neural network training process, the network converges slowly and may fail to converge within the specified maximum number of iterations. If the maximum number of iterations in the PSO-BP neural network training process is set too small, training may end prematurely while the DBN model has not yet converged. Therefore, in step S103, the relevant parameters in steps S101 and S102 need to be adjusted in reverse according to the convergence of the DBN model and the preset judgment accuracy.
In step S103 of the present invention, based on the relationships among the above parameters and drawing on the idea of workflow, a feedback process is defined from the finally obtained DBN model to the user data preprocessing module and the DBN model training module. The relevant parameters in the user data preprocessing module and the DBN model training module are thus adjusted in reverse according to the convergence and judgment accuracy of the DBN model, improving the performance of the finally obtained DBN model.
A workflow is a class of business process that can be executed fully or partly automatically; according to a series of procedural rules, documents, information, or tasks are passed between different executors and executed. The WfMC (Workflow Management Coalition) defines four basic workflow models: the sequential model, the parallel model, the selection model, and the loop model. This patent combines the sequential model, the selection model, and the loop model to define a workflow-based multilayer cooperation mechanism.
From the preceding description, the workflow includes three sequential models: the DBN model pre-training stage is entered after the user data preprocessing module completes; the DBN model fine-tuning stage is entered after the DBN model pre-training stage; and the DBN model detection stage is entered after the DBN model fine-tuning stage. The workflow also includes two judgment models: whether the PSO-BP model has converged, and whether the DBN model has reached the judgment accuracy threshold. In the first judgment model, if the condition holds, the action to execute is "enter the DBN model fine-tuning stage"; if the condition does not hold, the action to execute is "increase the number of PSO iterations, increase the number of particles in the PSO swarm, and enter the DBN model pre-training stage". In the second judgment model, if the condition holds, the flow ends; if the condition does not hold, the action to execute is "increase the number of RBM iterations, increase the proportion of the training data set in the data preprocessing module, and enter the user data preprocessing module". The resulting workflow model is shown in Fig. 9. Fig. 9 is a schematic diagram of the workflow-based multilayer cooperation mechanism.
In the workflow determined above, the original user data passes through the user data preprocessing module to generate a set of normalized user description vectors, the network weight parameters are initialized by the DBN pre-training process, and the DBN fine-tuning stage is then entered. If the PSO-BP model fails to converge in the fine-tuning stage, the number of PSO iterations and the number of particles in the PSO swarm are increased until the PSO-BP model converges, at which point the trained DBN model is obtained. After the DBN model is tested with the test data set, if the judgment accuracy of the DBN model is found not to reach the expected threshold, the proportion of the training data set in the classified data set is increased in the user data preprocessing module, the number of RBM iterations in the DBN pre-training stage is increased, and the DBN model is retrained until the trained DBN model reaches the expected judgment accuracy.
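The feedback loop described above can be sketched as a small control function. This is one reading of the workflow under stated assumptions, not the patented implementation: the four callables (`preprocess`, `pretrain`, `finetune`, `evaluate`), the parameter names, the increments (doubling the iteration counts, adding 5 particles, raising the training proportion by 5), and the `max_rounds` safety limit are all hypothetical.

```python
def run_workflow(preprocess, pretrain, finetune, evaluate, params,
                 accuracy_threshold, max_rounds=20):
    """Drive preprocessing, pre-training, fine-tuning and detection with feedback.

    preprocess(train_ratio)            -> (training_data, test_data)
    pretrain(training_data, rbm_iters) -> initial_weights
    finetune(weights, training_data, pso_iters, pso_particles)
                                       -> (model, converged)
    evaluate(model, test_data)         -> judgment accuracy in [0, 1]
    """
    params = dict(params)                       # do not mutate the caller's dict
    training_data, test_data = preprocess(params["train_ratio"])
    for _ in range(max_rounds):
        weights = pretrain(training_data, params["rbm_iters"])
        model, converged = finetune(weights, training_data,
                                    params["pso_iters"], params["pso_particles"])
        if not converged:
            # First judgment: PSO-BP did not converge, so enlarge the search.
            params["pso_iters"] *= 2
            params["pso_particles"] += 5
            continue
        if evaluate(model, test_data) >= accuracy_threshold:
            return model, params                # second judgment holds: flow ends
        # Second judgment fails: more RBM iterations and more training data.
        params["rbm_iters"] *= 2
        params["train_ratio"] = min(params["train_ratio"] + 5, 95)
        training_data, test_data = preprocess(params["train_ratio"])
    return None, params                         # end condition reached without success
```

Returning the adjusted parameters alongside the model makes it possible to see which settings the feedback process settled on.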
The online water navy detection method of the present invention is a DBN layered cooperation method for water navy detection. The method identifies online water navy accounts with an improved DBN model implemented in parallel, and defines a cooperation mechanism among the parts of the DBN model. It improves the convergence and accuracy of the water navy detection algorithm and shortens the model training time on massive sample data, solving the problem that model training on massive sample data takes too long.
The present invention also provides an online water navy detection device for implementing the above online water navy detection method. The foregoing description of the online water navy detection method of the present invention applies equally to the online water navy detection device of the present invention.
Fig. 10 is a structural block diagram of the online water navy detection device in an embodiment of the present invention. As shown in Fig. 10, in this embodiment the online water navy detection device includes a user data preprocessing module 100, a DBN model training module 200, a cooperation module 300, and a detection module 400. The DBN model training module 200 is connected to the user data preprocessing module 100, the cooperation module 300, and the detection module 400, and the cooperation module 300 is also connected to the user data preprocessing module 100. The user data preprocessing module 100 is configured to express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100. The type of the user description information is preselected by the user, and the classified user data refers to user data that has been labeled as being or not being an online water navy account. The DBN model training module 200 is configured to train the DBN model with the training data and output the DBN model obtained by training; this output DBN model is referred to as the output DBN model. The cooperation module 300 is configured to check the convergence and judgment accuracy of the output DBN model and to adjust the relevant parameters in step one and step two according to the check result, until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data. The detection module 400 is configured to detect online water navy accounts using the final DBN model, where the final DBN model refers to the output DBN model that has reached the preset convergence condition or end condition.
In an embodiment of the present invention, the training process of the DBN model includes a model pre-training process and a model fine-tuning process, and the DBN model training module 200 may include a pre-training unit and a fine-tuning unit. The pre-training unit is configured to perform parallel RBM training using the Downpour SGD algorithm, and the fine-tuning unit is configured to perform parallel PSO-BP neural network training using the MapReduce algorithm.
In an embodiment of the present invention, the user description information may include registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
In an embodiment of the present invention, the initial value of a may be set to 60. The value of a may subsequently be adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by the DBN model training module meet the expected requirements.
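The preprocessing described for module 100, with a initialized to 60, can be sketched as follows. The normalization method is not specified in this passage, so min-max scaling is an assumption, as are the function names `min_max_normalize` and `split_classified` and the use of a seeded shuffle before splitting.

```python
import random


def min_max_normalize(vectors):
    """Scale every feature column into [0, 1] (assumed normalization)."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(vec, lows, highs)]
            for vec in vectors]


def split_classified(records, a=60, seed=0):
    """Use a% of the classified data for training and b% = 100 - a for detection.

    The constraint a > b with a + b = 100 means a must lie between 50 and 100.
    """
    if not 50 < a < 100:
        raise ValueError("a must satisfy a > b and a + b = 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * a // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical two-feature description vectors, e.g. login frequency and user name length.
normalized = min_max_normalize([[1, 10], [3, 20], [5, 30]])
training_data, detection_data = split_classified(list(range(10)), a=60)
```

Raising a later, as the embodiment allows, only requires calling `split_classified` again with the new value.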
The online water navy detection device of the present invention uses a DBN layered cooperation method for water navy detection. The method identifies online water navy accounts with an improved DBN model implemented in parallel, and defines a cooperation mechanism among the parts of the DBN model. It improves the convergence and accuracy of the online water navy detection algorithm and shortens the model training time on massive sample data, solving the problem that model training on massive sample data takes too long.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An online water navy detection method, characterized by comprising:
Step one: expressing original user description information as normalized user description vectors, screening out classified data from the user description vectors, using a% of the classified data as training data for a deep belief network (DBN) model, and using b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100, the type of the user description information is preselected by the user, and the classified user data refers to user data that has been labeled as being or not being an online water navy account;
Step two: training the DBN model with the training data and outputting the DBN model obtained by training, this output DBN model being referred to as the output DBN model;
Step three: checking the convergence and judgment accuracy of the output DBN model, and adjusting the training data in step one and step two according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
Step four: detecting online water navy accounts using the final DBN model, where the final DBN model refers to the output DBN model that has reached the preset convergence condition or end condition.
2. The online water navy detection method according to claim 1, characterized in that the initial value of a is 60.
3. The online water navy detection method according to claim 2, characterized in that the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained in step two meet the expected requirements.
4. The online water navy detection method according to claim 1, characterized in that, in step two, the training process of the DBN model includes a model pre-training process and a model fine-tuning process, the model pre-training process performs parallel RBM training using the Downpour SGD algorithm, and the model fine-tuning process performs parallel PSO-BP neural network training using the MapReduce algorithm.
5. The online water navy detection method according to claim 1, characterized in that the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
6. An online water navy detection device, characterized by comprising a user data preprocessing module, a DBN model training module, a cooperation module, and a detection module, the DBN model training module being connected to the user data preprocessing module, the cooperation module, and the detection module, and the cooperation module also being connected to the user data preprocessing module, wherein:
the user data preprocessing module is configured to express original user description information as normalized user description vectors, screen out classified data from the user description vectors, use a% of the classified data as training data for a deep belief network (DBN) model, and use b% of the classified data as detection data for the DBN model, where a is greater than b and the sum of a and b equals 100, the type of the user description information is preselected by the user, and the classified user data refers to user data that has been labeled as being or not being an online water navy account;
the DBN model training module is configured to train the DBN model with the training data and output the DBN model obtained by training, this output DBN model being referred to as the output DBN model;
the cooperation module is configured to check the convergence and judgment accuracy of the output DBN model, and to adjust the training data in the user data preprocessing module and the DBN model training module according to the check result until the output DBN model reaches a preset convergence condition or end condition, where the judgment accuracy is obtained by testing the output DBN model with the detection data;
the detection module is configured to detect online water navy accounts using the final DBN model, where the final DBN model refers to the output DBN model that has reached the preset convergence condition or end condition.
7. The online water navy detection device according to claim 6, characterized in that the initial value of a is 60.
8. The online water navy detection device according to claim 7, characterized in that the value of a is adjusted according to whether the convergence and judgment accuracy of the DBN model obtained by the DBN model training module meet the expected requirements.
9. The online water navy detection device according to claim 6, characterized in that the training process of the DBN model includes a model pre-training process and a model fine-tuning process, the DBN model training module includes a pre-training unit and a fine-tuning unit, the pre-training unit is configured to perform parallel RBM training using the Downpour SGD algorithm, and the fine-tuning unit is configured to perform parallel PSO-BP neural network training using the MapReduce algorithm.
10. The online water navy detection device according to claim 6, characterized in that the user description information includes registration duration, login frequency, online duration, user name length, password length, posting ratio, reply ratio, relative post-browsing time, relative posting time, number of followers, and number of accounts followed.
CN201410027720.7A 2014-01-21 2014-01-21 Online water navy detection method and device Active CN103795592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410027720.7A CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410027720.7A CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Publications (2)

Publication Number Publication Date
CN103795592A CN103795592A (en) 2014-05-14
CN103795592B true CN103795592B (en) 2017-01-25

Family

ID=50670914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410027720.7A Active CN103795592B (en) 2014-01-21 2014-01-21 Online water navy detection method and device

Country Status (1)

Country Link
CN (1) CN103795592B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571484A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for detecting and finding online water army
CN102629904A (en) * 2012-02-24 2012-08-08 安徽博约信息科技有限责任公司 Detection and determination method of network navy
CN103198161A (en) * 2013-04-28 2013-07-10 中国科学院计算技术研究所 Microblog ghostwriter identifying method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009011820A2 (en) * 2007-07-13 2009-01-22 Wahrheit, Llc System and method for determining relative preferences for marketing, financial, internet, and other commercial applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant