CN102982048B - Method and apparatus for evaluating spam mining rules - Google Patents
- Publication number
- CN102982048B (application CN201110264221.6A)
- Authority
- CN
- China
- Prior art keywords
- information
- mining rule
- rule
- sample
- evaluating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
An object of the invention is to provide a method and apparatus for evaluating rules used to mine junk information (spam). An evaluation device obtains a mining rule to be evaluated and a published-information sample for evaluating that rule. It then performs spam mining on the sample based on the rule and obtains at least one evaluation parameter corresponding to the rule. Compared with the prior art, the invention obtains at least one evaluation parameter for the rule under evaluation. This gives interactive-platform administrators an index for assessing the rule, so that the rule can be optimized and updated to improve each evaluation parameter. The interactive platform can then identify and handle spam more accurately, which keeps it working normally.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a technique for evaluating spam mining rules.
Background art
With the development and application of Internet technology, more and more users publish and receive large amounts of information through open interactive platforms. They make full use of the Internet to exchange information and share resources. However, this information contains a large amount of spam. Spam may be published in bulk or for illegal purposes; it consumes substantial network resources and can seriously compromise network security. Current open interactive platforms take some measures: they mine for spam in order to detect and handle it on the platform. However, platform administrators have no way of knowing whether the spam on an open interactive platform is being mined effectively. As a result, they cannot optimize the mining and detection methods accordingly. Nor can they achieve the goals of conserving network resources and keeping the open platform clean.
Therefore, how to evaluate spam mining rules effectively has become an urgent problem.
Summary of the invention
An object of the present invention is to provide a method and apparatus for evaluating spam mining rules.
According to one aspect of the invention, a method for evaluating a spam mining rule is provided, the method comprising the following steps:
a) obtaining a mining rule to be evaluated;
b) obtaining a published-information sample for evaluating the mining rule;
c) performing spam mining on the published-information sample based on the mining rule, to obtain the spam corresponding to the sample;
d) obtaining, from the spam in combination with the published-information sample, at least one evaluation parameter corresponding to the mining rule.
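Steps a) to d) can be sketched as a small orchestration function. This is a minimal Python sketch under stated assumptions. The function name `evaluate_mining_rule` and its four callable parameters are hypothetical stand-ins for the four steps. The patent does not prescribe any data format or API; here a post is assumed to be a `(text, is_spam)` pair.

```python
from typing import Any, Callable, Dict, List

def evaluate_mining_rule(
    get_rule: Callable[[], Any],                                   # step a)
    get_sample: Callable[[Any], List[Any]],                        # step b)
    mine: Callable[[Any, List[Any]], List[Any]],                   # step c)
    evaluate: Callable[[List[Any], List[Any]], Dict[str, float]],  # step d)
) -> Dict[str, float]:
    """Hypothetical orchestration of steps a) to d); each stage is injected."""
    rule = get_rule()               # a) obtain the mining rule to be evaluated
    sample = get_sample(rule)       # b) obtain a sample (may depend on the rule)
    mined = mine(rule, sample)      # c) mine spam from the sample with the rule
    return evaluate(mined, sample)  # d) derive evaluation parameters


# Illustrative usage: posts are (text, is_spam) pairs with pre-assigned labels.
sample = [("cheap slimming drugs", True), ("hello", False), ("buy slimming drugs", True)]
result = evaluate_mining_rule(
    get_rule=lambda: "slimming drugs",
    get_sample=lambda rule: sample,
    mine=lambda rule, posts: [p for p in posts if rule in p[0]],
    evaluate=lambda mined, posts: {
        "recall": sum(1 for p in mined if p[1]) / sum(1 for p in posts if p[1])
    },
)
print(result)
```

Because each stage is injected as a function, any of the concrete acquisition methods described in the embodiments can be swapped in without changing the pipeline itself.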
According to another aspect of the invention, a device for evaluating a spam mining rule is further provided, the device comprising:
a rule acquisition device, for obtaining a mining rule to be evaluated;
a sample acquisition device, for obtaining a published-information sample for evaluating the mining rule;
a spam acquisition device, for performing spam mining on the published-information sample based on the mining rule, to obtain the spam corresponding to the sample;
a parameter acquisition device, for obtaining, from the spam in combination with the published-information sample, at least one evaluation parameter corresponding to the mining rule.
Compared with the prior art, the invention obtains at least one evaluation parameter corresponding to the mining rule under evaluation. This gives interactive-platform administrators an index for evaluating the rule. The rule can then be optimized and updated to improve each evaluation parameter, so that the platform identifies and handles spam more accurately. This keeps the platform working normally, conserves network resources and keeps the open interactive platform clean.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a device for evaluating spam mining rules according to one aspect of the invention;
Fig. 2 is a schematic diagram of a device for evaluating spam mining rules according to a preferred embodiment of the invention;
Fig. 3 is a flowchart of a method for evaluating spam mining rules according to another aspect of the invention;
Fig. 4 is a flowchart of a method for evaluating spam mining rules according to a preferred embodiment of the invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Embodiment
The invention is described in further detail below with reference to the drawings.
Fig. 1 is a schematic diagram of a device for evaluating spam mining rules according to one aspect of the invention. The evaluation device 1 comprises a rule acquisition device 11, a sample acquisition device 12, a spam acquisition device 13 and a parameter acquisition device 14. The evaluation device 1 includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. A cloud consists of a large number of computers or network servers based on cloud computing. Cloud computing is a form of distributed computing in which a group of loosely coupled computers acts as one virtual supercomputer.
Specifically, the rule acquisition device 11 obtains the mining rule to be evaluated. It does so either periodically or in real time in response to an event trigger. In the event-triggered case, it monitors in real time for requests carrying a mining rule to be evaluated, sent by network devices such as network servers. In the periodic case, it reads the rule directly from another part of the evaluation device 1, or from a third-party device, over an agreed communication method such as HTTP or HTTPS. For example, assume the evaluation device 1 is a network server. Its rule acquisition device 11 monitors, in real time, another network server used for spam mining. That server sends an HTTP request encapsulating the mining rule to be evaluated, and the rule acquisition device 11 parses the request to extract the rule. As another example, the rule acquisition device 11 periodically calls a predetermined application programming interface (API) to request the mining rule from a third-party device, and receives the rule that device returns. Those skilled in the art will understand that these ways of obtaining the mining rule are only examples. Other existing or future ways of obtaining it, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
The sample acquisition device 12 obtains a published-information sample for evaluating the mining rule. Specifically, it either randomly extracts multiple posts from a network interactive platform, for example according to a pre-agreed communication protocol, or obtains multiple posts from a published-information sample library. Each post has been labeled in advance with a spam label that identifies whether it is real spam or normal information. Together, the posts form the sample for evaluating the mining rule obtained by the rule acquisition device 11. The published-information sample includes, but is not limited to:
1) multiple posts and their content, such as multiple threads in a web community and their content;
2) the spam labels.
The sample library stores posts and their spam labels. It includes, but is not limited to, a relational database, in-memory storage or hard-disk storage. For example, assume the posts on a network interactive platform are stored on a network server. The sample acquisition device 12 sends the server a request for a sample, over a pre-agreed protocol such as HTTP or HTTPS. It then receives multiple labeled posts that the server has drawn at random from the platform, and uses them as the sample for evaluating the rule obtained by the rule acquisition device 11. Network interactive platforms include, but are not limited to, web communities, post bars (tieba), blogs, microblogs, news comments and message boards. As another example, the sample acquisition device 12 draws real spam and non-spam from the sample library in a certain ratio and uses them as the sample. Those skilled in the art will understand that these ways of obtaining the sample are only examples. Other existing or future ways, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
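The second example, which draws real spam and non-spam from the sample library in a fixed ratio, can be sketched as below. This is an assumption-laden illustration. The name `draw_sample`, the `(content, is_spam)` record layout and the seeded random generator are hypothetical choices, not taken from the patent.

```python
import random
from typing import List, Tuple

# A library entry: (post content, spam label assigned in advance).
Entry = Tuple[str, bool]

def draw_sample(library: List[Entry], size: int, spam_ratio: float, seed: int = 0) -> List[Entry]:
    """Draw `size` labeled posts, about `spam_ratio` of which are real spam."""
    rng = random.Random(seed)  # a fixed seed makes an evaluation run repeatable
    spam = [e for e in library if e[1]]
    normal = [e for e in library if not e[1]]
    n_spam = min(round(size * spam_ratio), len(spam))
    n_normal = min(size - n_spam, len(normal))
    picked = rng.sample(spam, n_spam) + rng.sample(normal, n_normal)
    rng.shuffle(picked)  # avoid all spam appearing first
    return picked


library = [(f"spam {i}", True) for i in range(50)] + [(f"post {i}", False) for i in range(150)]
sample = draw_sample(library, size=20, spam_ratio=0.5)
print(len(sample), sum(label for _, label in sample))  # 20 10
```

If the library holds fewer posts of one class than the ratio asks for, the sample simply comes out smaller.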
Those skilled in the art will understand that the execution order of the rule acquisition device 11 and the sample acquisition device 12 described above is only an example. In practice they may be executed in any order, for example concurrently or sequentially. Fig. 1 shows the rule acquisition device 11 executing before the sample acquisition device 12 only for simplicity. This simplification clearly does not prevent a clear and sufficient disclosure of the invention.
Next, the spam acquisition device 13 performs spam mining on the published-information sample based on the mining rule, to obtain the spam corresponding to the sample. Specifically, it analyzes the posts in the sample obtained by the sample acquisition device 12 against the mining rules obtained by the rule acquisition device 11. Such rules include whether a publisher ID's posting frequency exceeds a predetermined frequency threshold, whether the publisher is on a blacklist, and whether the post content contains spam words. When one or more posts satisfy, for example, any one of the mining rules or all of them, those posts are judged to be spam. In this way all the spam in the sample is obtained.
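The distinction between satisfying "any one" rule and "all" rules can be sketched as follows. This is a minimal illustration under stated assumptions: the blacklist, the spam-word list, the dict-based post layout and the `mine_spam` function are all hypothetical.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]
Rule = Callable[[Post], bool]

BLACKLIST = {"u42"}              # hypothetical blacklist
SPAM_WORDS = ["slimming drugs"]  # hypothetical spam dictionary

def in_blacklist(post: Post) -> bool:
    return post["publisher_id"] in BLACKLIST

def has_spam_word(post: Post) -> bool:
    return any(word in post["content"] for word in SPAM_WORDS)

def mine_spam(sample: List[Post], rules: List[Rule], mode: str = "any") -> List[Post]:
    """Judge posts as spam if they satisfy any one rule ("any") or every rule ("all")."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    return [post for post in sample if combine(rule(post) for rule in rules)]


sample = [
    {"publisher_id": "u42", "content": "hello"},
    {"publisher_id": "u7", "content": "cheap slimming drugs"},
    {"publisher_id": "u42", "content": "slimming drugs here"},
    {"publisher_id": "u8", "content": "nice weather"},
]
rules = [in_blacklist, has_spam_word]
print(len(mine_spam(sample, rules, "any")), len(mine_spam(sample, rules, "all")))  # 3 1
```

"Any" mode catches more posts (higher recall), while "all" mode is stricter (typically higher precision). This is exactly the trade-off that the evaluation parameters described below are meant to expose.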
For example, assume the mining rule obtained by the rule acquisition device 11 is: a post is spam if its publisher ID is on the blacklist or its content contains a spam word. The sample obtained by the sample acquisition device 12 contains three posts, with the following content:
a "Certificates handling, call 13811112222",
b "Everybody is happy",
c "I wish to make friends";
Based on these two mining rules, the spam acquisition device 13 analyzes the three posts. String matching of post a's content against the spam dictionary finds the spam word "certificates handling". The publisher ID of post c is found to be on the blacklist. Posts a and c in the sample are therefore judged to be spam.
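The three-post example can be reproduced with a few lines of Python. This is an illustrative sketch only. The publisher IDs `user_a` to `user_c` and the dictionary and blacklist contents are invented so that the outcome matches the text.

```python
# Hypothetical data: the publisher IDs are invented; only post c's publisher is blacklisted.
SPAM_DICTIONARY = {"certificates handling"}
BLACKLIST = {"user_c"}

posts = {
    "a": ("user_a", "Certificates handling, call 13811112222"),
    "b": ("user_b", "Everybody is happy"),
    "c": ("user_c", "I wish to make friends"),
}

def is_spam(publisher_id: str, content: str) -> bool:
    """Rule under evaluation: blacklisted publisher OR content containing a spam word."""
    blacklisted = publisher_id in BLACKLIST
    contains_spam_word = any(word in content.lower() for word in SPAM_DICTIONARY)
    return blacklisted or contains_spam_word

mined = sorted(key for key, (pid, text) in posts.items() if is_spam(pid, text))
print(mined)  # ['a', 'c']
```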
As another example, assume the mining rule obtained by the rule acquisition device 11 is: a post is spam if one publisher ID publishes the same content at a frequency above a predetermined threshold and the post contains a spam word. The sample obtained by the sample acquisition device 12 contains 20 posts. Ten of them have the content "This store sells all kinds of slimming drugs at favorable prices", share the same publisher ID, and were sent within 1 minute. Based on these two mining rules, the spam acquisition device 13 analyzes the 10 posts. It determines that their content is identical and that they were published by the same publisher ID, so they are 10 consecutive publications of the same information. The publishing frequency is therefore 10 per minute, which exceeds the predetermined threshold of 5 per minute. The spam acquisition device 13 also matches the content against the spam dictionary and finds the spam words "sells" and "slimming drugs". It therefore judges these 10 posts in the sample to be spam. In the illustrated embodiments, spam words include, but are not limited to, banned words, infringing words, obscene words, politically sensitive words, inflammatory words and advertising words. The spam dictionary stores the spam words and includes, but is not limited to, a relational database, in-memory storage or hard-disk storage. Those skilled in the art will understand that these ways of obtaining spam are only examples. Other existing or future ways, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
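The AND rule in the second example requires two things: a posting frequency over identical content from one publisher, and a spam-word check. The sketch below is a Python illustration built on an assumption the patent does not specify. Frequency is estimated here as the number of identical posts divided by the elapsed minutes, with a minimum window of one minute. The tuple layout, timestamps and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (publisher_id, content, timestamp in seconds); all values are illustrative.
Post = Tuple[str, str, float]

FREQ_THRESHOLD_PER_MIN = 5                   # predetermined threshold from the example
SPAM_DICTIONARY = {"sells", "slimming drugs"}

def mine_repeated_spam(posts: List[Post]) -> List[Post]:
    """AND rule: the same publisher repeats the same content faster than the threshold
    AND the content contains a spam word."""
    groups: Dict[Tuple[str, str], List[Post]] = defaultdict(list)
    for post in posts:
        groups[(post[0], post[1])].append(post)
    mined: List[Post] = []
    for (_, content), group in groups.items():
        times = [t for _, _, t in group]
        # Assumption: rate = count / elapsed minutes, with at least a 1-minute window.
        minutes = max((max(times) - min(times)) / 60.0, 1.0)
        rate = len(group) / minutes
        contains_spam_word = any(word in content for word in SPAM_DICTIONARY)
        if rate > FREQ_THRESHOLD_PER_MIN and contains_spam_word:
            mined.extend(group)
    return mined


ad = "This store sells all kinds of slimming drugs at favorable prices"
sample = [("shop_01", ad, 6.0 * i) for i in range(10)]  # 10 identical posts within 54 s
sample += [(f"user_{i}", f"ordinary post {i}", 0.0) for i in range(10)]
print(len(mine_repeated_spam(sample)))  # 10
```

The one-minute floor stops a short burst, such as two posts one second apart, from being reported as a very large per-minute rate.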
Next, the parameter acquisition device 14 obtains, from the spam in combination with the published-information sample, at least one evaluation parameter corresponding to the mining rule. Specifically, it compares two things: the spam obtained through spam mining by the spam acquisition device 13, and the posts and spam labels in the sample obtained by the sample acquisition device 12. From this comparison it obtains the number of real spam items and the number of non-spam items among the mined spam. Using these counts together with the sample, it then obtains at least one evaluation parameter, such as the recall of the mining rule. The evaluation parameters include, but are not limited to:
1) the recall corresponding to the mining rule, calculated as recall = (number of real spam items obtained by spam mining) / (number of real spam items in the sample);
2) the precision (accuracy rate) corresponding to the mining rule, calculated as precision = (number of real spam items obtained by spam mining) / (number of items obtained by spam mining).
For example, assume the sample contains 500 posts, of which the spam labels show 100 to be real spam. The spam acquisition device 13 mines 80 items of spam from the sample. The parameter acquisition device 14 compares these 80 mined items with the real spam in the sample and finds that 40 of them are real spam. By the precision formula it obtains a precision of 50% (= 40/80), and by the recall formula a recall of 40% (= 40/100). Those skilled in the art will understand that these ways of obtaining evaluation parameters are only examples. Other existing or future ways, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
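The two formulas and the worked numbers above can be checked with a short Python function. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this sketch; the patent does not address that edge case.

```python
from typing import Tuple

def precision_recall(true_mined: int, total_mined: int, real_spam_in_sample: int) -> Tuple[float, float]:
    """precision = true_mined / total_mined; recall = true_mined / real_spam_in_sample.
    Returns 0.0 for a ratio whose denominator is zero (an assumed convention)."""
    precision = true_mined / total_mined if total_mined else 0.0
    recall = true_mined / real_spam_in_sample if real_spam_in_sample else 0.0
    return precision, recall


p, r = precision_recall(true_mined=40, total_mined=80, real_spam_in_sample=100)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=50% recall=40%
```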
Preferably, the rule acquisition device 11, the sample acquisition device 12, the spam acquisition device 13 and the parameter acquisition device 14 work continuously, in the following sequence:
1) the rule acquisition device 11 obtains the mining rule to be evaluated;
2) the sample acquisition device 12 obtains a published-information sample for evaluating the rule;
3) the spam acquisition device 13 performs spam mining on the sample based on the rule, to obtain the spam corresponding to the sample;
4) the parameter acquisition device 14 obtains, from the spam in combination with the sample, at least one evaluation parameter corresponding to the rule.
Those skilled in the art will understand that "continuously" means that each device keeps obtaining mining rules, samples, spam and evaluation parameters according to preset or real-time adjusted working requirements. This continues until, for example, the rule acquisition device 11 has obtained no mining rule to be evaluated for a long time.
It should be noted that all numerical values in the examples are illustrative. They are intended only to aid understanding of the invention and are not real data from practical applications. Unless otherwise stated, numerical values appearing elsewhere serve the same purpose and, for brevity, are not discussed again.
Preferably, the sample acquisition device 12 obtains a sample corresponding to the mining rule from the published-information sample library, according to that rule. Specifically, it performs a matching query in the sample library based on the mining rule obtained by the rule acquisition device 11. Whenever a mining rule matches the mining rule indicated for a post in the library, that post is obtained, and all posts obtained by the matching query form the sample. Alternatively, it queries the library for a certain number of spam items that these mining rules previously failed to mine, and uses them as the sample. For example, assume the mining rule obtained by the rule acquisition device 11 is that a post is spam if its publisher ID is on the blacklist. According to this rule, the sample acquisition device 12 randomly selects several publisher IDs from the blacklist and queries the sample library with these IDs to obtain a number of posts. Alternatively, it matches the publisher IDs of all posts in the library against the blacklist and finds, say, 200 publisher IDs that are on it. It then uses the posts from those 200 publishers as the sample. As another example, after the rule acquisition device 11 obtains the mining rules, the sample acquisition device 12 queries the sample library with the mining-rule ID identifying each rule. This returns the spam corresponding to that rule ID, together with whether the corresponding rule successfully mined each item. The device then extracts all spam that the corresponding rule failed to mine and uses a certain proportion of it (for example 50%) as the sample. Those skilled in the art will understand that these ways of obtaining the sample are only examples. Other existing or future ways, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
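Both rule-specific sampling strategies can be sketched in Python under an assumed record layout. Each library record is taken to hold a publisher ID, its spam label, and the set of rule IDs that successfully mined it earlier. The patent does not fix this layout, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Set

# Hypothetical library record: publisher ID, spam label, and the IDs of rules
# that successfully mined the post in earlier runs.
Record = Dict[str, object]

def sample_by_blacklist(library: List[Record], blacklist: Set[str]) -> List[Record]:
    """Posts whose publisher ID matches the blacklist (blacklist-rule example)."""
    return [rec for rec in library if rec["publisher_id"] in blacklist]

def sample_missed_spam(library: List[Record], rule_id: str,
                       proportion: float = 0.5, seed: int = 0) -> List[Record]:
    """A proportion of the real spam that rule `rule_id` failed to mine."""
    missed = [rec for rec in library
              if rec["is_spam"] and rule_id not in rec["mined_by"]]
    k = int(len(missed) * proportion)
    return random.Random(seed).sample(missed, k)


library = [
    {"publisher_id": f"u{i}", "is_spam": i % 2 == 0, "mined_by": {"R1"} if i < 4 else set()}
    for i in range(12)
]
print(len(sample_by_blacklist(library, {"u1", "u3"})))  # 2
print(len(sample_missed_spam(library, "R1")))           # 2
```

Sampling the spam a rule has already missed concentrates the evaluation on that rule's blind spots, which makes a low recall easy to see.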
Fig. 2 is a schematic diagram of a device for evaluating spam mining rules according to a preferred embodiment of the invention. Here the parameter acquisition device 14' further comprises a result acquisition unit 141' and a parameter acquisition unit 142'. Specifically, the result acquisition unit 141' compares the actual spam preset in the published-information sample with the mined spam, to obtain a comparative analysis result for the mined spam. The parameter acquisition unit 142' then obtains at least one of the evaluation parameters from that result. Devices 11' to 13' shown in Fig. 2 are identical to devices 11 to 13 described above with reference to Fig. 1. For brevity, that description is incorporated here by reference and not repeated.
More specifically, the result acquisition unit 141' performs an item-by-item comparison between two sets. The first is the actual spam preset in the sample obtained by the sample acquisition device 12'. The second is the spam that the spam acquisition device 13' obtained through mining based on the rule. The comparison yields a comparative analysis result for the mined spam, which includes, but is not limited to:
1) the number of real spam items among the mined spam;
2) the number of non-spam items among the mined spam;
3) keywords in the content of the non-spam items among the mined spam;
4) the credit rating of the publishers of the non-spam items among the mined spam;
5) the posting frequency of the publishers of the real spam.
For example, assume the sample obtained by the sample acquisition device 12' contains 20 posts, of which 10 are real spam. The spam acquisition device 13' mines 6 items of spam from the sample based on the mining rule. The result acquisition unit 141' compares these 6 items with the real spam in the sample and finds that 4 of them are real spam. It also finds that these 4 were all published by the same publisher ID, and that this publisher posts at 4 per minute.
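A minimal Python sketch of the item-by-item comparison is shown below. It covers result items 1) and 2), the number of real spam items in the sample, and a per-publisher count of the real spam that was caught. The per-publisher count is a simplified stand-in for item 5), since the example gives no timestamps. Post IDs, publisher IDs and the `compare` function are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

def compare(labels: Dict[str, bool], mined_ids: Set[str],
            publisher_of: Dict[str, str]) -> Dict[str, object]:
    """Item-by-item comparison of mined posts against the preset spam labels."""
    true_hits = [pid for pid in mined_ids if labels[pid]]
    false_hits = [pid for pid in mined_ids if not labels[pid]]
    return {
        "real_spam_in_mined": len(true_hits),   # result item 1)
        "non_spam_in_mined": len(false_hits),   # result item 2)
        "real_spam_in_sample": sum(labels.values()),
        # Simplified stand-in for item 5): which publishers produced the caught real spam.
        "real_spam_by_publisher": Counter(publisher_of[pid] for pid in true_hits),
    }


# Example from the text: 20 posts, 10 real spam, 6 mined, 4 of them real spam.
labels = {f"p{i}": i < 10 for i in range(20)}
mined_ids = {"p0", "p1", "p2", "p3", "p10", "p11"}
publisher_of = {pid: ("u7" if pid in {"p0", "p1", "p2", "p3"} else f"x_{pid}") for pid in labels}
result = compare(labels, mined_ids, publisher_of)
print(result["real_spam_in_mined"], result["non_spam_in_mined"])  # 4 2
```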
The parameter acquisition unit 142' then computes at least one evaluation parameter from the comparative analysis result obtained by the result acquisition unit 141'. One example is the precision corresponding to the mining rule obtained by the rule acquisition device 11. Continuing the example, the sample obtained by the sample acquisition device 12' contains 20 posts, of which 10 are real spam. The spam acquisition device 13' mines 6 items based on the rule, and the result acquisition unit 141' finds that 4 of them are real spam. The parameter acquisition unit 142' therefore computes a precision of 67% (= 4/6, rounded) and a recall of 40% (= 4/10).
Those skilled in the art will understand that these ways of obtaining the comparative analysis result and the evaluation parameters are only examples. Other existing or future ways, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
Preferably, the mining rule performs spam mining on the published-information sample based on at least one of the following:
- the publishing frequency of information;
- the content of published information;
- the historical behavior record of the publisher;
- the attributes of the publisher.
1) The publishing frequency includes, but is not limited to, the publishing frequency of one publisher, of posts with identical content, or of information from the same IP address. For example, the sample contains 10 posts. The spam acquisition device 13' analyzes them and determines that 6 were published by the same publisher ID within 1 minute. That publisher's posting frequency is therefore 6 per minute, which exceeds the predetermined threshold of 5 per minute, so these 6 posts can be judged to be spam.
2) The content of published information includes, but is not limited to, spam words contained in the content, or multiple posts having identical content. For example, the sample contains 3 posts, with the following content:
a "Certificates handling, call 13811112222",
b "Everybody is happy",
c "I wish to make friends";
The spam acquisition device 13' matches the content of the 3 posts against the spam dictionary and finds the spam word "certificates handling" in post a. It accordingly judges post a to be spam.
3) The historical behavior record of the publisher includes, but is not limited to, the content of the publisher's past posts, the times of those posts, and the publisher's historical online time. For example, the spam acquisition device 13' looks up the publisher ID of a post in the sample in a historical behavior database. Suppose it finds that this publisher's past posts were made between 1:00 AM and 6:00 AM and that their content contains spam words; it then judges the post to be spam. In the illustrated embodiments, the historical behavior database stores publishers' historical behavior records. It includes, but is not limited to, a relational database, in-memory storage or hard-disk storage.
4) attribute of described information publisher includes but is not limited to:Whether information publisher is in blacklist, information issue
The personal background information that person pre-enters.For example, whole issues that junk information acquisition device 13 ' is issued information in sample are believed
The information publisher ID of breath carries out matching inquiry in blacklist, obtains two information publishers released news in blacklist
In, then judge that this two release news as junk information.
Those skilled in the art will understand that the four criteria above can be used to perform spam mining on the published-information sample either individually or in combination. These spam mining rules are only examples. Other existing or future spam mining rules, if applicable to the present invention, also fall within its scope and are incorporated herein by reference.
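Of the four criteria, the history-based check in item 3) is the least obvious to express as code. The sketch below is one assumed reading of that example, not the patent's definition. A publisher is flagged when every past post falls in the 1:00 to 6:00 AM window (hours 1 through 5) and at least one past post contains a spam word. The dictionary contents, record layout, dates and function name are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

SPAM_DICTIONARY = {"certificates handling"}  # hypothetical dictionary contents

def history_flags_spam(history: List[Tuple[datetime, str]]) -> bool:
    """history: one publisher's past (post time, content) records.
    Assumed reading of the example: every past post falls in 1:00-6:00 AM
    and at least one past post contains a spam word."""
    if not history:
        return False
    night_only = all(1 <= posted.hour < 6 for posted, _ in history)
    spam_in_past = any(word in text for _, text in history for word in SPAM_DICTIONARY)
    return night_only and spam_in_past


records = [(datetime(2011, 9, 1, 2, 30), "certificates handling, call now"),
           (datetime(2011, 9, 2, 4, 10), "hello")]
print(history_flags_spam(records))                                          # True
print(history_flags_spam(records + [(datetime(2011, 9, 3, 14, 0), "hi")]))  # False
```

A check like this returns a plain boolean, so it can be combined with the frequency, content and blacklist criteria using the same any/all logic as the other rules.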
In a preferred embodiment (see Fig. 2), the evaluation device 1 further comprises a rule optimization device (not shown) that optimizes the mining rule according to the evaluation parameters. This embodiment is described in detail below with reference to Fig. 2:
1) the rule acquisition device 11' obtains the mining rule to be evaluated;
2) the sample acquisition device 12' obtains a published-information sample for evaluating the rule;
3) the spam acquisition device 13' performs spam mining on the sample based on the rule, to obtain the spam corresponding to the sample;
4) the result acquisition unit 141' in the parameter acquisition device 14' compares the actual spam preset in the sample with the mined spam, to obtain a comparative analysis result for the mined spam;
5) the parameter acquisition unit 142' in the parameter acquisition device 14' obtains at least one of the evaluation parameters from that result.
These processes are the same as those performed by devices 11' to 14' in the embodiment described above with reference to Fig. 2. For brevity, that description is incorporated here by reference and not repeated.
Specifically, the rule optimization device optimizes the mining rule according to the evaluation parameter obtained by the parameter acquisition unit 142', such as the accuracy rate corresponding to the mining rule. For example, when the accuracy rate is below a preset accuracy threshold, the mining rule is adjusted so that information published by publishers with a high credit rating is not mined for junk information, which improves the accuracy rate. Suppose the parameter acquisition unit 142' calculates an accuracy rate of 50% using the formula. The rule optimization device determines that 50% is below the preset accuracy threshold of 60%, and therefore adjusts the mining rule to exclude information published by high-credit-rating publishers from junk information mining, improving the accuracy rate.
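A minimal Python sketch of the accuracy-driven adjustment described above. The `MiningRule` class, its `exempt_credit_rating` field, and the numeric credit-rating scale (with 4 treated as "high") are hypothetical illustrations; the text specifies only that high-credit publishers are exempted when the accuracy rate falls below the threshold (60% in the example).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiningRule:
    # Hypothetical field: publishers whose credit rating is at or above this
    # value are exempt from junk information mining (None = no exemption).
    exempt_credit_rating: Optional[int] = None


def optimize_for_accuracy(rule: MiningRule, accuracy: float,
                          accuracy_threshold: float = 0.60,
                          high_credit_rating: int = 4) -> MiningRule:
    """Exempt high-credit publishers when accuracy is below the threshold."""
    if accuracy < accuracy_threshold:
        rule.exempt_credit_rating = high_credit_rating
    return rule


def should_mine(rule: MiningRule, publisher_credit_rating: int) -> bool:
    """Return False for posts by publishers the rule now exempts."""
    if rule.exempt_credit_rating is None:
        return True
    return publisher_credit_rating < rule.exempt_credit_rating


# Example values from the text: accuracy 50% against a 60% threshold.
rule = optimize_for_accuracy(MiningRule(), accuracy=0.50)
```

After the adjustment, `should_mine(rule, 5)` is `False`, so posts by a (hypothetical) rating-5 publisher are no longer counted as mined junk, which is how the exemption is expected to raise the accuracy rate.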
Those skilled in the art will understand that the above way of optimizing the mining rule is only an example; other existing or future ways of optimizing mining rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
Preferably, the rule optimization device may also optimize the mining rule according to the evaluation parameter in combination with the comparative analysis result. Specifically, the rule optimization device uses the evaluation parameter obtained by the parameter acquisition unit 142', such as the recall rate corresponding to the mining rule, together with the comparative analysis result obtained by the result acquisition unit 141'. For example, when the recall rate is below a preset recall threshold, the optimization includes, but is not limited to, lowering the publishing-frequency threshold or the accumulated spam-word count threshold of the mining rule used to mine the junk information shown in the comparative analysis result, in order to improve the recall rate. For example, suppose the recall rate obtained by the parameter acquisition unit 142' is 40%, below the preset recall threshold of 50%, and the comparative analysis result from the result acquisition unit 141' shows that the publishers of the junk information post at an average frequency of 4 times/minute; the rule optimization device then lowers the publishing-frequency threshold from 5 times/minute to 4 times/minute to improve the recall rate. As another example, suppose the recall rate obtained by the parameter acquisition unit 142' is below the preset recall threshold and the comparative analysis result shows that the junk information contains on average 2 spam words per post; the rule optimization device then lowers the accumulated spam-word count threshold for spam content from 3 per post to 2 per post to improve the recall rate. Those skilled in the art will understand that the above ways of optimizing the mining rule are only examples; other existing or future ways of optimizing mining rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
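The recall-driven adjustment can be sketched in Python as follows. `ThresholdRule` and its two fields are hypothetical names for the publishing-frequency and spam-word thresholds. The text gives the two adjustments as separate examples; this sketch applies both whenever recall is low, and rounding the average spam-word count down is an assumption.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ThresholdRule:
    frequency_threshold: float  # posts per minute that counts as spamming
    spam_word_threshold: int    # spam words per post that counts as spam


def optimize_for_recall(rule: ThresholdRule, recall: float, recall_threshold: float,
                        spammer_frequencies: List[float],
                        spam_word_counts: List[int]) -> ThresholdRule:
    """Lower thresholds toward averages observed in genuine junk information."""
    if recall >= recall_threshold:
        return rule  # recall is acceptable: leave the rule unchanged
    if spammer_frequencies:
        rule.frequency_threshold = min(rule.frequency_threshold,
                                       mean(spammer_frequencies))
    if spam_word_counts:
        rule.spam_word_threshold = min(rule.spam_word_threshold,
                                       max(1, math.floor(mean(spam_word_counts))))
    return rule


# Example values from the text: recall 40% < 50%, spammers average
# 4 posts/minute and 2 spam words per post.
rule = optimize_for_recall(ThresholdRule(frequency_threshold=5, spam_word_threshold=3),
                           recall=0.40, recall_threshold=0.50,
                           spammer_frequencies=[4, 4, 4], spam_word_counts=[2, 2, 2])
```

Using `min` means a threshold is only ever lowered by this step, never raised, matching the stated goal of improving recall.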
More preferably, the rule optimization device may also optimize the mining rule according to the evaluation parameter in combination with preset parameter priority information for the evaluation parameters. Specifically, the rule optimization device considers evaluation parameters such as the recall rate and the accuracy rate together with preset parameter priority information, for example that accuracy has a higher priority than recall, and selects a suitable way of optimizing the mining rule to improve the evaluation parameters. For example, suppose that, among the evaluation parameters obtained by the parameter acquisition unit 142', the accuracy rate is 50%, below the preset accuracy threshold of 60%, and the recall rate is 40%, below the preset recall threshold of 50%. Following the preset priority information that accuracy ranks above recall, the rule optimization device adjusts the mining rule so that information published by high-quality users is not mined, thereby improving the accuracy rate. Those skilled in the art will understand that the above way of optimizing the mining rule is only an example; other existing or future ways of optimizing mining rules, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
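A small Python sketch of priority-based selection: among the evaluation parameters that miss their thresholds, pick the one ranked highest in the preset priority list. The function name and the `ACTIONS` table mapping a parameter to an optimization are hypothetical illustrations drawn from the examples above.

```python
from typing import Dict, List, Optional


def choose_parameter_to_improve(values: Dict[str, float],
                                thresholds: Dict[str, float],
                                priority: List[str]) -> Optional[str]:
    """Return the highest-priority evaluation parameter below its threshold,
    or None when every parameter already meets its threshold."""
    below = [name for name in priority if values[name] < thresholds[name]]
    return below[0] if below else None


# Hypothetical mapping from the chosen parameter to an optimization action.
ACTIONS = {
    "accuracy": "exclude posts by high-quality users from mining",
    "recall": "lower frequency / spam-word thresholds",
}

# Example values from the text: both parameters are below threshold,
# and accuracy ranks above recall.
values = {"accuracy": 0.50, "recall": 0.40}
thresholds = {"accuracy": 0.60, "recall": 0.50}
choice = choose_parameter_to_improve(values, thresholds, ["accuracy", "recall"])
```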
More preferably, the evaluation device 1 further includes a priority update device (not shown), which updates the parameter priority information according to the evaluation parameters. Specifically, based on the evaluation parameters obtained by the parameter acquisition unit 142', when, for example, the recall rate is below the preset recall threshold while the accuracy rate is above the preset accuracy threshold, the priority update device updates the parameter priority so that recall ranks above accuracy. For example, if the recall rate obtained by the parameter acquisition unit 142' is below the preset recall threshold and the accuracy rate is above the preset accuracy threshold, the priority update device changes the preset priority information from "accuracy ranks above recall" to "recall ranks above accuracy". Those skilled in the art will understand that the above way of updating the parameter priority information is only an example; other existing or future ways of updating parameter priority information, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
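A possible Python sketch of the priority update. The text describes only the case where recall misses its threshold while accuracy meets it; this sketch generalizes that case (an assumption) by moving every parameter that misses its threshold ahead of those that meet it, keeping the existing order within each group.

```python
from typing import Dict, List


def update_priority(priority: List[str], values: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[str]:
    """Move parameters that miss their threshold ahead of those that meet it,
    keeping the existing relative order inside each group."""
    lagging = [p for p in priority if values[p] < thresholds[p]]
    satisfied = [p for p in priority if values[p] >= thresholds[p]]
    return lagging + satisfied


thresholds = {"accuracy": 0.60, "recall": 0.50}
# Recall below its threshold, accuracy above: recall moves to the front.
new_priority = update_priority(["accuracy", "recall"],
                               {"accuracy": 0.70, "recall": 0.40}, thresholds)
```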
Preferably, the device further includes an optimization control device (not shown), which terminates the optimization of the mining rule when the evaluation parameter reaches an evaluation parameter threshold. Specifically, the junk information acquisition device 13' performs junk information mining on the information publishing sample based on the mining rule and obtains the junk information corresponding to the sample; the result acquisition unit 141' of the parameter acquisition device 14' then compares the actual junk information preset in the sample with the mined junk information and obtains the comparative analysis result corresponding to the junk information; the parameter acquisition unit 142' then obtains at least one evaluation parameter from the comparative analysis result. The junk information acquisition device 13' and the parameter acquisition device 14' repeat this cycle with each mining rule as updated by the rule optimization device, while the optimization control device checks the evaluation parameters obtained in each cycle and terminates the optimization of the mining rule once they reach the evaluation parameter threshold. Here, the evaluation parameter threshold is a preset expected value of the evaluation parameter. For example, when the optimization control device detects that the accuracy rate exceeds a predetermined accuracy threshold and the recall rate exceeds a predetermined recall threshold, it stops optimizing the mining rule. Those skilled in the art will understand that the above way of terminating the optimization of the mining rule is only an example; other existing or future ways of terminating it, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
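The mine, evaluate, and optimize cycle with its stopping condition can be sketched as a generic Python loop. The `evaluate` and `optimize` callables stand in for devices 13'/14' and the rule optimization device; the `max_rounds` cap is an added safeguard that is not in the text.

```python
from typing import Callable, Dict, Tuple, TypeVar

R = TypeVar("R")
Params = Dict[str, float]


def optimize_until_thresholds(
    rule: R,
    evaluate: Callable[[R], Params],
    optimize: Callable[[R, Params], R],
    thresholds: Params,
    max_rounds: int = 20,
) -> Tuple[R, Params]:
    """Evaluate and optimize in a loop until every evaluation parameter
    exceeds its threshold (or max_rounds is exhausted)."""
    params = evaluate(rule)
    for _ in range(max_rounds):
        if all(params[name] > limit for name, limit in thresholds.items()):
            break  # expected evaluation parameters reached: stop optimizing
        rule = optimize(rule, params)
        params = evaluate(rule)
    return rule, params


# Toy usage: the "rule" is just a frequency threshold, and recall rises as
# the threshold falls. Both lambdas are stand-ins for real mining/evaluation.
final_rule, final_params = optimize_until_thresholds(
    rule=5,
    evaluate=lambda t: {"accuracy": 0.9, "recall": 1.0 - 0.1 * t},
    optimize=lambda t, p: t - 1,
    thresholds={"accuracy": 0.6, "recall": 0.55},
)
```

In the toy run, recall is 0.5 at threshold 5, so one optimization step lowers the threshold to 4, recall becomes 0.6, and the loop stops.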
Preferably, the evaluation parameters include at least one of the following:
- the recall rate corresponding to the mining rule;
- the accuracy rate corresponding to the mining rule.
Specifically, the evaluation parameters obtained by the parameter acquisition unit 142' include, but are not limited to, the recall rate and the accuracy rate corresponding to the mining rule. The recall rate is the ratio of the number of genuine junk information items obtained through junk information mining by the junk information acquisition device 13' to the number of actual junk information items in the information publishing sample. The accuracy rate (that is, precision) is the ratio of the number of genuine junk information items obtained through junk information mining by the junk information acquisition device 13' to the total number of junk information items it mined. Accuracy and recall may constrain each other: a high accuracy rate may come with a low recall rate, and a high recall rate with a low accuracy rate. A balance between the two is therefore needed so that junk information is mined in an optimal way. Those skilled in the art will understand that the above evaluation parameters are only examples; other existing or future evaluation parameters, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
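The two definitions translate directly into a Python sketch over sets of post IDs. The function name is hypothetical, and returning 0.0 when nothing was mined or the sample has no junk is an assumption to avoid division by zero.

```python
from typing import Dict, Iterable


def evaluation_parameters(mined_ids: Iterable, actual_spam_ids: Iterable) -> Dict[str, float]:
    """Accuracy rate and recall rate of a mining rule on a labeled sample."""
    mined = set(mined_ids)
    actual = set(actual_spam_ids)
    genuine = len(mined & actual)  # mined items that are genuine junk
    return {
        # accuracy rate (precision) = genuine mined / all mined
        "accuracy": genuine / len(mined) if mined else 0.0,
        # recall rate = genuine mined / all genuine junk in the sample
        "recall": genuine / len(actual) if actual else 0.0,
    }


# 80 mined items, 100 genuine junk items, 40 of them in common.
params = evaluation_parameters(range(80), range(40, 140))
```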
Fig. 3 shows a flowchart of a method for evaluating junk information mining rules according to one aspect of the invention. The evaluation device 1 includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, which is a form of distributed computing in which a group of loosely coupled computers acts as one super virtual computer.
Specifically, in step S1, the evaluation device 1 obtains the mining rule to be evaluated. More specifically, in step S1, the evaluation device 1 obtains the mining rule to be evaluated in real time, periodically, or in response to a triggering event: for example, by monitoring in real time requests carrying a mining rule to be evaluated that are sent by network devices such as network servers, or by periodically reading the mining rule to be evaluated directly from other parts of the evaluation device 1 or from a third-party device via an agreed communication method such as the HTTP or HTTPS protocol. For example, suppose the evaluation device 1 is a network server. In step S1, by monitoring in real time another network server used for junk information mining, it obtains an HTTP request that the other server has built from the mining rule to be evaluated and sent via the HTTP protocol; the network server then parses the HTTP request and extracts the mining rule it contains. As another example, in step S1, the evaluation device 1 periodically calls a predetermined application programming interface (API) to send a third-party device a request for the mining rule to be evaluated, and receives the mining rule returned by the third-party device. Those skilled in the art will understand that the above ways of obtaining the mining rule to be evaluated are only examples; other existing or future ways of obtaining it, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
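A minimal Python sketch of extracting a mining rule from an HTTP request, as in the network-server example. The text does not fix a payload format, so the JSON body shape `{"mining_rule": {"rule_id": ...}}` is a hypothetical assumption. A real server would normally receive the request through a web framework or Python's `http.server` module; the request is parsed by hand here only to keep the sketch self-contained.

```python
import json


def parse_rule_request(raw: bytes) -> dict:
    """Extract a mining rule from a raw HTTP POST request.

    Assumes (hypothetically) a JSON body of the form
    {"mining_rule": {"rule_id": ..., ...}}.
    """
    head, _, body = raw.partition(b"\r\n\r\n")  # headers end at a blank line
    request_line = head.split(b"\r\n", 1)[0].decode("ascii")
    method, _path, _version = request_line.split(" ", 2)
    if method != "POST":
        raise ValueError(f"expected POST, got {method}")
    rule = json.loads(body.decode("utf-8"))["mining_rule"]
    if "rule_id" not in rule:
        raise ValueError("mining rule is missing 'rule_id'")
    return rule
```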
In step S2, the evaluation device 1 obtains the information publishing sample used to evaluate the mining rule. Specifically, in step S2, the evaluation device 1 randomly extracts multiple pieces of published information from a network interaction platform according to a pre-agreed communication protocol, or obtains multiple pieces of published information from an information publishing sample library. Each piece has been marked in advance with a junk information label distinguishing it as junk information or normal information, and these pieces are used as the information publishing sample for evaluating the mining rule obtained by the evaluation device 1 in step S1. The junk information label identifies whether each piece of published information is genuine junk information. Here, the information publishing sample includes, but is not limited to: 1) multiple pieces of published information and their content, such as multiple posts and their content in a web community; and 2) their junk information labels. The information publishing sample library stores multiple pieces of published information and their junk information labels and includes, but is not limited to, a relational database, in-memory storage, or hard-disk storage. For example, suppose the published information of a network interaction platform is stored on a network server. In step S2, the evaluation device 1 sends the network server, using a pre-agreed protocol such as HTTP or HTTPS, a request for an information publishing sample for evaluating the mining rule, and receives multiple labeled pieces of published information that the network server has selected at random from the platform; these serve as the information publishing sample for evaluating the mining rule obtained in step S1. The network interaction platform includes, but is not limited to, web communities, Tieba forums, blogs, microblogs, news comments, and SMS interaction. As another example, in step S2, the evaluation device 1 obtains genuine junk information and non-junk information in a certain proportion from the information publishing sample library and uses them as the information publishing sample for evaluating the mining rule obtained in step S1. Those skilled in the art will understand that the above ways of obtaining the information publishing sample are only examples; other existing or future ways of obtaining it, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
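The proportional-sampling variant of step S2 can be sketched in Python. `LabeledPost` and `draw_sample` are hypothetical names; the `is_spam` field plays the role of the pre-assigned junk information label, and the library is assumed to be already loaded into memory.

```python
import random
from typing import List, NamedTuple, Optional


class LabeledPost(NamedTuple):
    post_id: int
    content: str
    is_spam: bool  # the pre-assigned junk information label


def draw_sample(library: List[LabeledPost], size: int, spam_ratio: float,
                seed: Optional[int] = None) -> List[LabeledPost]:
    """Draw genuine junk and non-junk posts in a fixed proportion."""
    rng = random.Random(seed)
    spam = [p for p in library if p.is_spam]
    normal = [p for p in library if not p.is_spam]
    n_spam = round(size * spam_ratio)
    # random.sample raises ValueError if either pool is too small.
    sample = rng.sample(spam, n_spam) + rng.sample(normal, size - n_spam)
    rng.shuffle(sample)  # avoid ordering the sample by label
    return sample
```

For instance, `draw_sample(library, 20, 0.25, seed=1)` returns 20 distinct posts, 5 of which are labeled as junk.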
Those skilled in the art will understand that the execution order of steps S1 and S2 described above is only an example; in practice they may be performed in any order, for example concurrently or sequentially. Those skilled in the art will also understand that Fig. 3 shows only one execution order of the evaluation device 1 for simplicity, and this omission clearly does not prevent a clear and sufficient disclosure of the present invention.
Then, in step S3, the evaluation device 1 performs junk information mining on the information publishing sample based on the mining rule and obtains the junk information corresponding to the sample. Specifically, in step S3, the evaluation device 1 applies the mining rule obtained in step S1, such as whether the publishing frequency of an information publisher ID exceeds a predetermined frequency threshold, whether the information publisher is on the blacklist, or whether the content of the published information contains spam words, to each piece of published information in the sample obtained in step S2. When one or more pieces of published information satisfy any one, or all, of the mining rules, they are judged to be junk information, and in this way all junk information in the information publishing sample is obtained.
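A compact Python sketch of step S3 as a rule engine: each mining rule is modeled (as an assumption) as a predicate on a post, and the `mode` switch selects between "satisfies any rule" and "satisfies all rules", the two combinations the text mentions.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]
Rule = Callable[[Post], bool]


def mine_junk(sample: List[Post], rules: List[Rule], mode: str = "any") -> List[Post]:
    """Return the posts judged to be junk information.

    mode="any": a post is junk if it satisfies at least one rule;
    mode="all": it must satisfy every rule.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    return [post for post in sample if combine(rule(post) for rule in rules)]
```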
For example, suppose that in step S1 the mining rule obtained by the evaluation device 1 is: if the information publisher ID is on the blacklist, or the published information contains spam words, the published information is junk information. In step S2, the information publishing sample obtained by the evaluation device 1 contains three pieces of published information, whose contents are:
a "Certificate handling, call 13811112222",
b "Everybody is happy",
c "I wish to make friends";
Then, in step S3, the evaluation device 1 analyzes the three pieces based on the two mining rules. String matching the content of a against the spam dictionary finds the spam word "certificate handling", and the information publisher ID of c is found on the blacklist, so a and c are judged to be junk information in the information publishing sample.
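The example can be reproduced with a short Python sketch. The publisher IDs (`user_a`, `user_b`, `user_c`), the one-entry spam dictionary, and the one-entry blacklist are hypothetical stand-ins; the text states only that a contains a spam word and that c's publisher is blacklisted.

```python
SPAM_WORDS = {"certificate handling"}  # hypothetical spam dictionary
BLACKLIST = {"user_c"}                 # hypothetical blacklisted publisher ID

posts = [
    {"id": "a", "publisher_id": "user_a", "content": "Certificate handling, call 13811112222"},
    {"id": "b", "publisher_id": "user_b", "content": "Everybody is happy"},
    {"id": "c", "publisher_id": "user_c", "content": "I wish to make friends"},
]


def is_junk(post: dict) -> bool:
    """Blacklisted publisher OR content contains a spam word."""
    in_blacklist = post["publisher_id"] in BLACKLIST
    text = post["content"].lower()  # case-insensitive substring match
    has_spam_word = any(word in text for word in SPAM_WORDS)
    return in_blacklist or has_spam_word


junk_ids = [p["id"] for p in posts if is_junk(p)]
```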
As another example, suppose that in step S1 the mining rule obtained by the evaluation device 1 is: if the frequency at which one information publisher ID publishes the same content exceeds a predetermined frequency threshold and the published information contains spam words, the published information is junk information. In step S2, the information publishing sample obtained by the evaluation device 1 contains 20 pieces of published information, 10 of which have the content "This store sells all kinds of slimming drugs at favorable prices", share the same information publisher ID, and were sent within 1 minute. In step S3, the evaluation device 1 analyzes these 10 pieces based on the two mining rules and determines that their content is identical and that they were published by the same information publisher ID. They are therefore 10 consecutive postings of the same information, at a frequency of 10 times/minute, which exceeds the predetermined frequency threshold of 5 times/minute. The evaluation device 1 also string-matches the content against the spam dictionary and finds the spam words "sells" and "slimming drugs", and therefore determines in step S3 that these 10 pieces of published information in the sample are junk information. Here, the spam words in the illustrated embodiment include, but are not limited to, banned words, infringing words, obscene words, politically sensitive or inflammatory words, and advertising words; the spam dictionary in the illustrated embodiment stores spam words and includes, but is not limited to, a relational database, in-memory storage, or hard-disk storage. Those skilled in the art will understand that the above ways of obtaining junk information are only examples; other existing or future ways of obtaining junk information, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
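A Python sketch of this combined frequency-and-content rule. Posts are grouped by (publisher ID, content); the frequency is the group size divided by the elapsed minutes, with any burst shorter than one minute counted as one minute (an assumption that reproduces the 10 times/minute figure). Timestamps in seconds and the filler posts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def mine_repeated_spam(posts: List[Dict], spam_words: Iterable[str],
                       frequency_threshold: float = 5.0) -> List[Dict]:
    """Junk if the same publisher repeats the same content faster than
    frequency_threshold posts/minute AND the content contains a spam word."""
    spam_words = list(spam_words)
    groups = defaultdict(list)
    for post in posts:
        groups[(post["publisher_id"], post["content"])].append(post)
    junk = []
    for (_publisher, content), group in groups.items():
        times = [p["timestamp"] for p in group]  # seconds
        minutes = max((max(times) - min(times)) / 60, 1.0)
        frequency = len(group) / minutes  # posts per minute
        has_spam_word = any(word in content for word in spam_words)
        if frequency > frequency_threshold and has_spam_word:
            junk.extend(group)
    return junk


ad = "This store sells all kinds of slimming drugs at favorable prices"
posts = [{"publisher_id": "u1", "content": ad, "timestamp": 5 * i} for i in range(10)]
posts += [{"publisher_id": f"u{i}", "content": f"post {i}", "timestamp": i}
          for i in range(2, 12)]
junk = mine_repeated_spam(posts, spam_words={"sells", "slimming drugs"})
```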
Then, in step S4, the evaluation device 1 obtains at least one evaluation parameter corresponding to the mining rule from the junk information, with reference to the information publishing sample. Specifically, in step S4, the evaluation device 1 compares the junk information obtained by junk information mining in step S3 against the pieces of published information and their junk information labels in the sample obtained in step S2, thereby obtaining the numbers of genuine junk information and non-junk information among the mined junk information. It then uses the number of pieces of published information in the sample to obtain at least one evaluation parameter, such as the recall rate of the mining rule. The evaluation parameters include, but are not limited to: 1) the recall rate corresponding to the mining rule, calculated as "recall rate = number of genuine junk information items obtained by junk information mining / number of genuine junk information items in the information publishing sample"; and 2) the accuracy rate corresponding to the mining rule, calculated as "accuracy rate = number of genuine junk information items obtained by junk information mining / number of junk information items obtained by junk information mining". For example, suppose the information publishing sample contains 500 pieces of published information, of which 100 are labeled as genuine junk information, and in step S3 the evaluation device 1 mines 80 pieces of junk information from the sample. In step S4, the evaluation device 1 compares the mined junk information with the genuine junk information in the sample and finds that 40 of the mined items are genuine junk information. Using the formulas above, it calculates an accuracy rate of 50% (= 40/80) and a recall rate of 40% (= 40/100). Those skilled in the art will understand that the above ways of obtaining evaluation parameters are only examples; other existing or future ways of obtaining them, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
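The two formulas and the worked numbers of this example can be written compactly as follows, using notation introduced here for illustration: N_mined is the number of items mined (80), N_actual the number of genuine junk items in the sample (100), and N_true the number of mined items that are genuine junk (40).

```latex
\text{accuracy} = \frac{N_{\text{true}}}{N_{\text{mined}}} = \frac{40}{80} = 50\%,
\qquad
\text{recall} = \frac{N_{\text{true}}}{N_{\text{actual}}} = \frac{40}{100} = 40\%
```

The total sample size (500) enters neither formula; it only determines how many items the rule had the chance to mine.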
Preferably, the evaluation device 1 operates continuously across steps S1, S2, S3, and S4. Specifically, in step S1, the evaluation device 1 obtains the mining rule to be evaluated; in step S2, it obtains the information publishing sample used to evaluate the mining rule; then, in step S3, it performs junk information mining on the sample based on the mining rule and obtains the corresponding junk information; then, in step S4, it obtains at least one evaluation parameter corresponding to the mining rule from the junk information with reference to the sample. Here, those skilled in the art will understand that "continuously" means that, in each step, the evaluation device 1 keeps acquiring mining rules to be evaluated, information publishing samples, junk information, and evaluation parameters according to preset or dynamically adjusted operating requirements, until it stops acquiring mining rules to be evaluated for an extended period.
It should be noted that all numerical values in the examples are illustrative, serve only to aid understanding of the present invention, and are not real data from a practical application. Unless otherwise indicated, numerical values appearing elsewhere serve the same purpose; for brevity, this is not repeated.
Preferably, in step S2, the evaluation device 1 obtains, according to the mining rule, the information publishing sample corresponding to the mining rule from the information publishing sample library. Specifically, in step S2, based on the mining rule obtained in step S1, the evaluation device 1 runs a matching query against the sample library; whenever a piece of published information in the library matches the mining rule, it is retrieved, and all published information retrieved in this way forms the information publishing sample. Alternatively, the evaluation device 1 queries the sample library to obtain a certain number of junk information items that were not successfully mined by the mining rule in the past, and uses them as the information publishing sample. For example, suppose the mining rule obtained in step S1 is: if the information publisher ID is on the blacklist, the published information is junk information. In step S2, according to this rule, the evaluation device 1 either randomly selects several information publisher IDs from the blacklist and uses them to query the sample library, obtaining several pieces of published information; or checks the information publisher IDs of all published information in the sample library against the blacklist, finds 200 information publisher IDs that are on the blacklist, and takes the pieces of published information corresponding to these 200 IDs as the information publishing sample. As another example, after the mining rule is obtained in step S1, in step S2 the evaluation device 1 uses the rule ID identifying the mining rule to query the sample library, obtaining the junk information corresponding to that rule ID and whether the rule successfully mined each item; it then extracts all junk information that the rule failed to mine and uses a certain proportion (for example, 50%) of it as the information publishing sample. Those skilled in the art will understand that the above ways of obtaining the information publishing sample are only examples; other existing or future ways of obtaining it, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
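Both rule-driven sampling strategies can be sketched in Python over an in-memory library. The dictionary fields are hypothetical; in particular, `missed_by` is an assumed per-post field listing the IDs of rules that previously failed to mine that junk post, standing in for the per-rule mining history the text describes.

```python
import random
from typing import Dict, List, Optional, Set


def sample_by_blacklist(library: List[Dict], blacklist: Set[str]) -> List[Dict]:
    """Posts whose publisher ID is on the blacklist (for a blacklist rule)."""
    return [p for p in library if p["publisher_id"] in blacklist]


def sample_missed_junk(library: List[Dict], rule_id: str, proportion: float = 0.5,
                       seed: Optional[int] = None) -> List[Dict]:
    """A proportion of the junk posts that rule `rule_id` failed to mine."""
    missed = [p for p in library
              if p["is_spam"] and rule_id in p.get("missed_by", ())]
    k = int(len(missed) * proportion)
    return random.Random(seed).sample(missed, k)
```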
Fig. 4 shows a flowchart of a method for evaluating junk information mining rules according to a preferred embodiment of the present invention. Specifically, in step S41', the evaluation device 1 compares the actual junk information preset in the information publishing sample with the mined junk information and obtains a comparative analysis result corresponding to the junk information; then, in step S42', the evaluation device 1 obtains at least one of the evaluation parameters from the comparative analysis result. Here, steps S1' to S3' shown in Fig. 4 are the same as steps S1 to S3 described above with reference to Fig. 3; for brevity, they are incorporated herein by reference and not repeated.
More specifically, in step S41', the evaluation device 1 compares, item by item, the actual junk information preset in the information publishing sample obtained in step S2' with the junk information mined in step S3' based on the mining rule, and obtains the comparative analysis result corresponding to that junk information. The comparative analysis result includes, but is not limited to: 1) the number of genuine junk information items among the mined junk information; 2) the number of non-junk information items among the mined junk information; 3) keywords in the content of the non-junk information among the mined junk information; 4) the credit ratings of the publishers of the non-junk information among the mined junk information; and 5) the publishing frequency of the publishers of the genuine junk information. For example, suppose that in step S2' the information publishing sample obtained by the evaluation device 1 contains 20 pieces of published information, 10 of which are genuine junk information. In step S3', the evaluation device 1 mines 6 pieces of junk information from the sample based on the mining rule. In step S41', the evaluation device 1 compares these with the genuine junk information in the sample and finds that 4 of the mined items are genuine junk information, that all 4 were published by the same information publisher ID, and that this publisher posts at a frequency of 4 times/minute.
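A Python sketch of step S41' covering items 1), 2), 4), and 5) of the comparative analysis result (content keywords are omitted for brevity). The field names and the numeric credit-rating scale are hypothetical, and frequency is computed as posts divided by elapsed minutes, with bursts under a minute counted as one minute (an assumption).

```python
from collections import defaultdict
from typing import Dict, List


def comparative_analysis(mined_ids: List[int], sample: Dict[int, dict]) -> dict:
    """Compare mined junk with the sample's preset labels.

    Each sample entry is assumed to carry: is_spam (preset label),
    publisher_id, timestamp (seconds), credit_rating (hypothetical scale).
    """
    mined = [sample[i] for i in mined_ids]
    genuine = [p for p in mined if p["is_spam"]]
    false_hits = [p for p in mined if not p["is_spam"]]

    times_by_publisher = defaultdict(list)
    for p in genuine:
        times_by_publisher[p["publisher_id"]].append(p["timestamp"])
    frequency = {}
    for publisher, times in times_by_publisher.items():
        minutes = max((max(times) - min(times)) / 60, 1.0)  # at least 1 minute
        frequency[publisher] = len(times) / minutes          # posts per minute

    return {
        "genuine_junk_count": len(genuine),
        "non_junk_count": len(false_hits),
        "non_junk_publisher_credit": [p["credit_rating"] for p in false_hits],
        "genuine_junk_publisher_frequency": frequency,
    }
```

With a 20-post sample in which 6 posts are mined and 4 of those are genuine junk from one publisher posting within 30 seconds, the result reports 4 genuine items, 2 non-junk items, and a frequency of 4.0 posts/minute for that publisher.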
Then, in step S42', the evaluation device 1 calculates, using the formulas, at least one evaluation parameter from the comparative analysis result obtained in step S41', such as the accuracy rate corresponding to the mining rule obtained in step S1'. Continuing the example: in step S2' the information publishing sample contains 20 pieces of published information, 10 of which are genuine junk information; in step S3' the evaluation device 1 mines 6 pieces of junk information based on the mining rule; in step S41' it determines that 4 of them are genuine junk information; and in step S42' it calculates an accuracy rate of 67% (= 4/6) and a recall rate of 40% (= 4/10).
Those skilled in the art will understand that the above ways of obtaining the comparative analysis result and the evaluation parameters are only examples. Other existing or future ways of obtaining them, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
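The two formulas used in step S42' can be written out as a short Python sketch. It assumes, hypothetically, that every published message carries a string ID and that both the mined junk information and the preset actual junk information are available as sets of those IDs; `evaluate_rule` and the IDs `m0`, `m1`, ... are illustrative names, not part of the patent.

```python
from __future__ import annotations


def evaluate_rule(mined_ids: set[str], actual_junk_ids: set[str]) -> tuple[float, float]:
    """Return (accuracy_rate, recall_rate) for one run of a mining rule.

    accuracy rate = actual junk among mined messages / all mined messages
    recall rate   = actual junk among mined messages / all actual junk in the sample
    """
    true_hits = mined_ids & actual_junk_ids  # the comparison of step S41'
    # Guard against empty sets to avoid division by zero.
    accuracy = len(true_hits) / len(mined_ids) if mined_ids else 0.0
    recall = len(true_hits) / len(actual_junk_ids) if actual_junk_ids else 0.0
    return accuracy, recall


# The worked example: 20 messages, 10 actual junk, 6 mined, 4 of them correct.
actual = {f"m{i}" for i in range(10)}           # hypothetical IDs m0..m9
mined = {"m0", "m1", "m2", "m3", "m10", "m11"}  # 4 correct, 2 false alarms
accuracy, recall = evaluate_rule(mined, actual)
print(f"accuracy={accuracy:.0%} recall={recall:.0%}")  # accuracy=67% recall=40%
```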
Preferably, the mining rule performs junk information mining on the information issue sample based on at least one of the following:
- the information issue frequency;
- the information issue content;
- the historical behavior record of the information publisher;
- the attributes of the information publisher.
1) Specifically, the information issue frequency includes but is not limited to: the information issue frequency of a single information publisher, the issue frequency of messages with identical content, the issue frequency of messages from the same IP address, and so on. For example, the information issue sample contains 10 published messages. In step S3', assessment equipment 1 analyzes these 10 messages and determines that 6 of them were published by the same information publisher ID within 1 minute, and that this information publisher's issue frequency is 10 messages per minute, which exceeds the predetermined frequency threshold of 5 messages per minute. These 6 messages can therefore be judged to be junk information.
2) The information issue content includes but is not limited to: spam words contained in the content of a message, multiple messages sharing identical content, and so on. For example, the information issue sample contains 3 published messages whose contents are respectively:
a "certificates handling, calls 13811112222",
b "everybody is happy",
c "I wish to make friends";
In step S3', assessment equipment 1 performs string matching of the contents of these 3 messages against a spam dictionary, finds the spam word "certificates handling" in message a, and accordingly judges message a to be junk information.
3) The historical behavior record of the information publisher includes but is not limited to: the content of the information publisher's historical messages, the record of the publisher's historical publishing times, the publisher's historical online duration, and so on. For example, in step S3', assessment equipment 1 looks up the information publisher ID of a message in the information issue sample in a historical behavior database. It finds that this information publisher's historical messages were published between 1:00 AM and 6:00 AM and that their content contains spam words, and therefore judges the message to be junk information. The historical behavior database in this embodiment stores the historical behavior records of information publishers and includes but is not limited to a relational database, memory storage, hard-disk storage, and so on.
4) attribute of described information publisher includes but is not limited to:Whether information publisher is in blacklist, information issue
The personal background information that person pre-enters.For example, in step S3 ', assessment equipment 1 issues information whole issues in sample
The information publisher ID of information carries out matching inquiry in blacklist, obtains two information publishers released news in blacklist
In, then judge that this two release news as junk information.
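The four bases above can each be expressed as a function that takes the information issue sample and returns the IDs of the messages it flags as junk information. The Python sketch below is a minimal illustration under assumptions the patent does not specify: the `Message` record layout; measuring the issue frequency as the largest number of one publisher's messages in any 60-second window of the sample; reading the history condition as "every historical post falls between 1:00 and 6:00 AM"; and modeling the historical behavior database and blacklist as an in-memory dict and set. All names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str
    publisher_id: str
    timestamp: float  # seconds
    text: str


def frequency_rule(messages: list[Message], threshold_per_minute: int = 5) -> set[str]:
    """1) Flag all messages of publishers whose peak rate exceeds the threshold.

    Assumption: the rate is the largest number of one publisher's messages
    falling inside any 60-second window of the sample.
    """
    times: dict[str, list[float]] = defaultdict(list)
    for m in messages:
        times[m.publisher_id].append(m.timestamp)
    too_fast = set()
    for publisher, stamps in times.items():
        window: deque[float] = deque()
        peak = 0
        for ts in sorted(stamps):
            window.append(ts)
            while ts - window[0] >= 60:  # drop posts older than 60 s
                window.popleft()
            peak = max(peak, len(window))
        if peak > threshold_per_minute:
            too_fast.add(publisher)
    return {m.msg_id for m in messages if m.publisher_id in too_fast}


def content_rule(messages: list[Message], spam_dictionary: set[str],
                 min_hits: int = 1) -> set[str]:
    """2) Flag messages containing at least `min_hits` distinct dictionary
    entries, using plain substring matching."""
    return {m.msg_id for m in messages
            if sum(word in m.text for word in spam_dictionary) >= min_hits}


def history_rule(messages: list[Message],
                 history_db: dict[str, tuple[list[int], list[str]]],
                 spam_dictionary: set[str],
                 night: tuple[int, int] = (1, 6)) -> set[str]:
    """3) Flag messages whose publisher posted only between 1:00 and 6:00 AM
    in the past (assumed reading) and whose past posts contain spam words."""
    flagged = set()
    for m in messages:
        hours, texts = history_db.get(m.publisher_id, ([], []))
        only_at_night = bool(hours) and all(night[0] <= h < night[1] for h in hours)
        had_spam = any(word in t for t in texts for word in spam_dictionary)
        if only_at_night and had_spam:
            flagged.add(m.msg_id)
    return flagged


def blacklist_rule(messages: list[Message], blacklist: set[str]) -> set[str]:
    """4) Flag messages whose publisher ID is on the blacklist."""
    return {m.msg_id for m in messages if m.publisher_id in blacklist}
```

Each function returns a set of message IDs, so its output can be passed straight to an evaluation step such as the accuracy and recall formulas of step S42'.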
Those skilled in the art will understand that the above four bases can be used not only individually but also in combination for junk information mining of the information issue sample. Those skilled in the art will also understand that the above junk information mining rules are only examples; other existing or future junk information mining rules, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
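Combining bases can be sketched with two small combinators, as an illustration rather than the patent's method: `any_of` flags a message if any rule flags it (union), and `all_of` flags it only if every rule does (intersection). The dict-based message layout, the example rules and the blacklist entry `p9` are hypothetical.

```python
from typing import Callable, Dict, List, Set

Message = Dict[str, str]                    # hypothetical: {"id", "publisher", "text"}
Rule = Callable[[List[Message]], Set[str]]  # returns IDs of flagged messages


def any_of(*rules: Rule) -> Rule:
    """Union: a message is junk if at least one rule flags it."""
    def combined(messages: List[Message]) -> Set[str]:
        flagged: Set[str] = set()
        for rule in rules:
            flagged |= rule(messages)
        return flagged
    return combined


def all_of(*rules: Rule) -> Rule:
    """Intersection: a message is junk only if every rule flags it."""
    def combined(messages: List[Message]) -> Set[str]:
        results = [rule(messages) for rule in rules]
        return set.intersection(*results) if results else set()
    return combined


# Two example rules (hypothetical blacklist and spam word).
def on_blacklist(messages: List[Message]) -> Set[str]:
    return {m["id"] for m in messages if m["publisher"] in {"p9"}}


def has_spam_word(messages: List[Message]) -> Set[str]:
    return {m["id"] for m in messages if "certificates handling" in m["text"]}


sample = [
    {"id": "a", "publisher": "p9", "text": "certificates handling"},
    {"id": "b", "publisher": "p9", "text": "hello"},
    {"id": "c", "publisher": "p1", "text": "certificates handling"},
]
print(sorted(any_of(on_blacklist, has_spam_word)(sample)))  # ['a', 'b', 'c']
print(sorted(all_of(on_blacklist, has_spam_word)(sample)))  # ['a']
```

Because the union flags a superset of what each rule flags, it can only keep or raise the recall rate, while the intersection flags a subset and tends to favor the accuracy rate.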
In a preferred embodiment (see Fig. 4), the process further includes step S5' (not shown), in which assessment equipment 1 optimizes the mining rule according to the evaluation parameter. This preferred embodiment is described in detail below with reference to Fig. 4. In step S1', assessment equipment 1 obtains the mining rule to be assessed; in step S2', assessment equipment 1 obtains the information issue sample for assessing the mining rule; in step S3', assessment equipment 1 performs junk information mining on the information issue sample based on the mining rule, to obtain the junk information corresponding to the information issue sample; in step S41', assessment equipment 1 compares the preset actual junk information in the information issue sample with the junk information, to obtain the comparative analysis result corresponding to the junk information; in step S42', assessment equipment 1 obtains at least one of the evaluation parameters according to the comparative analysis result. These steps are identical to steps S1', S2', S3', S41' and S42' performed by assessment equipment 1 in the embodiment described above with reference to Fig. 4; for brevity, they are incorporated herein by reference and not repeated.
Specifically, in step S5', assessment equipment 1 optimizes the mining rule according to the evaluation parameter obtained in step S42', such as the accuracy rate corresponding to the mining rule. For example, when the accuracy rate is below a preset accuracy rate threshold, the mining rule is adjusted so that messages published by information publishers with a high credit rating are not mined for junk information, which raises the accuracy rate. Suppose that in step S42' assessment equipment 1 calculates an accuracy rate of 50%. In step S5', assessment equipment 1 determines that 50% is below the preset accuracy rate threshold of 60%, and therefore adjusts the mining rule so that messages from information publishers with a high credit rating are not mined for junk information, raising the accuracy rate. Those skilled in the art will understand that the above way of optimizing the mining rule is only an example; other existing or future ways of optimizing the mining rule, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
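A sketch of the accuracy-driven adjustment in step S5', assuming (hypothetically) that a mining rule is stored as a small immutable parameter record with a flag that exempts high-credit publishers. The patent does not define how a rule is stored or how the credit rating is obtained; `MiningRule` and `optimize_for_accuracy` are illustrative names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiningRule:
    """Hypothetical parameter record for a mining rule."""
    frequency_threshold: float = 5.0           # messages per minute
    spam_word_threshold: int = 3               # spam words per message
    skip_high_credit_publishers: bool = False  # exempt high-credit publishers


def optimize_for_accuracy(rule: MiningRule, accuracy: float,
                          accuracy_threshold: float = 0.6) -> MiningRule:
    """Step S5': if the accuracy rate is too low, stop mining high-credit publishers."""
    if accuracy < accuracy_threshold and not rule.skip_high_credit_publishers:
        return replace(rule, skip_high_credit_publishers=True)
    return rule  # accuracy acceptable, or the exemption is already active


print(optimize_for_accuracy(MiningRule(), accuracy=0.5))
```

Returning a new frozen record rather than mutating the old one keeps each evaluated version of the rule intact, which is convenient when comparing successive optimization rounds.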
Preferably, in step S5', assessment equipment 1 may also optimize the mining rule according to the evaluation parameter in combination with the comparative analysis result. Specifically, in step S5', assessment equipment 1 optimizes the mining rule according to the evaluation parameter obtained in step S42', such as the recall rate corresponding to the mining rule, together with the comparative analysis result obtained in step S41'. When the recall rate is below a preset recall rate threshold, the optimization includes but is not limited to lowering the information issue frequency threshold in the mining rule, or lowering the spam-word count threshold, according to the junk information shown in the comparative analysis result, so as to raise the recall rate. For example, suppose the recall rate obtained in step S42' is 40%, below the preset recall rate threshold of 50%. In step S5', since the comparative analysis result obtained in step S41' shows that the information publishers of the junk information have an average information issue frequency of 4 messages per minute, assessment equipment 1 lowers the information issue frequency threshold from 5 to 4 messages per minute to raise the recall rate. As another example, suppose the recall rate obtained in step S42' is below the preset recall rate threshold. In step S5', since the comparative analysis result obtained in step S41' shows that the junk messages contain on average 2 spam words each, assessment equipment 1 lowers the spam-word count threshold from 3 to 2 spam words per message to raise the recall rate. Those skilled in the art will understand that the above ways of optimizing the mining rule are only examples; other existing or future ways of optimizing the mining rule, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
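The two recall-driven adjustments can be sketched as follows, assuming the comparative analysis result exposes the issue frequencies of the junk publishers and the spam-word counts of the junk messages as plain lists. Using the mean and never raising a threshold are choices made for this illustration only; `MiningRule` and `optimize_for_recall` are hypothetical names.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import List


@dataclass(frozen=True)
class MiningRule:
    """Hypothetical parameter record for a mining rule."""
    frequency_threshold: float = 5.0  # messages per minute
    spam_word_threshold: int = 3      # spam words per message


def optimize_for_recall(rule: MiningRule, recall: float,
                        junk_publisher_rates: List[float],
                        junk_spam_word_counts: List[int],
                        recall_threshold: float = 0.5) -> MiningRule:
    """Step S5': when recall is too low, lower each threshold to the average
    observed in the comparative analysis result (never raise it)."""
    if recall >= recall_threshold:
        return rule
    new_rate = rule.frequency_threshold
    if junk_publisher_rates:
        new_rate = min(new_rate, mean(junk_publisher_rates))
    new_words = rule.spam_word_threshold
    if junk_spam_word_counts:
        new_words = min(new_words, round(mean(junk_spam_word_counts)))
    return replace(rule, frequency_threshold=new_rate, spam_word_threshold=new_words)


# The two examples from the text: average rate 4/min, average 2 spam words.
print(optimize_for_recall(MiningRule(), recall=0.4,
                          junk_publisher_rates=[4.0], junk_spam_word_counts=[2, 2]))
```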
More preferably, in step S5', assessment equipment 1 may also optimize the mining rule according to the evaluation parameters in combination with preset parameter priority information of the evaluation parameters. Specifically, in step S5', assessment equipment 1 selects a suitable way of optimizing the mining rule according to the evaluation parameters, such as the recall rate and the accuracy rate, and according to preset parameter priority information, such as the accuracy rate taking priority over the recall rate, so as to improve the evaluation parameters. For example, suppose that in step S42' assessment equipment 1 obtains an accuracy rate of 50%, below the preset accuracy rate threshold of 60%, and a recall rate of 40%, below the preset recall rate threshold of 50%. Then, in step S5', following the preset parameter priority information that the accuracy rate takes priority over the recall rate, assessment equipment 1 adjusts the mining rule so that messages published by high-quality users are not mined, thereby raising the accuracy rate. Those skilled in the art will understand that the above way of optimizing the mining rule is only an example; other existing or future ways of optimizing the mining rule, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
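Selecting what to optimize from the priority information can be sketched as picking the highest-priority evaluation parameter that is still below its threshold. The dict-based representation and the function name `choose_target` are assumptions for illustration.

```python
from typing import Dict, List, Optional


def choose_target(params: Dict[str, float], thresholds: Dict[str, float],
                  priority: List[str]) -> Optional[str]:
    """Return the highest-priority evaluation parameter below its threshold, or None."""
    for name in priority:  # priority[0] is the most important parameter
        if params[name] < thresholds[name]:
            return name
    return None


params = {"accuracy": 0.50, "recall": 0.40}
thresholds = {"accuracy": 0.60, "recall": 0.50}
print(choose_target(params, thresholds, ["accuracy", "recall"]))  # accuracy
```

With the returned name, step S5' would then apply an accuracy-oriented adjustment (such as exempting high-quality users) or a recall-oriented one (such as lowering thresholds).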
More preferably, the process further includes step S6' (not shown), in which assessment equipment 1 may update the parameter priority information according to the evaluation parameters. Specifically, in step S6', assessment equipment 1 updates the parameter priority according to the evaluation parameters obtained in step S42'; for example, when the recall rate is below the preset recall rate threshold and the accuracy rate is above the preset accuracy rate threshold, the recall rate is given priority over the accuracy rate. For example, if in step S42' the recall rate obtained by assessment equipment 1 is below the preset recall rate threshold and the accuracy rate is above the preset accuracy rate threshold, then in step S6' assessment equipment 1 updates the preset parameter priority information from "accuracy rate takes priority over recall rate" to "recall rate takes priority over accuracy rate". Those skilled in the art will understand that the above way of updating the parameter priority information is only an example; other existing or future ways of updating the parameter priority information, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
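Step S6' can be sketched as reordering the priority list so that parameters below their thresholds move ahead of those that already meet them. This is a hypothetical generalization of the single case described above (recall below its threshold, accuracy above its threshold); the relative order within each group is preserved because Python's `sorted` is stable.

```python
from typing import Dict, List


def update_priority(priority: List[str], params: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[str]:
    """Move parameters that miss their threshold to the front, keeping relative order."""
    # False (below threshold) sorts before True (meets threshold).
    return sorted(priority, key=lambda name: params[name] >= thresholds[name])


thresholds = {"accuracy": 0.60, "recall": 0.50}
params = {"accuracy": 0.70, "recall": 0.40}
print(update_priority(["accuracy", "recall"], params, thresholds))  # ['recall', 'accuracy']
```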
Preferably, the process further includes step S7' (not shown), in which assessment equipment 1 may terminate the optimization of the mining rule when the evaluation parameter reaches an evaluation parameter threshold. Specifically, in step S3', assessment equipment 1 performs junk information mining on the information issue sample based on the mining rule, to obtain the junk information corresponding to the information issue sample; then, in step S41', assessment equipment 1 compares the preset actual junk information in the information issue sample with the junk information, to obtain the comparative analysis result corresponding to the junk information; then, in step S42', assessment equipment 1 obtains at least one evaluation parameter according to the comparative analysis result. Assessment equipment 1 repeatedly executes steps S3' and S4' in a loop, each time based on the mining rule as updated in step S5'. In step S7', assessment equipment 1 checks the evaluation parameters obtained in each iteration and, when they reach the evaluation parameter threshold, terminates the optimization of the mining rule. Here, the evaluation parameter threshold denotes a preset expected value of the evaluation parameter. For example, when assessment equipment 1 detects in step S7' that the accuracy rate exceeds the predetermined accuracy rate threshold and the recall rate exceeds the predetermined recall rate threshold, it stops optimizing the mining rule. Those skilled in the art will understand that the above way of terminating the optimization of the mining rule is only an example; other existing or future ways of terminating the optimization, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
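The loop formed by steps S3', S41'/S42', S5' and S7' can be illustrated end to end on a toy rule with a single parameter: the minimum number of spam words `k` a message must contain. The data, the targets, the one-step adjustments of `k` and the `max_rounds` safety cap are all assumptions of this sketch; the cap only prevents an endless loop when the two targets cannot be met together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str
    spam_word_count: int
    is_actual_junk: bool  # preset label used in step S41'


def mine(messages: list[Message], k: int) -> set[str]:
    """Step S3': flag messages with at least k spam words."""
    return {m.msg_id for m in messages if m.spam_word_count >= k}


def evaluate(mined: set[str], actual: set[str]) -> tuple[float, float]:
    """Steps S41'/S42': compare with the preset labels, return (accuracy, recall)."""
    hits = mined & actual
    accuracy = len(hits) / len(mined) if mined else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return accuracy, recall


def optimize_until_threshold(messages: list[Message], k: int = 3,
                             acc_target: float = 0.6, rec_target: float = 0.5,
                             max_rounds: int = 10) -> tuple[int, float, float] | None:
    """Repeat mine / evaluate / adjust; return (k, accuracy, recall) once both
    targets are reached (step S7'), or None if that does not happen in time."""
    actual = {m.msg_id for m in messages if m.is_actual_junk}
    for _ in range(max_rounds):
        accuracy, recall = evaluate(mine(messages, k), actual)
        if accuracy >= acc_target and recall >= rec_target:
            return k, accuracy, recall  # step S7': stop optimizing
        if recall < rec_target and k > 1:
            k -= 1                      # step S5': loosen the rule to raise recall
        elif accuracy < acc_target:
            k += 1                      # step S5': tighten the rule to raise accuracy
        else:
            break                       # no further adjustment available
    return None


SAMPLE = [
    Message("s1", 4, True), Message("s2", 1, True), Message("s3", 2, True),
    Message("s4", 2, True), Message("h1", 0, False), Message("h2", 1, False),
    Message("h3", 2, False),
]
print(optimize_until_threshold(SAMPLE))  # (2, 0.75, 0.75)
```

On this toy sample the first round (k = 3) mines only `s1`, so the recall rate is 0.25; the rule is loosened to k = 2, which mines four messages, three of them actual junk, and both targets are met.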
Preferably, the evaluation parameter includes at least one of the following:
- the recall rate corresponding to the mining rule;
- the accuracy rate corresponding to the mining rule.
Specifically, in step S42', the evaluation parameters obtained by assessment equipment 1 include but are not limited to the recall rate corresponding to the mining rule and the accuracy rate corresponding to the mining rule. The recall rate is the ratio of the number of actual junk messages among those mined by assessment equipment 1 in step S3' to the number of actual junk messages in the information issue sample. The accuracy rate (commonly called precision) is the ratio of the number of actual junk messages among those mined by assessment equipment 1 in step S3' to the total number of messages it mined as junk information. The accuracy rate and the recall rate can constrain each other: a high accuracy rate may come with a low recall rate, and a high recall rate with a low accuracy rate. A balance therefore needs to be found between the two so that junk information is mined in an optimal way. Those skilled in the art will understand that the above evaluation parameters are only examples; other existing or future evaluation parameters, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
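The trade-off between the two parameters can be seen by sweeping a single threshold over toy data (hypothetical spam-word counts and labels): lowering the threshold mines more messages, which raises the recall rate while the accuracy rate drops.

```python
from typing import List, Tuple

# Hypothetical (spam_word_count, is_actual_junk) pairs for a five-message sample.
SAMPLE: List[Tuple[int, bool]] = [(5, True), (4, True), (3, False), (2, True), (1, False)]


def sweep(sample: List[Tuple[int, bool]],
          thresholds: List[int]) -> List[Tuple[int, float, float]]:
    """Return (threshold, accuracy_rate, recall_rate) for each candidate threshold."""
    total_junk = sum(1 for _, is_junk in sample if is_junk)
    rows = []
    for k in thresholds:
        mined = [is_junk for score, is_junk in sample if score >= k]
        hits = sum(mined)  # True counts as 1
        accuracy = hits / len(mined) if mined else 0.0
        recall = hits / total_junk if total_junk else 0.0
        rows.append((k, accuracy, recall))
    return rows


for k, acc, rec in sweep(SAMPLE, [5, 3, 1]):
    print(f"k={k}: accuracy={acc:.2f} recall={rec:.2f}")
```

On this data the accuracy rate falls from 1.00 to 0.67 to 0.60 while the recall rate rises from 0.33 to 0.67 to 1.00 as `k` goes from 5 to 1.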
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced by the present invention. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be implemented by a single unit or device in software or hardware. Words such as "first" and "second" denote names and do not indicate any particular order.
Claims (16)
1. A computer-implemented method for assessing a junk information mining rule, wherein the method comprises the following steps:
a. obtaining a mining rule to be assessed, wherein the mining rule performs junk information mining based on the information issue frequency, and/or the historical behavior record of the information publisher, and/or the attributes of the information publisher;
wherein the method further comprises:
i. obtaining, according to the mining rule, an information issue sample for assessing the mining rule;
wherein the method further comprises:
b. performing junk information mining on the information issue sample based on the mining rule, to obtain the junk information corresponding to the information issue sample;
c. obtaining, according to the junk information and with reference to the information issue sample, at least one evaluation parameter corresponding to the mining rule, and optimizing the mining rule according to the evaluation parameter and with reference to preset parameter priority information of the evaluation parameter.
2. The method according to claim 1, wherein step i further comprises:
- obtaining, according to the mining rule, the information issue sample corresponding to the mining rule from an information issue sample library.
3. The method according to claim 1 or 2, wherein step c further comprises:
- comparing preset actual junk information in the information issue sample with the junk information, to obtain a comparative analysis result corresponding to the junk information;
- obtaining at least one of the evaluation parameters according to the comparative analysis result.
4. The method according to claim 3, wherein the method further comprises step X:
X. optimizing the mining rule according to the evaluation parameter.
5. The method according to claim 4, wherein step X further comprises:
- optimizing the mining rule according to the evaluation parameter and with reference to the comparative analysis result.
6. The method according to claim 1, wherein the method further comprises:
- updating the parameter priority information according to the evaluation parameter.
7. The method according to any one of claims 4 to 6, wherein the method further comprises:
- repeating steps b and c based on the optimized mining rule until the evaluation parameter reaches an evaluation parameter threshold.
8. The method according to claim 1 or 2, wherein the evaluation parameter comprises at least one of the following:
- the recall rate corresponding to the mining rule;
- the accuracy rate corresponding to the mining rule.
9. An apparatus for assessing a junk information mining rule, wherein the apparatus comprises:
a rule device, configured to obtain a mining rule to be assessed, wherein the mining rule performs junk information mining based on the information issue frequency, and/or the historical behavior record of the information publisher, and/or the attributes of the information publisher;
a sample acquiring device, configured to obtain, according to the mining rule, an information issue sample for assessing the mining rule;
a junk information acquisition device, configured to perform junk information mining on the information issue sample based on the mining rule, to obtain the junk information corresponding to the information issue sample;
a parameter obtaining device, configured to obtain, according to the junk information and with reference to the information issue sample, at least one evaluation parameter corresponding to the mining rule, and to optimize the mining rule according to the evaluation parameter and with reference to preset parameter priority information of the evaluation parameter.
10. The apparatus according to claim 9, wherein the sample acquiring device is further configured to obtain, according to the mining rule, the information issue sample corresponding to the mining rule from an information issue sample library.
11. The apparatus according to claim 9 or 10, wherein the parameter obtaining device further comprises:
a result acquiring unit, configured to compare preset actual junk information in the information issue sample with the junk information, to obtain a comparative analysis result corresponding to the junk information;
a parameter acquiring unit, configured to obtain at least one of the evaluation parameters according to the comparative analysis result.
12. The apparatus according to claim 11, wherein the apparatus further comprises:
a rule optimization device, configured to optimize the mining rule according to the evaluation parameter.
13. The apparatus according to claim 12, wherein the rule optimization device is further configured to optimize the mining rule according to the evaluation parameter and with reference to the comparative analysis result.
14. The apparatus according to claim 9, wherein the apparatus further comprises:
a priority update device, configured to update the parameter priority information according to the evaluation parameter.
15. The apparatus according to any one of claims 12 to 14, wherein the apparatus further comprises:
an optimization control device, configured to terminate the optimization of the mining rule when the evaluation parameter reaches an evaluation parameter threshold.
16. The apparatus according to claim 9 or 10, wherein the evaluation parameter comprises at least one of the following:
- the recall rate corresponding to the mining rule;
- the accuracy rate corresponding to the mining rule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110264221.6A CN102982048B (en) | 2011-09-07 | 2011-09-07 | A kind of method and apparatus for being used to assess junk information mining rule |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102982048A CN102982048A (en) | 2013-03-20 |
CN102982048B true CN102982048B (en) | 2017-08-01 |
Family ID: 47856084
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102982048B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216872B (en) * | 2013-05-31 | 2017-12-01 | 腾讯科技(深圳)有限公司 | The method and device of rubbish chapters and sections in a kind of identification network novel |
CN104009970A (en) * | 2013-09-17 | 2014-08-27 | 宁波公众信息产业有限公司 | Network information acquisition method |
CN106376002B (en) * | 2015-07-20 | 2021-10-12 | 中兴通讯股份有限公司 | Management method and device and spam monitoring system |
CN107705828A (en) * | 2017-09-20 | 2018-02-16 | 广西金域医学检验所有限公司 | Prejudge detection and processing method and processing device, terminal device, the storage medium of rule |
CN108182234B (en) * | 2017-12-27 | 2021-07-09 | 鼎富智能科技有限公司 | Regular expression screening method and device |
CN109726312B (en) * | 2018-12-25 | 2021-10-08 | 广州虎牙信息科技有限公司 | Regular expression detection method, device, equipment and storage medium |
CN110427577B (en) * | 2019-06-26 | 2022-04-19 | 五八有限公司 | Content influence evaluation method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101166159A (en) * | 2006-10-18 | 2008-04-23 | 阿里巴巴公司 | A method and system for identifying rubbish information |
CN101996203A (en) * | 2009-08-13 | 2011-03-30 | 阿里巴巴集团控股有限公司 | Web information filtering method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9471712B2 (en) * | 2004-02-09 | 2016-10-18 | Dell Software Inc. | Approximate matching of strings for message filtering |
CN101389085B (en) * | 2008-10-14 | 2012-03-21 | 中国联合网络通信集团有限公司 | Rubbish short message recognition system and method based on sending behavior |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |