CN103853700B - Event early-warning method based on region and object information discovery - Google Patents
- Publication number: CN103853700B
- Application number: CN201210501970.0A
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an event early-warning method based on region and object information discovery. The method comprises the following steps:

1) Filter the crawled web page information to obtain non-spam web page information related to food safety events.
2) Resolve the words that denote places in the web page information to obtain precise place-name words. Then process the pages against the pre-built event information ontology and assign each page to the region it successfully matches.
3) Process the web page information with a regression analysis model to determine the object types relevant to each page.
4) Use each page's region and relevant object types to obtain the set of pages for the event of a given region and object. Define characteristic parameters for the event and compute their values periodically. If a characteristic parameter value of an event continuously exceeds a set threshold, issue an early warning for that event.

The invention improves the accuracy and comprehensiveness of event early warning and ensures the efficiency of food safety event early warning.
Description
Technical field
The invention belongs to the field of information technology. In particular, it relates to a method that applies specific processing to information crawled from the Internet. The method discovers information such as the region where an event occurs and the object type involved, and then performs event early warning on that basis. It is mainly applied to information monitoring and risk early warning during the emergency handling of unconventional incidents such as food safety events.
Background art
In recent years, food safety events have occurred again and again. Examples include toxic capsules, reused cooking oil, clenbuterol, dyed steamed buns, plasticizers and poisonous cucumbers. These events have caused a very bad social impact and substantial economic losses. Event-based risk early-warning technology has attracted great attention as a way to avoid, or minimize, the harm they cause. Such early warning requires that information about these events be discovered in advance.
With the rapid development of the Internet, the number of Internet users keeps growing. The Internet has increasingly become the main carrier through which users publish, obtain and spread information. Through interaction among people, organizations and so on, it has formed a virtual society that corresponds to, and is associated with, real society. It is now the largest public data source in the world, and its scale continues to grow.

In this situation, it is imperative and highly significant to exploit the characteristics of the Internet itself in three ways:
- build a sound social information feedback network;
- discover in advance the various latent factors that may lead to a crisis;
- provide timely, accurate and comprehensive information for the emergency management of food safety events.
To use Internet information for risk early warning of food safety events, the event-related information must be obtained through a certain process. First, Internet information is crawled. Next, the information related to food safety events is extracted and discovered. Finally, early warning is performed according to how the events develop.

The key step in this process is identifying event information. In theory this can be achieved with various supervised or unsupervised machine learning methods. In practice, given real information requirements and considerations of accuracy and operability, workarounds are often adopted. For example, one line of research proceeds as follows:
- establish a set of information categories in advance (such as diseases);
- collect keywords for each category;
- classify the collected web page information by keyword matching against these categories and keywords;
- on that basis, monitor the categorized information, i.e., the development of events.

Other research identifies and judges event information through steps such as information relevance detection, named entity recognition, extraction of disease and address information, and visual display of the results.
Evaluation results show that these approaches still fall short in judging, identifying and giving early warning of event information. Metrics such as precision and recall need further improvement. This is not surprising, for three reasons:
- the approaches ignore the influence of the various kinds of spam present in the information;
- the accuracy of current information extraction techniques is still not high enough;
- treating the categorized information obtained by keyword matching directly as information about one event can fail when the pieces of information refer to different subjects.
Summary of the invention
To solve the above problems, an object of the invention is to provide a method that analyzes web page information through specific steps. The method extracts elements such as the region where an event occurs and the object type involved in order to identify the event. It then performs early warning according to the event's development trend. The method draws on the ideas of intelligence systems, and its steps are as follows.
1. Ontology construction
A food safety event information ontology is built along dimensions such as object, region, consequence, associated party and time. The choice of dimensions follows from the characteristics of food safety events and the needs of later information analysis. The ontology provides the basis for information filtering, information discovery and related tasks for food safety events.
2. Information filtering
On the basis of the ontology built above, the crawled web page information is filtered. The filtering process has two main parts: food safety information filtering and spam filtering.
- Food safety information filtering mainly applies pattern matching to the title, content and so on of the information to determine whether it is food safety information.
- Spam filtering uses pre-built detection models to filter content-spam and link-spam pages. It also filters irrelevant opinions, low-quality opinions and deceptive spam opinions in user-generated content.

This ensures the quality of the information that enters the subsequent processes.
3. Region information discovery
This step works on the basis of the region information ontology built above. First, place-name words and pronouns referring to places are resolved in the title, content and so on of the crawled and filtered information. Then pattern matching and a judgment model based on machine learning are used to discover and determine the regions the information relates to.
4. Object information discovery
This step uses a pre-built regression analysis model. The title, content and so on of the information first undergo steps such as word segmentation and dimensionality reduction. Regression analysis is then performed for each object type (set in advance, e.g., vegetables) to determine whether the web page information is related to that object. This discovers the object types the information relates to. Combining region information with object type information, the event can then be determined accurately.
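The patent does not say which regression model is used. As a minimal sketch, assume one logistic regression per object type over term counts of the already segmented and dimension-reduced title and content. The weights, terms and the 0.5 cut-off below are hypothetical placeholders, not trained values.

```python
import math
from collections import Counter

# Hypothetical per-object-type logistic regression models: (term weights, bias).
# Real weights would come from training; these are placeholders.
OBJECT_MODELS = {
    "vegetables": ({"cucumber": 2.0, "spinach": 1.5, "vegetable": 1.8}, -1.0),
    "dairy": ({"milk": 2.2, "melamine": 1.7, "yogurt": 1.4}, -1.0),
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def related_object_types(tokens, models=OBJECT_MODELS, threshold=0.5):
    """Score one page (already segmented into tokens) against every object type
    and return the types whose predicted probability exceeds `threshold`."""
    counts = Counter(tokens)
    related = []
    for obj_type, (weights, bias) in models.items():
        z = bias + sum(w * counts[term] for term, w in weights.items())
        if sigmoid(z) > threshold:
            related.append(obj_type)
    return related


print(related_object_types(["poison", "cucumber", "market"]))  # ['vegetables']
```

A page can relate to several object types at once, which is why each type gets its own independent model rather than a single multi-class choice.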
5. Trend tracking, early warning and display
This step follows information filtering, region information discovery and object information discovery. First, characteristic parameters representing an event are defined, such as the number of pages, the number of page views and a composite index. The event's development trend is tracked by periodically computing the values of these parameters.

Each current parameter value is compared with its mean over a certain preceding period. If the difference is positive and its absolute value persistently exceeds a certain threshold, an early warning is issued for the event. The results of the early-warning analysis are then displayed to the relevant users.
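Below is a minimal sketch of the warning test for one characteristic parameter. It assumes the baseline is the mean of the `window` periods that precede the most recent `persist` periods. It also assumes the threshold is `k` standard deviations of that baseline, following the three-standard-deviation example given later in this description. `window`, `persist`, `k` and `min_delta` are hypothetical parameters.

```python
from statistics import mean, pstdev


def should_warn(history, window=7, persist=2, k=3.0, min_delta=1e-9):
    """history: periodic values of one characteristic parameter, oldest first.

    The baseline is the `window` periods before the last `persist` periods.
    Warn only if every one of the last `persist` values exceeds the baseline
    mean by more than k standard deviations (difference positive and large)."""
    if len(history) < window + persist:
        return False  # not enough data for a baseline yet
    baseline = history[-(window + persist):-persist]
    mu = mean(baseline)
    # min_delta avoids a zero threshold when the baseline is perfectly flat.
    threshold = max(k * pstdev(baseline), min_delta)
    return all(x - mu > threshold for x in history[-persist:])


page_counts = [10, 12, 11, 9, 10, 11, 10, 30, 35]
print(should_warn(page_counts))  # True
```

Keeping the baseline before the persistence run stops an early spike from inflating its own reference mean.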
6. Event termination judgment
For an event under early warning, each characteristic parameter value is computed periodically. The current value is compared with its mean over a certain preceding period, counted from the day the warning was issued. If the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is terminated.
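Below is a sketch of the termination test. It assumes one value per period starting on the warning day, and an absolute threshold chosen by the operator. The function name and the threshold value are hypothetical.

```python
from statistics import mean


def should_end_warning(values_since_warning, threshold):
    """values_since_warning: one value per period, from the warning day to now.

    Compare the current value with the mean of the earlier values in that span;
    end the warning if the difference is negative and larger than `threshold`."""
    if len(values_since_warning) < 2:
        return False  # nothing to compare against yet
    current = values_since_warning[-1]
    diff = current - mean(values_since_warning[:-1])
    return diff < 0 and abs(diff) > threshold


print(should_end_warning([30, 35, 32, 8], threshold=10))  # True
```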
7. Ontology supplementation and revision
The distribution of Internet information changes over time. To keep improving the method's efficiency, the results of processes such as information filtering and region and object information discovery are evaluated periodically. Based on these evaluations, deficiencies in the ontology, such as omissions and errors, are supplemented and corrected.
To ensure accurate and efficient information filtering and information discovery, the invention builds an ontology that fits the characteristics of food safety event information. The ontology is built mainly along the dimensions of object, consequence, region, time and associated party. For each instance of the region information ontology, supplementary tables are built for six dimensions: area code, postcode, abbreviation, landmarks, adjacent regions and orientation.
To improve the accuracy of event information discovery, the invention filters the crawled Internet information before any subsequent processing. This filtering comprises food safety information filtering and spam filtering.
To improve the accuracy of identifying the regions relevant to web page information, the invention proceeds in three stages. It first preprocesses the web page information. It then resolves words that may be place names to obtain unambiguous words. Finally, it uses means such as pattern matching and model-based judgment to decide whether the information can be assigned to a target region, thereby determining the regions the web page information relates to.
To further improve the accuracy of this determination, the invention applies several kinds of processing to the preprocessed web page information: place-name pronoun resolution, relative-location resolution and normalization of non-standard place names. This addresses the low accuracy in judging relevant regions that is caused by non-standard place-name words, place-name pronouns, relative positions and the like.
When judging the regions relevant to web page information, the invention applies three methods in sequence: pattern matching on the title, pattern matching on the body text, and a judgment model based on machine learning. In the model-based judgment, an ensemble of region judgment models determines the relevant region. This avoids inaccurate region judgments caused by homonyms, words with multiple meanings (e.g., a common word that is also a place name) and the like.
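The sequence of title matching, body-text matching and then the model can be sketched as below. The sketch assumes a mapping from surface forms (names, aliases, abbreviations, landmarks) to canonical regions. The text does not say when a stage is conclusive, so the sketch assumes a stage decides only when it yields exactly one region. The `model` callable stands in for the ensemble of region judgment models and is hypothetical.

```python
from typing import Callable, Dict, Optional

RegionModel = Callable[[str, str], Optional[str]]


def judge_region(title: str, body: str, surface_forms: Dict[str, str],
                 model: Optional[RegionModel] = None) -> Optional[str]:
    """surface_forms maps a name, alias, abbreviation, landmark, ... to its region.

    Stage 1 matches the title, stage 2 the body text; a stage is accepted only
    if it yields exactly one region. Otherwise the (hypothetical) model decides."""
    for text in (title, body):
        hits = {region for form, region in surface_forms.items() if form in text}
        if len(hits) == 1:
            return hits.pop()
    return model(title, body) if model is not None else None


names = {"Shanghai": "Shanghai", "Beijing": "Beijing", "Summer Palace": "Beijing"}
print(judge_region("Food scare", "Stalls near the Summer Palace sold dyed buns", names))
# Beijing
```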
In object information discovery, the invention uses a pre-built regression analysis model. The title, content and so on of the information first undergo word segmentation, dimensionality reduction and similar steps. Regression analysis is then performed for each object type to determine which object types the web page information relates to.
The invention periodically compares each characteristic parameter value of an event with its mean over a certain preceding time range. It issues a timely early warning when the difference is positive and its absolute value persistently reaches a certain level (e.g., three standard deviations).
For an event under early warning, the invention periodically computes each characteristic parameter value. It compares the current value with its mean over a certain preceding period, counted from the day the warning was issued. If the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is terminated.
Compared with the prior art, the invention has the following advantages:
The invention builds a food safety event information ontology. On that basis, it applies information filtering, region information discovery, object information discovery, event early warning and event termination judgment to crawled Internet information. This ensures the accuracy and comprehensiveness of food safety event information discovery and early warning, as well as the efficiency of food safety event early warning.
Description of drawings
Fig. 1 is a flowchart of the event early-warning method based on region and object information discovery;
Fig. 2 is a schematic diagram of the supplementary tables of the region information ontology;
Fig. 3 is a flowchart of the method for identifying the regions relevant to web page information;
Fig. 4 is a schematic diagram of the method for judging the regions relevant to web page information;
Fig. 5 is a schematic diagram of the machine-learning-based method for judging the regions relevant to web page information;
Fig. 6 is a schematic diagram of the event early-warning method.
Detailed description of the embodiments
An embodiment of the invention is shown in Fig. 1. The specific steps are as follows.
1. Ontology construction
The food safety event information ontology is mainly built along five dimensions: object, region, time, consequence and associated party. This choice reflects the characteristics of food safety events and the needs of later event information extraction and tracking. Each dimension is divided hierarchically, for example:
- Objects (i.e., foods) can be divided into categories such as primary products and processed products. Primary products can be further divided into categories such as vegetables and fruits, and so on.
- Consequences can be divided into categories such as contamination and poisoning. Contamination can be further divided into categories such as expired and exceeding standards, and so on.
- Regions are divided at the top level into five categories: Asia, Europe, Africa, the Americas and Oceania. Each category can be subdivided further; for example, Asia divides into six categories: East Asia, West Asia, South Asia, North Asia, Central Asia and Southeast Asia, and so on.

Division continues until a category cannot be divided further and becomes a bottom-level element, i.e., an instance. The other dimensions are built in a similar way.

In addition, supplementary tables of synonyms, antonyms, aliases and so on are built for each instance in the ontology. For instances in the region information ontology, supplementary tables are also built for six dimensions (as shown in Fig. 2), for use in subsequent information processing:
- area code;
- postcode;
- abbreviation;
- landmarks (mountains, lakes, seas, rivers, islands, buildings);
- adjacent regions (neighboring regions at the same level in each direction: east, south, west, north, etc.);
- orientation (position relative to the parent region, e.g., central or southern).
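One way to hold a region instance and its supplementary tables is a small record type. The field names are assumptions, and the Beijing values are illustrative only. `match_regions` is a hypothetical helper that shows how the tables can feed plain string matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RegionInstance:
    """A bottom-level region instance with its six supplementary tables."""
    name: str
    area_codes: List[str] = field(default_factory=list)
    postcodes: List[str] = field(default_factory=list)
    abbreviations: List[str] = field(default_factory=list)
    landmarks: List[str] = field(default_factory=list)         # mountains, lakes, buildings, ...
    adjacent: Dict[str, List[str]] = field(default_factory=dict)  # direction -> neighbours
    orientation: str = ""                                       # position within the parent region
    aliases: List[str] = field(default_factory=list)            # generic per-instance alias table

    def surface_forms(self) -> Set[str]:
        """Strings whose appearance in text points to this region."""
        return {self.name, *self.aliases, *self.abbreviations, *self.landmarks}


def match_regions(text: str, regions: List[RegionInstance]) -> List[str]:
    """Hypothetical helper: names of all regions mentioned in `text`."""
    return [r.name for r in regions if any(f in text for f in r.surface_forms())]


# Illustrative values only.
beijing = RegionInstance(
    name="Beijing",
    area_codes=["010"],
    postcodes=["100000"],
    abbreviations=["京"],
    landmarks=["Tiananmen", "Summer Palace"],
    adjacent={"southeast": ["Tianjin"], "west": ["Hebei"]},
    orientation="north",
)
print(match_regions("Dyed buns sold near the Summer Palace", [beijing]))  # ['Beijing']
```

Area codes and postcodes are stored but not used as surface forms here. Short digit strings match too many unrelated numbers in free text to be safe without extra context.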
2. Information filtering
Internet crawling techniques (e.g., general crawling or scope-limited crawling) are used to crawl the information in specific information sources. A website may contain content unrelated to the intended topic as well as various kinds of spam. The information is therefore filtered before further processing, to improve the accuracy of event information discovery and early warning. The filtering process has two aspects: food safety information filtering and spam filtering.
Food safety information filtering judges whether the collected information is related to food safety. Two issues need to be considered here: the filtering rules and the scope of the information.

Filtering rules. The rules are based on the food safety event information ontology built earlier and mainly consider two dimensions: object and consequence. Filtering is performed by pattern matching on combinations of the names, attributes and so on of the concrete ontology instances in these two dimensions. The pattern-matching methods include:
- Boolean matching;
- frequency matching;
- distance matching between instance names;
- matching of instance-name synonyms and antonyms;
- matching of instance-name aliases.

The specific methods and rules are determined by statistical analysis of the information; they are set in advance and updated regularly.

Information scope. Two dimensions are mainly considered: the title and the content. Because the title and the content may not agree, the title is processed first. If title filtering places the information in the food safety category, processing of that information is complete. Otherwise, a second judgment is made on its content.
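Below is a minimal sketch of title-first filtering that uses only Boolean matching: an object term AND a consequence term must both appear. The term sets are hypothetical stand-ins for ontology instance names, with their synonyms and aliases folded in. Frequency matching, distance matching and the other modes are not shown.

```python
# Hypothetical instance names (synonyms and aliases folded in) for two dimensions.
OBJECT_TERMS = {"milk", "cucumber", "steamed bun", "cooking oil"}
CONSEQUENCE_TERMS = {"poisoning", "contaminated", "melamine", "exceeds standard", "expired"}


def boolean_match(text: str) -> bool:
    """Boolean AND matching: at least one object term and one consequence term."""
    t = text.lower()
    return any(o in t for o in OBJECT_TERMS) and any(c in t for c in CONSEQUENCE_TERMS)


def is_food_safety(title: str, content: str) -> bool:
    """The title is judged first; the content is judged only if the title fails."""
    if boolean_match(title):
        return True
    return boolean_match(content)


print(is_food_safety("Local news", "Several people report poisoning after eating cucumber"))
# True
```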
Web spam falls into two kinds: web spam pages, and spam opinions in user-generated content. Web spam pages divide into content-spam pages and link-spam pages. Spam opinions are classified by the severity of their negative effect into three kinds:
- Untrustworthy opinions, i.e., deceptive opinions. They either give specific objects, events, people and so on unrealistically high ratings, praise and the like, or give them unrealistically low ratings, abuse, attacks and the like.
- Low-quality opinions. These are generally short, and their content may or may not be useful. Because they do not describe the specific topic or product in detail, their value for opinion mining on that topic or product is hard to determine, so they are also regarded as spam (from the computer's perspective).
- Irrelevant opinions. These mainly take the form of advertisements or content unrelated to the topic.
Web spam pages, low-quality opinions, irrelevant opinions and the like on a website have relatively obvious spam characteristics. Detection models for them can therefore be built from a pre-built, labeled sample set. Features are extracted along three dimensions: content, content distribution and links. Before feature extraction, the web page information undergoes metadata extraction, body-text extraction, word segmentation, sentence statistics, paragraph statistics, anchor-text statistics, link statistics and so on.

Content features. The extracted information is segmented into words, stop words are removed, and dimensionality is reduced (e.g., with the document frequency method or the information gain method). The result is a content feature vector whose weights are term frequencies.

Content-distribution features. These include:
- title length (in characters);
- number of paragraphs and mean paragraph length;
- number of sentences and mean sentence length;
- information length (in characters);
- number of anchor texts and mean anchor-text length (in characters).

When the model is built, these features are normalized as y = x/(max+1). Here x and y are the feature values before and after normalization, and max is the maximum value of the feature, obtained in advance from sample statistics over the site's information set. If x > max occurs before the max parameter is updated, x is set to max+1, i.e., y = 1.

Link features. These include:
- the ratio of the information's on-site outlinks to its total outlinks;
- the ratio of its off-site outlinks to its total outlinks;
- the ratio of the number of its links into a (pre-built) spam page set to its total outlinks;
- the ratio of the number of pages in a (pre-built) spam page set that link to the information to the total number of pages.

For each of the three dimensions, feature vectors are formed from the pre-built spam set and non-spam set. A machine learning method (e.g., support vector machines) then builds one spam detection model per dimension, giving three models that are updated regularly from updated sample sets. Newly collected information can then be filtered. Information is judged to be spam when at least two of the three models return a positive result.
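The normalization rule and the two-of-three decision rule translate directly into code. Below is a sketch that assumes the three per-dimension detectors have already produced boolean verdicts. The function names are hypothetical.

```python
def normalize(x: float, feature_max: float) -> float:
    """y = x / (max + 1). If x exceeds the stored max before max is refreshed,
    x is replaced by max + 1, so y = 1."""
    if x > feature_max:
        x = feature_max + 1
    return x / (feature_max + 1)


def is_spam(verdicts, min_positive=2):
    """verdicts: booleans from the content, distribution and link detectors.
    The page is spam if at least `min_positive` detectors flag it."""
    return sum(bool(v) for v in verdicts) >= min_positive


print(normalize(4, 9), normalize(25, 9))  # 0.4 1.0
print(is_spam([True, False, True]))       # True
```

Dividing by max+1 rather than max keeps every in-range value strictly below 1. Only out-of-range values, clamped to max+1, reach exactly 1.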
Deceptive spam opinions in a website's user-generated content are handled differently, because their spam characteristics are not obvious. Their sample set is built on the principle of "better to miss than to include wrongly", which guarantees the accuracy of the deceptive-spam-opinion samples. Information that may be deceptive spam is reviewed and confirmed through knowledge-base-backed review, investigation and similar means. The review focuses on the following opinions:
- opinions in user-generated content whose content is duplicated or near-duplicated;
- opinions published by the top-N1 authors with the most opinions within a certain time range;
- opinions related to the top-N2 specific objects with the most opinions within a certain time range;
- opinions from the top-N3 IP addresses with the most opinions within a certain time range;
- opinions published by the top-N4 users who were earliest to publish opinions about a specific object;
- opinions published by the top-N5 users with the most opinion revisions for a specific object.

Together these form the candidate set of deceptive spam opinions. Two confirmation methods are then used: forward confirmation and reverse confirmation.

Forward confirmation. An opinion is deceptive spam if it describes the same thing as an entry in the deceptive-spam-opinion knowledge base, i.e., if its content matches the description of some entry there. An entry is added to the knowledge base when an opinion published by a user turns out, after some time or through verification, to be deceptive. For example, someone posts in a forum that a certain brand of milk contains melamine. Later, someone else lists all sorts of reasons why this is impossible. If it is afterwards proven that the latter poster was an employee of that milk company acting deceptively, the latter opinion is confirmed as deceptive spam and added to the knowledge base (which is built in advance and updated regularly).

Reverse confirmation. An opinion is proven deceptive from the reverse angle when such information is impossible under normal circumstances. For example, a rule in the reverse-confirmation knowledge base (built in advance and updated regularly) is: if a user id publishes more than N (e.g., 10) opinions about one or more products within a set time (e.g., 1 minute), all of those opinions are labeled as deceptive spam. An example matching this rule is a forum user id that publishes 15 reviews of 3 different products in less than 1 minute. No normal person could do this, so the reverse angle shows that these opinions are deceptive.

The opinions confirmed by these methods are labeled to form an accurate set of deceptive spam opinions. The users who most frequently publish deceptive spam, i.e., the N users with the most deceptive spam opinions, are added to a blacklist for later identification. In addition, abnormal author behaviors are summarized from the accurate deceptive-spam-opinion set and turned into rules for later use; an example is the user above who published 15 reviews of 3 products within 1 minute.

Note that it is also quite difficult to confirm explicitly that an opinion is not deceptive spam: failing to show that an opinion is deceptive does not show that it is not. Considering time, workload, the diversity of non-deceptive opinions and similar factors, non-deceptive opinions are not labeled here.
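The example reverse-confirmation rule flags a user id that posts more than N opinions within a set time. It can be checked with a sliding window per user. Below is a sketch that assumes posts arrive as (user_id, timestamp in seconds) pairs. N = 10 and a 60-second window follow the example values in the text; the function name is hypothetical.

```python
from collections import defaultdict, deque


def flag_burst_users(posts, max_posts=10, window_seconds=60):
    """posts: iterable of (user_id, timestamp_seconds).

    Return the user ids that published more than `max_posts` opinions
    within any span shorter than `window_seconds`."""
    by_user = defaultdict(list)
    for user, ts in posts:
        by_user[user].append(ts)
    flagged = set()
    for user, times in by_user.items():
        window = deque()
        for ts in sorted(times):
            window.append(ts)
            # Drop posts that fall outside the window ending at `ts`.
            while ts - window[0] >= window_seconds:
                window.popleft()
            if len(window) > max_posts:
                flagged.add(user)
                break
    return flagged


# u1: 15 posts in 56 seconds; u2: 15 posts spread over 140 seconds.
posts = [("u1", t) for t in range(0, 57, 4)] + [("u2", t) for t in range(0, 150, 10)]
print(flag_burst_users(posts))  # {'u1'}
```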
With the accurate set of deceptive spam opinions in place, a detection model for deceptive spam can be built. This requires choosing a machine learning method and extracting sample features. Note that the process above yields a labeled set of deceptive spam opinions and an unlabeled set of opinions, but no labeled set of non-deceptive opinions. A general supervised learning method therefore cannot simply be used, since it needs both positive and negative example sets. We instead adopt a method that learns from positive and unlabeled examples: the biased SVM (Liu, B., Y. Dai, X. Li, W. Lee, and P. Yu. Building text classifiers using positive and unlabeled examples. Proceedings of the IEEE International Conference on Data Mining, 2003).
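The biased SVM treats every unlabeled example as negative. It penalizes errors on positives more heavily than errors on unlabeled examples (C+ > C-). The sketch below is a plain-Python linear version trained by stochastic subgradient descent on the primal objective; it is not the solver used by Liu et al. The cost values, learning rate and epoch count are hypothetical. In practice C+ and C- would be tuned on held-out data.

```python
import random


def train_biased_svm(positives, unlabeled, c_pos=10.0, c_unl=1.0,
                     epochs=200, lr=0.01, seed=0):
    """Linear biased SVM: positives get label +1, unlabeled examples label -1,
    and hinge losses are weighted c_pos and c_unl respectively. Minimizes
        0.5 * ||w||^2 + sum_i C_i * max(0, 1 - y_i * (w . x_i + b))
    by stochastic subgradient descent. Both input lists must be non-empty."""
    data = [(x, 1, c_pos) for x in positives] + [(x, -1, c_unl) for x in unlabeled]
    n, dim = len(data), len(data[0][0])
    w, b = [0.0] * dim, 0.0
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y, c in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad_w = [wi / n for wi in w]  # regularizer spread over n samples
            grad_b = 0.0
            if margin < 1:  # hinge loss is active: add its weighted subgradient
                grad_w = [gw - c * y * xi for gw, xi in zip(grad_w, x)]
                grad_b = -c * y
            w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


positives = [(0.9, 1.0), (1.0, 0.8), (1.1, 1.1), (0.8, 0.9)]
# One hidden positive (1.0, 1.0) sits among the unlabeled examples.
unlabeled = [(-1.0, -1.0), (-0.9, -1.1), (-1.2, -0.8), (-1.0, -0.9), (1.0, 1.0)]
w, b = train_biased_svm(positives, unlabeled)
print(predict(w, b, (0.95, 0.95)), predict(w, b, (-1.0, -1.0)))
```

The larger positive cost keeps the hidden positive from pulling the boundary toward the labeled positives. Its single, cheap hinge loss is outweighed by the heavily weighted labeled examples nearby.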
The determination of sample characteristics during setting up about detection model, mainly considers from four latitudes in the present invention: suggestion
Author, suggestion content, suggestion distribution of content, four latitudes of chain feature (need before extraction feature info web is carried out author etc.
The extraction of meta-data extraction, text, participle, part-of-speech tagging, the extraction of name entity, sentence statistics, paragraph statistics, punctuation mark system
Meter, link statistics etc. process).Wherein the determination method about suggestion content characteristic is: carry out the argument information extracted
Participle, removes stop words, and forms content feature vector after dimensionality reduction (can use document frequency method, information gain method etc.)
(flexible strategy are term frequencies);System of selection about suggestion distribution of content feature is to select: suggestion paragraph number, bout length are (all
Value), sentence number, sentence length (average), word number, first person pronoun number, second person pronoun number, third person pronoun number etc.
(setting up in model process, be normalized feature, process is y=x/ (max+1), before wherein x, y are normalization respectively
After characteristic value, max is in advance to the maximum obtained by sample statistics this feature in site information set;In max parameter more
Time before new if there is x > max, then take x=max+1, i.e. y=1);Feature selection approach for suggestion author's latitude is choosing
select: the user name of the comment (number of characters), the posting time of the comment (interval since midnight of the same day), the interval since the previous comment, the word count of the comment, the number of comments per hour (up to this comment), the word-count change ratio (relative to the previous comment) and the comment-count change ratio (up to this comment, relative to the previous hour), etc. When the model is built, each feature is normalized as y = x/(max + 1), where x and y are the feature values before and after normalization, and max is the maximum of this feature obtained in advance from sample statistics over the site's information set. If a value x > max appears before the max parameter is updated, x is set to max + 1, i.e. y = 1. For the link-feature dimension of a comment, the selected features are: the number of in-site inbound links, in-site outbound links, off-site inbound links and off-site outbound links of the comment; the number of the comment's in-site links that point to items in the accurate deceptive spam-comment set; and the number of items in that set that link to the comment, etc. These features are normalized in the same way (y = x/(max + 1), with x set to max + 1 whenever x > max before max is updated). For the features of the four dimensions above, feature vectors are formed from the accurate deceptive spam-comment set built in the preceding steps and from the unlabeled sample set (i.e. the other samples in the user-generated-content web page collection). One detection model is built per dimension: four models in total, retrained periodically on the updated sample sets.
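The normalization rule y = x/(max + 1), including the clamp for values above the stored maximum, can be sketched as follows. This is a minimal illustration. The function name `normalize_feature` is hypothetical, and the per-site maximum is assumed to be supplied by the caller.

```python
def normalize_feature(x: float, feature_max: float) -> float:
    """Normalize a raw feature value as y = x / (max + 1).

    feature_max is the maximum of this feature observed in the site's
    sample set when the model was last built (assumed to be >= 0).
    A value above that maximum is clamped to max + 1, so it maps to 1.0.
    """
    if x > feature_max:
        x = feature_max + 1
    return x / (feature_max + 1)


# A value within range, and an out-of-range value seen before max is updated.
print(normalize_feature(49, 99))   # 0.49
print(normalize_feature(250, 99))  # 1.0
```

Dividing by max + 1 instead of max keeps every in-range value strictly below 1. As a result, y = 1 marks only magnitudes that were not seen when the model was built.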
Deceptive spam comments can then be identified and filtered out of newly crawled user-generated content. First, blacklist identification is performed: any comment posted by a blacklisted user is directly identified as deceptive spam. The remaining comments are then checked by reverse confirmation against the rules summarized in the preceding process. Under normal circumstances such comments could not occur, which proves from the opposite direction that they are deceptive spam, so any comment found abnormal in this way is identified as deceptive spam. Finally, the remaining comments are examined by the deceptive spam-comment detection models built as described above. Each comment is judged by all four models, and if at least three of them classify it as a positive example, the comment is identified as deceptive spam.
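The three-stage decision described above can be sketched as below. The stages are the blacklist, the reverse-confirmation rules, and a vote in which at least three of the four per-dimension models must return a positive. The rule and model callables, the `Comment` record and the function name are hypothetical placeholders, because the method does not specify their concrete form.

```python
from typing import Callable, Iterable, Sequence

Comment = dict  # hypothetical record, e.g. {"user": "u1", "text": "..."}


def is_deceptive_spam(
    comment: Comment,
    blacklist: set,
    impossible_rules: Iterable[Callable[[Comment], bool]],
    models: Sequence[Callable[[Comment], bool]],
    min_votes: int = 3,
) -> bool:
    # Stage 1: comments from blacklisted users are spam outright.
    if comment.get("user") in blacklist:
        return True
    # Stage 2: reverse confirmation. Each rule returns True when the comment
    # shows a pattern that could not occur under normal circumstances.
    if any(rule(comment) for rule in impossible_rules):
        return True
    # Stage 3: the four per-dimension detection models vote; at least
    # min_votes positive judgments mark the comment as spam.
    votes = sum(1 for model in models if model(comment))
    return votes >= min_votes
```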
After these filtering steps, the information that enters subsequent processing (i.e. food-safety-related non-spam information) is of relatively high quality, which lays the foundation for accurate subsequent processing.
3. Area information discovery (as shown in Figure 3)
(1) Web page information preprocessing
The crawled and filtered web page information is obtained. Metadata such as its title, source, author, publication time and the location of the publishing website are extracted and saved, and the body content of the page is extracted and saved as well.
The extracted title and body content are segmented with a word segmenter based on statistics and dictionaries. These dictionaries include a place-name dictionary formed from the ontology established in step 1. For each word, the segmenter records characteristic parameters such as:

- its relative start and end positions in the text formed by the title and body;
- the sentence it belongs to;
- its relative start and end positions within that sentence.

Words that are unlikely to be place names are then excluded with a matching method based on a word list. The list is compiled in advance and updated regularly. It includes words that serve both as person names and as place names, and words that have other specific meanings but may also be place names. For example, Wuzhong, a city in the Ningxia Hui Autonomous Region, can also be a person's name, and Fangzheng, a county in Heilongjiang Province, can also refer to the Founder company. Note that words carrying a place-name suffix, such as Wuzhong City, are not excluded.
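A minimal sketch of the word-list filter follows. Words on the ambiguous list are dropped unless they end with a place-name suffix. The list entries, the suffix tuple and the function name are assumptions for illustration. "Matching" is read here as exact word equality, which is one plausible interpretation.

```python
# Hypothetical excerpts; the real list is compiled and maintained in advance.
AMBIGUOUS_PLACE_WORDS = {"吴忠", "方正"}  # Wuzhong (also a person's name), Fangzheng (also Founder)
PLACE_SUFFIXES = ("省", "市", "县", "区")  # assumed suffixes: province, city, county, district


def keep_as_place_candidate(word: str,
                            ambiguous=AMBIGUOUS_PLACE_WORDS,
                            suffixes=PLACE_SUFFIXES) -> bool:
    """Return False for words on the ambiguous list, unless the word
    carries an explicit place-name suffix (e.g. 吴忠市, Wuzhong City)."""
    if word.endswith(suffixes):
        return True
    return word not in ambiguous


tokens = ["吴忠", "吴忠市", "方正", "北京"]
print([t for t in tokens if keep_as_place_candidate(t)])  # ['吴忠市', '北京']
```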
(2) Place pronoun resolution
The segmented title and body text may contain pronouns that denote places, such as "this province", "this city" and "our province". Because these pronouns do not literally indicate an exact geographic location, they need to be resolved.
1) To resolve place pronouns, a sliding window for pronoun resolution is first established. Its length L is determined in advance, for example by analyzing the distribution of the number of words between place pronouns and their antecedents.
2) Next, check whether a plausible geographic noun exists among the L words preceding the place pronoun. For example, Liaoning is a plausible antecedent for "this province"; plausibility is judged by pre-established rules. If such a noun exists, the judgment model described below decides whether a referring relation holds between the geographic noun and the place pronoun. If the relation holds, the geographic noun corresponding to the pronoun is determined from it and resolution ends. If several geographic nouns satisfy the relation, the one closest to the place pronoun is chosen. Otherwise, proceed to step 3).
3) If no plausible geographic noun exists within the L words, or the model judges that no referring relation exists, check the 2L words preceding the place pronoun for a plausible geographic noun. This search does not go beyond the sentence, which is delimited by a full stop for example. If a geographic noun is found, the judgment model described below again decides whether a referring relation holds. If it does, the corresponding geographic noun is determined and resolution ends. If several geographic nouns satisfy the relation, the one closest to the place pronoun is chosen. Otherwise, proceed to step 4).
4) If no plausible geographic noun exists within the 2L words, or the model judges that no referring relation exists, the place name referred to by the pronoun is determined by extraction or replacement. This uses the information source or the website location obtained during metadata extraction.
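Steps 2) to 4) can be sketched as a two-pass backward scan with a fallback. In this sketch, `is_geo_noun` stands in for the pre-established plausibility rules, `refers_to` for the trained judgment model, and `fallback_place` for the location taken from the page metadata. All three, and the function name, are hypothetical. The text states the sentence limit only for the 2L pass, so the sketch applies it only there.

```python
from typing import Callable, Sequence


def resolve_place_pronoun(
    tokens: Sequence[str],
    pronoun_idx: int,
    sentence_start: int,
    window: int,
    is_geo_noun: Callable[[str], bool],
    refers_to: Callable[[int, int], bool],
    fallback_place: str,
) -> str:
    """Resolve the place pronoun at tokens[pronoun_idx].

    window is the pre-set length L. Candidates are scanned from nearest to
    farthest, so the first one the judgment model accepts is the nearest.
    """
    passes = (
        max(0, pronoun_idx - window),                      # step 2: L words
        max(0, pronoun_idx - 2 * window, sentence_start),  # step 3: 2L words, within sentence
    )
    for start in passes:
        for i in range(pronoun_idx - 1, start - 1, -1):
            if is_geo_noun(tokens[i]) and refers_to(i, pronoun_idx):
                return tokens[i]
    # Step 4: fall back to the source or site location from the metadata.
    return fallback_place


toks = ["Liaoning", "food", "agency", "said", "this-province"]
print(resolve_place_pronoun(toks, 4, 0, 2, lambda t: t == "Liaoning",
                            lambda i, j: True, "Shenyang"))  # Liaoning
```

With L = 2, the first pass finds nothing, and the 2L pass reaches "Liaoning". If the sentence started at index 1, the 2L pass would stop short of it and the metadata fallback would be used instead.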
The judgment model is built as follows. Web page information containing place pronouns is compiled into a sample set. For each place pronoun in the sample set, the referring relation between it and each geographic noun among the 2L words before it is labeled as the class variable. Here L is as in step 1), and the 2L words do not go beyond the sentence. For each such pair, relevant data are extracted to form the feature vector of the relation between the place pronoun and the geographic noun. The feature vector consists of:

- the suffix length of the geographic noun: suffix word count divided by text length. A suffix denotes a place name or has place-name characteristics, e.g. "Autonomous Region" in "Xinjiang Uygur Autonomous Region";
- the distance between the geographic noun and the place pronoun;
- the relative distances of the geographic noun and of the place pronoun from the start of the text;
- their relative distances from the start of the sentence;
- their relative distances from the end of the sentence.

Each distance is measured in words and divided by text length. A machine learning method (e.g. SVM) is then applied to the sample set, class variable and feature vectors. This produces the judgment model that decides whether a referring relation exists between a geographic noun and a place pronoun.
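The eight-component feature vector described above can be computed as sketched here. The `Mention` record and the function name are hypothetical. Positions are counted in words, as the text states, and every quantity is divided by the text length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    pos: int             # word index of the mention in the full text
    sentence_start: int  # word index where its sentence starts
    sentence_end: int    # word index where its sentence ends


def reference_features(geo: Mention, pronoun: Mention,
                       suffix_len: int, text_len: int) -> List[float]:
    """Feature vector for a (geographic noun, place pronoun) pair."""
    n = float(text_len)
    return [
        suffix_len / n,                              # suffix length
        abs(pronoun.pos - geo.pos) / n,              # noun-pronoun distance
        geo.pos / n,                                 # noun from text start
        pronoun.pos / n,                             # pronoun from text start
        (geo.pos - geo.sentence_start) / n,          # noun from sentence start
        (pronoun.pos - pronoun.sentence_start) / n,  # pronoun from sentence start
        (geo.sentence_end - geo.pos) / n,            # noun from sentence end
        (pronoun.sentence_end - pronoun.pos) / n,    # pronoun from sentence end
    ]
```

The labeled vectors could then be passed to an SVM classifier, for example scikit-learn's `SVC` through its `fit(X, y)` and `predict(X)` methods. This is one way to realize the "SVM" the text mentions, not necessarily the original implementation.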
The judgment model decides whether a referring relation exists between a place pronoun and a geographic noun as follows. The data describing their relation are first extracted to form a feature vector with the same components as above. These components are the suffix length of the geographic noun, the distance between the geographic noun and the place pronoun, and the relative distances of each from the start of the text, the start of the sentence and the end of the sentence, all divided by text length. The judgment model built above then classifies this vector, and the result determines whether the referring relation between the place pronoun and the geographic noun exists.
(3) Non-standard word resolution
The segmented title and body text may contain words that denote places in non-standard forms, such as "beijing" or "bj" in Chinese text. These are resolved with a standard/non-standard word lookup table, which is built in advance and updated regularly. Each non-standard place-name form is looked up and replaced by its standard form.
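The lookup-and-replace step is essentially a dictionary lookup. The table entries below are a hypothetical excerpt. Case-insensitive matching is an assumption of this sketch, not something the method specifies.

```python
# Hypothetical excerpt of the standard / non-standard lookup table.
NONSTANDARD_TO_STANDARD = {
    "beijing": "北京",
    "bj": "北京",
}


def standardize_place_words(words, table=NONSTANDARD_TO_STANDARD):
    """Replace non-standard place-name forms with their standard form.
    Lookup is case-insensitive (an assumption); unknown words pass through."""
    return [table.get(w.lower(), w) for w in words]


print(standardize_place_words(["BJ", "食品", "beijing"]))  # ['北京', '食品', '北京']
```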
(4) Relative location resolution
The segmented title and body text may also contain words that express places in relative terms, such as "the provinces in southwest China". Such expressions likewise lack an explicit place name. To resolve them, the relative area information is looked up in the area-information ontology instances built in step 1 and in their attached tables, which yields exact place-name words. For "the provinces in southwest China", for example, the resolution proceeds as follows:

- the area-information ontology is first used to find the provinces belonging to China;
- the orientation dimension of each province's attached table is then queried;
- all provinces whose orientation is southwest are extracted and substituted for "the provinces in southwest China", which completes the resolution.
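The orientation lookup can be sketched with two dictionaries. One stands in for the ontology's province list and the other for the attached tables' orientation dimension. Both data fragments and the function name are hypothetical.

```python
# Hypothetical fragments of the region ontology and its attached tables.
PROVINCES_OF = {"China": ["Sichuan", "Yunnan", "Guizhou", "Shandong", "Jiangsu"]}
ATTACHED_TABLE = {
    "Sichuan": {"orientation": "southwest"},
    "Yunnan": {"orientation": "southwest"},
    "Guizhou": {"orientation": "southwest"},
    "Shandong": {"orientation": "east"},
    "Jiangsu": {"orientation": "east"},
}


def resolve_relative_location(country, orientation,
                              provinces_of=PROVINCES_OF,
                              attached=ATTACHED_TABLE):
    """Expand an expression like 'provinces in southwest China' into the
    provinces whose attached-table orientation matches."""
    return [p for p in provinces_of.get(country, [])
            if attached.get(p, {}).get("orientation") == orientation]


print(resolve_relative_location("China", "southwest"))  # ['Sichuan', 'Yunnan', 'Guizhou']
```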
(5) Region determination
Once the web page information has been preprocessed and resolved, the region associated with each item can be determined. This mainly involves two steps: the relevant region of the information is judged first by pattern matching and then by a machine-learning judgment model (as shown in Figure 4).
Region determination aims to identify the region to which the information relates, which provides the regional basis for discovering food safety event information. Weighing accuracy, computational cost and operability, pattern matching is applied first. Two questions must be settled here: the matching rules and the information scope.

The matching rules are based on the area-information ontology already built, i.e. the region dimension of the ontology. They mainly consider the names and attributes of selected ontology instances, and judgment is made by pattern matching against combinations of these names and attributes. Concrete pattern-matching methods include Boolean matching, frequency matching and distance matching between instance names. The specific method and rules are chosen after statistical analysis of the information; they are determined in advance and updated regularly.

The information scope covers two dimensions: the title and the content of the information. Because the title and the content may not match, the title is processed first. If the title alone allows the information to be assigned to the currently selected region (e.g. Beijing), pattern matching for this region is complete. Otherwise, the content is processed with the same method in a second matching pass for this region. Throughout this process the principle "rather miss than misassign" is followed, so as to keep the judgment results as accurate as possible.
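A minimal sketch of the title-first, content-second pass follows. Only frequency matching is shown; Boolean and distance matching are omitted. The patterns stand for ontology instance names and attributes of one region. The hit thresholds are assumptions, with a stricter content threshold chosen to reflect the "rather miss than misassign" principle.

```python
def count_hits(text: str, patterns) -> int:
    """Frequency matching: total occurrences of all region patterns."""
    return sum(text.count(p) for p in patterns)


def match_region_by_pattern(title: str, content: str, patterns,
                            title_min_hits: int = 1,
                            content_min_hits: int = 2) -> bool:
    """Match the title first; only if it fails, run a second pass on the
    content with a stricter (assumed) threshold."""
    if count_hits(title, patterns) >= title_min_hits:
        return True
    return count_hits(content, patterns) >= content_min_hits


beijing = ["Beijing", "Haidian"]  # hypothetical instance name and attribute
print(match_region_by_pattern("Recall notice", "Shipped once via Beijing.", beijing))  # False
```

Note that `str.count` matches substrings, so a real rule set would need word-boundary handling.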
If the information still cannot be assigned to a region after pattern matching, a third judgment is made with a region judgment model built by a machine learning method.

The region judgment model is built in advance as follows. The starting point is a web page sample set that has been processed (as in steps (1) to (4)) and labeled as associated or not with a given region; this set is built in advance and updated regularly. The title and content words of each sample are combined, keeping the words that match ontology instance names or attributes. These words are sorted into five categories, forming five feature vectors:

- administrative place names (provinces, cities, etc.);
- area codes;
- postcodes;
- abbreviations;
- scenic spots and landmarks (mountains, lakes, seas, rivers, islands, buildings, etc.).

The term weight in each vector is the term frequency. Because title words matter more, the weights of title words are multiplied by a predetermined factor. A machine learning method (SVM, etc.) is then used to build region judgment models for each target region from these five feature vectors. This gives five models per region, retrained periodically as the sample set is updated.

The third judgment proceeds as follows. The input is information that has been processed and resolved through steps (1) to (4) but could not be assigned to a region. Its title and content words (those matching ontology instance names or attributes) are combined and sorted into the same five categories to form five vectors, weighted as above. Each vector is evaluated by the corresponding one of the five region judgment models, and the detection results are combined by weighting. The weight of each category is the sum of the word frequencies in that category in the web page divided by the total word frequency over all five categories. If the weighted result exceeds a preset threshold, the information is assigned to this region; otherwise, it is not (as shown in Figure 5).
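The weighted combination of the five per-category model outputs can be sketched as follows. The category keys and function names are hypothetical. Each model output is assumed to be a 0/1 decision or a probability, which the method does not specify.

```python
CATEGORIES = ("admin_name", "area_code", "postcode", "abbreviation", "landmark")


def weighted_region_score(category_freq: dict, model_output: dict) -> float:
    """Combine the five per-category model outputs.
    Weight of a category = its word-frequency sum / total over all five."""
    total = sum(category_freq.get(c, 0) for c in CATEGORIES)
    if total == 0:
        return 0.0
    return sum(category_freq.get(c, 0) / total * model_output.get(c, 0.0)
               for c in CATEGORIES)


def assign_to_region(category_freq: dict, model_output: dict,
                     threshold: float) -> bool:
    """Assign the page to the region if the weighted score exceeds the threshold."""
    return weighted_region_score(category_freq, model_output) > threshold
```

For example, suppose a page has 6 administrative-name hits, 2 abbreviation hits and 2 landmark hits. If only the administrative-name and landmark models fire, the score is 0.6 + 0.2 = 0.8.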
4. Object information discovery
Object information discovery for web page information means object type identification. It determines which kind of object the content of a web page is about, together with the event factors involved and the consequences caused. The purpose is to combine this with the area information and other information discovered in the page, so as to determine an event as uniquely as possible.
Weighing recognition accuracy, computational cost and operability, a regression analysis method is used. The information scope is the title and content of each web page, combined. After word segmentation, stop-word removal and dimensionality reduction, they form the feature vector of the page, which serves as the independent variables. The term weight is the term frequency. Because title words matter more, title-word weights are multiplied by a predetermined factor. Likewise, the weights of words matching object, result or associated-party instance names or attributes in the ontology are multiplied by a predetermined factor. For each object type, the page's feature vector is substituted into the corresponding logistic regression model. These models are built in advance from a sample set according to the types that need to be distinguished. The regression result determines whether the page is related to that object type.
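The weighted term-frequency vector can be sketched as below. Title occurrences carry an extra weight, and words matching ontology instance names or attributes are then scaled up. The function name, the boost factors and the fixed vocabulary are hypothetical. Dimensionality reduction is assumed to have already produced `vocabulary`.

```python
from collections import Counter


def weighted_term_vector(title_words, content_words, vocabulary,
                         stop_words=frozenset(), ontology_terms=frozenset(),
                         title_boost=2.0, ontology_boost=2.0):
    """Term-frequency vector over a fixed (already reduced) vocabulary.
    Each title occurrence counts title_boost; words matching ontology
    instance names or attributes are then multiplied by ontology_boost."""
    weights = Counter()
    for w in content_words:
        if w not in stop_words:
            weights[w] += 1.0
    for w in title_words:
        if w not in stop_words:
            weights[w] += title_boost
    for w in list(weights):
        if w in ontology_terms:
            weights[w] *= ontology_boost
    return [weights.get(t, 0.0) for t in vocabulary]


vec = weighted_term_vector(["melamine", "milk"], ["milk", "powder", "the", "melamine"],
                           ["milk", "melamine", "powder", "pork"],
                           stop_words={"the"}, ontology_terms={"milk"})
print(vec)  # [6.0, 3.0, 1.0, 0.0]
```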
The regression analysis model is built as follows. The starting point is a web page sample set that has been compiled and labeled; it is built in advance and updated regularly. The title and content words of each sample are combined. After word segmentation, stop-word removal and dimensionality reduction, they form the feature vector, i.e. the independent variables. The term weight is the term frequency. Title words, and words matching object, result or associated-party instance names or attributes in the ontology, have their weights multiplied by predetermined factors. At the same time, the object type of each page is labeled as the dependent variable: 1 means the page belongs to the object type and 0 means it does not. On this basis, a logistic regression model is built for each object type.
5. Trend tracking, early warning and display
In practice, combining the area information and object type information found in the preceding steps allows an event to be determined accurately. The information relevant to the event is the intersection of the information belonging to these two dimensions.
Once the region and object type of the web page information have been identified, characteristic parameters that represent the event are established. Specifically, the event is represented by:

- the number of information pages related to the event;
- the number of page views;
- the number of page reposts;
- the number of page views on specific websites;
- the number of page views on websites under specific domain names;
- a composite index, obtained as a weighted sum of these parameters. The weights are determined by the Delphi method and must sum to 1.

The characteristic parameters are computed periodically (e.g. every hour), and their changes over time are analyzed comprehensively.
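The composite index is a weighted sum whose weights must total 1. The sketch below enforces that constraint. The parameter names and weight values are hypothetical; in the method, the weights come from a Delphi survey. The parameters are assumed to be on comparable scales, e.g. already normalized, which the text does not state.

```python
import math


def composite_index(params: dict, weights: dict) -> float:
    """Weighted sum of an event's characteristic parameters.
    The weights must sum to 1; every weighted name must appear in params."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * params[name] for name, w in weights.items())


print(composite_index({"pages": 0.5, "views": 0.8}, {"pages": 0.25, "views": 0.75}))
```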
Building on this event trend tracking, the value of each characteristic parameter of the event, including the composite index, is computed periodically (e.g. every 12 hours). The current value of each parameter is compared with its mean over a preceding period. This period is currently one month, chosen in view of how network events propagate, and it can be adjusted as needed. If the difference is positive and its absolute value exceeds a threshold (e.g. three standard deviations; the threshold is set in advance), early warning for the event is initialized.
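The initialization test compares the current value with the preceding period's mean, using a k-standard-deviation margin. This is sketched below with Python's `statistics` module. Using the population standard deviation and k = 3 are assumptions of this sketch, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def should_initialize_warning(current: float, history: Sequence[float],
                              k: float = 3.0) -> bool:
    """True if the current value exceeds the mean of the preceding period
    by a positive margin larger than k population standard deviations."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    diff = current - mean
    return diff > 0 and diff > k * sd


month = [10, 12, 11, 9, 13, 10, 11, 12]  # hypothetical preceding-period values
print(should_initialize_warning(30, month), should_initialize_warning(12, month))  # True False
```

Here the mean is 11 and 3 standard deviations is about 3.67. A jump to 30 triggers initialization; a value of 12 does not.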
Events whose early warning has been initialized are then tracked. The value of each characteristic parameter, including the composite index, is computed periodically (e.g. every 12 hours) and compared with its mean over a preceding period. This period is currently the month before the early warning was initialized, chosen in view of how network events propagate, and it can be adjusted as needed. A formal early warning is issued for the event (as shown in Figure 6) if both conditions hold:

- the difference remains positive for a set duration (e.g. 24 hours, determined in advance);
- its absolute value exceeds a threshold (e.g. three standard deviations, set in advance).

Otherwise, the early-warning initialization of the event is cancelled.
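The confirm-or-cancel logic for an initialized event can be sketched as a scan over the periodic check results. The function name and the state labels are hypothetical. The required number of consecutive checks is derived from the confirmation period, e.g. 2 checks for 24 hours at one check every 12 hours.

```python
from typing import Sequence


def confirm_or_cancel(checks: Sequence[bool], checks_required: int) -> str:
    """Decide the state of an event whose warning has been initialized.

    checks: results of the periodic checks made after initialization,
    oldest first (True = positive difference above the threshold).
    """
    for i, exceeded in enumerate(checks, start=1):
        if not exceeded:
            return "cancelled"
        if i >= checks_required:
            return "formal_warning"
    return "pending"


print(confirm_or_cancel([True, True], 2))   # formal_warning
print(confirm_or_cancel([True, False], 2))  # cancelled
```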
The threshold is determined as follows. The inputs are the historical records of changes in each characteristic parameter of events (e.g. over one year). These are combined with the occurrence time, region, scale and other data of confirmed historical food safety events, which can be obtained from the food safety authorities. The independent variable is the difference between each characteristic parameter value of an event and its mean over a preceding period (e.g. one month). The dependent variable indicates whether a food safety event of the given nature occurred (1 for occurred, 0 for not). A logistic regression model is built between these independent and dependent variables. Based on this model and on the historical trend of the event's characteristic parameters, a suitable value of the independent variable that makes the dependent variable 1 is selected as the threshold.
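Under some stated assumptions, the threshold can be read directly off the fitted model. Assume a single-feature logistic model in the difference d between a parameter's current value and its preceding-period mean. The coefficients \beta_0 and \beta_1 are notation introduced here, with \beta_1 > 0. The probability cutoff p^* is chosen by the analyst; 0.5 is a common default, not a value stated in the method.

$$P(Y=1 \mid d) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 d)}}$$

$$P(Y=1 \mid d) \ge p^* \iff \beta_0 + \beta_1 d \ge \ln\frac{p^*}{1-p^*} \iff d \ge d^* = \frac{\ln\frac{p^*}{1-p^*} - \beta_0}{\beta_1}$$

With p^* = 0.5 the log-odds term vanishes and d^* = -\beta_0/\beta_1. The method also weighs the historical trend of each parameter when choosing the final value, which this formula does not capture.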
After information filtering, event information discovery, trend tracking and early-warning analysis have been applied to the obtained web page information, the results are presented to users as tables, charts, etc. Early-warning information is also pushed immediately through services such as SMS and e-mail.
6. Event termination judgment
For an event under formal early warning, the value of each characteristic parameter, including the composite index, is computed periodically on the basis of the event trend tracking above (e.g. every 12 hours). The current value of each parameter is compared with its mean over a preceding period. This period currently runs from the day the early warning started to the day before the calculation, chosen in view of how network events propagate, and it can be adjusted as needed. If the difference is negative and its absolute value exceeds a threshold (e.g. three standard deviations, set in advance), the event is considered to have ended and its early warning is terminated.
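The termination test mirrors the initialization test with the sign reversed. The function name, the population standard deviation and k = 3 are assumptions of this sketch.

```python
import statistics
from typing import Sequence


def event_has_ended(current: float, since_warning: Sequence[float],
                    k: float = 3.0) -> bool:
    """since_warning: values from the day the warning started up to the day
    before this calculation. The event ends when the current value is below
    their mean by more than k population standard deviations."""
    mean = statistics.fmean(since_warning)
    sd = statistics.pstdev(since_warning)
    diff = current - mean
    return diff < 0 and abs(diff) > k * sd


print(event_has_ended(10, [50, 60, 55, 52, 58]))  # True
```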
7. Ontology supplementation and revision
Throughout event information discovery and early-warning analysis, the food safety event information ontology strongly influences how well steps such as information filtering and information discovery perform. The distribution of Internet information also changes over time. To keep improving the method's effectiveness, the results of information filtering, information discovery and related processes should therefore be evaluated periodically. Deficiencies in the ontology, such as omissions and errors, should be supplemented and corrected to improve the method's subsequent effectiveness.
This completes the whole process. Food safety event information is extracted from crawled Internet information, early warnings are issued as events evolve, and users are served in a timely manner. Along the way, techniques such as information filtering, area information discovery, object type information discovery, trend tracking and early warning ensure that event information discovery and early warning are accurate. This provides an important information basis for risk early warning and rapid emergency handling of food safety events.
It is worth noting that the present invention is not limited to emergency management of food safety events. With minor modifications, it can be applied to risk early warning and emergency handling of other unconventional emergencies whose event information can be obtained from the Internet.
Claims (11)
1. An event early-warning method based on region and object information discovery, comprising the steps of:
1) establishing a food safety event information ontology, and establishing an attached table for each instance in the ontology;
2) filtering crawled web page information to obtain non-spam web page information related to food safety events;
3) resolving words that denote places in the filtered web page information to obtain exact place-name words; processing the resolved web page information by a pattern-matching method using the instance names and attributes of the region dimension of the food safety event information ontology; and assigning the web page information to the region that is successfully matched;
4) for each set object type, processing the web page information with a regression analysis model to determine the object types related to each web page; wherein the web page information is processed with the regression analysis model by combining the title and content of each web page, forming the page's feature vector after word segmentation, stop-word removal and dimensionality reduction, and using the feature vector as the independent variables of the regression analysis model to judge whether the page is related to the object type;
5) according to the regions and related object types of the web pages determined in steps 3) and 4), obtaining the set of web page information of events for a set region and object; establishing characteristic parameters of the events and calculating their values periodically; and issuing an early warning for an event if its characteristic parameter value exceeds a set threshold continuously for a set time.
2. The method of claim 1, wherein the words denoting places in the web page information are resolved by:
1) for a place pronoun, judging with a judgment model whether a referring relation exists between the place pronoun and a geographic noun appearing before it, and if so, replacing the place pronoun with the corresponding geographic noun;
2) resolving non-standard place-name words based on a standard/non-standard word lookup table, replacing non-standard words with standard words;
3) resolving relative-position area information in the words based on the region dimension of the food safety event information ontology, to obtain exact place-name words;
wherein the judgment model is built by: forming a sample set from web page information containing place pronouns, and labeling the referring relation between each place pronoun in the sample set and the geographic nouns before it as the class variable; establishing feature vectors of the relation between the place pronoun and the geographic nouns before it; and then selecting a machine learning method to build, from the sample set, class variable and feature vectors, the judgment model that decides whether a referring relation exists between a geographic noun and a place pronoun;
and wherein whether a referring relation exists between a place pronoun and a geographic noun appearing before it is judged by: calculating the feature vector values of the relation between the place pronoun and the geographic noun, and judging these values with the judgment model to determine whether the referring relation exists.
3. The method of claim 1 or 2, wherein the food safety event information ontology comprises five dimensions: object, region, time, result and associated party; the content of the attached table comprises three dimensions: synonyms, antonyms and aliases; and, for the region dimension, the content of the attached table further comprises six dimensions: area code, postcode, abbreviation, scenic spots, adjacent regions and orientation.
4. The method of claim 3, wherein, before the words denoting places in the filtered web page information are resolved in step 3), the title and body content are segmented with a word segmenter; and for each resulting word, its relative start and end positions in the text formed by the title and body, the sentence it belongs to, and its relative start and end positions within that sentence are recorded.
5. The method of claim 4, wherein an ambiguous-noun list is first established that records place names which can also serve as other names; the words obtained by segmentation in step 3) are then matched against this list and matching words are filtered out; wherein, if a matching word has a suffix denoting a place name, the word is retained.
6. The method of claim 2, wherein a place pronoun denoting a place in the web page information is resolved by:
61) establishing a sliding window of length L for pronoun resolution;
62) checking whether a geographic noun exists among the L words before the place pronoun; if so, applying the judgment model, and if a referring relation exists, determining the geographic noun corresponding to the pronoun from the relation and ending the resolution; otherwise proceeding to step 63);
63) checking whether a geographic noun exists among the 2L words before the place pronoun; if so, applying the judgment model, and if a referring relation exists, determining the geographic noun corresponding to the pronoun from the relation and ending the resolution; otherwise proceeding to step 64);
64) determining the place name referred to by the place pronoun by extraction or replacement, based on the information source or website location obtained during metadata extraction.
7. The method of claim 2 or 6, wherein the components of the sample feature vector in the judgment model comprise: the suffix length of the geographic noun; the distance between the geographic noun and the place pronoun; the relative distances of the geographic noun and of the place pronoun from the start of the text; the relative distances of the geographic noun and of the place pronoun from the start of the sentence; and the relative distances of the geographic noun and of the place pronoun from the end of the sentence.
8. The method of claim 6, wherein in step 62), if several geographic nouns among the L words before the place pronoun satisfy the referring relation, the one closest to the place pronoun is chosen; and in step 63), if several geographic nouns among the 2L words before the place pronoun satisfy the referring relation, the one closest to the place pronoun is chosen.
9. The method of claim 1, wherein the characteristic parameters of the event comprise: the number of information pages related to the event, the number of page views, the number of page reposts, the number of page views on set websites, the number of page views on websites under set domain names, and a composite index of these parameters.
10. The method of claim 1 or 2, wherein the values of the characteristic parameters are calculated periodically, and the current value of each characteristic parameter of the event is compared with its mean over a preceding period; if the difference is positive and its absolute value exceeds a threshold, early warning for the event is initialized; for an event whose early warning has been initialized, the values of the characteristic parameters continue to be calculated periodically and compared with their means over a preceding period; if the difference remains positive and its absolute value continuously exceeds a threshold, a formal early warning is issued for the event; otherwise, the early-warning initialization of the event is cancelled.
11. The method according to claim 10, wherein, for an event under formal early warning, the values of the characteristic parameters representing the event are calculated periodically and each current value is compared with its mean over a certain preceding period; if the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is terminated.
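Claim 8 resolves a place-name pronoun to the nearest geographical term for which the reference relation holds, looking first within the L preceding words (step 62) and then within 2L words (step 64). The Python sketch below assumes that step 64) is a fallback used only when step 62) finds nothing, and stands in for the judgment model with a hypothetical `refers(term_index, pronoun_index)` predicate; `nearest_referent` and `resolve_pronoun` are illustrative names.

```python
from typing import Callable, Optional, Sequence


def nearest_referent(words: Sequence[str], pron_idx: int, window: int,
                     is_geo: Callable[[str], bool],
                     refers: Callable[[int, int], bool]) -> Optional[int]:
    """Scan the `window` words before the pronoun, nearest first, and return the
    index of the first geographical term for which the reference relation holds."""
    lo = max(0, pron_idx - window)
    for i in range(pron_idx - 1, lo - 1, -1):
        if is_geo(words[i]) and refers(i, pron_idx):
            return i
    return None


def resolve_pronoun(words: Sequence[str], pron_idx: int, L: int,
                    is_geo: Callable[[str], bool],
                    refers: Callable[[int, int], bool]) -> Optional[int]:
    # Step 62): window of L words.
    idx = nearest_referent(words, pron_idx, L, is_geo, refers)
    if idx is None:
        # Step 64) (assumed fallback): widen the window to 2L words.
        idx = nearest_referent(words, pron_idx, 2 * L, is_geo, refers)
    return idx
```

Scanning backwards from the pronoun and stopping at the first match is equivalent to collecting every qualifying term in the window and picking the nearest one, which is the tie-breaking rule claim 8 states.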
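Claim 9 names a composite index of the event parameters without giving its formula. The sketch below groups the five listed parameters in a hypothetical `EventParameters` record and assumes, purely for illustration, that the composite index is a weighted sum with caller-chosen weights; the patent may combine the parameters differently.

```python
from dataclasses import astuple, dataclass
from typing import Sequence


@dataclass
class EventParameters:
    """Characteristic parameters of claim 9 for one event over one period."""
    page_count: float               # number of information pages related to the event
    page_views: float               # total page views
    page_forwards: float            # total page forwards (reposts)
    views_on_set_sites: float       # page views on specified websites
    views_under_set_domains: float  # page views on websites under specified domains

    def composite_index(self, weights: Sequence[float]) -> float:
        """Hypothetical composite index: a weighted sum of the five parameters."""
        values = astuple(self)
        if len(weights) != len(values):
            raise ValueError(f"expected {len(values)} weights, got {len(weights)}")
        return sum(w * v for w, v in zip(weights, values))
```

For example, `EventParameters(10, 200, 30, 50, 40).composite_index((1, 0.1, 0.5, 0.2, 0.2))` evaluates to 63. In practice the raw counts differ by orders of magnitude, so normalizing each parameter before weighting would probably be needed.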
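Claims 10 and 11 together describe a warning lifecycle: initialization when a parameter rises above its recent mean, formal warning or cancellation on later checks, and termination once the values fall back. The Python sketch below encodes this as a small state machine. Several details are assumptions that the claims do not fix: one threshold shared by all parameters, a rise on any single parameter counts, a fall must appear on every parameter to terminate, and "persistently" means `confirm_periods` consecutive above-threshold checks. `EventWarning` and `update` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence

NONE, INITIALIZED, FORMAL = "none", "initialized", "formal"


@dataclass(frozen=True)
class EventWarning:
    state: str = NONE
    streak: int = 0  # consecutive above-threshold checks since initialization


def update(w: EventWarning, current: Dict[str, float],
           history: Dict[str, Sequence[float]], threshold: float,
           confirm_periods: int = 2) -> EventWarning:
    """Advance one event's warning state after a periodic recalculation."""
    # Current value minus its mean over the preceding period, per parameter.
    dev = {k: current[k] - mean(history[k]) for k in current}
    rose = any(d > threshold for d in dev.values())   # positive, |d| > threshold
    fell = all(d < -threshold for d in dev.values())  # negative, |d| > threshold
    if w.state == NONE:
        return EventWarning(INITIALIZED, 1) if rose else w
    if w.state == INITIALIZED:
        if not rose:
            return EventWarning()  # claim 10: cancel the initialization
        streak = w.streak + 1
        return EventWarning(FORMAL if streak >= confirm_periods else INITIALIZED, streak)
    # FORMAL (claim 11): terminate the warning once values drop clearly below the mean.
    return EventWarning() if fell else w
```

Returning a fresh `EventWarning()` on cancellation or termination lets the same event be initialized again if it flares up later.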
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210501970.0A CN103853700B (en) | 2012-11-29 | 2012-11-29 | A kind of event method for early warning found based on region and object information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853700A | 2014-06-11 |
CN103853700B | 2016-09-07 |
Family ID: 50861368
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210501970.0A Active CN103853700B (en) | 2012-11-29 | 2012-11-29 | A kind of event method for early warning found based on region and object information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853700B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105068989B (en) * | 2015-07-23 | 2018-05-04 | 中国测绘科学研究院 | Place name address extraction method and device |
CN106547913B (en) * | 2016-11-25 | 2020-04-21 | 网易(杭州)网络有限公司 | Page information collection and classification feedback method, device and system |
CN107103052B (en) * | 2017-01-23 | 2020-09-15 | 南威软件股份有限公司 | Big data early warning method based on data model |
CN107357888B (en) * | 2017-07-10 | 2021-06-15 | 北京星选科技有限公司 | Method and device for providing raw material information and electronic equipment |
CN110727793B (en) * | 2018-06-28 | 2023-03-24 | 百度在线网络技术(北京)有限公司 | Method, device, terminal and computer readable storage medium for area identification |
CN109558966B (en) * | 2018-10-28 | 2022-05-17 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Processing system for intelligently judging evidence and predicting occurrence of event |
CN110457562A (en) * | 2019-08-15 | 2019-11-15 | 中国农业大学 | A kind of food safety affair classification method and device based on neural network model |
CN110688557A (en) * | 2019-09-23 | 2020-01-14 | 中国农业大学 | Food safety event-oriented early warning method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101488150A (en) * | 2009-03-04 | 2009-07-22 | 哈尔滨工程大学 | Real-time multi-view network focus event analysis apparatus and analysis method |
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method |
CN102193951A (en) * | 2010-03-19 | 2011-09-21 | 华为技术有限公司 | Information extracting method and system |
CN102708096A (en) * | 2012-05-29 | 2012-10-03 | 代松 | Network intelligence public sentiment monitoring system based on semantics and work method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5390840B2 (en) * | 2008-11-27 | 2014-01-15 | 株式会社日立製作所 | Information analyzer |
2012-11-29: application CN201210501970.0A filed in CN (patent CN103853700B, status: active)
Non-Patent Citations (1)
Title |
---|
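Research on the index system for monitoring online public opinion on public emergencies (突发公共事件网络舆情监测指标体系研究); Tan Guoxin et al.; Journal of Central China Normal University (Humanities and Social Sciences) (《华中师范大学学报(人文社会科学版)》); 2010-05-31; Vol. 49, No. 3; pp. 66-69 * |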
Also Published As
Publication number | Publication date |
---|---|
CN103853700A (en) | 2014-06-11 |
Similar Documents
Publication | Title |
---|---|
CN103854064B (en) | Event occurrence risk prediction and early warning method targeted to specific zone |
CN103853700B (en) | A kind of event method for early warning found based on region and object information |
CN103854063B (en) | A kind of prediction of event occurrence risk method for early warning based on internet opening imformation |
CN103176981B (en) | A kind of event information excavates and the method for early warning |
CN103853738B (en) | A kind of recognition methods of info web correlation region |
Bozarth et al. | Toward a better performance evaluation framework for fake news classification |
CN106598944B (en) | A kind of civil aviaton's security public sentiment sentiment analysis method |
US11048712B2 (en) | Real-time and adaptive data mining |
US9229977B2 (en) | Real-time and adaptive data mining |
CN105138570B (en) | The doubtful crime degree calculation method of network speech data |
CN103853744B (en) | Deceptive junk comment detection method oriented to user generated contents |
CN107609103A (en) | It is a kind of based on push away spy event detecting method |
CN103176984B (en) | Duplicity rubbish suggestion detection method in a kind of user-generated content |
CN110457404A (en) | Social media account-classification method based on complex heterogeneous network |
CN103092975A (en) | Detection and filter method of network community garbage information based on topic consensus coverage rate |
CN103810274A (en) | Multi-feature image tag sorting method based on WordNet semantic similarity |
CN105893484A (en) | Microblog Spammer recognition method based on text characteristics and behavior characteristics |
CN102170447A (en) | Method for detecting phishing webpage based on nearest neighbour and similarity measurement |
CN114693906A (en) | Travel reimbursement abnormal behavior detection method and system based on space-time rule |
Marivate et al. | Catching crime: Detection of public safety incidents using social media |
CN103684896B (en) | Method of detecting website cheating based on domain name resolution characteristics |
CN101350019B (en) | Method for abstracting web page information based on vector model between predefined slots |
CN109033351A (en) | The merging method and device of merit data |
Mou et al. | Align voting behavior with public statements for legislator representation learning |
Wei et al. | Location-based event detection using geotagged semantic graphs |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |