CN103853700B - Event early-warning method based on region and object information discovery - Google Patents

Event early-warning method based on region and object information discovery

Info

Publication number
CN103853700B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210501970.0A
Other languages
Chinese (zh)
Other versions
CN103853700A (en
Inventor
杨风雷
黎建辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an event early-warning method based on region and object information discovery. The method comprises: 1) filtering crawled web page information to obtain non-spam web page information related to food safety events; 2) resolving the words that denote places in the web page information to obtain exact place-name words, then processing the web page information against a pre-built event information ontology and assigning each page to the region it matches; 3) processing the web page information with a regression analysis model to determine the object categories each page relates to; 4) using each page's region and related object categories to obtain the set of web page information for the event of a given region and object, establishing characteristic parameters for the event and periodically calculating their values, and issuing an early warning for an event whose characteristic parameter values persistently exceed a set threshold. The invention improves the accuracy and comprehensiveness of event early warning and ensures the efficiency of food safety event early warning.

Description

An event early-warning method based on region and object information discovery
Technical field
The invention belongs to the field of information technology, and in particular relates to a method that applies specific processing to crawled Internet information and issues event early warnings on the basis of information discovery covering the region in which an event occurs, the object category involved, and so on. It is mainly applied in emergency-response work for unconventional emergencies, such as food safety event information monitoring and risk early warning.
Background art
In recent years, food safety events such as toxic capsules, gutter oil, clenbuterol, dyed steamed buns, plasticizers and poisoned cucumbers have occurred again and again, causing both very bad social effects and large economic losses. To avoid or minimize the harm these events bring, event-based risk early-warning technology has begun to attract a great deal of attention. Event-based risk early warning requires that information about these events be discovered in advance.
With the rapid development of the Internet, the number of Internet users keeps growing, and the Internet has gradually become users' main vehicle for publishing, obtaining and transmitting information. Through interactions between people, organizations and so on, it has formed a virtual society that corresponds to and is associated with the real one. It has become the largest public data source in the world, and its scale keeps growing. Under these circumstances it is both imperative and highly significant to use the characteristics of the Internet itself to build a complete social information feedback network, discover in advance the various latent factors that may bring a crisis, and provide timely, accurate and comprehensive information for the emergency management of food safety events.
To use information on the Internet for risk early warning of food safety events, event-related information must be obtained through some processing. Internet information must first be crawled; information related to food safety events can then be extracted and discovered; and early warning can follow as an event develops. The key step in this process is identifying event information. In theory this can be done with various supervised or unsupervised machine learning methods, but in view of actual information requirements, accuracy, operability and so on, compromises are often adopted. For example, some research work sets up information categories (such as diseases) in advance, collects keywords for each category, classifies the collected web page information by keyword matching against these categories and keywords, and on that basis monitors the categorized information, that is, the development of events. Other research work identifies and judges event information through steps such as information relevance detection, named entity recognition, extraction of disease and address information, and visual display of results.
Judging from evaluation tests, these approaches still fall short in the judgment, identification and early warning of event information (metrics such as precision and recall need further improvement). This is not surprising, given that these methods do not consider the various kinds of spam present in the information, that information extraction technology is still not accurate enough, and that treating the categorized information obtained by keyword matching directly as information about a single event can conflate information whose subjects do not agree.
Summary of the invention
To solve the above problems, the object of the invention is to provide a method that analyzes the content of web page information through specific steps, extracts from it key elements such as the region where an event occurs and the object category in order to identify the event, and then issues early warnings according to the event's development trend. The method draws on the thinking of intelligence systems; its steps are as follows.
1. Ontology construction
Based on the characteristics of food safety events and the needs of later information analysis, a food safety event information ontology is built along dimensions such as object, region, result, associated party and time. This provides the basis for information filtering, information discovery and so on for food safety events.
2. Information filtering
On the basis of the ontology built above, the crawled web page information is filtered. Filtering has two main parts: food safety information filtering and spam filtering. The former mainly applies pattern matching to the title, content and so on of each item to decide whether it is food safety information. The latter mainly uses trained detection models to filter out content-cheating and link-cheating spam pages, as well as irrelevant opinions, low-quality opinions and deceptive spam opinions in user-generated content. This safeguards the quality of the information entering the subsequent steps.
3. Region information discovery
On the basis of the region information ontology built above, the title, content and so on of the crawled and filtered information undergo resolution of place-name pronouns and the like; the region an item relates to is then discovered and determined by pattern matching and by identification with a machine-learning-based judgment model.
4. Object information discovery
Based on regression analysis models built in advance, the title, content and so on of each item undergo word segmentation, dimensionality reduction and similar steps, and a regression analysis is then run for each object category (set in advance, for example vegetables) to determine whether the web page information relates to the target object. This reveals the object categories the information relates to. Combining region information, object category information and so on then allows the event to be determined fairly accurately.
5. Trend tracking, early warning and display
After information filtering, region information discovery and object information discovery, characteristic parameters representing an event are established, such as the number of pages, the number of page views and a composite index. The event's development trend is tracked by periodically calculating these characteristic parameter values. Each current characteristic parameter value is compared with its mean over a certain preceding period; if the difference is positive and its absolute value persistently exceeds a certain threshold, an event early warning is issued. The results of the early-warning analysis are then displayed to the relevant users.
6. Event end judgment
For an event under early warning, each characteristic parameter value is calculated periodically and the current value is compared with its mean over a certain preceding period (counted from the day of the warning); if the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is ended.
7. Ontology supplementation and revision
Because the distribution of Internet information changes over time, and with a view to continually improving the method's effectiveness, the results of information filtering and of region and object information discovery are evaluated periodically. On this basis, deficiencies in the ontology, such as omissions and errors, are supplemented and corrected to improve the method's subsequent effectiveness.
To keep information filtering and information discovery accurate and efficient, the invention builds an ontology that fits the characteristics of food safety event information, mainly along five dimensions: object, result, region, time and associated party. For each instance in the region information ontology, supplementary tables are established for six dimensions: area code, postal code, abbreviation, landmarks, adjacent regions with their directions, and position.
To improve the accuracy of event information discovery, the invention filters the crawled Internet information before any further processing; the filtering comprises food safety information filtering and spam filtering.
To improve the accuracy with which the region related to web page information is judged and identified, the invention first preprocesses the web page information, then resolves words that may be place names to obtain unambiguous words, and then judges by pattern matching, judgment models and so on whether the information can be assigned to a target region, thereby determining the region the web page information relates to.
Also for this purpose, the invention applies place-name pronoun resolution, relative-position resolution, normalization of non-standard forms and similar processing to the preprocessed web page information, thereby overcoming the low accuracy of region judgment caused by non-standard place-name words, place-name pronouns, relative positions and the like.
When judging the region related to web page information, the invention applies in turn pattern matching on the title, pattern matching on the body text, and judgment by a machine-learning-based judgment model. In the machine-learning-based step, an ensemble of region judgment models is used, which avoids the inaccurate region judgments caused by identical names, words with multiple meanings (for example an ordinary word that is also a place name) and the like.
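The three-stage order of region judgment can be sketched as follows. This is only an illustration under stated assumptions: plain substring matching stands in for the patent's pattern matching, each later stage is consulted only when the earlier one finds nothing, and `judge_region` and the `model` callable are hypothetical names, not part of the patent.

```python
from typing import Callable, Optional, Sequence


def judge_region(
    title: str,
    body: str,
    place_names: Sequence[str],
    model: Optional[Callable[[str, str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the first region found by title match, body match, then model."""
    # Stage 1: pattern matching on the title (substring match is an assumption).
    for name in place_names:
        if name in title:
            return name
    # Stage 2: pattern matching on the body text.
    for name in place_names:
        if name in body:
            return name
    # Stage 3: fall back to a machine-learning-based judgment model, if any.
    if model is not None:
        return model(title, body)
    return None
```

Checking the title first reflects the order given in the text; the cheaper and usually more specific evidence is tried before the model is called.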
In object information discovery, based on regression analysis models built in advance, the invention applies word segmentation, dimensionality reduction and similar steps to the title, content and so on of each item and then runs a regression analysis for each object category, thereby determining which object categories the web page information relates to.
The invention periodically calculates the relationship between each characteristic parameter value of an event and its mean over a certain preceding period; when the difference is positive and its absolute value persistently reaches a certain level (for example 3 standard deviations), an event early warning is issued promptly.
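A minimal sketch of this warning rule for a single characteristic parameter is given below. The function name `should_warn`, the use of the population standard deviation, and the `persist` count that stands for "persistently" are all assumptions made for illustration; the text only fixes the idea of a positive difference from the earlier mean that stays above a threshold such as 3 standard deviations.

```python
from statistics import mean, pstdev
from typing import Sequence


def should_warn(
    history: Sequence[float],
    recent: Sequence[float],
    k: float = 3.0,
    persist: int = 2,
) -> bool:
    """history: values from the earlier reference period.
    recent: the latest periodic values, oldest first."""
    if len(history) < 2 or len(recent) < persist:
        return False
    baseline = mean(history)
    threshold = k * pstdev(history)  # e.g. 3 standard deviations
    # Warn only if every one of the last `persist` values exceeds the
    # baseline by more than the threshold (positive difference only).
    return all(value - baseline > threshold for value in recent[-persist:])
```

In practice the same check would be run for each characteristic parameter (page count, page views, composite index); how the per-parameter results are combined is not specified in the text.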
For an event under early warning, the invention periodically calculates each characteristic parameter value and compares the current value with its mean over a certain preceding period (counted from the day of the warning); if the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is ended.
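The end-of-warning test mirrors the warning rule with the sign reversed. The sketch below assumes a single characteristic parameter and a caller-supplied threshold; `should_end_warning` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def should_end_warning(
    values_since_warning: Sequence[float],
    current: float,
    threshold: float,
) -> bool:
    """Return True when the current value has dropped far enough below the
    mean of the values observed since the day the warning was issued."""
    if not values_since_warning:
        return False
    difference = current - mean(values_since_warning)
    # End the warning only for a negative difference larger than the threshold.
    return difference < 0 and abs(difference) > threshold
```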
Compared with the prior art, the invention has the following advantages:
By building a food safety event information ontology and, on that basis, processing the crawled Internet information through information filtering, region information discovery, object information discovery, event early warning and event end judgment, the invention ensures the accuracy and comprehensiveness of food safety event information discovery and early warning, and ensures the efficiency of food safety event early warning.
Brief description of the drawings
Fig. 1 is a flow chart of the event early-warning method based on region and object information discovery;
Fig. 2 is a schematic diagram of the supplementary tables of the region information ontology;
Fig. 3 is a flow chart of the method for identifying the region related to web page information;
Fig. 4 is a schematic diagram of the method for judging the region related to web page information;
Fig. 5 is a schematic diagram of the machine-learning-model-based method for judging the region related to web page information;
Fig. 6 is a schematic diagram of the event early-warning method.
Detailed description of the embodiments
A specific embodiment of the invention is shown in Fig. 1; the steps are described below.
1. Ontology construction
In view of the characteristics of food safety events and the needs of later analysis such as event information extraction and tracking, the food safety event information ontology is built mainly along five dimensions: object, region, time, result and associated party. For example, the object, that is, food, can be divided into categories such as primary products and processed products, and primary products can be further divided into categories such as vegetables and fruit, and so on. Results can be divided into categories such as contamination and poisoning, and contamination can be further divided into categories such as expired and exceeding limits, and so on. Regions can be divided overall into five categories: Asia, Europe, Africa, America and Oceania. Each category can be subdivided again; Asia, for example, can be divided into six categories: East Asia, West Asia, South Asia, North Asia, Central Asia and Southeast Asia, and so on, until a category cannot be divided further and is a bottom-level element (that is, an instance). Other categories are built similarly. In addition, supplementary tables of synonyms, antonyms, aliases and so on are established for each instance in the ontology. For instances in the region information ontology, supplementary tables are also established for six dimensions (as shown in Fig. 2), for use in subsequent information processing: area code; postal code; abbreviation; landmarks (mountains, lakes, seas, rivers, islands, buildings); adjacent regions, that is, neighboring regions of the same level with their direction (east, south, west, north and so on); and position (relative to the next level up, for example central or southern).
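One way to picture a region instance and its six supplementary tables is as a small record type. The sketch below is an assumption about representation only; the class `RegionInstance`, its field names and the helper `build_lookup` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegionInstance:
    """A bottom-level region in the ontology plus its supplementary tables."""
    name: str
    parent: Optional[str] = None                              # next level up
    aliases: List[str] = field(default_factory=list)          # synonyms / aliases
    area_code: str = ""                                       # table 1
    postal_code: str = ""                                     # table 2
    abbreviation: str = ""                                    # table 3
    landmarks: List[str] = field(default_factory=list)        # table 4
    adjacent: Dict[str, str] = field(default_factory=dict)    # table 5: direction -> region
    position: str = ""                                        # table 6: relative to parent


def build_lookup(instances: List[RegionInstance]) -> Dict[str, str]:
    """Map every surface form (name, abbreviation, alias, landmark) to a region name."""
    lookup: Dict[str, str] = {}
    for inst in instances:
        for key in [inst.name, inst.abbreviation, *inst.aliases, *inst.landmarks]:
            if key:
                lookup.setdefault(key, inst.name)  # first instance wins on clashes
    return lookup
```

A lookup of this kind would let later steps map a landmark or abbreviation found in a page back to the region instance it belongs to.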
2. Information filtering
For specified information sources, Internet crawling techniques (for example general crawling or scope-limited crawling) are used to crawl the information in the source. Because a website may contain content unrelated to the intended topic, as well as various kinds of spam, the information is filtered before further processing in order to improve the accuracy of event information discovery and early warning. The filtering has two aspects: food safety information filtering and spam filtering.
Food safety information filtering judges whether a collected item is related to food safety. Two questions need to be considered: the scope of information and the filtering rules. For the filtering rules, based on the food safety event information ontology, the object and result dimensions are considered first, and filtering is done by pattern matching on combinations of the names, attributes and so on of specific ontology instances from these two dimensions. The pattern matching methods used include Boolean matching, frequency matching, matching on the distance between instance names, matching on synonyms and antonyms of instance names, and matching on aliases of instance names. The specific modes and rules are chosen after statistical analysis of the information (determined in advance and updated periodically). For the scope of information, two dimensions are mainly considered: the title and the content. Because title and content may not match, the title is processed first; if title filtering places the item in the food safety category, processing of that item is finished; otherwise a second judgment is made on the content.
Web spam can be divided into two kinds: spam web pages and spam opinions in user-generated content. Spam web pages can be divided into content-cheating pages and link-cheating pages. Spam opinions can be classified by the size of their negative impact into untrustworthy opinions, low-quality opinions and irrelevant opinions. Untrustworthy opinions, that is, deceptive opinions, may give a specific object, event, person and so on exaggerated praise that does not match reality, or may give unjustifiably low ratings, abuse, attacks and so on. Low-quality opinions are generally short; their content may or may not be useful, but because they do not describe the specific topic or product in enough detail, their value for opinion mining on that topic or product is hard to determine, so they are also regarded as spam opinions (from the computer's point of view). Irrelevant opinions are mainly advertisements or content unrelated to the topic.
The spam features of a website's spam pages and of low-quality and irrelevant opinions in user-generated content are relatively obvious. A detection model can therefore be built from a labeled sample set established in advance, by extracting features of the samples along dimensions such as content, content distribution and links. (Before feature extraction, the web page information undergoes metadata extraction, body text extraction, word segmentation, sentence statistics, paragraph statistics, anchor text statistics, link statistics and so on.)
For content features, the method segments the extracted information into words, removes stop words, and after dimensionality reduction (for example by the document frequency method or the information gain method) forms a content feature vector whose weights are term frequencies.
For content distribution features, the method uses the title length (number of characters), number of paragraphs, number of sentences, paragraph length (mean), sentence length (mean), information length (number of characters), number of anchor texts, anchor text length (mean number of characters) and so on. During model building the features are normalized as y = x/(max+1), where x and y are the feature values before and after normalization and max is the maximum of the feature obtained in advance from sample statistics over the site's information set; if x > max occurs before the max parameter is updated, x = max+1 is used, that is, y = 1.
For link features, the method uses the ratio of the item's in-site outbound links to its total outbound links, the ratio of its off-site outbound links to its total outbound links, the ratio of the number of items in the spam page set (built in advance) that it links to, to its total outbound links, the ratio of the number of pages in the spam page set (built in advance) that link to it, to the total number of pages, and so on.
For the features of these three dimensions, based on the spam set and non-spam set established in advance, feature vectors are formed separately and machine learning methods (for example support vector machines) are used to build spam detection models (three of them, updated periodically from updated sample sets). Newly collected information can then be filtered; an item is judged to be spam if at least two of the models return a positive result.
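The normalization formula and the two-of-three decision rule are simple enough to write out directly. The sketch below assumes the three detection models have already produced boolean results; `normalize` and `is_spam` are hypothetical names.

```python
from typing import Sequence


def normalize(x: float, max_value: float) -> float:
    """y = x / (max + 1); if x exceeds the stored maximum, use x = max + 1 so y = 1."""
    if x > max_value:
        x = max_value + 1
    return x / (max_value + 1)


def is_spam(model_results: Sequence[bool]) -> bool:
    """An item is spam when at least two of the detection models say so."""
    return sum(1 for positive in model_results if positive) >= 2
```

Dividing by max+1 rather than max keeps the formula defined when the stored maximum is 0 and keeps in-range values strictly below 1, so a value of exactly 1 signals that the stored maximum was exceeded.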
For deceptive spam opinions in a website's user-generated content, the spam features are not so obvious. The spam opinion sample set is therefore built on the principle of preferring too few samples to unsound ones (the accuracy of the deceptive spam opinion samples must be guaranteed), and candidates are reviewed and confirmed by means such as knowledge-base checks and investigation. Candidates are the items in user-generated content that deserve particular attention: opinions whose content is duplicated or nearly duplicated; opinions published by the top-N1 authors who published the most opinions within a given period; opinions about the top-N2 specific objects that received the most opinions within a given period; opinions from the top-N3 IP addresses that published the most opinions within a given period; opinions published by the top-N4 users who were earliest to comment on a specific object; and opinions published by the top-N5 users who revised their opinions on a specific object most often. Together these form the candidate deceptive spam opinion set.
Two methods of confirmation are used: forward confirmation and reverse confirmation. In forward confirmation, if the content of an opinion describes the same matter as an entry in the deceptive spam opinion knowledge base, that is, the content matches the description of some entry in the knowledge base, the opinion is a deceptive spam opinion. Entries are added to the knowledge base under the following rule: if, after some time or on later proof, an opinion published by a user is confirmed to be deceptive, it is added to the knowledge base. For example, someone posts on a forum that a certain brand of milk contains melamine, someone later lists reasons why this is impossible, and it is afterwards proved that the latter post was a deception by an employee of the milk company; the opinion can then be confirmed as deceptive spam and added to the knowledge base (which is built in advance and updated periodically).
In reverse confirmation, the appearance of such information is impossible under normal circumstances, so the opinion is shown to be deceptive spam from the reverse angle. For example, a rule in the reverse confirmation knowledge base (built in advance and updated periodically) is: if a user id publishes more than N (for example 10) opinions on one or more products within a set time (for example 1 minute), the opinions published by this user are labeled as deceptive spam opinions. An example that matches this rule: on a forum, a user id publishes 15 reviews of 3 different products in less than 1 minute, which is impossible for a normal person. The deceptiveness of this user's posts is thus demonstrated from the reverse angle.
The information confirmed by these methods is labeled to form the accurate deceptive spam opinion set. Users who frequently publish deceptive spam opinions, that is, the N users who publish the most, are added to a blacklist for later identification. In addition, abnormal author behaviors (such as the user above who published 15 posts on 3 products within 1 minute) are generalized from the accurate deceptive spam opinion set and turned into rules for later use. Note that it is also quite difficult to confirm definitively that an opinion is not deceptive spam (failing to show clearly that an item is deceptive spam does not show clearly that it is not). Considering time, workload and the diversity of non-deceptive opinions, non-deceptive spam opinions are not labeled here.
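The reverse-confirmation rule from the example (more than N opinions from one user id within a set time) can be checked with a sliding window over each user's timestamps. This is a sketch only: the input format of `(user_id, timestamp_in_seconds)` pairs and the name `flag_burst_users` are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def flag_burst_users(
    posts: Iterable[Tuple[str, float]],
    max_posts: int = 10,
    window_seconds: float = 60.0,
) -> Set[str]:
    """Return user ids that posted more than `max_posts` opinions
    within any window of `window_seconds`."""
    times_by_user = defaultdict(list)
    for user_id, timestamp in posts:
        times_by_user[user_id].append(timestamp)

    flagged: Set[str] = set()
    for user_id, times in times_by_user.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most `window_seconds`.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 > max_posts:
                flagged.add(user_id)
                break
    return flagged
```

With the defaults, a user who posts 15 opinions three seconds apart is flagged, while a user who posts 15 opinions thirty seconds apart is not.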
Once the accurate deceptive spam opinion set has been established, identifying deceptive spam opinions requires choosing a machine learning method, extracting sample features and building a detection model. Note that the processing above yields a labeled deceptive spam opinion set and an unlabeled opinion set, but no labeled set of non-deceptive opinions. This means that ordinary supervised machine learning methods cannot simply be used, because they need both positive and negative examples to build a model. The invention therefore uses a machine learning method for learning from positive and unlabeled examples, the biased SVM (Liu, B., Y. Dai, X. Li, W. Lee, and P. Yu. Building text classifiers using positive and unlabeled examples. Proceedings of IEEE International Conference on Data Mining, 2003).
In building the detection models, the sample features are drawn from four dimensions: opinion author, opinion content, opinion content distribution, and link features. (Before feature extraction, the web page information is preprocessed: extraction of metadata such as the author, body text extraction, word segmentation, part-of-speech tagging, named entity extraction, and sentence, paragraph, punctuation, and link statistics.)
The opinion content features are determined as follows: the extracted opinion text is segmented into words, stop words are removed, and after dimensionality reduction (for example by the document frequency method or the information gain method) a content feature vector is formed, with term frequency as the weight.
The opinion content distribution features are: number of paragraphs, paragraph length (mean), number of sentences, sentence length (mean), number of words, number of first-person pronouns, number of second-person pronouns, number of third-person pronouns, and so on.
The opinion author features are: user name (number of characters), posting time (interval since midnight of the same day), posting interval (relative to the previous message), opinion word count, opinions per hour (up to this message), rate of change of the word count (relative to the previous message), rate of change of the number of opinions (up to this message, compared with the previous hour), and so on.
The link features of an opinion are: the number of in-links from within the site, the number of in-links from outside the site, the number of out-links within the site, the number of out-links to outside the site, the number of items in the accurate deceptive spam opinion set that the opinion links to, the number of items in the accurate deceptive spam opinion set that link to the opinion, and so on.
For the content distribution, author, and link features, each feature is normalized while the model is being built: y = x/(max + 1), where x and y are the feature values before and after normalization and max is the maximum of that feature obtained in advance from sample statistics over the site's information set. If x > max occurs before the max parameter has been updated, x is taken as max + 1, so that y = 1.
For the features of these four dimensions, feature vectors are formed separately from the accurate deceptive spam opinion set established in the preceding steps and from the unlabeled sample set (that is, the other samples in the user-generated content web page collection), and a detection model is built for each dimension (four in total; the models are updated regularly as the sample set is updated).
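The normalization rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `normalize` and the argument name `max_seen` are hypothetical, not taken from the patent.

```python
def normalize(x: float, max_seen: float) -> float:
    """Normalize a raw feature value as y = x / (max + 1)."""
    if x > max_seen:
        # max has not been refreshed yet: treat x as max + 1, so that y = 1
        x = max_seen + 1
    return x / (max_seen + 1)


# Example: a sentence-count feature whose sampled maximum is 9
print(normalize(5, 9))   # 0.5
print(normalize(20, 9))  # 1.0 (value above the stale maximum is clamped)
```

Dividing by max + 1 rather than max keeps the result defined even when the sampled maximum is 0.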
Newly crawled user-generated content can then be screened for deceptive spam opinions. First, blacklist identification is performed: any information posted by a user on the blacklist is directly identified as a deceptive spam opinion. The remaining opinions are checked against the rules summarized in the preceding process by reverse confirmation (that is, such information could not occur under normal circumstances, which proves from the opposite direction that it is a deceptive spam opinion); opinions found to be abnormal are identified as deceptive spam opinions. The opinions still remaining are evaluated with the deceptive spam opinion detection models built as described above: each opinion is judged by the four models separately, and if at least three models judge it to be a positive example, it is identified as a deceptive spam opinion.
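The three-stage screening above (blacklist, reverse-confirmation rules, then a vote among the four models) might look like the following sketch. The record layout, the rule functions, and the name `is_deceptive_spam` are assumptions made for illustration; the four models are represented as plain callables rather than trained classifiers.

```python
from typing import Callable, Iterable, Set

Opinion = dict  # hypothetical record; only the "user" key is required here


def is_deceptive_spam(
    opinion: Opinion,
    blacklist: Set[str],
    abnormal_rules: Iterable[Callable[[Opinion], bool]],
    models: Iterable[Callable[[Opinion], bool]],
    min_votes: int = 3,
) -> bool:
    # Stage 1: anything posted by a blacklisted user is spam
    if opinion["user"] in blacklist:
        return True
    # Stage 2: reverse confirmation, i.e. the opinion shows a pattern
    # that could not occur under normal circumstances
    if any(rule(opinion) for rule in abnormal_rules):
        return True
    # Stage 3: the detection models vote; at least min_votes positives needed
    votes = sum(1 for model in models if model(opinion))
    return votes >= min_votes
```

Ordering the stages from cheapest to most expensive means the models only run on opinions that the blacklist and rules could not decide.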
After the above filtering steps, the information that takes part in subsequent processing (that is, non-spam information related to food safety) is of relatively high quality, which provides the basis for accurate subsequent processing.
3. Region information discovery (as shown in Figure 3)
(1) Web page information preprocessing
For the web page information obtained by crawling and filtering, metadata such as the title, source, author, publication time, and location of the publishing website are extracted and saved; the body content of the web page is extracted and saved as well.
The extracted titles and body content are segmented with a word segmenter based on statistics and a dictionary (including a place-name dictionary formed from the ontology built in step 1). For each word, feature parameters are recorded: its position relative to the start and end of the text formed by the title and body content, the sentence it belongs to, and its position relative to the start and end of that sentence. Words that may not be place names are then excluded by matching against a vocabulary. The vocabulary is compiled in advance and updated regularly; it contains words that serve as both personal names and place names, words that have another specific meaning but may also be place names, and so on. For example, Wuzhong is a city in Ningxia Hui Autonomous Region but can also be a personal name, and Fangzheng is a county in Heilongjiang Province but is also the name of the Founder company. Note that words carrying a specific suffix, such as "Wuzhong City", are not excluded.
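The vocabulary-based exclusion step can be illustrated as follows. This is a minimal sketch: the suffix list, the English renderings of the words, and the name `drop_ambiguous_words` are assumptions, and a real system would work on segmented Chinese text.

```python
PLACE_SUFFIXES = (" City", " County", " Province")  # assumed suffix list


def drop_ambiguous_words(words, ambiguous_vocab, suffixes=PLACE_SUFFIXES):
    """Remove words that may not be place names, keeping suffixed forms."""
    kept = []
    for word in words:
        has_suffix = word.endswith(suffixes)
        if word in ambiguous_vocab and not has_suffix:
            continue  # e.g. "Wuzhong" could be a personal name
        kept.append(word)
    return kept


vocab = {"Wuzhong", "Fangzheng"}
print(drop_ambiguous_words(["Wuzhong", "Wuzhong City", "Beijing"], vocab))
# ['Wuzhong City', 'Beijing']
```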
(2) Place-name pronoun resolution
The segmented web page titles and text may contain pronouns that denote places, such as "this province", "this city", or "our province". Because these pronouns do not literally state an exact geographic location, they need to be resolved.
1) To resolve place-name pronouns, first establish a sliding window for pronoun resolution. The window length L is determined in advance (for example, by analyzing the distribution of the number of words between place-name pronouns and their antecedents).
2) Check whether a plausible geographic noun exists within the L words before the place-name pronoun (for example, Liaoning for "this province"; plausibility is judged by rules established in advance). If one exists, apply the judgment model described below, which decides whether a reference relation exists between a geographic noun and a place-name pronoun. If a reference relation exists, the geographic noun corresponding to the pronoun is determined from that relation and resolution ends (if several geographic nouns satisfy the reference relation, the one nearest to the pronoun is chosen). Otherwise, go to step 3).
3) If no plausible geographic noun exists within the L words, or the model judges that no reference relation exists, check whether a plausible geographic noun exists within the 2L words before the pronoun (without going beyond the sentence, whose boundary is identified, for example, by a full stop). If one exists, apply the judgment model described below. If a reference relation exists, the geographic noun corresponding to the pronoun is determined from that relation and resolution ends (if several geographic nouns satisfy the reference relation, the one nearest to the pronoun is chosen). Otherwise, go to step 4).
4) If no plausible geographic noun exists within the 2L words, or the model judges that no reference relation exists, determine the place name the pronoun refers to by extraction or substitution, using the information source or website location obtained during metadata extraction.
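Steps 1) to 4) can be sketched as a single function. This is an illustrative sketch under stated assumptions: `is_plausible` stands in for the pre-established plausibility rules, `refers` stands in for the trained judgment model, `fallback` is the location taken from metadata, and the sentence delimiters are assumed. As in the text, only the 2L window is explicitly bounded by the sentence.

```python
SENTENCE_END = {".", "。", "!", "?"}  # assumed sentence delimiters


def resolve_place_pronoun(tokens, idx, L, is_plausible, refers, fallback):
    """Return the place name that the pronoun at tokens[idx] refers to."""
    # Find where the pronoun's sentence starts (used to bound the 2L window)
    sent_start = 0
    for i in range(idx - 1, -1, -1):
        if tokens[i] in SENTENCE_END:
            sent_start = i + 1
            break
    # Window 1: L words back; window 2: 2L words back, within the sentence
    windows = [max(idx - L, 0), max(idx - 2 * L, sent_start)]
    for start in windows:
        # Scan from the nearest word outward so the closest match wins
        for j in range(idx - 1, start - 1, -1):
            if is_plausible(tokens[j], tokens[idx]) and refers(tokens, j, idx):
                return tokens[j]
    # Step 4: fall back to the source or website location from metadata
    return fallback
```

Scanning backward from the pronoun implements the "nearest geographic noun" tie-break without a separate sorting step.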
The judgment model is built as follows. Web page information containing place-name pronouns is collected to form a sample set. For each place-name pronoun in the sample set, the reference relation between it and each geographic noun within the preceding 2L words (L as in step 1), without going beyond the sentence) is labeled and used as the class variable. For each such pair, relevant data are extracted to build a feature vector describing the relation between the place-name pronoun and the geographic noun. The features include: the suffix length of the geographic noun (number of suffix characters divided by text length; the suffix is the part that marks a place name or carries place-name features, such as "Autonomous Region" in "Xinjiang Uygur Autonomous Region"); the distance between the geographic noun and the place-name pronoun (number of words divided by text length); the relative distance of the geographic noun from the start of the text (number of words divided by text length); the relative distance of the place-name pronoun from the start of the text (number of words divided by text length); the relative distance of the geographic noun from the start of the sentence (number of words divided by text length); the relative distance of the place-name pronoun from the start of the sentence (number of words divided by text length); the relative distance of the geographic noun from the end of the sentence (number of words divided by text length); the relative distance of the place-name pronoun from the end of the sentence (number of words divided by text length); and so on. A machine learning method (such as SVM) is then used to build, from the sample set, class variable, and feature vectors, the judgment model that decides whether a reference relation exists between a geographic noun and a place-name pronoun.
To judge with this model whether a reference relation exists between a place-name pronoun and a geographic noun, the relevant data for the pair are first extracted and formed into a feature vector. The extracted data are the same quantities used in building the model: the suffix length of the geographic noun (number of suffix characters divided by text length), the distance between the geographic noun and the place-name pronoun, the relative distances of the geographic noun and of the place-name pronoun from the start of the text, their relative distances from the start of the sentence, and their relative distances from the end of the sentence (each a number of words divided by text length), and so on. The judgment model built above then classifies the vector, and the result determines whether the reference relation between the place-name pronoun and the geographic noun exists.
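The feature vector for one (geographic noun, place-name pronoun) pair can be sketched as below. The index conventions are assumptions: positions are word indices in the combined title and body text, `sent_start` and `sent_end` are the word indices of the sentence boundaries, and `text_len` is taken to be the text length in words (the source does not say whether length is counted in words or characters).

```python
def pair_features(noun_idx, pron_idx, sent_start, sent_end, text_len, suffix_chars):
    """Build the relation feature vector for a noun/pronoun pair."""
    n = float(text_len)
    return [
        suffix_chars / n,             # suffix length of the geographic noun
        (pron_idx - noun_idx) / n,    # distance between noun and pronoun
        noun_idx / n,                 # noun distance from text start
        pron_idx / n,                 # pronoun distance from text start
        (noun_idx - sent_start) / n,  # noun distance from sentence start
        (pron_idx - sent_start) / n,  # pronoun distance from sentence start
        (sent_end - noun_idx) / n,    # noun distance from sentence end
        (sent_end - pron_idx) / n,    # pronoun distance from sentence end
    ]


print(pair_features(2, 6, 0, 9, 10, 1))
# [0.1, 0.4, 0.2, 0.6, 0.2, 0.6, 0.7, 0.3]
```

A vector like this would then be passed to whatever classifier is trained on the labeled sample set, for example an SVM as the text suggests.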
(3) Non-standard word resolution
In the segmented web page titles and text, some words denoting places may appear in non-standard forms, such as "beijing" or "bj" in Chinese text. These are resolved with a lookup table of standard and non-standard words (established in advance and updated regularly): each non-standard place-name form is looked up and replaced with its standard form.
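The lookup-and-replace step amounts to a dictionary substitution. In this sketch the table contents and the name `standardize_place_words` are hypothetical; a real table would be maintained and updated regularly, as the text describes.

```python
# Hypothetical lookup table from non-standard forms to the standard word
NONSTANDARD_TO_STANDARD = {"beijing": "北京", "bj": "北京"}


def standardize_place_words(words, table=NONSTANDARD_TO_STANDARD):
    """Replace non-standard place-name forms; leave other words unchanged."""
    return [table.get(word.lower(), word) for word in words]


print(standardize_place_words(["BJ", "beijing", "上海"]))
# ['北京', '北京', '上海']
```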
(4) Relative position resolution
In the segmented web page titles and text, some words denoting places may use relative-position expressions, such as "the southwestern provinces of China". These expressions likewise contain no explicit place name. To solve this problem, the relative-position region information is queried and resolved against the region ontology instances built in step 1 and their supplementary tables, yielding exact place-name words. For example, for "the southwestern provinces of China", using the established region ontology, first find the provinces belonging to China; for each of them, query the supplementary table for its orientation dimension; extract all provinces whose orientation is southwest; and substitute them for the expression, which completes the resolution.
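The "southwestern provinces of China" example can be sketched with a toy data structure. The nested dictionary below is only a stand-in for the region ontology and its supplementary tables, with three illustrative provinces; the names `REGION_ONTOLOGY` and `resolve_relative_region` are hypothetical.

```python
# Toy stand-in for the region ontology and its supplementary tables:
# country -> {province: orientation within the country}
REGION_ONTOLOGY = {
    "China": {
        "Sichuan": "southwest",
        "Yunnan": "southwest",
        "Liaoning": "northeast",
    },
}


def resolve_relative_region(country, orientation, ontology=REGION_ONTOLOGY):
    """List the provinces of a country whose orientation matches."""
    provinces = ontology.get(country, {})
    return sorted(name for name, side in provinces.items() if side == orientation)


print(resolve_relative_region("China", "southwest"))
# ['Sichuan', 'Yunnan']
```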
(5) Region determination
After the web page information has been preprocessed and resolved as above, the region associated with each piece of information can be determined. This mainly involves two steps: pattern matching, followed by a machine learning judgment model (as shown in Figure 4).
The goal of region determination is to identify the region each piece of information relates to, which provides the regional basis for discovering food safety event information. Considering accuracy, computational cost, and operability, pattern matching is applied first. Two issues must be considered: the scope of information and the matching rules. The matching rules are based on the established region ontology (that is, the region dimension of the ontology). They mainly consider the names and attributes of some ontology instances, and the judgment is made by pattern matching on combinations of these names and attributes. The specific pattern matching modes include Boolean matching, frequency matching, and distance matching between instance names; the choice of mode and the specific rules are determined by statistical analysis of the information (determined in advance and updated regularly). As for the scope of information, two dimensions are considered: the title and the content. Because the title and the content may not agree, the title is processed first. If pattern matching on the title assigns the information to the currently selected region (for example, Beijing), pattern matching for that region is complete; otherwise a second round of pattern matching is applied to the content for that region. Throughout, the principle of preferring omission to wrong inclusion is followed, so as to keep the precision of the results as high as possible.
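The title-first, content-second matching order can be sketched as follows. Only frequency matching is shown, and the thresholds (`title_min`, `content_min`) are assumptions chosen for illustration; the patent leaves the concrete modes and rules to statistical analysis.

```python
def count_hits(text, terms):
    """Frequency matching: total occurrences of any region term in the text."""
    return sum(text.count(term) for term in terms)


def match_region(title, content, terms, title_min=1, content_min=2):
    """Decide whether a page belongs to the region described by `terms`."""
    # First pass: the title alone
    if count_hits(title, terms) >= title_min:
        return True
    # Second pass: the content, with a stricter (assumed) frequency threshold
    return count_hits(content, terms) >= content_min


beijing_terms = ["Beijing", "Haidian"]  # instance names from the region ontology
print(match_region("Beijing milk recall", "", beijing_terms))
print(match_region("Milk recall", "A shop in Beijing was inspected.", beijing_terms))
```

Requiring more evidence from the content than from the title is one way to follow the stated preference for omission over wrong inclusion.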
If pattern matching cannot assign the information to any region, a region judgment model built by machine learning performs a third judgment.
The region judgment models are built in advance as follows. The starting point is a web page information sample set that has been processed (as in steps (1)-(4)) and labeled (whether each sample is associated with a given region); the set is built in advance and updated regularly. The title and content words of each sample (those matching ontology instance names or attributes) are combined and sorted into five categories: administrative place names (provinces, cities, etc.), area codes, postal codes, abbreviations, and landmarks (mountains, lakes, seas, rivers, islands, buildings, etc.), giving five feature vectors. The term weight in each vector is the term frequency; given the importance of title words, the weight of a title word is multiplied by a predetermined factor. A machine learning method (such as support vector machines) is then used to build, for each target region, region judgment models from the five feature vectors (five models, updated regularly as the sample set is updated).
The third judgment proceeds as follows. For information that has been processed and resolved in steps (1)-(4) but could not be assigned to a region, the title and content words (those matching ontology instance names or attributes) are combined and sorted into the same five categories, giving five vectors with the same weighting (term frequency, with title-word weights multiplied by a predetermined factor). The five vectors are judged by the five region judgment models built above, and the results are combined by weighting, where the weight of each category is the sum of term frequencies in that category divided by the sum of term frequencies over all five categories in the web page information. If the weighted result exceeds a preset threshold, the information is assigned to the region; otherwise it is not (as shown in Figure 5).
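The weighted combination of the per-category model outputs can be sketched as below. The category names, the 0.5 threshold, and the function names are assumptions; each model is represented as a callable returning 1 (belongs to the region) or 0, standing in for a trained classifier such as an SVM.

```python
def weighted_region_score(category_vectors, category_models):
    """Combine per-category judgments, weighting by term-frequency share."""
    totals = {c: sum(vec.values()) for c, vec in category_vectors.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0  # no matching words at all
    return sum(
        (totals[c] / grand_total) * category_models[c](category_vectors[c])
        for c in category_vectors
    )


def belongs_to_region(category_vectors, category_models, threshold=0.5):
    return weighted_region_score(category_vectors, category_models) > threshold


vectors = {"admin": {"Beijing": 3.0}, "abbrev": {"Jing": 1.0}}
models = {"admin": lambda v: 1, "abbrev": lambda v: 0}
print(weighted_region_score(vectors, models))  # 0.75
```

With this weighting, a category that contributes most of the matched words also dominates the final decision.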
4. Object information discovery
Object information discovery for web page information means object type identification: determining which kind of object the content of a web page concerns (and which event factors it relates to, what consequences it caused, and so on). Its purpose is to combine the region information, object information, and other information found in web pages so that an event can be determined relatively uniquely.
To this end, considering accuracy, computational cost, and operability, regression analysis is used. The scope of information is the combined title and content of each web page, which after word segmentation, stop-word removal, and dimensionality reduction forms the feature vector of the web page (the independent variables). The term weight is the term frequency; given the importance of title words, the weight of a title word is multiplied by a predetermined factor. Likewise, the weight of a word matching the names or attributes of object, result, or associated-party instances in the ontology is multiplied by a predetermined factor. For each object type, the feature vector of the web page is substituted into the corresponding logistic regression model (built in advance, for the types that need to be distinguished, from an established sample set), and the regression result determines whether the web page is related to that object type.
The regression analysis model is built as follows. Starting from a web page information sample set that has been processed and labeled (built in advance and updated regularly), the title and content of each sample are combined, and after word segmentation, stop-word removal, and dimensionality reduction a feature vector is formed (the independent variables). The term weight is the term frequency; given the importance of title words, the weight of a title word is multiplied by a predetermined factor; likewise, the weight of a word matching the names or attributes of object, result, or associated-party instances in the ontology is multiplied by a predetermined factor. The object type of each web page is labeled as well (1 means it belongs to the object type, 0 means it does not; this is the dependent variable). On this basis, the logistic method is used to build a regression analysis model for each object type.
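The weighted term-frequency vector and its use in a logistic model can be sketched as follows. The multipliers, coefficients, and function names are hypothetical; the coefficients would in practice come from fitting a logistic regression on the labeled sample set, which is not shown here.

```python
import math
from collections import Counter


def weighted_tf(title_words, content_words, ontology_terms,
                title_mult=2.0, ontology_mult=1.5):
    """Term-frequency vector with boosted title and ontology-matched words."""
    tf = Counter()
    for word in content_words:
        tf[word] += 1.0
    for word in title_words:
        tf[word] += title_mult  # a title occurrence counts title_mult times
    for word in tf:
        if word in ontology_terms:
            tf[word] *= ontology_mult
    return dict(tf)


def relates_to_object(features, coef, intercept, cutoff=0.5):
    """Apply a fitted logistic model (coef and intercept are given)."""
    z = intercept + sum(coef.get(w, 0.0) * v for w, v in features.items())
    return 1.0 / (1.0 + math.exp(-z)) >= cutoff


features = weighted_tf(["milk"], ["milk", "powder"], {"milk"})
print(features)  # {'milk': 4.5, 'powder': 1.0}
print(relates_to_object(features, {"milk": 1.0}, intercept=-2.0))  # True
```

One such model per object type gives an independent yes/no decision for each type, so a page can relate to several object types at once.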
5. Trend tracking, early warning, and presentation
In practice, combining the region information and object type information found in the preceding steps allows an event to be determined fairly accurately (that is, the intersection of the information belonging to these two dimensions represents the information related to the event).
Once the region and object type of the web page information have been identified, characteristic parameters representing the event are established. Specifically, the event is characterized by the number of information pages related to the event, the number of page views, the number of page reposts, the number of page views on specific websites, the number of page views on websites under specific domain names, a composite index (obtained as a weighted combination of the parameters; the weights are determined by the Delphi method and must sum to 1), and so on. The characteristic parameters are computed periodically (for example, every hour), and their changes over time are analyzed together.
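The composite index is a weighted sum whose weights must add up to 1. A minimal sketch, with hypothetical parameter names and weights (in the method, the weights would come from the Delphi method):

```python
def composite_index(params, weights, tol=1e-9):
    """Weighted combination of event parameters; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * params[name] for name in weights)


params = {"pages": 10, "views": 100}
weights = {"pages": 0.5, "views": 0.5}
print(composite_index(params, weights))  # 55.0
```

In practice the parameters would probably be normalized to comparable scales before weighting; the source does not specify this, so it is left out of the sketch.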
On the basis of the event trend tracking above, the value of each characteristic parameter of the event (including the composite index) is computed periodically (for example, every 12 hours), and each current value is compared with its mean over a preceding period (currently one month, chosen in view of how network events spread; this can be adjusted as needed). If the difference is positive and its absolute value exceeds a threshold (for example, three standard deviations; the threshold is set in advance), early-warning initialization is performed for the event.
Events for which early-warning initialization has been performed are then tracked. The value of each characteristic parameter of the event (including the composite index) is computed periodically (for example, every 12 hours), and each current value is compared with its mean over a preceding period (currently the month before early-warning initialization, chosen in view of how network events spread; this can be adjusted as needed). If the difference stays positive with an absolute value above a threshold (for example, three standard deviations; the threshold is set in advance) for a sustained period (for example, 24 hours; determined in advance), a formal early warning is issued for the event (as shown in Figure 6). Otherwise, the early-warning initialization of the event is cancelled.
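The two-stage logic (initialization on the first exceedance, formal warning if the exceedance persists, cancellation otherwise) can be sketched for a single parameter as below. Several choices are assumptions: the population standard deviation is used, the baseline is held fixed, and `persist` counts consecutive exceeding readings including the first (three readings at 12-hour intervals span 24 hours).

```python
from statistics import mean, pstdev


def warning_state(baseline, readings, k=3.0, persist=3):
    """Track one parameter and return 'none', 'initialized', or 'formal'."""
    threshold = mean(baseline) + k * pstdev(baseline)
    state, streak = "none", 0
    for value in readings:
        if value > threshold:
            streak += 1
            if streak >= persist:
                return "formal"  # exceedance persisted long enough
            state = "initialized"
        else:
            # Exceedance did not persist: cancel the initialization
            state, streak = "none", 0
    return state


baseline = [10, 12, 11, 9, 10, 11, 10, 9]  # e.g. the previous month's values
print(warning_state(baseline, [20, 21]))      # initialized
print(warning_state(baseline, [20, 21, 22]))  # formal
print(warning_state(baseline, [20, 10]))      # none
```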
The threshold is determined as follows. Historical change data (for example, over one year) are collected for each characteristic parameter of events and combined with data on confirmed historical food safety events, such as time of occurrence, region, and scale (obtainable from the food safety administrative departments). The difference between each characteristic parameter value of an event and its mean over a preceding period (for example, one month) forms the independent variable; a variable indicating whether a food safety event of a particular nature occurred (1 means it occurred, 0 means it did not) is the dependent variable. Logistic regression analysis is used to build a regression prediction model between the independent and dependent variables. Based on this model and the historical trend characteristics of the event parameters, a suitable value of the independent variable for which the dependent variable is 1 is selected as the threshold.
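One possible reading of "a value of the independent variable for which the dependent variable is 1" is the point where the fitted logistic model's predicted probability reaches a chosen cutoff. The sketch below assumes a single-variable model with already fitted `intercept` and `slope` (both hypothetical inputs); it is an interpretation, not the patent's stated procedure.

```python
import math


def threshold_from_logistic(intercept, slope, cutoff=0.5):
    """Smallest difference d with sigmoid(intercept + slope * d) >= cutoff."""
    if slope <= 0:
        raise ValueError("expected a positive slope: larger differences "
                         "should make an event more likely")
    logit = math.log(cutoff / (1.0 - cutoff))
    return (logit - intercept) / slope


print(threshold_from_logistic(-6.0, 2.0))  # 3.0
```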
After the obtained web page information has undergone information filtering, event information discovery, trend tracking, and early-warning analysis, the results of the analysis are presented to users as tables, charts, and the like. For early-warning information, instant delivery services such as SMS and e-mail are also provided.
6. Event end judgment
For events under formal early warning, on the basis of the event trend tracking above, the value of each characteristic parameter of the event (including the composite index) is computed periodically (for example, every 12 hours), and each current value is compared with its mean over a preceding period (currently the period from the day the early warning started to the day before the calculation, chosen in view of how network events spread; this can be adjusted as needed). If the difference is negative and its absolute value exceeds a threshold (for example, three standard deviations; the threshold is set in advance), the event is considered to have ended, and the early warning for the event is terminated.
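The end-of-event test mirrors the warning test with the sign reversed. A minimal sketch for a single parameter, assuming the population standard deviation and a hypothetical function name:

```python
from statistics import mean, pstdev


def event_ended(current, since_warning, k=3.0):
    """True if `current` fell more than k standard deviations below the
    mean of the values recorded since the early warning started."""
    return mean(since_warning) - current > k * pstdev(since_warning)


values = [100, 102, 98, 101, 99]  # readings since the warning began
print(event_ended(90, values))  # True
print(event_ended(99, values))  # False
```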
7. Ontology supplementation and revision
Throughout event information discovery and early-warning analysis, the constructed food safety event information ontology has an important influence on the performance of steps such as information filtering and information discovery. Therefore, given that the distribution of Internet information changes over time, and in order to keep improving the effectiveness of the method, the results of information filtering, information discovery, and related processes need to be evaluated regularly, and deficiencies in the ontology, such as omissions and errors, need to be supplemented and corrected to improve the subsequent effectiveness of the method.
In this way, the whole process is realized: extracting food safety event information from crawled Internet information, issuing timely early warnings as events evolve, and serving users. Along the way, information filtering, region information discovery, object type information discovery, trend tracking, and early warning ensure the accuracy of event information discovery and early warning. This provides an important information basis for risk early warning and rapid emergency response for food safety events.
It is worth noting that the invention is not limited to the emergency management of food safety events; with slight modification it can be applied to risk early warning and emergency response for other unconventional emergencies whose event information can be obtained from the Internet.

Claims (11)

1. An event early-warning method based on region and object information discovery, comprising the steps of:
1) establishing a food safety event information ontology, and establishing a supplementary table for each instance in the ontology;
2) filtering the crawled web page information to obtain non-spam web page information related to food safety events;
3) resolving the words denoting places in the filtered web page information to obtain exact place-name words; processing the resolved web page information by pattern matching based on the instance names and attributes of the region dimension of the food safety event information ontology, and assigning the web page information to the successfully matched region;
4) for each set object type, processing the web page information with a regression analysis model to judge the object type to which each web page relates; wherein the web page information is processed with the regression analysis model by: combining the title and content of each web page, forming the feature vector of the web page after word segmentation, stop-word removal, and dimensionality reduction, and processing the web page with its feature vector as the independent variable of the regression analysis model to judge whether it is related to the object type;
5) according to the region and related object type of each web page determined in steps 3) and 4), obtaining the set of web page information for an event of a set region and object, establishing characteristic parameters of the event and periodically computing their values, and issuing an early warning for an event if its characteristic parameter value exceeds a set threshold continuously for a set time.
2. The method of claim 1, characterized in that the words denoting places in the web page information are resolved by:
1) for a place-name pronoun, judging with a judgment model whether a reference relation exists between the place-name pronoun and a geographic noun appearing before it, and if so, replacing the place-name pronoun with the corresponding geographic noun;
2) resolving non-standard place-name words based on a lookup table of standard and non-standard words, replacing each non-standard word with the standard word;
3) resolving the relative-position region information in the words based on the region dimension of the food safety event information ontology, to obtain exact place-name words;
wherein the judgment model is built by: forming a sample set from web page information containing place-name pronouns, and labeling the reference relation between each place-name pronoun in the sample set and the geographic nouns before it, as the class variable; building feature vectors of the relation between each place-name pronoun and the geographic nouns before it; and then using a machine learning method to build, from the sample set, class variable, and feature vectors, the judgment model that decides whether a reference relation exists between a geographic noun and a place-name pronoun;
wherein whether a reference relation exists between a place-name pronoun and a geographic noun appearing before it is judged by: computing the feature vector of the relation between the place-name pronoun and the geographic noun, and judging the feature vector with the judgment model to determine whether the reference relation between the place-name pronoun and the geographic noun exists.
3. The method of claim 1 or 2, characterized in that the food safety event information ontology comprises five dimensions: object, region, time, result, and associated party; and the content of the supplementary table comprises three dimensions: synonyms, antonyms, and alias words;
wherein, for the region dimension, the content of the supplementary table further comprises six dimensions: area code, postal code, abbreviation, landmarks, adjacent regions, and orientation of the location.
4. The method of claim 3, characterized in that, in step 3), before the words denoting places in the filtered web page information are resolved, the title and body content are segmented with a word segmenter, and for each resulting word there are recorded its position relative to the start and end of the text formed by the title and body content, the sentence it belongs to, and its position relative to the start and end of that sentence.
5. The method of claim 4, characterized in that an ambiguous place-name list is first established, recording place names that can also serve as other names; the words obtained by segmentation in step 3) are then matched against the ambiguous place-name list, and the matched words are filtered out; wherein a matched word that has a suffix denoting a place name is retained.
6. The method of claim 2, characterized in that the place-name pronouns denoting places in the web page information are resolved by:
61) establishing a sliding window of length L for pronoun resolution;
62) checking whether a geographic noun exists within the L words before the place-name pronoun; if so, judging with the judgment model, and if a reference relation exists, determining the geographic noun corresponding to the pronoun from the reference relation and ending the resolution; otherwise proceeding to step 63);
63) checking whether a geographic noun exists within the 2L words before the place-name pronoun; if so, judging with the judgment model, and if a reference relation exists, determining the geographic noun corresponding to the pronoun from the reference relation and ending the resolution; otherwise proceeding to step 64);
64) determining the place name to which the place-name pronoun refers by extraction or substitution, according to the information source or website location obtained during metadata extraction.
7. The method of claim 2 or 6, characterized in that the components of the sample feature vector in the judgment model comprise: the suffix length of the geographic noun, the distance between the geographic noun and the place-name pronoun, the relative distance of the geographic noun from the start of the text, the relative distance of the place-name pronoun from the start of the text, the relative distance of the geographic noun from the start of the sentence, the relative distance of the place-name pronoun from the start of the sentence, the relative distance of the geographic noun from the end of the sentence, and the relative distance of the place-name pronoun from the end of the sentence.
8. The method of claim 6, characterized in that, in step 62), if several geographic nouns satisfying the reference relation exist within the L words before the place-name pronoun, the geographic noun nearest to the place-name pronoun is selected; and in step 64), if several geographic nouns satisfying the reference relation exist within the 2L words before the place-name pronoun, the geographic noun nearest to the place-name pronoun is selected.
9. the method for claim 1, it is characterised in that the characteristic parameter of described event includes: the information relevant to event Page number, page browsing number, the page forward number, the page browsing number setting website, set under domain name website page browsing number with And the composite index of above-mentioned parameter.
10. method as claimed in claim 1 or 2, it is characterised in that periodically calculate the numerical value of described characteristic parameter, and by event Current each characteristic ginseng value and the average in its regular period before compare, if difference is just and absolute value is more than one Fixed threshold value, it is determined that part carries out early warning initialization as to this;To having carried out the event of early warning Initialize installation, continue periodically meter Calculate the numerical value of described characteristic parameter, and the average in the regular period before each characteristic ginseng value current for event and its is compared Relatively, if difference is just and absolute value is persistently more than certain threshold value, part carries out formal early warning the most as to this, otherwise cancels this The early warning Initialize installation of event.
11. The method according to claim 10, characterized in that, for an event under formal early warning, the value of each characteristic parameter representing the event is calculated periodically, and each current characteristic parameter value of the event is compared with its mean over a preceding period; if the difference is negative and its absolute value exceeds a certain threshold, the early warning for the event is ended.
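The early-warning life cycle described in claims 10 and 11 can be read as a small state machine. The following Python sketch is illustrative only and is not taken from the patent: it tracks a single characteristic parameter, uses a rolling window as the "preceding period", and interprets "persistently" as a configurable number of consecutive periods. The names `WarningState`, `next_state`, `run`, `window` and `required_streak` are all hypothetical.

```python
from enum import Enum
from statistics import mean


class WarningState(Enum):
    NONE = "none"                # no warning activity
    INITIALIZED = "initialized"  # early-warning initialization set (claim 10)
    FORMAL = "formal"            # formal early warning in force (claims 10, 11)


def next_state(state, history, current, threshold, streak, required_streak):
    """Advance the warning state by one period for one parameter.

    history: non-empty list of values from the preceding period.
    streak: consecutive periods in which the rise exceeded the threshold.
    Returns (new_state, new_streak).
    """
    diff = current - mean(history)
    rising = diff > threshold     # difference positive, |diff| > threshold
    falling = diff < -threshold   # difference negative, |diff| > threshold

    if state is WarningState.NONE:
        return (WarningState.INITIALIZED, 1) if rising else (state, 0)
    if state is WarningState.INITIALIZED:
        if not rising:
            return WarningState.NONE, 0  # cancel the initialization
        streak += 1
        if streak >= required_streak:    # assumed meaning of "persistently"
            return WarningState.FORMAL, streak
        return state, streak
    # FORMAL: end the warning only on a sufficiently large drop
    if falling:
        return WarningState.NONE, 0
    return state, streak


def run(values, window, threshold, required_streak):
    """Apply next_state over a series, using a rolling window as baseline."""
    state, streak = WarningState.NONE, 0
    states = []
    for i in range(window, len(values)):
        state, streak = next_state(
            state, values[i - window:i], values[i],
            threshold, streak, required_streak,
        )
        states.append(state)
    return states
```

For example, `run([10, 10, 10, 30, 60, 60, 20], window=3, threshold=10.0, required_streak=2)` moves from initialization to a formal warning and then ends the warning when the value drops well below the recent mean. A real system would presumably evaluate several characteristic parameters together; how they are combined is not specified in these claims.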
CN201210501970.0A 2012-11-29 2012-11-29 A kind of event method for early warning found based on region and object information Active CN103853700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210501970.0A CN103853700B (en) 2012-11-29 2012-11-29 A kind of event method for early warning found based on region and object information

Publications (2)

Publication Number Publication Date
CN103853700A CN103853700A (en) 2014-06-11
CN103853700B true CN103853700B (en) 2016-09-07

Family

ID=50861368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210501970.0A Active CN103853700B (en) 2012-11-29 2012-11-29 A kind of event method for early warning found based on region and object information

Country Status (1)

Country Link
CN (1) CN103853700B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068989B (en) * 2015-07-23 2018-05-04 中国测绘科学研究院 Place name address extraction method and device
CN106547913B (en) * 2016-11-25 2020-04-21 网易(杭州)网络有限公司 Page information collection and classification feedback method, device and system
CN107103052B (en) * 2017-01-23 2020-09-15 南威软件股份有限公司 Big data early warning method based on data model
CN107357888B (en) * 2017-07-10 2021-06-15 北京星选科技有限公司 Method and device for providing raw material information and electronic equipment
CN110727793B (en) * 2018-06-28 2023-03-24 百度在线网络技术(北京)有限公司 Method, device, terminal and computer readable storage medium for area identification
CN109558966B (en) * 2018-10-28 2022-05-17 西南电子技术研究所(中国电子科技集团公司第十研究所) Processing system for intelligently judging evidence and predicting occurrence of event
CN110457562A (en) * 2019-08-15 2019-11-15 中国农业大学 A kind of food safety affair classification method and device based on neural network model
CN110688557A (en) * 2019-09-23 2020-01-14 中国农业大学 Food safety event-oriented early warning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488150A (en) * 2009-03-04 2009-07-22 哈尔滨工程大学 Real-time multi-view network focus event analysis apparatus and analysis method
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
CN102193951A (en) * 2010-03-19 2011-09-21 华为技术有限公司 Information extracting method and system
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5390840B2 (en) * 2008-11-27 2014-01-15 株式会社日立製作所 Information analyzer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
突发公共事件网络舆情监测指标体系研究;谈国新等;《华中师范大学学报(人文社会科学版)》;20100531;第49卷(第3期);第66-69页 *

Also Published As

Publication number Publication date
CN103853700A (en) 2014-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant