CN104572958A - Event extraction based sensitive information monitoring method - Google Patents
- Publication number
- CN104572958A (application CN201410849418.XA)
- Authority
- CN
- China
- Prior art keywords
- event
- word
- sentence
- dictionary
- role
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a sensitive information monitoring method based on event extraction. The method comprises: 1) building a trigger word dictionary and an event element role dictionary; 2) training on an annotated training corpus with a machine learning method to obtain a maximum entropy model MT for determining the event category and a maximum entropy model MR for extracting event elements from event sentences; 3) filtering the corpus from which events are to be extracted by trigger word, and treating sentences that match a set trigger word as candidate events; 4) classifying the candidate events with the maximum entropy model MT to obtain the event sentences that belong to the set event categories; 5) extracting each element word of the event from the event sentences obtained in step 4) according to the event element role dictionary and the maximum entropy model MR, which completes the event extraction; 6) matching the extracted event against the monitored event and, if the match succeeds, determining that the extracted event is sensitive information. The method greatly improves the efficiency of sensitive information monitoring.
Description
Technical field
The invention belongs to the field of information technology and relates to a sensitive information monitoring method based on event extraction. It is mainly applicable to natural language processing, data mining, information retrieval, food safety, and similar fields.
Background art
With the rapid spread and development of the Internet, massive amounts of data are produced and propagated over the network, and the total volume of information is growing exponentially. This information is characterized by large volume, inconsistent structure, and high redundancy. Traditional ways of acquiring information can no longer meet demand, and quickly selecting the information one is interested in from this vast ocean of data has become an urgent problem. Research on information extraction arose against this background.
The goal of information extraction is to use natural language processing to identify and extract the information people are interested in from documents, converting unstructured text into structured or semi-structured information for user queries and further analysis. Event extraction is an important research direction within information extraction: it presents the events in a text that people are interested in in a structured form.
An event is something that occurs within a specific time span and geographic area, involves one or more participating roles, and consists of one or more actions. Current research on event extraction follows two main approaches: pattern matching and machine learning. Pattern matching is close to the way people think. It focuses on defining event patterns and extracts events through predefined extraction templates; its precision is relatively high and its knowledge representation is intuitive and natural. However, it depends on the specific domain and text format, so portability is poor; writing rules by hand requires considerable expertise, is time-consuming, and can hardly cover every case. Machine learning methods are more flexible, need little manual intervention or domain knowledge, and achieve higher recall, but they rely heavily on the corpus, and an unsuitable corpus can degrade the extraction results.
Current machine learning methods are mainly based on statistical models. Commonly used models include the Hidden Markov Model (HMM), the Maximum Entropy Model (ME), the Support Vector Machine (SVM), and the Conditional Random Field (CRF). These methods suffer from data sparsity caused by the corpus itself, and the complexity of feature selection and of the Chinese language also affects the final extraction results. In practice, non-event information is often mistakenly extracted as event information, and the extracted event elements are often incomplete.
Summary of the invention
The object of the invention is to propose a sensitive information monitoring method based on event extraction that can be applied to sensitive information monitoring in fields such as food safety and information retrieval.
The invention first determines the event category. Trigger words are used for a preliminary judgment: a trigger word directly signals the occurrence of an event and is the key feature for determining the event category, so a sentence that matches a trigger word is called a candidate event. The candidate events then undergo multi-class classification with a maximum entropy model, and those whose predicted probability meets a threshold are accepted as real events. Next, the elements of each event are identified: each event element in the sentence is extracted using named entity recognition, syntactic analysis, and a maximum entropy model, which completes the event extraction. The method comprises:
Step 1: Corpus preprocessing. Taking the food safety field as an example, collect text related to food safety and annotate the collected training corpus for the set field.
Step 2: Build the trigger word dictionary. Each line of the dictionary contains a trigger word and the event category it corresponds to.
Step 3: Build the event element role dictionary. Each line of this dictionary contains a word that appears in an event and the event role it corresponds to. For example:
2014.12.25: event time;
State General Administration for Quality Supervision: event initiator;
The dictionary is named the event element role dictionary.
Step 4: Using the annotated training corpus, train a model with a machine learning method to obtain the maximum entropy model MT, which determines the event category.
Step 5: For each word in the event sentences of the annotated training corpus, extract word and sentence features and train the maximum entropy model MR, which extracts event elements from event sentences.
Step 6: For the unannotated corpus from which events are to be extracted, determine whether each sentence is a candidate event by matching trigger words.
Step 7: Apply the maximum entropy model MT to each candidate event for a further judgment, obtaining the real event sentences, that is, the event sentences that belong to the set event categories.
Step 8: For each real event sentence, apply the maximum entropy model MR to extract each element word of the event, completing the event extraction task.
Step 9: Match the extracted event against the event to be monitored. If all event elements are identical, the two are the same event, and the extracted event is judged to be sensitive information.
Step 2 specifically comprises:
Step 201: Manually compile a seed trigger word dictionary. Each line contains a seed trigger word and its corresponding event category, and the event categories cover all categories that need to be predicted.
Step 202: For each seed trigger word T, obtain all of its synonyms and near-synonyms by looking it up in a thesaurus, and expand the seed trigger word dictionary according to a set rule.
Step 203: Process each seed trigger word in turn until all have been traversed, which completes the trigger word dictionary.
Step 3 specifically comprises:
Step 301: Manually compile a seed event element role dictionary. Each line contains an element word from an event and its corresponding role category. The role categories in the dictionary cover all role categories found in common events.
Step 302: For each element word R, obtain all of its synonyms and near-synonyms by looking it up in a thesaurus, and expand the dictionary according to a set rule.
Step 303: Process each seed event element role word in turn until all have been traversed, which completes the event element role dictionary.
Step 4 specifically comprises:
Step 401: Read the training corpus and split each document into paragraphs and sentences.
Step 402: For each sentence, use the annotation tags to determine whether it contains event information.
Step 403: For each sentence that contains event information, perform word segmentation and part-of-speech tagging, and obtain the trigger word and event type of the event. Sentences that contain no event information have no event features and are discarded.
Step 404: After word segmentation of the sentences that contain event information, extract the features of the event. The selected features include the trigger word, the part of speech of the trigger word, the word forms and parts of speech of the P words before the trigger word, the word forms and parts of speech of the P words after the trigger word, and the event category.
Step 405: Write the features of all events to an input file in a unified format, and train with the machine learning method to obtain the maximum entropy model MT.
Step 5 specifically comprises:
Step 501: For each element word in an event, extract the basic features of the element word and its context features.
Step 502: Write the features of all element words in the event sentences to an input file in a unified format, and train with the machine learning method to obtain the maximum entropy model MR.
Step 6 specifically comprises:
Step 601: Read the corpus from which events are to be extracted and split it into paragraphs and sentences.
Step 602: Segment each sentence into words and check whether the words include a trigger word. If they do, classify the sentence as a candidate event sentence.
Step 7 specifically comprises:
Step 701: For each candidate event sentence, obtain the part of speech of each word after segmentation. Extract the features of the candidate event sentence; the specific features are as described in step 404.
Step 702: Write all extracted features to an input file in a unified format and predict with the maximum entropy model MT from step 4. Compare the predicted probability with the set threshold; a candidate event whose probability exceeds the threshold is classified as a real event.
Step 8 specifically comprises:
Step 801: Perform word segmentation, part-of-speech tagging, named entity recognition, and syntactic analysis on each real event sentence.
Step 802: Determine whether each segmented word appears in the event element role dictionary, and mark the event role feature accordingly.
Step 803: Extract the features of the words in the event sentence, including each word's basic features and context features. Write them to a file in a unified format, predict with the maximum entropy model MR, and for each role category select the word with the highest predicted probability as the final event element.
Step 804: Process each event sentence in turn, which completes the event extraction task.
Compared with the prior art, the positive effects of the invention are:
Existing methods often mistakenly extract non-events as events, and the event elements they extract are often incomplete. Relying on a large training corpus, the invention builds a trigger word dictionary and an event element role dictionary and trains the event extraction models with a machine learning method, which ensures the accuracy and completeness of the features and effectively addresses both problems. A program implementing this method was tested on a corpus from the food safety field. The results show that the extracted event categories are fairly accurate and the element information of each event is fairly complete, which substantially improves the efficiency of sensitive information monitoring.
Brief description of the drawings
Fig. 1 is a flowchart of corpus preprocessing and trigger word dictionary construction.
Fig. 2 is a flowchart of element role dictionary construction.
Fig. 3 is a flowchart of extracting features from the training corpus and using machine learning to generate the maximum entropy model for event category determination.
Fig. 4 is a flowchart of extracting features of the words in events and using machine learning to generate the maximum entropy model for event element role identification.
Fig. 5 is a flowchart of identifying candidate sentences in the corpus to be processed and screening real event sentences with a maximum entropy model.
Fig. 6 is a flowchart of using a maximum entropy model to obtain each event role word in an event sentence.
Detailed description
The method is described in detail below with reference to the accompanying drawings.
Fig. 1 shows corpus preprocessing and trigger word dictionary construction. The specific method comprises:
Step 1: Corpus preprocessing. Manually collect event text related to food safety and annotate the collected training corpus. Each sentence in the corpus is annotated with tags that mark the event, the trigger word in the event, the event type, and the role of each event element.
The quality and scale of the corpus greatly affect the results of machine learning. The corpus used in this method consists of manually collected and screened text that highlights representative events and covers all event types to be processed. Because the corpus is tagged, the program can identify during processing whether a sentence contains event information, as well as the event type and the role of each element in the event.
Step 2: Build the trigger word dictionary. Each line of the dictionary contains a trigger word and the event category it corresponds to.
The event trigger word is the key feature for determining the event category and clearly expresses the occurrence of an event. Event type recognition can therefore be converted into the task of identifying the trigger word's category. Trigger words also play an important part in the subsequent identification of event element roles.
The specific implementation of this step comprises:
Step 201: Manually compile a seed trigger word dictionary. Each line contains a seed trigger word and its corresponding event category, and the event categories cover all categories that need to be predicted.
Step 202: For each seed trigger word T, obtain all of its synonyms and near-synonyms by looking it up in the Chinese Concept Dictionary of the Institute of Computational Linguistics, Peking University. Determine whether M or more of these words are already in the seed trigger word dictionary with the same event category as T. If so, add all of the synonyms and near-synonyms to the seed trigger word dictionary with the same event category as T.
Step 203: Process each seed trigger word in turn until all have been traversed, which completes the trigger word dictionary.
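Steps 201 to 203 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the thesaurus is modeled as a plain mapping from a word to its synonyms and near-synonyms, the category names are invented, and "M or more" is read as `support >= m`, which is an assumption about the translated text.

```python
from typing import Dict, List


def expand_trigger_dictionary(
    seed: Dict[str, str],
    thesaurus: Dict[str, List[str]],
    m: int,
) -> Dict[str, str]:
    """Return a copy of the seed dictionary expanded with synonyms (steps 202-203)."""
    expanded = dict(seed)
    for trigger, category in seed.items():
        synonyms = thesaurus.get(trigger, [])
        # Count synonyms that are already seed triggers of the same category.
        support = sum(1 for word in synonyms if seed.get(word) == category)
        if support >= m:
            for word in synonyms:
                # Existing entries are kept; new words inherit T's category.
                expanded.setdefault(word, category)
    return expanded


# Hypothetical seed entries and thesaurus, for illustration only.
seed = {"recall": "ProductRecall", "withdraw": "ProductRecall", "poison": "FoodPoisoning"}
thesaurus = {"recall": ["withdraw", "pull back"], "poison": ["contaminate"]}
trigger_dict = expand_trigger_dictionary(seed, thesaurus, m=1)
```

With these inputs, "pull back" is added because one synonym of "recall" already appears in the seed dictionary under the same category, while "contaminate" is not added because "poison" has no such supporting synonym. The same routine applies to the event element role dictionary of step 3, with role categories in place of event categories and N in place of M.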
Fig. 2 shows the construction of the element role dictionary. The specific steps are:
Step 3: Build the event element role dictionary.
Each event contains event role information, which generally includes the time and place of the event, the event participants (the initiator and the recipient of the event), and a description of the event's outcome. Event roles are mostly filled by entity words, and together the roles make up the overall information of the event. The task of extracting event elements is thus converted into identifying the role of each element in the event.
The specific implementation of this step comprises:
Step 301: Manually compile a seed element role dictionary. Each line contains an element word from an event and its corresponding role category, and the role categories in the dictionary cover all role categories found in common events.
Step 302: For each element word R, obtain all of its synonyms and near-synonyms by looking it up in the Chinese Concept Dictionary of the Institute of Computational Linguistics, Peking University. Determine whether N or more of these words are already in the element role dictionary with the same role category as R. If so, add all of the synonyms and near-synonyms to the role dictionary with the same role category as R.
Step 303: Process each seed element word in turn until all have been traversed, which completes the event element role dictionary.
Fig. 3 shows how features are extracted from the training corpus and machine learning is used to generate the maximum entropy model for identifying the event category. The specific steps are:
Step 4: Using the annotated training corpus, train a model with a machine learning method to obtain the maximum entropy model MT, which determines the event category. The maximum entropy model is based on the principle of maximum entropy: when predicting the probability distribution of a random event, the prediction should satisfy all known constraints and make no subjective assumptions about what is unknown. Under these conditions the probability distribution is the most uniform and the risk of the prediction is the lowest. A notable property of the maximum entropy model is that it does not require the features to be mutually independent, so any feature that helps the final classification can be added. This method involves a large number of features from the words themselves and from their context, and the dimensions of the features are not fully consistent, so the maximum entropy method is used for model training and prediction.
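For reference, a conditional maximum entropy model is usually written in the standard form below. The notation is not taken from the patent and is only an assumption for illustration: x is the feature context of a sentence or word, y is a candidate category, the f_i are feature functions, and the λ_i are the weights learned in training.

```latex
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_{i} \lambda_i f_i(x, y)\Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big(\sum_{i} \lambda_i f_i(x, y')\Big)
```

Because the model only sums weighted feature functions in the exponent, overlapping or correlated features can be added freely, which matches the property described above.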
The specific implementation of this step comprises:
Step 401: Read the training corpus and split each document into paragraphs and sentences.
Step 402: For each sentence, use the annotation tags to determine whether it contains event information.
Step 403: For each sentence that contains event information, perform word segmentation and part-of-speech tagging, and obtain the trigger word and event type of the event.
Step 404: After word segmentation of the sentences that contain event information, extract the features of the event. The selected features include the trigger word, the part of speech of the trigger word, the word forms and parts of speech of the P words before the trigger word, the word forms and parts of speech of the P words after the trigger word, and the event category.
Step 405: Write the features of all events to an input file in a unified format, and train with the machine learning method to obtain the maximum entropy model MT.
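The window features of step 404 can be sketched as below. This is a minimal sketch under stated assumptions: the sentence is already segmented and tagged as (word, part-of-speech) pairs, the feature names and the `<PAD>` placeholder are invented for illustration, and the event category is treated as the training label rather than as an input feature.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag)


def trigger_features(tokens: List[Token], trigger_index: int, p: int) -> Dict[str, str]:
    """Collect the trigger word, its tag, and the P words on each side of it."""
    word, pos = tokens[trigger_index]
    features = {"trigger": word, "trigger_pos": pos}
    for offset in range(1, p + 1):
        for sign, label in ((-1, "prev"), (1, "next")):
            i = trigger_index + sign * offset
            # Pad with a placeholder when the window runs past the sentence.
            w, t = tokens[i] if 0 <= i < len(tokens) else ("<PAD>", "<PAD>")
            features[f"{label}{offset}_word"] = w
            features[f"{label}{offset}_pos"] = t
    return features


# Hypothetical tagged sentence; the tags are illustrative.
tokens = [("authority", "n"), ("recalled", "v"), ("milk", "n"), ("powder", "n")]
feats = trigger_features(tokens, trigger_index=1, p=2)
```

Here `feats` holds ten entries: the trigger and its tag, plus a word and a tag for each of the two positions on either side, with the position two words before the trigger padded because the sentence starts earlier.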
Fig. 4 shows how features are extracted for each word in an event sentence and machine learning is used to generate the maximum entropy model for identifying the role of each element in an event. The specific steps are:
Step 5: For each word in the event sentences, extract word and sentence features and train the maximum entropy model MR, which extracts event elements from event sentences. The training corpus should cover all roles of event elements. Each word in an event corresponds to one role, so identifying a word's role is ultimately a multi-class classification task over words.
The specific implementation of this step comprises:
Step 501: For each element word in an event, extract element features. The specific features include the word form, part of speech, named entity recognition result, and role type of the element word; the word forms and parts of speech of the Q words before and after the element word, and the role types of any event element words among them; the chain of syntactic dependency relations between the word and the trigger word; and the event type of the event the word belongs to.
Step 502: Write the features of all element words in the event sentences to an input file in a unified format, and train with the machine learning method to obtain the maximum entropy model MR.
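A rough sketch of steps 501 and 502 follows. Everything named here is an assumption for illustration: the `Word` record, the `NONE` value for words missing from the role dictionary, the dependency chain passed in as a ready-made string from an external parser, and the `label key=value ...` line layout of the unified-format file, which the patent does not specify.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    text: str
    pos: str  # part-of-speech tag
    ner: str  # named-entity tag, "O" when there is none


def element_features(
    words: List[Word],
    index: int,
    q: int,
    role_dict: Dict[str, str],
    dep_path: str,
    event_type: str,
) -> Dict[str, str]:
    """Features of one word: its own attributes plus a window of Q neighbors."""
    w = words[index]
    feats = {
        "word": w.text,
        "pos": w.pos,
        "ner": w.ner,
        "dict_role": role_dict.get(w.text, "NONE"),
        "dep_path": dep_path,
        "event_type": event_type,
    }
    for offset in range(-q, q + 1):
        i = index + offset
        if offset == 0 or not 0 <= i < len(words):
            continue
        n = words[i]
        feats[f"w{offset:+d}"] = n.text
        feats[f"pos{offset:+d}"] = n.pos
        feats[f"role{offset:+d}"] = role_dict.get(n.text, "NONE")
    return feats


def to_training_line(label: str, feats: Dict[str, str]) -> str:
    """Serialize one example as 'label key=value ...' with keys in sorted order."""
    return label + " " + " ".join(f"{k}={v}" for k, v in sorted(feats.items()))


words = [Word("regulator", "n", "ORG"), Word("recalled", "v", "O"), Word("milk", "n", "O")]
role_dict = {"regulator": "initiator"}
feats = element_features(words, 0, 1, role_dict, "nsubj", "ProductRecall")
line = to_training_line("initiator", feats)
```

Sorting the keys keeps the file layout stable across examples. Values that contain spaces would break this simple line format, so a real implementation would need escaping or a different separator.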
Fig. 5 shows event extraction on a new corpus: after preprocessing, candidate events are identified by trigger word, and the maximum entropy model MT then screens the real events from the candidates. The specific steps are:
Step 6: Split the corpus from which events are to be extracted into paragraphs, sentences, and words. For the words in each sentence, check whether any appears in the trigger word dictionary. If one does, classify the sentence as a candidate event; otherwise discard the sentence.
The specific implementation of this step comprises:
Step 601: Read the corpus from which events are to be extracted and split it into paragraphs and sentences.
Step 602: Segment each sentence into words and check whether the words include a trigger word. If they do, classify the sentence as a candidate event sentence.
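Steps 601 and 602 amount to a dictionary filter over segmented sentences, which might look like the sketch below. The sentence splitter is deliberately naive (it would also split a date such as 2014.12.25), and whitespace splitting stands in for a Chinese word segmenter, which would be passed in through the hypothetical `tokenize` parameter.

```python
import re
from typing import Callable, Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-final punctuation (Chinese or Western)."""
    parts = re.split(r"(?<=[。！？!?.])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def find_candidates(
    text: str,
    trigger_dict: Dict[str, str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[Tuple[str, str, str]]:
    """Return (sentence, trigger, category) for each sentence containing a trigger."""
    candidates = []
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            token = token.strip("。！？!?.,")  # drop punctuation glued to a token
            if token in trigger_dict:
                candidates.append((sentence, token, trigger_dict[token]))
                break  # one trigger is enough to keep the sentence
    return candidates


text = "The regulator recalled the milk powder. The weather was fine."
candidates = find_candidates(text, {"recalled": "ProductRecall"})
```

Only the first sentence is kept as a candidate; the second contains no trigger word and is discarded, as step 6 describes.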
Step 7: Perform multi-class classification of each candidate event with the maximum entropy model MT. The model outputs the probability that the current candidate event belongs to each event category; the probability is compared with a preset threshold, and if it exceeds the threshold the event is assigned to the corresponding category.
The specific implementation of this step comprises:
Step 701: For each candidate event sentence, obtain the part of speech of each word after segmentation. Extract the features of the candidate event; the specific features are as described in step 404.
Step 702: For all candidate event sentences, write the features to an input file in a unified format and predict with the maximum entropy model MT from step 4. Compare the predicted probability with the set threshold; a candidate event whose probability exceeds the threshold is classified as a real event.
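The threshold test of step 702 can be sketched as follows, assuming the model's output for one candidate is available as a mapping from category to probability. The function name, the category names, and the threshold value are illustrative, not from the patent.

```python
from typing import Dict, Optional


def classify_candidate(probs: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable category if it clears the threshold, else None."""
    if not probs:
        return None
    best = max(probs, key=probs.get)
    # A candidate whose best probability does not exceed the threshold is rejected.
    return best if probs[best] > threshold else None


accepted = classify_candidate({"ProductRecall": 0.82, "FoodPoisoning": 0.18}, threshold=0.6)
rejected = classify_candidate({"ProductRecall": 0.5, "FoodPoisoning": 0.5}, threshold=0.6)
```

Returning `None` for a rejected candidate lets the caller drop sentences that matched a trigger word but are not real events, which is the false-positive case the background section describes.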
Fig. 6 shows how features are extracted from the identified event sentences, including the basic features and context features of the words in each sentence. The maximum entropy model MR determines the event role category of each word, and for each role category the word with the highest predicted probability is selected as the final element word of the event. The specific steps are:
Step 8: For each real event sentence, apply the maximum entropy model MR to extract each element word of the event, completing the event extraction task.
The specific implementation of this step comprises:
Step 801: Perform word segmentation, part-of-speech tagging, named entity recognition, and syntactic analysis on each real event sentence.
Step 802: Determine whether each segmented word appears in the event element role dictionary, and mark the event role feature accordingly.
Step 803: Extract the features of the words in the event sentence; the specific features are as described in step 501. Write them to a file in a unified format, predict with the maximum entropy model MR, and for each role category select the word with the highest predicted probability as the final event element.
Step 804: Process each event sentence in turn, which completes the event extraction task.
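The selection rule of step 803 can be sketched as follows, assuming MR's predictions for one event sentence are available as a mapping from each word to its per-role probabilities. The `NONE` label for words that fill no role is a hypothetical convention, not something the patent names.

```python
from typing import Dict, Tuple


def select_elements(
    predictions: Dict[str, Dict[str, float]],
    ignore_role: str = "NONE",
) -> Dict[str, str]:
    """For each role, keep the word with the highest predicted probability."""
    best: Dict[str, Tuple[str, float]] = {}
    for word, role_probs in predictions.items():
        for role, prob in role_probs.items():
            if role == ignore_role:
                continue  # words that fill no role are not event elements
            if role not in best or prob > best[role][1]:
                best[role] = (word, prob)
    return {role: word for role, (word, _) in best.items()}


# Hypothetical model output for three words of one event sentence.
predictions = {
    "2014.12.25": {"time": 0.90, "initiator": 0.05, "NONE": 0.05},
    "regulator": {"time": 0.10, "initiator": 0.80, "NONE": 0.10},
    "the": {"time": 0.01, "initiator": 0.02, "NONE": 0.97},
}
event = select_elements(predictions)
```

Each role ends up with exactly one word, so an event with several participants in the same role would lose all but the most probable one under this rule.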
Step 9: Match the event extracted from the monitored text against the event to be monitored. If all event elements are identical, the two are the same event, and the monitored text is judged to be sensitive information.
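The matching rule of step 9 is an exact comparison of event elements, which can be sketched as below. Representing an event as a mapping from role name to element word, with the event type stored under an `event_type` key, is an assumption made for this example.

```python
from typing import Dict, List

Event = Dict[str, str]  # role name -> element word, plus an "event_type" entry


def is_sensitive(extracted: Event, monitored: List[Event]) -> bool:
    """True when the extracted event has exactly the same elements as a monitored event."""
    return any(extracted == target for target in monitored)


monitored = [{"event_type": "ProductRecall", "time": "2014.12.25", "initiator": "regulator"}]
same = is_sensitive(dict(monitored[0]), monitored)
different = is_sensitive({**monitored[0], "time": "2014.12.26"}, monitored)
```

Because the rule requires every element to be identical, a single differing element, such as the date in the second call, means the events do not match.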
This completes the event extraction method based on maximum entropy models proposed here. The invention first identifies candidate events by trigger word, extracts basic features and context features for the identified candidates, screens them with a maximum entropy model, and selects the real event sentences by comparing the predicted probability with a threshold. Once the event sentences are determined, the next task is to determine each component of the event. The method again uses a maximum entropy model: it extracts the features of each word in the event sentence, the word's context features, and syntactic features, performs multi-class prediction of the role category for each word, and selects the word with the highest predicted probability for each role category as the final component of the event. The invention uses machine learning on a large corpus, which ensures the breadth and accuracy of the model, and achieves good extraction results.
Claims (10)
1., based on a sensitive information monitoring method for event extraction, the steps include:
1) a trigger word dictionary and an Event element role dictionary is built; Wherein, corresponding event category of each trigger word in trigger word dictionary, records role's title that Event element in each event is corresponding in Event element role dictionary;
2) for marking corpus, adopting the method training pattern of machine learning, obtaining the maximum entropy model MT of decision event classification and the maximum entropy model MR for extracting Event element from event sentence;
3) filtering needing the language material of extraction event according to trigger word, will the sentence alternatively event of setting trigger word be matched;
4) by maximum entropy model MT, described candidate events is classified, obtain the event sentence belonging to setting event category;
5) according to Event element role dictionary and maximum entropy model MR from step 4) extract each element word of event gained event sentence, complete event extraction;
6) event be drawn into being mated with event to be monitored, if Event element is all identical, is then same event, namely judges that the event that is drawn into is as sensitive information.
2. the method for claim 1, is characterized in that, the construction method of described trigger word dictionary is:
21) initial setting up one seed trigger word dictionary, the every a line in dictionary comprises seed trigger word and event category corresponding to trigger word, and described event category is the multiple classifications containing needs prediction;
22) for each seed trigger word T, obtain its all synonym, near synonym language by coupling thesaurus, carry out the expansion of seed trigger word dictionary, obtain described trigger word dictionary.
3. the method for claim 1, is characterized in that, the construction method of described Event element role dictionary is:
31) initial setting up seed Event element role dictionary, the every a line in dictionary comprises element word in event and role category corresponding to word; Described role category contains the multiple role categories in each setting event;
32) for each element word R, obtain its all synonym, near synonym language by coupling thesaurus, carry out the expansion of Event element role dictionary.
4. the method as described in claim 1 or 2 or 3, is characterized in that, the method obtaining described maximum entropy model MT is:
41) reading marks corpus, carries out segmentation, subordinate sentence process for each section of language material;
42) for each sentence obtained, judge whether to comprise event information by label; For the sentence comprising event information, participle and part of speech identification are carried out to sentence, obtain trigger word, the event type of event;
43) extract the feature of event entity, then extracted feature is generated the input file of consolidation form, obtain described maximum entropy model MT by the method training of machine learning.
5. method as claimed in claim 4, is characterized in that, described feature comprises morphology and part of speech, the event category of P word after the morphology of P word before trigger word, the part of speech of trigger word, trigger word and part of speech, trigger word.
6. the method as described in claim 1 or 2 or 3, is characterized in that, the method obtaining described maximum entropy model MR is:
61) reading marks corpus, carries out segmentation, subordinate sentence process for each section of language material;
62) for each the element word in each the event sentence obtained, extract elemental characteristic and generated the input file of consolidation form, carrying out model training by the method for machine learning and obtain described maximum entropy model MR.
7. method as claimed in claim 6, it is characterized in that, described elemental characteristic comprises: the morphology of element word, part of speech, named entity recognition result, character types, the character types of the Event element word before element word in the morphology of Q word, part of speech and these words, the character types of the Event element word after element word in the morphology of Q word, part of speech and these words, the interdependent chain of syntactic relation between element word and trigger word, the event type of event belonging to element word.
8. the method for claim 1, it is characterized in that, the acquisition methods of described candidate events is: first to needing the language material of extraction event to carry out segmentation, subordinate sentence process, then word segmentation processing is carried out to each sentence obtained after subordinate sentence, judge whether comprise trigger word in word, if comprise trigger word, sentence is classified as candidate events sentence.
9. the method for claim 1, is characterized in that, described acquisition belong to setting event category event sentence method for: first part of speech corresponding to word is obtained after carrying out participle for candidate events sentence described in each; Then extract the feature of candidate events sentence and generated the input file of consolidation form, then described maximum entropy model MT is utilized to predict, the threshold value of prediction probability and setting is compared, exceedes threshold value and then the candidate events of correspondence is divided into the event sentence belonging to setting event category.
10. the method for claim 1, is characterized in that, described step 5) in carry out event extraction method be: first participle, part of speech identification, named entity recognition and syntactic analysis are carried out to the event sentence belonging to setting event category; Then judge whether each word after participle appears in described Event element role dictionary, mark event role characteristic, in extraction event sentence word feature and generate the input file of consolidation form, then adopt described maximum entropy model MR to predict, select the maximum word of prediction probability as final Event element for each role category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410849418.XA CN104572958B (en) | 2014-12-29 | 2014-12-29 | A kind of sensitive information monitoring method based on event extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104572958A (en) | 2015-04-29 |
CN104572958B (en) | 2018-10-02 |
Family
ID=53089020
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410849418.XA Active CN104572958B (en) | 2014-12-29 | 2014-12-29 | A kind of sensitive information monitoring method based on event extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104572958B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080052315A1 (en) * | 2004-06-30 | 2008-02-28 | Matsushita Electric Industrial Co., Ltd. | Event Importance Adjustment Method And Device For Carrying Out The Method |
CN102193951A (en) * | 2010-03-19 | 2011-09-21 | 华为技术有限公司 | Information extracting method and system |
CN102693314A (en) * | 2012-05-29 | 2012-09-26 | 代松 | Sensitive information monitoring method based on event search |
CN102693219A (en) * | 2012-06-05 | 2012-09-26 | 苏州大学 | Method and system for extracting Chinese event |
Non-Patent Citations (1)
Title |
---|
赵妍妍: "中文事件抽取的相关技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138515B (en) * | 2015-09-02 | 2018-10-19 | 百度在线网络技术(北京)有限公司 | Name entity recognition method and device |
CN105138515A (en) * | 2015-09-02 | 2015-12-09 | 百度在线网络技术(北京)有限公司 | Named entity recognition method and device |
CN105389304B (en) * | 2015-10-27 | 2018-11-02 | 小米科技有限责任公司 | Event Distillation method and device |
CN105389304A (en) * | 2015-10-27 | 2016-03-09 | 小米科技有限责任公司 | Event extraction method and apparatus |
US11593855B2 (en) | 2015-12-30 | 2023-02-28 | Ebay Inc. | System and method for computing features that apply to infrequent queries |
CN108701014A (en) * | 2016-03-09 | 2018-10-23 | 电子湾有限公司 | Inquiry database for tail portion inquiry |
CN107301167A (en) * | 2017-05-25 | 2017-10-27 | 中国科学院信息工程研究所 | A kind of work(performance description information recognition methods and device |
CN107562772A (en) * | 2017-07-03 | 2018-01-09 | 南京柯基数据科技有限公司 | Event extraction method, apparatus, system and storage medium |
CN107562772B (en) * | 2017-07-03 | 2020-03-24 | 南京柯基数据科技有限公司 | Event extraction method, device, system and storage medium |
CN110309256A (en) * | 2018-03-09 | 2019-10-08 | 北京国双科技有限公司 | The acquisition methods and device of event data in a kind of text |
CN110309296A (en) * | 2018-03-09 | 2019-10-08 | 北京国双科技有限公司 | A kind of Event Distillation method and device |
CN110737821A (en) * | 2018-07-03 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | Similar event query method, device, storage medium and terminal equipment |
US10762192B2 (en) | 2018-08-22 | 2020-09-01 | Paypal, Inc. | Cleartext password detection using machine learning |
WO2020041624A1 (en) * | 2018-08-22 | 2020-02-27 | Paypal, Inc. | Cleartext password detection using machine learning |
CN109344251A (en) * | 2018-09-11 | 2019-02-15 | 东南大学 | A kind of particular text information extraction method based on layer classifier and template matching |
CN109582949A (en) * | 2018-09-14 | 2019-04-05 | 阿里巴巴集团控股有限公司 | Event element abstracting method, calculates equipment and storage medium at device |
CN109376202B (en) * | 2018-10-30 | 2021-08-03 | 青岛理工大学 | NLP-based enterprise supply relationship automatic extraction and analysis method |
CN109376202A (en) * | 2018-10-30 | 2019-02-22 | 青岛理工大学 | NLP-based enterprise supply relationship automatic extraction and analysis method |
CN110298039A (en) * | 2019-06-20 | 2019-10-01 | 北京百度网讯科技有限公司 | Recognition methods, system, equipment and the computer readable storage medium of event |
CN110298039B (en) * | 2019-06-20 | 2023-05-30 | 北京百度网讯科技有限公司 | Event place identification method, system, equipment and computer readable storage medium |
CN110348018A (en) * | 2019-07-16 | 2019-10-18 | 苏州大学 | The method for completing simple event extraction using part study |
CN110597997A (en) * | 2019-07-19 | 2019-12-20 | 中国人民解放军国防科技大学 | Military scenario text event extraction corpus iterative construction method and device |
CN110597997B (en) * | 2019-07-19 | 2022-03-22 | 中国人民解放军国防科技大学 | Military scenario text event extraction corpus iterative construction method and device |
CN111723564A (en) * | 2020-05-27 | 2020-09-29 | 西安交通大学 | Event extraction and processing method for case-following electronic file |
CN111967268A (en) * | 2020-06-30 | 2020-11-20 | 北京百度网讯科技有限公司 | Method and device for extracting events in text, electronic equipment and storage medium |
CN111967268B (en) * | 2020-06-30 | 2024-03-19 | 北京百度网讯科技有限公司 | Event extraction method and device in text, electronic equipment and storage medium |
CN111881258A (en) * | 2020-07-28 | 2020-11-03 | 广东工业大学 | Self-learning event extraction method and application thereof |
CN111881258B (en) * | 2020-07-28 | 2023-06-20 | 广东工业大学 | Self-learning event extraction method and application thereof |
CN112052682A (en) * | 2020-09-02 | 2020-12-08 | 平安资产管理有限责任公司 | Event entity joint extraction method and device, computer equipment and storage medium |
CN112241457A (en) * | 2020-09-22 | 2021-01-19 | 同济大学 | Event detection method for event of affair knowledge graph fused with extension features |
CN112182346A (en) * | 2020-10-26 | 2021-01-05 | 上海蜜度信息技术有限公司 | Method and equipment for extracting entity information of emergency |
CN112380300A (en) * | 2020-12-11 | 2021-02-19 | 武汉烽火众智数字技术有限责任公司 | Multi-class event element extraction and analysis method and equipment |
CN113010593A (en) * | 2021-04-02 | 2021-06-22 | 北京智通云联科技有限公司 | Method, system and device for extracting events of unstructured text |
CN113010593B (en) * | 2021-04-02 | 2024-02-13 | 北京智通云联科技有限公司 | Event extraction method, system and device for unstructured text |
CN113792083A (en) * | 2021-06-02 | 2021-12-14 | 的卢技术有限公司 | Event extraction and judgment method and system |
CN115293156A (en) * | 2022-09-29 | 2022-11-04 | 四川大学华西医院 | Method and device for extracting prison short message abnormal event, computer equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN104572958B (en) | 2018-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104572958B (en) | A kind of sensitive information monitoring method based on event extraction | |
CN104598535B (en) | A kind of event extraction method based on maximum entropy | |
CN109189942B (en) | Construction method and device of patent data knowledge graph | |
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
CN110020422B (en) | Feature word determining method and device and server | |
CN107330011A (en) | The recognition methods of the name entity of many strategy fusions and device | |
CN111309910A (en) | Text information mining method and device | |
CN107102993B (en) | User appeal analysis method and device | |
CN105930411A (en) | Classifier training method, classifier and sentiment classification system | |
CN103699625A (en) | Method and device for retrieving based on keyword | |
CN104076944A (en) | Chat emoticon input method and device | |
CN107544988B (en) | Method and device for acquiring public opinion data | |
CN110032639A (en) | By the method, apparatus and storage medium of semantic text data and tag match | |
CN105426356A (en) | Target information identification method and apparatus | |
CN103294664A (en) | Method and system for discovering new words in open fields | |
CN104809105B (en) | Recognition methods and the system of event argument and argument roles based on maximum entropy | |
CN105183717A (en) | OSN user emotion analysis method based on random forest and user relationship | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN113076735B (en) | Target information acquisition method, device and server | |
CN112069312B (en) | Text classification method based on entity recognition and electronic device | |
CN110334180B (en) | Mobile application security evaluation method based on comment data | |
CN110909542B (en) | Intelligent semantic serial-parallel analysis method and system | |
CN104881399B (en) | Event recognition method and system based on probability soft logic PSL | |
CN111177367A (en) | Case classification method, classification model training method and related products | |
CN110968664A (en) | Document retrieval method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||