CN108846018A - A news-oriented method for automatic extraction of Chinese food safety news event information - Google Patents

Info

Publication number
CN108846018A
Authority
CN
China
Prior art keywords
food
news
food safety
information
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810427945.XA
Other languages
Chinese (zh)
Inventor
陈瑛
程曦瑶
侯文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-07
Filing date
2018-05-07
Publication date
2018-11-20
Application filed by China Agricultural University
Priority to CN201810427945.XA
Publication of CN108846018A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a news-oriented method for automatic extraction of Chinese food safety news event information, comprising the following steps. Step S1: collect Internet news to construct a food news corpus; organize Chinese geographic information to construct a geographic information knowledge base; collect food variety information from the network to construct a food variety knowledge base. Step S2: classify the news in the food news corpus with a text classification model to obtain a food safety news corpus. Step S3: first extract the event occurrence time from the food safety news corpus; next, extract the event location using the geographic information knowledge base; finally, extract the food names and related information mentioned in the food safety news corpus using the food variety knowledge base. Step S4: compile statistics on the extracted time, location, and food information. Step S5: display the time, location, and food statistics using visualization techniques. The invention can accurately extract food safety event information along three dimensions (time, location, and food variety) from Internet news, compile statistics on this information, and present the statistical results intuitively.

Description

A news-oriented method for automatic extraction of Chinese food safety news event information
Technical field
The present invention relates to the field of natural language processing, and in particular to a news-oriented method for automatic extraction of Chinese food safety news event information.
Background art
With the explosive growth and spread of Internet information, human society has entered an era of extremely abundant information. Because food safety bears directly on the health and lives of the public, food safety events have become a focus of public concern among the many kinds of online information. From the standpoint of government regulation, once a negative food safety event occurs, the government wants to learn of it immediately and to reduce its negative impact in the shortest possible time. How to automatically extract and analyze these food safety events from the network is therefore a key problem in food safety regulation.
Because information on the same topic is often published on different network platforms at different times, it is difficult for people to gain a comprehensive understanding of it. Against this background, using natural language processing techniques to automatically extract target information from massive text data is particularly necessary, and presenting that information clearly to the user is equally important. To meet the specific needs of food safety event handling, the present invention proposes a method for automatic extraction and visual analysis of food safety events in online news, which combines text classification and information extraction techniques from natural language processing to automatically extract food-safety-event-related information from massive volumes of online news.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is to provide a news-oriented method for automatic extraction of Chinese food safety news event information that can automatically extract food-safety-event-related information from massive volumes of online news, thereby avoiding the time-consuming and laborious manual extraction of such information.
(2) Technical solution
To solve the above technical problem, the present invention provides a news-oriented method for automatic extraction of Chinese food safety news event information, the method comprising the following steps:
Step S1: Collect Internet news to construct a food news corpus; organize Chinese geographic information to construct a geographic information knowledge base; collect food variety information from the network to construct a food variety knowledge base;
Step S2: Classify the news in the food news corpus with a text classification model to obtain a food safety news corpus;
Step S3: First extract the event occurrence time from the food safety news corpus; next, extract the event location from the food safety news corpus using the geographic information knowledge base; finally, extract the food names in the food safety news corpus using the food variety knowledge base;
Step S4: Compile statistics on the extracted time, location, and food information;
Step S5: Display the time, location, and food statistics using visualization techniques.
Further, step S1 specifically includes:
Collecting food news from a food news website (food partner net), extracting the title, date, source, abstract, body text, and other information, and saving them to a database in a unified XML format to construct the food news corpus;
Organizing Chinese geographic information by the hierarchy of province, city, county, and street, and storing it in the database as a tree structure to construct the geographic information knowledge base;
Obtaining common food names and related information from the network, dividing food into 33 categories according to the QS standard, and classifying the obtained food names and information under these 33 categories to construct the food variety knowledge base;
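As a non-authoritative illustration of the two knowledge bases described above, the following Python sketch stores place names as a tree (province, city, county, street) and food names as a mapping from name to category. The class `Region`, the functions `build_geo_tree` and `build_food_kb`, the input row layout, and the sample category labels are assumptions made for illustration; the description does not specify a storage schema.

```python
LEVELS = ("province", "city", "county", "street")


class Region:
    """One node of the geographic tree (hypothetical helper class)."""

    def __init__(self, name, level, parent=None):
        self.name = name
        self.level = level
        self.parent = parent
        self.children = {}

    def add(self, name, level):
        # Reuse an existing child so repeated rows do not duplicate nodes.
        if name not in self.children:
            self.children[name] = Region(name, level, parent=self)
        return self.children[name]


def build_geo_tree(rows):
    """rows: (province, city, county, street) tuples; trailing entries may be None."""
    root = Region("中国", "country")
    for row in rows:
        node = root
        for level, name in zip(LEVELS, row):
            if not name:
                break
            node = node.add(name, level)
    return root


def build_food_kb(pairs):
    """pairs: (food_name, category) tuples -> {food_name: category}."""
    return dict(pairs)


# Sample data; the category labels are illustrative, not the actual 33-class list.
geo = build_geo_tree([
    ("河北省", "石家庄市", "长安区", None),
    ("河北省", "保定市", None, None),
])
food_kb = build_food_kb([("酸奶", "乳制品"), ("馒头", "粮食加工品")])
```

Keeping a `parent` link on every node is one possible design choice: it lets a place matched at county or street level be traced back to its province when statistics are later compiled per province.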
Further, step S2 specifically includes:
Classifying the documents in the food news corpus with an integrated food safety text classification method based on deep learning, which finally yields a food news corpus and a food safety news corpus;
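The description does not detail how the integrated classifier combines its component models. Assuming "integrated" means an ensemble whose members vote, a minimal Python sketch of the filtering step might look as follows. The function names and the three keyword rules are hypothetical stand-ins for trained deep-learning models, not the classifiers used by the invention.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Return 1 (food safety news) only if a strict majority of classifiers say so."""
    votes = Counter(clf(text) for clf in classifiers)
    return 1 if votes[1] > votes[0] else 0


# Hypothetical base classifiers: simple keyword rules standing in for trained models.
def mentions_contamination(text):
    return int(any(word in text for word in ("超标", "不合格", "中毒")))


def mentions_recall(text):
    return int("召回" in text)


def mentions_inspection(text):
    return int("抽检" in text)


def filter_food_safety(docs, classifiers):
    """Keep only the documents that the ensemble labels as food safety news."""
    return [doc for doc in docs if majority_vote(classifiers, doc) == 1]


base_models = [mentions_contamination, mentions_recall, mentions_inspection]
sample_docs = [
    "抽检发现某品牌酸奶菌落总数超标，已召回",
    "某食品企业发布新品",
]
safety_docs = filter_food_safety(sample_docs, base_models)
```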
Further, step S3 specifically includes:
First, extracting the event occurrence time from the food safety news corpus: given the timeliness of food safety news and the speed at which news media now spread information, the time of the news report is taken as the time at which the event occurred, so the report time is extracted directly as the food safety event occurrence time;
Second, extracting the event location from the food safety news corpus using the geographic information knowledge base: entries of the geographic information knowledge base are matched in the food safety news to obtain the geographic information of the food safety event news;
Finally, extracting the food names mentioned in the food safety news corpus using the food variety knowledge base: entries of the food variety knowledge base are matched in the food safety news to obtain the food names and food variety information of the food safety news;
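The three extraction sub-steps above can be sketched in Python as plain dictionary matching. This is a sketch under stated assumptions, not the matching rules of the invention: the document layout (`date` as an ISO string, `text` as the body), the function names, and the longest-match-first policy are all illustrative choices.

```python
from datetime import date


def match_terms(text, terms):
    """Return the knowledge-base terms that occur in text, longest first.

    A matched span is masked so that a shorter term nested inside a longer
    one (for example "酸奶" inside "酸奶饮料") is not reported separately.
    """
    hits = []
    remaining = text
    for term in sorted(terms, key=len, reverse=True):
        if term in remaining:
            hits.append(term)
            remaining = remaining.replace(term, "\0" * len(term))
    return hits


def extract_event(doc, place_names, food_kb):
    """Extract time, places, and foods from one food safety news document."""
    foods = match_terms(doc["text"], food_kb)
    return {
        # The report date is used directly as the event occurrence time.
        "time": date.fromisoformat(doc["date"]),
        "places": match_terms(doc["text"], place_names),
        "foods": {name: food_kb[name] for name in foods},
    }


sample_doc = {"date": "2017-03-15", "text": "石家庄市抽检发现一批酸奶饮料不合格"}
sample_places = {"河北省", "石家庄市"}
sample_foods = {"酸奶": "乳制品", "酸奶饮料": "饮料"}
event = extract_event(sample_doc, sample_places, sample_foods)
```

Matching longer names first is one way to avoid counting a food twice when one knowledge-base entry is a substring of another; other tie-breaking rules are equally plausible.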
Further, step S4 specifically includes:
Counting the food safety event occurrence times by year to obtain the time statistics;
Counting the food safety event locations by province and city for each year to obtain the location statistics;
Counting the food names extracted from the food safety events by food variety to obtain the food variety statistics;
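The three counts described above can be sketched with Python's standard `collections.Counter`. The event record layout (`year`, `province`, `category` keys) and the function name `count_events` are assumptions for illustration only.

```python
from collections import Counter


def count_events(events):
    """Count events per year, per (year, province), and per (year, category)."""
    by_year = Counter()
    by_year_province = Counter()
    by_year_category = Counter()
    for event in events:
        by_year[event["year"]] += 1
        by_year_province[(event["year"], event["province"])] += 1
        by_year_category[(event["year"], event["category"])] += 1
    return by_year, by_year_province, by_year_category


sample_events = [
    {"year": 2016, "province": "河北省", "category": "乳制品"},
    {"year": 2016, "province": "山东省", "category": "乳制品"},
    {"year": 2017, "province": "河北省", "category": "饮料"},
]
by_year, by_year_province, by_year_category = count_events(sample_events)
```

A `Counter` returns 0 for keys it has never seen, which is convenient when a chart needs a value for a year or province with no recorded events.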
Further, step S5 specifically includes:
Displaying the time statistics in a line chart;
Displaying the location statistics, that is, the number of food safety events in each province and city for each year, on a heat map;
Displaying the food variety statistics in a bar chart.
Brief description of the drawings
Fig. 1 is a flow chart of the news-oriented method for automatic extraction of Chinese food safety news event information;
Fig. 2 shows the information visualization based on the time dimension;
Fig. 3 shows the information visualization based on the spatial dimension;
Fig. 4 shows the information visualization based on food variety.
Specific embodiment
To make the content of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings.
The news-oriented method for automatic extraction of Chinese food safety news event information provided by the present invention can automatically extract food safety event information from Internet news and visualize the extracted information. Its workflow is shown in Fig. 1.
Step S1: Collect the corpus and construct the relevant knowledge bases. The specific steps are:
Crawl data from food partner net and preprocess the collected data, including:
Formatting the crawled news documents: the title, date, source, abstract, and body text are extracted and saved to a database in a unified format, giving a food news document corpus of 170053 documents in total;
Organizing Chinese geographic information by the hierarchy of province, city, county, and street, and storing it in the database as a tree structure to construct the geographic information knowledge base;
Obtaining common food names and related information from the network, dividing food into 33 categories according to the QS standard, and classifying the obtained food names and information under these 33 categories to construct the food variety knowledge base, which contains 1550 food names;
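The description names the five stored fields and, in the summary, a unified XML format, but gives no schema. As a sketch under that limitation, the following Python code serializes one crawled article with the standard library's `xml.etree.ElementTree`. The root tag `news`, the English field names, the helper function names, and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

# English field names are assumed; the description lists title, date,
# source, abstract, and body text as the stored information.
FIELDS = ("title", "date", "source", "abstract", "text")


def news_to_xml(record):
    """Serialize one crawled article into a flat <news> element."""
    root = ET.Element("news")
    for name in FIELDS:
        ET.SubElement(root, name).text = record.get(name, "")
    return ET.tostring(root, encoding="unicode")


def xml_to_news(xml_text):
    """Parse a <news> element back into a dict with the same five fields."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name, default="") for name in FIELDS}


sample_record = {
    "title": "某地通报一批不合格食品",
    "date": "2017-03-15",
    "source": "示例来源",
    "abstract": "抽检结果 <摘要>",
    "text": "石家庄市抽检发现一批酸奶饮料不合格",
}
xml_text = news_to_xml(sample_record)
restored = xml_to_news(xml_text)
```

Characters such as `<` and `>` in the article text are escaped on output and restored on parsing, so the round trip preserves the original strings.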
Step S2: Classify the news in the food news corpus with a text classification model to obtain the food safety news corpus. The specific steps are:
The food news document corpus is classified with the text classification model and, according to the labels given by the model, 11354 food safety event news documents are finally obtained;
Step S3: Extract food-safety-event-related information from the food safety event news documents. The specific steps are:
Because news reports are timely, the report time is taken as the occurrence time of the corresponding food safety event; the report time of each food safety event news document is extracted to obtain the event occurrence time;
The event location is extracted by rule-based extraction using the Chinese geographic information knowledge base, giving the place where the food safety event occurred;
The food information involved in the food safety event is extracted by rule-based extraction using the food variety knowledge base, giving the food information and food variety information of the food safety event;
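Because later statistics are compiled per province, a place name matched at city or county level has to be traced up the geographic hierarchy. The extraction rules themselves are not given in the description, so the following Python code is only a sketch of one possible rule: a parent index built from assumed `(province, city, county)` rows, with hypothetical function names.

```python
def build_parent_index(rows):
    """rows: (province, city, county) tuples -> {place_name: parent_name or None}.

    Limitation of this sketch: a name that occurs under several parents
    (some district names are shared between cities) keeps only the last
    parent seen, so a real gazetteer would need disambiguation.
    """
    parent = {}
    for province, city, county in rows:
        parent[province] = None
        if city:
            parent[city] = province
            if county:
                parent[county] = city
    return parent


def to_province(name, parent):
    """Climb the hierarchy until a node without a parent (a province) is reached."""
    while parent.get(name) is not None:
        name = parent[name]
    return name


def provinces_in(text, parent):
    """Provinces implied by any known place name that occurs in text."""
    return sorted({to_province(name, parent) for name in parent if name in text})


sample_rows = [
    ("河北省", "石家庄市", "长安区"),
    ("山东省", "济南市", None),
]
parent_index = build_parent_index(sample_rows)
implied = provinces_in("济南市和石家庄市通报抽检结果", parent_index)
```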
Step S4: Compile statistics on the extracted food-safety-event-related information. The specific steps are:
The food safety event occurrence times are counted by year to obtain the number of food safety events nationwide in each year;
The food safety event locations are counted by year and by province and city to obtain the number of food safety events in each province and city in each year;
The food information and food variety information of the food safety events are counted to obtain the number of food safety events for each food variety in each year;
Step S5: Visualize the statistics on the food-safety-event-related information:
The number of food safety events nationwide in each year is displayed as a line chart, as shown in Fig. 2;
The number of food safety events in each province and city is displayed on a heat map, year by year, as shown in Fig. 3;
The number of food safety events for each food variety is displayed in a bar chart, year by year, as shown in Fig. 4.
The above embodiments are intended only to illustrate the present invention, not to limit it. Those skilled in the relevant art can make various changes without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, and the scope of patent protection shall be defined by the claims.

Claims (6)

1. A news-oriented method for automatic extraction of Chinese food safety news event information, comprising the following steps:
Obtaining data and constructing a food news corpus, a geographic information knowledge base, and a food variety knowledge base;
Performing text classification on the food news corpus to obtain a food safety news corpus;
Performing information extraction on the food safety news corpus, extracting food safety event information along the three dimensions of time, location, and food variety;
Compiling statistics on the extracted food safety event information;
Visualizing the statistical results.
2. The news-oriented method for automatic extraction of Chinese food safety news event information according to claim 1, characterized in that obtaining data and constructing the food news corpus, the geographic information knowledge base, and the food variety knowledge base specifically includes:
Collecting news from a food news website (food partner net), extracting the title, date, source, abstract, body text, and other information, and saving them to a database in a unified format to obtain the food news documents and construct the food news corpus;
Organizing Chinese geographic information to construct the geographic information knowledge base, which covers the four levels of province, city, county, and township/town/street and specifically includes 31 provinces/municipalities directly under the central government, 355 prefecture-level cities, 2831 county-level cities/districts, and 40548 townships/towns/streets;
Collecting common food names and related information from the network to construct the food variety knowledge base, which specifically includes 33 food varieties containing 1550 food names in total.
3. The news-oriented method for automatic extraction of Chinese food safety news event information according to claim 1, characterized in that performing text classification on the food news corpus to obtain the food safety news corpus specifically includes:
Classifying the food news corpus with an integrated food safety text classification method based on deep learning to obtain the food safety news corpus.
4. The news-oriented method for automatic extraction of Chinese food safety news event information according to claim 1, characterized in that performing information extraction on the food safety news corpus and extracting food safety event information along the three dimensions of time, location, and food variety specifically includes:
For the food safety news corpus, extracting the report time of each food safety news document as the event occurrence time;
For the food safety news corpus, extracting the places mentioned in each food safety news document using the geographic information knowledge base of claim 2, at the four levels of province, city, county, and town;
For the food safety news corpus, extracting the food variety information in each food safety news document using the food variety knowledge base of claim 2.
5. The news-oriented method for automatic extraction of Chinese food safety news event information according to claim 1, characterized in that compiling statistics on the extracted food safety event information specifically includes:
For the time extraction results, counting the number of food safety events using one year as the time span;
For the location extraction results, counting the number of food safety events in each province and city in each year;
For the food variety extraction results, counting by the food varieties in the food variety knowledge base to obtain the number of food safety events for each food variety.
6. The news-oriented method for automatic extraction of Chinese food safety news event information according to claim 1, characterized in that visualizing the statistical results specifically includes:
For the time-based statistical results, displaying the results in a line chart;
For the location-based statistical results, displaying the results on a heat map;
For the food variety statistical results, displaying the results in a bar chart.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427945.XA CN108846018A (en) 2018-05-07 2018-05-07 A news-oriented method for automatic extraction of Chinese food safety news event information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810427945.XA CN108846018A (en) 2018-05-07 2018-05-07 A news-oriented method for automatic extraction of Chinese food safety news event information

Publications (1)

Publication Number Publication Date
CN108846018A true CN108846018A (en) 2018-11-20

Family

ID=64212797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427945.XA Pending CN108846018A (en) 2018-05-07 2018-05-07 A news-oriented method for automatic extraction of Chinese food safety news event information

Country Status (1)

Country Link
CN (1) CN108846018A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135796A (en) * 2019-04-12 2019-08-16 平安普惠企业管理有限公司 A kind of project data methods of exhibiting and system
CN110532333A (en) * 2019-07-30 2019-12-03 平安科技(深圳)有限公司 Public good achievements exhibition method and relevant device
CN110532333B (en) * 2019-07-30 2024-06-21 平安科技(深圳)有限公司 Public welfare achievement display method and related equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424201A (en) * 2013-08-21 2015-03-18 富士通株式会社 Method and device for providing food safety information
CN104408093A (en) * 2014-11-14 2015-03-11 中国科学院计算技术研究所 News event element extracting method and device
CN106570164A (en) * 2016-11-07 2017-04-19 中国农业大学 Integrated foodstuff safety text classification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李清光 等: "中国食品安全事件空间分布特点与变化趋势", 《经济地理》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20181120