CN108846018A - News-oriented method for automatically extracting information on Chinese food safety news events - Google Patents
- Publication number
- CN108846018A
- Authority
- CN
- China
- Prior art keywords
- food
- news
- food safety
- information
- corpus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a news-oriented method for automatically extracting information on Chinese food safety news events, comprising the following steps. Step S1: collect Internet news to build a food news corpus; compile geographic information for China to build a geographic information knowledge base; and collect food variety information from the web to build a food variety knowledge base. Step S2: classify the news in the food news corpus with a text classification model to obtain a food safety news corpus. Step S3: first extract the event time from the food safety news corpus; next, extract the event location from the corpus using the geographic information knowledge base; finally, extract the food names and related information mentioned in the corpus using the food variety knowledge base. Step S4: compute statistics on the extracted time, location and food information. Step S5: display the time, location and food statistics using visualization techniques. The invention can accurately extract three dimensions of food safety events from Internet news, namely time, location and food variety, compute statistics on this information, and present the results intuitively.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a news-oriented method for automatically extracting information on Chinese food safety news events.
Background art
With the explosive growth and spread of information on the Internet, human society has entered an era of extremely abundant information. Because food safety bears directly on people's health and lives, food safety events have become a focus of public attention among the many kinds of online information. From the standpoint of government regulation, once a negative food safety event occurs, the government wants to learn of it as early as possible and to minimize its negative impact in the shortest time. How to automatically extract and analyze these food safety events from the web is therefore a key issue for food safety regulation.
Because information on the same topic is often published on different online platforms at different times, it is hard for people to gain a comprehensive understanding of it. Against this background, using natural language processing techniques to automatically extract target information from massive text data is particularly necessary, and presenting that information clearly to users is equally important. To meet the specific needs of handling food safety events, the present invention therefore proposes a method for automatically extracting and visually analyzing food safety events from online news, which combines text classification and information extraction techniques from natural language processing to automatically extract food safety event information from large volumes of online news.
Summary of the invention
(1) Technical problem to be solved
The technical problem addressed by the present invention is to provide a news-oriented method for automatically extracting information on Chinese food safety news events that can automatically extract food safety event information from large volumes of online news, thereby replacing the time-consuming and labor-intensive manual extraction of such information.
(2) Technical solution
To solve the above technical problem, the present invention provides a news-oriented method for automatically extracting information on Chinese food safety news events, the method comprising the following steps:
Step S1: collect Internet news to build a food news corpus; compile geographic information for China to build a geographic information knowledge base; and collect food variety information from the web to build a food variety knowledge base;
Step S2: classify the news in the food news corpus with a text classification model to obtain a food safety news corpus;
Step S3: first extract the event time from the food safety news corpus; next, extract the event location from the food safety news corpus using the geographic information knowledge base; finally, extract the food names from the food safety news corpus using the food variety knowledge base;
Step S4: compute statistics on the extracted time, location and food information;
Step S5: display the time, location and food statistics using visualization techniques.
Further, step S1 specifically comprises:
Collecting food news from a food news website (Food Partner Net), extracting the title, date, source, abstract, body text and other information, and saving it to a database in a unified XML format to build the food news corpus;
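The text does not specify the unified XML format further. A minimal sketch of one possible layout, assuming a `<news>` root element with one child element per extracted field (the element names and helper functions are hypothetical), using Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the text lists title, date, source, abstract and body text.
FIELDS = ("title", "date", "source", "abstract", "text")


def news_to_xml(record: dict) -> str:
    """Serialize one crawled news item into a uniform <news> XML element."""
    root = ET.Element("news")
    for name in FIELDS:
        child = ET.SubElement(root, name)
        child.text = record.get(name, "")  # a missing field becomes an empty element
    return ET.tostring(root, encoding="unicode")


def xml_to_news(xml_text: str) -> dict:
    """Parse a <news> element back into a plain dict (empty elements give "")."""
    root = ET.fromstring(xml_text)
    return {name: (root.findtext(name) or "") for name in FIELDS}
```

Because every record carries the same five elements, downstream steps can read any document without per-site parsing logic.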
Compiling geographic information for China and storing it in a database as a tree structure following the province, city, county and street (subdistrict) hierarchy, thereby building the geographic information knowledge base;
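A minimal in-memory sketch of the province/city/county/street tree, assuming each knowledge-base row is a tuple of names from province downward. The class and function names and the sample rows are illustrative assumptions; the patent stores the tree in a database whose schema it does not give. A name index is included because the same place name can occur at several positions in the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LEVELS = ("province", "city", "county", "street")


@dataclass(eq=False)
class GeoNode:
    name: str
    level: str
    # repr=False avoids infinite recursion through parent <-> children links
    parent: Optional["GeoNode"] = field(default=None, repr=False)
    children: Dict[str, "GeoNode"] = field(default_factory=dict, repr=False)

    def path(self) -> List[str]:
        """Names from the province down to this node."""
        node, names = self, []
        while node is not None and node.level != "root":
            names.append(node.name)
            node = node.parent
        return names[::-1]


def build_geo_tree(rows) -> GeoNode:
    """Build the tree from rows of up to four names (province, city, county, street)."""
    root = GeoNode("China", "root")
    for row in rows:
        node = root
        for level, name in zip(LEVELS, row):
            if name not in node.children:
                node.children[name] = GeoNode(name, level, node)
            node = node.children[name]
    return root


def index_by_name(root: GeoNode) -> Dict[str, List[GeoNode]]:
    """Map each place name to every node carrying it."""
    index: Dict[str, List[GeoNode]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.level != "root":
            index.setdefault(node.name, []).append(node)
        stack.extend(node.children.values())
    return index
```

For example, with the rows `("广东省", "深圳市", "南山区", "粤海街道")` and `("广东省", "广州市", "天河区")`, the street node's `path()` yields all four levels, which is what a province-level statistic later needs.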
Obtaining common food names and their information from the web, dividing foods into 33 classes according to the QS standard, and classifying the collected food names and information into these 33 major classes to build the food variety knowledge base;
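The food variety knowledge base can be held as a mapping from food name to QS category. The sketch below inverts a category-to-names listing into that mapping and rejects a name filed under two different categories. The three categories and six names are an illustrative subset chosen for this example, not the patent's 33-class, 1550-name list.

```python
from typing import Dict, List

# Illustrative subset only (hypothetical); the real knowledge base has 33 QS classes.
FOOD_KB_SOURCE: Dict[str, List[str]] = {
    "肉制品": ["火腿肠", "腊肉"],
    "乳制品": ["奶粉", "酸奶"],
    "饮料": ["果汁", "矿泉水"],
}


def build_food_kb(source: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert category -> names into name -> category, rejecting conflicting entries."""
    kb: Dict[str, str] = {}
    for category, names in source.items():
        for name in names:
            if name in kb and kb[name] != category:
                raise ValueError(f"{name!r} is filed under both {kb[name]!r} and {category!r}")
            kb[name] = category
    return kb
```

Keying by food name makes the later extraction step a direct lookup once a name has been found in the text.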
Further, step S2 specifically comprises:
Classifying the texts in the food news corpus with an ensemble (integrated) deep-learning-based food safety text classification method, yielding a food news corpus and a food safety news corpus;
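The cited ensemble method (CN106570164A) relies on deep learning models that are not described here. The sketch below shows only one common way of combining base classifiers, soft voting over their probabilities. The keyword-based scorers are hypothetical stand-ins for the deep models, and the 0.5 threshold is an assumption.

```python
from typing import Callable, List, Sequence

# A base classifier maps a document to P(food safety news).
Classifier = Callable[[str], float]


def ensemble_predict(doc: str, classifiers: Sequence[Classifier], threshold: float = 0.5) -> bool:
    """Soft voting: average the base probabilities and compare with a threshold."""
    probs = [clf(doc) for clf in classifiers]
    return sum(probs) / len(probs) >= threshold


def keyword_scorer(keywords: List[str]) -> Classifier:
    """Hypothetical stand-in for a deep model: fraction of keywords present in the text."""
    def score(doc: str) -> float:
        return sum(k in doc for k in keywords) / len(keywords)
    return score


def filter_food_safety(docs: Sequence[str], classifiers: Sequence[Classifier]) -> List[str]:
    """Keep only the documents the ensemble labels as food safety news."""
    return [d for d in docs if ensemble_predict(d, classifiers)]
```

Soft voting lets a confident model outweigh two uncertain ones; hard (majority) voting over thresholded labels is an equally plausible reading of "ensemble".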
Further, step S3 specifically comprises:
First, extracting the event time from the food safety news corpus. Because food safety news is time-sensitive and today's news media spread stories quickly, the time a news item is reported is taken as the time the event occurred; the report time is therefore extracted directly as the time of the food safety event;
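Since the report time is used directly as the event time, this step reduces to normalizing the report date. A minimal sketch, assuming dates appear as `2017-06-01`, `2017/6/1 08:30` or `2017年6月1日`; the accepted formats and the function name are assumptions, not taken from the patent.

```python
import re
from datetime import date
from typing import Optional

# Year, month and day separated by -, / or the Chinese 年/月 markers (assumed formats).
DATE_RE = re.compile(r"(\d{4})\s*[-/年]\s*(\d{1,2})\s*[-/月]\s*(\d{1,2})")


def event_date(report_date: str) -> Optional[date]:
    """Return the report date as the event date, or None if no valid date is found."""
    m = DATE_RE.search(report_date)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. an impossible date such as 2017-02-30
        return None
```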
Next, extracting the event location from the food safety news corpus using the geographic information knowledge base: entries of the knowledge base are matched in each food safety news item to obtain the geographic information of the food safety event;
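The patent does not spell out its matching rules. A minimal sketch of one common dictionary-based approach, forward maximum matching against a flattened view of the geographic knowledge base in which each name maps to its level and province. The `GAZETTEER` entries, including the short aliases, are hypothetical, and taking the most frequently mentioned province as the event province is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical flattened excerpt of the knowledge base: name -> (level, province).
GAZETTEER: Dict[str, Tuple[str, str]] = {
    "广东省": ("province", "广东省"),
    "广东": ("province", "广东省"),  # short alias (assumption)
    "深圳市": ("city", "广东省"),
    "深圳": ("city", "广东省"),      # short alias (assumption)
    "南山区": ("county", "广东省"),
}


def extract_places(text: str, gazetteer=GAZETTEER) -> List[Tuple[str, str, str]]:
    """Forward maximum matching: scan left to right, taking the longest known name."""
    max_len = max(map(len, gazetteer))
    found, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + n]
            if word in gazetteer:
                level, province = gazetteer[word]
                found.append((word, level, province))
                i += n
                break
        else:  # no name starts at position i
            i += 1
    return found


def event_province(text: str) -> Optional[str]:
    """Province mentioned most often in the text, or None if no place is found."""
    provinces = Counter(p for _, _, p in extract_places(text))
    return provinces.most_common(1)[0][0] if provinces else None
```

Longest-first matching keeps `深圳市` from being split into the alias `深圳` plus a stray `市`.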
Finally, extracting the food names mentioned in the food safety news corpus using the food variety knowledge base: entries of the knowledge base are matched in each food safety news item to obtain the food names and food variety information of the news;
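For food names, a different dictionary-matching technique can be sketched: a single regular expression whose alternatives are ordered longest first, so that at any position a longer name such as `婴幼儿奶粉` wins over the shorter `奶粉` it contains. The `FOOD_KB` excerpt and function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical excerpt of the food variety knowledge base: food name -> QS category.
FOOD_KB: Dict[str, str] = {
    "奶粉": "乳制品",
    "婴幼儿奶粉": "乳制品",
    "火腿肠": "肉制品",
    "果汁": "饮料",
}


def build_matcher(kb: Dict[str, str]) -> "re.Pattern[str]":
    # Longest names first: Python's alternation takes the first alternative that matches.
    names = sorted(kb, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, names)))


def extract_foods(text: str, kb: Dict[str, str] = FOOD_KB) -> List[Tuple[str, str]]:
    """Return (food name, category) pairs in order of first appearance."""
    seen, result = set(), []
    for m in build_matcher(kb).finditer(text):
        name = m.group()
        if name not in seen:
            seen.add(name)
            result.append((name, kb[name]))
    return result
```

For a corpus of thousands of documents the pattern would be built once and reused; it is rebuilt per call here only to keep the sketch short.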
Further, step S4 specifically comprises:
Counting the food safety event times by year to obtain time statistics;
Counting the food safety event locations by province and municipality for each year to obtain location statistics;
Classifying the food names extracted from the food safety events by food variety and counting them to obtain food variety statistics;
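The three statistics are plain frequency counts. A sketch using `collections.Counter`, assuming each extracted event has been reduced to a year, an optional province and the food categories it mentions; the `Event` record is a hypothetical structure, and a news item naming several foods is counted once per distinct category.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    """One extracted food safety event (hypothetical layout for this sketch)."""
    year: int
    province: Optional[str]
    categories: Tuple[str, ...]


def yearly_counts(events: Iterable[Event]) -> Counter:
    """Nationwide number of events per year."""
    return Counter(e.year for e in events)


def province_counts(events: Iterable[Event]) -> Counter:
    """Number of events per (year, province); events without a location are skipped."""
    return Counter((e.year, e.province) for e in events if e.province)


def category_counts(events: Iterable[Event]) -> Counter:
    """Number of events per (year, food category), once per distinct category."""
    return Counter((e.year, c) for e in events for c in set(e.categories))
```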
Further, step S5 specifically comprises:
Displaying the time statistics in a line chart;
Displaying, from the location statistics, the annual number of food safety events in each province and municipality on a heat map;
Displaying the food variety statistics in a bar chart.
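The patent does not name a charting library. The sketch below only reshapes the counts into the series each chart needs: x/y lists for the line chart, one province-to-count frame per year for the heat map, and sorted bars for the bar chart. The function names and input shapes are assumptions, and any plotting library could render the output.

```python
from typing import Dict, List, Tuple


def line_series(yearly: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """x/y lists for the nationwide yearly line chart, years ascending."""
    years = sorted(yearly)
    return years, [yearly[y] for y in years]


def heatmap_frame(by_province: Dict[Tuple[int, str], int], year: int) -> Dict[str, int]:
    """Province -> event count for one year, i.e. one frame of the heat map."""
    return {p: n for (y, p), n in by_province.items() if y == year}


def bar_chart_bars(by_category: Dict[Tuple[int, str], int], year: int) -> List[Tuple[str, int]]:
    """(category, count) bars for one year, tallest first, ties broken by name."""
    bars = [(c, n) for (y, c), n in by_category.items() if y == year]
    return sorted(bars, key=lambda b: (-b[1], b[0]))
```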
Description of the drawings
Fig. 1 is a flowchart of the news-oriented method for automatically extracting information on Chinese food safety news events;
Fig. 2 shows the visualization of information along the time dimension;
Fig. 3 shows the visualization of information along the spatial dimension;
Fig. 4 shows the visualization of food variety information.
Detailed description of the embodiments
To make the content of the present invention clearer, embodiments of the invention are described in detail below with reference to the drawings.
The news-oriented method for automatically extracting information on Chinese food safety news events provided by the invention automatically extracts food safety event information from Internet news and visualizes the extracted information.
Its workflow is shown in Fig. 1.
Step S1: collect the corpus and build the related knowledge bases. The specific steps are:
Crawl data from Food Partner Net and preprocess the collected data, including:
Formatting the crawled news documents, extracting the title, date, source, abstract and body text, and saving the data to a database in a unified format to obtain a corpus of food news documents, 170,053 documents in total;
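The text says only that the formatted documents are saved to a database. A minimal sketch using Python's built-in `sqlite3`, in which the table layout, the `store_news` helper and the `UNIQUE (title, date)` rule for skipping re-crawled duplicates are all assumptions rather than details from the patent:

```python
import sqlite3

# Hypothetical schema; UNIQUE (title, date) is an assumed de-duplication rule.
SCHEMA = """
CREATE TABLE IF NOT EXISTS news (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    date     TEXT,
    source   TEXT,
    abstract TEXT,
    body     TEXT,
    UNIQUE (title, date)
)
"""


def store_news(conn: sqlite3.Connection, records) -> int:
    """Insert formatted news records, skipping duplicates; returns the rows stored."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO news (title, date, source, abstract, body) "
        "VALUES (:title, :date, :source, :abstract, :body)",
        records,
    )
    conn.commit()
    return conn.total_changes - before
```

Usage: `store_news(sqlite3.connect("food_news.db"), records)` where each record is a dict with the five named fields.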
Compile geographic information for China and store it in a database as a tree structure following the province, city, county and street hierarchy, building the geographic information knowledge base;
Obtain common food names and their information from the web, divide foods into 33 classes according to the QS standard, and classify the collected food names and information into these 33 major classes, building a food variety knowledge base containing 1,550 food names;
Step S2: classify the news in the food news corpus with a text classification model to obtain the food safety news corpus. The specific steps are:
Classify the food news document corpus with the text classification model and, according to the labels the model assigns, obtain 11,354 food safety event news documents in total;
Step S3: extract food safety event information from the food safety event news documents. The specific steps are:
Because news reports are timely, the report time is used as the time of the corresponding food safety event: the report time of each food safety event news document is extracted to obtain the event time;
Using the geographic information knowledge base for China, extract the location of each food safety event by rule-based extraction;
Using the food variety knowledge base, extract the food information involved in each food safety event by rule-based extraction, obtaining the food information and food variety information of the event;
Step S4: compute statistics on the extracted food safety event information. The specific steps are:
Count the food safety event times by year to obtain the nationwide number of food safety events each year;
Count the food safety event locations by year and by province/municipality to obtain the number of food safety events in each province and municipality each year;
Count the food information and food variety information of the food safety events to obtain the number of food safety events involving each food variety each year;
Step S5: visualize the statistics of the food safety event information:
The annual nationwide number of food safety events is displayed as a line chart, as shown in Fig. 2;
The annual number of food safety events in each province and municipality is displayed year by year on a heat map, as shown in Fig. 3;
The annual number of food safety events for each food variety is displayed year by year in a bar chart, as shown in Fig. 4.
The above embodiments merely illustrate the present invention and do not limit it. Those skilled in the relevant technical field may make various changes without departing from the spirit and scope of the method of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, and the scope of patent protection of the invention shall be defined by the claims.
Claims (6)
1. A news-oriented method for automatically extracting information on Chinese food safety news events, comprising the following steps:
obtaining data and building a food news corpus, a geographic information knowledge base and a food variety knowledge base;
classifying the texts in the food news corpus to obtain a food safety news corpus;
performing information extraction on the food safety news corpus, extracting food safety event information along three dimensions: time, location and food variety;
computing statistics on the extracted food safety event information;
visualizing the statistical results.
2. The news-oriented method for automatically extracting information on Chinese food safety news events according to claim 1, characterized in that obtaining data and building the food news corpus, the geographic information knowledge base and the food variety knowledge base specifically comprises:
collecting news from a food news website (Food Partner Net), extracting the title, date, source, abstract, body text and other information, and saving it to a database in a unified format to obtain food news texts and build the food news corpus;
compiling geographic information for China to build the geographic information knowledge base, covering four levels (province, city, county, and township/town/street) and specifically including 31 provinces/municipalities directly under the central government, 355 prefecture-level cities, 2,831 county-level cities/districts and 40,548 townships/towns/streets;
collecting common food names and their information from the web to build the food variety knowledge base, which specifically comprises 33 food varieties containing 1,550 food names in total.
3. The news-oriented method for automatically extracting information on Chinese food safety news events according to claim 1, characterized in that classifying the texts in the food news corpus to obtain the food safety news corpus specifically comprises:
classifying the food news corpus with an ensemble (integrated) deep-learning-based food safety text classification method to obtain the food safety news corpus.
4. The news-oriented method for automatically extracting information on Chinese food safety news events according to claim 1, characterized in that performing information extraction on the food safety news corpus and extracting food safety event information along the three dimensions of time, location and food variety specifically comprises:
for the food safety news corpus, extracting the report time of each food safety news document as the event time;
for the food safety news corpus, extracting the location in each food safety news document using the geographic information knowledge base of claim 2, at four levels: province, city, county and town;
for the food safety news corpus, extracting the food variety information in each food safety news document using the food variety knowledge base of claim 2.
5. The news-oriented method for automatically extracting information on Chinese food safety news events according to claim 1, characterized in that computing statistics on the extracted food safety event information specifically comprises:
from the time extraction results, counting the number of food safety events with a time span of one year;
from the location extraction results, counting the number of food safety events in each province and municipality each year;
from the food variety extraction results, classifying and counting according to the food varieties in the food variety knowledge base to obtain the number of food safety events involving each food variety.
6. The news-oriented method for automatically extracting information on Chinese food safety news events according to claim 1, characterized in that visualizing the statistical results specifically comprises:
for the time-based statistics, displaying the statistics in a line chart;
for the space-based statistics, displaying the statistics on a heat map;
for the food variety statistics, displaying the statistics in a bar chart.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810427945.XA CN108846018A (en) | 2018-05-07 | 2018-05-07 | News-oriented method for automatically extracting information on Chinese food safety news events |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810427945.XA CN108846018A (en) | 2018-05-07 | 2018-05-07 | News-oriented method for automatically extracting information on Chinese food safety news events |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108846018A (en) | 2018-11-20 |
Family
ID=64212797
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810427945.XA Pending CN108846018A (en) | News-oriented method for automatically extracting information on Chinese food safety news events | 2018-05-07 | 2018-05-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108846018A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135796A (en) * | 2019-04-12 | 2019-08-16 | Ping An Puhui Enterprise Management Co., Ltd. | Project data display method and system |
CN110532333A (en) * | 2019-07-30 | 2019-12-03 | Ping An Technology (Shenzhen) Co., Ltd. | Public welfare achievement display method and related device |
CN110532333B (en) * | 2019-07-30 | 2024-06-21 | Ping An Technology (Shenzhen) Co., Ltd. | Public welfare achievement display method and related equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104408093A (en) * | 2014-11-14 | 2015-03-11 | Institute of Computing Technology, Chinese Academy of Sciences | News event element extracting method and device |
CN104424201A (en) * | 2013-08-21 | 2015-03-18 | Fujitsu Ltd. | Method and device for providing food safety information |
CN106570164A (en) * | 2016-11-07 | 2017-04-19 | China Agricultural University | Integrated foodstuff safety text classification method based on deep learning |
Non-Patent Citations (1)
Title |
---|
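Li Qingguang et al.: "Spatial distribution characteristics and changing trends of food safety incidents in China" (中国食品安全事件空间分布特点与变化趋势), Economic Geography (《经济地理》) * |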
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112270027B (en) | Paperless intelligent interactive examination method for city design based on entity model | |
CN105468605B (en) | Entity information map generation method and device | |
CN104700191A (en) | Intelligent mobile terminal based agricultural product quality and safety geographic information supervision and management method and system | |
Strötgen et al. | Extraction and exploration of spatio-temporal information in documents | |
CN101751458A (en) | Network public sentiment monitoring system and method | |
CN106682150A (en) | Information processing method and device | |
CN102890702A (en) | Internet forum-oriented opinion leader mining method | |
CN105138670A (en) | Audio file label generation method and system | |
WO2014000518A1 (en) | Public opinion information display system and method | |
CN110533212A (en) | Urban waterlogging public sentiment monitoring and pre-alarming method based on big data | |
CN107872454A (en) | A kind of monitoring of ultra-large type internet platform protection based on security rank threat information and analysis system and method based on big data technology | |
CN108229810A (en) | Industry analysis system and method based on network information resource | |
Xu et al. | Perceived pollution and inbound tourism for Shanghai: a panel VAR approach | |
CN102253939A (en) | Searching method and system based on cloud computing technology | |
KR20150059208A (en) | Device for analyzing the time-space correlation of the event in the social web media and method thereof | |
Cruz et al. | Semantic extraction of geographic data from web tables for big data integration | |
Cheng et al. | Process and application of data mining in the university library | |
CN108846018A (en) | News-oriented method for automatically extracting information on Chinese food safety news events |
CN112363996B (en) | Method, system and medium for establishing physical model of power grid knowledge graph | |
CN103995826A (en) | Automatic cataloguing method for safety production supervision and administration governmental information | |
KR101088787B1 (en) | Issue Analyzing System and Issue Data Generation Method | |
CN109710712B (en) | Case element analysis-based crime hotspot feature mining method and system | |
CN106777395A (en) | A kind of topic based on community's text data finds system | |
TW201640383A (en) | Internet events automatic collection and analysis method and system thereof | |
Fleury et al. | AMMA information system: an efficient cross‐disciplinary tool and a legacy for forthcoming projects |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181120 |