KR20170077397A - Method of automatically extracting food safety event in real time from news and social networking service data - Google Patents

Method of automatically extracting food safety event in real time from news and social networking service data

Info

Publication number
KR20170077397A
Authority
KR
South Korea
Prior art keywords
event
food
data
template
information
Prior art date
Application number
KR1020150187245A
Other languages
Korean (ko)
Other versions
KR101780377B1 (en)
Inventor
맹성현
이강욱
장관
한경아
서민관
정승용
장경록
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원
Priority to KR1020150187245A
Publication of KR20170077397A
Application granted
Publication of KR101780377B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06F17/2755
    • G06F17/278
    • G06F17/30312
    • G06F17/30536
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system and method for automatically extracting food risk events from news and SNS data in real time are disclosed. To automatically extract and share food risk events from a vast amount of news and SNS data, the present invention analyzes the attributes of food risk events, defines and automatically expands food risk event templates, and automatically extracts and shares food risk events in real time from the news and SNS of each country through an information extraction and sharing module. This makes it possible to extract, in real time, information on food risk events that can occur anytime and anywhere, minimizing the damage they cause, and to share the extracted information with related organizations and companies so that it can be used for preventive food safety measures.

Description

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a method and system for automatically extracting food-related events from news and SNS data in real time.

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to information extraction technology, and more particularly, to a technology for automatically extracting information on food safety incidents and accidents from news or social networking service (SNS) data.

Environmental changes affecting domestic and foreign food, such as disasters like the nuclear accidents in Japan, climate change, environmental pollution, globalization, and the development of food processing methods, are giving rise to food safety threats in various forms, such as food poisoning caused by microorganisms and the misuse of chemicals.

To respond quickly and accurately to food safety accidents, which are directly linked to public safety, an overall response strategy is required that runs from detecting signs of a food safety threat, through judging the threat, to taking appropriate response measures. In particular, rapid detection and judgment of food safety risk factors is essential to protect the public from the food-related incidents and accidents that continue to occur.

To this end, the KFDA has employed specialized editors since 2008 to identify food safety risk factors at home and abroad, and collects food hazard information based on 37 food-related search terms (food poisoning, contamination, recall, detection, etc.). After the collected information is analyzed and evaluated, it is shared within the agency as well as with relevant authorities and companies, and used for preventive food safety measures.

However, to collect food safety hazard information more quickly and to act promptly when food accidents of various kinds occur frequently, the collection and analysis of food safety hazard information must be automated. In addition, as food hazards become more diverse, automatic expansion of the online search terms used for collecting and analyzing food hazard information is also necessary.

In particular, because food safety threats can occur anytime and anywhere, food safety hazard information should be obtained not only from the press releases of food safety authorities but also from the potential hazard information that spreads rapidly through news and SNS. If necessary, measures such as distribution controls may need to be taken before an official press release is issued. Representative examples of SNS include Facebook, Twitter, KakaoStory, Naver Band, and Instagram.

The present invention aims to improve on the conventional food safety information collection method described above. Its object is to provide a system and a method that automatically collect and analyze food safety hazard information from online news or SNS information in real time, automatically extract food risk events, and share the extracted information.

According to an aspect of the present invention, there is provided a method for automatically extracting food risk events, which can automatically extract and share food risk events from an enormous amount of news and SNS data. The method is implemented as a computer-readable program and is executed by a computer device. The method accesses an information source through a network to collect news data and/or SNS data, and performs preprocessing to convert the collected text data into a form that a computer can process. Named entity recognition is then performed to search the preprocessed text for words or phrases that refer to a person, a place, or any other entity. Based on a set of search terms, a food risk event is recognized in the data that has undergone the preprocessing and the named entity recognition, and the attributes of the food risk event are extracted. The extracted attributes are stored using a food risk event template. The event template may be provided to the computers of related organizations and shared with them.

Thus, the method for automatically extracting food risk events according to an embodiment of the present invention analyzes the attributes of food risk events, defines and automatically expands food risk event templates, and, through information extraction and sharing modules, automatically extracts and shares food risk events in real time from the news and SNS of each country.

According to one embodiment, the preprocessing may include morphological analysis of the collected data and part-of-speech tagging, which attaches an appropriate part of speech to each word obtained through the morphological analysis.

According to an embodiment, how much each word helps to capture food risk documents is measured statistically, and words whose discriminative power exceeds a predetermined criterion are selected as search terms to constitute the set of search terms. The relevance analysis results of the food safety data may be fed back so as to continuously expand the set of search terms.

According to one embodiment, the event template for storing the attribute information of the food risk event is generated automatically. When a new item that is not included in the existing event template appears frequently in information on food risk events, the event template may be automatically expanded by adding that item to it.

According to an embodiment, an event co-reference recognition technique is used to determine whether a plurality of collected news data and/or SNS data items relate to the same event. If they do, the event attribute information extracted from those news data and/or SNS data items can be merged and stored in the same event template.

According to one embodiment, the food risk event template may have an attribute management structure for a food risk event that includes attributes for place, time, target food, cause, scale of damage, and action taken.

According to another aspect of the present invention, there is provided a system for automatically extracting food risk events. In this system, a data collection unit accesses an information source through a network and collects news data and/or SNS data in real time or near real time. A text preprocessing unit converts the collected text data into a form that a computer can process. A named entity recognition processing unit searches the preprocessed text for words or phrases that refer to an entity such as a person or a place. An event extraction unit recognizes a food risk event in the data that has undergone the preprocessing and the named entity recognition, based on a set of search terms, extracts the attributes of the food risk event, and stores them using an event template. To support this task, a food hazard information search term table is prepared in advance; it stores the set of search terms, which consists of words selected because their discriminative power, measured statistically as their usefulness in capturing food risk documents, exceeds a predetermined criterion. A food risk event template table, which stores the event template holding the attribute information of food risk events, is also prepared in advance. The event extraction unit stores the food risk event template containing the extracted attributes in a food risk event information database.

According to one embodiment, the food risk event automatic extraction system may further include a search term automatic generation and expansion unit. This unit statistically measures how much each word helps to capture food risk documents, selects words whose discriminative power exceeds a predetermined criterion as search terms, and feeds back the relevance analysis results of the changing food safety data so that the set of search terms is expanded automatically and continuously.

According to one embodiment, the food risk event automatic extraction system may further include an event template automatic generation and expansion unit. This unit automatically generates the event template that stores the attribute information of the food risk event. When a new item that is not included in the existing event template appears frequently in information on food risk events, the unit may automatically expand the event template by adding that item to it.

According to one embodiment, the food risk event automatic extraction system may further include an event sharing unit that provides the event templates stored in the food risk event knowledge database to the computers of related institutions and shares them.

According to an embodiment, the event extraction unit may use an event co-reference recognition technique to determine whether a plurality of collected news data and/or SNS data items relate to the same event. If they do, the event attribute information extracted from those news data and/or SNS data items may be merged and stored in the same event template.

To respond quickly and accurately to food safety accidents, which are directly linked to public safety, a strategy is needed that runs from detecting signs of a food safety threat to judging the threat and taking countermeasures. In particular, rapid detection and judgment of food safety risk factors is essential to protect the public from the food-related incidents and accidents that continue to occur.

At present, food hazard information at home and abroad is collected at least once a day based on 37 food-related online search terms, shared with related organizations, and used for preventive food safety measures.

However, to respond quickly to the growing number of food safety risk factors, the extraction of food risk event information must be automated and performed in real time. To solve this problem, the present invention includes a method for automating real-time food risk event extraction, together with automatic generation and expansion functions for search terms and event information templates so that the system can respond dynamically to changes in food risk factors.

This makes it possible to extract, in real time, information on food hazards that can occur anytime and anywhere, minimizing the damage they cause, and to share the extracted information with related organizations and companies so that it can be used for preventive food safety measures.

FIG. 1 is a block diagram showing the configuration of a server for automatically extracting food risk events according to a preferred embodiment of the present invention.
FIG. 2 schematically shows the configuration of a system for implementing the method for automatically extracting food risk events according to the present invention.
FIG. 3 shows an example of an event template used in the present invention.
FIG. 4 illustrates an example in which event attribute information is extracted from a news article and a press release collected by the food risk event extraction server according to the present invention and stored in an event template.
FIG. 5 shows the concept of merging data from different sources about the same food risk event.
FIG. 6 is a flowchart illustrating the process of automatically extracting a food risk event and its attributes from news and/or SNS data according to the present invention.

The specific structural and functional descriptions of the embodiments disclosed herein are given only for the purpose of describing those embodiments. The embodiments of the invention may be practiced in various forms, and the present invention should not be construed as limited to the embodiments described herein.

The present invention can be modified in various ways and can take various forms; specific embodiments are illustrated in the drawings and described in detail in the text. It is to be understood, however, that the invention is not intended to be limited to the particular forms disclosed, but on the contrary is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms may be used for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between. Other expressions that describe the relationship between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprise" and "have" are intended to specify the presence of stated features, integers, steps, operations, elements, components, or combinations thereof, and do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be construed as having a meaning consistent with their meaning in the context of the relevant art, and are not to be construed in an idealized or overly formal sense unless expressly so defined in the present application.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows the configuration of a server 10 for automatically extracting food risk events according to a preferred embodiment of the present invention. As shown in the figure, the food risk event extraction server 10 includes a data collection unit 12, a text preprocessing unit 14, a named entity recognition unit 16, an event extraction unit 18, and an event sharing unit 24. A search term automatic generation and expansion unit 20 and an event template automatic generation and expansion unit 22 may be further included. These modules 12, 14, 16, 18, 20, 22, and 24 may be implemented as program modules that perform independent functions. That is, the method for automatically extracting food risk events according to the present invention can be implemented as a computer-readable program and carried out by executing that program on a computer.

FIG. 2 schematically shows a system 100 for implementing the method for automatically extracting food risk events according to the present invention. The system 100 includes a server computer 110. The programs of the food risk event extraction server 10 are installed on the server computer 110, which executes them. The server computer 110 may be a general-purpose computer that includes, for example, a central processing unit (CPU), a main memory, a storage device such as a hard disk or nonvolatile memory for storing programs and data, and a network communication unit for communicating with external resources via a network. It is of course also possible to install the server 10 programs on a server computer configured for other purposes and have it serve as the server computer 110 of the present invention.

The server computer 110 executes the programs of the server 10 to access, via an external network 120 such as the Internet, a source 130 that provides news and/or SNS data (including metadata), collects the news data, SNS data, and the like, and automatically extracts food risk event information from the collected data. The extracted event information may be stored in a database (DB) and provided to a related institution's computer 140 when necessary. In the present invention, "news" is a broad concept that covers not only the various news items produced by media companies around the world but also press releases prepared for and provided to the media.

To carry out this process, the food risk event extraction server 10 may include a food risk information search term table 60 for storing the set of search terms generated by the search term automatic generation and expansion unit 20. The food risk event extraction server 10 may also include a food risk event template table 70 for storing the event templates generated by the event template automatic generation and expansion unit 22. Furthermore, the food risk event extraction server 10 may include a food risk event knowledge base DB 80 for storing information on the food risk events extracted by the event extraction unit 18.

The data collection unit 12 of the food risk event extraction server 10 collects data (including metadata) from news and SNS in real time. Data can be collected by directly accessing data sources, such as APIs or web pages exposed by service providers. The data collection unit 12 may be implemented, for example, in the form of an agent robot that periodically accesses the data source to collect data. By shortening the collection period, food risk events can be recognized in real time or near real time.
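
The periodic collection described above could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `fetch` callable, the `id` and `text` fields, and `fake_source` are hypothetical stand-ins for a provider API or web page, and a real collector would issue network requests on a timer.

```python
from typing import Callable, Iterable


def collect_new_items(fetch: Callable[[], Iterable[dict]], seen_ids: set) -> list:
    """Run one polling pass and return only items not collected before."""
    fresh = []
    for item in fetch():
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item)
    return fresh


def fake_source() -> list:
    # Hypothetical stand-in for a news or SNS provider endpoint.
    return [
        {"id": "n1", "text": "Benzene detected in bottled drink"},
        {"id": "n2", "text": "Local festival opens"},
    ]


seen = set()
first_pass = collect_new_items(fake_source, seen)   # both items are new
second_pass = collect_new_items(fake_source, seen)  # nothing new on the next poll
print(len(first_pass), len(second_pass))
```

Calling `collect_new_items` at a shorter interval corresponds to shortening the collection period; the `seen_ids` set keeps repeated polls from re-collecting the same item.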

The text preprocessing unit 14 mechanically analyzes the natural language data from news and SNS and converts it into a form that a computer can process, so that food risk events can be extracted easily from the collected data. This preprocessing may include, for example, morphological analysis tasks such as removing irregular text (e.g., typos), correcting spacing errors, and properly decomposing compound nouns. It may also include part-of-speech tagging, which automatically attaches the most appropriate part of speech to each word obtained from the morphological analysis. Furthermore, it may include phrase-level analysis of sentences.
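
As a rough illustration of the preprocessing step, the sketch below normalizes spacing, tokenizes, and attaches a part-of-speech tag to each token. It is a toy under stated assumptions: `POS_LEXICON` is a hypothetical hand-made lexicon for English text, whereas the system described here would rely on a proper morphological analyzer for the source language.

```python
import re

# Hypothetical mini-lexicon; a real system would use a morphological analyzer.
POS_LEXICON = {"benzene": "NOUN", "drink": "NOUN", "detected": "VERB", "in": "ADP"}


def preprocess(text: str) -> list:
    """Normalize whitespace, tokenize, and tag each token with a part of speech."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    tokens = re.findall(r"[a-z0-9]+", normalized)
    # Tokens missing from the lexicon are tagged "UNK" (unknown).
    return [(token, POS_LEXICON.get(token, "UNK")) for token in tokens]


tagged = preprocess("  Benzene   detected in\tdrink! ")
print(tagged)
```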

The named entity recognition (NER) processing unit 16 performs the named entity recognition needed for event extraction on the preprocessed text data. Named entity recognition is the task of searching the preprocessed text for words or phrases that refer to an entity such as a person or a place, and recognizing them as such.
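
One simple way to picture named entity recognition is a dictionary (gazetteer) lookup, sketched below. The text does not say which NER technique is used, so the lookup approach, the `GAZETTEER` entries, and the label names are all assumptions made for illustration.

```python
# Hypothetical gazetteer mapping lowercase surface forms to entity types.
GAZETTEER = {"seoul": "PLACE", "busan": "PLACE", "kfda": "ORGANIZATION"}


def recognize_entities(tokens: list) -> list:
    """Return (token, entity_type) pairs for tokens found in the gazetteer."""
    return [
        (token, GAZETTEER[token.lower()])
        for token in tokens
        if token.lower() in GAZETTEER
    ]


entities = recognize_entities(["Food", "poisoning", "reported", "in", "Busan"])
print(entities)
```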

The event extraction unit 18 recognizes a food risk event and extracts its attributes, such as place, time, cause, and result. The extraction may be performed using the set of search terms stored in the food risk information search term table 60, which is generated in advance by the search term automatic generation and expansion unit 20. The attribute information of the extracted food risk event can be stored using the food risk event template held in the food risk event template table 70. This forms an information group that describes one food risk event as a whole, and this group is stored in the food risk event knowledge base DB 80. The information stored there can be served so that it can be shared with related organizations when necessary.
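
The flow from search terms to a filled template can be sketched as below. The matching rule (substring match on search terms), the date pattern, and the names `QUERY_TERMS`, `PLACES`, and `extract_event` are illustrative assumptions; the description does not specify how attributes are actually located in the text.

```python
import re

QUERY_TERMS = {"food poisoning", "benzene", "recall"}  # hypothetical search term set
PLACES = {"Seoul", "Busan"}                            # hypothetical recognized places
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_event(text: str):
    """Return a filled template dict if the text matches a search term, else None."""
    lowered = text.lower()
    matched = sorted(term for term in QUERY_TERMS if term in lowered)
    if not matched:
        return None  # not recognized as a food risk event
    date = DATE_PATTERN.search(text)
    place = next((p for p in sorted(PLACES) if p in text), None)
    return {
        "trigger_terms": matched,
        "place": place,
        "time": date.group(0) if date else None,
    }


event = extract_event("On 2015-12-28, food poisoning was reported at a school in Busan.")
non_event = extract_event("A new bakery opened in Seoul.")
print(event, non_event)
```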

The search term automatic generation and expansion unit 20 automatically expands a basic set of food-related search terms in order to automate the extraction of food risk documents and events; the expanded set of search terms lets the data collection unit capture only data related to food risk events. Words are selected for the set by statistically measuring how much each word contained in food risk documents helps to capture documents of that type, and choosing the words with higher discriminative power as search terms. For example, if the word 'benzene' appears infrequently in general documents but frequently in food risk documents, it can be judged to be highly discriminative and added to the set of search terms.
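
The statistic used to measure discriminative power is not specified, so the sketch below substitutes one plausible choice: a smoothed log ratio of document frequencies in food risk documents versus general documents. The function name, the add-one smoothing, and the threshold value are assumptions.

```python
import math


def discriminative_terms(food_docs: list, general_docs: list, threshold: float) -> dict:
    """Score words by a smoothed log ratio of document frequencies and keep high scorers."""

    def doc_freq(docs):
        counts = {}
        for doc in docs:
            for word in set(doc.lower().split()):
                counts[word] = counts.get(word, 0) + 1
        return counts

    food_df, general_df = doc_freq(food_docs), doc_freq(general_docs)
    selected = {}
    for word, df in food_df.items():
        # Add-one smoothing keeps the ratio finite for words unseen in general documents.
        p_food = (df + 1) / (len(food_docs) + 2)
        p_general = (general_df.get(word, 0) + 1) / (len(general_docs) + 2)
        score = math.log(p_food / p_general)
        if score > threshold:
            selected[word] = score
    return selected


food_docs = ["benzene found in drink", "benzene recall ordered", "drink recall after benzene test"]
general_docs = ["new drink launched", "city council meets", "stock market rises", "weather turns cold"]
terms = discriminative_terms(food_docs, general_docs, threshold=1.0)
print(sorted(terms))
```

In this toy corpus 'benzene' and 'recall' clear the threshold, while 'drink', which also appears in a general document, does not.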

The event template automatic generation and expansion unit 22 automatically generates and expands the event template that stores the attribute information of food risk events. The food risk event template basically has an attribute management structure that includes the place, the time, the target food, the cause, the scale of the damage, the action taken, and so on. However, the structure of the event template need not be fixed. When a new item that is not included in the existing event template appears frequently in information about food risk events, the event template is automatically expanded by adding that item to it. In other words, even if an item was not included in the original template, it can be added automatically when the system itself determines that it is needed.
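
A frequency-based expansion rule of this kind might look like the sketch below. The field names, the `min_count` threshold, and the representation of a template as a list of field names are assumptions for illustration only.

```python
from collections import Counter


def expand_template(template_fields: list, extracted_records: list, min_count: int) -> list:
    """Append fields that are absent from the template but appear in at least min_count records."""
    counts = Counter(
        field
        for record in extracted_records
        for field in record
        if field not in template_fields
    )
    new_fields = sorted(field for field, count in counts.items() if count >= min_count)
    return list(template_fields) + new_fields


base = ["place", "time", "target_food", "cause", "damage_scale", "action"]
records = [
    {"place": "Busan", "diffusion_rate": "fast"},
    {"cause": "benzene", "diffusion_rate": "slow"},
    {"time": "2015-12-28", "brand": "X"},
]
expanded = expand_template(base, records, min_count=2)
print(expanded)
```

Here `diffusion_rate` is added because it recurs, while `brand`, seen only once, is left out.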

The search term automatic generation and expansion unit 20 and the event template automatic generation and expansion unit 22 do not remain fixed to the information extraction method in place when the system was built. Instead, they feed back the relevance analysis results of the food safety data, which changes over time, and continually perform automatic expansion so that new food risk events can be extracted.

The event sharing unit 24 provides the extracted food risk events and their detailed attributes (food poisoning, avian flu, mad cow disease, etc.) to related government agencies, industry, academia, and the like.

FIG. 3 shows an example of an event template used in the present invention. The illustrated event template 30 may include a plurality of attributes for a food risk event. Examples of the attributes of the event template 30 include classification, place, time, object, food, content, cause, effect, diffusion rate, action, and information source. Certain attributes (e.g., classification) may include multiple sub-attributes (e.g., major, minor). The event attribute fields of the event template 30 that are needed to specify and describe a food risk event are expanded automatically by the event template automatic generation and expansion unit 22. The event template 30 of FIG. 3 is presented merely as an example; other event templates with different kinds of attributes and structures may of course be used. Each attribute of the event template 30 produces an individual instance from the detailed information (time, place, cause, result, and the like) of a food risk event automatically extracted by the event extraction unit 18.

FIG. 4 schematically shows an example in which event attribute information is extracted from a news article 40-1 and a press release 40-2 collected by the food risk event extraction server 10 according to the present invention and stored in the event template. In the example of FIG. 4, a food risk event is detected in the collected news items (40-1, 40-2), and the attribute information related to it is extracted automatically.

It is also possible to determine whether a plurality of collected news data and/or SNS data items relate to the same event and, if they do, to merge the event attribute information extracted from them and store it in the same event template. A plurality of news items may contain mutually complementary information about the same event even though the information was generated at different times and by different sources. In such cases, the event co-reference resolution technique can be used to merge the pieces of information automatically and obtain a more complete picture of the event. FIG. 5 shows this graphically. The event extraction unit 18 can capture a specific event from the collected news and extract, for example, information on the place and time attributes of the event. The event extraction unit 18 can also capture the same event from an information source other than the news, for example, SNS information. In this case, information on the number of victims, the actions taken, and the like can be extracted from the SNS information and merged with the event attribute information extracted earlier. As a result, information such as the place, time, number of victims, and actions taken is secured for that event. In this way, the event co-reference recognition technology makes it possible to extract and provide more detailed information about a food risk event: when a plurality of pieces of food hazard information 40-1 and 40-2 contain mutually complementary information about the same event, they are merged automatically even though their generation times and sources differ.
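
The merging step can be sketched as follows. The same-event test used here (identical food and place values) is only an assumed heuristic standing in for the event co-reference resolution technique, which is not detailed in the text; the field names and sample records are likewise hypothetical.

```python
def same_event(a: dict, b: dict) -> bool:
    """Assumed heuristic: two records describe one event if food and place both match."""
    return all(a.get(key) is not None and a.get(key) == b.get(key) for key in ("food", "place"))


def merge_events(records: list) -> list:
    """Merge records about the same event, keeping the first value seen for each attribute."""
    merged = []
    for record in records:
        for existing in merged:
            if same_event(existing, record):
                for key, value in record.items():
                    existing.setdefault(key, value)  # only fill attributes not yet known
                break
        else:
            merged.append(dict(record))  # no match: start a new event
    return merged


news = {"food": "bottled drink", "place": "Busan", "time": "2015-12-28"}
sns = {"food": "bottled drink", "place": "Busan", "victims": 12, "action": "recall"}
other = {"food": "kimchi", "place": "Seoul", "cause": "norovirus"}
events = merge_events([news, sns, other])
print(events)
```

The news record contributes the time and the SNS record contributes the number of victims and the action, giving one fuller record for the Busan event and a separate record for the unrelated one.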

The event template may be expanded in parallel with the merging of event attribute information. In FIG. 3, for example, the initial template is formed with classification, place, time, object, food, influence, and information source as attributes, and the corresponding attribute information can be extracted. If information that can be extracted from the second piece of food hazard information 40-2 has attributes that are not in the initial template, fields corresponding to those attributes may be added to the initial template. In FIG. 3, for example, these correspond to the added attributes of classification (major classification, middle classification), contents, cause, action, and the like. The template is expanded in this way. Using the expanded template, attribute information about the same event extracted from a plurality of news or SNS items can then be merged with the event co-reference recognition technology described above.

Next, FIG. 6 shows the process of automatically extracting a food risk event and its attributes from the news and/or SNS data 130 according to the present invention.

Referring to FIG. 6, the data collection unit 12 of the food risk event extraction server 10 running on the server computer 110 accesses sources that provide news and SNS data (including metadata) in order to secure food risk event information, and collects such data (step S12). The data can be collected via the Internet 120. To quickly capture and share food hazard information that may arise anytime and anywhere, it is desirable to collect the information in real time or near real time (hereinafter referred to collectively as "real time"). Information about the data sources that the server computer 110 will access for data collection, such as an API or web page made available by a news or SNS service provider, needs to be secured in advance. Based on this access information, the data collection unit 12 can directly access the service provider's API or web page and automatically collect news data or SNS data there.

To automatically extract food-related events from the news and SNS data collected in real time in this manner, the search term generation and automatic expansion unit 20 and the event template automatic generation and expansion unit 22 need to make preparations for the automatic extraction in advance.

The search term generation and automatic expansion unit 20 automatically creates and expands a basic set of food-related search terms, which is used to find documents containing information related to food risk and to automatically extract food-related events from the contents of those documents (step S24). The created or expanded search terms are registered in the food risk information query word table 60. The data collection unit 12 and the event extracting unit 18 perform data collection and event extraction using the set of search terms registered in the food risk information query word table 60.
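As a rough illustration of how a set of search terms could be chosen by discriminative power, the sketch below scores each word by a smoothed log ratio of its document frequency in food risk documents versus other documents. The source does not name the statistic, so this measure, the threshold, and the toy documents are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def document_frequency(docs: Iterable[Set[str]]) -> Counter:
    """Count in how many documents each word appears."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def select_search_terms(risk_docs: List[Set[str]], other_docs: List[Set[str]], threshold: float) -> List[str]:
    """Keep words whose smoothed log frequency ratio exceeds the threshold."""
    risk_df = document_frequency(risk_docs)
    other_df = document_frequency(other_docs)
    selected = []
    for term, count in risk_df.items():
        # Add-one smoothing avoids division by zero for words unseen in other_docs.
        p_risk = (count + 1) / (len(risk_docs) + 2)
        p_other = (other_df[term] + 1) / (len(other_docs) + 2)
        if math.log(p_risk / p_other) > threshold:
            selected.append(term)
    return sorted(selected)


# Toy tokenized documents, invented for the example.
risk_docs = [{"food", "poisoning", "school"}, {"food", "poisoning", "recall"}, {"poisoning", "symptoms"}]
other_docs = [{"food", "festival"}, {"school", "festival"}, {"food", "recipe"}]
terms = select_search_terms(risk_docs, other_docs, threshold=1.0)
```

With these toy documents only "poisoning" passes the threshold: "food" appears equally often in both groups, so it has no discriminative power even though it is frequent.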

The event template automatic generation and expansion unit 22 automatically generates and expands an event template that stores attribute information of a food risk event (step S26). The event template created by the event template automatic generation and expansion unit 22 is stored in the event template table 70 so that the event extracting unit 18 can use it for event extraction.
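A frequency-based expansion of the template can be sketched as below: an attribute that is absent from the current template is added once it appears in a large enough share of recent extractions. The `min_ratio` parameter and the sample data are hypothetical; the source only says that frequently occurring new items are added.

```python
from collections import Counter
from typing import Dict, Iterable, List


def expand_template(fields: List[str], extractions: Iterable[Dict[str, str]], min_ratio: float) -> List[str]:
    """Append attributes that are missing from the template but occur frequently."""
    extractions = list(extractions)
    # Count, per attribute name, how many extractions contain it (new names only).
    counts = Counter(name for attrs in extractions for name in attrs if name not in fields)
    expanded = list(fields)
    for name, count in counts.most_common():
        if extractions and count / len(extractions) >= min_ratio:
            expanded.append(name)
    return expanded


# Invented extraction results for three documents.
recent = [
    {"place": "Seoul", "cause": "unknown"},
    {"cause": "contaminated water", "action": "recall"},
    {"time": "2005.03.31", "cause": "unknown"},
]
template = expand_template(["place", "time", "food"], recent, min_ratio=0.5)
```

In this example `cause` is added because it occurs in every extraction, while `action` occurs in only one of three and is left out.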

The search term generation and automatic expansion unit 20 and the event template automatic generation and expansion unit 22 not only generate the search terms and templates for information extraction when the system is built, but also automatically feed back the results of association analysis of changing food safety data, so that the set of search terms and the templates are automatically extended to extract new food-related events.

The news and SNS data collected by the data collection unit 12 are provided to the text preprocessing unit 14. The text preprocessing unit 14 preprocesses the natural language data of the news and SNS, for example by morphological analysis, in order to facilitate the extraction of food-related events (step S14). Preprocessing such as morphological analysis of natural language text is already well known, so a detailed description is omitted here. In the present invention, a known preprocessing technique may be used.

The preprocessed data is provided to the entity name recognition processing unit 16, which performs the entity name recognition (named entity recognition) needed for event extraction (step S16). Entity name recognition is the task of recognizing whether the preprocessed text contains a word or phrase that refers to an entity such as a person or a place. In other words, it is the task of locating the elements contained in the preprocessed text and classifying them into predefined categories, for example, person name, organization, location, time expression, quantity, monetary value, and percentage. Take the news item (40-1) illustrated in FIG. 4: "2005.03.31. YTN News Seoul National University Girls' High School students showed massive food poisoning symptoms, and the health authorities went into epidemiological investigation." In this text, "2005.03.31." can be classified as "time", "Seoul" as "location", and "YTN", "female high school", and "health authority" as "organization". Entity name recognition is also a known technique, and the entity name recognition processing unit 16 of the present invention can be built on such known technology.
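The classification in this example can be mimicked with a toy recognizer that combines a date pattern with a small lookup table. This is only a sketch: the gazetteer entries, the labels, and the paraphrased sample sentence are assumptions, and a real system would rely on a trained entity recognizer rather than exact string matching.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer mapping known phrases to entity categories.
GAZETTEER = {"Seoul": "location", "YTN": "organization", "health authorities": "organization"}
# Dates written as YYYY.MM.DD, as in the example news item.
DATE_PATTERN = re.compile(r"\b\d{4}\.\d{2}\.\d{2}\b")


def recognize_entities(text: str) -> List[Tuple[str, str]]:
    """Return (phrase, category) pairs found by the date pattern and the gazetteer."""
    entities = [(match.group(), "time") for match in DATE_PATTERN.finditer(text)]
    for phrase, label in GAZETTEER.items():
        if phrase in text:
            entities.append((phrase, label))
    return entities


text = (
    "2005.03.31. YTN News: students at a girls' high school in Seoul showed "
    "food poisoning symptoms, and the health authorities began an epidemiological investigation."
)
entities = recognize_entities(text)
```

The resulting pairs label the date as a time expression, "Seoul" as a location, and "YTN" and "health authorities" as organizations, matching the categories given in the text.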

The data on which entity name recognition has been performed is provided to the event extracting unit 18. The event extracting unit 18 recognizes a food risk event and extracts its attribute information, such as place, time, cause, and result. The recognition of the food risk event and the extraction of its attribute information can be performed based on the search terms provided by the search term generation and automatic expansion unit 20. The extracted attribute information is stored using the food-related event template (step S18).
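Step S18 can be pictured with the following sketch, which treats a document as a food risk event when it contains any registered search term and then fills template fields from the recognized entities. The label-to-field mapping, the `matched_terms` field, and the sample inputs are hypothetical choices for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mapping from entity categories to template fields.
LABEL_TO_FIELD = {"time": "time", "location": "place"}


def extract_event(
    text: str,
    entities: List[Tuple[str, str]],
    search_terms: Set[str],
    template_fields: List[str],
) -> Optional[Dict[str, Optional[str]]]:
    """Return a filled template if the text matches a search term, else None."""
    matched = sorted(term for term in search_terms if term in text)
    if not matched:
        return None  # no food risk event recognized
    event: Dict[str, Optional[str]] = {field: None for field in template_fields}
    for value, label in entities:
        field = LABEL_TO_FIELD.get(label)
        # Keep the first value found for each field that exists in the template.
        if field in event and event[field] is None:
            event[field] = value
    event["matched_terms"] = ", ".join(matched)
    return event


fields = ["place", "time", "cause", "result"]
terms = {"food poisoning", "recall"}
event = extract_event(
    "Students in Seoul showed food poisoning symptoms on 2005.03.31.",
    [("2005.03.31", "time"), ("Seoul", "location")],
    terms,
    fields,
)
no_event = extract_event("A food festival opened.", [], terms, fields)
```

Fields that cannot be filled from a single document, such as `cause` here, stay empty and can be completed later when attributes from other documents about the same event are merged.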

The event extracting unit 18 stores the generated food risk event template, together with the event information and the attributes it contains, in the food risk event knowledge base 80 (step S20).

The event sharing unit 15 provides the food risk event and its detailed attribute information from the food risk event knowledge base 80 to related organizations such as food-related government agencies, industry, and academia (step S22). This allows the relevant authorities to share information on food hazard events very quickly.

FIG. 6 shows an example of an operation process in which the food risk event extraction server 10 automatically extracts a food risk event and its attributes from news and SNS data. The steps of automatically generating and expanding the search terms and the event template (steps S24 and S26) and the steps from data collection through event information extraction to event sharing (steps S12 to S22) are not necessarily performed sequentially. Each step may be performed independently, and the order of execution may be arbitrary.

The present invention can be utilized to take measures to prevent food safety accidents in advance.

10: Event detection server for food
12: Data collection unit
14: Text preprocessing unit
16: Entity name recognition processing unit
18: Event extracting unit
20: Event sharing unit
22: Search term automatic generation and expansion unit
24: Event template automatic generation and expansion unit
30: Food event template
40: Food hazard information
40-1: News
40-2: Press release
60: Food risk information query word table
70: Event template table for food
80: Event knowledge base for food
100: Automatic event extraction system for food
110: Server computer
120: Network
130: News and SNS data
140: Related institution computer

Claims (12)

1. A method implemented by a computer-readable program and executed by a computer device, the method comprising:
Accessing an information source through a network and collecting news data and/or SNS data;
Performing preprocessing to convert the collected text data into a form that a computer can process;
Performing an entity name recognition process that searches for and recognizes whether the preprocessed text contains a word or phrase referring to an entity such as a person or a place;
Recognizing a food risk event, based on a set of search terms, from the data that has undergone the preprocessing and the entity name recognition process, and extracting attributes of the food risk event; and
Storing the extracted attributes using an event template for food.

2. The method according to claim 1, further comprising providing the event template to a computer of a related organization to share it.

3. The method according to claim 1, wherein the preprocessing includes morphological analysis of the collected data and part-of-speech tagging that attaches an appropriate part of speech to each word obtained through the morphological analysis.

4. The method according to claim 1, further comprising: statistically measuring words that help to capture food risk documents and selecting words whose discriminative power exceeds a predetermined criterion as search terms to form the set of search terms; and automatically extending the set of search terms by feeding back the result of association analysis of food safety data.

5. The method according to claim 1, further comprising: automatically generating the event template that stores attribute information of the food risk event; and, when a new item is found that frequently appears in information on food risk events but is not included in the existing event template, automatically expanding the event template by adding the new item to it.

6. The method according to claim 1, further comprising: determining, using an event recognition technique, whether a plurality of collected news data and/or SNS data items relate to the same event; and merging the event attribute information extracted from the plurality of news data and/or SNS data items and storing it in the same event template.

7. The method according to claim 1, wherein the food risk event template includes an attribute management structure for a food risk event, with attributes such as place, time, target food, cause, and damage scale.

8. A system comprising:
A data collection unit for accessing an information source through a network and collecting news data and/or SNS data in real time or near real time;
A text preprocessing unit for converting the collected text data into a form that a computer can process;
An entity name recognition processing unit for searching for and recognizing whether the preprocessed text contains a word or phrase referring to an entity such as a person or a place;
An event extracting unit for recognizing a food risk event, based on a set of search terms, from the data that has undergone the preprocessing and the entity name recognition process, extracting attributes of the food risk event, and storing the extracted attributes using an event template for food;
A food risk information query word table for storing a set of search terms formed by statistically measuring words that help to capture food risk documents and selecting words whose discriminative power exceeds a predetermined criterion;
An event template table for food, for storing the event template that holds attribute information of a food risk event; and
A food risk event knowledge database for storing the food risk event template, including the attributes, produced by the event extracting unit.

9. The system according to claim 8, further comprising a search term automatic generation and expansion unit that statistically measures words that help to capture food risk documents, selects words whose discriminative power exceeds a predetermined criterion as search terms to form the set of search terms, and continuously and automatically extends the set of search terms by feeding back the result of association analysis of food safety data.

10. The system according to claim 8, further comprising an event template automatic generation and expansion unit that automatically generates the event template storing attribute information of the food risk event and, when a new item is found that frequently appears in information on food risk events but is not included in the existing event template, automatically expands the event template by adding the new item to it.

11. The system according to claim 8, further comprising an event sharing unit that provides the event template stored in the food risk event knowledge database to a computer of a related organization to share it.

12. The system according to claim 8, wherein the event extracting unit determines, using an event recognition technique, whether a plurality of collected news data and/or SNS data items relate to the same event and, if so, merges the event attribute information extracted from the plurality of news data and/or SNS data items and stores it in the same event template.

KR1020150187245A 2015-12-28 2015-12-28 Method of automatically extracting food safety event in real time from news and social networking service data KR101780377B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150187245A KR101780377B1 (en) 2015-12-28 2015-12-28 Method of automatically extracting food safety event in real time from news and social networking service data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150187245A KR101780377B1 (en) 2015-12-28 2015-12-28 Method of automatically extracting food safety event in real time from news and social networking service data

Publications (2)

Publication Number Publication Date
KR20170077397A true KR20170077397A (en) 2017-07-06
KR101780377B1 KR101780377B1 (en) 2017-09-21

Family

ID=59354168

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150187245A KR101780377B1 (en) 2015-12-28 2015-12-28 Method of automatically extracting food safety event in real time from news and social networking service data

Country Status (1)

Country Link
KR (1) KR101780377B1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190124403A (en) * 2018-04-26 2019-11-05 대한민국(행정안전부 국립재난안전연구원장) System And Method For Extracting Attribute Data of Disaster
CN112052910A (en) * 2020-09-21 2020-12-08 深圳海关动植物检验检疫技术中心 Food safety classification method and device, computer equipment and storage medium
KR102276761B1 (en) * 2020-08-28 2021-07-13 대한민국 How to automatically extract information on the cause of disaster
CN113723925A (en) * 2021-08-31 2021-11-30 平安养老保险股份有限公司 User data merging method and device, computer equipment and storage medium
KR20230050673A (en) * 2021-10-08 2023-04-17 주식회사 리니토 Twofold semi-automatic symbolic propagation method of training data for natural language understanding model, and device therefor
KR102695536B1 (en) * 2023-04-19 2024-08-14 중앙대학교 산학협력단 Irregular/bad food monitoring device and method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190124403A (en) * 2018-04-26 2019-11-05 대한민국(행정안전부 국립재난안전연구원장) System And Method For Extracting Attribute Data of Disaster
KR102276761B1 (en) * 2020-08-28 2021-07-13 대한민국 How to automatically extract information on the cause of disaster
CN112052910A (en) * 2020-09-21 2020-12-08 深圳海关动植物检验检疫技术中心 Food safety classification method and device, computer equipment and storage medium
CN112052910B (en) * 2020-09-21 2024-05-10 深圳海关动植物检验检疫技术中心 Food safety classification method, device, computer equipment and storage medium
CN113723925A (en) * 2021-08-31 2021-11-30 平安养老保险股份有限公司 User data merging method and device, computer equipment and storage medium
KR20230050673A (en) * 2021-10-08 2023-04-17 주식회사 리니토 Twofold semi-automatic symbolic propagation method of training data for natural language understanding model, and device therefor
KR102695536B1 (en) * 2023-04-19 2024-08-14 중앙대학교 산학협력단 Irregular/bad food monitoring device and method

Also Published As

Publication number Publication date
KR101780377B1 (en) 2017-09-21

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right