WO2018036239A1

WO2018036239A1 - Method, apparatus and system for monitoring internet media events based on industry knowledge mapping database

Info

Publication number: WO2018036239A1
Application number: PCT/CN2017/087000
Authority: WO
Inventors: 何超; 梁颖琪; 车慧诗
Original assignee: 慧科讯业有限公司
Priority date: 2016-08-24
Filing date: 2017-06-02
Publication date: 2018-03-01
Also published as: TWI664539B; CN107783973A; TW201807602A; CN107783973B

Abstract

A method for constructing an industry knowledge mapping database, comprising the following steps: acquiring industry data from a data source (S51); performing data processing on the industry data to extract an entity related to the industry and a corresponding entity attribute and/or an entity relationship (S52); and constructing the industry knowledge mapping database based on the extracted entity, entity attribute and/or entity relationship (S53). A method for monitoring a specific media event related to an industry based on the constructed industry knowledge mapping database, comprising the following steps: acquiring Internet media data (S31); performing event detection, event evaluation and screening based on the acquired Internet media data to acquire the specific media event related to an industry (S32); recognizing a directly related entity corresponding to the specific media event (S33); based on the directly related entity, accessing the industry knowledge mapping database, to determine an indirectly related entity corresponding to the specific media event (S34); and sending an early warning message to the directly related entity and/or indirectly related entity (S35).

Description

Method, device and system for monitoring internet media events based on industry knowledge map database

Technical field

The invention relates to the field of internet media monitoring, in particular to a technology for constructing an industry knowledge map database and a technology for monitoring internet media events based on the constructed industry knowledge map database.

Background art

The rapid development of computers, communications, and network technologies has led to an increase in the performance of terminal devices including PCs, tablets, smartphones, and Internet TVs. Correspondingly, Internet media, especially Internet social media, has gradually become one of the main ways for the public to obtain news information by virtue of its diversity, speed, interactivity, reproducibility and multimedia.

However, while Internet media information is timely and flexibly accessible, the openness of its information sources and modes of communication also leads to the following problem: some sensitive or unauthorized messages (e.g., trade secrets) and even false news are rapidly spread by large numbers of users on Internet media platforms, and thus evolve into media events that adversely affect the related individuals, businesses/institutions, industries, and society. Therefore, it is necessary to monitor media events in Internet media and, once media events that meet certain conditions are detected, to take corresponding measures to reduce or eliminate their potential impact.

Existing Internet media monitoring technology has the following defects: 1) it uses interest matching to provide users with Internet media monitoring, so users need to customize the content topics of interest, the related entities, and so on; as a result, monitoring can identify only events directly related to user-defined entities, and cannot recognize events that are not defined by the user but are indirectly related to entities of interest to the user; 2) the attributes of the monitored object are limited, so monitoring can be provided only for a single media category and data source (for example, a specific social media platform, news media outlet, forum, or blog), a single data type (generally text), and a single language.

Summary of the invention

An object of the present invention is to provide a technology for constructing an industry knowledge map database, which extracts relevant data for a specific industry or field and stores it in a knowledge map database. The constructed industry knowledge map database can be applied to Internet media monitoring to achieve automated, in-depth monitoring of relevant Internet media events.

Another object of the present invention is to provide a technique for monitoring Internet media events based on the constructed industry knowledge map database, which can identify indirectly related entities corresponding to specific media events and can monitor multiple types of Internet media data.

In order to achieve the above object, the specific technical solution provided by the present invention is as follows.

The present invention provides a method for constructing an industry knowledge map database, comprising the steps of: obtaining industry data from a data source; performing data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships; and constructing the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the step of acquiring industry data is implemented by acquiring structured industry data from a third-party industry database, the structured industry data comprising a plurality of fields; the step of performing data processing on the industry data is implemented by performing data cleaning and extract-transform-load (ETL) processing on the structured industry data; and the step of constructing the industry knowledge map database is implemented by generating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the step of obtaining industry data is implemented by using web crawler technology to obtain industry-related data from an Internet data source, the Internet data source comprising an unstructured or semi-structured data source; the step of performing data processing on the industry data is implemented by using an information extraction technique in natural language processing to perform entity recognition and relationship extraction on the industry-related data, so as to extract the entities, entity attributes, and/or entity relationships; and the step of constructing the industry knowledge map database is implemented by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships. Further preferably, the above steps are performed periodically at a predetermined cycle.

Preferably, the step of obtaining industry data is implemented by using an application program interface (API) to obtain industry-related data from an Internet data source by query, the Internet data source including an open data source; the step of performing data processing on the industry data is implemented by performing data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting the entities related to the industry and the corresponding entity attributes and/or entity relationships; and the step of constructing the industry knowledge map database is implemented by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships. Further preferably, the above steps are performed periodically at a predetermined cycle.

Preferably, the step of acquiring industry data is implemented by acquiring industry-related Internet media data from an Internet data source by using an application program interface (API) or web crawler technology; the step of performing data processing on the industry data is implemented by performing event detection, event evaluation, and screening on the Internet media data to extract specific media events related to the industry, and identifying the corresponding directly related entities from the Internet media data; and the step of constructing the industry knowledge map database is implemented by supplementing the industry knowledge map database based on the specific media event and the corresponding directly related entities, wherein the specific media event is added to the industry knowledge map database as an abstract entity. Further preferably, in the step of performing data processing on the industry data, the directly related entity corresponding to the specific media event is identified by at least one of: identifying an entity from text data based on entity recognition in natural language processing; identifying an entity from image or video data based on image or video recognition processing; or identifying an entity from audio or video data based on speech recognition processing. Further preferably, the specific media event comprises a negative event, an emergency, a crisis event, a mass event, a public opinion event, or another event of industry significance. Further preferably, the above steps are performed continuously in real time.

Preferably, the step of constructing an industry knowledge map database comprises performing semantic disambiguation and entity linking on the extracted entities. Further preferably, the step of performing semantic disambiguation and entity linking on the extracted entities is implemented by at least one of the following methods: performing semantic disambiguation and entity linking on each extracted entity mention one by one, based on entity knowledge; or, based on the topic consistency hypothesis, using the associations among candidate entities in the knowledge base to perform semantic disambiguation and entity linking on the extracted entities collectively.

The invention also provides a method for monitoring specific media events related to an industry based on the industry knowledge map database constructed in the invention, comprising the steps of: acquiring Internet media data; performing event detection, event evaluation, and screening based on the acquired Internet media data to obtain the specific media event related to the industry; identifying a directly related entity corresponding to the specific media event; accessing the industry knowledge map database based on the directly related entity to determine an indirectly related entity corresponding to the specific media event; and sending an alert message to the directly related entity and/or the indirectly related entity.

Preferably, the event detection in the step of performing event detection, event evaluation, and screening comprises the steps of: classifying the content in the acquired Internet media data to obtain content for a specific topic; identifying the entities involved from the obtained content; performing sentiment analysis on the obtained content and the identified entities, and filtering the obtained content based on the results of the sentiment analysis; and performing event discovery based on the filtered content, so as to cluster media events and discover new media events. Further preferably, the event detection further comprises the steps of: analyzing the authenticity of the event based on the attributes of the media event, and sorting and/or filtering the media events according to the analysis result.

Preferably, in the step of identifying the directly related entity corresponding to the specific media event, the directly related entity is identified by at least one of: identifying an entity from text data based on entity recognition in natural language processing; identifying an entity from image or video data based on image or video recognition processing; or identifying an entity from audio or video data based on speech recognition processing.

Preferably, the step of accessing the industry knowledge map database is implemented by querying in the industry knowledge map database to determine the indirectly related entity based on the directly related entity.
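One possible reading of such a query is a bounded traversal of the knowledge map starting from the directly related entities. The following is a minimal sketch under that assumption; the adjacency data, the function name `indirectly_related`, and the `max_hops` limit are hypothetical and are not specified by the invention.

```python
from collections import deque

# Hypothetical undirected adjacency view of a small knowledge map.
GRAPH = {
    "Company A": {"Person X", "Company B"},
    "Person X": {"Company A"},
    "Company B": {"Company A", "Company C"},
    "Company C": {"Company B"},
}


def indirectly_related(graph, direct, max_hops=2):
    """Breadth-first search from the directly related entities.

    Returns every entity reachable within max_hops edges that is not
    itself one of the directly related entities. max_hops is an assumed
    tuning parameter.
    """
    direct = set(direct)
    seen = set(direct)
    queue = deque((entity, 0) for entity in direct)
    result = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                queue.append((neighbor, hops + 1))
    return result


related = indirectly_related(GRAPH, ["Person X"])
```

With a graph database such as those named in the description, the same traversal would more likely be expressed in the database's own query language; the plain-Python form is used here only to keep the sketch self-contained.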

Preferably, the step of accessing the industry knowledge map database is implemented by using data mining techniques in the industry knowledge map database to determine the indirectly related entities based on the directly related entities.

The present invention also provides an apparatus for constructing an industry knowledge map database, comprising: a data acquisition module for acquiring industry data from a data source; a data processing module for performing data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships; and a database building module for constructing the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition module obtains industry data by obtaining structured industry data from a third-party industry database, the structured industry data including a plurality of fields; the data processing module performs data processing by performing data cleaning and extract-transform-load (ETL) processing on the structured industry data before extracting the entities related to the industry and the corresponding entity attributes and/or entity relationships; and the database building module constructs the industry knowledge map database by generating it based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition module obtains industry data by using web crawler technology to obtain industry-related data from an Internet data source, the Internet data source comprising an unstructured or semi-structured data source; the data processing module performs data processing by using an information extraction technique in natural language processing to perform entity recognition and relationship extraction on the industry-related data, so as to extract the entities, entity attributes, and/or entity relationships; and the database building module constructs the industry knowledge map database by supplementing or updating it based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition module obtains industry data by using an application program interface (API) to obtain industry-related data from an Internet data source by query, the Internet data source including an open data source; the data processing module performs data processing by performing data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting the entities related to the industry and the corresponding entity attributes and/or entity relationships; and the database building module constructs the industry knowledge map database by supplementing or updating it based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition module acquires industry data by acquiring industry-related Internet media data from an Internet data source by using an application program interface (API) or web crawler technology; the data processing module performs data processing by performing event detection, event evaluation, and screening on the Internet media data to extract specific media events related to the industry, and identifying the corresponding directly related entities from the Internet media data; and the database building module constructs the industry knowledge map database by supplementing it based on the specific media event and the corresponding directly related entities, wherein the specific media event is added to the industry knowledge map database as an abstract entity.

Advantageously, said database building module further identifies a directly related entity corresponding to said specific media event by at least one of: identifying an entity from text data based on entity recognition in natural language processing; identifying an entity from image or video data based on image or video recognition processing; or identifying an entity from audio or video data based on speech recognition processing.

Preferably, the database construction module comprises: a module for semantic disambiguation and entity linking of the extracted entities. Further preferably, the module for performing semantic disambiguation and entity linking on the extracted entity further performs semantic disambiguation and entity linking by at least one of: based on entity knowledge, for each extracted entity Semantic disambiguation and entity linking are performed independently, and semantic disambiguation and entity linking are performed consistently on the extracted entities by using the association of candidate entities in the knowledge base based on the topic consistency hypothesis.

Preferably, the specific media event comprises a negative event, an emergency, a crisis event, a mass event, a public opinion event or other event of industry significance.

The present invention also provides a system for monitoring specific media events related to an industry, comprising: a data acquisition unit for obtaining industry data from a data source; a data processing unit for performing data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships; a database building unit for building the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships; a database storage unit for storing the built industry knowledge map database; a media event monitoring unit for acquiring Internet media data, performing event detection, event evaluation, and screening based on the acquired Internet media data to obtain the specific media event related to the industry, and identifying a directly related entity corresponding to the specific media event; a database access unit for accessing the industry knowledge map database based on the directly related entity to determine an indirectly related entity corresponding to the specific media event; and a message sending unit for sending an alert message to the directly related entity and/or the indirectly related entity.

Preferably, the data acquisition unit comprises: a structured data acquisition unit, configured to obtain structured industry data from a third-party industry database, the structured industry data comprising a plurality of fields; the data processing unit comprises: a structured data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the structured industry data before extracting the entities related to the industry and the corresponding entity attributes and/or entity relationships; and the database building unit comprises: a database generating unit, configured to generate the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition unit comprises: an industry-related data acquisition unit, configured to obtain industry-related data from an Internet data source by using web crawler technology, the Internet data source including an unstructured or semi-structured data source; the data processing unit comprises: an industry-related data processing unit, configured to perform entity recognition and relationship extraction on the industry-related data by using an information extraction technique in natural language processing, so as to extract the entities, entity attributes, and/or entity relationships; and the database building unit comprises: a database supplement/update unit, configured to supplement or update the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition unit comprises: an industry-related data acquisition unit, configured to acquire industry-related data from an Internet data source by using an application program interface (API), the Internet data source including an open data source; the data processing unit comprises: an industry-related data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting the entities related to the industry and the corresponding entity attributes and/or entity relationships; and the database building unit comprises: a database supplement/update unit, configured to supplement or update the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

Preferably, the data acquisition unit comprises: a media data acquisition unit, configured to acquire industry-related Internet media data from an Internet data source by using an application program interface (API) or web crawler technology; the data processing unit comprises: a media data processing unit, configured to perform event detection, event evaluation, and screening on the Internet media data to extract a specific media event related to the industry, and to identify the corresponding directly related entity from the Internet media data; and the database building unit comprises: a database supplement/update unit, configured to supplement the industry knowledge map database based on the specific media event and the corresponding directly related entity, wherein the specific media event is added to the industry knowledge map database as an abstract entity.

Preferably, the database supplementing/updating unit is further configured to perform semantic disambiguation and entity linking on the extracted entity.

Preferably, the media event monitoring unit is further configured to: perform topic classification on the content in the acquired Internet media data to obtain content for a specific topic; identify the entities involved from the obtained content; perform sentiment analysis on the obtained content and the identified entities, and filter the obtained content based on the results of the sentiment analysis; and perform event discovery based on the filtered content, so as to cluster media events and discover new media events. Further preferably, the media event monitoring unit is further configured to: analyze the authenticity of the event based on the attributes of the media event, and sort and/or filter the media events according to the analysis result.

Preferably, the database access unit is further configured to query the industry knowledge map database to determine the indirectly related entity based on the directly related entity.

Preferably, the database access unit is further configured to: use the data mining technology to determine the indirectly related entity in the industry knowledge map database based on the directly related entity.

By implementing the technical solution provided by the present invention, the following technical effects can be obtained: 1) automated, in-depth monitoring of related Internet media events for one or more target fields or industries, with the ability to identify entities indirectly related to specific media events; 2) automated processing, during monitoring, of Internet media data from multiple data sources, of multiple data types, and in multiple languages.

DRAWINGS

FIG. 1 is an exemplary flowchart of a method for constructing an industry knowledge map database provided by the present invention;

FIG. 2 is an example of structured industry data provided by the present invention;

FIG. 3 is an exemplary flowchart of a method for monitoring media events provided by the present invention;

FIG. 4 is an exemplary flowchart of another method for constructing an industry knowledge map database provided by the present invention;

FIG. 5 is an exemplary flowchart of another method for constructing an industry knowledge map database provided by the present invention;

FIG. 6 is an exemplary block diagram of a system for monitoring media events provided by the present invention.

Detailed description

Specific embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments are merely exemplary, and the concept of the present invention can be implemented without these specific details.

The present invention provides a technique for constructing an industry knowledge map database and a technique for monitoring Internet media events based on the constructed industry knowledge map database to achieve the objectives of the present invention.

The invention relates to the application of knowledge map (knowledge graph) database technology. A knowledge map database is a special database for knowledge management that facilitates the collection, collation, and extraction of knowledge in related fields. Entities, entity attributes, and entity relationships are defined in the knowledge map database. An entity corresponds to a thing in the real world (for example, a company A or a person X), and each entity can be identified by a globally unique ID. Entity attributes describe the intrinsic properties of an entity (for example, the Chinese and English names of company A or person X). Entity relationships connect entities and describe the connections between them (for example, the relationship between person X and company A). By constructing a knowledge map database, knowledge of entities, entity attributes, and entity relationships can be used more efficiently and in greater depth to discover complex connections between things.
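The three concepts above can be sketched as a small in-memory data model. This is only an illustration under stated assumptions: the class names, the `chairman_of` relation label, and the use of UUIDs as globally unique IDs are hypothetical choices, not details given by the invention.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class Entity:
    """A real-world thing, identified by a globally unique ID."""
    name: str
    attributes: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class Relationship:
    """A directed, labelled connection between two entity IDs."""
    source_id: str
    relation: str
    target_id: str


class KnowledgeGraph:
    def __init__(self):
        self.entities = {}          # entity ID -> Entity
        self.relationships = set()  # set of Relationship

    def add_entity(self, entity):
        self.entities[entity.id] = entity
        return entity.id

    def relate(self, source_id, relation, target_id):
        self.relationships.add(Relationship(source_id, relation, target_id))

    def neighbors(self, entity_id):
        """IDs of entities connected to entity_id in either direction."""
        found = set()
        for rel in self.relationships:
            if rel.source_id == entity_id:
                found.add(rel.target_id)
            elif rel.target_id == entity_id:
                found.add(rel.source_id)
        return found


kg = KnowledgeGraph()
company_a = kg.add_entity(Entity("Company A", {"name_en": "Company A"}))
person_x = kg.add_entity(Entity("Person X"))
kg.relate(person_x, "chairman_of", company_a)
```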

As a database, the knowledge map database can be stored in a variety of forms. For example, the knowledge map database can be stored in a traditional relational database as semantic-network RDF (Resource Description Framework) triples, or in a newer non-relational database. Preferably, the knowledge map database can be stored using a graph database, such as Neo4j, OrientDB, Titan-BerkeleyDB, HyperGraphDB, and the like.
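As a rough illustration of the relational option, the sketch below stores subject-predicate-object triples in a single SQLite table. The table layout, the `ex:` prefixes, and the predicate names are assumptions made for the example; a production system would more likely use a dedicated RDF store or one of the graph databases listed above.

```python
import sqlite3

# In-memory relational database holding one row per triple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)

triples = [
    ("ex:companyA", "ex:nameEn", "Company A"),       # entity attribute
    ("ex:personX", "ex:chairmanOf", "ex:companyA"),  # entity relationship
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def objects_of(subject, predicate):
    """Return all objects stored for a given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
        (subject, predicate),
    )
    return [row[0] for row in rows]


chaired = objects_of("ex:personX", "ex:chairmanOf")
```

Attributes and relationships share the same table here: a triple whose object is another entity identifier acts as a relationship, while a triple whose object is a literal acts as an attribute.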

Depending on the size and use of the knowledge map database, the data sources used to build it can vary. For example, the data source can be an open encyclopedia data source (e.g., Baidu Encyclopedia, Wikipedia, etc.), a structured database (e.g., Wikidata, DBpedia, vertical websites, or specialized databases for specific industries), or any related third-party semi-structured or unstructured data source (for example, professional websites and content published on Internet media, including news, company annual reports, corporate announcements, etc.).

Those skilled in the art will appreciate that the knowledge map database constructed in the present invention is oriented to a particular field or industry during the construction process, but is not limited to a single industry. The constructed knowledge map database integrates the entities, attributes, and events related to one or more industries, together with the relationships between entities and entities, between entities and events, and between events and events, into a single knowledge map.

FIG. 1 is an exemplary flowchart of a method for constructing an industry knowledge map database provided by the present invention, which may include steps S11-S15.

In step S11, industry data is obtained from an industry data source, and entities and corresponding entity attributes and entity relationships are extracted from the industry data to generate the industry knowledge map database.

An industry data source is a source of basic data for the one or more specific fields or industries that are targeted for monitoring. In one embodiment, the industry data source can be a structured industry database, so as to obtain basic industry data of the highest possible quality. The structured database can be accessed through an application programming interface (API) to obtain data by query (for example, through a query command).

Through the "Extract-Transform-Load (ETL)" process, the obtained industry data can be transformed, and the entities, entity attributes, and entity relationships are then extracted from the transformed data and loaded into the industry knowledge map database proposed by the present invention. The specific execution steps of the ETL operation can be implemented by existing data integration means. For example, in an ontology-based data integration method, mapping relationships between the fields in different databases and the various kinds of entity information are defined in advance, so that entities, entity attributes, and entity relationships can be extracted according to the fields and their contents, completing the construction of the basic industry knowledge map database. In addition, because industry databases differ in structure and may suffer from problems such as data noise, data loss, or data errors, data cleaning operations may be required when processing industry data. Data cleaning operations can be implemented in conjunction with ETL processing using techniques known in the art.

As an example, FIG. 2 illustrates exemplary structured industry data that, as described above, may be obtained from a structured industry database. In FIG. 2, Table 1 is an example of structured data on listed companies. It includes two data items, company A and company B, each of which includes fields such as the company's Chinese and English names, registered address, stock code, and chairman of the board. By performing ETL operations on the structured data, entities (i.e., company A, company B, person X, person Y), entity attributes (i.e., the specific information on company A and company B), and entity relationships (i.e., the relationship between company A and person X and the relationship between company B and person Y) can be extracted, thereby generating a knowledge map database for the industry.
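The field-to-entity mapping described above can be sketched as follows. The rows are placeholders standing in for Table 1 of FIG. 2 (the actual table contents are not reproduced here), and the field names, the mapping constants, and the `chairman_of` label are hypothetical.

```python
# Placeholder rows standing in for Table 1 of FIG. 2; all values are invented.
ROWS = [
    {"name_cn": "公司A", "name_en": "Company A", "address": "Address A",
     "stock_code": "0001", "chairman": "Person X"},
    {"name_cn": "公司B", "name_en": "Company B", "address": " Address B ",
     "stock_code": "0002", "chairman": "Person Y"},
]

# Assumed ontology mapping: which field names the entity, which fields
# become attributes, and which fields point to a related entity.
ENTITY_FIELD = "name_en"
ATTRIBUTE_FIELDS = ["name_cn", "address", "stock_code"]
RELATION_FIELDS = {"chairman": "chairman_of"}  # field -> relation label


def clean(row):
    """Minimal data cleaning: trim whitespace and drop empty values."""
    return {k: v.strip() for k, v in row.items() if v and v.strip()}


def etl(rows):
    """Extract entities, attributes, and relationships from cleaned rows."""
    entities, attributes, relations = set(), {}, set()
    for row in map(clean, rows):
        company = row.get(ENTITY_FIELD)
        if company is None:
            continue  # skip rows whose key field is missing
        entities.add(company)
        attributes[company] = {f: row[f] for f in ATTRIBUTE_FIELDS if f in row}
        for field_name, relation in RELATION_FIELDS.items():
            if field_name in row:
                person = row[field_name]
                entities.add(person)
                relations.add((person, relation, company))
    return entities, attributes, relations


entities, attributes, relations = etl(ROWS)
```

Keeping the mapping in data rather than in code means that a database with a different schema only needs a different set of mapping constants, which is the main appeal of the ontology-based approach.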

In another embodiment, the industry data source can also be a semi-structured or unstructured data source from the Internet. Industry data can be crawled from the data source using web crawler technology, and information extraction operations based on natural language processing technology can be used to extract entities, entity attributes, and entity relationships.

In step S12, data related to the industry is obtained from an Internet data source, and the entities related to the industry and the corresponding entity attributes and entity relationships are extracted from the data.

In this step, data related to the above specific fields or industries is first obtained from an Internet data source. Internet data sources can be structured, semi-structured, or unstructured data sources. Therefore, for different structural characteristics of Internet data sources, industry-related data can be obtained in different ways. The entity and corresponding entity attributes and entity relationships are then extracted from the industry-related data.

For structured Internet data sources, the corresponding data content can be queried through the API and entity, entity attributes and entity relationships can be obtained. For the semi-structured data source, after the data content is captured, the content is analyzed by the information extraction operation in the natural language processing technology, thereby extracting the entity, entity attribute and entity relationship related to the industry. A semi-structured data source is a data source that contains partially structured, partially unstructured data, so that corresponding portions of the semi-structured data can be processed in a manner that handles structured and unstructured data, respectively. For example, HTML and XML files are the most common semi-structured data. In the process of processing HTML and XML files, on the one hand, the tag-based structured information can be used, and on the other hand, information extraction technology and machine learning technology can be combined to extract the required information.
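As a small illustration of using tag-based structure in semi-structured data, the sketch below reads label/value pairs out of an HTML table with Python's standard `html.parser` module. The HTML snippet and field labels are invented for the example; the free-text paragraph is deliberately left untouched, since it would be handled by the information extraction operations instead.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Invented page fragment: a structured table plus unstructured text.
HTML = """
<table>
  <tr><td>Stock code</td><td>0001</td></tr>
  <tr><td>Chairman</td><td>Person X</td></tr>
</table>
<p>Company A published its annual report this week.</p>
"""

parser = TableRowParser()
parser.feed(HTML)
# Two-cell rows are treated as label/value pairs.
fields = {row[0]: row[1] for row in parser.rows if len(row) == 2}
```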

In one embodiment, the information extraction operation includes an entity identification operation and a relationship extraction operation.

Entity recognition operations may employ existing natural language processing tools (e.g., part-of-speech tagging or named entity recognition tools), or may use machine learning methods to train entity recognition models on specifically annotated data. It should be noted that some natural language processing tasks and tools are language-dependent (for example, Chinese data requires word segmentation, whereas English data does not). The machine learning method represents data in different languages and formats numerically, and then uses general, language-independent algorithms (e.g., conditional random fields and hidden Markov models) for model training.

Relationship extraction operations can be implemented through a variety of existing statistical learning or machine learning methods. For example, a template learning method may be adopted: entity pairs that are known to satisfy a certain relationship in the knowledge map database are taken as instances, the sentence patterns and contexts in which these instances appear are extracted and counted over a large amount of text to form relationship extraction templates, and the resulting templates are then applied to the text data to extract new instances. If an extracted instance does not yet exist in the knowledge map database, it can be added to the knowledge map database.
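The template learning idea can be sketched as follows. This is a simplified illustration, not the method of the invention: the function names `learn_templates` and `apply_templates`, the toy corpus, and the rule that an entity name is a run of capitalised words are all assumptions made for the example.

```python
import re
from collections import Counter


def learn_templates(sentences, seeds, min_count=2):
    """Count the text found between known (subject, object) pairs and keep
    the connecting phrases seen at least `min_count` times."""
    counts = Counter()
    for sentence in sentences:
        for subj, obj in seeds:
            match = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), sentence)
            if match:
                counts[match.group(1)] += 1
    return [phrase for phrase, n in counts.items() if n >= min_count]


def apply_templates(sentences, templates):
    """Extract (subject, object) pairs from sentences that match a template."""
    name = r"((?:[A-Z]\w*)(?: [A-Z]\w*)*)"  # assumption: names are capitalised words
    found = set()
    for phrase in templates:
        pattern = re.compile(name + re.escape(phrase) + name)
        for sentence in sentences:
            for subj, obj in pattern.findall(sentence):
                found.add((subj, obj))
    return found


# Hypothetical corpus and seed instances already present in the database.
corpus = [
    "Person X is the chairman of Company A.",
    "Person Y is the chairman of Company B.",
    "Person Z is the chairman of Company C.",
]
seeds = {("Person X", "Company A"), ("Person Y", "Company B")}

templates = learn_templates(corpus, seeds)
new_instances = apply_templates(corpus, templates) - seeds
print(templates, new_instances)
```

The pairs left in `new_instances` are the ones that do not yet exist among the seeds, which corresponds to the instances that would be added to the knowledge map database.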

In step S13, the industry knowledge map database is supplemented or updated based on the industry-related entities and corresponding entity attributes and entity relationships.

After extracting industry-related entities and the corresponding entity attributes and entity relationships, they can be correlated and compared with the corresponding information in the knowledge map database. New entities, entity attributes, and entity relationships are added to the knowledge map database as needed, and existing entity attributes and entity relationships can be updated.

As described above, the industry knowledge map database proposed by the present invention can adopt a traditional relational database, an RDF triple database, or a newer non-relational database (for example, a graph database). Correspondingly, the specific operations of supplementing or updating the knowledge map database can be implemented in a customized manner using the query language of the database, for example, SQL for a relational database, SPARQL for an RDF triple database, or Cypher for the Neo4j graph database.

The description will continue with the example in FIG. 2. Assuming that structured data about the executives of listed companies is obtained from a structured Internet data source through an API query, the following can be supplemented and updated in the industry knowledge map database: 1) person Z, the entity attributes of person Z, and the relationship between person Z and company B are added to the knowledge map database; 2) entity attributes of person X and person Y are added; 3) the relationship between person Y and company B is updated (that is, updated from "current position" to "former position").
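For the relational-database option, the three updates in this example could look roughly like the sketch below, which uses SQLite through the Python standard library. The table layout, the helper names, and the attribute values are hypothetical and are not taken from FIG. 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (name TEXT PRIMARY KEY, type TEXT);
CREATE TABLE attribute (entity TEXT, attr_name TEXT, attr_value TEXT,
                        PRIMARY KEY (entity, attr_name));
CREATE TABLE relation (subject TEXT, object TEXT, label TEXT,
                       PRIMARY KEY (subject, object));
""")

# Hypothetical state of the database before the update.
conn.executemany("INSERT INTO entity VALUES (?, ?)", [
    ("Person X", "person"), ("Person Y", "person"), ("Company B", "company"),
])
conn.execute("INSERT INTO relation VALUES ('Person Y', 'Company B', 'current position')")


def upsert_entity(name, type_):
    # Add the entity only if it is not already known.
    conn.execute("INSERT OR IGNORE INTO entity VALUES (?, ?)", (name, type_))


def upsert_attribute(entity, attr_name, attr_value):
    # Add a new attribute or overwrite the stored value.
    conn.execute("INSERT OR REPLACE INTO attribute VALUES (?, ?, ?)",
                 (entity, attr_name, attr_value))


def upsert_relation(subject, obj, label):
    # Add a new relationship or update the label of an existing one.
    conn.execute("INSERT OR REPLACE INTO relation VALUES (?, ?, ?)",
                 (subject, obj, label))


# 1) add person Z with an attribute and a relationship to company B
upsert_entity("Person Z", "person")
upsert_attribute("Person Z", "title", "Chairman")        # illustrative value
upsert_relation("Person Z", "Company B", "current position")
# 2) add attributes of person X and person Y
upsert_attribute("Person X", "title", "Director")        # illustrative value
upsert_attribute("Person Y", "title", "Former chairman") # illustrative value
# 3) update the relationship between person Y and company B
upsert_relation("Person Y", "Company B", "former position")

rows = conn.execute(
    "SELECT subject, label FROM relation WHERE object = 'Company B' ORDER BY subject"
).fetchall()
print(rows)
```

The same three operations could equally be written in SPARQL or Cypher; SQL is used here only because SQLite needs no external service.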

In one embodiment, entity linking operations and semantic disambiguation operations are required in the process of replenishing or updating the industry knowledge map database.

The entity linking operation is intended to map an entity reference (or entity mention) appearing in the data content to the related entity concept in the knowledge map database. For example, in the two sentences "Steve Jobs is one of the founders of Apple" and "Steve Jobs created NeXT in the United States in 1985", the entity references to "Steve Jobs" in both sentences should correspond to the same person entity concept "Steve Jobs (ex-CEO of Apple)" in the knowledge map database, so the two references need to be associated with the same entity through the entity linking operation. Semantic disambiguation is intended to resolve ambiguous entity references. For example, the reference "Apple" may refer to multiple entities, such as "Apple (fruit)", "Apple Inc.", "Apple Daily", "Apple (movie)", etc. The "Apple" in the first sentence of the above example should correspond to the corporate entity concept "Apple Inc." in the knowledge map database rather than "Apple (fruit)", "Apple (movie)", or "Apple Daily". Entity linking and semantic disambiguation are usually performed together: semantic disambiguation is the means of entity linking, and entity linking is the purpose of semantic disambiguation, so the two terms are often used interchangeably.

Any existing entity linking and semantic disambiguation techniques can be used in the present invention. For example, one class of methods is based on entity knowledge and performs disambiguation and linking independently for each entity reference. Entity knowledge includes, but is not limited to, the probability of occurrence of an entity, the distribution of the names of an entity (full name, alias, abbreviation, etc.), the context of the entity (such as word co-occurrence information, word distribution, etc.), and the category information of the entity in the knowledge base (such as company entity, person entity, location entity, etc.). Probability-based methods (such as linear regression or logistic regression) or machine learning methods (such as support vector machines or random forests) can be used to train a semantic disambiguation and entity linking model based on entity knowledge.

Another class of methods is based on the assumption of topic consistency (i.e., the entities in an article are usually related to the topic of the text, so these entities are also semantically related to one another). These methods use the associations, in a knowledge base (e.g., Wikipedia or the knowledge map constructed by the present invention), among the candidate entities of all entity references in the text content to disambiguate and link all entity references in an article jointly. Such methods usually use collaborative inference over a graph data structure: the candidate entities of all entity references in the article content, together with their relationships in the knowledge base, are used to construct a candidate entity graph, and the density of the graph reflects the degree of semantic association between the different candidate entity nodes. The process of entity linking then reinforces the evidence collaboratively by iteratively propagating it (the possible degree of association between different entities) along the dependency structure of the candidate entity graph until convergence.

The above two classes of methods can also be combined flexibly to improve the performance of disambiguation and linking.
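The first class of methods, which scores each entity reference independently from entity knowledge, can be illustrated with a deliberately small sketch. The candidate list, the prior probabilities, the context words, and the scoring rule (prior multiplied by one plus the number of overlapping context words) are all assumptions chosen for the example; a real system would learn such a model from data.

```python
import re

# Hypothetical candidate entities for the reference "Apple", each with a
# prior probability of occurrence and typical context words.
CANDIDATES = {
    "Apple Inc.": {"prior": 0.5, "context": {"founders", "company", "iphone", "ceo"}},
    "Apple (fruit)": {"prior": 0.4, "context": {"tree", "eat", "juice", "orchard"}},
    "Apple Daily": {"prior": 0.1, "context": {"newspaper", "daily", "editor"}},
}


def link_entity(sentence, candidates):
    """Return the candidate whose prior and context best match the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))

    def score(name):
        info = candidates[name]
        overlap = len(words & info["context"])
        return info["prior"] * (1 + overlap)

    return max(candidates, key=score)


first = link_entity("Steve Jobs is one of the founders of Apple", CANDIDATES)
second = link_entity("She picked an apple from the tree to eat", CANDIDATES)
print(first, second)
```

In the first sentence the word "founders" tips the score toward the company, while in the second the words "tree" and "eat" outweigh the company's higher prior.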

In step S14, Internet media data related to the industry is obtained from an Internet data source, and specific media events related to the industry and corresponding directly related entities are extracted from the Internet media data.

Internet media data can be obtained from Internet data sources in a variety of ways. For example, some social media sites (eg, Sina Weibo, Facebook, Twitter, etc.) have open APIs for getting their data. Web crawler technology and content extraction technology can also be used to capture news site or industry media site data.

There are a number of technical implementations in the art for monitoring Internet media to obtain specific media events. For example, in one implementation, the Internet media data is first examined to discover the content of media events in the particular domain or industry of interest and the entities involved in those events, and the newly discovered media events are then evaluated against different indicators (e.g., negativity, significance, suddenness, speed and scope of propagation, credibility, etc.) to screen out the media events that meet the requirements.

For different types of Internet media data, different processing technologies can be used to identify the directly related entities corresponding to media events. For example, entity recognition techniques based on natural language processing can be used to identify entities from text data, image or video recognition techniques can be used to identify entities from image or video data, and speech recognition techniques can be used to identify entities from audio or video data. Those skilled in the art will appreciate that the present invention does not limit the media types and language types of Internet media data.

In step S15, the industry knowledge map database is supplemented based on the specific media event and the corresponding directly related entity, wherein the specific media event is supplemented as an abstract entity into the industry knowledge map database.

After obtaining specific industry-related media events and the corresponding directly related entities (for example, a corruption scandal involving the chairman of a listed company, together with the companies, people, and locations involved in the incident), the event is added as an abstract entity to the industry knowledge map database, and entity linking and semantic disambiguation are performed on the directly related entities involved in the event; that is, the corresponding entities in the industry knowledge map database are found and associated with the abstract entity representing the event. If an entity involved in the event does not exist in the industry knowledge map database, it may be supplemented in the manner described in step S13 above. After the industry knowledge map database has been supplemented, the indirectly related entities of the abstract entity representing the media event can be found in the industry knowledge map database based on the relationships between the directly related entities of the event and other entities in the knowledge map database.

After constructing the industry knowledge map database through the above methods, it is possible to perform automated and in-depth monitoring of Internet media events based on the constructed information. Preferably, after completing the first construction of the industry knowledge map database, in order to maintain the integrity and validity of the information, the industry knowledge map database may also be updated, for example, steps S12 and S13 may be periodically performed in a predetermined cycle, and Steps S14 and S15 are performed in a real-time uninterrupted manner.

In addition, those skilled in the art can understand that the various data involved in the present invention, such as industry data, industry-related data, and Internet media data, may be in multiple languages or of multiple types (for example, text, image, video, voice, etc.), and the present invention does not impose any limitation on this.

FIG. 3 is an exemplary flow chart of a method for monitoring media events provided by the present invention, which can monitor industry-specific media events based on an industry knowledge map database constructed according to the present invention. The method can include steps S31-S35.

In step S31, internet media data is acquired.

As mentioned above, Internet media data can be obtained from Internet data sources in a variety of ways. For example, some social media sites (eg, Sina Weibo, Facebook, Twitter, etc.) have open APIs for getting their data. Web crawler technology and content extraction technology can also be used to capture news site or industry media site data.

In step S32, event detection, event evaluation, and screening are performed based on the acquired internet media data to obtain the specific media event related to the industry.

As described above, there are various technical implementations in the art for monitoring Internet media to obtain specific media events. For example, in one implementation, the Internet media data is first examined to discover the content of media events in the particular domain or industry of interest and the entities involved in those events, and the newly discovered media events are then evaluated against different indicators (e.g., negativity, significance, suddenness, speed and scope of propagation, credibility, etc.) to screen out the media events that meet the requirements.

Specifically, in one embodiment, the technical implementation steps involved in event detection may include: topic classification, entity recognition, sentiment analysis, and event discovery.

In the step of topic classification, the content in the acquired Internet media data is classified into topics to obtain content for a specific topic. The purpose of topic classification is to filter out the content that belongs to a certain topic of interest or related to customer needs from the content obtained. Topic classification is a kind of text mining technology. The machine learning or deep learning method is generally used to train the classification model on the annotation data, and then applied to the text to judge the topic category. Any existing classification model (e.g., naive Bayesian model, decision tree, support vector machine, artificial neural network, etc.) can be used in the present invention.
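As one concrete instance of the classification models listed above, the sketch below trains a small multinomial naive Bayes classifier with Laplace smoothing on annotated text and then applies it to new text. The class name, the four training sentences, and the two topic labels are hypothetical and only serve to show the train-then-apply flow.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes over whitespace-separated words."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-topic word frequencies
        self.label_counts = Counter(labels)      # number of documents per topic
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical annotated data.
train_texts = [
    "bank raises interest rates on loans",
    "stock market falls as bank shares drop",
    "team wins the football match",
    "star player injured before the match",
]
train_labels = ["finance", "finance", "sports", "sports"]

classifier = NaiveBayesTopicClassifier().fit(train_texts, train_labels)
finance_topic = classifier.predict("bank shares rise")
sports_topic = classifier.predict("football match tonight")
print(finance_topic, sports_topic)
```

Content whose predicted topic is not of interest would simply be dropped before the later steps.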

In the step of entity identification, the entities involved are identified from the obtained content. The purpose of entity identification is to find out which entities are involved in an article for further analysis. For example, entity identification may include extracting entities from text information using information extraction techniques in natural language processing, identifying entities from image (including video) information using image recognition technology, and identifying entities from voice information using speech recognition technology. The entities identified from text, images, and speech can also be combined.

In the step of sentiment analysis, sentiment analysis is performed on the obtained content and the identified entities, and the obtained content is filtered based on the result of the sentiment analysis. Sentiment analysis is used to determine the emotional polarity of the full text of the content and the emotional polarity expressed toward the different entities, in order to find content that meets the monitoring criteria. The prior art generally implements sentiment analysis as a text classification method (e.g., classifying emotions as positive, neutral, or negative) or as a regression analysis method (e.g., expressing emotions as scores between -5 and +5). To judge the emotion expressed toward an entity in the content, the context information of the entity in the text can be used, or a dependency parsing tool can be used to find the part of the text related to the entity and perform sentiment analysis on that part.
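The difference between full-text polarity and entity-level polarity can be shown with a small lexicon-based sketch that scores on the -5 to +5 scale mentioned above. The lexicon, its scores, the company names "Acme" and "Globex", and the fixed context window of three words are all assumptions for illustration; the entity-level function uses the nearby-context option rather than dependency parsing.

```python
import re

# Hypothetical sentiment lexicon (word -> score).
LEXICON = {"scandal": -4, "corruption": -4, "fraud": -5, "loss": -2,
           "growth": 3, "award": 4, "record": 2}


def clamp(value, low=-5, high=5):
    return max(low, min(high, value))


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def document_sentiment(text):
    """Polarity of the full text, clamped to the range -5..+5."""
    return clamp(sum(LEXICON.get(word, 0) for word in tokenize(text)))


def entity_sentiment(text, entity, window=3):
    """Polarity of the words within `window` tokens of a single-word entity name."""
    words = tokenize(text)
    target = entity.lower()
    score = 0
    for i, word in enumerate(words):
        if word == target:
            nearby = words[max(0, i - window): i + window + 1]
            score += sum(LEXICON.get(n, 0) for n in nearby)
    return clamp(score)


text = ("Acme hit by corruption scandal. "
        "Meanwhile the rival firm Globex reports record growth")
overall = document_sentiment(text)
acme = entity_sentiment(text, "Acme")
globex = entity_sentiment(text, "Globex")
print(overall, acme, globex)
```

The same text is mildly negative overall but clearly negative for one entity and positive for the other, which is why filtering on entity-level polarity can keep content that a full-text score alone would treat differently.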

In the step of event discovery, event discovery is performed based on the filtered content to cluster media events and discover new media events. The purpose of event discovery is to extract event information from different texts (for example, the time and place of the event), cluster and merge the related information into abstract "events", and compare them with existing events in order to identify newly occurring events and to cluster events based on their similarity or relevance.
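One simple way to realise this clustering is a single-pass scheme: each incoming text is merged into the most similar existing event if the similarity is high enough, and otherwise starts a new event. The sketch below uses Jaccard similarity over word sets; the function names, the threshold of 0.4, and the sample headlines are assumptions, and a production system would compare richer event attributes such as time and place.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def discover_events(documents, threshold=0.4):
    """Single-pass clustering: returns a list of events, each holding the
    merged term set and the documents assigned to it."""
    events = []
    for doc in documents:
        doc_terms = tokens(doc)
        best = max(events, key=lambda e: jaccard(doc_terms, e["terms"]), default=None)
        if best is not None and jaccard(doc_terms, best["terms"]) >= threshold:
            best["docs"].append(doc)      # same event as an existing cluster
            best["terms"] |= doc_terms
        else:
            events.append({"terms": doc_terms, "docs": [doc]})  # new event
    return events


# Hypothetical filtered content.
documents = [
    "company b chairman arrested for corruption",
    "chairman of company b arrested in corruption probe",
    "factory fire halts production at company c",
]
events = discover_events(documents)
print([len(event["docs"]) for event in events])
```

A document that opens a new cluster corresponds to a newly discovered media event, while one that joins an existing cluster adds coverage to a known event.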

In an embodiment, optionally, during event detection, the authenticity of the event may also be analyzed based on the attributes of the media event (for example, the time and place of the event, the publisher of the media event and its related attributes, etc.), and media events may be sorted and/or filtered based on the results of the analysis.

It will be understood by those skilled in the art that the implementations listed for the operations in the above steps are merely exemplary, that other approaches existing in the art may also implement these operations, and that the present invention does not impose any restriction on the specific manner of implementing the foregoing operations.

In step S33, a directly related entity corresponding to the specific media event is identified.

In one embodiment, each directly related entity in each media event can be obtained by entity identification and event discovery operations in event monitoring. At the same time, as described above, each directly related entity can be associated with the corresponding entity concept in the industry knowledge map database or supplemented to the industry knowledge map database through entity link and semantic disambiguation processing.

In step S34, based on the directly related entity, the industry knowledge map database is accessed to determine an indirectly related entity corresponding to the specific media event.

In one embodiment, other indirectly related entities associated with the directly related entities of the event may be queried directly from the industry knowledge map database using various preset conditions. For example, the preset conditions may be: 1) entities that have an association with a directly related entity of the event within N layers (N may be 1, 2, 3, ...); 2) entities whose degree of association with a directly related entity of the event satisfies a certain condition (e.g., is greater than a specified threshold); 3) entities that have a specific relationship (e.g., a supply relationship, an investment relationship, etc.) with a directly related entity of the event; 4) entities that have certain attributes (e.g., belong to a specified industry, are located at a certain location, hold a certain position, etc.). These preset conditions can be used individually or in combination.
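Conditions 1) and 3) can be combined in a breadth-first traversal, sketched below on an in-memory edge list. The edge data, relation names, and function names are hypothetical; with a graph database the same query would normally be expressed in its query language instead.

```python
from collections import deque

# Hypothetical relationships: (source, relation, target).
EDGES = [
    ("Company B", "supplies", "Company C"),
    ("Company C", "supplies", "Company D"),
    ("Company A", "invests_in", "Company B"),
    ("Person Y", "works_for", "Company B"),
]


def build_adjacency(edges):
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))
        adjacency.setdefault(dst, []).append((rel, src))  # treated as undirected
    return adjacency


def indirect_entities(adjacency, direct, max_hops, relations=None):
    """Return {entity: hop distance} for entities within `max_hops` layers of
    any directly related entity, optionally following only given relations."""
    distance = {entity: 0 for entity in direct}
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the N-th layer
        for rel, neighbour in adjacency.get(node, []):
            if relations is not None and rel not in relations:
                continue
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {entity: d for entity, d in distance.items() if d > 0}


adjacency = build_adjacency(EDGES)
within_two = indirect_entities(adjacency, ["Company B"], max_hops=2)
supply_chain = indirect_entities(adjacency, ["Company B"], max_hops=2,
                                 relations={"supplies"})
print(within_two, supply_chain)
```

Attribute-based conditions such as industry or location could be added as a further filter on the returned entities.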

In another embodiment, a data mining method may be employed to mine the indirectly related entities of an event from the industry knowledge map database using a variety of conditions. For example, the specific implementation may adopt a link prediction method for graph data; that is, the problem of detecting the indirectly related entities of an event is formulated as the technical problem of predicting whether, in the industry knowledge map database, an edge exists between the node representing the event and entity nodes other than the directly related entity nodes. Conditions that can be used for link prediction include, but are not limited to, the characteristics of the event itself (e.g., type of event, time and location attributes, negativity, etc.), the relationships between the event and historical events (including relationship types and relationship strengths), the relationships between the directly related entities of the event and other entities (including relationship types and relationship strengths), and entity types and attributes. All of these can be mined from the knowledge map database to reach a comprehensive judgment about the indirectly related entities of a specific media event.
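As a minimal example of link prediction on graph data, the sketch below ranks candidate entities by the Adamic-Adar index, a common neighbourhood-based heuristic that is used here purely for illustration and is not named in this description. It considers only graph structure; the event characteristics, relationship types, and attributes listed above are not modelled. The graph and node names are hypothetical.

```python
import math

# Hypothetical undirected graph: the event node "Event E1" is already linked
# to its directly related entities "Company B" and "Person Y".
EDGES = [
    ("Event E1", "Company B"),
    ("Event E1", "Person Y"),
    ("Company B", "Company C"),
    ("Company B", "Company A"),
    ("Person Y", "Company C"),
    ("Company A", "Company D"),
]


def neighbours(edges):
    result = {}
    for a, b in edges:
        result.setdefault(a, set()).add(b)
        result.setdefault(b, set()).add(a)
    return result


def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v."""
    return sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])


def rank_indirect_entities(graph, event):
    """Score every node not yet linked to the event, highest score first."""
    candidates = set(graph) - graph[event] - {event}
    scores = {c: adamic_adar(graph, event, c) for c in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


graph = neighbours(EDGES)
ranked = rank_indirect_entities(graph, "Event E1")
print(ranked)
```

Candidates scoring above a chosen threshold would be treated as predicted edges, that is, as indirectly related entities of the event.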

In step S35, an alert message is sent to the directly related entity and/or the indirectly related entity.

After identifying the directly and indirectly related entities corresponding to the specific media event, an alert message can be sent to the users of the corresponding entities in multiple ways (e.g., email, SMS, instant messaging tools, social network platforms, etc.). The alert message may contain a textual description of the event itself, pictures, propagation-related statistics, event evaluation indicators, and a description of how the related entity may be affected by the event.

Those skilled in the art can understand that the specific media events described in the present invention may be various types of events that meet the conditions set by the user and can be obtained from Internet media, for example, negative events, emergencies, crisis events, mass events, or public opinion events. The invention does not impose any limitation on this.

As a preferred embodiment, FIG. 4 illustrates an exemplary flow chart of another method of constructing an industry knowledge map database provided by the present invention. The method may include steps S41, S421/S422, and S43-S45.

In step S41, industry data is obtained from an industry data source, and entities and corresponding entity attributes and entity relationships are extracted from the industry data to generate an industry knowledge map database.

In step S421, based on the structured data source, an entity, an entity attribute, and an entity relationship related to the industry are obtained in a query manner by using an application program interface. In one embodiment, the structured data source can be a structured open data platform such as Wikidata, DBPedia, and industry related data can be obtained from the API.

In step S422, based on the semi-structured or unstructured data source, entity identification and relationship extraction are performed on the data using natural language processing techniques to extract entities, entity attributes, and entity relationships related to the industry. In one embodiment, the semi-structured or unstructured data source may be an open data platform such as Wikipedia or Baidu Encyclopedia, or any related third-party data source (e.g., a professional website, content published in Internet media, etc.), and industry-related data can be obtained through web crawling or content extraction technology.

Preferably, steps S421 and/or S422, and step S43, may be performed periodically in a predetermined cycle.

In step S43, the industry knowledge map database is supplemented or updated based on the industry-related entities and corresponding entity attributes and entity relationships.

In step S44, Internet media data is obtained from an Internet data source, and specific media events related to the industry and corresponding directly related entities are extracted from the Internet media data.

In step S45, an industry knowledge map database is supplemented based on the particular media event and the corresponding directly related entity, wherein the particular media event is supplemented as an abstract entity into the industry knowledge map database.

Preferably, steps S44 and S45 can be performed in an uninterrupted manner in real time.

FIG. 5 is an exemplary flowchart of another method for constructing an industry knowledge map database provided by the present invention. The method can include steps S51-S53:

In step S51, the industry data is obtained from the data source;

In step S52, data processing is performed on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships;

In step S53, the industry knowledge map database is constructed based on the extracted entities, entity attributes, and/or entity relationships.

As mentioned above, the data sources for the industry knowledge map database can be varied, including but not limited to open encyclopedia data sources, structured databases, and any related third-party semi-structured or unstructured Internet data sources. At the same time, as mentioned above, the data source of the industry knowledge map database can also be an Internet media data source.

In one embodiment, the data source may be a structured industry database, and the method may be implemented in the following specific manner: in step S51(1), structured industry data including a plurality of fields is obtained from a third-party industry database; in step S52(1), data cleaning and extract-transform-load (ETL) processing are performed on the structured industry data before the entities related to the industry and the corresponding entity attributes and/or entity relationships are extracted; in step S53(1), the industry knowledge map database is generated based on the extracted entities, entity attributes, and/or entity relationships.
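Steps S51(1) to S53(1) can be pictured with the following minimal sketch, in which a small CSV export stands in for the third-party industry database. The field names, the sample rows, and the function name `etl` are hypothetical; the cleaning shown is limited to trimming whitespace, dropping incomplete rows, and removing duplicates.

```python
import csv
import io

# Hypothetical export from a third-party industry database.
RAW = """company,executive,position
Company B , Person Y ,Chairman
Company B,Person Y,Chairman
Company A,,Director
Company A,Person X,CEO
"""


def etl(raw_csv):
    """Clean the rows and transform them into entities and relationships."""
    entities, relations = set(), set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = {field: value.strip() for field, value in row.items()}
        if not all(record.values()):
            continue  # cleaning: skip rows with missing fields
        entities.add((record["company"], "company"))
        entities.add((record["executive"], "person"))
        # Sets remove duplicate rows automatically.
        relations.add((record["executive"], record["position"], record["company"]))
    return entities, relations


entities, relations = etl(RAW)
print(sorted(entities))
print(sorted(relations))
```

The resulting sets would then be loaded into whichever storage the knowledge map database uses.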

In another embodiment, the data source may be an unstructured or semi-structured Internet data source, and the method may be implemented in the following specific manner: in step S51(2), industry-related data is acquired from the Internet data source using web crawling technology, the Internet data source including an unstructured or semi-structured data source; in step S52(2), entity identification and relationship extraction are performed on the industry-related data using information extraction techniques in natural language processing to extract the entities, entity attributes, and/or entity relationships; in step S53(2), the industry knowledge map database is supplemented or updated based on the extracted entities, entity attributes, and/or entity relationships.

Further, the steps S51(2)-S53(2) may be performed periodically at a predetermined cycle.

In another embodiment, the data source may be an open Internet data source, and the method may be implemented in the following specific manner: in step S51(3), industry-related data is acquired from the Internet data source by querying through an application program interface (API); in step S52(3), data cleaning and extract-transform-load (ETL) processing are performed on the industry-related data before the entities related to the industry and the corresponding entity attributes and/or entity relationships are extracted; in step S53(3), the industry knowledge map database is supplemented or updated based on the extracted entities, entity attributes, and/or entity relationships.

Further, the steps S51(3)-S53(3) may be performed periodically at a predetermined cycle.

In another embodiment, the data source may be an Internet media data source, and the method may be implemented in the following specific manner: in step S51(4), the Internet media data is obtained from the Internet data source using an application program interface (API) or web crawler technology; in step S52(4), event detection, event evaluation, and screening are performed on the Internet media data to extract specific media events related to the industry, and the corresponding directly related entities are identified from the Internet media data; in step S53(4), the industry knowledge map database is supplemented based on the specific media events and the corresponding directly related entities, wherein each specific media event is added to the industry knowledge map database as an abstract entity.

For example, in step S52(4), a directly related entity corresponding to a specific media event may be identified by at least one of: identifying an entity from text data based on entity recognition in natural language processing; identifying an entity from image or video data based on image or video recognition processing; or identifying an entity from audio or video data based on speech recognition processing.

For example, the particular media event may include a negative event, an emergency, a crisis event, a mass event, a public opinion event, or other event of industry significance.

Furthermore, the steps S51(4)-S53(4) may be performed in real time without interruption.

In another embodiment, the step of supplementing or updating the industry knowledge map database in steps S53(2), S53(3), and S53(4) may include performing semantic disambiguation and entity linking on the extracted entities. For example, the semantic disambiguation and entity linking may be performed by at least one of: performing semantic disambiguation and entity linking independently for each extracted entity based on entity knowledge; or, based on the topic consistency hypothesis, using the associations of candidate entities in the knowledge base to perform semantic disambiguation and entity linking on the extracted entities jointly.

The method for constructing an industry knowledge map database provided by the present invention is described above by way of example. Those skilled in the art will appreciate that various combinations of these embodiments are also included within the concept of such a method of constructing an industry knowledge map database.

FIG. 6 is an exemplary block diagram of a system for monitoring media events provided by the present invention. The system includes a data acquisition unit, a data processing unit, a database construction unit, a database storage unit, a media event monitoring unit, a database access unit, and a message sending unit.

A data acquisition unit for obtaining industry data from a data source.

a data processing unit, configured to perform data processing on the industry data to extract entities related to the industry and the corresponding entity attributes and/or entity relationships;

a database construction unit, configured to build the industry knowledge map database based on the extracted entity, entity attribute, and/or entity relationship;

Database storage unit: used to store the built industry knowledge map database;

a media event monitoring unit: configured to acquire Internet media data, perform event detection, event evaluation, and screening based on the acquired Internet media data to obtain the specific media event related to the industry, and identify a directly related entity corresponding to the specific media event;

a database access unit: configured to access the industry knowledge map database based on the directly related entity to determine an indirectly related entity corresponding to the specific media event;

a message sending unit, configured to send an alert message to the directly related entity and/or the indirectly related entity.

In one embodiment, the data acquisition unit includes: a structured data acquisition unit, configured to obtain structured data from a third-party industry database, the structured data including a plurality of fields; the data processing unit includes: a structured data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the structured data; the database construction unit includes: a database generating unit, configured to generate the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

In another embodiment, the data acquisition unit includes: an industry-related data acquisition unit, configured to obtain industry-related data from an Internet data source using web crawler technology, the Internet data source including an unstructured or semi-structured data source; the data processing unit includes: an industry-related data processing unit, configured to perform entity identification and relationship extraction on the industry-related data using information extraction techniques in natural language processing to extract the entities, entity attributes, and/or entity relationships; the database construction unit includes: a database supplement/update unit, configured to supplement or update the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

In another embodiment, the data acquisition unit includes: an industry-related data acquisition unit, configured to acquire industry-related data from an Internet data source by querying through an application program interface (API), the Internet data source including an open Internet data source; the data processing unit includes: an industry-related data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the industry-related data before the entities related to the industry and the corresponding entity attributes and/or entity relationships are extracted; the database construction unit includes: a database supplement/update unit, configured to supplement or update the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.

In another embodiment, the data acquisition unit includes: a media data acquisition unit, configured to acquire industry-related Internet media data from an Internet data source using an application program interface (API) or web crawler technology; the data processing unit includes: a media data processing unit, configured to perform event detection, event evaluation, and screening on the Internet media data to extract specific media events related to the industry, and to identify the corresponding directly related entities from the Internet media data; the database construction unit includes: a database supplement/update unit, configured to supplement the industry knowledge map database based on the specific media events and the corresponding directly related entities, wherein each specific media event is added to the industry knowledge map database as an abstract entity.

In one embodiment, the database supplement/update unit is further configured to perform semantic disambiguation and entity linking on the extracted entities.
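For illustration, semantic disambiguation and entity linking performed independently for each mention could look like the sketch below. It scores candidates by keyword overlap with the surrounding text, which is only one of many possible scoring rules; the knowledge base entries and the function `link_entity` are invented for the example.

```python
def link_entity(mention, context, knowledge_base):
    """Pick the candidate whose keywords best overlap the mention's context.

    Returns None when no candidate with a matching alias shares any word
    with the context.
    """
    context_words = set(context.lower().split())
    best_id, best_score = None, 0
    for entity_id, entry in knowledge_base.items():
        if mention.lower() not in entry["aliases"]:
            continue  # this entry is not a candidate for the mention
        score = len(context_words & entry["keywords"])
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id


knowledge_base = {
    "apple_inc": {"aliases": {"apple"}, "keywords": {"iphone", "technology", "shares"}},
    "apple_fruit": {"aliases": {"apple"}, "keywords": {"orchard", "harvest", "fruit"}},
}
linked = link_entity("Apple", "Apple shares fell after the iPhone launch", knowledge_base)
```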

In one embodiment, the media event monitoring unit is further configured to: perform topic classification on the content in the acquired internet media data to obtain content for a specific topic; identify the involved entities from the obtained content; perform sentiment analysis on the obtained content and the identified entities, and filter the obtained content based on the result of the sentiment analysis; and perform event discovery based on the filtered content to cluster media events and discover new media events. In another embodiment, the media event monitoring unit is further configured to: analyze the authenticity of the event based on the attributes of the media event, and sort and/or filter the media events according to the analysis result.
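The four detection steps of the first embodiment above (topic classification, entity identification, sentiment filtering, and event discovery) can be sketched as a small pipeline. This is a toy version under stated assumptions: keyword lists stand in for trained classifiers, and a greedy Jaccard-similarity grouping stands in for whatever clustering method an implementation would use. All names, word lists and the threshold are hypothetical.

```python
TOPIC_KEYWORDS = {"food_safety": {"recall", "contamination", "poisoning"}}
NEGATIVE_WORDS = {"recall", "contamination", "poisoning", "scandal"}
KNOWN_ENTITIES = {"Acme", "Globex"}


def tokens(text):
    """Lower-cased word set with trailing punctuation removed."""
    return {word.strip(".,").lower() for word in text.split()}


def classify_topic(text):
    """Return the first topic whose keyword set overlaps the text, else None."""
    words = tokens(text)
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return None


def find_entities(text):
    words = tokens(text)
    return {name for name in KNOWN_ENTITIES if name.lower() in words}


def is_negative(text):
    return bool(tokens(text) & NEGATIVE_WORDS)


def cluster_events(posts, threshold=0.3):
    """Greedy clustering: a post joins the first cluster whose seed post is similar enough."""
    clusters = []
    for post in posts:
        words = tokens(post)
        for cluster in clusters:
            seed = tokens(cluster[0])
            if len(words & seed) / len(words | seed) >= threshold:
                cluster.append(post)
                break
        else:
            clusters.append([post])  # no match: treat as a newly discovered event
    return clusters


def detect_events(posts, topic):
    """Keep on-topic, negative posts that mention a known entity, then cluster them."""
    selected = [p for p in posts
                if classify_topic(p) == topic and find_entities(p) and is_negative(p)]
    return cluster_events(selected)


posts = [
    "Acme announces recall of snack products",
    "Recall of Acme snack products widens",
    "Globex factory linked to contamination",
    "Acme opens new office",
]
events = detect_events(posts, "food_safety")
```

In this example the two recall posts fall into one cluster, the contamination post forms a second cluster, and the office announcement is filtered out before clustering.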

In one embodiment, the database access unit is further configured to query the industry knowledge map database to determine the indirectly related entity based on the directly related entity. In another embodiment, the database access unit is further configured to use data mining techniques in the industry knowledge map database to determine the indirectly related entity based on the directly related entity.
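A query for indirectly related entities can be illustrated as a bounded breadth-first traversal starting from the directly related entities. The adjacency-list layout, the relation names, and the `max_hops` limit are assumptions for the sketch; the disclosure does not prescribe a particular query or storage mechanism.

```python
from collections import deque


def indirectly_related(graph, direct_entities, max_hops=2):
    """Breadth-first search outward from the directly related entities.

    Returns {entity: hop_count} for entities reachable within max_hops,
    excluding the directly related entities themselves. Edges are followed
    in their stored direction only.
    """
    distance = {entity: 0 for entity in direct_entities}
    queue = deque(direct_entities)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return {entity: hops for entity, hops in distance.items() if hops > 0}


graph = {
    "Acme": [("supplier_of", "Globex"), ("subsidiary_of", "Acme Group")],
    "Globex": [("supplier_of", "Initech")],
    "Initech": [("customer_of", "Umbrella")],
}
related = indirectly_related(graph, ["Acme"])
```

The hop count gives a simple way to rank recipients of an alert message: entities one hop from the event's directly related entity are likely to be more affected than those two hops away.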

The system for monitoring media events provided by the present invention is described above by way of example. Those skilled in the art can understand that the operational steps in the various methods described above in connection with FIGS. 1 and 3-5 can be applied to the constituent units of the system, and thus are not described herein again.

Those skilled in the art will also appreciate that the various exemplary method steps and units described in connection with the various embodiments disclosed herein can be implemented in electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative steps and units are described above generally in terms of their functionality. Whether such functionality is implemented as hardware or as software depends on the particular application and design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in different ways for each specific application, but such implementation decisions should not be construed as causing a departure from the scope of the disclosure.

The "example/exemplary" used in the description of the present invention is used as an example, illustration or description. Any technical solution described as "exemplary" in the specification should not be construed as being more preferred or advantageous over other technical solutions.

The above description is provided to enable a person skilled in the art to make or use the invention. Many modifications and variations of the present invention will be apparent to those skilled in the art. Therefore, the present invention is not limited to the specific embodiments shown above, but should be consistent with the broadest scope of the inventive concepts disclosed herein.

Claims

A method for constructing an industry knowledge map database, comprising the steps of:

Step 101: Obtain industry data from a data source;

Step 102: Perform data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships;

Step 103: Construct the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The method of claim 1 wherein

The step 101 is implemented by acquiring structured industry data from a third-party industry database, the structured industry data including a plurality of fields;

The step 102 is implemented by performing data cleaning and extract-transform-load (ETL) processing on the structured industry data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The step 103 is implemented by generating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The method of claim 1 wherein

The step 101 is implemented by acquiring industry-related data from an Internet data source by using a web crawler technology, the Internet data source including an unstructured or semi-structured data source;

The step 102 is implemented by performing entity identification and relationship extraction on the industry-related data by using an information extraction technique in natural language processing to extract the entity, the entity attribute, and/or the entity relationship;

The step 103 is implemented by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The method of claim 1 wherein

The step 101 is implemented by using an application program interface (API) to acquire industry-related data from an Internet data source in an inquiry manner, the Internet data source including an open data source;

The step 102 is implemented by performing data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The step 103 is implemented by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The method of claim 1 wherein

The step 101 is implemented by acquiring industry-related Internet media data from an Internet data source by using an application program interface (API) or a web crawler technology;

The step 102 is implemented by performing event detection, event evaluation and screening on the internet media data to extract specific media events related to the industry, and identifying corresponding directly related entities from the internet media data;

The step 103 is implemented by supplementing the industry knowledge map database based on the specific media event and a corresponding directly related entity, wherein the specific media event is supplemented to the industry knowledge map database as an abstract entity.
The method according to claim 5, wherein in step 102, a directly related entity corresponding to the specific media event is further identified by at least one of the following:

Identifying entities from text data based on entity recognition in natural language processing;

Identifying an entity from image or video data based on image or video recognition processing; or

Identifying entities from audio or video data based on speech recognition processing.
The method of any of claims 3-5, wherein the step 103 comprises semantic disambiguation and entity linking of the extracted entities.
The method of claim 7, wherein the step of semantically disambiguating and entity linking the extracted entities is further implemented by at least one of the following:

Semantic disambiguation and entity linking are performed independently for each extracted entity based on entity knowledge;

Based on the topic consistency assumption, using the association of the candidate entities in the knowledge base, the extracted entities are consistently semantically disambiguated and entity linked.
The method of claim 5 wherein the particular media event comprises a negative event, an emergency, a crisis event, a mass event, a public opinion event, or other event of industry significance.
The method according to claim 3 or 4, wherein said steps 101-103 are performed periodically at a predetermined cycle.
The method of claim 5 wherein said steps 101-103 are performed in real time without interruption.
A method for monitoring a specific media event related to an industry based on the industry knowledge map database of any one of claims 1-11, comprising the steps of:

Step 1201: Obtain internet media data;

Step 1202: Perform event detection, event evaluation, and screening based on the acquired Internet media data to obtain the specific media event related to the industry;

Step 1203: Identify a directly related entity corresponding to the specific media event;

Step 1204: Access the industry knowledge map database based on the directly related entity to determine an indirectly related entity corresponding to the specific media event;

Step 1205: Send an alert message to the directly related entity and/or the indirectly related entity.
The method of claim 12 wherein the detecting of the event in step 1202 comprises the steps of:

Performing topic classification on content in the acquired Internet media data to obtain content for a specific topic;

Identifying the entities involved from the obtained content;

Performing sentiment analysis on the obtained content and the identified entity, and filtering the obtained content based on the result of the sentiment analysis;

Performing event discovery based on the filtered content to cluster media events and discover new media events.
The method according to claim 13, wherein the detecting of the event in the step 1202 further comprises the following steps:

Analyzing the authenticity of the event based on the attributes of the media event, and sorting and/or filtering the media events according to the analysis result.
The method of claim 12, wherein the directly related entity corresponding to the particular media event is identified in the step 1203 by at least one of:

Identifying entities from text data based on entity recognition in natural language processing;

Identifying an entity from image or video data based on image or video recognition processing; or

Identifying entities from audio or video data based on speech recognition processing.
The method of claim 12 wherein said step 1204 is accomplished in the following manner:

Querying in the industry knowledge map database to determine the indirectly related entity based on the directly related entity.
The method of claim 12 wherein said step 1204 is accomplished in the following manner:

Based on the directly related entity, data mining techniques are used in the industry knowledge map database to determine the indirectly related entities.
An apparatus for constructing an industry knowledge map database, comprising:

a data acquisition module for obtaining industry data from a data source;

a data processing module, configured to perform data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships;

a database building module, configured to build the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The device of claim 18, wherein

The data acquisition module obtains industry data by obtaining structured industry data from a third-party industry database, the structured industry data including a plurality of fields;

The data processing module performs data processing by performing data cleaning and extract-transform-load (ETL) processing on the structured industry data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The database building module constructs an industry knowledge map database by generating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The device of claim 18, wherein

The data acquisition module obtains industry data by using a web crawler technology to obtain industry-related data from an Internet data source, the Internet data source including an unstructured or semi-structured data source;

The data processing module performs data processing by using an information extraction technique in natural language processing to perform entity identification and relationship extraction on the industry-related data to extract the entity, entity attribute, and/or entity relationship;

The database building module constructs an industry knowledge map database by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The device of claim 18, wherein

The data acquisition module acquires industry data by using an application program interface (API) to obtain industry-related data from an Internet data source in an inquiry manner, the Internet data source including an open data source;

The data processing module performs data processing by performing data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The database building module constructs an industry knowledge map database by supplementing or updating the industry knowledge map database based on the extracted entities, entity attributes, and/or entity relationships.
The device of claim 18, wherein

The data acquisition module obtains industry data by acquiring industry-related Internet media data from an Internet data source by using an application program interface (API) or a web crawler technology;

The data processing module performs data processing by performing event detection, event evaluation, and screening on the internet media data to extract specific media events related to the industry, and identifying corresponding directly related entities from the internet media data;

The database building module constructs an industry knowledge map database by supplementing the industry knowledge map database based on the specific media event and a corresponding directly related entity, wherein the specific media event is supplemented to the industry knowledge map database as an abstract entity.
The apparatus of claim 22, wherein the database building module further identifies a directly related entity corresponding to the particular media event by at least one of:

Identifying entities from text data based on entity recognition in natural language processing;

Identifying an entity from image or video data based on image or video recognition processing; or

Identifying entities from audio or video data based on speech recognition processing.
The apparatus of any of claims 20-22, wherein the database building module comprises means for semantic disambiguation and entity linking of the extracted entities.
The apparatus according to claim 24, wherein the means for semantic disambiguation and entity linking of the extracted entities further performs semantic disambiguation and entity linking by at least one of:

Semantic disambiguation and entity linking are performed independently for each extracted entity based on entity knowledge;

Based on the topic consistency assumption, using the association of the candidate entities in the knowledge base, the extracted entities are consistently semantically disambiguated and entity linked.
The apparatus of claim 22 wherein the particular media event comprises a negative event, an emergency, a crisis event, a mass event, a public opinion event, or other event of industry significance.
A system for monitoring specific media events related to the industry, characterized by comprising:

a data acquisition unit for obtaining industry data from a data source;

a data processing unit, configured to perform data processing on the industry data to extract entities related to the industry and corresponding entity attributes and/or entity relationships;

a database construction unit, configured to build the industry knowledge map database based on the extracted entity, entity attribute, and/or entity relationship;

a database storage unit, configured to store the built industry knowledge map database;

a media event monitoring unit, configured to acquire internet media data, perform event detection, event evaluation, and screening based on the acquired internet media data to obtain the specific media event related to the industry, and identify a directly related entity corresponding to the specific media event;

a database access unit, configured to access the industry knowledge map database based on the directly related entity to determine an indirectly related entity corresponding to the specific media event;

a message sending unit, configured to send an alert message to the directly related entity and/or the indirectly related entity.
The system of claim 27 wherein:

The data acquisition unit includes: a structured data obtaining unit, configured to obtain structured industry data from a third-party industry database, where the structured industry data includes multiple fields;

The data processing unit includes: a structured data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the structured industry data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The database construction unit includes: a database generation unit configured to generate the industry knowledge map database based on the extracted entity, entity attribute, and/or entity relationship.
The system of claim 27 wherein:

The data acquisition unit includes: an industry-related data acquisition unit, configured to obtain industry-related data from an Internet data source, including an unstructured or semi-structured data source, by using a web crawler technology;

The data processing unit includes: an industry-related data processing unit, configured to perform entity identification and relationship extraction on the industry-related data by using an information extraction technology in natural language processing to extract the entity, entity attribute, and/or entity relationship;

The database construction unit includes a database supplement/update unit for supplementing or updating the industry knowledge map database based on the extracted entity, entity attribute, and/or entity relationship.
The system of claim 27 wherein:

The data acquisition unit includes: an industry-related data acquiring unit, configured to acquire industry-related data from an Internet data source by using an application program interface (API), where the Internet data source includes an open data source;

The data processing unit includes: an industry-related data processing unit, configured to perform data cleaning and extract-transform-load (ETL) processing on the industry-related data before extracting entities related to the industry and corresponding entity attributes and/or entity relationships;

The database construction unit includes a database supplement/update unit for supplementing or updating the industry knowledge map database based on the extracted entity, entity attribute, and/or entity relationship.
The system of claim 27 wherein:

The data acquisition unit includes: a media data acquiring unit, configured to acquire industry-related Internet media data from an Internet data source by using an application program interface (API) or a web crawler technology;

The data processing unit includes: a media data processing unit, configured to perform event detection, event evaluation, and screening on the internet media data to extract a specific media event related to the industry, and identify corresponding directly related entities from the internet media data;

The database construction unit includes a database supplement/update unit for supplementing the industry knowledge map database based on the specific media event and a corresponding directly related entity, wherein the specific media event is supplemented to the industry knowledge map database as an abstract entity.
The system according to any one of claims 29-31, wherein the database supplement/update unit is further configured to perform semantic disambiguation and entity linking on the extracted entities.
The system of claim 27, wherein the media event monitoring unit is further configured to:

Perform topic classification on content in the acquired Internet media data to obtain content for a specific topic;

Identify the entities involved from the obtained content;

Perform sentiment analysis on the obtained content and the identified entity, and filter the obtained content based on the result of the sentiment analysis;

Perform event discovery based on the filtered content to cluster media events and discover new media events.
The system of claim 33, wherein the media event monitoring unit is further configured to:

Analyze the authenticity of the event based on the attributes of the media event, and sort and/or filter the media events according to the analysis result.
The system of claim 27, wherein the database access unit is further configured to:

Query the industry knowledge map database to determine the indirectly related entity based on the directly related entity.
The system of claim 27, wherein the database access unit is further configured to:

Use data mining techniques in the industry knowledge map database to determine the indirectly related entity based on the directly related entity.
The system of claim 27, wherein the particular media event comprises a negative event, an emergency, a crisis event, a mass event, a public opinion event, or other event of industry significance.