CN114706948A - News processing method and device, storage medium and electronic equipment - Google Patents

News processing method and device, storage medium and electronic equipment

Info

Publication number
CN114706948A
Authority
CN
China
Prior art keywords
entity
news
company
target
corporate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210316535.4A
Other languages
Chinese (zh)
Inventor
王展
张�杰
于皓
罗华刚
李犇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Minglue Zhaohui Technology Co Ltd
Original Assignee
Beijing Minglue Zhaohui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Minglue Zhaohui Technology Co Ltd
Priority to CN202210316535.4A
Publication of CN114706948A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news processing method, a news processing device, a storage medium and electronic equipment. The method comprises the following steps: when target news to be identified is acquired, segmenting the target news into words to obtain a plurality of news segments; identifying a first company entity in the news segments and a first news entity associated with the first company entity; aligning the first company entity to obtain an aligned target company entity; analyzing the emotional tendency of the target company entity according to the first news entity; and filling the target company entity and the emotional tendency into a knowledge graph. The invention solves the technical problem of low efficiency in searching for public opinion news related to companies.

Description

News processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computers, and in particular, to a news processing method and apparatus, a storage medium, and an electronic device.
Background
In the prior art, a user who monitors public opinion information about an electric power company usually registers an account on each news platform and then continually browses the trending information on each platform to check whether any public opinion information related to the company has appeared, so that the company's image can be maintained in time.
However, this approach makes searching for public opinion news related to a company inefficient.
Disclosure of Invention
The embodiment of the invention provides a news processing method, a news processing device, a storage medium and electronic equipment, which are used for at least solving the technical problem of low efficiency of searching public sentiment news related to a company.
According to an aspect of an embodiment of the present invention, there is provided a news processing method, including: under the condition of acquiring target news to be identified, segmenting the target news to obtain a plurality of news segments; identifying a first corporate entity in said news segment and a first news entity associated with said first corporate entity; aligning the first company entity to obtain an aligned target company entity; analyzing the emotional tendency of the target company entity according to the first news entity; and filling the target company entity and the emotional tendency into a knowledge graph.
According to another aspect of the embodiments of the present invention, there is provided a news processing apparatus including: the word segmentation module is used for segmenting the target news to be identified under the condition that the target news to be identified is obtained, so that a plurality of news segments are obtained; an identification module for identifying a first corporate entity in the news segment and a first news entity associated with the first corporate entity; the alignment module is used for aligning the first company entity to obtain an aligned target company entity; the analysis module is used for analyzing the emotional tendency of the target company entity according to the first news entity; and the filling module is used for filling the target company entity and the emotional tendency into the knowledge graph.
As an alternative example, the identification module includes: a first determining unit, configured to take a first-level company entity as the first company entity if a first-level company entity is identified in the news segments; a second determining unit, configured to take a second-level company entity as the first company entity if no first-level company entity is identified but a second-level company entity is identified; and a third determining unit, configured to take a third-level company entity as the first company entity if no second-level company entity is identified but a third-level company entity is identified, wherein the first-level, second-level and third-level company entities are company entities divided by regional level, the first level is higher than the second level, and the second level is higher than the third level.
As an alternative example, the identification module includes: a fourth determining unit, configured to take a news entity that appears in target news on the same topic as the identified first company entity as the first news entity associated with the first company entity; or to take a news entity in the same paragraph of the same target news as the first company entity as the first news entity associated with the first company entity; or to take a news entity in the same sentence of the same target news as the first company entity as the first news entity associated with the first company entity.
As an alternative example, the alignment module includes: a presetting unit, configured to preset a plurality of standard company entities, wherein each standard company entity corresponds to one or more first company entities; and a processing unit, configured to take each of the first company entities as a current company entity and execute the following operations: comparing the current company entity with each of the standard company entities; and taking the standard company entity with the highest similarity to the current company entity as the aligned target company entity.
As an optional example, the apparatus further includes: a first processing module, configured to identify the first news entity to obtain a target keyword related to the target company entity in the target news; and acquiring target time information of the target news where the target keyword is located, wherein the target time information is a release time point of the target news.
As an alternative example, the filling module includes: a fifth determining unit, configured to use the target company entity as the first entity in the knowledge-graph; a sixth determining unit, configured to use the target news related to the target company entity as a second entity related to the first entity in the knowledge graph; and a seventh determining unit configured to use the emotional tendency of the target company entity, the target keyword, and the target time information as the attribute information of the second entity.
As an optional example, the apparatus further includes: a second processing module, configured to take each of a plurality of news websites as a current news website and execute the following operations: taking one URL of the current news website as a current URL, traversing the target news starting from the current URL, and, after the current URL has been traversed, taking each URL to which the current URL can jump as a new current URL.
According to still another aspect of the embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program executes the above-mentioned news processing method when being executed by a processor.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the news processing method by the computer program.
The method can be applied to the graph construction process of knowledge graph technology. In the embodiment of the invention, when target news to be identified is acquired, the target news is segmented into words to obtain a plurality of news segments; a first company entity in the news segments and a first news entity associated with the first company entity are identified; the first company entity is aligned to obtain an aligned target company entity; the emotional tendency of the target company entity is analyzed according to the first news entity; and the target company entity and the emotional tendency are filled into a knowledge graph. Because the target company entity and its emotional tendency are stored in the knowledge graph, a user can query the graph by keyword to quickly and efficiently view a company's emotional tendency results and the corresponding news. This improves the efficiency of searching for public opinion news related to a company and thereby solves the technical problem of low efficiency in searching for such news.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative news processing method according to an embodiment of the present invention;
FIG. 2 is a system diagram of an alternative news processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a constructed knowledge graph of an alternative news processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of crawled target news for an alternative news processing method according to an embodiment of the invention;
FIG. 5 is a directed acyclic graph of an alternative news processing method according to an embodiment of the present invention;
FIG. 6 is a flow diagram of word segmentation for an alternative news processing method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of entity extraction for an alternative news processing method according to an embodiment of the present invention;
FIG. 8 is a diagram of a news query for an alternative news processing method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an alternative news processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic view of an alternative electronic device according to embodiments of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to a first aspect of the embodiments of the present invention, there is provided a news processing method. Optionally, as shown in FIG. 1, the method includes:
S102, when target news to be identified is acquired, segmenting the target news into words to obtain a plurality of news segments;
S104, identifying a first company entity in the news segments and a first news entity associated with the first company entity;
S106, aligning the first company entity to obtain an aligned target company entity;
S108, analyzing the emotional tendency of the target company entity according to the first news entity;
S110, filling the target company entity and the emotional tendency into a knowledge graph.
Optionally, the method can be applied to processes such as news monitoring and news query. The target news may be news on a news website or news on a website of interest to the user. News information is acquired from the website as target news, and the target news is identified to obtain the target company entity and the emotional tendency of the target company entity. After the target company entity and the emotional tendency are filled into the knowledge graph, the user can query the knowledge graph to quickly find news about one or more companies and quickly check the emotional tendency of that news. The user may also specify a company and specific information and be alerted when news about that company or that information appears. For example, if the news contains a negative message about a company, the user is alerted immediately.
FIG. 2 is a system architecture diagram of this embodiment. The system collects target news, processes the target news, and constructs a knowledge graph.
In this embodiment, processing news to obtain a knowledge graph is divided into several stages, as shown in FIG. 3.
1. A public opinion collection module acquires the target news. This stage may take each of a plurality of news websites as a current news website and perform the following operations: taking one URL of the current news website as a current URL, traversing the target news starting from the current URL, and, after the current URL has been traversed, taking each URL to which the current URL can jump as a new current URL.
In this embodiment, the target news may be acquired from a plurality of websites, which may be preset. One URL of each website, such as its home page, is taken as the current URL and traversed to crawl news content. After the current URL has been crawled, the URLs to which it can jump are taken in turn as the current URL and crawled in the same way.
For example, in public opinion monitoring, the public opinion collection module serves as the acquisition module for the data source and therefore determines the quality of the data source, that is, the quality of the target news. It automatically collects news resources from a plurality of power news websites by means of a web crawler and extracts the text data of the news as the input of the whole system. In this step, pictures and data in other formats may be filtered out so that only text data is extracted.
Step one: power news websites
To ensure that the public opinion graph system is real-time and effective, a plurality of websites that publish power news are selected.
Step two: web crawler
FIG. 4 is a schematic diagram of crawling target news in this embodiment.
1) The web crawler starts from the initial Uniform Resource Locator (URL) of a specified power website;
2) The crawler crawls the page at the initial URL, obtains new URL addresses according to a screening condition, and stores the crawled URL addresses in a URL list for de-duplication, wherein the screening condition specifies all power companies in a certain area. That is, the top-level URL address is obtained first, then the URL addresses at each level below it, and finally the lowest-level URL addresses;
3) Each new URL address obtained in step 2) is stored in a URL queue;
4) A new URL is read from the URL queue, the web page at that URL is crawled, new URLs are obtained from that page, and the crawling process is repeated.
Because the system acquires real-time news data continuously, the crawler has no stop condition and keeps crawling in a loop.
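The crawl loop in steps 1) to 4) can be sketched as a breadth-first traversal with a URL queue and a de-duplication set. This is a minimal illustration, not the patent's implementation: the function `crawl`, its `fetch_links` and `keep` callbacks, the `max_pages` cap, and the `power.example` URLs are all assumptions introduced here, and an in-memory dictionary stands in for real page fetching and link parsing.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def crawl(seed_urls: Iterable[str],
          fetch_links: Callable[[str], List[str]],
          keep: Callable[[str], bool],
          max_pages: Optional[int] = None) -> Iterator[str]:
    """Breadth-first crawl that yields each URL once, in visiting order."""
    queue = deque(seed_urls)   # URL queue of pages still to crawl (steps 3 and 4)
    seen = set(queue)          # URL list used for de-duplication (step 2)
    visited = 0
    # The source describes no stop condition; max_pages is an optional cap
    # added here (assumption) so that demos and tests terminate.
    while queue and (max_pages is None or visited < max_pages):
        url = queue.popleft()
        visited += 1
        yield url
        for link in fetch_links(url):
            # keep() plays the role of the screening condition.
            if keep(link) and link not in seen:
                seen.add(link)
                queue.append(link)


# Hypothetical site map standing in for HTTP requests and HTML link extraction.
SITE: Dict[str, List[str]] = {
    "https://power.example/": [
        "https://power.example/news/1",
        "https://power.example/news/2",
        "https://ads.example/x",
    ],
    "https://power.example/news/1": [
        "https://power.example/news/2",
        "https://power.example/",
    ],
    "https://power.example/news/2": ["https://power.example/news/3"],
    "https://power.example/news/3": [],
}

pages = list(crawl(
    ["https://power.example/"],
    fetch_links=lambda u: SITE.get(u, []),
    keep=lambda u: u.startswith("https://power.example/"),
))
```

Keeping the queue and the seen-set separate means a page linked from many places is still fetched only once, which matters for a crawler that is meant to run indefinitely.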
2. After the target news is acquired, a public opinion preprocessing module processes it.
The public opinion preprocessing module provides word segmentation, entity extraction and entity alignment, sentiment analysis, and keyword extraction. Word segmentation splits the target news into words; entity extraction and entity alignment accurately obtain the electric power company entities in the target news; sentiment analysis classifies the target news as positive or negative and gives a confidence; and keyword extraction obtains the topics of the target news so that a user can clearly grasp the core idea of the news.
1) In the word segmentation stage, word segmentation is applied to the target news acquired by the public opinion collection module to obtain news segments. To prevent electric power company names from being split during segmentation, a dictionary of electric power company organizations is added before segmentation. The word segmentation process can be described as follows: first, the word library is loaded and a dictionary tree (Trie) segmentation model is built; then a Directed Acyclic Graph (DAG) of candidate words is built for the input sentence, where the directed acyclic graph obtained for the sentence 'intentionally having a divergence' is shown in FIG. 5. Next, the globally most probable route through the graph is calculated to obtain the segmentation with the maximum word-frequency probability based on the prefix dictionary. Finally, identifiers are marked according to the dictionary and the segmentation result is output. FIG. 6 is a flow chart of word segmentation.
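The DAG and maximum-probability route described above can be illustrated with a short sketch. It assumes a plain dictionary of word frequencies instead of a Trie, and the five-character sentence and every frequency in `FREQ` are made up for illustration; `build_dag` and `segment` are hypothetical names, not functions from the patent.

```python
import math
from typing import Dict, List, Tuple

# Toy word-frequency dictionary (all values are invented for this example).
FREQ: Dict[str, int] = {
    "有": 50, "有意": 10, "意": 5, "意见": 30,
    "见": 5, "分": 8, "分歧": 20, "歧": 1,
}


def build_dag(sentence: str, freq: Dict[str, int]) -> Dict[int, List[int]]:
    """Map each start index to the end indices of dictionary words starting there."""
    dag: Dict[int, List[int]] = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # an unknown character falls back to a one-character word
    return dag


def segment(sentence: str, freq: Dict[str, int]) -> List[str]:
    """Pick the route through the DAG with the highest total log-probability."""
    log_total = math.log(sum(freq.values()))
    dag = build_dag(sentence, freq)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of the first word)
    route: Dict[int, Tuple[float, int]] = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):  # dynamic programming from right to left
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


dag = build_dag("有意见分歧", FREQ)
words = segment("有意见分歧", FREQ)
```

In this sketch, adding a company name to `FREQ` with a high frequency is one way the organization dictionary could keep that name from being split, since the single-word route then outscores any route that cuts the name apart.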
2) In the entity extraction and entity alignment stage, company entities are extracted first. If a first-level company entity is identified in the news segments, it is taken as the first company entity; if no first-level company entity is identified but a second-level company entity is, the second-level company entity is taken as the first company entity; and if no second-level company entity is identified but a third-level company entity is, the third-level company entity is taken as the first company entity. The first-level, second-level and third-level company entities are company entities divided by regional level, where the first level is higher than the second level and the second level is higher than the third level.
Entity extraction, also called named entity recognition (NER), can identify company entities of different levels in the target news. The levels may be regional levels, such as provincial-level, city-level and county-level company entities. A company may have one or more provincial-level, city-level and county-level company entities; for example, a provincial company may have several city-level subsidiaries. Company entities may be identified hierarchically: first check whether a first-level company entity exists; if not, check whether a second-level company entity exists; and if not, check whether a third-level company entity exists.
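The hierarchical identification described above amounts to trying the level dictionaries from highest to lowest and stopping at the first level that produces a match. The sketch below assumes dictionary lookup on already-segmented text; the company names, the `LEVEL_DICTIONARIES` structure and the function name are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical dictionaries, ordered from the highest regional level to the lowest.
LEVEL_DICTIONARIES: List[Tuple[str, Set[str]]] = [
    ("province", {"A Province Electric Power Company"}),
    ("city", {"B City Power Supply Company"}),
    ("county", {"C County Power Supply Company"}),
]


def extract_company_entities(
    segments: Sequence[str],
    level_dictionaries: Sequence[Tuple[str, Set[str]]] = LEVEL_DICTIONARIES,
) -> Tuple[Optional[str], List[str]]:
    """Return (level, entities) for the highest level that matches any segment."""
    for level, names in level_dictionaries:
        hits = [segment for segment in segments if segment in names]
        if hits:
            # A higher level matched, so lower levels are not examined.
            return level, hits
    return None, []


both = extract_company_entities(
    ["B City Power Supply Company", "supports", "A Province Electric Power Company"]
)
county_only = extract_company_entities(["C County Power Supply Company", "repairs", "lines"])
nothing = extract_company_entities(["weather", "report"])
```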
After entity extraction, the company entities are aligned. A plurality of standard company entities are preset, wherein each standard company entity corresponds to one or more first company entities. Each first company entity is taken as a current company entity, and the following operations are performed: comparing the current company entity with each standard company entity; and taking the standard company entity with the highest similarity to the current company entity as the aligned target company entity.
The standard company entity may be the full name of the company. For example, if a first company entity is a 90% match for a company's full name, the first company entity may be adjusted to that full name.
The main task of entity extraction is to identify the text span of a named entity and classify it into a predefined category; this is the basis of the knowledge graph. Because of the particular nature of power grid organizations, the system uses a rule-based and dictionary-based method to extract entities, with each power grid organization treated as an entity. Entity alignment, put simply, handles the case where an entity is written in different ways but the different forms refer to the same entity. The goal of entity alignment is to build a large, unified knowledge base from the top level, thereby helping the machine understand the underlying data. However, entity alignment still faces many problems and challenges in data quality, matching efficiency, and so on. For data quality, the system constructs a mapping set of power grid organizations online. The specific process is as follows: first, dictionaries and mapping sets of electric power companies at the three levels of province, city and county are established; then entity extraction is performed against companies at these three levels. The extraction logic matches from high level to low level, that is, provincial companies are matched first, then city companies, then county companies. FIG. 7 is a schematic diagram of entity extraction in this embodiment. If a company at a higher level is matched, matching does not continue to the next level. In addition, during entity extraction, the names of all entities are unified using the entity alignment mapping set.
For example, if the text "123 electric power company can fully cope with the weather of low temperature, rain, snow and ice" is segmented to obtain "123 electric power company", where "123" is the abbreviation of an electric power company whose full name is "12345 electric power company limited", entity alignment yields the standard entity "12345 electric power company limited".
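Entity alignment as described above, a mapping set plus a highest-similarity comparison against the standard names, can be sketched as follows. The source does not say which similarity measure is used, so `difflib.SequenceMatcher` from the Python standard library is an assumption, as are the second standard name, the alias and the function name.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence

# Standard company entities (full names). The second entry is invented.
STANDARD_ENTITIES = [
    "12345 electric power company limited",
    "67890 electric power company limited",
]

# Hypothetical mapping set from known written forms to standard names.
ALIAS_MAP: Dict[str, str] = {
    "123 power": "12345 electric power company limited",
}


def align(name: str, standards: Sequence[str], aliases: Dict[str, str]) -> str:
    """Return the standard entity for a name: mapping set first, then best similarity."""
    if name in aliases:
        return aliases[name]
    # Compare the current company entity with every standard entity and keep
    # the one with the highest similarity ratio.
    return max(standards, key=lambda s: SequenceMatcher(None, name, s).ratio())


aligned = align("123 electric power company", STANDARD_ENTITIES, ALIAS_MAP)
from_alias = align("123 power", STANDARD_ENTITIES, ALIAS_MAP)
```

A production system would probably also reject matches below a minimum similarity, in line with the 90% example in the text, rather than always accepting the best candidate.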
3) After the entities are extracted, sentiment analysis is performed.
Sentiment analysis uses natural language processing to automatically determine the sentiment polarity of Chinese text containing subjective descriptions and to give a corresponding confidence. There are two families of methods: semantic methods based on sentiment dictionaries, and methods based on machine learning. A dictionary-based method defines a set of sentiment dictionaries and rules, splits the text into paragraphs, performs syntactic analysis, calculates a sentiment value, and uses the magnitude of that value as the basis for the emotional tendency of the text. Most machine-learning methods treat the task as a classification problem. To judge sentiment polarity, the target sentiment is divided into two classes, positive and negative; the training text is labeled manually, and a supervised machine learning process follows. A power news sentiment corpus can be compiled and added to the training set used to train a pre-trained model, so that the model fits the application scenario of power news. The training data is in tab-separated values (TSV) format: each row is one training sample with two fields separated by a tab. The first field is the emotional tendency, whose value is 0 for negative or 1 for positive; the second field is the text content, which has been segmented into words separated by spaces.
4) After sentiment analysis, keywords are extracted.
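The training data format described above (a 0/1 label, a tab, then space-separated words) can be read with a few lines of code. This is a minimal sketch: the two sample rows are invented English stand-ins for segmented Chinese text, and `load_sentiment_tsv` is a hypothetical helper name.

```python
import io
from typing import Iterable, List, Tuple

# Two invented rows in the described format: label <TAB> space-separated words.
SAMPLE_TSV = (
    "1\tpower supply restored quickly after storm\n"
    "0\tresidents complain about repeated outages\n"
)


def load_sentiment_tsv(lines: Iterable[str]) -> List[Tuple[int, List[str]]]:
    """Parse rows into (label, words); label 0 is negative and 1 is positive."""
    samples: List[Tuple[int, List[str]]] = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        label, separator, text = line.partition("\t")
        if not separator or label not in {"0", "1"}:
            raise ValueError(f"line {line_number}: expected '0' or '1', a tab, then text")
        samples.append((int(label), text.split()))
    return samples


samples = load_sentiment_tsv(io.StringIO(SAMPLE_TSV))
```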
Keyword extraction is the process of selecting terms from a text that describe the meaning of the document. For power news data, the extracted keywords are used as the topic list. The TextRank algorithm is used for keyword extraction, and the default number of extracted keywords is N, where N is a positive integer. TextRank is a graph-based ranking algorithm for keyword extraction and document summarization. It extracts keywords and key phrases from a given text using co-occurrence information (semantics) between the words in the document, and it can also extract key sentences using extractive automatic summarization. The basic idea of TextRank is to treat a document as a network of words in which links represent semantic relationships between words. Formula (1) gives the TextRank calculation:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j) \tag{1}$$
where WS(V_i) is the weight of sentence i, and the summation on the right is the contribution of each adjacent sentence to this sentence. Within a single document, all sentences can be roughly regarded as adjacent, so there is no need to generate and extract multiple windows as for multiple documents; a single document window is sufficient. w_ji is the similarity between the two sentences, WS(V_j) is the weight of sentence j from the previous iteration, and d is the damping coefficient, set to 0.85. V denotes a node of the graph, In(V_i) is the set of nodes that point to V_i, and Out(V_j) is the set of nodes that V_j points to. When keywords rather than key sentences are extracted, the nodes are words and w_ji is the co-occurrence weight between two words.
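Formula (1) can be iterated directly on a word co-occurrence graph. The sketch below is one common way to apply TextRank to keywords, not necessarily the configuration used in the patent: the undirected co-occurrence window, the fixed iteration count, the function name and the sample word list are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple


def textrank_keywords(words: Sequence[str], top_n: int = 3, window: int = 1,
                      d: float = 0.85, iterations: int = 50) -> List[Tuple[str, float]]:
    """Rank words with formula (1) on an undirected co-occurrence graph."""
    weights: DefaultDict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        # Link each word to the next `window` words that follow it.
        for j in range(i + 1, min(i + window + 1, len(words))):
            other = words[j]
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    scores = {word: 1.0 for word in weights}
    for _ in range(iterations):
        updated = {}
        for vi in weights:
            contribution = 0.0
            for vj, w_ji in weights[vi].items():
                out_sum = sum(weights[vj].values())  # sum of w_jk over Out(V_j)
                contribution += w_ji / out_sum * scores[vj]
            updated[vi] = (1 - d) + d * contribution
        scores = updated

    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


sample = ["grid", "outage", "grid", "repair", "grid", "storm"]
ranked = textrank_keywords(sample, top_n=4)
```

In the sample, "grid" co-occurs with every other word, so it collects the contributions of all its neighbours and ranks first.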
3. After the target news is processed, a public opinion knowledge graph module constructs the knowledge graph.
In the application, the knowledge graph may be constructed from the target company entity and its emotional tendency alone, or from the target company entity, its emotional tendency and other information together. The other information may include keywords in the target news, the release time of the target news, and the like. The target company entity is taken as a first entity in the knowledge graph; the target news related to the target company entity is taken as a second entity related to the first entity in the knowledge graph; and the emotional tendency of the target company entity, the target keyword and the target time information are taken as the attribute information of the second entity.
When the public opinion graph module constructs the knowledge graph, it provides three external services: news import, news display and news retrieval. The system uses neo4j as the graph database. After the system starts, all power organization entities and relations of a province, including the provincial company, a plurality of city-level companies and a plurality of county-level companies, are initialized in the graph database. Then, when the crawler module crawls new data, the data is processed by the preprocessing module and sent to the news import service. In the import service, a news entity is added with the news name as the entity name and the emotional tendency, keywords, time information and the like as entity attributes. In addition, the power company level is obtained from entity extraction and entity alignment, and a relation link is added between the news entity and the corresponding power company. Finally, the public opinion graph module displays the news data externally as a graph in real time and supports querying the specific information of an entity by entity name.
1) News import
For news data crawled and processed in real time, a news entity is created with "news" as the entity label, the news name as the entity name, and the news emotion, topics, time, and the like as entity attributes. In addition, the public opinion preprocessing module obtains the relation between the news data and the corresponding power grid company, and this relation is used to establish a relation link between the news entity and the company entity, where the relation attribute is "news".
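As a non-limiting illustration, the import step might be expressed as a parameterized Cypher statement for the graph database. The sketch below only builds the statement text and its parameters and does not call any database client. The labels `News` and `Company`, the relationship type `NEWS`, its direction, the property names, and the function `build_news_import` are assumptions made for illustration; the application states only that news is the entity label and the relation attribute.

```python
from typing import Any, Dict, List, Tuple

# Assumed Cypher text: find the existing company entity, create the news
# entity if it does not exist yet, set its attributes, then link the two.
IMPORT_NEWS_CYPHER = """
MATCH (c:Company {name: $company})
MERGE (n:News {name: $title})
SET n.sentiment = $sentiment,
    n.topics = $topics,
    n.time = $time
MERGE (n)-[:NEWS]->(c)
"""


def build_news_import(
    title: str,
    sentiment: str,
    topics: List[str],
    time: str,
    company: str,
) -> Tuple[str, Dict[str, Any]]:
    """Return the statement and the parameters for one news import."""
    params = {
        "title": title,
        "sentiment": sentiment,
        "topics": topics,
        "time": time,
        "company": company,
    }
    return IMPORT_NEWS_CYPHER, params


# Hypothetical sample values, for illustration only.
query, params = build_news_import(
    "Example headline", "positive", ["grid upgrade"], "2022-01-01",
    "Example City Power Supply Company",
)
```

Using `MERGE` instead of `CREATE` for the news entity keeps the import idempotent if the same news item is crawled twice; passing values as parameters avoids building the statement by string concatenation.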
2) News display
By default, the news display submodule displays all graph data in real time.
3) News retrieval
Information on the news linked to a power company can be queried by the entity name of the power company. For example, when the name of a certain power supply branch company is entered, the two news items related to that branch company are returned, as shown in fig. 8.
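As a non-limiting illustration, retrieval by entity name might behave as in the following in-memory sketch, which stands in for a query against the graph database. The class `PublicOpinionGraph`, its methods, and the sample company and news names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class PublicOpinionGraph:
    """Toy stand-in for the graph database used by the retrieval service."""

    def __init__(self) -> None:
        self._news: Dict[str, Dict[str, Any]] = {}              # news name -> attributes
        self._links: Dict[str, List[str]] = defaultdict(list)   # company -> news names

    def import_news(self, company: str, name: str, **attributes: Any) -> None:
        """Store a news entity and link it to the given company entity."""
        self._news[name] = dict(attributes)
        self._links[company].append(name)

    def retrieve(self, company: str) -> List[Dict[str, Any]]:
        """Return every news entity linked to the company, with attributes."""
        return [
            {"name": name, **self._news[name]}
            for name in self._links.get(company, [])
        ]


# Hypothetical sample data: one branch company with two related news items.
graph = PublicOpinionGraph()
branch = "Example Power Supply Branch Company"
graph.import_news(branch, "News A", sentiment="positive", time="2022-01-01")
graph.import_news(branch, "News B", sentiment="negative", time="2022-01-02")
results = graph.retrieve(branch)
```

A query for a company with no linked news returns an empty list rather than raising an error.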
In the present application, data from a plurality of electric power news websites is collected in real time by a crawler. The crawled news texts are segmented into words, and entity extraction and entity alignment are then performed according to the existing organizational structure to obtain company entities of different levels, such as provincial, city-level, and county-level power grid companies. Emotion analysis and keyword extraction are performed on each piece of news to obtain its positive or negative emotion and its news topics. The news name is taken as a news entity of the public opinion graph, with the emotion and topics as attributes of the news entity; the news entity is added to the public opinion graph, and relation links between the news entity and the power grid companies are established according to the company level. Finally, the system provides display, retrieval, and other functions externally in graph form, presenting the public opinion information of the power grid company visually, objectively, and truthfully, helping users quickly grasp the topics, emotions, time, and overall situation of public opinion events, and laying a foundation for the public opinion work of the power grid company.
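As a non-limiting illustration of the entity alignment step, a company mention extracted from news text might be mapped to the most similar preset standard company name. The sketch below uses the character-level ratio of Python's `difflib.SequenceMatcher` as the similarity measure; this measure is an assumption, since the application does not specify how similarity is computed, and the company names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical preset standard company entities at three regional levels.
STANDARD_COMPANIES = [
    "Example Province Electric Power Company",
    "Example City Power Supply Company",
    "Example County Power Supply Company",
]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def align(mention: str, standards: List[str]) -> str:
    """Return the standard company entity most similar to the mention."""
    return max(standards, key=lambda s: similarity(mention, s))


aligned = align("Example City Power Supply Co.", STANDARD_COMPANIES)
```

In practice a minimum similarity threshold could be added so that a mention unrelated to every standard entity is rejected instead of being forced onto the nearest one.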
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present application, there is also provided a news processing apparatus, as shown in fig. 9, including:
the word segmentation module 902 is configured to segment the target news to obtain a plurality of news segments when the target news to be identified is obtained;
an identifying module 904 for identifying a first corporate entity in the news segment and a first news entity associated with the first corporate entity;
an alignment module 906, configured to align the first company entity to obtain an aligned target company entity;
an analysis module 908 for analyzing emotional tendencies of the target corporate entity based on the first news entity;
and a filling module 910, configured to fill the target company entity and emotional tendency into the knowledge graph.
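As a non-limiting illustration, the five modules listed above might be chained as follows. Every function body is a simplified stand-in: whitespace splitting replaces a real word segmenter, a lookup table replaces entity recognition and alignment, and a word-list vote replaces the emotion analysis model. All names, tables, and sample text are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables standing in for trained models or dictionaries.
COMPANY_ALIASES = {"examplegrid": "ExampleGrid Power Supply Company"}
POSITIVE_WORDS = {"restored", "award"}
NEGATIVE_WORDS = {"outage", "complaint"}


def segment(news: str) -> List[str]:
    """Word segmentation module (902): whitespace split as a stand-in."""
    return news.lower().split()


def identify(tokens: List[str]) -> Tuple[Optional[str], List[str]]:
    """Identifying module (904): first company mention plus remaining tokens."""
    company = next((t for t in tokens if t in COMPANY_ALIASES), None)
    news_entities = [t for t in tokens if t != company]
    return company, news_entities


def align(mention: str) -> str:
    """Alignment module (906): map a mention to its standard company name."""
    return COMPANY_ALIASES[mention]


def analyze(news_entities: List[str]) -> str:
    """Analysis module (908): word-list vote over the associated news entities."""
    score = sum(w in POSITIVE_WORDS for w in news_entities) - sum(
        w in NEGATIVE_WORDS for w in news_entities
    )
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def fill(graph: Dict[str, List[str]], company: str, tendency: str) -> None:
    """Filling module (910): record the tendency under the company."""
    graph.setdefault(company, []).append(tendency)


def process(news: str, graph: Dict[str, List[str]]) -> None:
    """Run one target news item through all five modules in order."""
    company, news_entities = identify(segment(news))
    if company is None:
        return  # no company entity found; nothing to fill
    fill(graph, align(company), analyze(news_entities))


graph: Dict[str, List[str]] = {}
process("ExampleGrid restored power after outage and complaint", graph)
```

In this sample the text contains one positive word and two negative words, so the tendency recorded for the aligned company is negative.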
Optionally, the present application can be applied to news monitoring, news query, and similar processes. The target news may be news on a news website or news on a website the user is interested in. News information is acquired from the website as the target news, and the target news is identified to obtain the target company entity and its emotional tendency. After the target company entity and the emotional tendency are filled into the knowledge graph, the user can query the knowledge graph to quickly find the news information of one or more companies and check the emotional tendency of that news. The user may also specify a particular company and particular information, and an alert is raised when news about that company or that information appears. For example, if the news contains a negative message about a company, the user is alerted immediately.
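As a non-limiting illustration, the alert described above might be implemented as a filter over newly filled records. The record layout, the function `find_alerts`, and the message format are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def find_alerts(
    records: Iterable[Dict[str, str]],
    watched_companies: Set[str],
    watched_tendency: str = "negative",
) -> List[str]:
    """Return one alert message per record that matches the user's rule."""
    alerts = []
    for record in records:
        if (
            record["company"] in watched_companies
            and record["tendency"] == watched_tendency
        ):
            alerts.append(
                f"ALERT: {record['tendency']} news about "
                f"{record['company']}: {record['title']}"
            )
    return alerts


# Hypothetical records as they might look after filling the knowledge graph.
records = [
    {"company": "Company A", "tendency": "negative", "title": "News 1"},
    {"company": "Company A", "tendency": "positive", "title": "News 2"},
    {"company": "Company B", "tendency": "negative", "title": "News 3"},
]
alerts = find_alerts(records, watched_companies={"Company A"})
```

Only the negative item about the watched company produces an alert; the positive item and the item about an unwatched company are ignored.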
For other examples of this embodiment, please refer to the above examples, which are not described herein again.
Fig. 10 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 10, including a processor 1002, a communication interface 1004, a memory 1006, and a communication bus 1008, where the processor 1002, the communication interface 1004, and the memory 1006 communicate with each other via the communication bus 1008, where,
a memory 1006 for storing a computer program;
the processor 1002, when executing the computer program stored in the memory 1006, implements the following steps:
under the condition of acquiring target news to be identified, segmenting the target news to obtain a plurality of news segments;
identifying a first corporate entity in a news segment and a first news entity associated with the first corporate entity;
aligning the first company entity to obtain an aligned target company entity;
analyzing the emotional tendency of the target company entity according to the first news entity;
and filling the target company entity and emotional tendency into the knowledge graph.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus. The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory, such as at least one disk memory. Optionally, the memory may be at least one storage device located remotely from the processor.
As an example, the memory 1006 may include, but is not limited to, the word segmentation module 902, the identifying module 904, the alignment module 906, the analysis module 908, and the filling module 910 of the news processing apparatus. In addition, the memory may further include, but is not limited to, other module units of the news processing apparatus, which are not described again in this example.
The processor may be a general-purpose processor, including but not limited to a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 10 is only illustrative. The device implementing the above news processing method may be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 10, or have a configuration different from that shown in fig. 10.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, performs the steps in the above-mentioned news processing method.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A news processing method, comprising:
under the condition of obtaining target news to be identified, performing word segmentation on the target news to obtain a plurality of news word segments;
identifying a first corporate entity in the news segment and a first news entity associated with the first corporate entity;
aligning the first company entity to obtain an aligned target company entity;
analyzing emotional tendency of the target company entity according to the first news entity;
and filling the target company entity and the emotional tendency into a knowledge graph.
2. The method of claim 1, wherein the identifying the first corporate entity in the news segment comprises:
in the event that a corporate entity of a first level is identified in the news word segments, taking the identified corporate entity of the first level as the first corporate entity;
in the event that no corporate entity of the first level is identified and a corporate entity of a second level is identified, taking the identified corporate entity of the second level as the first corporate entity;
and in the event that no corporate entity of the second level is identified and a corporate entity of a third level is identified, taking the identified corporate entity of the third level as the first corporate entity, wherein the corporate entities of the first level, the second level, and the third level are corporate entities classified by regional level, the first level is higher than the second level, and the second level is higher than the third level.
3. The method of claim 2, wherein identifying the first news entity associated with the first corporate entity comprises:
identifying a news entity within the same topic news as the first corporate entity as the first news entity associated with the first corporate entity; or
identifying a news entity within the same paragraph of the same topic news as the first corporate entity as the first news entity associated with the first corporate entity; or
identifying a news entity within the same sentence of the same topic news as the first corporate entity as the first news entity associated with the first corporate entity.
4. The method of claim 1, wherein aligning the first corporate entity to obtain an aligned target corporate entity comprises:
presetting a plurality of standard company entities, wherein each standard company entity corresponds to one or more first company entities;
taking each first company entity as a current company entity, and executing the following operations:
comparing the current company entity with each standard company entity;
and taking the standard company entity with the highest similarity with the current company entity as an aligned target company entity.
5. The method of claim 1, further comprising:
identifying the first news entity to obtain a target keyword related to the target company entity in the target news;
and acquiring target time information of the target news where the target keyword is located, wherein the target time information is a release time point of the target news.
6. The method of claim 5, wherein populating a knowledge graph with the target corporate entity and the emotional tendencies comprises:
identifying the target corporate entity as a first entity in the knowledge-graph;
identifying a target news item associated with the target corporate entity as a second entity in the knowledge graph associated with the first entity;
and taking the emotional tendency, the target keyword and the target time information of the target company entity as the attribute information of the second entity.
7. The method of any one of claims 1 to 6, further comprising:
taking each news website in the plurality of news websites as a current news website, and executing the following operations:
and taking a web address of the current news website as a current web address, traversing the target news from the current web address, and, after the current web address has been traversed, taking a web address to which the current web address can jump as a new current web address.
8. A news processing apparatus, comprising:
the word segmentation module is used for segmenting the target news to obtain a plurality of news segmented words under the condition of acquiring the target news to be identified;
an identification module to identify a first corporate entity in the news segment and a first news entity associated with the first corporate entity;
the alignment module is used for aligning the first company entity to obtain an aligned target company entity;
the analysis module is used for analyzing the emotional tendency of the target company entity according to the first news entity;
and the filling module is used for filling the target company entity and the emotional tendency into a knowledge graph.
9. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN202210316535.4A 2022-03-28 2022-03-28 News processing method and device, storage medium and electronic equipment Pending CN114706948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210316535.4A CN114706948A (en) 2022-03-28 2022-03-28 News processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210316535.4A CN114706948A (en) 2022-03-28 2022-03-28 News processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114706948A true CN114706948A (en) 2022-07-05

Family

ID=82171546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210316535.4A Pending CN114706948A (en) 2022-03-28 2022-03-28 News processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114706948A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112809A (en) * 2023-10-25 2023-11-24 卓世科技(海南)有限公司 Knowledge tracking method and system
CN117112809B (en) * 2023-10-25 2024-01-26 卓世科技(海南)有限公司 Knowledge tracking method and system

Similar Documents

Publication Publication Date Title
CN108874777B (en) Text anti-spam method and device
US8630972B2 (en) Providing context for web articles
US8161059B2 (en) Method and apparatus for collecting entity aliases
CN102053991B (en) Method and system for multi-language document retrieval
KR101605430B1 (en) SYSTEM AND METHOD FOR BUINDING QAs DATABASE AND SEARCH SYSTEM AND METHOD USING THE SAME
Srinath et al. Privacy at scale: Introducing the PrivaSeer corpus of web privacy policies
CN113282955B (en) Method, system, terminal and medium for extracting privacy information in privacy policy
US20180025012A1 (en) Web page classification based on noise removal
CN112507160A (en) Automatic judgment method and device for trademark infringement, electronic equipment and storage medium
CN111192176B (en) Online data acquisition method and device supporting informatization assessment of education
CN107679075B (en) Network monitoring method and equipment
CN110737821B (en) Similar event query method, device, storage medium and terminal equipment
CN112258254B (en) Internet advertisement risk monitoring method and system based on big data architecture
WO2014029318A1 (en) Method and apparatus for identifying webpage type
CN109165373B (en) Data processing method and device
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN105512300B (en) information filtering method and system
CN112818200A (en) Data crawling and event analyzing method and system based on static website
CN112149422A (en) Enterprise news dynamic monitoring method based on natural language
Gopal et al. Machine learning based classification of online news data for disaster management
CN111222031A (en) Website distinguishing method and system
CN108595466B (en) Internet information filtering and internet user information and network card structure analysis method
CN114706948A (en) News processing method and device, storage medium and electronic equipment
CN105183843A (en) List page recognition system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination