WO2019136920A1

WO2019136920A1 - Presentation method for visualization of topic evolution, application server, and computer readable storage medium

Info

Publication number: WO2019136920A1
Application number: PCT/CN2018/090694
Authority: WO
Inventors: 王健宗; 吴天博; 黄章成; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-01-12
Filing date: 2018-06-11
Publication date: 2019-07-18
Also published as: CN108170838A; CN108170838B

Abstract

Disclosed is a presentation method for the visualization of topic evolution, comprising: extracting topics related to a plurality of text material of a same event, and determining an association between each of the topics so as to establish a topic stream; screening from the plurality of topics a plurality of first topics that contain important events; extracting keywords of each of the first topics, and determining an association between the keywords of each of the first topics; and adding the keywords of each of the first topics and the associations thereof to the topic stream, so as to generate topic evolution context graphs corresponding to the plurality of text material. Further provided in the present application is an application server and a computer readable storage medium. The presentation method for the visualization of topic evolution, the application server and the computer readable storage medium provided in the present application may visually display a topic evolution process of an event, such that users may quickly understand and analyze the evolution process of the whole event.

Description

Visualization method for topic evolution, application server and computer readable storage medium

This application claims priority to Chinese Patent Application No. 201810031859.7, entitled "Visualization of Topic Evolution, Application Server and Computer Readable Storage Media", filed on January 12, 2018, the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of image processing technologies, and in particular, to a visual presentation method for topic evolution, an application server, and a computer readable storage medium.

Background technique

In the era of information explosion, people can read and download all kinds of news reports on a news topic from the Internet for free. Because the Internet carries a large number of related articles on any given news topic (especially a hot one), it is difficult to grasp the development trend and evolution of a target topic efficiently from so many reports. Understanding how certain topics evolve on social media matters to investors and managers: once they understand what a topic means, they can make appropriate judgments and act accordingly. However, it is difficult to analyze a topic's evolution in a timely manner. The major events contained in each topic cannot be quickly detected and distinguished, and there is no effective mechanism for identifying the generation, termination, splitting, and merging of topics.

Summary of the invention

In view of this, the present application proposes a visual presentation method for topic evolution, an application server, and a computer readable storage medium, which can visually display the topic evolution process of an event, so that the user can quickly understand and analyze the evolution of the entire event.

First, in order to achieve the above object, the present application provides an application server. The application server includes a memory and a processor, the memory stores a visual presentation system for topic evolution that can be run on the processor, and the visual presentation system for topic evolution implements the following steps when executed by the processor:

Extracting topics of multiple text materials related to the same event, and determining an association relationship between the topics to establish a topic stream;

Screening, from the plurality of topics, a plurality of first topics that contain important events;

Extracting keywords of each of the first topics, and determining associations of keywords of each of the first topics; and

Adding keywords of each of the first topics and their associated relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.

In addition, in order to achieve the above object, the present application further provides a visual presentation method for topic evolution, which is applied to an application server and includes the four steps set out above.

Further, in order to achieve the above object, the present application further provides a computer readable storage medium storing a visual presentation system for topic evolution, the visual presentation system for topic evolution being executable by at least one processor to cause the at least one processor to perform the steps of the visual presentation method for topic evolution described above.

Compared with the prior art, the visual presentation method for topic evolution, the application server, and the computer readable storage medium proposed by the present application first extract the topics of multiple text materials related to the same event and determine the association relationship between the topics to establish a topic stream; second, screen a plurality of first topics containing important events from the plurality of topics; next, extract the keywords of each first topic and determine the association relationship of those keywords; and finally, add the keywords of each first topic and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials. In this way, topics can be mined from time-ordered social events, and the evolution trend of an event can be displayed visually as a topic stream over time, giving users a better understanding of the evolution of topics and of the major events. This avoids topic drift caused by topic association, helps users understand the deeper meaning of a topic, and helps them avoid misunderstandings or wrong decisions.

DRAWINGS

FIG. 1 is a schematic diagram of an optional hardware architecture of the application server of the present application;

FIG. 2 is a schematic diagram of the program modules of a first embodiment of the visual presentation system for topic evolution of the present application;

FIG. 3 is a schematic diagram of the program modules of a second embodiment of the visual presentation system for topic evolution of the present application;

FIG. 4 is a schematic flowchart of the implementation of a first embodiment of the visual presentation method for topic evolution of the present application;

FIG. 5 is a schematic flowchart of the implementation of a second embodiment of the visual presentation method for topic evolution of the present application.

Reference numerals:

Application server	2
Memory	11
Processor	12
Network interface	13
Visual presentation system for topic evolution	100
First extraction module	101
Screening module	102
Second extraction module	103
Generation module	104
Labeling module	105

The implementation, functional features and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

It should be noted that descriptions such as "first" and "second" in the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only insofar as a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and does not fall within the scope of protection claimed by the present application.

Referring to FIG. 1, it is a schematic diagram of an optional hardware architecture of the application server 2 of the present application.

In this embodiment, the application server 2 may include, but is not limited to, the memory 11, the processor 12, and the network interface 13 being communicably connected to each other through a system bus. It is pointed out that Figure 1 only shows the application server 2 with components 11-13, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.

The application server 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server. The application server 2 may be a stand-alone server or a server cluster composed of multiple servers.

The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 2, such as a hard disk or memory of the application server 2. In other embodiments, the memory 11 may also be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 2. Of course, the memory 11 can also include both the internal storage unit of the application server 2 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system and various types of application software installed in the application server 2, such as the program code of the visual presentation system 100 for topic evolution. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used to control the overall operation of the application server 2, such as performing control and processing related to data interaction or communication with the terminal device 1. In this embodiment, the processor 12 is configured to run program code or process data stored in the memory 11, for example to run the visual presentation system 100 for topic evolution.

The network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the application server 2 and other electronic devices.

So far, the hardware structure and functions of the devices related to the present application have been described in detail. Various embodiments of the present application are set out below on the basis of the above description.

First, the present application proposes a visual presentation system 100 for topic evolution.

Referring to FIG. 2, it is a program module diagram of the first embodiment of the visual presentation system 100 for topic evolution of the present application.

In this embodiment, the visual presentation system 100 for topic evolution includes a series of computer program instructions stored in the memory 11. When the computer program instructions are executed by the processor 12, the visual presentation operations for topic evolution of the embodiments of the present application can be implemented. In some embodiments, the visual presentation system 100 for topic evolution may be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the visual presentation system 100 for topic evolution is divided into a first extraction module 101, a screening module 102, a second extraction module 103, and a generation module 104. Specifically:

The first extraction module 101 is configured to extract topics of multiple text materials related to the same event, and determine an association relationship between the topics to establish a topic stream.

In an embodiment, the text material may be online news text, and the first extraction module 101 may extract a plurality of news texts related to the same event by accessing the network. Specifically, a plurality of news texts related to the event may be searched for and extracted from the network by inputting keywords of the event (for example, the place where the event occurred, the main persons involved, or the incident itself), and the topic of each news text is then extracted from the retrieved news texts. The first extraction module 101 may acquire elements such as the persons, places, and incidents of the current news text, and generate an event summary from these elements as the topic of that news text.

In an embodiment, the first extraction module 101 is further configured to preprocess the extracted plurality of text materials. The pre-processing may include: segmenting, simplifying, replacing ambiguous words, removing stop words, low frequency words, numbers, punctuation marks, and the like.
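The pre-processing described above can be sketched as follows. This is a minimal illustration for English text, not the system's actual implementation: the stop-word list, the `min_count` threshold, and the function name `preprocess` are assumptions introduced here, and word segmentation, simplification, and replacement of ambiguous words are not covered.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would load a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}

def preprocess(documents, min_count=2):
    """Tokenize documents and drop stop words, numbers, punctuation and rare words."""
    tokenized = []
    for text in documents:
        # Lowercase and keep alphabetic runs only; this drops digits and punctuation.
        tokens = re.findall(r"[a-z]+", text.lower())
        tokenized.append([t for t in tokens if t not in STOP_WORDS])
    # Count corpus-wide frequencies so that low-frequency words can be removed.
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]

docs = ["The flood hit the city in 2018.", "City officials respond to the flood!"]
cleaned = preprocess(docs)
```

With these two sample sentences, only the words that occur in both documents survive the low-frequency filter.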

In an embodiment, the first extraction module 101 may model each topic with a hierarchical Dirichlet process, recording the i-th text datum that arrives at time t together with the cluster in which it is located. If the cluster labels of a datum at two time points are inconsistent, the topic of that datum can be considered to have changed, so that two quantities can be calculated to derive the splitting and merging of topics. The two quantities are the proportion of cluster r at time t that comes from cluster s at time t-1:

[formula not reproduced in the source text]

and the proportion of cluster s at time t-1 that flows into cluster r at time t:

[formula not reproduced in the source text]
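Since the formulas for the two ratios are not reproduced in the text, the following sketch shows one plausible way to compute them from cluster labels. It is an illustration under stated assumptions, not the formula of the application: the dictionary representation of labels, the restriction to documents present at both time points, and the names `flow_ratios`, `from_ratio`, and `to_ratio` are all introduced here.

```python
from collections import Counter

def flow_ratios(labels_prev, labels_curr):
    """Return (from_ratio, to_ratio), both keyed by (s, r) cluster pairs.

    labels_prev / labels_curr map a document id to its cluster label at
    time t-1 and time t (assumed representation).
    """
    # Only documents labelled at both time points can be compared.
    shared = labels_prev.keys() & labels_curr.keys()
    flows = Counter((labels_prev[d], labels_curr[d]) for d in shared)
    size_prev = Counter(labels_prev[d] for d in shared)
    size_curr = Counter(labels_curr[d] for d in shared)
    # Share of cluster r at time t that came from cluster s at time t-1.
    from_ratio = {(s, r): n / size_curr[r] for (s, r), n in flows.items()}
    # Share of cluster s at time t-1 that flowed into cluster r at time t.
    to_ratio = {(s, r): n / size_prev[s] for (s, r), n in flows.items()}
    return from_ratio, to_ratio

prev = {"d1": "A", "d2": "A", "d3": "A", "d4": "B"}
curr = {"d1": "A", "d2": "C", "d3": "C", "d4": "C"}
from_ratio, to_ratio = flow_ratios(prev, curr)
```

In this example cluster A sends documents to both A and C (a split), while C receives documents from both A and B (a merge).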

In an embodiment, the generation and termination of topics can be detected by applying a hash table. In the hash table, each topic corresponds to a unique storage location, which is used to detect the generation and termination of that topic.

In an embodiment, the first extraction module 101 may sort the topics of the text materials according to the posting time of each text material. The topic stream established by the first extraction module 101 represents the evolution of a plurality of topics over time, and the height of the topic stream may represent the number of documents belonging to the topic. The topic stream can also split into several branches, and several branches can also merge into one topic.
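A hash-table approach of this kind might look like the sketch below, which uses a Python dict as the hash table. The input format (one set of active topic ids per time step) and the function name `detect_birth_and_end` are assumptions made for illustration.

```python
def detect_birth_and_end(active_by_time):
    """Detect topic generation and termination with a hash table (dict).

    active_by_time is a list of sets; entry t holds the topic ids that are
    active at time step t (assumed input format).
    """
    table = {}  # topic id -> [first_seen, last_seen]
    for t, topics in enumerate(active_by_time):
        for topic in topics:
            if topic in table:
                table[topic][1] = t      # still alive: update the last step seen
            else:
                table[topic] = [t, t]    # first occurrence: the topic is generated
    last_step = len(active_by_time) - 1
    births = {topic: first for topic, (first, _) in table.items()}
    # A topic has ended if it is absent from every step after last_seen.
    ends = {topic: last for topic, (_, last) in table.items() if last < last_step}
    return births, ends

births, ends = detect_birth_and_end([{"A"}, {"A", "B"}, {"B"}])
```

Each topic occupies exactly one slot in the table, so both checks cost constant time per observation.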
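The relationship between posting time, topic, and stream height can be illustrated with a short sketch. The `(posting_time, topic)` pair format and the name `stream_heights` are assumptions; ISO-formatted date strings are used so that plain sorting yields chronological order.

```python
from collections import Counter, defaultdict

def stream_heights(documents):
    """Count documents per topic per time bucket.

    documents is an iterable of (posting_time, topic) pairs (assumed format).
    Returns {topic: [(time, count), ...]} with each list sorted by time.
    """
    counts = defaultdict(Counter)
    for posted, topic in documents:
        counts[topic][posted] += 1
    return {topic: sorted(c.items()) for topic, c in counts.items()}

docs = [("2018-01-02", "flood"), ("2018-01-01", "flood"),
        ("2018-01-02", "flood"), ("2018-01-02", "rescue")]
heights = stream_heights(docs)
```

Each `(time, count)` pair gives the height of that topic's band in the stream at that time.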

The screening module 102 is configured to filter a plurality of first topics including important events from a plurality of the topics.

In an embodiment, the plurality of first topics are preferably topics that have been split or merged. The splitting and merging of topics can be represented by scores. Specifically, an information entropy algorithm can be used to calculate the scores. The score of a merged topic can be calculated by the following formula:

[formula not reproduced in the source text]

where R(r, t) is the ranking score of cluster r at time t and N_r is the number of elements flowing into cluster r. The score of a split topic can be calculated by the following formula:

[formula not reproduced in the source text]

where R(s, t) is the ranking score of cluster s at time t, and N_s is the number of elements flowing into cluster r.

The screening module 102 may, according to the calculated score of each topic, select a number of top-ranked topics (with scores sorted in descending order) as the first topics containing important events. For example, the screening module 102 selects the ten highest-scoring topics as the first topics.
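The scoring formulas themselves are not reproduced in the text, so the sketch below substitutes a plain Shannon entropy over the shares of elements flowing into a cluster. This is a stand-in chosen for illustration, not the formula of the application, and the names `merge_entropy` and `top_topics` are hypothetical. The selection step, in contrast, follows the description directly: sort by score in descending order and keep the first k.

```python
import math

def merge_entropy(inflows):
    """Shannon entropy of the shares of elements flowing into one cluster.

    inflows maps a source cluster to the number of elements it contributes.
    This is a stand-in score, not the formula from the application.
    """
    total = sum(inflows.values())
    if total == 0:
        return 0.0
    shares = [n / total for n in inflows.values() if n > 0]
    return -sum(p * math.log2(p) for p in shares)

def top_topics(scores, k=10):
    """Return the k topic ids with the highest scores, largest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

scores = {
    "r1": merge_entropy({"s1": 5, "s2": 5}),   # even merge of two sources
    "r2": merge_entropy({"s1": 10}),           # single source, no merge
    "r3": merge_entropy({"s1": 9, "s2": 1}),   # lopsided merge
}
first_topics = top_topics(scores, k=2)
```

Under this stand-in, an even merge of several sources scores highest and a cluster fed by a single source scores zero.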

The second extraction module 103 is configured to extract keywords of each of the first topics, and determine an association relationship of keywords of each of the first topics.

In an embodiment, the second extraction module 103 may extract the keywords of each first topic using the TF-IDF algorithm. The TF-IDF algorithm can be used to assess how important a word is to a topic text. The importance of a word increases in proportion to the number of times it appears in the text, and decreases with the number of texts in the corpus that contain it. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the topic text, the higher its TF-IDF value. The second extraction module 103 can therefore take the words with the highest TF-IDF values as the keywords of the topic text. For example, the words whose TF-IDF values rank in the top five are used as the keywords of the first topic.

In an embodiment, the second extraction module 103 may determine the association relationship of the keywords of each first topic by a hierarchical Dirichlet process. The second extraction module 103 may further determine the association relationship of the keywords of each first topic in combination with the node position of that first topic in the topic stream.
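A minimal TF-IDF keyword extractor is sketched below using only the standard library. It assumes that each first topic is represented by a single token list and treats each topic text as one document when computing IDF; the function name `tfidf_keywords` and the tie-breaking by alphabetical order are choices made here for illustration.

```python
import math
from collections import Counter

def tfidf_keywords(topic_docs, top_n=5):
    """Return the top_n TF-IDF words for each topic.

    topic_docs maps a topic id to its token list (assumed format); each
    topic's text is treated as one document when computing IDF.
    """
    n_docs = len(topic_docs)
    # Document frequency: in how many topic texts each word occurs.
    df = Counter(w for tokens in topic_docs.values() for w in set(tokens))
    keywords = {}
    for topic, tokens in topic_docs.items():
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n_docs / df[w])
                  for w, c in tf.items()}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[topic] = ranked[:top_n]
    return keywords

topics = {
    "flood": ["river", "flood", "flood", "city"],
    "rescue": ["team", "rescue", "rescue", "city"],
}
keywords = tfidf_keywords(topics, top_n=2)
```

The word "city" appears in both topic texts, so its IDF is zero and it is never selected as a keyword here.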

The generating module 104 is configured to add a keyword of each of the first topics and an association relationship thereof to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.

In an embodiment, the generating module 104 may visualize keywords of each of the first topics and their associated relationships as word clouds overlapping on the topic stream. The topic evolution context map can be displayed by a display module (not shown).

Through the above program modules 101-104, the visual presentation system 100 for topic evolution proposed by the present application first extracts the topics of multiple text materials related to the same event and determines the association relationship between the topics to establish a topic stream; second, screens a plurality of first topics containing important events from the plurality of topics; next, extracts the keywords of each first topic and determines the association relationship of those keywords; and finally, adds the keywords of each first topic and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials. In this way, topics can be mined from time-ordered social events, and the evolution trend of an event can be displayed visually as a topic stream over time, giving users a better understanding of the evolution of topics and of the major events. This avoids topic drift caused by topic association, helps users understand the deeper meaning of a topic, and helps them avoid misunderstandings or wrong decisions.

Referring to FIG. 3, it is a program module diagram of the second embodiment of the visual presentation system 100 for topic evolution of the present application. In this embodiment, the visual presentation system 100 for topic evolution includes a series of computer program instructions stored in the memory 11. When the computer program instructions are executed by the processor 12, the visual presentation operations for topic evolution of the embodiments of the present application can be implemented. In some embodiments, the visual presentation system 100 for topic evolution may be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 3, the visual presentation system 100 for topic evolution is divided into a first extraction module 101, a screening module 102, a second extraction module 103, a generation module 104, and a labeling module 105. The program modules 101-104 are the same as in the first embodiment of the visual presentation system 100 for topic evolution, and the labeling module 105 is added to them. Specifically:

In an embodiment, the text material may be online news text, and the first extraction module 101 may extract a plurality of news texts related to the same event by accessing the network. Specifically, a plurality of news texts related to the event may be searched for and extracted from the network by inputting keywords of the event (for example, the place where the event occurred, the main persons involved, or the incident itself), and the topic of each news text is then extracted from the retrieved news texts. The first extraction module 101 may acquire elements such as the persons, places, and incidents of the current news text, and generate an event summary from these elements as the topic of that news text.

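As in the first embodiment, the first extraction module 101 may model each topic with a hierarchical Dirichlet process and compare the cluster labels of each text datum at two time points. If the labels are inconsistent, the topic of that datum is considered to have changed, and the proportion of cluster r that comes from cluster s, as well as the proportion of cluster s that flows into cluster r, from time t-1 to time t can be calculated to derive the splitting and merging of topics.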

The labeling module 105 is configured to identify the node positions in the topic stream at which each topic is generated, split, merged, and ended, and to mark those node positions with different marker symbols. For example, a solid circle represents the generation of a topic, an open circle represents the end of a topic, and three-pronged marks at different angles represent the splitting and merging of topics, respectively.

In an embodiment, the labeling module 105 can use a hash table and a hierarchical Dirichlet process to identify the node positions in the topic stream at which each topic is generated, split, merged, and ended, and then mark those node positions with different preset marker symbols. For topics produced by splitting or merging, the labeling module 105 can also use a color similar to that of the original topic.
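The mapping from node events to marker symbols can be sketched as follows. The sketch assumes that the flows between two adjacent time steps are already known as a set of `(s, r)` edges; the rule that more than one successor means a split and more than one predecessor means a merge, the marker descriptions in `MARKERS`, and the name `label_nodes` are illustrative assumptions rather than details given in the text.

```python
from collections import defaultdict

# Marker descriptions are hypothetical; the text fixes only the shapes.
MARKERS = {
    "generate": "solid circle",
    "end": "open circle",
    "split": "three-pronged mark, angle 1",
    "merge": "three-pronged mark, angle 2",
}

def label_nodes(edges, topics_prev, topics_curr):
    """Classify node events between two adjacent time steps.

    edges is a set of (s, r) pairs meaning that cluster s at time t-1
    feeds cluster r at time t (assumed representation).
    """
    successors, predecessors = defaultdict(set), defaultdict(set)
    for s, r in edges:
        successors[s].add(r)
        predecessors[r].add(s)
    events = []
    for s in sorted(topics_prev):
        if not successors[s]:
            events.append((s, "end"))        # nothing continues this topic
        elif len(successors[s]) > 1:
            events.append((s, "split"))      # feeds several later topics
    for r in sorted(topics_curr):
        if not predecessors[r]:
            events.append((r, "generate"))   # no earlier topic feeds it
        elif len(predecessors[r]) > 1:
            events.append((r, "merge"))      # fed by several earlier topics
    return [(topic, event, MARKERS[event]) for topic, event in events]

labels = label_nodes({("A", "A"), ("A", "C"), ("B", "C")},
                     topics_prev={"A", "B", "D"},
                     topics_curr={"A", "C", "E"})
```

A drawing layer could then place the returned marker at the corresponding node position in the topic stream.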

The screening module 102 may, according to the calculated score of each topic, select a number of top-ranked topics (with scores sorted in descending order) as the first topics containing important events. For example, the screening module 102 selects the ten highest-scoring topics as the first topics. The first topics may also be marked on the topic stream with a particular color or marker.

In an embodiment, the second extraction module 103 may extract the keywords of each first topic using the TF-IDF algorithm. The TF-IDF algorithm can be used to assess how important a word is to a topic text. The importance of a word increases in proportion to the number of times it appears in the text. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the topic text, the higher its TF-IDF value. The second extraction module 103 can therefore take the words with the highest TF-IDF values as the keywords of the topic text. For example, the words whose TF-IDF values rank in the top five are used as the keywords of the first topic.

In an embodiment, the generating module 104 may visualize keywords of each of the first topics and their associated relationships as word clouds overlapping on the topic stream. The topic evolution context map can be displayed by a display module (eg, a projection screen, a display, etc.).

Through the above program modules 101-105, the visual presentation system 100 for topic evolution proposed by the present application first extracts the topics of multiple text materials related to the same event and determines the association relationship between the topics to establish a topic stream; second, identifies the node positions in the topic stream at which each topic is generated, split, merged, and ended, and marks those node positions with different marker symbols; next, screens a plurality of first topics containing important events from the plurality of topics; then extracts the keywords of each first topic and determines the association relationship of those keywords; and finally, adds the keywords of each first topic and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials. In this way, topics can be mined from time-ordered social events, and the evolution trend of an event can be displayed visually as a topic stream over time, giving users a better understanding of the evolution of topics and of the major events. This avoids topic drift caused by topic association, helps users understand the deeper meaning of a topic, and helps them avoid misunderstandings or wrong decisions.

In addition, the present application also proposes a visual display method for topic evolution.

Referring to FIG. 4, it is a schematic flowchart of the implementation of the first embodiment of the visual display method for the evolution of the topic of the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.

Step S500, extracting topics of a plurality of text materials related to the same event, and determining an association relationship between the topics to establish a topic stream.

In an embodiment, the text material may be online news text, and multiple news texts related to the same event may be extracted by accessing the network. Specifically, a plurality of news texts related to the event may be searched for and extracted from the network by inputting keywords of the event (for example, the place where the event occurred, the main persons involved, or the incident itself), and the topic of each news text is then extracted from the retrieved news texts.

In an embodiment, an event summary may be generated as the topic of a news text by acquiring elements such as the persons, places, and incidents of that news text.

In an embodiment, the extracted plurality of text materials may be pre-processed prior to extracting the text material theme. The pre-processing may include: segmenting, simplifying, replacing ambiguous words, removing stop words, low frequency words, numbers, punctuation marks, and the like.

In one embodiment, each topic can be modeled with a hierarchical Dirichlet process, recording the i-th text datum that arrives at time t together with the cluster in which it is located. If the cluster labels of a datum at two time points are inconsistent, the topic of that datum can be considered to have changed, and the proportion of cluster r that comes from cluster s, as well as the proportion of cluster s that flows into cluster r, from time t-1 to time t can be calculated to derive the splitting and merging of topics.

In an embodiment, the topics of the text materials may be sorted according to the posting time of each text material. The established topic stream can represent the evolution of multiple topics over time, and the height of the topic stream can represent the number of documents belonging to that topic. The topic stream can also split into several branches, and several branches can also merge into one topic.

Step S502, selecting a plurality of first topics including important events from the plurality of the topics.

In an embodiment, according to the calculated score of each topic, a number of top-ranked topics (with scores sorted in descending order) may be selected as the first topics containing important events. For example, the ten highest-scoring topics are selected as the first topics.

Step S504: Extract keywords of each of the first topics, and determine an association relationship of keywords of each of the first topics.

In an embodiment, the TF-IDF algorithm may be used to extract the keywords of each first topic. The TF-IDF algorithm can be used to assess how important a word is to a topic text. The importance of a word increases in proportion to the number of times it appears in the text. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the topic text, the higher its TF-IDF value. The words with the highest TF-IDF values can be used as the keywords of the topic text. For example, the words whose TF-IDF values rank in the top five are used as the keywords of the first topic.

In an embodiment, the association relationship of the keywords of each first topic may also be determined by a hierarchical Dirichlet process.

In an embodiment, the association relationship of the keywords of each first topic may be further determined in combination with the node position of that first topic in the topic stream.

Step S506, adding keywords of each of the first topics and their associated relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.

In an embodiment, the keywords of each of the first topics and their associated relationships may be visualized as word clouds overlapping on the topic stream. The topic evolution map can be displayed through projection screens, displays, and other devices.

Through the above steps S500-S506, the visual presentation method for topic evolution proposed by the present application first extracts the topics of multiple text materials related to the same event and determines the association relationship between the topics to establish a topic stream; second, screens a plurality of first topics containing important events from the plurality of topics; next, extracts the keywords of each first topic and determines the association relationship of those keywords; and finally, adds the keywords of each first topic and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials. In this way, topics can be mined from time-ordered social events, and the evolution trend of an event can be displayed visually as a topic stream over time, giving users a better understanding of the evolution of topics and of the major events. This avoids topic drift caused by topic association, helps users understand the deeper meaning of a topic, and helps them avoid misunderstandings or wrong decisions.

Referring to FIG. 5, it is a schematic flowchart of the implementation of the second embodiment of the visual presentation method for topic evolution of the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 5 may be changed according to different requirements, and some steps may be omitted.

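The cluster in which it is located is denoted [symbol missing in source]. If the cluster marks at two points in time, [symbol missing in source] and [symbol missing in source], are inconsistent, it can be considered that [expression missing in source]. The ratio flowing from cluster s to cluster r from time t-1 to time t is: [formula missing in source]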
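As a non-limiting illustration of the flow between clusters from time t-1 to time t, the following Python sketch counts how the cluster marks of the same data items change between two consecutive time points. It assumes that every data item carries a cluster mark at both times; the function name `flow_ratios`, the dictionary layout, and the sample marks are hypothetical, and the counting rule is one plausible reading rather than the formula of the present application.

```python
from collections import Counter


def flow_ratios(prev_labels, curr_labels):
    """Return (inflow, outflow) ratio tables between two consecutive times.

    prev_labels[i] and curr_labels[i] are the cluster marks of the same data
    item at time t-1 and time t.
    inflow[(s, r)]  = share of cluster r at time t that came from cluster s.
    outflow[(s, r)] = share of cluster s at time t-1 that moved to cluster r.
    """
    if len(prev_labels) != len(curr_labels):
        raise ValueError("both time points must label the same items")
    pairs = Counter(zip(prev_labels, curr_labels))
    prev_sizes = Counter(prev_labels)
    curr_sizes = Counter(curr_labels)
    inflow = {(s, r): n / curr_sizes[r] for (s, r), n in pairs.items()}
    outflow = {(s, r): n / prev_sizes[s] for (s, r), n in pairs.items()}
    return inflow, outflow


# Made-up cluster marks for four data items.
prev = ["s", "s", "s", "q"]
curr = ["r", "r", "s", "r"]
inflow, outflow = flow_ratios(prev, curr)
```

Here two of the three items in cluster r came from cluster s, and two of the three items of cluster s moved to cluster r, so both ratios for the pair (s, r) are two thirds, while all of cluster q flowed to cluster r.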

Step S508, identifying the node positions at which each of the topics is generated, split, merged, and ended in the topic stream, and marking those node positions with different marker symbols. For example, a solid circle represents the generation of a topic, an open circle represents the end of a topic, and three-pronged marks at different angles represent the splitting and the merging of a topic, respectively.

In an embodiment, a hash table and the hierarchical Dirichlet process may be used to identify the generation, splitting, merging, and ending of each of the topics in the topic stream, and the node positions at which each topic is generated, split, merged, and ended are then marked with different preset markers. For split and merged topics, a color similar to that of the original topic may also be chosen.
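As a non-limiting illustration of identifying these node positions, the following Python sketch uses two hash tables (Python dictionaries) to classify topics between time t-1 and time t. It assumes that flow ratios between topics have already been computed, for example by a clustering step, and it does not implement the hierarchical Dirichlet process itself. The function name `classify_nodes`, the `threshold` cut-off, the `MARKERS` table, and the sample topics are hypothetical.

```python
from collections import defaultdict

# Marker names follow the example in the text; the table itself is hypothetical.
MARKERS = {
    "generated": "solid circle",
    "ended": "open circle",
    "split": "three-pronged mark, split angle",
    "merged": "three-pronged mark, merge angle",
}


def classify_nodes(prev_topics, curr_topics, edges, threshold=0.2):
    """Label topic-stream nodes as generated, split, merged, or ended.

    edges maps (source_topic, target_topic) between time t-1 and time t to a
    flow ratio; edges below `threshold` (an assumed cut-off) are ignored.
    """
    successors = defaultdict(set)    # hash table: source -> targets
    predecessors = defaultdict(set)  # hash table: target -> sources
    for (src, dst), ratio in edges.items():
        if ratio >= threshold:
            successors[src].add(dst)
            predecessors[dst].add(src)

    events = {}
    for topic in prev_topics:
        if not successors[topic]:
            events[(topic, "t-1")] = "ended"      # nothing continues it
        elif len(successors[topic]) > 1:
            events[(topic, "t-1")] = "split"      # continues as several topics
    for topic in curr_topics:
        if not predecessors[topic]:
            events[(topic, "t")] = "generated"    # no earlier topic feeds it
        elif len(predecessors[topic]) > 1:
            events[(topic, "t")] = "merged"       # fed by several topics
    return events


# Made-up flow ratios for illustration only.
edges = {
    ("a", "a1"): 0.5, ("a", "a2"): 0.5,
    ("b", "m"): 1.0, ("c", "m"): 0.9, ("c", "a1"): 0.1,
}
events = classify_nodes(["a", "b", "c", "d"], ["a1", "a2", "m", "n"], edges)
markers = {node: MARKERS[kind] for node, kind in events.items()}
```

Topics with exactly one predecessor and one successor receive no event, so only the four kinds of node position named in step S508 are marked.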

In an embodiment, according to the calculated score of each topic, a plurality of topics ranked highest by score (with the scores sorted in descending order) may be selected as the first topics that include important events. For example, the ten highest-scoring topics are taken as the first topics. The first topics may also be labeled with a particular color or mark on the topic stream.
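As a non-limiting illustration of selecting the highest-scoring topics, the following Python sketch ranks topics by a score and keeps the top k. The score used here, the Shannon entropy of the counts of elements flowing into a cluster from each source, is only a stand-in chosen for the example and is an assumption; it is not the score formula of the present application. The function names `entropy_score` and `select_first_topics` and the sample counts are hypothetical.

```python
import heapq
import math


def entropy_score(inflow_counts):
    """Shannon entropy (in bits) of where a cluster's elements came from.

    Stand-in score only: the application's own formula for the ranking
    score is not reproduced here.
    """
    total = sum(inflow_counts)
    if total == 0:
        return 0.0
    probs = [n / total for n in inflow_counts if n > 0]
    return -sum(p * math.log2(p) for p in probs)


def select_first_topics(scores, k=10):
    """Return the k topic ids with the largest scores, highest first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Made-up inflow counts per topic for illustration only.
inflows = {"A": [5, 5], "B": [10], "C": [7, 2, 1]}
scores = {topic: entropy_score(counts) for topic, counts in inflows.items()}
first_topics = select_first_topics(scores, k=2)
```

With k set to ten, `select_first_topics` corresponds to the example above in which the ten highest-scoring topics are taken as the first topics.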

Through the above steps S500-S508, the visual presentation method for topic evolution proposed by the present application first extracts the topics of a plurality of text materials related to the same event and determines the association relationship between the topics to establish a topic stream; second, identifies the node positions at which each of the topics is generated, split, merged, and ended in the topic stream and marks those node positions with different marker symbols; next, filters out a plurality of first topics that include important events from the plurality of topics; then extracts the keywords of each of the first topics and determines the association relationship of those keywords; and finally, adds the keywords of each of the first topics and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials. In this way, topics can be mined from time-ordered social events, and the evolution trend of an event can be visualized as a topic stream over time, giving users a better understanding of how the topics evolve and of the major events. This avoids the topic drift caused by topic association, helps users understand the deeper meaning of the topics, and helps them avoid misunderstandings or mistaken decisions.

The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

1. A visual presentation method for topic evolution, applied to an application server, characterized in that the method comprises:

extracting topics from a plurality of text materials related to the same event, and determining an association relationship between the topics to establish a topic stream;

filtering out a plurality of first topics that include important events from the plurality of topics;

extracting keywords of each of the first topics, and determining an association relationship of the keywords of each of the first topics; and

adding the keywords of each of the first topics and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.
2. The visual presentation method according to claim 1, wherein the visual presentation method further comprises:

pre-processing the plurality of text materials, the pre-processing comprising: word segmentation, simplification, replacement of ambiguous words, and removal of stop words, low-frequency words, numbers, and punctuation marks.
3. The visual presentation method according to claim 1, wherein the step of establishing the topic stream further comprises:

identifying the node positions at which each of the topics is generated, split, merged, and ended in the topic stream; and

marking the node positions at which each of the topics is generated, split, merged, and ended with different marker symbols.
4. The visual presentation method according to any one of claims 1-3, wherein the step of determining an association relationship between the topics to establish a topic stream comprises:

determining the association relationship between the topics by a hierarchical Dirichlet process to establish the topic stream;

wherein the hierarchical Dirichlet process comprises calculating, from time t-1 to time t, the ratio of cluster r that comes from cluster s and the ratio of cluster s that flows to cluster r, so as to determine the association relationship between the topics; the i-th data item arriving at time t is denoted [symbol missing in source], and the cluster in which it is located is denoted [symbol missing in source];

the ratio of cluster r that comes from cluster s is calculated by the following formula: [formula missing in source]

the ratio of cluster s that flows to cluster r is calculated by the following formula: [formula missing in source]
5. The visual presentation method according to any one of claims 1-3, wherein the step of filtering out a plurality of first topics that include important events from the plurality of topics comprises:

calculating a score for each of the topics using an information entropy algorithm; and

selecting the plurality of first topics that include important events from the plurality of topics according to the calculated scores;

wherein the calculation formula of the information entropy algorithm is: [formula missing in source]

where R(r,t) is the ranking score of cluster r at time t, and N_r is the number of elements flowing into cluster r.
6. The visual presentation method according to claim 4, wherein the step of extracting keywords of each of the first topics and determining an association relationship of the keywords of each of the first topics comprises:

extracting the keywords of each of the first topics by using a TF-IDF algorithm; and

determining the association relationship of the keywords of each of the first topics by a hierarchical Dirichlet process.
7. The visual presentation method according to claim 5, wherein the step of extracting keywords of each of the first topics and determining an association relationship of the keywords of each of the first topics comprises:

extracting the keywords of each of the first topics by using a TF-IDF algorithm; and

determining the association relationship of the keywords of each of the first topics by a hierarchical Dirichlet process.
8. An application server, comprising a memory and a processor, wherein the memory stores a visual presentation system for topic evolution that is executable on the processor, and the visual presentation system for topic evolution, when executed by the processor, implements the following steps:

extracting topics from a plurality of text materials related to the same event, and determining an association relationship between the topics to establish a topic stream;

filtering out a plurality of first topics that include important events from the plurality of topics;

extracting keywords of each of the first topics, and determining an association relationship of the keywords of each of the first topics; and

adding the keywords of each of the first topics and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.
9. The application server according to claim 8, wherein the visual presentation system for topic evolution, when executed by the processor, further implements the following step:

pre-processing the plurality of text materials, the pre-processing comprising: word segmentation, simplification, replacement of ambiguous words, and removal of stop words, low-frequency words, numbers, and punctuation marks.
10. The application server according to claim 8, wherein the step of establishing the topic stream further comprises:

identifying the node positions at which each of the topics is generated, split, merged, and ended in the topic stream; and

marking the node positions at which each of the topics is generated, split, merged, and ended with different marker symbols.
11. The application server according to any one of claims 8 to 10, wherein the step of determining an association relationship between the topics to establish a topic stream comprises:

determining the association relationship between the topics by a hierarchical Dirichlet process to establish the topic stream;

wherein the hierarchical Dirichlet process comprises calculating, from time t-1 to time t, the ratio of cluster r that comes from cluster s and the ratio of cluster s that flows to cluster r, so as to determine the association relationship between the topics; the i-th data item arriving at time t is denoted [symbol missing in source], and the cluster in which it is located is denoted [symbol missing in source];

the ratio of cluster r that comes from cluster s is calculated by the following formula: [formula missing in source]

the ratio of cluster s that flows to cluster r is calculated by the following formula: [formula missing in source]
12. The application server according to any one of claims 8 to 10, wherein the step of filtering out a plurality of first topics that include important events from the plurality of topics comprises:

calculating a score for each of the topics using an information entropy algorithm; and

selecting the plurality of first topics that include important events from the plurality of topics according to the calculated scores;

wherein the calculation formula of the information entropy algorithm is: [formula missing in source]

where R(r,t) is the ranking score of cluster r at time t, and N_r is the number of elements flowing into cluster r.
13. The application server according to claim 11, wherein the step of extracting keywords of each of the first topics and determining an association relationship of the keywords of each of the first topics comprises:

extracting the keywords of each of the first topics by using a TF-IDF algorithm; and

determining the association relationship of the keywords of each of the first topics by a hierarchical Dirichlet process.
14. The application server according to claim 12, wherein the step of extracting keywords of each of the first topics and determining an association relationship of the keywords of each of the first topics comprises:

extracting the keywords of each of the first topics by using a TF-IDF algorithm; and

determining the association relationship of the keywords of each of the first topics by a hierarchical Dirichlet process.
15. A computer readable storage medium storing a visual presentation system for topic evolution, the visual presentation system for topic evolution being executable by at least one processor to cause the at least one processor to perform the following steps:

extracting topics from a plurality of text materials related to the same event, and determining an association relationship between the topics to establish a topic stream;

filtering out a plurality of first topics that include important events from the plurality of topics;

extracting keywords of each of the first topics, and determining an association relationship of the keywords of each of the first topics; and

adding the keywords of each of the first topics and their association relationships to the topic stream to generate a topic evolution context map corresponding to the plurality of text materials.
16. The computer readable storage medium according to claim 15, wherein the visual presentation system for topic evolution, when executed by the at least one processor, further implements the following step:

pre-processing the plurality of text materials, the pre-processing comprising: word segmentation, simplification, replacement of ambiguous words, and removal of stop words, low-frequency words, numbers, and punctuation marks.
17. The computer readable storage medium according to claim 15, wherein the step of establishing the topic stream further comprises:

identifying the node positions at which each of the topics is generated, split, merged, and ended in the topic stream; and

marking the node positions at which each of the topics is generated, split, merged, and ended with different marker symbols.
18. The computer readable storage medium according to any one of claims 15 to 17, wherein the step of determining an association relationship between the topics to establish a topic stream comprises:

determining the association relationship between the topics by a hierarchical Dirichlet process to establish the topic stream;

wherein the hierarchical Dirichlet process comprises calculating, from time t-1 to time t, the ratio of cluster r that comes from cluster s and the ratio of cluster s that flows to cluster r, so as to determine the association relationship between the topics; the i-th data item arriving at time t is denoted [symbol missing in source], and the cluster in which it is located is denoted [symbol missing in source];

the ratio of cluster r that comes from cluster s is calculated by the following formula: [formula missing in source]

the ratio of cluster s that flows to cluster r is calculated by the following formula: [formula missing in source]
19. The computer readable storage medium according to any one of claims 15 to 17, wherein the step of filtering out a plurality of first topics that include important events from the plurality of topics comprises:

calculating a score for each of the topics using an information entropy algorithm; and

selecting the plurality of first topics that include important events from the plurality of topics according to the calculated scores;

wherein the calculation formula of the information entropy algorithm is: [formula missing in source]

where R(r,t) is the ranking score of cluster r at time t, and N_r is the number of elements flowing into cluster r.
20. The computer readable storage medium according to claim 19, wherein the step of extracting keywords of each of the first topics and determining an association relationship of the keywords of each of the first topics comprises:

extracting the keywords of each of the first topics by using a TF-IDF algorithm; and

determining the association relationship of the keywords of each of the first topics by a hierarchical Dirichlet process.