CN117473078A - Visual reading system of long literature based on cross-domain named entity recognition


Info

Publication number
CN117473078A
Authority
CN
China
Prior art keywords
entity
character
literature
visualization
entities
Legal status
Pending
Application number
CN202311298279.1A
Other languages
Chinese (zh)
Inventor
刘颖凡
曹云昀
耿岱琳
梅文娟
程涛
高明
Current Assignee
East China Normal University
Original Assignee
East China Normal University
Application filed by East China Normal University
Priority to CN202311298279.1A
Publication of CN117473078A
Current legal status: Pending


Classifications

    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F40/216 Parsing using statistical methods
    • G06F40/295 Named entity recognition
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/045 Combinations of networks
    • G06N3/09 Supervised learning
    • G06N3/096 Transfer learning
    • G06F40/30 Semantic analysis
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/047 Probabilistic or stochastic networks
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a visual reading system for long literature based on cross-domain named entity recognition, comprising: a data acquisition module, which obtains the source text of a literary work through a web crawler program; a literature entity recognition module, which uses an open-source named entity recognition model to recognize character, place and family entities in the source text and generate a coarse-grained entity data set; a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, refines the model's results with a rule weight network, and then re-recognizes the text to generate fine-grained character, place and family entity data sets; and a visual analysis and display module for literature entities, which uses open-source visualization tools to analyze and display the fine-grained entity data sets through four units: a character relationship network, character movement tracks, character emotion changes and character appearance frequency.

Description

Visual reading system of long literature based on cross-domain named entity recognition
Technical Field
The invention relates to the technical fields of natural language processing and data visualization, and in particular to a visual reading system for long literature based on cross-domain named entity recognition.
Background
Since the arrival of the internet age and its "information explosion," the tension between an overwhelming volume of complex information and people's limited time has driven a growing preference for fragmented information and a declining interest in traditional literature, especially long literary works. Such works are typically lengthy, with complex character relationships and convoluted plots; they are often hard to follow, and many readers struggle to finish them. With the rapid development of information science, the literary field has gradually begun to explore the application value and potential of text visual analysis. Text analysis can draw on natural language processing techniques, while visualization conveys information clearly and effectively in graphical form. Combining the two in the literary field exploits the core strengths of both, allowing readers to understand the content and characteristics of literary works deeply and quickly through interaction.
An existing visualization method for literary works is disclosed in publication CN116151255A, which describes a text analysis and visualization method and system. However, that method only visualizes how frequently characters appear; its presentation is limited to a single form and cannot meet readers' broader needs.
Disclosure of Invention
The invention aims to provide a visual reading system for long literature based on cross-domain named entity recognition. Taking novel texts as its object of study, the system performs character relationship analysis, character track analysis, emotion change analysis, appearance frequency analysis and similar analyses on the original text, summarizes the patterns and emotional factors they reveal, and presents them in an efficient, visual form, so that users can read the novel more easily and understand its characters, themes and emotional tone more clearly. The invention builds a visualization platform focused on intelligent analysis of, and interaction with, long literary works, with the aim of improving on the traditional reading mode, making reading more engaging, strengthening readers' overall comprehension of literary works, making lengthy and complex stories clear and easy to follow, and drawing renewed attention to literature.
The specific technical scheme for realizing the aim of the invention is as follows:
A visual reading system for long literature based on cross-domain named entity recognition, comprising:
a data acquisition module, which obtains the source text of a literary work from electronic resources using a Python web crawler program;
a literature entity recognition module, which uses an open-source named entity recognition model to recognize character, place and family entities in the source text and generate a coarse-grained entity data set;
a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, introduces context constraint rules for specific entity types to build a rule weight network that refines the model's results, and then re-recognizes the source text of the literary work to generate fine-grained character, place and family entity data sets;
a visual analysis and display module for literature entities, which uses open-source visualization tools to analyze and display the fine-grained literature entity data sets and comprises four units: character relationship network visualization, character movement track visualization, character emotion change visualization and character appearance frequency visualization.
Preferably, in the literature entity optimization module of the visual reading system for long literature based on cross-domain named entity recognition, the context constraint rules for specific entity types include rules for identifying character entities, rules for identifying place entities and rules for identifying family entities.
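The three rule families can be expressed as simple predicates over a tagged sentence. The sketch below is a minimal illustration, assuming each sentence is already tokenized and POS-tagged with Universal POS tags (VERB, ADP, ...), that a candidate entity is a single token index, and that sets of already-known characters, places and families are available. The function names and the preposition list are hypothetical, and the character matching rule ("appears before a verb as subject or object") is simplified here to "the next token is a verb"; a faithful version would check the grammatical role with a dependency parse.

```python
from typing import List, Set, Tuple

# A tagged sentence: list of (token, universal POS tag) pairs.
Sentence = List[Tuple[str, str]]

# Hypothetical preposition list; the source only gives "on" and "at" as examples.
PLACE_PREPOSITIONS = {"in", "at", "on", "to", "from", "near"}


def _others(sent: Sentence, idx: int) -> Set[str]:
    """Tokens of the sentence other than the candidate at position idx."""
    return {tok for i, (tok, _) in enumerate(sent) if i != idx}


def person_rules(sent: Sentence, idx: int, known_people: Set[str]) -> List[int]:
    """Return [matching, mention] activations (1 = rule fired) for a character candidate."""
    # Matching rule (simplified): the candidate is immediately followed by a verb.
    matching = int(idx + 1 < len(sent) and sent[idx + 1][1] == "VERB")
    # Mention rule: a known character appears elsewhere in the same sentence.
    mention = int(bool(_others(sent, idx) & known_people))
    return [matching, mention]


def place_rules(sent: Sentence, idx: int, known_places: Set[str]) -> List[int]:
    """Return [matching, description] activations for a place candidate."""
    # Matching rule: the preceding token is a place preposition.
    matching = int(idx > 0 and sent[idx - 1][0].lower() in PLACE_PREPOSITIONS)
    # Description rule: a known place appears elsewhere in the same sentence.
    description = int(bool(_others(sent, idx) & known_places))
    return [matching, description]


def family_rules(sent: Sentence, idx: int, known_families: Set[str]) -> List[int]:
    """Return [description] activation for a family candidate."""
    return [int(bool(_others(sent, idx) & known_families))]


sent = [("Arya", "PROPN"), ("rode", "VERB"), ("to", "ADP"),
        ("Winterfell", "PROPN"), ("with", "ADP"), ("Jon", "PROPN")]
print(person_rules(sent, 0, {"Jon"}))
print(place_rules(sent, 3, set()))
```

Returning 0/1 activations rather than a hard decision keeps the rules soft, so they can later be combined with the model's own scores instead of overriding them.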
Preferably, the visual analysis and display module for literature entities in the visual reading system for long literature based on cross-domain named entity recognition comprises:
a character relationship network visualization unit, which displays the complex network of character relationships in a literary work. The strength of the relationship between two characters is estimated by counting how many times they occur together in the same sentence. Using the open-source visualization tool D3.js, character names serve as nodes and relationship strength values serve as edge weights, presenting the complex character relationship network visually.
Preferably, the visual analysis and display module for literature entities in the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character movement track visualization unit, which displays the movement tracks of characters and the important events in a literary work. For each character, the place entities in the chapters featuring that character are extracted in order of appearance to build a character track data set; important events are extracted by rule-based matching; and the character movement tracks are drawn as line graphs with the open-source visualization tool ECharts. The unit also integrates interactive access to the storylines and locations of different chapters and provides a reading navigation tool that helps readers understand and follow complex character relationships and story development.
Preferably, the visual analysis and display module for literature entities in the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character emotion change visualization unit, which shows how characters' emotions change as the plot of a literary work develops. The unit uses the open-source tool NLTK to extract sentences describing each character, performs sentiment analysis on those sentences, and computes each character's score on different emotion dimensions; the open-source visualization tool ECharts then presents the changes in each character's emotions across chapters as a line graph.
Preferably, the visual analysis and display module for literature entities in the visual reading system for long literature based on cross-domain named entity recognition further comprises:
a character appearance frequency visualization unit, which shows how often characters appear as the chapters of a literary work progress. A character appearance frequency data set is built by counting the occurrences of each character in each chapter, and the open-source visualization tool ECharts draws a line graph of appearance frequency over chapters. By dragging a timeline, users can check how many times a given character appears throughout the novel, which helps them understand how the character's importance changes as the plot develops.
Compared with the prior art, the invention has at least the following advantages or beneficial effects:
(1) Visual presentation of novel texts: by processing and analyzing the text, the invention builds a comprehensive visualization system for literary works. Through charts, graphs and other visual elements, readers can intuitively understand the plot, character relationships and other important elements of a novel in an entirely new form. This visual approach offers a new reading experience that makes literary works more vivid and easier to understand and appreciate.
(2) Automated entity extraction: the invention extracts the characters and places of a novel automatically. Cross-domain named entity recognition allows the entities and relationships in a novel to be extracted efficiently and accurately. Compared with traditional manual extraction, this greatly reduces manual workload and improves efficiency.
(3) Lowering the reading threshold: the invention lets readers grasp the content of a novel more quickly, lowering the barrier to reading. Processing and analyzing the novel text extracts key information from which a brief summary can be built, so readers need not read the entire book to understand its main content. This saves reading time and also makes reading more engaging and interactive.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a flow chart of the literature entity optimization module of the present invention;
FIG. 3 is a character relationship network diagram provided by an embodiment of the present invention;
FIG. 4 is a diagram of character movement track changes provided by an embodiment of the present invention;
FIG. 5 is a character emotion change graph provided by an embodiment of the present invention;
FIG. 6 is a graph of changes in character appearance frequency provided by an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific examples and the drawings. Except where specifically stated below, the procedures, conditions and experimental methods for carrying out the invention are common knowledge in the art, and the invention is not particularly limited in these respects.
Examples
FIG. 1 is a schematic diagram of the visual reading system for long literature based on cross-domain named entity recognition. As shown in FIG. 1, the system comprises:
a data acquisition module, which obtains the source text of a literary work from electronic resources using a Python web crawler program;
a literature entity recognition module, which uses an open-source named entity recognition model to recognize character, place and family entities in the source text and generate a coarse-grained entity data set;
a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, introduces context constraint rules for specific entity types to build a rule weight network that refines the model's results, and then re-recognizes the source text of the literary work to generate fine-grained character, place and family entity data sets;
a visual analysis and display module for literature entities, which uses open-source visualization tools to analyze and display the fine-grained literature entity data sets and comprises four units: character relationship network visualization, character movement track visualization, character emotion change visualization and character appearance frequency visualization.
In this embodiment, the literary work selected is the English-language novel series A Song of Ice and Fire, and the literature entities include the character names, place names and family names in the work.
In this embodiment, the literature entity recognition module uses an open-source crawler tool to collect the text of A Song of Ice and Fire and performs entity recognition with the open-source natural language processing library spaCy, building a novel text entity data set named GOT that contains entities such as character names, place names and family names.
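Supervised fine-tuning on GOT needs token-level labels, but the document does not state the data format. A common choice, and the one used by CoNLL-2003, is BIO tagging. The sketch below assumes whitespace tokenization and entity spans of the form (start token, end token exclusive, label), with hypothetical labels PER, LOC and FAM for character, place and family entities.

```python
from typing import List, Tuple

# (first token index, end token index exclusive, label); PER/LOC/FAM are hypothetical labels.
Span = Tuple[int, int, str]


def to_bio(tokens: List[str], spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert entity spans into CoNLL-style (token, BIO tag) pairs."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return list(zip(tokens, tags))


tokens = "Jon Snow of House Stark rode to Castle Black".split()
spans = [(0, 2, "PER"), (3, 5, "FAM"), (7, 9, "LOC")]
for token, tag in to_bio(tokens, spans):
    print(token, tag)  # one "token tag" pair per line, CoNLL style
```

Writing the pairs one per line, with a blank line between sentences, gives a file in the same shape as the CoNLL-2003 source data, so source and target sets can share one loader.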
in this embodiment, the implementation steps of the literature entity optimization module (as shown in fig. 2) are as follows:
Step 1: Using the CoNLL-2003 data set as the source data set, several models are pre-trained on it, including BERT-, BiLSTM- and BiLSTM+CRF-based models. By learning the named entity features of the source data set, these models acquire a shared semantic representation.
Step 2: With GOT as the target data set, the pre-trained models are fine-tuned. During fine-tuning, the GOT data set is used for supervised training so that the models adapt better to entity recognition in the literary domain.
Step 3: Context constraint rules for specific entity types are introduced. Rules for identifying character entities comprise a matching rule and a mention rule: under the matching rule, an entity that appears before a verb as its subject or object is likely to be a character entity; under the mention rule, an entity mentioned in the same sentence as a known character entity is likely to be a character entity. Rules for identifying place entities comprise a matching rule and a description rule: under the matching rule, an entity that appears after a place preposition (e.g., on, at) is likely to be a place entity; under the description rule, an entity mentioned in the same sentence as a known place entity is likely to be a place entity. Rules for identifying family entities comprise a description rule: an entity mentioned in the same sentence as a known family entity is likely to be a family entity.
Step 4: A rule weight network is designed. A fully connected neural network serves as the weight network: it takes the feature representation of each rule as input and outputs that rule's weight. These weight coefficients are incorporated during fine-tuning to refine the model's results.
Step 5: After a series of model transfer methods were tested, the BiLSTM+CRF-based pre-trained model performed best and was therefore chosen as the final pre-trained model.
Step 6: The BiLSTM+CRF-based cross-domain named entity recognition model, optimized through Steps 1 to 4, re-recognizes the entities in the text of A Song of Ice and Fire, yielding a fine-grained entity data set for the subsequent visual analysis and display module.
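Steps 3 and 4 do not say exactly how rule evidence is fused with the model output. The sketch below shows one possible reading, stated as an assumption: each rule produces a 0/1 activation, a single fully connected layer with a sigmoid maps the activation vector to one weight per rule, and each weighted activation is added to the model's score for the entity type the rule supports. The rule names, the additive fusion and the hand-set parameters are hypothetical; in the described system the weights would be learned during fine-tuning.

```python
import math
from typing import Dict, List

# Hypothetical rule order and the entity type each rule supports.
RULES = ["per_match", "per_mention", "loc_match", "loc_desc", "fam_desc"]
RULE_TYPE = {"per_match": "PER", "per_mention": "PER",
             "loc_match": "LOC", "loc_desc": "LOC", "fam_desc": "FAM"}


def rule_weights(features: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """One fully connected layer with a sigmoid: rule activations -> one weight per rule."""
    weights = []
    for row, bias in zip(W, b):
        z = sum(w * x for w, x in zip(row, features)) + bias
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def rescore(model_scores: Dict[str, float], features: List[float],
            W: List[List[float]], b: List[float]) -> Dict[str, float]:
    """Add each fired rule's weight to the score of the entity type it supports."""
    weights = rule_weights(features, W, b)
    scores = dict(model_scores)
    for name, fired, weight in zip(RULES, features, weights):
        entity_type = RULE_TYPE[name]
        scores[entity_type] = scores.get(entity_type, 0.0) + weight * fired
    return scores


# The model slightly prefers LOC, but both character rules fire.
W = [[2.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
b = [0.0] * 5
scores = rescore({"PER": 0.45, "LOC": 0.50, "FAM": 0.05}, [1, 1, 0, 0, 0], W, b)
print(max(scores, key=scores.get))  # PER
```

Here the rules flip a borderline LOC prediction to PER; with learned parameters, unreliable rules would receive small weights and barely move the score.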
In this embodiment, the visual analysis and display module for literature entities includes:
the character relationship network visualization unit, which displays the complex network of character relationships in a literary work. The strength of the relationship between two characters is estimated by counting how many times they occur together in the same sentence. Using the open-source visualization tool D3.js, character names serve as nodes and relationship strength values serve as edge weights, presenting the complex character relationship network visually.
In a specific example, the character relationship network visualization unit is implemented as follows:
Step 1: Build character relationships. Using the fine-grained literature entity data set, the number of interactions between characters is counted: for example, whenever two character entities appear in the same sentence of the novel, one interaction between them is recorded.
Step 2: Form the character relationship data set. The counted interaction data are organized into a database table structure suited to visualization, so they can be read and presented at display time.
Step 3: Visual display. An interactive character relationship network diagram (FIG. 3) is built with the open-source visualization library D3.js. Character nodes represent character entities and carry basic information about each character, including name, nicknames, family information and a character profile. Relationships are drawn as edges between character nodes, and the edges are labeled to enrich the network.
Step 4: Interaction. While viewing the diagram, users can click a node to see the character's details, including name, nicknames, family information and profile, and can click an edge between two nodes to see the strength of the relationship between those characters, helping them grasp the complexity of the character relationships.
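Steps 1 and 2 can be sketched as follows, assuming the fine-grained data set yields a list of character names and that a regular-expression sentence splitter is adequate (a real pipeline would use a proper tokenizer and resolve aliases such as nicknames). The output uses the nodes/links layout commonly fed to a d3-force simulation; the field names id, source, target and value are a convention chosen here, not something the document specifies.

```python
import itertools
import json
import re
from collections import Counter
from typing import Dict, Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence(text: str, characters: Iterable[str]) -> Counter:
    """Count, for each unordered pair of characters, the sentences mentioning both."""
    names = sorted(set(characters))
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        present = [n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence)]
        for a, b in itertools.combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def to_d3_graph(counts: Counter) -> Dict[str, list]:
    """Nodes/links layout; 'value' carries the relationship strength (edge weight)."""
    names = sorted({name for pair in counts for name in pair})
    return {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": a, "target": b, "value": v}
                  for (a, b), v in counts.items()],
    }


text = "Jon spoke with Arya. Arya left Winterfell with Sansa. Jon and Arya laughed."
graph = to_d3_graph(cooccurrence(text, ["Jon", "Arya", "Sansa"]))
print(json.dumps(graph, indent=2))
```

Because names are sorted before pairing, each unordered pair is always stored under the same key, so ("Arya", "Jon") and ("Jon", "Arya") are never counted separately.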
In this embodiment, the visual analysis and display module for literature entities further includes:
the character movement track visualization unit, which displays the movement tracks of characters and the important events in a literary work. For each character, the place entities in the chapters featuring that character are extracted in order of appearance to build a character track data set; important events are extracted by rule-based matching; and the character movement tracks are drawn as line graphs with the open-source visualization tool ECharts. The unit also integrates interactive access to the storylines and locations of different chapters and provides a reading navigation tool that helps readers understand and follow complex character relationships and story development.
In a specific example, the character movement track visualization unit is implemented as follows:
Step 1: Extract character track data. For A Song of Ice and Fire, named entity recognition is used to extract a character name from each chapter title; place names appearing in the corresponding chapter text are then extracted by matching against the place names listed on the Ice and Fire wiki.
Step 2: Obtain a story map of A Song of Ice and Fire and manually annotate each place on it, including its coordinates.
Step 3: Organize the character, place and family information obtained through named entity recognition in JSON format as the data source for character track visualization.
Step 4: Interactive line and scatter plots (FIG. 4) are implemented with the open-source visualization library ECharts to show how character tracks change. Each character has its own track, and an intersection between tracks indicates an event linking those characters.
Step 5: The track visualization page also integrates interactive chapter and place navigation, so readers can quickly find places and events and jump to the relevant chapter. This function serves as a navigation tool that guides readers through the text.
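Steps 1 to 3 can be sketched as below. The sketch relies on the fact that the chapters of A Song of Ice and Fire are titled with the name of their point-of-view character, so the chapter title is taken directly as the character name; it assumes the manually annotated map from Step 2 is available as a dictionary from place name to (x, y) map coordinates. The coordinates shown are made-up placeholders, and the JSON layout is hypothetical.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_tracks(chapters: List[Tuple[str, str]],
                 gazetteer: Dict[str, Tuple[float, float]]) -> Dict[str, List[dict]]:
    """Map each point-of-view character to the places visited, in reading order."""
    tracks: Dict[str, List[dict]] = defaultdict(list)
    for number, (title, text) in enumerate(chapters, start=1):
        hits = []
        for place, (x, y) in gazetteer.items():
            match = re.search(rf"\b{re.escape(place)}\b", text)
            if match:
                hits.append((match.start(), place, x, y))
        # Within a chapter, order places by their first mention.
        for _, place, x, y in sorted(hits):
            tracks[title].append({"chapter": number, "place": place, "coord": [x, y]})
    return dict(tracks)


# Placeholder coordinates standing in for the manually annotated map.
gazetteer = {"Winterfell": (10.0, 80.0), "King's Landing": (40.0, 30.0),
             "Castle Black": (12.0, 95.0), "Harrenhal": (35.0, 45.0)}
chapters = [("Arya", "Arya left Winterfell for King's Landing."),
            ("Jon", "Jon reached Castle Black."),
            ("Arya", "At Harrenhal, Arya served Lord Tywin.")]
print(json.dumps(build_tracks(chapters, gazetteer), indent=2))
```

Each character's list of coord values can then be plotted as one ECharts line series over the map image; a place shared by two tracks marks a potential intersection.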
In this embodiment, the visual analysis and display module for literature entities further includes:
the character emotion change visualization unit, which shows how characters' emotions change as the plot of a literary work develops. The unit uses the open-source tool NLTK to extract sentences describing each character, performs sentiment analysis on those sentences, and computes each character's score on different emotion dimensions; the open-source visualization tool ECharts then presents the changes in each character's emotions across chapters as a line graph.
In a specific example, the character emotion change visualization unit is implemented as follows:
Step 1: Extract the sentences describing each character from the text with the open-source tool NLTK and analyze their emotion, using the NRC emotion lexicon published by the National Research Council Canada, which covers the emotion categories joy, fear, sadness, anger, surprise, disgust, trust and anticipation. Each character's score on each emotion dimension is computed from the emotion categories found in the sentences.
Step 2: Organize the emotion analysis results into an emotion data set containing character names and their emotion scores.
Step 3: ECharts is used to implement an interactive line graph (FIG. 5) that shows each character's emotion scores and how they change. Each character has a separate emotion line graph, in which the height of the line reflects the magnitude of a given emotion score.
Step 4: according to the user's needs, the system provides a sliding time axis, and the user can select different time ranges to observe the influence of plot development on the emotion change of the character.
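Steps 1 and 2 can be sketched as follows. The document uses NLTK to extract sentences and the NRC emotion lexicon to score them; to stay self-contained, this sketch splits sentences with a regular expression (nltk.sent_tokenize could be substituted once its tokenizer data is installed) and uses a tiny lexicon in the NRC word-to-emotions style whose entries are illustrative, not copied from the real lexicon. Counting one point per matched lexicon word per emotion is an assumed scoring scheme.

```python
import re
from collections import Counter
from typing import Dict, List, Set

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Toy lexicon in the NRC word -> emotions style; entries are illustrative only.
LEXICON: Dict[str, Set[str]] = {
    "smiled": {"joy"},
    "afraid": {"fear"},
    "wept": {"sadness"},
    "betrayed": {"anger", "disgust", "sadness"},
    "hoped": {"anticipation", "trust"},
}


def emotion_scores(chapters: List[str], character: str) -> List[Dict[str, int]]:
    """Per chapter, count lexicon emotions in the sentences that mention the character."""
    per_chapter = []
    for text in chapters:
        counts = Counter({emotion: 0 for emotion in EMOTIONS})
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not re.search(rf"\b{re.escape(character)}\b", sentence):
                continue  # keep only sentences describing this character
            for word in re.findall(r"[a-z']+", sentence.lower()):
                for emotion in LEXICON.get(word, ()):
                    counts[emotion] += 1
        per_chapter.append(dict(counts))
    return per_chapter


chapters = ["Sansa smiled at the knight. Arya was afraid.",
            "Sansa wept. Sansa felt betrayed and hoped for rescue."]
for number, scores in enumerate(emotion_scores(chapters, "Sansa"), start=1):
    print(number, {e: s for e, s in scores.items() if s})
```

Each per-chapter dictionary maps directly to one point per emotion on the line graph, with chapters on the horizontal axis.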
In this embodiment, the visual analysis and display module for literature entities further includes:
the character appearance frequency visualization unit, which shows how often characters appear as the chapters of a literary work progress. A character appearance frequency data set is built by counting the occurrences of each character in each chapter, and the open-source visualization tool ECharts draws a line graph of appearance frequency over chapters. By dragging a timeline, users can check how many times a given character appears throughout the novel, which helps them understand how the character's importance changes as the plot develops.
In a specific example, the character appearance frequency visualization unit is implemented as follows:
Step 1: Following the novel's chapter divisions, count the occurrences of every character in each chapter to build the character appearance frequency data set.
Step 2: Draw a line graph of character appearance frequency with ECharts (FIG. 6), with chapters on the horizontal axis and appearance frequency on the vertical axis. Users can drag the timeline to see how often a given character appears as the plot of the novel develops, helping readers understand how the character's importance changes over the course of the story.
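The two steps can be sketched as below: counting name occurrences per chapter, then emitting an ECharts option object, serialized to JSON, that a front end could pass to chart.setOption. A slider-type dataZoom component on the chapter axis is one way to realize the draggable timeline; that choice, the chapter labels and exact-name matching are assumptions of this sketch (aliases and nicknames would need to be mapped to a canonical name first).

```python
import json
import re
from typing import Dict, List


def appearance_counts(chapters: List[str], characters: List[str]) -> Dict[str, List[int]]:
    """Occurrences of each character name in each chapter, in chapter order."""
    return {
        name: [len(re.findall(rf"\b{re.escape(name)}\b", text)) for text in chapters]
        for name in characters
    }


def echarts_option(counts: Dict[str, List[int]]) -> dict:
    """ECharts option: chapters on the x axis, one line per character, slider zoom."""
    n_chapters = len(next(iter(counts.values()), []))
    return {
        "tooltip": {"trigger": "axis"},
        "legend": {"data": list(counts)},
        "xAxis": {"type": "category",
                  "data": [f"Ch. {i}" for i in range(1, n_chapters + 1)]},
        "yAxis": {"type": "value", "name": "Appearances"},
        "dataZoom": [{"type": "slider", "xAxisIndex": 0}],
        "series": [{"name": name, "type": "line", "data": values}
                   for name, values in counts.items()],
    }


chapters = ["Jon met Arya. Jon smiled.", "Arya rode south.", "Jon, Jon and Jon again."]
counts = appearance_counts(chapters, ["Jon", "Arya"])
print(json.dumps(echarts_option(counts)))  # pass this object to chart.setOption(...)
```

Keeping the counting in Python and shipping only a plain option object keeps the front end thin: it loads the JSON and hands it to ECharts unchanged.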

Claims (3)

1. A visual reading system for long literature based on cross-domain named entity recognition, comprising:
a data acquisition module, which obtains the source text of a literary work from electronic resources using a Python web crawler program;
a literature entity recognition module, which uses an open-source named entity recognition model to recognize character, place and family entities in the source text and generate a coarse-grained entity data set;
a literature entity optimization module, which trains a parameter-transfer-based cross-domain named entity recognition model on the coarse-grained entity data set, refines the model's results with a rule weight network designed by introducing context constraint rules for specific entity types, and then re-recognizes the source text of the literary work to generate fine-grained character, place and family entity data sets;
a visual analysis and display module for literature entities, which uses open-source visualization tools to analyze and display the fine-grained literature entity data sets and comprises four units: character relationship network visualization, character movement track visualization, character emotion change visualization and character appearance frequency visualization.
2. The visual reading system for long literature based on cross-domain named entity recognition of claim 1, wherein the context constraint rules for specific entity types in the literature entity optimization module include rules for identifying character entities, rules for identifying place entities and rules for identifying family entities.
3. The visual reading system for long literature based on cross-domain named entity recognition of claim 1, wherein the literature entity visual analysis and display module comprises:
the character relationship network visualization unit, which shows the relationship network among the characters of a literary work. It evaluates the strength of the relationship between two characters by counting how many times they appear in the same sentence. Using the open-source visualization tool D3.js, it takes character names as nodes and the relationship strength between characters as edge weights, and displays the character relationship network visually;
the character movement trajectory visualization unit, which shows the movement trajectories of characters and the important events in a literary work. For each character name, the place entities in the passages where that character appears are extracted in order of appearance to construct a character trajectory data set. A line graph of each character's movement trajectory is drawn with the open-source visualization tool ECharts. Important events in the literary work are extracted by rule-based matching, together with their summaries, places and chapter information. These are integrated into the character movement trajectory graph to provide a reading navigation tool that helps readers understand and follow the characters' trajectories and the development of the plot;
the character emotion change visualization unit, which shows how characters' emotions change as the plot develops. Sentences describing each character are extracted with the open-source tool NLTK, and sentiment analysis is performed on them, i.e., a score is calculated for each character on different emotion dimensions. The open-source visualization tool ECharts then presents the change in each character's emotion across chapters as a line graph;
the character appearance frequency visualization unit, which shows how the appearance frequency of characters changes as the chapters of a literary work unfold. A character appearance frequency data set is constructed by counting the occurrences of each character in each chapter, and a line graph of the changing appearance frequency is drawn with the open-source visualization tool ECharts. The user can drag a time axis to see how many times a character appears across the whole novel, which helps readers understand how the character's number of appearances changes as the plot develops.
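The sentence-level co-occurrence weighting used by the character relationship network can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- Sentences are split with a hypothetical punctuation regex, which abbreviations such as "Mr." would break.
- Characters are matched by plain substring.
- The output uses the `nodes`/`links` JSON layout common in D3 force-directed examples. In that layout, `d3.forceLink(links).id(d => d.id)` resolves `source`/`target` to node ids.
- The `value` fields (sentences per character, shared sentences per pair) are hypothetical names chosen for illustration.

```python
import itertools
import json
import re
from collections import Counter

# Hypothetical sentence splitter: break after Chinese or Western sentence-final
# punctuation and drop any whitespace that follows it (needs Python 3.7+).
SENTENCE_END = re.compile(r"(?<=[。！？!?.])\s*")


def split_sentences(text):
    return [s for s in SENTENCE_END.split(text) if s.strip()]


def cooccurrence_graph(text, names):
    """Weight each character pair by the number of sentences that mention both."""
    pair_counts = Counter()  # (name_a, name_b) -> number of shared sentences
    mention_counts = Counter()  # name -> sentences mentioning that character
    for sentence in split_sentences(text):
        # Sorting makes each pair key canonical, so (A, B) and (B, A) merge.
        present = sorted({name for name in names if name in sentence})
        mention_counts.update(present)
        for a, b in itertools.combinations(present, 2):
            pair_counts[(a, b)] += 1
    return {
        "nodes": [{"id": name, "value": mention_counts[name]} for name in names],
        "links": [
            {"source": a, "target": b, "value": weight}
            for (a, b), weight in sorted(pair_counts.items())
        ],
    }


if __name__ == "__main__":
    sample = "Alice met Bob. Bob left. Alice, Bob and Carol met."
    graph = cooccurrence_graph(sample, ["Alice", "Bob", "Carol"])
    print(json.dumps(graph, ensure_ascii=False, indent=2))
```

In the D3 view, edge width can then be scaled from each link's `value`, so characters who often share a sentence are joined by heavier edges.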

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311298279.1A CN117473078A 2023-10-09 2023-10-09 Visual reading system of long literature based on cross-domain named entity recognition

Publications (1)

Publication Number Publication Date
CN117473078A 2024-01-30

Family

ID=89636954

Country Status (1)

Country Link
CN (1) CN117473078A

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118170919A * 2024-05-13 2024-06-11 南昌理工学院 (Nanchang Institute of Science and Technology) Method and system for classifying literary works

Similar Documents

Publication Publication Date Title
US11551567B2 (en) System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
CN109189942B (en) Construction method and device of patent data knowledge graph
Gu et al. "What parts of your apps are loved by users?" (T)
US20180366013A1 (en) System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
CN110674271B (en) Question and answer processing method and device
CN106951438A (en) An event extraction system and method for the open domain
CN105512687A (en) Emotion classification model training and textual emotion polarity analysis method and system
CN110612524B (en) Information processing apparatus, information processing method, and recording medium
CN111209384A (en) Question and answer data processing method and device based on artificial intelligence and electronic equipment
CN110825867B (en) Similar text recommendation method and device, electronic equipment and storage medium
US20160117954A1 (en) System and method for automated teaching of languages based on frequency of syntactic models
CN112765974B (en) Service assistance method, electronic equipment and readable storage medium
CN117473078A (en) Visual reading system of long literature based on cross-domain named entity recognition
CN115599899A (en) Intelligent question-answering method, system, equipment and medium based on aircraft knowledge graph
CN114661872A (en) Beginner-oriented API self-adaptive recommendation method and system
Da et al. Deep learning based dual encoder retrieval model for citation recommendation
KR20200064490A (en) Server and method for automatically generating profile
CN112667819A (en) Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device
Morie et al. Information extraction model to improve learning game metadata indexing
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN116257618A (en) Multi-source intelligent travel recommendation method based on fine granularity emotion analysis
CN114780755A (en) Playing data positioning method and device based on knowledge graph and electronic equipment
CN109325096A (en) A knowledge resource search system based on knowledge resource classification
CN111949781B (en) Intelligent interaction method and device based on natural sentence syntactic analysis
CN106446198A (en) Recommending method and device of news based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination