CN106503180B - Non-semantic social network extraction method for drama scripts - Google Patents

Non-semantic social network extraction method for drama scripts

Info

Publication number
CN106503180B
Authority
CN
China
Prior art keywords
role
stage
indicative
drama
preposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610938958.4A
Other languages
Chinese (zh)
Other versions
CN106503180A (en)
Inventor
李建敦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dianji University
Original Assignee
Shanghai Dianji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dianji University
Priority to CN201610938958.4A
Publication of CN106503180A
Application granted
Publication of CN106503180B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a non-semantic method for extracting social networks from drama scripts. It forgoes the semantic approaches, which perform better on small data sets but are more complex, and instead, within a lightweight and easily extensible framework, uses the stage directions that are closely tied to the characters' dialogue to discover the direct associations between characters indirectly. In general, compared with the more accurate natural-language-processing methods, the character network extracted this way is sparser. However, the method is unsupervised: it needs no training or test sets, no parameter tuning or calibration, and no high time or space cost, and it is also highly scalable and immediately usable.

Description

Non-semantic social network extraction method for drama scripts
Technical field
The present invention relates to a method for quickly extracting the character network of a drama script (characters are the nodes and relationships are the edges, i.e. the script's virtual social network), so that literary critics can understand and compare massive collections of electronic scripts online and in real time.
Background art
The challenges in extracting a network from the formatted text of a script lie mainly in discovering, identifying and measuring the relationships between characters. At present there are two main classes of method: co-occurrence-based and dialogue-based (line-based).
The idea of co-occurrence methods is that if two characters are found to appear together in some scene (or stage, paragraph, chapter, etc.), they are linked by an undirected edge. Its weight (often representing closeness) can be the number of times they appear together, or the reciprocal of the amount of text (or number of lines) separating them.
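As a point of comparison for the method proposed later, the co-occurrence idea can be sketched in a few lines. This is only an illustration under stated assumptions: the input is taken to be a list of scenes, each already reduced to the set of characters present, and the function name `cooccurrence_edges` is hypothetical rather than taken from any cited work. The weight used here is the simple count of shared scenes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(scenes):
    """Count, for every unordered pair of characters, the scenes they share.

    `scenes` is an iterable of collections of character names (assumed input).
    The result maps an alphabetically ordered pair to its co-occurrence count.
    """
    edges = Counter()
    for cast in scenes:
        # sorted() gives each undirected pair one canonical orientation
        for a, b in combinations(sorted(set(cast)), 2):
            edges[(a, b)] += 1
    return edges


scenes = [{"HAMLET", "HORATIO"}, {"HAMLET", "HORATIO", "GHOST"}]
edges = cooccurrence_edges(scenes)
```

The resulting edges carry no direction, which is one of the limitations of this class of method.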
Dialogue methods build on co-occurrence methods and take the lines exchanged between characters as the direct object of study. There are many specific criteria for deciding that two characters are connected. Some studies hold that it suffices for two characters to appear in the same scene and both speak. Others hold that if the utterances of two characters follow one another, that constitutes a relationship between them. Others determine how close two characters are from the distance between their lines (inversely proportional). Still others use a single indicator, namely whether, and how many times, character X's lines mention character Y, as the weight of X → Y (zero meaning unrelated). Finally, there are methods that use natural language processing (NLP) techniques to identify whom a character's lines are addressed to and so build accurate directed connections between characters.
For co-occurrence methods, the accuracy of the extracted virtual social network is relatively low, because such methods cannot guarantee the following three points: 1) that the two characters really have a chance to meet face to face, especially when scenes are delimited by line distance and a threshold; 2) that some direct contact really exists between them, even when they are face to face; 3) that the extracted relationships have a direction.
Because the body of most scripts is dialogue, dialogue methods are used more often, but they still leave room for improvement. For the non-semantic variants, the inability to mine direct connections between characters with reasonable accuracy is a hard defect. The NLP variants may, after training, be more accurate on some scripts, but on a massive script library spanning different periods, styles and subjects their mining performance degrades greatly. Their complexity is also higher than that of other algorithms, which hinders scaling.
Summary of the invention
The purpose of the present invention is to quickly extract the character network of a script by parsing its stage directions.
To achieve the above purpose, the technical solution of the present invention provides a non-semantic social network extraction method for drama scripts, comprising the following steps:
Step 1: re-typeset the characters, dialogue and stage directions in the script to form a normalized document with a unified format for script elements, wherein indefinite pronouns or conjunctions that appear in character lines and stage directions are resolved using the dynamic stage and replaced with specific character names:
For the moment of line no, the dynamic stage $S_{no}$ consists of all characters on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where p denotes a character. The characters on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions;
Step 2: scan the full text of the normalized document obtained in step 1 and collect the distinct characters that appear in the script;
Step 3: establish directed connections between characters;
Step 4: calculate the weight of each directed connection.
Preferably, to establish the directed connection X → Y from character X to character Y, step 3 comprises:
Step 3.1: set two conditions under which a passage of dialogue of character X can give rise to the directed connection X → Y, where b and e denote lines b and e of the script, the first and last lines of this passage of dialogue:
Condition one:And
Condition two: at least one of the following two cases holds. Case one: the dialogue is immediately preceded by a stage direction, and that direction contains "X directive preposition Y". Case two: a stage direction occurs between lines b and e of the script, and that direction contains "directive preposition Y";
Step 3.2: with condition one given the higher priority, compute the directed connection X → Y using the two conditions set in step 3.1.
Preferably, in step 4, if a passage of dialogue of character X can give rise to the directed connection X → Y from character X to character Y, the maximum value Max of its weight is e-b, and the actual weight is calculated according to the following two cases:
Case one, for "X directive preposition Y": the actual weight is counted from line b until the first stage direction that is not a "directive preposition Y" direction is found. If the first line of that stage direction is line s, the actual weight of the directed connection X → Y is s-b; if no such stage direction is found, the actual weight of the directed connection X → Y is Max;
Case two, for "directive preposition Y": the actual weight is counted from the stage direction containing "directive preposition Y" until the first stage direction that is not a "directive preposition Y" direction is found. The actual weight of the directed connection X → Y is then s'-t, where s' is the first line of that first non-matching stage direction and t is the last line of the stage direction containing "directive preposition Y". If "directive preposition Y" occurs several times, the accumulated value is taken as the actual weight of the directed connection X → Y. If no non-matching stage direction is found, the actual weight of the directed connection X → Y is e-t.
The invention can be put to use immediately on massive collections of dramatic texts, disregards the heterogeneity among them (period, subject, author, school, etc.) and provides an accurate representation of the virtual social network; this is its highlight. It can help literary appreciators and drama critics gain a preliminary understanding of massive numbers of electronic scripts in a short time and compare them at a coarse level, especially through social-network-oriented analysis, and it facilitates the subsequent classification, evaluation and even adaptive construction of scripts.
Description of the drawings
Fig. 1 shows the character network of Hamlet generated by the method provided by the present invention.
Detailed description of the embodiments
To make the present invention clearer and easier to understand, preferred embodiments are described in detail below.
The present invention provides a non-semantic social network extraction method for drama scripts, comprising the following steps:
Step 1, preprocessing: because the formats of electronic scripts differ considerably, preprocessing is necessary before the network is built. Its primary goal is to form a normalized document (such as an XML file) with a unified format for script elements.
In general, a script is made up of several kinds of element. The present invention is mainly concerned with characters (protagonist), dialogue (line) and stage directions (directive), with all characters forming the universal set U. The concrete format of the script elements may be chosen freely, provided the elements are easy to tell apart. For example: a character may be typeset as "on its own line, ending with a colon"; a dialogue paragraph as "a separate paragraph, enclosed in quotation marks"; and a stage direction as "enclosed in square brackets".
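The example typesetting above is easy to recognize mechanically. The following is a minimal sketch, assuming exactly that convention (character line ends with a colon, dialogue is quoted, stage direction sits in square brackets) and one element per physical line; the names `classify` and `parse` are hypothetical. A line of dialogue that itself ends in a colon would be misread as a character line, so a real normalizer would need a stricter format.

```python
def classify(line: str) -> tuple[str, str]:
    """Return (kind, content) for one line of the normalized script."""
    s = line.strip()
    if not s:
        return ("blank", "")
    if s.startswith("[") and s.endswith("]"):
        return ("directive", s[1:-1].strip())   # stage direction
    if s.endswith(":"):
        return ("role", s[:-1].strip())         # character line
    return ("line", s.strip('"'))               # dialogue text


def parse(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, kind, content) for every non-blank line."""
    return [
        (no, *classify(raw))
        for no, raw in enumerate(text.splitlines(), start=1)
        if raw.strip()
    ]


sample = 'HAMLET:\n"To be, or not to be"\n\n[Exit HAMLET]\n'
elements = parse(sample)
```

Keeping the original line numbers matters because the later weighting step is expressed entirely in script line numbers.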
Note that after the script has been re-typeset, character lines and stage directions may contain indefinite pronouns (such as they, both, all) or conjunctions (such as and, but), which therefore need to be resolved into specific character names. This need is met through the concept of the "dynamic stage".
For the moment of line no, the dynamic stage $S_{no}$ consists of all characters on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where p denotes a character. The characters on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions.
With the dynamic stage, the indirect words in character lines and stage directions can be resolved accurately. To facilitate the subsequent network extraction, all of these indirect words are replaced with character names. Afterwards, if a character line names more than one character, a separator (such as a comma) must be inserted so that the subsequent connections can be generated efficiently.
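One way to picture the dynamic stage is as a set that is updated whenever a stage direction announces an entrance or an exit. The sketch below is an illustration only: the cue words "Enter", "Exit" and "Exeunt" are an assumption about English-language scripts, and `update_stage` and `resolve` are hypothetical helper names. Qualified exits such as "Exeunt all but HAMLET" are not handled.

```python
import re

ENTER = re.compile(r"^Enter\s+(.+)$", re.IGNORECASE)
EXIT = re.compile(r"^(?:Exit|Exeunt)\b\s*(.*)$", re.IGNORECASE)
INDIRECT = {"they", "both", "all"}  # the example pronouns given in the text


def split_names(s: str) -> list[str]:
    """Split 'A, B and C' into ['A', 'B', 'C']."""
    return [n.strip() for n in re.split(r",|\band\b", s) if n.strip()]


def update_stage(stage: frozenset, directive: str) -> frozenset:
    """Return the set of characters on stage after one stage direction."""
    d = directive.strip()
    m = ENTER.match(d)
    if m:
        return stage | frozenset(split_names(m.group(1)))
    m = EXIT.match(d)
    if m:
        names = split_names(m.group(1))
        # a bare "Exeunt" is taken to clear the stage
        return stage - frozenset(names) if names else frozenset()
    return stage


def resolve(word: str, stage: frozenset, speaker=None) -> list[str]:
    """Replace an indirect word with the on-stage characters other than the speaker."""
    if word.lower() in INDIRECT:
        return sorted(stage - {speaker})
    return [word]


stage = update_stage(frozenset(), "Enter HAMLET, HORATIO and GHOST")
after_exit = update_stage(stage, "Exit GHOST")
```

Joining the result of `resolve` with `", "` yields the comma-separated character names that the normalized document is meant to contain.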
Step 2, character discovery: after preprocessing, the list of characters in the script can be collected by scanning the character lines over the full text. Because the character lines have been specially normalized, they are easy to identify.
In particular, the author sometimes lists the characters before the plot unfolds (for example, in the script's front matter), and this list can serve as an alternative and a supplement to the scan. However, using this list directly as the network nodes instead of scanning the full text is risky, for two reasons: 1) the list may not include every character; 2) some of the listed characters may have no dialogue. Because the present invention builds the network on the basis of characters' dialogue, characters without dialogue are outside its scope and will not appear in the final virtual social network.
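The full-text scan can be sketched as follows, again assuming the example typesetting (a character line ends with a colon, several speakers are separated by commas, stage directions sit in square brackets). The function name `discover_roles` is hypothetical. Only characters that actually head a passage of dialogue are collected, which matches the point that silent characters do not enter the network.

```python
def discover_roles(normalized_text: str) -> list[str]:
    """Collect distinct speaking characters in order of first appearance."""
    roles: list[str] = []
    for raw in normalized_text.splitlines():
        s = raw.strip()
        # a character line ends with a colon and is not a stage direction
        if s.endswith(":") and not s.startswith("["):
            for name in s[:-1].split(","):
                name = name.strip()
                if name and name not in roles:
                    roles.append(name)
    return roles


script = "\n".join([
    "[Enter HAMLET, HORATIO, MARCELLUS, GHOST]",
    "HAMLET:",
    '"Where wilt thou lead me?"',
    "HORATIO, MARCELLUS:",
    '"My lord!"',
    "HAMLET:",
    '"So be it."',
])
roles = discover_roles(script)
```

In this sample GHOST is on stage but never speaks, so it is absent from the result.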
Step 3: establish directed connections between characters
To establish the directed connection X → Y from character X to character Y, step 3 comprises:
Step 3.1: set two conditions under which a passage of dialogue of character X can give rise to the directed connection X → Y, where b and e denote lines b and e of the script, the first and last lines of this passage of dialogue:
Condition one:And
Condition two: at least one of the following two cases holds. Case one: the dialogue is immediately preceded by a stage direction, and that direction contains "X to Y". Case two: a stage direction occurs between lines b and e of the script, and that direction contains "to Y";
Step 3.2: with condition one given the higher priority, compute the directed connection X → Y using the two conditions set in step 3.1.
Of course, the directive preposition "to" here may be replaced by near-synonyms such as toward(s), address, tell and order.
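The textual part of condition two amounts to a pattern match on the stage direction. The sketch below covers only that part: the formulas of condition one are not legible in this text, so condition one is left out, and where the direction sits relative to the dialogue is also not checked here. The word list is an assumption built from the near-synonyms named above, and `match_case` is a hypothetical helper name.

```python
import re

# "to" and the near-synonyms named in the text, with simple inflections (assumed)
DIRECTIVE_WORDS = r"(?:to|towards?|address(?:es)?|tells?|orders?)"


def match_case(directive: str, x: str, y: str):
    """Classify a stage direction for the pair (X, Y).

    Returns "X to Y" when the direction names both speaker and addressee,
    "to Y" when it names only the addressee, and None otherwise.
    """
    tail = rf"{DIRECTIVE_WORDS}\s+{re.escape(y)}\b"
    if re.search(rf"\b{re.escape(x)}\s+{tail}", directive, re.IGNORECASE):
        return "X to Y"
    if re.search(rf"\b{tail}", directive, re.IGNORECASE):
        return "to Y"
    return None


case_a = match_case("HAMLET to HORATIO", "HAMLET", "HORATIO")
case_b = match_case("Aside to HORATIO", "HAMLET", "HORATIO")
case_c = match_case("Enter GHOST", "HAMLET", "GHOST")
```

Because the match is purely lexical, no training data or language model is involved, which is the sense in which the method is non-semantic.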
Step 4: calculate the weight of each directed connection
In step 4, if a passage of dialogue of character X can give rise to the directed connection X → Y from character X to character Y, the maximum value Max of its weight is e-b, and the actual weight is calculated according to the following two cases:
Case one, for "X to Y": the actual weight is counted from line b until the first stage direction that is not a "to Y" direction is found. If the first line of that stage direction is line s, the actual weight of the directed connection X → Y is s-b; if no such stage direction is found, the actual weight of the directed connection X → Y is Max;
Case two, for "to Y": the actual weight is counted from the stage direction containing "to Y" until the first stage direction that is not a "to Y" direction is found. The actual weight of the directed connection X → Y is then s'-t, where s' is the first line of that first non-"to Y" stage direction and t is the last line of the stage direction containing "to Y". If "to Y" occurs several times, the accumulated value is taken as the actual weight of the directed connection X → Y. If no non-"to Y" stage direction is found, the actual weight of the directed connection X → Y is e-t.
The final weight of the directed connection X → Y is the value accumulated after all of character X's dialogue has been scanned.
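The two weighting cases can be worked through on a small example. The sketch below is one reading of the rules, not a definitive implementation: the `Direction` record and both function names are hypothetical, line numbers are the script line numbers b, e, s, s' and t used above, and consecutive "to Y" directions with nothing else between them are treated as a single run so that no line is counted twice (an assumption the text does not spell out).

```python
from dataclasses import dataclass


@dataclass
class Direction:
    first: int   # first script line of the stage direction
    last: int    # last script line of the stage direction
    to_y: bool   # True if its text contains "to Y" for the target Y


def weight_case_one(b: int, e: int, inner: list) -> int:
    """Dialogue preceded by "X to Y": count from b to the first non-"to Y" direction."""
    for d in sorted(inner, key=lambda d: d.first):
        if b <= d.first <= e and not d.to_y:
            return d.first - b          # s - b
    return e - b                        # Max


def weight_case_two(b: int, e: int, inner: list) -> int:
    """"to Y" inside the dialogue: sum each run from its "to Y" to the next other direction."""
    total, t = 0, None
    for d in sorted(inner, key=lambda d: d.first):
        if not (b <= d.first <= e):
            continue
        if d.to_y:
            if t is None:
                t = d.last              # a run starts after this direction
        elif t is not None:
            total += d.first - t        # s' - t
            t = None
    if t is not None:
        total += e - t                  # run lasts to the end of the dialogue
    return total


# One passage of X's dialogue spanning script lines 10..30
inner = [Direction(14, 14, True), Direction(20, 20, False), Direction(24, 25, True)]
w2 = weight_case_two(10, 30, inner)
w1 = weight_case_one(10, 30, [Direction(20, 20, False)])
```

Here case two gives (20 - 14) + (30 - 25) = 11 and case one gives 20 - 10 = 10. Summing such per-passage values over all of X's dialogue yields the final weight of X → Y.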
A schematic diagram of the character network of Hamlet generated using the above method is shown in Fig. 1.
The present invention forgoes the semantic approaches, which perform better on small data sets but are more complex, and instead, within a lightweight and easily extensible framework, uses the stage directions that are closely tied to the characters' dialogue to discover the direct associations between characters indirectly. In general, compared with the more accurate natural-language-processing methods, the character network extracted this way is sparser (some important information may be omitted). However, the method is unsupervised: it needs no training or test sets, no parameter tuning or calibration, and no high time or space cost, and it is also highly scalable and immediately usable.

Claims (1)

1. A non-semantic social network extraction method for drama scripts, characterized by comprising the following steps:
Step 1: re-typeset the characters, dialogue and stage directions in the script to form a normalized document with a unified format for script elements, wherein indefinite pronouns or conjunctions that appear in character lines and stage directions are resolved using the dynamic stage and replaced with specific character names:
For the moment of line no, the dynamic stage $S_{no}$ consists of all characters on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where p denotes a character; the characters on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions;
Step 2: scan the full text of the normalized document obtained in step 1 and collect the distinct characters that appear in the script;
Step 3: establish directed connections between characters; to establish the directed connection X → Y from character X to character Y, step 3 comprises:
Step 3.1: set two conditions under which a passage of dialogue of character X can give rise to the directed connection X → Y, where b and e denote lines b and e of the script, the first and last lines of this passage of dialogue:
Condition one:And
Condition two: at least one of the following two cases holds. Case one: the dialogue is immediately preceded by a stage direction, and that direction contains "X directive preposition Y". Case two: a stage direction occurs between lines b and e of the script, and that direction contains "directive preposition Y";
Step 3.2: with condition one given the higher priority, compute the directed connection X → Y using the two conditions set in step 3.1;
Step 4: calculate the weight of each directed connection; if a passage of dialogue of character X can give rise to the directed connection X → Y from character X to character Y, the maximum value Max of its weight is e-b, and the actual weight is calculated according to the following two cases:
Case one, for "X directive preposition Y": the actual weight is counted from line b until the first stage direction that is not a "directive preposition Y" direction is found; if the first line of that stage direction is line s, the actual weight of the directed connection X → Y is s-b; if no such stage direction is found, the actual weight of the directed connection X → Y is Max;
Case two, for "directive preposition Y": the actual weight is counted from the stage direction containing "directive preposition Y" until the first stage direction that is not a "directive preposition Y" direction is found; the actual weight of the directed connection X → Y is then s'-t, where s' is the first line of that first non-matching stage direction and t is the last line of the stage direction containing "directive preposition Y"; if "directive preposition Y" occurs several times, the accumulated value is taken as the actual weight of the directed connection X → Y; if no non-matching stage direction is found, the actual weight of the directed connection X → Y is e-t.
CN201610938958.4A 2016-10-25 2016-10-25 Non-semantic social network extraction method for drama scripts Expired - Fee Related CN106503180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610938958.4A CN106503180B (en) 2016-10-25 2016-10-25 Non-semantic social network extraction method for drama scripts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610938958.4A CN106503180B (en) 2016-10-25 2016-10-25 Non-semantic social network extraction method for drama scripts

Publications (2)

Publication Number Publication Date
CN106503180A CN106503180A (en) 2017-03-15
CN106503180B true CN106503180B (en) 2019-10-22

Family

ID=58319164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610938958.4A Expired - Fee Related CN106503180B (en) 2016-10-25 2016-10-25 Non-semantic social network extraction method for drama scripts

Country Status (1)

Country Link
CN (1) CN106503180B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577672B (en) * 2017-09-19 2021-07-06 网智天元科技集团股份有限公司 Public opinion-based script setting method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063481A (en) * 2010-12-24 2011-05-18 中国电子科技集团公司第五十四研究所 Method for establishing movie and TV drama analysis dedicated knowledge base and method for analyzing drama

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063481A (en) * 2010-12-24 2011-05-18 中国电子科技集团公司第五十四研究所 Method for establishing movie and TV drama analysis dedicated knowledge base and method for analyzing drama

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Operationalizing or the function of measurement in literary theory;Moretti F;《New Left Review》;20131231;第1卷(第84期);103-119 *
Structural analysis on social network constructed from characters in literature texts;Park GM,Kim SH,Cho HG;《journal of Computers》;20131231;第8卷(第9期);2442-2447 *

Also Published As

Publication number Publication date
CN106503180A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
Chakrabarty et al. AMPERSAND: Argument mining for PERSuAsive oNline discussions
US9336199B2 (en) Automatic sentence evaluation device using shallow parser to automatically evaluate sentence, and error detection apparatus and method of the same
CN104881402B (en) The method and device of Chinese network topics comment text semantic tendency analysis
US20150154173A1 (en) Method of detecting grammatical error, error detecting apparatus for the method, and computer-readable recording medium storing the method
CN106021229B (en) A kind of Chinese event synchronous anomalies method
WO2017010652A1 (en) Automatic question and answer method and device therefor
KR101012504B1 (en) Method of extracting Triplets by searching dependency grammar setence tree
WO2017177809A1 (en) Word segmentation method and system for language text
WO2015141700A1 (en) Dialogue system construction support apparatus and method
US20150161096A1 (en) Method for detecting grammatical errors, error detection device for same and computer-readable recording medium having method recorded thereon
US20120183935A1 (en) Learning device, determination device, learning method, determination method, and computer program product
Zhang et al. Video-aided unsupervised grammar induction
Lison et al. Automatic turn segmentation for movie & tv subtitles
Eryigit The Impact of Automatic Morphological Analysis & Disambiguation on Dependency Parsing of Turkish.
US20150286628A1 (en) Information extraction system, information extraction method, and information extraction program
CN106503180B (en) Non-semantic social network extraction method for drama scripts
US9020803B2 (en) Confidence-rated transcription and translation
CN110232121A (en) A kind of control order classification method based on semantic net
KR102398683B1 (en) System and Method for Constructing Emotion Lexicon by Paraphrasing and Recognizing Emotion Frames
Ding et al. Dependency graph based chinese semantic parsing
Sun et al. Syntactic parsing of web queries
van Halteren et al. Identification of differences between Dutch language varieties with the VarDial 2018 Dutch-Flemish subtitle data
Fenogenova et al. A general method applicable to the search for anglicisms in russian social network texts
Kuncham et al. Statistical sandhi splitter for agglutinative languages
Monahan et al. Populating a Knowledge Base with Information about Events.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191022