CN106503180B - Non-semantic social network extraction method for drama scripts - Google Patents
Non-semantic social network extraction method for drama scripts
- Publication number
- CN106503180B (application number CN201610938958.4A)
- Authority
- CN
- China
- Prior art keywords
- role
- stage
- indicative
- drama
- preposition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The present invention provides a non-semantic method for extracting social networks from drama scripts. It sets aside the semantic approach, which performs better on small data sets but is more complex, in favor of a lightweight, easily extensible framework: it selects the stage directions that are closely tied to the roles' dialogue and uses them to find direct associations between roles indirectly. In general, compared with more accurate natural-language-processing methods, the role network extracted this way is sparser. However, the method is unsupervised and needs no training or test sets, requires no parameter tuning or calibration, incurs no high time or space cost, and offers good scalability and immediate usability.
Description
Technical field
The present invention relates to a method for quickly extracting the role network of a drama script (roles are nodes and relationships are edges; that is, the virtual social network of the script), so that literary critics can compare and understand large collections of electronic scripts online and in real time.
Background art
The challenges of extracting a network from the formatted text of a script lie mainly in discovering, identifying, and measuring the relationships between roles. At present there are two main classes of methods: co-occurrence-based and dialogue-based (line-based).
The idea of co-occurrence methods is that if two roles appear together in some scene (or stage, paragraph, chapter, and so on), they are connected by an undirected edge. The edge weight, which usually indicates closeness, can be the number of times the two roles appear together, or the reciprocal of the amount of text (or number of lines) separating them.
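As an illustration of the co-occurrence idea described above, the following minimal sketch counts, for each unordered pair of roles, the number of scenes in which both appear. The function name `cooccurrence_edges` and the sample scene contents are hypothetical and serve only as an example; they are not part of the prior-art methods themselves.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(scenes):
    """Count, for each unordered pair of roles, the scenes in which both appear."""
    weights = Counter()
    for roles in scenes:
        # sorted(set(...)) removes repeats and gives each pair a canonical order
        for a, b in combinations(sorted(set(roles)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical scene membership, for illustration only.
scenes = [
    ["Hamlet", "Horatio"],
    ["Hamlet", "Horatio", "Ghost"],
    ["Hamlet", "Ophelia"],
]
edges = cooccurrence_edges(scenes)
```

The resulting edges are undirected, which is exactly the limitation the following paragraphs discuss.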
Dialogue-based methods build on co-occurrence methods and examine the lines exchanged between roles directly. There are many specific criteria for deciding whether two roles are connected. Some studies hold that it suffices for two roles to appear in the same scene and both speak; others propose that a relationship exists if the utterances of two roles follow one another; still others judge how close two roles are by the distance between their lines (inversely proportional); and some use the number of times role Y is mentioned in the lines of role X as the weight of the connection from X to Y (zero meaning unrelated). Finally, natural language processing (NLP) techniques can be used to determine whom a role's lines are addressed to, and so to establish precise directed connections between roles.
For co-occurrence methods, the accuracy of the extracted virtual social network is comparatively low, because such methods cannot guarantee the following three points: 1) that the two roles really have a chance to meet face to face, especially when scenes are delimited by line distance and a threshold; 2) that some direct contact really exists between them, even if they do face each other; 3) that the extracted role relationship has a direction.
Because dialogue makes up the bulk of most scripts, dialogue-based methods are used more often, but they still leave room for improvement. Non-semantic variants have the inherent weakness that they cannot mine the direct connections between roles very accurately. NLP-based variants may, after training, be more accurate on some scripts, but over a massive script library spanning different periods, styles, and subject matter their mining performance drops sharply; their complexity is also higher than that of other algorithms, which hinders scaling.
Summary of the invention
The purpose of the present invention is to quickly extract the role network of a script by parsing its stage directions.
To achieve this purpose, the technical solution of the present invention provides a non-semantic social network extraction method for drama scripts, comprising the following steps:
Step 1: re-typeset the roles, dialogue, and stage directions of the script to form a normalized document with a uniform format for script elements, in which indefinite pronouns and conjunctions appearing in role lines and stage directions are resolved using a dynamic stage and replaced with specific role names:
For the moment of line $no$, the dynamic stage $S_{no}$ consists of all roles on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where $p$ denotes a role. The roles on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions;
Step 2: scan the full normalized document obtained in Step 1 and collect the distinct roles that appear in the script;
Step 3: establish directed connections between roles;
Step 4: calculate the weight of each directed connection.
Preferably, to establish a directed connection X → Y from role X to role Y, Step 3 comprises:
Step 3.1: for a dialogue segment of role X, set two conditions under which the segment can give rise to the directed connection X → Y, where b and e denote the b-th and e-th lines of the script and are the first and last lines of the segment:
Condition one: [formula missing in source] and [formula missing in source];
Condition two: at least one of the following two cases holds: case one, the dialogue segment is preceded by a precursor stage direction that contains "X indicative-preposition Y"; case two, a stage direction occurs between lines b and e of the script and contains "indicative-preposition Y";
Step 3.2: with condition one given the higher priority, compute the directed connection X → Y using the two conditions set in Step 3.1.
Preferably, in Step 4, if a dialogue segment of role X can give rise to the directed connection X → Y from role X to role Y, the maximum value Max of its weight is e - b, and the actual weight is calculated according to the following two cases:
Case one, for "X indicative-preposition Y": counting starts at line b and continues until the first stage direction that is not an "indicative-preposition Y" direction. If the first line of that direction is line s, the actual weight of X → Y is s - b; if no such direction is found, the actual weight of X → Y is Max;
Case two, for "indicative-preposition Y": counting starts at the stage direction containing "indicative-preposition Y" and continues until the first stage direction that is not an "indicative-preposition Y" direction. The actual weight of X → Y is then s' - t, where s' is the first line of that direction and t is the last line of the stage direction containing "indicative-preposition Y". If "indicative-preposition Y" occurs several times, the accumulated value is taken as the actual weight of X → Y; if no non-"indicative-preposition Y" direction is found, the actual weight of X → Y is e - t.
The highlight of the present invention is that it can be applied immediately to massive collections of dramatic works, disregarding their heterogeneous characteristics (such as era, subject matter, author, and school), while still providing an accurate virtual social network representation. It helps literary connoisseurs and drama critics gain a preliminary understanding of massive numbers of electronic scripts in a short time and compare them at coarse and fine granularity, in particular through social network analysis, and it supports subsequent classification, evaluation, and even adaptive construction of scripts.
Brief description of the drawings
Fig. 1 shows the role network of Hamlet generated by the method provided by the present invention.
Specific embodiment
To make the present invention clearer and easier to understand, a preferred embodiment is described in detail below.
The present invention provides a non-semantic social network extraction method for drama scripts, comprising the following steps:
Step 1, preprocessing: because the formats of electronic scripts differ widely, preprocessing is necessary before the network is constructed. Its primary goal is to form a normalized document (such as an XML file) with a uniform format for script elements.
In general, a script is made up of several kinds of elements. The present invention is mainly concerned with roles (protagonist), dialogue (line), and stage directions (directive); all roles together form the full set U. The concrete format of the script elements can be chosen freely, provided the elements remain easy to tell apart. For example, a role can be typeset as "on its own line, ending with a colon"; a dialogue paragraph as "a separate paragraph, enclosed in quotation marks"; and a stage direction as "enclosed in square brackets".
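The example typesetting above can be recognized with a few string checks. The following is a minimal sketch, assuming exactly the three example conventions (colon-terminated role lines, quoted dialogue, bracketed stage directions); the function name `classify_line` and the label strings are hypothetical.

```python
def classify_line(line):
    """Classify one line of a normalized script and return (kind, content)."""
    text = line.strip()
    if text.startswith("[") and text.endswith("]"):
        return ("direction", text[1:-1].strip())  # stage direction in square brackets
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return ("dialogue", text[1:-1])           # dialogue enclosed in quotation marks
    if text.endswith(":"):
        return ("role", text[:-1].strip())        # role line ending with a colon
    return ("other", text)
```

Checking the bracket and quotation-mark forms before the colon form keeps a quoted line that happens to contain a colon from being mistaken for a role line.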
Note that after re-typesetting, role lines and stage directions may still contain indefinite pronouns (such as they, both, all) or conjunctions (such as and, but), which must be resolved into specific role names. This need is met through the concept of a "dynamic stage".
For the moment of line $no$, the dynamic stage $S_{no}$ consists of all roles on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where $p$ denotes a role. The roles on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions.
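The dynamic stage can be maintained as a set that is updated whenever an entrance or exit cue is met. The sketch below is one possible reading, not the patented implementation: it assumes English cues of the form "Enter ..." and "Exit ..." or "Exeunt ...", and the names `track_stage`, `split_names`, and the sample directions are hypothetical.

```python
import re

# Assumed English cue wording; real scripts may phrase entrances and exits differently.
ENTER = re.compile(r"^Enter\s+(.+)$")
EXIT = re.compile(r"^(?:Exit|Exeunt)\s+(.+)$")

def split_names(names):
    """Split 'A, B and C' into ['A', 'B', 'C']."""
    return [n.strip() for n in re.split(r",|\band\b", names) if n.strip()]

def track_stage(directions, last_line):
    """Map each line number to the set of roles on stage at that line.

    directions: dict from line number to the text of the stage direction there.
    """
    on_stage = set()
    stage = {}
    for no in range(1, last_line + 1):
        text = directions.get(no, "")
        entered = ENTER.match(text)
        exited = EXIT.match(text)
        if entered:
            on_stage |= set(split_names(entered.group(1)))
        if exited:
            on_stage -= set(split_names(exited.group(1)))
        stage[no] = frozenset(on_stage)  # snapshot of the stage at line `no`
    return stage

# Hypothetical cues, for illustration only.
directions = {1: "Enter HAMLET and HORATIO", 3: "Enter GHOST", 5: "Exit GHOST"}
stage = track_stage(directions, last_line=6)
```

Storing a frozen snapshot per line makes the stage at any line available in constant time when pronouns are resolved later.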
With the dynamic stage available, the indirect words in role lines and stage directions can be resolved accurately. To simplify the subsequent network extraction, all such indirect words are replaced with role names. Afterwards, if a role line names more than one role, a separator (such as a comma) must be inserted so that the subsequent connections can be generated efficiently.
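A minimal sketch of this substitution for role lines follows. It assumes that a collective word stands for every role currently on stage and that "and" joins explicit names; the word list `COLLECTIVE` and the function name `resolve_roles` are hypothetical choices made for illustration.

```python
import re

# Assumed set of collective words; a real script library may need more entries.
COLLECTIVE = {"all", "both", "they"}

def resolve_roles(role_text, on_stage):
    """Rewrite a role line as explicit role names separated by commas."""
    text = role_text.strip()
    if text.lower() in COLLECTIVE:
        names = sorted(on_stage)  # the pronoun stands for everyone on stage
    else:
        names = [n.strip() for n in re.split(r",|\band\b", text) if n.strip()]
    return ", ".join(names)
```

For example, `resolve_roles("All", {"HORATIO", "HAMLET"})` yields `"HAMLET, HORATIO"`, with the comma acting as the separator mentioned above.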
Step 2, role discovery: after preprocessing, the role list of the script can be collected by scanning all role lines of the document. Because the role lines have been specially normalized, they are easy to recognize.
In particular, authors sometimes list the roles before the plot unfolds (for example in the script's front matter), and this list can serve as an alternative or supplement to the scan. However, using the list directly as the set of network nodes instead of scanning the full text is unreliable, for two reasons: 1) the list may not include every role; 2) some listed roles may have no dialogue. Because the present invention constructs the network from the roles' dialogue, roles without dialogue are not its object of study and do not appear in the final virtual social network.
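The full-document scan can be sketched as follows, assuming the example typesetting in which a role line stands alone, ends with a colon, and separates multiple roles with commas. The function name `collect_roles` and the sample lines are hypothetical.

```python
def collect_roles(lines):
    """Collect distinct role names from role lines, in order of first appearance."""
    roles = []
    seen = set()
    for line in lines:
        text = line.strip()
        # Skip stage directions and dialogue; keep colon-terminated role lines.
        if text.endswith(":") and not text.startswith(("[", '"')):
            for name in text[:-1].split(","):
                name = name.strip()
                if name and name not in seen:
                    seen.add(name)
                    roles.append(name)
    return roles

# Hypothetical normalized script fragment, for illustration only.
script = [
    "HAMLET:",
    '"Who is there?"',
    "[Enter GHOST]",
    "HORATIO, MARCELLUS:",
    '"Look, my lord!"',
    "HAMLET:",
]
roles = collect_roles(script)
```

GHOST appears only in a stage direction and never in a role line here, so it is not collected, which matches the rule that roles without dialogue stay out of the network.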
Step 3, establishing directed connections between roles:
To establish a directed connection X → Y from role X to role Y, Step 3 comprises:
Step 3.1: for a dialogue segment of role X, set two conditions under which the segment can give rise to the directed connection X → Y, where b and e denote the b-th and e-th lines of the script and are the first and last lines of the segment:
Condition one: [formula missing in source] and [formula missing in source];
Condition two: at least one of the following two cases holds: case one, the dialogue segment is preceded by a precursor stage direction that contains "X to Y"; case two, a stage direction occurs between lines b and e of the script and contains "to Y";
Step 3.2: with condition one given the higher priority, compute the directed connection X → Y using the two conditions set in Step 3.1.
The indicative preposition "to" here can of course be replaced by near-synonyms such as toward(s), address, tell, and order.
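Detecting the two patterns of condition two ("X to Y" and "to Y") needs only pattern matching, with no semantic analysis. The sketch below is an assumption-laden illustration: the word list in `DIRECTIVES` extends the near-synonyms named above with a few inflected forms, matching is case-sensitive, `roles` must be non-empty, and the function name `directed_targets` is hypothetical.

```python
import re

# "to" and the near-synonyms named in the text, plus assumed inflected forms.
DIRECTIVES = r"(?:to|towards?|address(?:es|ing)?|tells?|telling|orders?|ordering)"

def directed_targets(direction, roles):
    """Return (speaker_or_None, target) pairs found in one stage direction."""
    # Longer names first so that a name is not cut short by a shorter one.
    names = "|".join(re.escape(r) for r in sorted(roles, key=len, reverse=True))
    pattern = re.compile(rf"(?:\b({names})\s+)?\b{DIRECTIVES}\s+({names})\b")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(direction)]

roles = {"HAMLET", "HORATIO", "GHOST"}
```

A pair with a speaker corresponds to case one (a precursor direction "X to Y"); a pair whose speaker is `None` corresponds to case two ("to Y" inside the dialogue segment of X).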
Step 4, calculating the weight of each directed connection:
In Step 4, if a dialogue segment of role X can give rise to the directed connection X → Y from role X to role Y, the maximum value Max of its weight is e - b, and the actual weight is calculated according to the following two cases:
Case one, for "X to Y": counting starts at line b and continues until the first stage direction that is not a "to Y" direction. If the first line of that direction is line s, the actual weight of X → Y is s - b; if no such direction is found, the actual weight of X → Y is Max;
Case two, for "to Y": counting starts at the stage direction containing "to Y" and continues until the first stage direction that is not a "to Y" direction. The actual weight of X → Y is then s' - t, where s' is the first line of that direction and t is the last line of the stage direction containing "to Y". If "to Y" occurs several times, the accumulated value is taken as the actual weight of X → Y; if no non-"to Y" direction is found, the actual weight of X → Y is e - t.
The final weight of the directed connection X → Y is the value accumulated after all dialogue segments of role X have been scanned.
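The two weight rules can be written as small functions over line numbers. This is a sketch under one reading of the text: stage directions are represented only by their first or last line numbers, each "to Y" direction in case two contributes independently to the accumulated value, and the names `weight_case_one` and `weight_case_two` are hypothetical.

```python
def weight_case_one(b, e, other_starts):
    """Case one ("X to Y" precedes the segment spanning lines b to e).

    other_starts: first lines of stage directions that are not "to Y" directions.
    """
    for s in sorted(other_starts):
        if b < s <= e:
            return s - b      # counting stops at the first non-"to Y" direction
    return e - b              # none found: the weight is Max

def weight_case_two(e, to_y_ends, other_starts):
    """Case two ("to Y" directions inside the segment ending at line e).

    to_y_ends: last lines t of the stage directions that contain "to Y".
    """
    total = 0
    for t in to_y_ends:
        later = [s for s in other_starts if t < s <= e]
        total += (min(later) - t) if later else (e - t)
    return total
```

For a hypothetical segment with b = 10 and e = 20, a non-"to Y" direction starting at line 14 gives a case-one weight of 4. In case two, "to Y" directions ending at lines 12 and 17 with one other direction starting at line 15 give (15 - 12) + (20 - 17) = 6. Summing these per-segment values over all dialogue segments of X yields the final weight of X → Y.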
A schematic diagram of the role network of Hamlet generated with the above method is shown in Fig. 1.
The present invention sets aside the semantic approach, which performs better on small data sets but is more complex, in favor of a lightweight, easily extensible framework: it selects the stage directions that are closely tied to the roles' dialogue and uses them to find direct associations between roles indirectly. In general, compared with more accurate natural-language-processing methods, the role network extracted this way is sparser (and may omit some important information). However, the method is unsupervised and needs no training or test sets, requires no parameter tuning or calibration, incurs no high time or space cost, and offers good scalability and immediate usability.
Claims (1)
1. A non-semantic social network extraction method for drama scripts, comprising the following steps:
Step 1: re-typesetting the roles, dialogue, and stage directions of the script to form a normalized document with a uniform format for script elements, in which indefinite pronouns and conjunctions appearing in role lines and stage directions are resolved using a dynamic stage and replaced with specific role names:
for the moment of line $no$, the dynamic stage $S_{no}$ consists of all roles on stage, i.e. $S_{no} = \{\, p \mid p \text{ is on stage at line } no \,\}$, where $p$ denotes a role; the roles on stage change dynamically as the plot advances and are tracked through the entrance and exit cues in the stage directions;
Step 2: scanning the full normalized document obtained in Step 1 and collecting the distinct roles that appear in the script;
Step 3: establishing directed connections between roles, wherein, to establish a directed connection X → Y from role X to role Y, Step 3 comprises:
Step 3.1: for a dialogue segment of role X, setting two conditions under which the segment can give rise to the directed connection X → Y, where b and e denote the b-th and e-th lines of the script and are the first and last lines of the segment:
Condition one: [formula missing in source] and [formula missing in source];
Condition two: at least one of the following two cases holds: case one, the dialogue segment is preceded by a precursor stage direction that contains "X indicative-preposition Y"; case two, a stage direction occurs between lines b and e of the script and contains "indicative-preposition Y";
Step 3.2: with condition one given the higher priority, computing the directed connection X → Y using the two conditions set in Step 3.1;
Step 4: calculating the weight of each directed connection, wherein, if a dialogue segment of role X can give rise to the directed connection X → Y from role X to role Y, the maximum value Max of its weight is e - b, and the actual weight is calculated according to the following two cases:
Case one, for "X indicative-preposition Y": counting starts at line b and continues until the first stage direction that is not an "indicative-preposition Y" direction; if the first line of that direction is line s, the actual weight of X → Y is s - b; if no such direction is found, the actual weight of X → Y is Max;
Case two, for "indicative-preposition Y": counting starts at the stage direction containing "indicative-preposition Y" and continues until the first stage direction that is not an "indicative-preposition Y" direction; the actual weight of X → Y is then s' - t, where s' is the first line of that direction and t is the last line of the stage direction containing "indicative-preposition Y"; if "indicative-preposition Y" occurs several times, the accumulated value is taken as the actual weight of X → Y; if no non-"indicative-preposition Y" direction is found, the actual weight of X → Y is e - t.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610938958.4A CN106503180B (en) | 2016-10-25 | 2016-10-25 | Non-semantic social network extraction method for drama scripts |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610938958.4A CN106503180B (en) | 2016-10-25 | 2016-10-25 | Non-semantic social network extraction method for drama scripts |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106503180A CN106503180A (en) | 2017-03-15 |
CN106503180B (en) | 2019-10-22 |
Family ID: 58319164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610938958.4A Expired - Fee Related CN106503180B (en) | 2016-10-25 | 2016-10-25 | Non-semantic social network extraction method for drama scripts |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503180B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107577672B (en) * | 2017-09-19 | 2021-07-06 | 网智天元科技集团股份有限公司 | Public opinion-based script setting method and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063481A (en) * | 2010-12-24 | 2011-05-18 | 中国电子科技集团公司第五十四研究所 | Method for establishing movie and TV drama analysis dedicated knowledge base and method for analyzing drama |
Non-Patent Citations (2)
Title |
---|
Operationalizing or the function of measurement in literary theory; Moretti F; New Left Review; 2013-12-31; Vol. 1, No. 84; pp. 103-119 * |
Structural analysis on social network constructed from characters in literature texts; Park GM, Kim SH, Cho HG; Journal of Computers; 2013-12-31; Vol. 8, No. 9; pp. 2442-2447 * |
Also Published As
Publication number | Publication date |
---|---|
CN106503180A (en) | 2017-03-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191022 |