CN115618003A

CN115618003A - Literature figure relation identification method and system

Info

Publication number: CN115618003A
Application number: CN202211392235.0A
Authority: CN
Inventors: 周凤莉
Original assignee: Harbin University
Current assignee: Harbin University
Priority date: 2022-11-08
Filing date: 2022-11-08
Publication date: 2023-01-17

Abstract

The invention relates to a method and a system for identifying relationships between literary characters. Sentences containing special identity relationships are first determined by a text classification model, and the character relationships to be identified are then recognized from those sentences, so that character relationships of a preset type are identified accurately. The method comprises: performing sequence annotation on the literary work to be analyzed, and extracting the person names in the literary work and the sentences containing special identity relationships; pairing and splicing each identified person name with each sentence containing a special identity relationship, and inputting the result into a relationship classification model to determine whether the relationship in the sentence is a character relationship to be identified; and counting the tone words that express emotion in the literary work to be analyzed, substituting the count into a category analysis model, and obtaining the literary category of the literary work to be analyzed.

Description

Literature figure relation identification method and system

Technical Field

The invention relates to the technical field of natural language and text processing, in particular to a literature character relationship identification method and system.

Background

To mine useful knowledge from literary works, the relationships between the characters in those works need to be analyzed. Character relationship extraction is an important means of knowledge acquisition: it extracts the semantic relationship that exists between two character entities from natural language text.

Existing character relationship identification methods use the PCNN (piecewise convolutional neural network) model, which improves the pooling layer of a conventional convolutional neural network (CNN), to mine character relationships. The main idea is to divide the feature map into three segments at the two entity positions (before the first entity, between the entities, and after the second entity) and to pool each segment separately, so as to better capture the structural information between the two entities. A sentence-level attention mechanism is additionally established to mitigate the wrong-label problem. However, these models do not fully consider the semantics of the sentence and are not well suited to literary works. Literary works often contain many characters who are distributed across the chapters of the book and whose relationships are intricate, so current character relationship identification methods cannot fully present the complex character relationships of a literary work.

Disclosure of Invention

The invention provides a method and a system for identifying relationships between literary characters. Sentences containing special identity relationships are first determined by a text classification model, and the character relationships to be identified are then recognized from those sentences, so that character relationships of a preset type are identified accurately.

A literary character relationship identification method comprises the following steps:

S1: performing sequence annotation on the literary work to be analyzed, and extracting the person names in the literary work and the sentences containing special identity relationships;

S2: pairing and splicing each identified person name with each sentence containing a special identity relationship, and inputting the result into a relationship classification model to determine whether the relationship in the sentence containing the special identity relationship is a character relationship to be identified;

S3: counting the tone words that express emotion in the literary work to be analyzed, substituting the count into a category analysis model, and obtaining the literary category of the literary work to be analyzed.

Further, the specific method for performing sequence annotation on the literary work to be analyzed and extracting the person names in the literary work and the sentences containing special identity relationships is as follows:

S101: performing sequence annotation on the literary work to be analyzed to obtain the character names contained in the literary work to be analyzed;

S102: segmenting the literary work to be analyzed into sentences, and inputting each sentence into a text classification model to determine whether the sentence contains a special identity relationship;

S103: extracting, through a name recognition interface suitable for literary works, the person names in the sentences that contain special identity relationships.

Furthermore, the specific method for segmenting the literary work to be analyzed into sentences and inputting each sentence into the text classification model to determine whether the sentence contains a special identity relationship is as follows:

S10201: converting each of the A words of the sentence into a B-dimensional vector, and forming an A × B matrix from the A B-dimensional vectors;

S10202: inputting the A × B matrix into the convolutional neural network of the text classification model to obtain a feature map, and performing a max pooling operation on the feature map to obtain a feature vector;

S10203: passing the feature vector through a classifier to obtain a classification result, the classification result indicating whether the sentence contains a special identity relationship.

Further, the specific method for pairing and splicing each identified person name with each sentence containing a special identity relationship and inputting the result into the relationship classification model to determine whether the relationship in the sentence is a character relationship to be identified is as follows:

S201: obtaining a name list from the identified person names, and forming name-sentence pairs by traversing the name list and pairing each name with each sentence containing a special identity relationship;

S202: segmenting the sequence text of each spliced name-sentence pair into characters and inputting it into the input layer of the pre-trained BERT language model;

S203: splicing the hidden vector output by the pre-trained BERT language model with the position vector of the name pair in the sentence;

S204: passing the spliced vector through a fully connected layer and a softmax layer to obtain a category distribution probability vector, wherein the relationship category corresponding to the maximum value in the category distribution probability vector is the category of the spliced name-sentence pair.

Further, the step of counting the tone words that express emotion in the literary work to be analyzed, substituting the count into the category analysis model, and obtaining the literary category of the literary work to be analyzed comprises the following steps:

S301: extracting the tone words that express emotion in the literary work to be analyzed to obtain the number of tone words that express emotion;

S302: substituting the number of tone words that express emotion into the literature category analysis model to obtain an importance degree parameter of the tone words that express emotion in the literary work;

S303: obtaining the literary category to which the literary work to be analyzed belongs according to the importance degree parameter.

A literary character relationship identification system comprises:

the extraction module, which is used for performing sequence annotation on the literary work to be analyzed and extracting the person names in the literary work and the sentences containing special identity relationships;

the recognition module, which is used for pairing and splicing each identified person name with each sentence containing a special identity relationship, and inputting the result into a relationship classification model to determine whether the relationship in the sentence containing the special identity relationship is a character relationship to be identified;

and the classification module, which is used for counting the tone words that express emotion in the literary work to be analyzed, substituting the count into the category analysis model, and obtaining the literary category of the literary work to be analyzed.

Further, the extraction module comprises:

the annotation module, which is used for performing sequence annotation on the literary work to be analyzed to obtain the character names contained in the literary work to be analyzed;

the segmentation module, which is used for segmenting the literary work to be analyzed into sentences and inputting each sentence into the text classification model to determine whether the sentence contains a special identity relationship;

and the interface module, which is used for extracting, through a name recognition interface suitable for literary works, the person names in the sentences that contain special identity relationships.

Further, the segmentation module comprises:

the matrix module, which is used for converting each of the A words of the sentence into a B-dimensional vector and forming an A × B matrix from the A B-dimensional vectors;

the vector module, which is used for inputting the A × B matrix into the convolutional neural network of the text classification model to obtain a feature map, and performing a max pooling operation on the feature map to obtain a feature vector;

and the feature module, which is used for passing the feature vector through the classifier to obtain a classification result, the classification result indicating whether the sentence contains a special identity relationship.

Further, the recognition module comprises:

the pairing module, which is used for obtaining a name list from the identified person names and forming name-sentence pairs by traversing the name list and pairing each name with each sentence containing a special identity relationship;

the embedding module, which is used for segmenting the sequence text of each spliced name-sentence pair into characters and inputting it into the input layer of the pre-trained BERT language model;

the splicing module, which is used for splicing the hidden vector output by the pre-trained BERT language model with the position vector of the name pair in the sentence;

and the corresponding module, which is used for passing the spliced vector through a fully connected layer and a softmax layer to obtain a category distribution probability vector, wherein the relationship category corresponding to the maximum value in the category distribution probability vector is the category of the spliced name-sentence pair.

Further, the classification module comprises:

the data acquisition module, which is used for extracting the tone words that express emotion in the literary work to be analyzed to obtain the number of tone words that express emotion;

the calculation module, which is used for substituting the number of tone words that express emotion into the literature category analysis model to obtain the importance degree parameter of the tone words that express emotion in the literary work;

and the definition module, which is used for obtaining the literary category to which the literary work to be analyzed belongs according to the importance degree parameter.

The beneficial effects of the invention are as follows:

The method is suited to analyzing literary works: it fully considers the semantics of the sentences in a literary work, effectively handles the fact that literary works often contain many characters who are distributed across the chapters of the book and whose relationships are intricate, and can fully present the complex character relationships of a literary work.

Meanwhile, by counting the tone words that express emotion in the literary work, the literary category of the work to be analyzed can be obtained, so that the work is analyzed from multiple angles and the identified character relationships are corroborated indirectly.

Furthermore, widely used techniques such as dropout and softmax are applied in the analysis, which keeps the computation of the whole analysis process simple and effective and further improves the reliability and the application range of the method.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it. In the drawings:

FIG. 1 is a diagram of the steps of the method of the present invention;

FIG. 2 is a schematic diagram of a system in accordance with the present invention;

FIG. 3 is a detailed view of the system of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

The embodiment of the invention provides a literary character relationship identification method, which comprises the following steps:

S1: performing sequence annotation on the literary work to be analyzed, and extracting the person names in the literary work and the sentences containing special identity relationships;

S2: pairing and splicing each identified person name with each sentence containing a special identity relationship, and inputting the result into a relationship classification model to determine whether the relationship in the sentence containing the special identity relationship is a character relationship to be identified;

S3: counting the tone words that express emotion in the literary work to be analyzed, substituting the count into the category analysis model, and obtaining the literary category of the literary work to be analyzed.

The working principle of the embodiment is as follows:

First, sequence annotation is performed on the literary work to be analyzed to obtain the character names contained in it; the literary work is segmented into sentences, and each sentence is input into a text classification model to determine whether it contains a special identity relationship; and the person names in the sentences that contain special identity relationships are extracted through a name recognition interface suitable for literary works.
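The sentence segmentation step described above can be sketched as follows. This is a minimal illustration, not the segmentation used by the invention: it assumes that sentences end with one of the terminal punctuation marks 。！？!? and the function name `split_sentences` is hypothetical.

```python
import re

# Assumption: a sentence is a run of non-terminal characters followed by
# zero or more terminal punctuation marks (Chinese or Western).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split a text into sentences, keeping the terminal punctuation."""
    sentences = []
    for match in _SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip fragments that are only whitespace
            sentences.append(sentence)
    return sentences


sentences = split_sentences("甲是乙的父亲。乙笑了！丙呢")
```

Each element of `sentences` would then be passed to the text classification model to decide whether it contains a special identity relationship.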

Then, after the person names in the literary work are extracted through a name recognition interface suitable for literary works, a name list is obtained; the name list is traversed, and each name is paired and spliced with each sentence containing a special identity relationship; the resulting name-sentence pairs can then be input into the relationship classification model to obtain the relationship category predicted by the model.

Then, in the relationship classification model, the sequence text of each name-sentence pair is segmented into characters and fed to the input layer of the model; the hidden vector output by the model is taken and spliced with the position vector of the name pair in the sentence; the spliced vector is passed through a fully connected layer and a softmax layer to obtain a category distribution probability vector; and the relationship category with the largest output value is taken as the prediction result of the model.

Finally, the tone words that express emotion in the literary work to be analyzed are extracted to obtain the number of tone words that express emotion; and the number of tone words that express emotion is substituted into the literature category analysis model to obtain the importance degree parameter of the tone words that express emotion in the literary work.

The softmax layer implements multinomial logistic regression (softmax regression) and is prior art.

The special identity relationships are relationships such as relatives, couples, and superior and subordinate.

The text classification model may be, for example, fastText, TextCNN, TextRNN, TextRCNN, and the like.

The beneficial effects of this embodiment are as follows:

In one embodiment, the specific method for performing sequence annotation on the literary work to be analyzed and extracting the names of people and the sentences containing the special identity relationship in the literary work comprises the following steps:

The working principle of the embodiment is as follows:

The literary work to be analyzed is first preprocessed, that is, a literary work in which a preset character relationship (such as the special identity relationship) is to be identified is acquired, and data cleaning is completed. Each of the A words of a sentence is converted into a B-dimensional vector; the A B-dimensional vectors corresponding to the A words of the sentence form an A × B matrix; the A × B matrix is input into the convolutional neural network of the text classification model to obtain a feature map; a max pooling operation is performed on the feature map to obtain a feature vector; and the feature vector is passed through a classifier to obtain a classification result, which indicates whether the sentence contains a special identity relationship. For the sentences judged by the text classification model to contain a special identity relationship, the person names can be extracted through a name recognition interface suitable for literary works.

Example: A may take the value 4 and B may take the value 3, in which case the A × B matrix has 4 rows (one per word) and 3 columns (one per vector dimension).
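For the case A = 4 and B = 3, the convolution and max pooling steps can be sketched in plain Python as below. The matrix entries, the two kernels, the kernel height of 2 and the ReLU activation are all illustrative assumptions; they are not values or choices given in this description, and the function names are hypothetical.

```python
def conv_feature_map(matrix, kernel):
    """Slide an (h x B) kernel down an (A x B) sentence matrix.

    Returns a feature map of length A - h + 1 (one value per window).
    """
    a, h = len(matrix), len(kernel)
    feature_map = []
    for start in range(a - h + 1):
        total = 0.0
        for i in range(h):
            for x, w in zip(matrix[start + i], kernel[i]):
                total += x * w
        feature_map.append(max(0.0, total))  # ReLU (an assumption)
    return feature_map


def sentence_features(matrix, kernels):
    """Max-pool each kernel's feature map into one entry of the feature vector."""
    return [max(conv_feature_map(matrix, k)) for k in kernels]


# Illustrative 4 x 3 matrix: 4 words, each a 3-dimensional vector.
sentence_matrix = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [3.0, 1.0, 1.0],
    [0.0, 2.0, 1.0],
]
# Two illustrative kernels, each covering 2 consecutive words.
kernels = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]],
]
features = sentence_features(sentence_matrix, kernels)  # [5.0, 2.0]
```

With a kernel height of 2, each feature map has 4 - 2 + 1 = 3 entries, and max pooling reduces it to a single number, so the feature vector has one entry per kernel regardless of the sentence length A.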

The beneficial effects of this embodiment are as follows:

The character relationships in a literary work are often embodied in sentences containing special identity relationships. Obtaining the person names in the literary work and the sentences containing special identity relationships is therefore the quickest way to find the character relationships in the work. For this reason the literary work is processed in segments, which makes the processing precise and logical and improves the efficiency of analyzing the character relationships in the work.

Further, the literary work is converted into vectors for analysis and processing, which improves the speed and accuracy of the analysis. Furthermore, widely used techniques such as dropout and softmax are applied in the analysis, which keeps the computation of the whole analysis process simple and effective and further improves the reliability and the application range of the method.

In one embodiment, the specific method for segmenting the literary work to be analyzed into sentences and inputting each sentence into the text classification model to determine whether the sentence contains a special identity relationship is as follows:

S10201: converting each of the A words of the sentence into a B-dimensional vector, and forming an A × B matrix from the A B-dimensional vectors;

S10202: inputting the A × B matrix into the convolutional neural network of the text classification model to obtain a feature map, and performing a max pooling operation on the feature map to obtain a feature vector;

S10203: passing the feature vector through a classifier to obtain a classification result, the classification result indicating whether the sentence contains a special identity relationship.

The working principle of the embodiment is as follows:

Each of the A words of the sentence is converted into a B-dimensional vector; the A B-dimensional vectors corresponding to the A words of the sentence form an A × B matrix; the A × B matrix is input into the convolutional neural network of the text classification model to obtain a feature map; a max pooling operation is performed on the feature map to obtain a feature vector; and the feature vector is passed through a classifier to obtain a classification result, which indicates whether the sentence contains a special identity relationship.

Specifically, during classification the feature vector is first passed through a fully connected layer, and a Dropout layer is added to prevent overfitting. For multi-class classification a softmax layer is usually used: the softmax function maps the outputs of the neural network into the interval (0, 1) so that they sum to 1, the resulting values can be regarded as a category distribution probability vector, and the category with the largest probability is taken as the final prediction. The training data of the classification model are sentences from literary works that have been manually labeled as to whether they contain a relationship category, that is, the sentence labels fall into two classes: sentences that contain a special identity relationship and sentences that do not.

The Dropout layer is a structure that can be used to reduce neural network overfitting.
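The classifier stage can be sketched as follows. This is a hedged illustration rather than the implementation of the invention: the weights, biases and input features are made-up values, the dropout rate of 0.5 is an assumption, dropout is applied to the feature vector before the fully connected layer (one common arrangement; the description does not fix the order), and the function names are hypothetical.

```python
import math
import random


def dropout(vector, rate, rng, training=True):
    """Inverted dropout: zero each entry with probability `rate` while training,
    and rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


def fully_connected(vector, weights, biases):
    """One output per weight row: dot(row, vector) + bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Map logits to probabilities in (0, 1) that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, rate=0.5, rng=None, training=False):
    """Return (index of the most probable class, probability vector)."""
    rng = rng or random.Random(0)
    hidden = dropout(features, rate, rng, training)
    probs = softmax(fully_connected(hidden, weights, biases))
    return probs.index(max(probs)), probs


# Illustrative values only. Class 0: contains a special identity relationship;
# class 1: does not.
features = [5.0, 2.0]
weights = [[0.2, 0.1], [-0.1, 0.1]]
biases = [0.0, 0.0]
label, probs = classify(features, weights, biases)
```

Dropout is only active when `training=True`; at prediction time the feature vector is passed through unchanged, which is why `classify` defaults to `training=False`.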

The beneficial effects of this embodiment are as follows:

Processing the literary work in segments makes the processing precise and logical and improves the efficiency of analyzing the character relationships in the work. Further, the literary work is converted into vectors for analysis and processing, which improves the speed and accuracy of the analysis. Furthermore, widely used techniques such as dropout and softmax are applied in the analysis, which keeps the computation of the whole analysis process simple and effective and further improves the reliability and the application range of the method.

In one embodiment, the specific method for pairing and splicing each identified person name with each sentence containing a special identity relationship and inputting the result into the relationship classification model to determine whether the relationship in the sentence is a character relationship to be identified is as follows:

The working principle of the embodiment is as follows:

First, after the person names in the literary work are extracted through a name recognition interface suitable for literary works, a name list is obtained; the name list is traversed, and each name is paired and spliced with each sentence containing a special identity relationship; the resulting name-sentence pairs can then be input into the relationship classification model to obtain the relationship category predicted by the model.
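The traversal and splicing step can be sketched as below. The names, the sentences and the `[SEP]` separator are assumptions made for illustration (the description does not state how a name and a sentence are joined), and `build_name_sentence_pairs` is a hypothetical helper.

```python
from itertools import product


def build_name_sentence_pairs(names, sentences, sep="[SEP]"):
    """Pair every name with every sentence and splice each pair into one text.

    Returns a list of (name, sentence, spliced_text) tuples.
    """
    return [
        (name, sentence, f"{name}{sep}{sentence}")
        for name, sentence in product(names, sentences)
    ]


# Placeholder inputs, not taken from any literary work.
names = ["Alice", "Bob"]
relation_sentences = ["Alice is Bob's sister.", "Bob is Carol's superior."]
pairs = build_name_sentence_pairs(names, relation_sentences)
```

The number of pairs grows as the product of the two list lengths, so for a long work it may be worth keeping only the pairs whose name actually occurs in the sentence; that filter is a possible optimization and is not part of the described method.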

Then, in the relationship classification model, the sequence text of each name-sentence pair is segmented into characters and fed to the input layer of the model; the hidden vector output by the model is taken and spliced with the position vector of the name pair in the sentence; the spliced vector is passed through a fully connected layer and a softmax layer to obtain a category distribution probability vector; and the relationship category with the largest output value is taken as the prediction result of the model.
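The splicing and classification head can be sketched as follows. Several things here are assumptions: the hidden vector is a short placeholder standing in for the output of the language model, the position vector is taken to be the normalized start and end offsets of a name in the sentence (the description does not define its form), and the weights and label set are illustrative. The function names are hypothetical.

```python
import math


def position_vector(sentence, name):
    """Normalized [start, end] character offsets of `name`, or [-1, -1] if absent."""
    start = sentence.find(name)
    if start < 0:
        return [-1.0, -1.0]
    length = len(sentence)
    return [start / length, (start + len(name)) / length]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(hidden, pos_vec, weights, biases, labels):
    """Splice the two vectors, apply a fully connected layer and softmax,
    and return the label with the highest probability."""
    spliced = list(hidden) + list(pos_vec)
    logits = [sum(w * x for w, x in zip(row, spliced)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


labels = ["relative", "couple", "superior-subordinate"]
hidden = [0.5, -1.0, 2.0]                  # placeholder for the model output
pos_vec = position_vector("abcd", "cd")    # [0.5, 1.0]
weights = [                                # one row per label, 3 + 2 columns
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]
relation, probs = predict_relation(hidden, pos_vec, weights, biases, labels)
```

In a trained system the weights would be learned during fine-tuning rather than written by hand; the values above are chosen only so that the example has a clear winner.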

In this embodiment, the relationship classification model may be the pre-trained BERT language model. BERT is a pre-trained language model proposed by Google in 2018 and belongs to the prior art.

When the model is used to classify the relationship categories in literary works, it is first pre-trained on a large-scale corpus from the literary domain, which makes it better suited to natural language processing problems in that domain. The model is then trained further (fine-tuned) on the labeled literary works.

The training process of the pre-trained BERT language model comprises the following steps:

1) pre-training the BERT language model on a large-scale corpus from the literary domain;

2) training the pre-trained BERT language model further on the labeled literary works.

The beneficial effects of this embodiment are as follows:

By classifying relationship categories with the relationship classification model, it can be determined whether the relationship in a sentence containing a special identity relationship is a character relationship to be identified (such as the special identity relationship); the method is accurate and efficient and saves working time.

The BERT model has strong language representation and feature extraction capabilities. It achieved state-of-the-art results on 11 NLP benchmark tasks and demonstrated the strength of bidirectional language models, which greatly improves the working efficiency and reliability of the invention.

In one embodiment, the step of counting the tone words that express emotion in the literary work to be analyzed, substituting the count into the category analysis model, and obtaining the literary category of the literary work to be analyzed comprises:

S301: extracting the tone words that express emotion in the literary work to be analyzed to obtain the number of tone words that express emotion;

S302: substituting the number of tone words that express emotion into the literature category analysis model to obtain an importance degree parameter of the tone words that express emotion in the literary work;

S303: obtaining the literary category to which the literary work to be analyzed belongs according to the importance degree parameter.

The working principle of the embodiment is as follows:

The literature category analysis model is expressed by a formula in which Si is the number of occurrences of the i-th tone word expressing emotion and Zi is the importance degree of the i-th tone word expressing emotion in the literary work, i = 1, 2, 3, ...

The tone words are sorted by Zi, and the first-ranked tone word expressing emotion is determined; the literary category to which the literary work belongs is then determined according to the first-ranked tone word expressing emotion.

The beneficial effects of this embodiment are as follows:

A literary work may contain many kinds of tone words expressing emotion, such as words expressing joy, anger, loss, love and so on. When several kinds coexist, careful screening is needed to analyze the dominant emotion of the work. In this embodiment, the tone words are sorted by Zi to determine the first-ranked tone word expressing emotion, and the main category tendency of the analyzed literary work can then be determined accurately from that word.

Compared with the prior art, the literature category analysis model is more refined and gives a more accurate and intuitive result, which is beneficial to the propagation and popularization of the invention.

Example: when the first-ranked tone word expressing emotion belongs to the sentiment class (love, hate, complain, recite, etc.), the literary work can be classified as a sentiment-type work;

when the first-ranked tone word expressing emotion belongs to the fright class (frightening, scaring, flustering, etc.), the literary work can be classified as a horror-type work;

when the first-ranked tone word expressing emotion belongs to the reasoning class (thinking, worrying, waiting, etc.), the literary work can be classified as a reasoning-type work.
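The ranking step can be sketched as below. The formula of the category analysis model is not reproduced in this text, so the sketch assumes, purely for illustration, that the importance degree is the relative frequency Zi = Si / (S1 + S2 + ...). The lexicon is a hypothetical mapping built from the example words above, and the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical lexicon: tone word -> literary category (from the examples above).
TONE_LEXICON = {
    "love": "sentiment", "hate": "sentiment", "complain": "sentiment",
    "frightening": "horror", "scaring": "horror", "flustering": "horror",
    "thinking": "reasoning", "worrying": "reasoning", "waiting": "reasoning",
}


def importance_degrees(tokens, lexicon=TONE_LEXICON):
    """Return {tone word: Zi}, with Zi assumed to be Si divided by the total count."""
    counts = Counter(t for t in tokens if t in lexicon)  # Si for each tone word
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def literary_category(tokens, lexicon=TONE_LEXICON):
    """Category of the first-ranked tone word, or None if no tone word occurs."""
    degrees = importance_degrees(tokens, lexicon)
    if not degrees:
        return None
    first_ranked = max(degrees, key=degrees.get)
    return lexicon[first_ranked]


tokens = ["love", "scaring", "love", "waiting", "the", "love", "scaring"]
degrees = importance_degrees(tokens)
category = literary_category(tokens)
```

When two tone words tie for first place, `max` returns the one that appeared first in the text; a real system would need an explicit tie-breaking rule, which the description does not specify.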

The embodiment provides a literary character relationship identification system, as shown in FIG. 2, comprising:

the extraction module, which is used for performing sequence annotation on the literary work to be analyzed and extracting the person names in the literary work and the sentences containing special identity relationships;

the recognition module, which is used for pairing and splicing each identified person name with each sentence containing a special identity relationship, and inputting the result into a relationship classification model to determine whether the relationship in the sentence containing the special identity relationship is a character relationship to be identified;

and the classification module, which is used for counting the tone words that express emotion in the literary work to be analyzed, substituting the count into the category analysis model, and obtaining the literary category of the literary work to be analyzed.

The working principle of the embodiment is as follows:

After the person names in the literary work are extracted through a name recognition interface suitable for literary works, a name list is obtained; the name list is traversed, and each name is paired and spliced with each sentence containing a special identity relationship; the resulting name-sentence pairs can then be input into the relationship classification model to obtain the relationship category predicted by the model.

Further, in the relationship classification model, the sequence text of each name-sentence pair is segmented into characters and fed to the input layer of the model; the hidden vector output by the model is taken and spliced with the position vector of the name pair in the sentence; the spliced vector is passed through a fully connected layer and a softmax layer to obtain a category distribution probability vector; and the relationship category with the largest output value is taken as the prediction result of the model.

Finally, the tone words that express emotion in the literary work to be analyzed are extracted to obtain the number of tone words that express emotion; and the number of tone words that express emotion is substituted into the literature category analysis model to obtain the importance degree parameter of the tone words that express emotion in the literary work.

The softmax layer implements multinomial logistic regression (softmax regression) and is prior art.

The special identity relationships are relationships such as relatives, couples, and superior and subordinate.

The beneficial effects of this embodiment are as follows:

In one embodiment, as shown in fig. 3, the extraction module comprises:

the system comprises a labeling module, a judging module and a judging module, wherein the labeling module is used for carrying out sequence labeling on a literary work to be analyzed to obtain the name of a figure contained in the literary work to be analyzed;

the segmentation module is used for segmenting the literary works to be analyzed according to sentences and inputting each sentence into the text classification model so as to determine whether each sentence contains a special identity relationship;

and the interface module is used for extracting the names of the characters in the sentences containing special identity relations through a name recognition interface suitable for literary works.

The working principle of the embodiment is as follows:

The literary work to be analyzed is first preprocessed, that is, the literary work in which the preset character relationships (such as the special identity relationships) are to be identified is acquired and data cleaning is completed. Each of the A words of a sentence is then converted into a B-dimensional vector; the A B-dimensional vectors corresponding to the A words of the sentence form an A × B matrix; the A × B matrix is input into the convolutional neural network of the text classification model to obtain a feature map; a max pooling operation is performed on the feature map to obtain a feature vector; and the feature vector is passed through a classifier to obtain a classification result, which indicates whether the sentence contains a special identity relationship. For each sentence judged by the text classification model to contain a special identity relationship, the names of the persons in it can be extracted through a name recognition interface suitable for literary works.

The beneficial effects of this embodiment are as follows:

The relationships between characters in a literary work are often expressed in sentences containing special identity relationships. Obtaining the person names in the literary work together with the sentences containing special identity relationships is therefore the fastest way to find the character relationships in the work. Processing the literary work sentence by sentence makes the processing precise and logically structured, and improves the efficiency of analyzing the character relationships in the literary work.

In one embodiment, as shown in FIG. 3, the segmentation module comprises:

the matrix module is used for converting each of the A words of a sentence into a B-dimensional vector and forming the A B-dimensional vectors into an A × B matrix;

the vector module is used for inputting the A × B matrix into the convolutional neural network of the text classification model to obtain a feature map, and performing a max pooling operation on the feature map to obtain a feature vector;

The working principle of the embodiment is as follows:

Each of the A words of the sentence is converted into a B-dimensional vector; the A B-dimensional vectors corresponding to the A words of the sentence form an A × B matrix; the A × B matrix is input into the convolutional neural network of the text classification model to obtain a feature map; a max pooling operation is performed on the feature map to obtain a feature vector; and the feature vector is passed through a classifier to obtain a classification result, which indicates whether the sentence contains a special identity relationship.
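The convolution and max pooling steps can be illustrated with a small pure-Python sketch. The kernel height, the ReLU activation, and all numeric values are assumptions for the example; the text does not specify the network's hyperparameters.

```python
def conv1d_feature_map(matrix, kernel):
    """Slide an h x B kernel down an A x B sentence matrix.

    Returns A - h + 1 activations, one per window of h consecutive words.
    """
    h = len(kernel)
    feature_map = []
    for start in range(len(matrix) - h + 1):
        window = matrix[start:start + h]
        value = sum(
            w * x
            for k_row, m_row in zip(kernel, window)
            for w, x in zip(k_row, m_row)
        )
        feature_map.append(max(0.0, value))  # ReLU is an assumed activation
    return feature_map


def sentence_features(matrix, kernels):
    """Max-pool each kernel's feature map to one value, giving the feature vector."""
    return [max(conv1d_feature_map(matrix, kernel)) for kernel in kernels]


# Toy sentence matrix: A = 4 words, B = 2 dimensions per word vector.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two toy kernels, each spanning 2 consecutive words.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [1.0, 1.0]],
]

features = sentence_features(matrix, kernels)
print(features)
```

The feature vector has one entry per kernel regardless of sentence length, which is what allows sentences with different word counts A to feed the same classifier.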

Specifically, during classification, the feature vector is first passed through the fully connected layer, and a Dropout layer is added to prevent overfitting. For multi-class problems a Softmax layer is usually used: the Softmax function maps the outputs of the neural network into the interval (0, 1), so that the values can be regarded as a class distribution probability vector, and the class with the maximum probability is taken as the final prediction result. The training data of the classification model is derived from manually labeled data indicating whether each sentence in the literary work contains a relation category; that is, the sentence labels are of two types, one for sentences containing a special identity relation and the other for sentences not containing a special identity relation.
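The role of the Dropout layer can be illustrated with a short sketch. It uses the common "inverted dropout" formulation, in which surviving values are rescaled during training; that choice, the drop rate, and the function name are assumptions, since the text only states that a Dropout layer is added.

```python
import random


def dropout(vector, rate, training, rng):
    """Randomly zero elements during training; pass values through unchanged otherwise.

    Kept elements are divided by (1 - rate) so the expected activation is unchanged
    (inverted dropout, an assumed formulation).
    """
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
fc_output = [1.0] * 10

train_out = dropout(fc_output, rate=0.5, training=True, rng=rng)
eval_out = dropout(fc_output, rate=0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

Because units are dropped only during training, the classifier cannot rely on any single feature, which is the overfitting safeguard the paragraph refers to.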

The beneficial effects of this embodiment are as follows:

Processing the literary work sentence by sentence makes the processing precise and logically structured, and improves the efficiency of analyzing the character relationships in the literary work; further, converting the text into vectors for analysis improves the speed and accuracy of the analysis; furthermore, Dropout and softmax are widely used techniques, which keeps the computation of the whole analysis simple and improves the reliability and the application range of the method.

In one embodiment, as shown in fig. 3, the identification module comprises:

the embedded module is used for segmenting the sequence text of the spliced name-sentence pair according to characters and inputting the sequence text into an input layer of the language pre-training Bert model;

and the corresponding module is used for enabling the spliced vectors to pass through a full connection layer and a softmax layer so as to obtain a category distribution probability vector, wherein the relationship category corresponding to the maximum value in the category distribution probability vector is the category of the spliced name-sentence pair.

The working principle of the embodiment is as follows:

First, the person names in the literary work are extracted through a name recognition interface suitable for literary works to obtain a name list; the name list is traversed, and each name is paired and spliced with each sentence containing a special identity relationship; the resulting name-sentence pairs can then be input into the relation classification model to obtain the relation type predicted by the model.

Before the model is used to classify relation categories in literary works, it is pre-trained on a large-scale corpus from the literary domain, so that it is better suited to natural language processing problems in that domain. The model is then further trained using the labeled literary works.

2) Training the language pre-training Bert model using the labeled literary works.

The beneficial effects of this embodiment are as follows:

The Bert model has strong language representation and feature extraction capabilities. It achieved state-of-the-art results on 11 NLP benchmark tasks, which demonstrates the strength of bidirectional language models, and its use greatly improves the working efficiency and reliability of the invention.

In one embodiment, as shown in fig. 3, the classification module comprises:

the calculation module is used for substituting the number of the tone words used for embodying emotion into the literature category analysis model to obtain the importance degree parameters of the tone words used for embodying emotion in the literary work;

The working principle of the embodiment is as follows:

the literature category analysis model comprises the following steps:

the number of the tone words used for embodying emotion is Si, and the importance degree of the i-th tone word used for embodying emotion in the literary work is Zi, i = 1, 2, 3.

The tone words are sorted according to Zi to determine the first-ranked tone word used for embodying emotion; the literature category to which the literary work belongs is then determined according to the first-ranked tone word used for embodying emotion.
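The ranking step can be sketched as follows. The formula for Zi does not appear in this text, so the sketch substitutes relative frequency (each word's count divided by the total count) purely as a stand-in assumption. It also assumes Si denotes the count of the i-th tone word. The lexicon, category names, and sample tokens are hypothetical.

```python
from collections import Counter

# Hypothetical lexicon mapping tone words to a literary category (not from the source).
TONE_WORD_CATEGORY = {
    "love": "sentimental",
    "hate": "sentimental",
    "frightened": "horror",
    "scared": "horror",
}


def classify_work(tokens, lexicon=TONE_WORD_CATEGORY):
    """Count tone words, score them, and return the category of the top-ranked word."""
    counts = Counter(t for t in tokens if t in lexicon)  # assumed meaning of S_i
    total = sum(counts.values())
    if total == 0:
        return None, {}
    # Assumed stand-in for Z_i: relative frequency of each tone word.
    importance = {word: s / total for word, s in counts.items()}
    top_word = max(importance, key=importance.get)
    return lexicon[top_word], importance


tokens = "she was scared yet love won and love stayed".split()
category, importance = classify_work(tokens)
print(category, importance)
```

On ties, `max` returns the first tone word encountered in the text; a production system would need an explicit tie-breaking rule.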

The beneficial effects of this embodiment are as follows:

A literary work may include several kinds of tone words used for embodying emotion, such as words expressing joy, anger, loss, love and the like. When several kinds of tone words coexist, the main emotion of the literary work must be determined by careful screening. In this embodiment, the tone words are sorted by the magnitude of Zi to determine the first-ranked tone word used for embodying emotion, and the main category tendency of the analyzed literary work can then be accurately determined according to that first-ranked tone word.

For example: when the first-ranked tone word used for embodying emotion belongs to the emotion class, such as love, hate, complain, recite, etc., the literary work can be classified as a sentimental literary work;

when the first-ranked tone word used for embodying emotion belongs to the fright class, such as frightened, scared, flustered, etc., the literary work can be classified as a horror work.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the technical scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A literature figure relation identification method is characterized by comprising the following steps:

2. The literary figure relationship identification method of claim 1, wherein the sequence labeling is performed on the literary work to be analyzed, and the specific method for extracting the names of people in the literary work and the sentences containing special identity relationships comprises the following steps:

3. The method for recognizing the relation of literary characters as claimed in claim 2, wherein the literary works to be analyzed are segmented into sentences, and each sentence is inputted into the text classification model to determine whether each sentence contains a special identity relation;

4. The method of claim 1, wherein the specific method of pairing and concatenating each recognized person name and each sentence group containing a special identity relationship and inputting the relationship classification model to determine whether the relationship in the sentence containing a special identity relationship is the person relationship to be recognized is as follows:

and S204, passing the spliced vector through a full connection layer and a softmax layer to obtain a category distribution probability vector, wherein the relationship category corresponding to the maximum value in the category distribution probability vector is the category of the spliced human name-sentence pair.

5. The literary character relationship identification method of claim 1, wherein the step of counting the number of the mood words for representing emotion in the literary work to be analyzed, substituting the number into the category analysis model, and obtaining the literary category of the literary work to be analyzed comprises:

6. A literary character relationship identification system, comprising:

7. The literary character relationship identification system of claim 6, wherein the extraction module comprises:

8. The literary character relationship identification system of claim 7, wherein the segmentation module comprises:

9. The literary character relationship identification system of claim 6, wherein the identification module comprises:

10. The literary character relationship identification system of claim 6, wherein the classification module comprises: