WO2022134779A1

WO2022134779A1 - Method, apparatus and device for extracting character action related data, and storage medium

Info

Publication number: WO2022134779A1
Application number: PCT/CN2021/124629
Authority: WO
Inventors: 蔡壮壮
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2020-12-23
Filing date: 2021-10-19
Publication date: 2022-06-30
Also published as: CN112597307A

Abstract

A method, apparatus and device for extracting character action related data, and a storage medium, which relate to the field of artificial intelligence and are used for performing syntactic analysis and part-of-speech tagging on text data by means of a Han language processing (HanLP) algorithm, and screening out data related to an ongoing behavior action, and thereby improving the accuracy of data extraction and reducing the noise of an extracted data set. The method for extracting character action related data comprises: acquiring pre-set text data; performing classification processing on the pre-set text data, so as to screen out text data containing character information and obtain initial text data (102); performing segmentation processing and part-of-speech tagging on the initial text data, so as to generate intermediate text data; performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data, so as to generate analysis text data; and performing filtering processing on the analysis text data, so as to generate target text data. In addition, the method further relates to blockchain technology, and target text data can be stored in a blockchain.

Description

Method, apparatus, device and storage medium for extracting data related to character actions

This application claims priority to the Chinese patent application No. 202011545182.2, filed on December 23, 2020 and entitled "Method, Apparatus, Equipment and Storage Medium for Extracting Data Related to Character Actions", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of natural language processing, and in particular, to a method, apparatus, device and storage medium for extracting data related to a character's action.

Background

Natural language processing includes two parts: natural language understanding and natural language generation. Realizing natural language communication between humans and machines means that a computer can not only understand the meaning of natural language text, but also express given intentions, thoughts and the like in natural language text. The former is called natural language understanding and the latter natural language generation. Natural language processing is an important direction in the fields of computer science and artificial intelligence. Among the available tools, the Chinese natural language processing HanLP algorithm is a text data extraction algorithm whose functions include word segmentation, part-of-speech tagging and entity recognition.

In recent years, driven by big data and deep learning, natural language processing technology has developed rapidly. At present, subject-verb-object extraction algorithms for text data fall roughly into two types: methods based on deep learning and methods based on language rules. The inventor realized that methods based on deep learning require a large amount of labeled data and do not perform well at extracting language descriptions of character actions, while methods based on language rules have large errors, do not meet the needs of extracting data related to character behaviors, and produce noisy extracted data.

SUMMARY OF THE INVENTION

The present application provides a method, apparatus, device and storage medium for extracting data related to character actions, which perform syntactic analysis and part-of-speech tagging on text data through the Chinese natural language processing HanLP algorithm and, based on the grammatical relationship between subject, predicate and object and on modal verbs, screen out the data related to behaviors and actions that are taking place, thereby improving the accuracy of data extraction and reducing the noise of the extracted data set.

A first aspect of the present application provides a method for extracting data related to character actions, including: acquiring preset text data, where the preset text data is novel text data including character behaviors and actions; classifying the preset text data and screening out the text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data; performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and filtering the analysis text data to obtain target text data containing the behaviors and actions of a plurality of characters.

A second aspect of the present application provides a device for extracting data related to character actions, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions: acquiring preset text data, where the preset text data is text data containing character behaviors and actions; classifying the preset text data and screening out the text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data; performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and filtering the analysis text data to obtain target text data including the behaviors and actions of a plurality of characters.

A third aspect of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on a computer, the computer is caused to perform the following steps: acquiring preset text data, where the preset text data is text data containing the behaviors and actions of characters; classifying the preset text data and screening out the text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data; performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and filtering the analysis text data to obtain target text data including the behaviors and actions of a plurality of characters.

A fourth aspect of the present application provides an apparatus for extracting data related to character actions, comprising: an obtaining module, used for obtaining preset text data, where the preset text data is novel text data including character behaviors and actions; a classification module, used for classifying the preset text data and screening out the text data containing character information to obtain initial text data; a word segmentation module, used for performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data; an analysis module, used for performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and a filtering module, used for filtering the analysis text data to obtain target text data including the behaviors and actions of a plurality of characters.

In the technical solution provided by the present application, preset text data is obtained, where the preset text data is novel text data containing character behaviors and actions; the preset text data is classified and the text data containing character information is screened out to obtain initial text data; word segmentation and part-of-speech tagging are performed on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data; dependency syntactic analysis and semantic dependency analysis are performed on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and the analysis text data is filtered to obtain target text data containing the behaviors and actions of a plurality of characters. In the embodiments of this application, the Chinese natural language processing HanLP algorithm is used to perform syntactic analysis and part-of-speech tagging on the text data, and the data related to ongoing behaviors and actions is screened out based on the grammatical relationship between subject, predicate and object and on modal verbs, which improves the accuracy of data extraction and reduces the noise of the extracted data set.

Description of Drawings

FIG. 1 is a schematic diagram of an embodiment of a method for extracting data related to character actions in an embodiment of the present application;

FIG. 2 is a schematic diagram of another embodiment of a method for extracting data related to character actions in an embodiment of the present application;

FIG. 3 is a schematic diagram of an embodiment of an apparatus for extracting data related to character actions in an embodiment of the present application;

FIG. 4 is a schematic diagram of another embodiment of an apparatus for extracting data related to character actions in an embodiment of the present application;

FIG. 5 is a schematic diagram of an embodiment of a device for extracting data related to a character action in an embodiment of the present application.

Detailed Description

The embodiments of the present application provide a method, apparatus, device and storage medium for extracting data related to character actions. The Chinese natural language processing HanLP algorithm is used to perform syntactic analysis and part-of-speech tagging on the text data, and the data related to behaviors and actions that are taking place is screened out based on the grammatical relationship between subject, predicate and object and on modal verbs, which improves the accuracy of data extraction and reduces the noise of the extracted data set.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover non-exclusive inclusion, for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those expressly listed steps or units, but may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

For ease of understanding, the specific process of the embodiment of the present application is described below. Referring to FIG. 1, an embodiment of the method for extracting data related to character actions in the embodiment of the present application includes:

101. Acquire preset text data, where the preset text data is novel text data including behaviors and actions of characters.

The server obtains preset text data, and the preset text data is novel text data including behaviors and actions of characters. The server uses a crawler to obtain multiple novel texts under specified tags from the network, and creates a preset data set based on the multiple novel texts.

It can be understood that the execution subject of the present application may be a device for extracting data related to a character action, and may also be a terminal or a server, which is not specifically limited here. The embodiments of the present application take the server as an execution subject as an example for description.

102. Classify the preset text data, screen out the text data containing the character information, and obtain initial text data.

The server classifies the preset text data, screens out the text data containing character information, and obtains the initial text data. Specifically, the server classifies the preset text data according to preset classification rules, retains the text data containing character pronouns or character names, and generates classified text data; the server then identifies the target punctuation marks in the classified text data and deletes the text data containing character dialogue according to the target punctuation marks to generate the initial text data, where the target punctuation marks are used to indicate character dialogue. The server divides the preset text data into two categories according to whether it contains character information and removes the text data that does not, for example "the dog is running in the yard", "the bird is chirping outside the window" and "the squirrel uses its big fluffy tail as a quilt cover", keeping the text data that includes character pronouns or character names. Character pronouns include I (we), you (you, plural), he (they, masculine) and she (they, feminine). The target punctuation mark is the combination "colon + double quotation mark", which indicates dialogue between characters. Although text data with character dialogue contains character information, it is not suitable for the analysis and extraction of data related to character behaviors and actions in this solution, so it needs to be removed.
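
The classification and dialogue removal in step 102 can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the pronoun list, the `names` parameter, the function names and the Chinese sample sentences are all assumptions introduced here, and plain substring matching is used in place of whatever classification rules the server actually applies.

```python
import re

# Assumed pronoun list (longer forms first for readability); not taken from the application.
PERSON_PRONOUNS = ("我们", "你们", "他们", "她们", "我", "你", "他", "她")

# Target punctuation: a colon followed by an opening double quotation mark
# (full-width or half-width), used here as the marker of character dialogue.
DIALOGUE_PATTERN = re.compile(r'[:：]\s*["“]')


def contains_character(sentence, names=()):
    """Return True if the sentence mentions a personal pronoun or a known character name.

    Substring matching is a simplification: a pronoun character inside a longer
    word would also match.
    """
    return any(word in sentence for word in (*PERSON_PRONOUNS, *names))


def select_initial_text(sentences, names=()):
    """Keep sentences that mention a character and are not character dialogue."""
    classified = [s for s in sentences if contains_character(s, names)]
    return [s for s in classified if not DIALOGUE_PATTERN.search(s)]


# Hypothetical sample sentences.
sample = [
    "小明正在吃饭。",        # mentions a character name: kept
    "狗在院子里跑。",        # no character information: removed
    "小明说：“我饿了。”",    # character dialogue: removed
    "她在看书。",            # contains a pronoun: kept
]
initial = select_initial_text(sample, names=("小明",))
print(initial)
```

A real system would more likely take character names from a named-entity recognizer than from a fixed list; the fixed list keeps the sketch self-contained.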

103. Perform word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data.

The server performs word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data. Specifically, the server performs sentence segmentation on the initial text data by punctuation to obtain a sentence segmentation result; the server performs word segmentation on the sentence segmentation result based on the preset Chinese natural language processing HanLP algorithm to obtain a word segmentation result; and the server performs part-of-speech tagging on the word segmentation result based on the preset Chinese natural language processing HanLP algorithm and the preset HanLP part-of-speech tagging set to generate intermediate text data. The word is the most basic unit of text, and word segmentation is the most basic step in natural language processing. Word segmentation algorithms are divided into dictionary methods and statistical methods: a method based on a dictionary and manual rules matches the string to be analyzed against the entries in a dictionary according to a certain strategy, while a statistical method relies on the frequency with which basic strings appear in a corpus. Each punctuation mark has a corresponding regular expression; the initial text data is segmented by punctuation, dividing long sentences into multiple short sentences, to obtain the sentence segmentation result. Chinese natural language processing (Han Language Processing, HanLP) is a toolkit composed of a series of models and algorithms whose goal is to promote the application of natural language processing in production environments. HanLP is characterized by complete functions, high performance, a clear structure and an up-to-date corpus. In this solution, HanLP first performs word segmentation on the text data; for example, for the input "Xiao Ming is eating", the result after word segmentation is "Xiao Ming", "zheng" (the transliterated Chinese adverb marking an ongoing action) and "eating".

Part-of-speech tagging is the process of assigning the correct part of speech to each word in the word segmentation result, that is, determining whether each word is a noun, verb, adjective or other part of speech. In this solution, the word segmentation result is tagged using the preset HanLP part-of-speech tagging set. The part of speech corresponding to "Xiao Ming" is "noun", that corresponding to "zheng" is "adverb", and that corresponding to "eating" is "verb".
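
To make the three sub-steps of step 103 concrete, the sketch below splits text into sentences with a regular expression, segments each sentence with forward maximum matching (one simple dictionary-based strategy) and looks up a part of speech for each word. It does not call HanLP and is not HanLP's algorithm; the toy lexicon, its labels and the function names are assumptions made only for illustration.

```python
import re

# Toy lexicon mapping words to part-of-speech labels (hypothetical; not the HanLP tag set).
TOY_LEXICON = {"小明": "noun", "正": "adverb", "吃饭": "verb", "房间": "noun"}

# Sentence-ending punctuation, full-width and half-width.
SENTENCE_END = re.compile(r"[。！？；!?;]")


def split_sentences(text):
    """Split text into short sentences at sentence-ending punctuation."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def segment(sentence, lexicon=TOY_LEXICON):
    """Forward maximum matching: at each position take the longest dictionary word.

    A single character that is not in the lexicon is emitted as its own word.
    """
    max_len = max(len(word) for word in lexicon)
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def tag(words, lexicon=TOY_LEXICON):
    """Attach a part-of-speech label to each word; unknown words get 'unknown'."""
    return [(word, lexicon.get(word, "unknown")) for word in words]


tagged = [tag(segment(s)) for s in split_sentences("小明正吃饭。小明在房间！")]
print(tagged)
```

The list of (word, label) pairs per sentence plays the role of the intermediate text data; a statistical segmenter would replace `segment` without changing the shape of the result.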

104. Perform dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm, and generate analysis text data.

The server performs dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm, and generates analysis text data. Dependency parsing (DP) reveals the syntactic structure of a sentence by analyzing the dependencies between the components of a language unit, that is, it identifies grammatical components such as subject, predicate and object, and attributive, adverbial and complement, and analyzes the relationships between these components. Semantic dependency parsing (SDP) analyzes the semantic associations between the language units of a sentence and presents them as a dependency structure. Semantic dependency analysis connects semantically associated language units directly with dependency arcs and marks the corresponding semantic relations, which is an important difference between semantic dependency analysis and syntactic dependency analysis. For example, the three example sentences, each rendered in translation as "Xiao Ming ate an apple", have different syntactic structures and therefore different syntactic analysis results, but the semantic relationships between their language units do not change and they express the same semantic information: Xiao Ming performs an eating action, and the eating action is performed on the apple.

105. Perform filtering processing on the analyzed text data to obtain target text data including behaviors and actions of multiple characters.

The server performs filtering processing on the analysis text data and obtains target text data including the behaviors and actions of a plurality of characters. Specifically, the server obtains the analysis text data, filters out the text data containing modal verbs from the analysis text data, and generates filtered text data; the server then normalizes the filtered text data to generate the target text data, which includes the extracted character behaviors and actions. After the subject-predicate-object character actions have been screened out, a sentence in which a modal verb modifies the predicate verb does not meet the conditions: because of the modal verb, the sentence is in the simple future tense and indicates an action or state at some future moment, so the character action has not yet occurred. For example, in "Xiao Ming is going to set off to swing on the swing", the swinging action has not yet occurred, so the relevant text data needs to be filtered out and deleted.
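
The modal-verb filter of step 105 might look like the following sketch, which works on already parsed sentences. The `Token` structure, the relation labels, the modal-verb list and the hand-annotated parses are assumptions for illustration; in the described solution the parse would come from HanLP, and the subsequent normalization step is not shown because the application does not detail it.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int       # index of the governing token, -1 for the core predicate
    relation: str   # dependency relation to the head (hypothetical labels)


# Assumed modal-verb list; the application does not enumerate one.
MODAL_VERBS = {"要", "会", "将", "想", "能", "可以", "应该"}


def has_modal_on_predicate(tokens):
    """True if a modal verb depends directly on the core predicate."""
    roots = {i for i, t in enumerate(tokens) if t.head == -1}
    return any(t.word in MODAL_VERBS and t.head in roots for t in tokens)


def filter_ongoing(sentences):
    """Drop sentences whose predicate is modified by a modal verb."""
    return [tokens for tokens in sentences if not has_modal_on_predicate(tokens)]


# Hand-annotated parses (not produced by HanLP).
going_to_swing = [
    Token("小明", 2, "subject"),
    Token("要", 2, "adverbial"),   # modal verb: the action has not happened yet
    Token("荡", -1, "core"),
    Token("秋千", 2, "object"),
]
playing = [
    Token("小明", 1, "subject"),
    Token("玩", -1, "core"),
    Token("游戏", 1, "object"),
]
kept = filter_ongoing([going_to_swing, playing])
print([[t.word for t in tokens] for tokens in kept])
```

Checking the head of the modal verb, rather than its mere presence in the sentence, follows the condition that the modal verb must modify the predicate verb.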

In the embodiment of this application, the Chinese natural language processing HanLP algorithm is used to perform syntactic analysis and part-of-speech tagging on the text data, and the data related to ongoing behaviors and actions is screened out based on the grammatical relationship between subject, predicate and object and on modal verbs, which improves the accuracy of data extraction and reduces the noise of the extracted data set.

Referring to FIG. 2 , another embodiment of the method for extracting data related to character actions in the embodiment of the present application includes:

201. Acquire preset text data, where the preset text data is novel text data including behaviors and actions of characters.

The server obtains preset text data, and the preset text data is novel text data including behaviors and actions of characters. The server obtains multiple novel texts within a specified tag from the network through a crawler, and creates a preset data set based on the multiple novel texts.

202. Classify the preset text data, filter out the text data containing the character information, and obtain initial text data.

203. Perform word segmentation processing and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data.

204. Call the preset Chinese natural language processing HanLP algorithm to identify and analyze the relationship between the grammatical components in the intermediate text data, and when the core relationship of the object points to the predicate verb, extract the core subject-predicate-object relationship to generate the first analysis text data.

The server invokes the preset Chinese natural language processing HanLP algorithm to identify and analyze the relationships between the grammatical components in the intermediate text data. When the core relationship of the object points to the predicate verb, the core subject-verb-object relationship is extracted to generate the first analysis text data. For example, in "Xiao Ming is playing a game in the room" (where "zheng", "zai" and "li" below are transliterations of Chinese function words), "Xiao Ming" is the noun subject, "zheng" is the noun adverbial, "zai" is the prepositional modifier, "room" is the prepositional location modifier, "li" is the time preposition, "play" is the predicate verb and "game" is the direct object. The predicate verb "play" is the core word, so the sentence can be extracted as "Xiao Ming playing the game", which contains the subject-verb-object relationship.
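
The extraction rule in step 204 can be illustrated on a hand-annotated parse of the example sentence. The sketch below assumes a simplified `Token` structure and the relation labels "subject", "object", "core" and "other", none of which are HanLP's actual output format; it returns a triple only when both the subject and the object attach to the core predicate.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int       # index of the governing token, -1 for the core word
    relation: str   # hypothetical labels: "subject", "object", "core", "other"


def extract_svo(tokens):
    """Return (subject, predicate, object) when both depend on the core predicate."""
    core = next((i for i, t in enumerate(tokens) if t.relation == "core"), None)
    if core is None:
        return None
    subject = next(
        (t.word for t in tokens if t.head == core and t.relation == "subject"), None
    )
    obj = next(
        (t.word for t in tokens if t.head == core and t.relation == "object"), None
    )
    if subject is None or obj is None:
        return None
    return subject, tokens[core].word, obj


# Hand-annotated parse of the translated example; not produced by HanLP.
sentence = [
    Token("Xiao Ming", 5, "subject"),
    Token("zheng", 5, "other"),
    Token("zai", 5, "other"),
    Token("room", 2, "other"),
    Token("li", 3, "other"),
    Token("play", -1, "core"),
    Token("game", 5, "object"),
]
print(extract_svo(sentence))
```

Sentences without an object that points to the predicate yield `None` and therefore contribute nothing to the first analysis text data in this sketch.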

205. Invoke the preset Chinese natural language processing HanLP algorithm to analyze the semantic relationship in the intermediate text data, determine the relationship type, filter out the text data including the agency relationship, and generate the second analysis text data.

The server invokes the preset Chinese natural language processing HanLP algorithm to analyze the semantic relationships in the intermediate text data, determines the relationship type, screens out (retains) the text data containing the agency relationship, and generates the second analysis text data. The relationship types include the agency relationship, party relationship, feeling relationship, consular relationship, client relationship, guest relationship, success relationship, source relationship, involved relationship and comparative role. For example, in "Xiao Ming gave her a bouquet of flowers", the semantic relationship type is the agency relationship (the relation between an action and the agent who performs it), and "sending flowers" is a specific action performed by the character, so the sentence meets the screening conditions of this solution. The sentence "Xiao Ming is eating in the room, watching TV, and talking at the same time" contains multiple predicate verbs, "eat", "watch" and "talk", which have an inheritance relationship with one another, so it also meets the screening conditions of this solution.
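
A rough sketch of the semantic screening in step 205 follows. Semantic arcs are represented as (head word, dependent word, relation type) triples; this representation, the label strings and the hand-assigned arcs are assumptions for illustration and do not reproduce HanLP's semantic dependency output.

```python
# Hypothetical label standing in for the agency relationship described above.
AGENCY = "agency"


def has_agency_relation(arcs):
    """True if any semantic arc of the sentence is an agency relationship."""
    return any(relation == AGENCY for _, _, relation in arcs)


def filter_by_agency(parsed):
    """Keep only sentences with at least one agency arc.

    `parsed` maps sentence text to a list of (head, dependent, relation) triples.
    """
    return {text: arcs for text, arcs in parsed.items() if has_agency_relation(arcs)}


# Hand-assigned arcs; the role labels are illustrative only.
parsed = {
    "Xiao Ming gave her a bouquet of flowers": [
        ("gave", "Xiao Ming", "agency"),
        ("gave", "her", "involved"),
        ("gave", "flowers", "client"),
    ],
    "Xiao Ming is happy": [
        ("happy", "Xiao Ming", "party"),
    ],
}
second_analysis = filter_by_agency(parsed)
print(list(second_analysis))
```

Handling of multiple predicate verbs in one sentence is left out; each predicate would simply contribute its own arcs to the list.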

206. Combine the first analysis text data and the second analysis text data to generate analysis text data.

The server combines the first analysis text data and the second analysis text data to generate analysis text data. In this solution, word segmentation, part-of-speech tagging, syntactic analysis and semantic analysis are all based on the HanLP algorithm. Each layer will form a separate data result. The data result of each layer can be used alone or transmitted to the next layer for further analysis.
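
The application does not specify how the two results are combined. One possible reading, shown below purely as an assumption, is to merge them per sentence so that each record carries the subject-verb-object triple from the syntactic layer and the arcs from the semantic layer, with `None` where a layer produced nothing. The dictionary layout and the function name `combine` are hypothetical.

```python
def combine(first, second):
    """Merge two per-sentence result dictionaries into one record per sentence.

    `first` maps sentence text to a subject-verb-object triple;
    `second` maps sentence text to a list of semantic arcs.
    Sentences found in either input are kept (an assumption of this sketch).
    """
    combined = {}
    for text in list(first) + [t for t in second if t not in first]:
        combined[text] = {
            "svo": first.get(text),
            "semantic_arcs": second.get(text),
        }
    return combined


first_analysis = {
    "Xiao Ming is playing a game in the room": ("Xiao Ming", "play", "game"),
}
second_analysis = {
    "Xiao Ming is playing a game in the room": [("play", "Xiao Ming", "agency")],
    "Xiao Ming gave her a bouquet of flowers": [("gave", "Xiao Ming", "agency")],
}
analysis = combine(first_analysis, second_analysis)
print(len(analysis))
```

Keeping each layer's output in its own field mirrors the statement that every layer forms a separate result that can be used alone or passed on; an intersection of the two inputs would be an equally plausible reading.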

207. Perform filtering processing on the analyzed text data to obtain target text data including behaviors and actions of multiple characters.

The method for extracting data related to character actions in the embodiment of the present application has been described above, and the apparatus for extracting data related to character actions in the embodiment of the present application is described below. Referring to FIG. 3, an embodiment of the apparatus for extracting data related to character actions in the embodiment of the present application includes:

The obtaining module 301 is used for obtaining preset text data, where the preset text data is novel text data including the behavior and actions of characters;

The classification module 302 is used for classifying and processing preset text data, screening out text data containing character information, and obtaining initial text data;

The word segmentation module 303 is configured to perform word segmentation processing and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data;

The analysis module 304 is configured to perform dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm, and generate analysis text data;

The filtering module 305 is configured to perform filtering processing on the analyzed text data to obtain target text data including behaviors and actions of a plurality of characters.

Referring to FIG. 4 , another embodiment of the apparatus for extracting data related to character actions in the embodiment of the present application includes:

Optionally, the classification module 302 includes:

The classification unit 3021 is used to classify the preset text data according to the preset classification rules, filter out the text data containing the pronouns or the names of the characters, and generate the classified text data;

The deletion unit 3022 is configured to identify the target punctuation in the classified text data, and delete the text data containing the dialogue of the characters according to the target punctuation to generate initial text data, and the target punctuation is used to indicate the dialogue of the characters.

Optionally, the word segmentation module 303 includes:

The sentence segmentation unit 3031 is used to perform sentence segmentation processing on the initial text data through punctuation to obtain a sentence segmentation result;

The word segmentation unit 3032 is used to perform word segmentation processing on the sentence segmentation result based on the preset Chinese natural language processing HanLP algorithm to obtain the word segmentation result;

The part-of-speech tagging unit 3033 is configured to perform part-of-speech tagging on the word segmentation result based on the preset Chinese natural language processing HanLP algorithm and the preset HanLP part-of-speech tagging set, and generate intermediate text data.

Optionally, the analysis module 304 includes:

The first analysis unit 3041 is used to call the preset Chinese natural language processing HanLP algorithm to identify and analyze the relationships between grammatical components in the intermediate text data and, when the core relationship of the object points to the predicate verb, extract the core subject-predicate-object relationship to generate the first analysis text data;

The second analysis unit 3042 is used to call the preset Chinese natural language processing HanLP algorithm to analyze the semantic association in the intermediate text data, determine the relationship type and filter out the text data containing the agency relationship, and generate the second analysis text data;

The combining unit 3043 is configured to combine the first analysis text data and the second analysis text data to generate analysis text data.

Optionally, the filtering module 305 includes:

The filtering unit 3051 is used to filter out the text data containing modal verbs from the analysis text data and generate filtered text data;

The normalization unit 3052 is configured to perform normalization processing on the filtered text data, so as to generate target text data including the behaviors and actions of a plurality of characters.

Optionally, after the analysis module 304 and before the filtering module 305, the apparatus for extracting the data related to the action of the character further includes:

The identification module 306 is used to identify whether the analysis text data includes character behaviors and actions that occurred in the past; when the analysis text data does not include character behaviors and actions that occurred in the past, the analysis text data is retained, and when it does, the data related to the character behaviors and actions that occurred in the past is deleted.

Specifically, for example, in "Xiao Ming has already eaten", "eat" is a predicate verb, but the sentence is in the simple past tense, and the semantic relationship expresses Xiao Ming's past state, not the current action. Therefore, the relevant text data needs to be deleted.
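
The behavior of the identification module 306 can be approximated by checking segmented sentences for markers of completed actions. This is only a surface-level sketch: the marker list, the function names and the Chinese sample sentences are assumptions, and the application itself describes the check in terms of the semantic relationship rather than a fixed word list.

```python
# Assumed markers of completed or past actions; not enumerated in the application.
PAST_MARKERS = {"已经", "曾经", "过"}


def describes_past_action(words):
    """True if any word of the segmented sentence is a past-action marker."""
    return any(word in PAST_MARKERS for word in words)


def drop_past_actions(sentences):
    """Retain only segmented sentences that do not describe a past action."""
    return [words for words in sentences if not describes_past_action(words)]


# Hypothetical segmented sentences.
sentences = [
    ["小明", "已经", "吃", "过", "饭", "了"],   # already eaten: deleted
    ["小明", "正", "吃饭"],                      # eating now: retained
]
current = drop_past_actions(sentences)
print(current)
```

Matching whole words after segmentation, rather than substrings of the raw sentence, avoids deleting sentences in which a marker character is merely part of a longer word.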

FIGS. 3 and 4 above describe in detail the apparatus for extracting data related to character actions in the embodiment of the present application from the perspective of modular functional entities; the device for extracting data related to character actions in the embodiment of the present application is described in detail below from the perspective of hardware processing.

FIG. 5 is a schematic structural diagram of a device for extracting data related to character actions provided by an embodiment of the present application. The device 500 for extracting data related to character actions may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) that store application programs 533 or data 532. The memory 520 and the storage medium 530 may be short-term storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 500 for extracting data related to character actions. Furthermore, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the device 500 for extracting data related to character actions, the series of instruction operations in the storage medium 530.

The device 500 for extracting data related to character actions may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input and output interfaces 560, and/or one or more operating systems 531, for example Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on the device for extracting data related to character actions, which may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.

The present application also provides a device for extracting data related to character actions. The device includes a memory and a processor, and computer-readable instructions are stored in the memory. When the computer-readable instructions are executed by the processor, the processor performs the steps of the method for extracting data related to character actions in the foregoing embodiments.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, they cause the computer to perform the steps of the method for extracting data related to character actions.

Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; and the data storage area may store data created through use of the node, and the like.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
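As an illustration of how data blocks can be linked by cryptographic methods, the following is a minimal sketch only, not the storage scheme of the present application. It assumes SHA-256 hashing and a JSON block body; the names `make_block`, `verify_chain` and `GENESIS_HASH` are hypothetical, and consensus and distribution are omitted entirely.

```python
import hashlib
import json

# Hypothetical placeholder used as the "previous hash" of the first block.
GENESIS_HASH = "0" * 64


def make_block(records, prev_hash):
    """Build a block whose hash covers its records and the previous block's hash."""
    body = json.dumps(
        {"records": records, "prev_hash": prev_hash},
        ensure_ascii=False,
        sort_keys=True,
    )
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"records": records, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain):
    """Return True only if every block is unmodified and links to its predecessor."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["records"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True
```

In this sketch, target text data would be passed as `records`; changing any stored record changes the recomputed hash, so `verify_chain` rejects the chain.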

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the system, device and unit described above may refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of the present application.

Claims

A method for extracting data related to character actions, wherein the method comprises:

acquiring preset text data, the preset text data being text data containing the behaviors and actions of characters;

classifying the preset text data, and selecting text data containing character information to obtain initial text data;

performing word segmentation and part-of-speech tagging on the initial text data based on a preset HanLP Chinese natural language processing algorithm to generate intermediate text data;

performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and

filtering the analysis text data to obtain target text data comprising the behaviors and actions of a plurality of characters.
The method for extracting data related to character actions according to claim 1, wherein the classifying the preset text data, and selecting text data containing character information to obtain initial text data comprises:

classifying the preset text data according to preset classification rules, and selecting text data that includes character pronouns or character names to generate classified text data; and

identifying target punctuation marks in the classified text data, and deleting text data containing dialogue between characters according to the target punctuation marks to generate the initial text data, the target punctuation marks being used to indicate dialogue between characters.
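The two screening steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the pronoun list, the set of quotation marks treated as target punctuation, and the function name `select_character_text` are all hypothetical, and a simple substring test stands in for the preset classification rules.

```python
# Hypothetical pronoun list standing in for the preset classification rules.
PRONOUNS = ("他", "她", "我", "你")

# Hypothetical target punctuation marks assumed to indicate dialogue.
DIALOGUE_MARKS = ("“", "”", "「", "」", '"')


def select_character_text(passages, names=()):
    """Keep passages that mention a character and contain no dialogue marks."""
    keywords = PRONOUNS + tuple(names)
    # Step 1: keep passages containing a character pronoun or a character name.
    classified = [p for p in passages if any(k in p for k in keywords)]
    # Step 2: delete passages containing dialogue, detected by punctuation.
    return [p for p in classified if not any(m in p for m in DIALOGUE_MARKS)]
```

For example, given a narrative sentence with a pronoun, a scenery sentence, and a quoted line of speech, only the first would remain.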
The method for extracting data related to character actions according to claim 1, wherein the performing word segmentation and part-of-speech tagging on the initial text data based on the preset HanLP algorithm to generate intermediate text data comprises:

performing sentence segmentation on the initial text data according to punctuation to obtain a sentence segmentation result;

performing word segmentation on the sentence segmentation result based on the preset HanLP algorithm to obtain a word segmentation result; and

performing part-of-speech tagging on the word segmentation result based on the preset HanLP algorithm and a preset HanLP part-of-speech tag set to generate the intermediate text data.
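The sentence segmentation step can be illustrated with a short sketch. It assumes that the sentence-ending punctuation consists of the full-width and half-width marks listed in the pattern, which is an assumption rather than something stated in the application; `split_sentences` is a hypothetical name. Word segmentation and part-of-speech tagging would then be delegated to HanLP and are deliberately not shown here.

```python
import re

# Assumed sentence terminators: full-width and half-width period-like marks.
_SENTENCE = re.compile(r"[^。！？；!?;]+[。！？；!?;]*")


def split_sentences(text):
    """Split text into sentences, keeping each terminator with its sentence."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Keeping the terminator attached to each sentence preserves the punctuation for any later step that inspects it.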
The method for extracting data related to character actions according to claim 1, wherein the performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data comprises:

calling the preset HanLP algorithm to identify and analyze the relationships between grammatical components in the intermediate text data, and, when the core relationship of an object points to a predicate verb, extracting the core subject-predicate-object relationship to generate first analysis text data;

calling the preset HanLP algorithm to analyze the semantic associations in the intermediate text data, determining the relationship types, and selecting text data containing an agent relationship to generate second analysis text data; and

combining the first analysis text data and the second analysis text data to generate the analysis text data.
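The extraction of a core subject-predicate-object relationship can be sketched over an already parsed sentence. The sketch below assumes a CoNLL-style token list (1-based token id, head id, relation label) as input and does not call HanLP itself; the `Token` type, the function `extract_svo`, and the default relation labels are assumptions that would need to match the label set of the parser actually used.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    word: str
    head: int     # id of the governing token, 0 for the root
    deprel: str   # dependency relation label


def extract_svo(
    tokens: List[Token],
    subj: str = "主谓关系",   # assumed label for subject-verb
    obj: str = "动宾关系",    # assumed label for verb-object
    root: str = "核心关系",   # assumed label for the core (root) relation
) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object) when both attach to the core predicate."""
    core = next((t for t in tokens if t.deprel == root), None)
    if core is None:
        return None
    subject = next(
        (t.word for t in tokens if t.head == core.id and t.deprel == subj), None
    )
    # The object is only accepted if its head is the core predicate.
    direct_object = next(
        (t.word for t in tokens if t.head == core.id and t.deprel == obj), None
    )
    if subject is None or direct_object is None:
        return None
    return (subject, core.word, direct_object)
```

Sentences without a subject or object attached to the core predicate return `None`, so they contribute nothing to the first analysis text data in this sketch.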
The method for extracting data related to character actions according to claim 1, wherein the filtering the analysis text data to obtain target text data comprising the behaviors and actions of a plurality of characters comprises:

filtering out text data containing modal verbs from the analysis text data to generate filtered text data; and

normalizing the filtered text data to generate the target text data comprising the behaviors and actions of the plurality of characters.
The method for extracting data related to character actions according to claim 5, wherein the filtering out text data containing modal verbs from the analysis text data to generate filtered text data comprises:

identifying text data containing modal verbs in the analysis text data, the modal verbs being used to indicate character actions that have not yet occurred; and

deleting the text data containing the modal verbs to generate the filtered text data.
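The modal-verb filter can be sketched over sentences that have already been segmented and tagged. The modal verb list, the convention that verb tags begin with "v", and the name `drop_modal_sentences` are assumptions made for illustration; the application does not enumerate the modal verbs it uses.

```python
# Hypothetical list of modal verbs signalling actions that have not yet occurred.
MODAL_VERBS = frozenset({"想", "要", "会", "能", "可以", "应该", "打算"})


def drop_modal_sentences(tagged_sentences):
    """Remove sentences containing a modal verb.

    Each sentence is a list of (word, tag) pairs; a word counts as a modal
    verb only if it is in MODAL_VERBS and its tag starts with "v".
    """
    def has_modal(sentence):
        return any(
            word in MODAL_VERBS and tag.startswith("v") for word, tag in sentence
        )

    return [s for s in tagged_sentences if not has_modal(s)]
```

Checking the tag as well as the word avoids discarding sentences in which a listed word is used as a noun.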
The method for extracting data related to character actions according to any one of claims 1 to 6, wherein after the performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate the analysis text data, and before the filtering the analysis text data to obtain the target text data, the method further comprises:

identifying whether the analysis text data contains behaviors and actions of characters that occurred in the past; when the analysis text data does not contain behaviors and actions of characters that occurred in the past, retaining the analysis text data; and when the analysis text data contains behaviors and actions of characters that occurred in the past, deleting the data relating to those past behaviors and actions.
A device for extracting data related to character actions, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:

acquiring preset text data, the preset text data being text data containing the behaviors and actions of characters;

classifying the preset text data, and selecting text data containing character information to obtain initial text data;

performing word segmentation and part-of-speech tagging on the initial text data based on a preset HanLP Chinese natural language processing algorithm to generate intermediate text data;

performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and

filtering the analysis text data to obtain target text data comprising the behaviors and actions of a plurality of characters.
The device for extracting data related to character actions according to claim 8, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

classifying the preset text data according to preset classification rules, and selecting text data that includes character pronouns or character names to generate classified text data; and

identifying target punctuation marks in the classified text data, and deleting text data containing dialogue between characters according to the target punctuation marks to generate the initial text data, the target punctuation marks being used to indicate dialogue between characters.
The device for extracting data related to character actions according to claim 8, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

performing sentence segmentation on the initial text data according to punctuation to obtain a sentence segmentation result;

performing word segmentation on the sentence segmentation result based on the preset HanLP algorithm to obtain a word segmentation result; and

performing part-of-speech tagging on the word segmentation result based on the preset HanLP algorithm and a preset HanLP part-of-speech tag set to generate the intermediate text data.
The device for extracting data related to character actions according to claim 8, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

calling the preset HanLP algorithm to identify and analyze the relationships between grammatical components in the intermediate text data, and, when the core relationship of an object points to a predicate verb, extracting the core subject-predicate-object relationship to generate first analysis text data;

calling the preset HanLP algorithm to analyze the semantic associations in the intermediate text data, determining the relationship types, and selecting text data containing an agent relationship to generate second analysis text data; and

combining the first analysis text data and the second analysis text data to generate the analysis text data.
The device for extracting data related to character actions according to claim 8, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

filtering out text data containing modal verbs from the analysis text data to generate filtered text data; and

normalizing the filtered text data to generate the target text data comprising the behaviors and actions of the plurality of characters.
The device for extracting data related to character actions according to claim 12, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

identifying text data containing modal verbs in the analysis text data, the modal verbs being used to indicate character actions that have not yet occurred; and

deleting the text data containing the modal verbs to generate the filtered text data.
The device for extracting data related to character actions according to any one of claims 8 to 13, wherein the processor, when executing the computer-readable instructions, further implements the following steps:

identifying whether the analysis text data contains behaviors and actions of characters that occurred in the past; when the analysis text data does not contain behaviors and actions of characters that occurred in the past, retaining the analysis text data; and when the analysis text data contains behaviors and actions of characters that occurred in the past, deleting the data relating to those past behaviors and actions.
A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed on a computer, cause the computer to perform the following steps:

acquiring preset text data, the preset text data being text data containing the behaviors and actions of characters;

classifying the preset text data, and selecting text data containing character information to obtain initial text data;

performing word segmentation and part-of-speech tagging on the initial text data based on a preset HanLP Chinese natural language processing algorithm to generate intermediate text data;

performing dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and

filtering the analysis text data to obtain target text data comprising the behaviors and actions of a plurality of characters.
The computer-readable storage medium according to claim 15, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

classifying the preset text data according to preset classification rules, and selecting text data that includes character pronouns or character names to generate classified text data; and

identifying target punctuation marks in the classified text data, and deleting text data containing dialogue between characters according to the target punctuation marks to generate the initial text data, the target punctuation marks being used to indicate dialogue between characters.
The computer-readable storage medium according to claim 15, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

performing sentence segmentation on the initial text data according to punctuation to obtain a sentence segmentation result;

performing word segmentation on the sentence segmentation result based on the preset HanLP algorithm to obtain a word segmentation result; and

performing part-of-speech tagging on the word segmentation result based on the preset HanLP algorithm and a preset HanLP part-of-speech tag set to generate the intermediate text data.
The computer-readable storage medium according to claim 15, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

calling the preset HanLP algorithm to identify and analyze the relationships between grammatical components in the intermediate text data, and, when the core relationship of an object points to a predicate verb, extracting the core subject-predicate-object relationship to generate first analysis text data;

calling the preset HanLP algorithm to analyze the semantic associations in the intermediate text data, determining the relationship types, and selecting text data containing an agent relationship to generate second analysis text data; and

combining the first analysis text data and the second analysis text data to generate the analysis text data.
The computer-readable storage medium according to claim 15, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

filtering out text data containing modal verbs from the analysis text data to generate filtered text data; and

normalizing the filtered text data to generate the target text data comprising the behaviors and actions of the plurality of characters.
An apparatus for extracting data related to character actions, wherein the apparatus comprises:

an acquisition module, configured to acquire preset text data, the preset text data being novel text data containing the behaviors and actions of characters;

a classification module, configured to classify the preset text data, and select text data containing character information to obtain initial text data;

a word segmentation module, configured to perform word segmentation and part-of-speech tagging on the initial text data based on a preset HanLP Chinese natural language processing algorithm to generate intermediate text data;

an analysis module, configured to perform dependency syntactic analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and

a filtering module, configured to filter the analysis text data to obtain target text data comprising the behaviors and actions of a plurality of characters.