CN113191140A - Text processing method and device, electronic equipment and storage medium

Info

Publication number
CN113191140A
Authority
CN
China
Prior art keywords
text, grammar, characters, processed, character
Legal status
Granted
Application number
CN202110742177.9A
Other languages
Chinese (zh)
Other versions
CN113191140B (English)
Inventor
陈帅婷
陈昌滨
郭少彤
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202110742177.9A
Publication of CN113191140A
Application granted
Publication of CN113191140B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/30 Semantic analysis

Abstract

The disclosure provides a text processing method, a text processing apparatus, an electronic device and a storage medium. The method comprises the following steps: generating a grammar tree of a text to be processed through a grammar tool; converting the grammar tree to obtain a grammar graph of the text to be processed; encoding, based on the grammar graph, the grammatical relations of the characters in the text to be processed to obtain grammatical relation feature data of the characters; performing grammar enhancement processing on the grammatical relations of the characters, based on their grammatical relation feature data and the obtained semantic feature data, to obtain grammar enhancement feature data of the characters; and performing prosodic boundary prediction and polyphonic disambiguation on the characters based on their grammar enhancement feature data to obtain prosodic boundary prediction results and polyphonic disambiguation results. With the method and the apparatus, text processing in the front-end module can be effectively simplified while preserving the accuracy of prosodic boundary prediction and polyphonic disambiguation.

Description

Text processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of speech synthesis technologies, and in particular, to a text processing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
A speech synthesis system mainly comprises a front-end module and a back-end module: the front-end module performs text analysis, and the back-end module performs speech generation. The front-end module provides the basis for the back-end module's speech generation and thus underpins smooth speech synthesis. In a Chinese speech synthesis system, the front-end module contains at least two main parts: prosodic boundary prediction and polyphonic disambiguation. Chinese contains polyphones, i.e., characters with more than one pronunciation. The pronunciation of a polyphone can differ greatly across contexts and can even change the meaning expressed by the whole sentence, so polyphonic disambiguation is required to ensure accurate pinyin annotation. Meanwhile, unlike English text, Chinese text does not separate adjacent words with spaces; a Chinese word can consist of one or more characters, with no explicit separator between adjacent words. The prosody of a sentence therefore needs to be handled by accurately predicting prosodic boundaries, to ensure that the synthesized speech is natural, fluent and rhythmic.
Currently, polyphonic disambiguation is mainly achieved with rule-based models, while prosodic boundary prediction is mainly achieved with statistical learning methods such as conditional random fields, or with bidirectional long short-term memory/attention-based models. Polyphonic disambiguation and prosodic boundary prediction are thus treated as separate tasks in the front-end module; the front-end module becomes a long pipeline, its construction and maintenance become complex and laborious, and the storage and computation required by the various models also hinder the deployment of a speech synthesis system on mobile devices.
Therefore, how to effectively simplify the text processing of the front-end module while maintaining the accuracy of prosodic boundary prediction and polyphonic disambiguation has become a technical problem to be solved.
Disclosure of Invention
In view of the above, one of the technical problems to be solved by the present disclosure is to provide a text processing method, an apparatus, an electronic device, and a computer-readable storage medium, so as to solve at least one of the above technical problems.
According to a first aspect of the present disclosure, a text processing method is provided. The method comprises the following steps: generating a grammar tree of a text to be processed through a grammar tool; converting the grammar tree of the text to be processed to obtain a grammar graph of the text to be processed; based on the grammatical graph of the text to be processed, coding grammatical relations of characters in the text to be processed so as to obtain grammatical relation characteristic data of the characters in the text to be processed; based on the syntactic relation feature data of the characters and the obtained semantic feature data of the characters, carrying out syntactic enhancement processing on the syntactic relation of the characters to obtain syntactic enhancement feature data of the characters; and performing prosodic boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters to obtain prosodic boundary prediction results and polyphonic disambiguation results of the characters.
According to a second aspect of the present disclosure, a text processing apparatus is provided. The device comprises: the generating module is used for generating a grammar tree of the text to be processed through a grammar tool; the conversion module is used for converting the grammar tree of the text to be processed to obtain a grammar graph of the text to be processed; the grammar relation coding module is used for coding grammar relations of characters in the text to be processed based on a grammar graph of the text to be processed so as to obtain grammar relation feature data of the characters in the text to be processed; the grammar enhancement processing module is used for carrying out grammar enhancement processing on the grammar relation of the characters based on the grammar relation characteristic data of the characters and the obtained semantic characteristic data of the characters so as to obtain the grammar enhancement characteristic data of the characters; and the text processing module is used for carrying out prosodic boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters so as to obtain prosodic boundary prediction results and polyphonic disambiguation results of the characters.
According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes: a processor; and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to the first aspect of the disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first aspect of the present disclosure.
According to the text processing scheme provided by the disclosure, a grammar tree of a text to be processed is generated through a grammar tool, and the grammar tree is converted to obtain a grammar graph of the text to be processed. By encoding the grammatical relations of the characters in the text to be processed based on this grammar graph, grammatical relation feature data of the characters can be accurately obtained; by performing grammar enhancement processing on the grammatical relations of the characters based on their grammatical relation feature data and the obtained semantic feature data, grammar enhancement feature data of the characters can be accurately obtained. That is, grammar information is successfully introduced into the feature data of the characters. Finally, prosodic boundary prediction and polyphonic disambiguation are performed on the characters based on their grammar enhancement feature data, and prosodic boundary prediction results and polyphonic disambiguation results of the characters can be accurately obtained. In addition, prosodic boundary prediction and polyphonic disambiguation are achieved simultaneously within the same text processing scheme in the front-end module of the speech synthesis system, which can effectively simplify text processing in the front-end module. That is, the text processing of the front-end module can be effectively simplified while maintaining the accuracy of prosodic boundary prediction and polyphonic disambiguation.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1A is a flowchart illustrating steps of a text processing method according to a first exemplary embodiment of the present disclosure;
FIG. 1B is a diagram illustrating a syntax tree in accordance with an illustrative embodiment of the present disclosure;
fig. 1C shows a schematic diagram of a syntax diagram of a first exemplary embodiment of the present disclosure;
FIG. 1D is a diagram illustrating a graph representation of a forward syntax diagram in accordance with an exemplary embodiment of the present disclosure;
FIG. 1E is a diagram illustrating a graph representation of a backward syntax diagram in a first exemplary embodiment of the present disclosure;
FIG. 1F is a flow chart illustrating syntax relationship coding in an exemplary embodiment of the disclosure;
FIG. 1G is a diagram illustrating a structure of a model for text processing in accordance with an exemplary embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a text processing apparatus according to a second exemplary embodiment of the present disclosure;
fig. 3 shows a schematic structural diagram of an electronic device according to a third exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Before describing the embodiments of the present disclosure in detail, the design idea behind the technical solution of the present disclosure is briefly introduced. Prosodic boundary prediction can be studied by combining the syntactic and phonological branches of linguistics. In syntax, constituents are identified through constituency tests, whose goal is to manipulate the components defined by the grammatical structure; these tests provide evidence that sentences are hierarchically structured. Even though words are pronounced in a linear fashion, their membership in grammatical phrases is interrelated. At the syntax-prosody interface, prosodic constituents are diagnosed based on prosodic boundaries and prosodic domains, whose division involves the distribution of pitch accents and prosodic boundary tones, the domains in which segmental (sandhi) processes apply, and cues such as pauses, durations and tonal scaling. The prosodic constituents and the grammatical constituents therefore share a certain isomorphism. Furthermore, in Chinese speech synthesis, unlike English where adjacent words are separated by spaces, a Chinese word may consist of one or more characters with no explicit separator between adjacent words. Words, as the basic semantic units of Chinese, are also crucial to polyphonic disambiguation. Grammar information thus plays an important role in both the prosodic boundary prediction task and the polyphonic disambiguation task, and adding grammar information to these tasks is helpful. Based on this, the inventors of the present application considered that the accuracy of prosodic boundary prediction and polyphonic disambiguation of text can be effectively improved by introducing grammar information into the two tasks. Specific implementations of the text processing method provided by the present disclosure are as follows:
example one
Referring to fig. 1A, a flowchart illustrating steps of a text processing method according to a first exemplary embodiment of the present disclosure is shown.
Specifically, the text processing method provided by the present disclosure includes the following steps:
in step S101, a syntax tree of the text to be processed is generated by the syntax tool.
In this embodiment, the grammar tool may be the SyntaxNet toolkit, the text to be processed may be a Chinese text to be speech-synthesized, and the grammar tree may be understood as a tree representation describing the linguistic dependencies among the words of the text, which helps in understanding the hierarchy of the text's grammatical structure. In short, the syntax tree is the tree representation formed when derivation is performed according to preset rules. In the tree structure, directly related words in the text are connected, while words that are not directly related are not.
In some optional embodiments, when generating, by the grammar tool, a grammar tree of the text to be processed, the method further includes: storing the part-of-speech type of the node in the syntax tree by using a part-of-speech type and a part-of-speech type identification mode; and storing the characters in the text to be processed by using a character and character identification mode. Therefore, the part-of-speech type of the node in the grammar tree can be effectively stored by using the part-of-speech type and the part-of-speech type identification. In addition, characters in the text to be processed can be effectively stored by using the characters and the character identification mode.
In one specific example, the text to be processed is "duckling playing in water", and its syntax tree representation is shown in FIG. 1B. Specifically, the content of each box represents a part-of-speech type: IP represents a sentence, NN a noun, P a preposition, PP a prepositional phrase, VP a predicate phrase, and NP a noun phrase. Node part-of-speech types are stored as (part-of-speech, id) pairs, e.g. [('IP', 0), ('NN', 1), ('PP', 2), ('VP', 3), ('NP', 4), ('P', 5)], to facilitate the subsequent addition of grammar information. The characters are stored as: [('little', 6), ('duck', 7), ('son', 8), ('in', 9), ('water', 10), ('li', 11), ('play', 12), ('play', 13)].
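As a minimal illustration of this storage scheme (the variable names are assumptions, not taken from the patent), the (label, id) pairs for the example sentence could be held as follows:

```python
# A minimal sketch of the (label, id) storage scheme described above.

pos_nodes = [("IP", 0), ("NN", 1), ("PP", 2), ("VP", 3), ("NP", 4), ("P", 5)]
characters = [("little", 6), ("duck", 7), ("son", 8), ("in", 9),
              ("water", 10), ("li", 11), ("play", 12), ("play", 13)]

# Dictionaries keyed by id make it easy to attach grammar information later.
id_to_pos = {i: pos for pos, i in pos_nodes}
id_to_char = {i: ch for ch, i in characters}

print(id_to_pos[1])   # NN
print(id_to_char[6])  # little
```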
In step S102, the syntax tree of the text to be processed is converted to obtain a syntax map of the text to be processed.
In this embodiment, the grammar graph is a graphical description of the grammar of the text to be processed, analogous to the syntax diagrams commonly used to describe programming languages.
In some optional embodiments, the syntax map comprises a forward syntax map. When the grammar tree of the text to be processed is converted, taking characters in the text to be processed as nodes of the forward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located; if the non-adjacent character of the current character in the text to be processed is judged to be the first character in the leaf node in the grammar tree, the node where the current character is located is connected with the node where the non-adjacent character of the current character is located in the forward direction, so that a forward grammar graph of the text to be processed is obtained. Therefore, the forward grammar graph of the text to be processed can be accurately obtained through the bidirectional connection of the nodes where the adjacent characters are located in the text to be processed and the forward connection of the node where the current character is located and the node where the non-adjacent character of the current character is located.
In a specific example, as shown in FIG. 1C, the forward grammar graph of the text to be processed is located below its grammar tree. The process of converting the grammar tree of the text to be processed into the forward grammar graph is as follows: (1) nodes: the characters in the text to be processed serve as the nodes of the forward grammar graph; (2) edge rules: a. adjacent nodes are connected bidirectionally, retaining the original sequential relation between adjacent characters in the text; b. for other nodes, whether the current character (node) i can be connected to character (node) j depends on whether j is the first character of a leaf node of the grammar tree. For example, "small" and "water" can be connected in the forward direction, but "small" and "li" cannot; basic word boundary information is introduced in this way; (3) edge information: the information on each edge is the part-of-speech id of the character j connected to character i, which introduces the grammar information. A sketch of these edge rules is given after the description of the backward grammar graph below.
In some optional embodiments, the syntax map further comprises a backward syntax map. When the grammar tree of the text to be processed is converted, taking characters in the text to be processed as nodes of the backward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located; and if the non-adjacent character of the current character in the text to be processed is judged to be the first character in the leaf node in the grammar tree, the node where the current character is located is connected with the node where the non-adjacent character of the current character is located in the backward direction, so that a backward grammar graph of the text to be processed is obtained. Therefore, the backward grammar graph of the text to be processed can be accurately obtained through the bidirectional connection of the nodes where the adjacent characters are located in the text to be processed and the backward connection of the nodes where the current character is located and the nodes where the non-adjacent characters of the current character are located.
In a specific example, as shown in fig. 1C, the backward grammar graph of the text to be processed is located below its forward grammar graph. The process of converting the grammar tree of the text to be processed into the backward grammar graph is as follows: (1) nodes: the characters in the text to be processed serve as the nodes of the backward grammar graph; (2) edge rules: a. adjacent nodes are connected bidirectionally, retaining the original sequential relation between adjacent characters in the text; b. for other nodes, whether the current character (node) i can be connected to character (node) j depends on whether j is the first character of a leaf node of the grammar tree. For example, "water" and "small" can be connected in the backward direction, but "water" and "li" cannot; basic word boundary information is introduced in this way; (3) edge information: the information on each edge is the part-of-speech id of the character j connected to character i, which introduces the grammar information.
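The edge rules from the two examples above can be sketched as follows. This is a minimal sketch under assumed data structures (`first_char_ids`, the set of indices of characters that begin a leaf node, and `pos_of_char`, the part-of-speech id of the word containing each character), not the patent's own implementation:

```python
def build_syntax_graph(chars, first_char_ids, pos_of_char, forward=True):
    """Return edges as a dict {(i, j): part-of-speech id of j's word}."""
    n = len(chars)
    edges = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if abs(i - j) == 1:
                # Adjacent characters keep their original order relation,
                # so they are connected in both directions.
                edges[(i, j)] = pos_of_char[j]
            elif (j > i if forward else j < i) and j in first_char_ids:
                # Non-adjacent characters connect only when j is the first
                # character of a leaf node, introducing word boundaries.
                edges[(i, j)] = pos_of_char[j]
    return edges
```

The same function produces the forward graph with `forward=True` and the backward graph with `forward=False`.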
In step S103, based on the grammar graph of the text to be processed, the grammar relationship of the characters in the text to be processed is encoded to obtain grammar relationship feature data of the characters in the text to be processed.
In the present embodiment, the grammatical relationship of the character can be understood as an interrelation between the component unit to which the character belongs and other component units in the text grammar structure, such as a dominance relationship, a union relationship, a bias relationship, a moving guest relationship, a moving complement relationship, and the like. The grammatical relationship feature data for the character can be a grammatical relationship feature vector for the character.
In some optional embodiments, the syntax map comprises a forward syntax map and a backward syntax map. When the grammatical relation of characters in the text to be processed is coded based on the grammatical diagram of the text to be processed, respectively performing diagram representation on a forward grammatical diagram and a backward grammatical diagram of the text to be processed so as to obtain a forward grammatical representation sequence of the forward grammatical diagram of the text to be processed and a backward grammatical representation sequence of the backward grammatical diagram of the text to be processed; and coding the grammatical relation of the characters in the text to be processed based on the forward grammatical representation data of the characters in the text to be processed, which is included in the forward grammatical representation sequence, and the backward grammatical representation data of the characters in the text to be processed, which is included in the backward grammatical representation sequence, through a grammatical relation coding model so as to obtain grammatical relation characteristic data of the characters in the text to be processed. Wherein the forward syntax characterization data can be forward syntax characterization vectors, the backward syntax characterization data can be backward syntax characterization vectors, and the syntax relationship feature data can be syntax relationship feature vectors. Therefore, by the grammar relation coding model, the grammar relation of the characters in the text to be processed is coded based on the forward grammar characterization data of the characters in the text to be processed, which are included in the forward grammar characterization sequence, and the backward grammar characterization data of the characters in the text to be processed, which are included in the backward grammar characterization sequence, and the grammar relation feature data of the characters in the text to be processed can be accurately obtained.
In a specific example, as shown in fig. 1D, the top is the forward grammar graph of the text to be processed and the bottom is the table formed by the forward grammar characterization sequence of that graph. The number on the diagonal of the table is the identifier of the current character; 0 indicates characters within the same leaf node of the grammar tree; inf indicates no connection; the remaining entries are the identifier of the part of speech of the word containing the character j connected to the current character i. For example, reading left to right, "small" is connected to "water", so L-f["small"]["water"] = 1, and "small" is connected to "in", so L-f["small"]["in"] = 5. As another example, the forward grammar characterization data of the character "small" is (6, 0, 0, 5, 1, inf, 1, inf).
In a specific example, as shown in fig. 1E, the top is the backward grammar graph of the text to be processed and the bottom is the table formed by the backward grammar characterization sequence of that graph. The number on the diagonal of the table is the identifier of the current character; 0 indicates characters within the same leaf node of the grammar tree; inf indicates no connection; the remaining entries are the identifier of the part of speech of the word containing the character j connected to the current character i. For example, reading right to left, "water" is connected to "small", so L-b["water"]["small"] = 1, and "in" is connected to "small", so L-b["in"]["small"] = 5. As another example, the backward grammar characterization data of the character "play" is (1, inf, inf, 5, 1, inf, 0, 13).
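The tables in FIG. 1D and FIG. 1E can be derived from the edges. The following sketch assumes a `leaf_of_char` mapping from each character index to the leaf node (word) containing it, and applies the conventions described above (diagonal: character id; same leaf: 0; no connection: inf; otherwise the part-of-speech id):

```python
import math

def characterize(edges, char_ids, leaf_of_char):
    """Row i of the returned table is the grammar characterization
    vector of character i (cf. FIG. 1D / FIG. 1E)."""
    n = len(char_ids)
    table = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                table[i][j] = char_ids[i]        # diagonal: character id
            elif leaf_of_char[i] == leaf_of_char[j]:
                table[i][j] = 0                  # same leaf node of the tree
            elif (i, j) in edges:
                table[i][j] = edges[(i, j)]      # part-of-speech id of j's word
    return table
```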
In some alternative embodiments, the syntactic relational coding model includes a bidirectional recursive unit with gating. When the grammatical relations of the characters in the text to be processed are encoded through the grammatical relation encoding model, a forward recursion unit in the bidirectional recursive unit with gating encodes the forward grammatical relations of the characters based on the forward grammatical characterization data included in the forward grammatical characterization sequence, to obtain forward grammatical relation feature data of the characters in the text to be processed; a backward recursion unit in the bidirectional recursive unit with gating encodes the backward grammatical relations of the characters based on the backward grammatical characterization data included in the backward grammatical characterization sequence, to obtain backward grammatical relation feature data of the characters in the text to be processed; and the grammatical relation feature data of the characters is determined based on their forward and backward grammatical relation feature data. The bidirectional recursive unit with gating can be a BiGRU (Bidirectional Gated Recurrent Unit), the forward and backward recursion units can each be a GRU (Gated Recurrent Unit), the forward grammatical relation feature data can be a forward grammatical relation feature vector, and the backward grammatical relation feature data can be a backward grammatical relation feature vector. Therefore, by determining the forward and backward grammatical relation feature data of the characters in the text to be processed, the grammatical relation feature data of the characters can be accurately determined.
In a specific example, when determining grammatical relation feature data of characters in the text to be processed, forward grammatical relation feature vectors and backward grammatical relation feature vectors of the characters in the text to be processed are spliced front and back to obtain grammatical relation feature vectors of the characters in the text to be processed.
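A minimal PyTorch sketch of this encoding step follows, assuming the characterization sequences have already been embedded into dense tensors; the embedding itself and the direction handling of the backward unit are assumptions, since the text leaves them open:

```python
import torch
import torch.nn as nn

class SyntaxRelationEncoder(nn.Module):
    """Forward GRU over the forward sequence, backward GRU over the
    backward sequence; the outputs are concatenated front-to-back."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.fwd_gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.bwd_gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, fwd_seq, bwd_seq):
        # fwd_seq, bwd_seq: (batch, seq_len, feat_dim)
        h_fwd, _ = self.fwd_gru(fwd_seq)                   # forward relations
        h_bwd, _ = self.bwd_gru(torch.flip(bwd_seq, [1]))  # read right-to-left
        h_bwd = torch.flip(h_bwd, [1])                     # realign to text order
        return torch.cat([h_fwd, h_bwd], dim=-1)           # spliced feature vectors
```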
In a specific example, as shown in fig. 1F, the syntactic relation encoding process of the characters in the text to be processed is as follows: firstly inputting a text, then generating a grammar tree of the input text, then converting the grammar tree of the input text to obtain a grammar graph of the input text, then carrying out graph representation on the grammar graph of the input text to obtain a grammar representation sequence of the grammar graph of the input text, and finally inputting the grammar representation sequence of the grammar graph of the input text into a bidirectional recursion unit with gating to obtain grammar relation coding of characters in the input text, namely grammar relation characteristic data of the characters in the input text.
In step S104, based on the syntactic relationship feature data of the character and the obtained semantic feature data of the character, syntactic enhancement processing is performed on the syntactic relationship of the character to obtain syntactic enhancement feature data of the character.
In the present embodiment, the grammatical relation feature data of a character can be understood as a vector characterizing the grammatical relation features of the character, and the semantic feature data of a character as a vector characterizing its semantic features. Specifically, semantic feature data of the characters in the text to be processed may be extracted through a semantic feature extraction model, which can be the encoder of a neural machine translation model, a BERT model, a GPT model, and the like. BERT stands for Bidirectional Encoder Representations from Transformers and is a language representation model. It is designed to pre-train deep bidirectional representations conditioned on left and right context in all layers, so that the pre-trained BERT representations can be fine-tuned with only one additional output layer, creating state-of-the-art models for many tasks (such as question answering and language inference) without substantial modification of the task-specific architecture. GPT stands for Generative Pre-Training and is another language representation model, originally proposed in the paper "Improving Language Understanding by Generative Pre-Training"; it addresses language understanding tasks using unsupervised pre-training followed by supervised fine-tuning. GPT is a general-purpose representation model that suits a variety of tasks and performs well without modifying the task architecture. The grammar enhancement feature data of a character may be understood as a vector characterizing the grammar-enhanced features of the character.
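For the semantic branch, a sketch using the Hugging Face transformers library is shown below; the bert-base-chinese checkpoint is an assumption, as the patent does not name a specific model:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "小鸭子在水里玩耍"  # the "duckling playing in water" example
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# bert-base-chinese tokenizes Chinese per character, so after stripping the
# [CLS]/[SEP] tokens each character gets one 768-dimensional semantic vector.
semantic_feats = outputs.last_hidden_state[0, 1:-1]  # (num_chars, 768)
```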
In some optional embodiments, when performing syntax enhancement processing on the syntax relationship of the character based on the syntax relationship feature data of the character and the obtained semantic feature data of the character, the syntax relationship feature data of the character and the semantic feature data of the character are spliced to obtain spliced feature data of the character; constructing graph structure data corresponding to the text to be processed based on the character splicing feature data; and carrying out grammar enhancement processing on the grammar relationship of the characters based on graph structure data corresponding to the text to be processed through a graph attention network so as to obtain grammar enhancement feature data of the characters. Wherein the splicing feature data of the character can be a splicing feature vector of the character. Therefore, through the graph attention network, the grammar enhancement processing is carried out on the grammar relation of the characters based on the graph structure data corresponding to the text to be processed, and the grammar enhancement feature data of the characters can be accurately obtained.
In a specific example, when graph structure data corresponding to the text to be processed is constructed based on the splicing feature data of the characters, the characters and the position relations of the characters in the text to be processed are respectively used as nodes and edges of the graph structure data, and the splicing feature data of the characters are used as feature data of the nodes of the graph structure data. Therefore, the graph structure data corresponding to the text to be processed can be accurately constructed.
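A sketch of this construction step follows; treating simple character adjacency as the positional relation that defines the edges is an assumption:

```python
import torch

def build_graph(syntax_feats, semantic_feats):
    """Node features are the concatenation of syntactic-relation and
    semantic features; edges follow the characters' positional relations."""
    # syntax_feats: (seq_len, d1); semantic_feats: (seq_len, d2)
    node_feats = torch.cat([syntax_feats, semantic_feats], dim=-1)
    n = node_feats.size(0)
    edges = [(i, i + 1) for i in range(n - 1)]   # adjacent-character edges
    edges += [(j, i) for i, j in edges]          # both directions
    return node_feats, edges
```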
In one particular example, the self-attention mechanism relates different positions of a single sequence in order to compute a representation of that sequence. Self-attention is applied in many natural language processing tasks, and its parallelism and its ability to assign different weights to neighboring nodes have made it increasingly widespread in graph neural networks. The graph attention mechanism performs excellently on graph structure data; its core is to use self-attention to characterize the nodes of the graph. When learning representations from graph structure data, the graph attention mechanism attends not only to the information of the nodes themselves but, more importantly, to the relations between nodes. Therefore, in this embodiment, in order to add grammar information to the prosodic boundary prediction task and the polyphonic disambiguation task, the graph attention mechanism is used for grammar perception. The input sequence H (i.e., the concatenation of the grammatical relation feature data and the semantic feature data extracted by the BERT model) is:

H = {h_1, h_2, h_3, ..., h_N}, h_i ∈ R^F

where N is the number of nodes of the graph structure data, F is the feature dimension of the nodes, and h_i is the feature data of input node i.

The output of the graph attention network is the set H', where h_i' is the extracted feature data of node i, F' is the output feature dimension, and N' is the number of nodes:

H' = {h_1', h_2', h_3', ..., h_N'}, h_i' ∈ R^{F'}
in some optional embodiments, when performing syntax enhancement processing on the syntactic relation of the character based on graph structure data corresponding to the text to be processed through a graph attention network, determining, through the graph attention network, an attention autocorrelation coefficient of a current node and an attention cross-correlation coefficient of the current node and a neighboring node of the current node based on feature data of the current node and feature data of the neighboring node in the graph structure data; determining, by the graph attention network, new feature data of the current node based on the feature data of the current node, feature data of nodes neighboring the current node, the attention autocorrelation coefficient, and the attention cross-correlation coefficient; and determining the new feature data of the current node as the grammar enhancement feature data of the character corresponding to the current node. Therefore, the attention autocorrelation coefficient of the current node and the attention cross-correlation coefficient of the current node and the adjacent nodes can be accurately determined through the characteristic data of the current node and the characteristic data of the adjacent nodes of the current node in the graph structure data. Furthermore, through the graph attention network, based on the feature data of the current node, the feature data of the nodes adjacent to the current node, the attention autocorrelation coefficient, and the attention cross-correlation coefficient, the grammar enhancement feature data of the character corresponding to the current node can be accurately determined.
In some optional embodiments, after obtaining the grammar enhancement feature data for the character, the method further includes: and performing secondary grammar enhancement processing on the grammar relationship of the characters based on the grammar enhancement feature data of the characters through a feedforward neural network to obtain the secondary grammar enhancement feature data of the characters. Therefore, the grammar relationship of the characters is subjected to grammar enhancement processing again on the basis of the grammar enhancement feature data of the characters through the feedforward neural network, and the grammar enhancement feature data of the characters can be obtained more accurately.
In one particular example, the attention coefficients are calculated by applying a shared self-attention mechanism to the nodes: one part computes the self-attention of the current node in the graph structure data, i.e., the attention autocorrelation coefficient of the current node, and the other part computes the attention over the neighboring nodes, i.e., the attention cross-correlation coefficient between the current node and its neighbors:

e_ii = a(W h_i)

e_ij = a(W h_j)

The attention kernel a is obtained from a single-layer feed-forward neural network that parameterizes the weight vector, and the weight matrix W of the shared linear transformation is defined as W ∈ R^{F×F'}. Here, e_ii is the attention autocorrelation coefficient of the current node i in the graph structure data, and e_ij is the attention cross-correlation coefficient between the current node i and its neighboring node j. The operations above realize the self-attention of the current node and the attention over its neighbors, where the attention over a neighboring node j reflects the importance of j's influence on the current node i. Afterwards, a feed-forward neural network can further be used to obtain the grammar enhancement feature data of the characters in the text to be processed. In this embodiment, six pairs of graph attention network and feed-forward neural network may be used to perform grammar enhancement processing on the grammatical relations of the characters in the text to be processed.
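Putting the above together, one attention/feed-forward pair might look like the following sketch; the softmax normalization over neighbors and the hidden sizes are assumptions where the text leaves them open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared transform W
        self.a = nn.Linear(out_dim, 1, bias=False)       # attention kernel a

    def forward(self, h, adj):
        # h: (n, in_dim); adj: (n, n), 1 where j is a neighbour of i,
        # self-loops included so that e_ii is also computed.
        wh = self.W(h)                                   # (n, out_dim)
        e = self.a(wh).squeeze(-1)                       # e_ij = a(W h_j)
        scores = e.unsqueeze(0).repeat(adj.size(0), 1)   # row i: scores over j
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)                 # weights over neighbours
        return attn @ wh                                 # new node features h_i'

class SyntaxEnhanceBlock(nn.Module):
    """One attention/feed-forward pair; the patent stacks six of these."""

    def __init__(self, dim):
        super().__init__()
        self.gat = GraphAttentionLayer(dim, dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        h = self.gat(h, adj)   # grammar enhancement via graph attention
        return self.ffn(h)     # second-pass enhancement via the FFN
```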
In step S105, prosodic boundary prediction and polyphonic disambiguation are performed on the character based on the grammar enhancement feature data of the character to obtain a prosodic boundary prediction result and a polyphonic disambiguation result of the character.
In the present embodiment, the prosodic boundary prediction task predicts, for an input text, the distribution of three kinds of prosodic boundaries, which are marked in the text as boundary tags: prosodic words (#1), prosodic phrases (#2), and intonation phrases (#3). The polyphonic disambiguation task converts each Chinese character in the text to be processed into its pronunciation. For example, "gu du Xi'an" (the ancient capital Xi'an) -> "gu3 du1 xi1 an1".
In some optional embodiments, when performing prosodic boundary prediction and polyphonic disambiguation on the character based on the grammar enhancement feature data of the character, performing prosodic boundary prediction on the character based on the grammar enhancement feature data of the character through a first conditional random field to obtain a prosodic boundary prediction result of the character; and carrying out polyphonic disambiguation on the character based on the grammar enhancement feature data of the character through a second conditional random field to obtain a polyphonic disambiguation result of the character. Therefore, the prosodic boundary prediction result of the character can be accurately obtained by performing prosodic boundary prediction on the character based on the grammar enhancement feature data of the character by the first conditional random field. In addition, the second conditional random field is used for carrying out polyphonic disambiguation on the character based on the grammar enhancement feature data of the character, and a polyphonic disambiguation result of the character can be accurately obtained.
In a specific example: in probabilistic terms, a model could simply select the current maximum-probability value as its output; but when the length of the content to be processed is greater than 1, selecting only the single maximum-probability prediction at each position may not yield the best result globally. The conditional random field approach instead selects the label sequence with the maximum probability from the perspective of the complete sequence as the final result of the model. Therefore, in this embodiment, conditional random fields are used to perform prosodic boundary prediction and polyphonic disambiguation on the characters based on their grammar enhancement feature data, to obtain the prosodic boundary prediction results and polyphonic disambiguation results of the characters.
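Each conditional random field head can decode its label sequence with the Viterbi algorithm, which realizes exactly this whole-sequence selection. The following is a minimal stand-in sketch (a full CRF would also learn the transition matrix during training):

```python
import torch

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags) per-character scores from a linear
    layer over the grammar-enhanced features; transitions: (num_tags,
    num_tags) CRF transition weights. Returns the globally best tag path."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                       # (num_tags,)
    history = []
    for t in range(1, seq_len):
        # total[i, j] = score[i] + transitions[i, j] + emissions[t, j]
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)    # best previous tag for each j
        history.append(best_prev)
    best_tag = score.argmax().item()
    path = [best_tag]
    for best_prev in reversed(history):        # backtrace
        best_tag = best_prev[best_tag].item()
        path.append(best_tag)
    return list(reversed(path))
```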
In one specific example, as shown in FIG. 1G, the model for text processing includes a grammar encoder, a BERT model, graph attention networks, feed-forward neural networks, and conditional random fields. The text processing procedure is as follows: first a text is input; a grammar tree of the input text is generated; the grammar tree is converted to obtain a grammar graph of the input text; the grammar graph is represented as a grammar characterization sequence; and the grammar characterization sequence is input into the grammar encoder to obtain the grammatical relation encoding of the characters in the input text. Meanwhile, semantic feature extraction is performed on the input text through the BERT model to obtain semantic feature data of the characters. Then, through the graph attention networks and feed-forward neural networks, grammar enhancement processing is performed on the grammatical relations of the characters based on their grammatical relation encodings and semantic feature data, to obtain the grammar enhancement feature data of the characters. Finally, prosodic boundary prediction and polyphonic disambiguation are performed on the characters through the conditional random fields based on the grammar enhancement feature data, to obtain the prosodic boundary prediction results and polyphonic disambiguation results of the characters. The model contains N pairs of graph attention network and feed-forward neural network, which perform the grammar enhancement processing on the grammatical relations of the characters in the input text. A rough wiring of the components sketched above is given below.
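As a usage illustration only, the components sketched in the earlier examples could be wired together roughly as follows; all dimensions, the adjacency construction, and the four-tag prosody label set are assumptions:

```python
import torch

n_chars, emb_dim, sem_dim = 8, 16, 768
encoder = SyntaxRelationEncoder(feat_dim=emb_dim, hidden_dim=32)  # see above

fwd_seq = torch.randn(1, n_chars, emb_dim)  # embedded forward characterization
bwd_seq = torch.randn(1, n_chars, emb_dim)  # embedded backward characterization
syntax_feats = encoder(fwd_seq, bwd_seq)[0]            # (n_chars, 64)

semantic_feats = torch.randn(n_chars, sem_dim)         # stand-in for BERT output
node_feats, edges = build_graph(syntax_feats, semantic_feats)

adj = torch.eye(n_chars)                               # self-loops, for e_ii
for i, j in edges:
    adj[i, j] = 1.0

block = SyntaxEnhanceBlock(dim=64 + sem_dim)           # one of the N pairs
enhanced = block(node_feats, adj)                      # grammar-enhanced features

# One linear head per task turns the enhanced features into CRF emissions;
# four prosody tags (none / #1 / #2 / #3) are an assumed label set.
prosody_head = torch.nn.Linear(64 + sem_dim, 4)
labels = viterbi_decode(prosody_head(enhanced), torch.zeros(4, 4))
print(labels)  # one tag index per character
```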
According to the text processing scheme provided by the disclosure, a grammar tree of a text to be processed is generated through a grammar tool, and the grammar tree is converted to obtain a grammar graph of the text to be processed. By encoding the grammatical relations of the characters in the text to be processed based on this grammar graph, grammatical relation feature data of the characters can be accurately obtained; by performing grammar enhancement processing on the grammatical relations of the characters based on their grammatical relation feature data and the obtained semantic feature data, grammar enhancement feature data of the characters can be accurately obtained. That is, grammar information is successfully introduced into the feature data of the characters. Finally, prosodic boundary prediction and polyphonic disambiguation are performed on the characters based on their grammar enhancement feature data, and prosodic boundary prediction results and polyphonic disambiguation results of the characters can be accurately obtained. In addition, prosodic boundary prediction and polyphonic disambiguation are achieved simultaneously within the same text processing scheme in the front-end module of the speech synthesis system, which can effectively simplify text processing in the front-end module. That is, the text processing of the front-end module can be effectively simplified while maintaining the accuracy of prosodic boundary prediction and polyphonic disambiguation.
The text processing method provided by this embodiment may be executed by any suitable device with data processing capability, including but not limited to: cameras, terminals, mobile terminals, PCs, servers, in-vehicle devices, entertainment devices, advertising devices, personal digital assistants (PDAs), tablet computers, notebook computers, handheld game consoles, smart glasses, smart watches, wearable devices, and virtual display or display enhancement devices (such as Google Glass, Oculus Rift, HoloLens, Gear VR).
Example two
Fig. 2 shows a schematic structural diagram of a text processing apparatus according to a second exemplary embodiment of the present disclosure, and referring to fig. 2, the apparatus includes:
a generating module 201, configured to generate a syntax tree of a text to be processed through a syntax tool;
a conversion module 202, configured to convert the syntax tree of the text to be processed to obtain a syntax diagram of the text to be processed;
a grammatical relation encoding module 203, configured to encode a grammatical relation of characters in the text to be processed based on a grammatical graph of the text to be processed, so as to obtain grammatical relation feature data of the characters in the text to be processed;
the grammar enhancement processing module 204 is configured to perform grammar enhancement processing on the grammar relationship of the characters based on the grammar relationship feature data of the characters and the obtained semantic feature data of the characters to obtain grammar enhancement feature data of the characters;
and the text processing module 205 is configured to perform prosody boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters to obtain prosody boundary prediction results and polyphonic disambiguation results of the characters.
In the exemplary embodiment of the present disclosure, a syntax tree of a text to be processed is generated by a syntax tool, the syntax tree of the text to be processed is converted to obtain a syntax diagram of the text to be processed, a syntax relation of characters in the text to be processed is encoded based on the syntax diagram of the text to be processed, syntax relation feature data of the characters in the text to be processed can be accurately obtained, syntax enhancement processing is performed on the syntax relation of the characters based on the syntax relation feature data of the characters and the obtained semantic feature data of the characters, and syntax enhancement feature data of the characters can be accurately obtained, that is, syntax information of the characters is successfully introduced into the feature data of the characters. And finally, performing prosodic boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters, and accurately obtaining prosodic boundary prediction results and polyphonic disambiguation results of the characters. In addition, prosodic boundary prediction and polyphonic disambiguation of characters are simultaneously achieved in the same text processing scheme in the front-end module of the speech synthesis system, which can effectively simplify text processing in the front-end module. That is, the text processing of the front-end module can be effectively simplified while giving consideration to the accuracy of prosodic boundary prediction and polyphonic disambiguation.
In one possible implementation, the apparatus further includes: the storage module is used for storing the part-of-speech types of the nodes in the grammar tree by using the way of part-of-speech type and part-of-speech type identification when the grammar tree of the text to be processed is generated by a grammar tool; and storing the characters in the text to be processed by using a character and character identification mode.
In a possible implementation manner, the syntax map includes a forward syntax map, and the conversion module 202 is specifically configured to: taking characters in the text to be processed as nodes of the forward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located; if the non-adjacent character of the current character in the text to be processed is judged to be the first character in the leaf node in the grammar tree, the node where the current character is located is connected with the node where the non-adjacent character of the current character is located in the forward direction, so that a forward grammar graph of the text to be processed is obtained.
In a possible implementation manner, the syntax map further includes a backward syntax map, and the conversion module 202 is further configured to: taking characters in the text to be processed as nodes of the backward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located; and if the non-adjacent character of the current character in the text to be processed is judged to be the first character in the leaf node in the grammar tree, the node where the current character is located is connected with the node where the non-adjacent character of the current character is located in the backward direction, so that a backward grammar graph of the text to be processed is obtained.
In a possible implementation manner, the syntax map includes a forward syntax map and a backward syntax map, and the syntax relation encoding module 203 is specifically configured to: respectively carrying out graph representation on the forward grammar graph and the backward grammar graph of the text to be processed so as to obtain a forward grammar representation sequence of the forward grammar graph of the text to be processed and a backward grammar representation sequence of the backward grammar graph of the text to be processed; and coding the grammatical relation of the characters in the text to be processed based on the forward grammatical representation data of the characters in the text to be processed, which is included in the forward grammatical representation sequence, and the backward grammatical representation data of the characters in the text to be processed, which is included in the backward grammatical representation sequence, through a grammatical relation coding model so as to obtain grammatical relation characteristic data of the characters in the text to be processed.
In a possible implementation manner, the syntax relation coding model includes a bidirectional recursive unit with gating, and the syntax relation coding module 203 is further configured to: coding, by a forward recursion unit in the bidirectional recursion unit with a gate, forward grammatical relations of characters in the text to be processed based on forward grammatical representation data of the characters in the text to be processed, which is included in the forward grammatical representation sequence, to obtain forward grammatical relation feature data of the characters in the text to be processed; coding, by a backward recursion unit in the bidirectional recursion unit with gating, a backward grammatical relation of characters in the text to be processed based on backward grammatical representation data of the characters in the text to be processed, which is included in the backward grammatical representation sequence, to obtain backward grammatical relation feature data of the characters in the text to be processed; and determining grammatical relation characteristic data of the characters in the text to be processed based on the forward grammatical relation characteristic data and the backward grammatical relation characteristic data of the characters in the text to be processed.
In a possible implementation manner, the syntax enhancement processing module 204 is specifically configured to: splicing the syntactic relation characteristic data of the characters and the semantic characteristic data of the characters to obtain spliced characteristic data of the characters; constructing graph structure data corresponding to the text to be processed based on the character splicing feature data; and carrying out grammar enhancement processing on the grammar relationship of the characters based on graph structure data corresponding to the text to be processed through a graph attention network so as to obtain grammar enhancement feature data of the characters.
In a possible implementation manner, the grammar enhancement processing module 204 is further configured to: take the characters in the text to be processed as the nodes of the graph structure data and the positional relations of the characters as its edges, and take the spliced feature data of the characters as the feature data of the nodes of the graph structure data.
In a possible implementation manner, the grammar enhancement processing module 204 is further configured to: determine, by the graph attention network, an attention autocorrelation coefficient of a current node, and attention cross-correlation coefficients between the current node and its neighboring nodes, based on the feature data of the current node and the feature data of its neighboring nodes in the graph structure data; determine, by the graph attention network, new feature data of the current node based on the feature data of the current node, the feature data of its neighboring nodes, the attention autocorrelation coefficient, and the attention cross-correlation coefficients; and determine the new feature data of the current node as the grammar enhancement feature data of the character corresponding to the current node.
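A single-head graph-attention layer along these lines could be sketched as follows. Treating each node's softmax weight on itself as the attention autocorrelation coefficient and its weights on neighboring nodes as the cross-correlation coefficients is our interpretation; the parametrization follows the common graph attention network formulation rather than anything specified in the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Hypothetical single-head graph attention over the graph structure data."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats, adjacency):
        # node_feats: (n_nodes, in_dim) spliced character features.
        # adjacency: (n_nodes, n_nodes) boolean matrix including self-loops,
        # so each node attends to itself (autocorrelation term) as well as
        # to its neighbors (cross-correlation terms).
        h = self.proj(node_feats)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)
        scores = scores.masked_fill(~adjacency, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)
        # New node features: attention-weighted sum over self and neighbors.
        return alpha @ h
```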
In a possible implementation manner, the grammar enhancement processing module 204 is further configured to: after the grammar enhancement feature data of the characters is obtained, perform secondary grammar enhancement processing on the grammatical relations of the characters through a feedforward neural network, based on the grammar enhancement feature data of the characters, so as to obtain secondary grammar enhancement feature data of the characters.
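One simple realization of this second enhancement pass is a position-wise feedforward network applied to each character's enhanced features, sketched below; the residual connection is our assumption and is not stated in the disclosure.

```python
import torch.nn as nn

class SecondaryEnhancer(nn.Module):
    """Hypothetical feedforward pass over grammar-enhanced features."""

    def __init__(self, dim=256, hidden_dim=512):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, enhanced_feats):
        # enhanced_feats: (n_chars, dim) output of the graph attention layer.
        return enhanced_feats + self.ffn(enhanced_feats)
```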
In a possible implementation manner, the text processing module 205 is specifically configured to: perform prosodic boundary prediction on the characters through a first conditional random field based on the grammar enhancement feature data of the characters, to obtain prosodic boundary prediction results of the characters; and perform polyphonic disambiguation on the characters through a second conditional random field based on the grammar enhancement feature data of the characters, to obtain polyphonic disambiguation results of the characters.
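The two conditional random fields can be viewed as two decoding heads over the shared grammar enhancement features, one tagging prosodic boundaries and one tagging pronunciations. The sketch below uses the third-party pytorch-crf package; the tag inventory sizes and all names are illustrative assumptions.

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class TextProcessingHeads(nn.Module):
    """Hypothetical prediction heads: prosody CRF plus polyphone CRF."""

    def __init__(self, dim=256, n_boundary_tags=4, n_pinyin_tags=1500):
        super().__init__()
        self.boundary_emit = nn.Linear(dim, n_boundary_tags)
        self.pinyin_emit = nn.Linear(dim, n_pinyin_tags)
        self.boundary_crf = CRF(n_boundary_tags, batch_first=True)  # first CRF
        self.pinyin_crf = CRF(n_pinyin_tags, batch_first=True)      # second CRF

    def decode(self, char_feats):
        # char_feats: (batch, n_chars, dim) grammar enhancement feature data.
        boundaries = self.boundary_crf.decode(self.boundary_emit(char_feats))
        pinyins = self.pinyin_crf.decode(self.pinyin_emit(char_feats))
        return boundaries, pinyins
```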
EXAMPLE III
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform a text processing method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform a text processing method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to carry out a text processing method according to an embodiment of the present disclosure.
Referring to fig. 3, a block diagram of the structure of an electronic device 800 will now be described. The electronic device 800, which may be a server or a client of the present disclosure, is an example of a hardware device to which aspects of the present disclosure may be applied. The term electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 3, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, an output unit 807, a storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the electronic device 800; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 808 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the methods and processes described above. For example, in some embodiments, the text processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 may be configured to perform the text processing method in any other suitable manner (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (14)

1. A method of text processing, the method comprising:
generating a grammar tree of a text to be processed through a grammar tool;
converting the grammar tree of the text to be processed to obtain a grammar graph of the text to be processed;
based on the grammar graph of the text to be processed, coding grammatical relations of characters in the text to be processed so as to obtain grammatical relation characteristic data of the characters in the text to be processed;
based on the syntactic relation feature data of the characters and the obtained semantic feature data of the characters, carrying out syntactic enhancement processing on the syntactic relation of the characters to obtain syntactic enhancement feature data of the characters;
and performing prosodic boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters to obtain prosodic boundary prediction results and polyphonic disambiguation results of the characters.
2. The text processing method of claim 1, wherein, when generating a grammar tree of the text to be processed by a grammar tool, the method further comprises:
storing the part-of-speech type of each node in the grammar tree by using a part-of-speech type and part-of-speech type identifier scheme;
and storing the characters in the text to be processed by using a character and character identifier scheme.
3. The text processing method of claim 1, wherein the grammar graph comprises a forward grammar graph,
the converting the grammar tree of the text to be processed to obtain the grammar graph of the text to be processed includes:
taking characters in the text to be processed as nodes of the forward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located;
and if a non-adjacent character of the current character in the text to be processed is the first character of a leaf node in the grammar tree, connecting the node where the current character is located with the node where the non-adjacent character is located in the forward direction, so as to obtain the forward grammar graph of the text to be processed.
4. The text processing method of claim 3, wherein the grammar graph further includes a backward grammar graph,
the converting the grammar tree of the text to be processed to obtain the grammar graph of the text to be processed includes:
taking characters in the text to be processed as nodes of the backward grammar graph, and bidirectionally connecting the nodes where adjacent characters in the text to be processed are located;
and if a non-adjacent character of the current character in the text to be processed is the first character of a leaf node in the grammar tree, connecting the node where the current character is located with the node where the non-adjacent character is located in the backward direction, so as to obtain the backward grammar graph of the text to be processed.
5. The text processing method of claim 1, wherein the grammar graph includes a forward grammar graph and a backward grammar graph,
the coding grammatical relations of characters in the text to be processed based on the grammar graph of the text to be processed so as to obtain grammatical relation characteristic data of the characters in the text to be processed includes:
respectively carrying out graph representation on the forward grammar graph and the backward grammar graph of the text to be processed so as to obtain a forward grammar representation sequence of the forward grammar graph of the text to be processed and a backward grammar representation sequence of the backward grammar graph of the text to be processed;
and coding the grammatical relation of the characters in the text to be processed based on the forward grammatical representation data of the characters in the text to be processed, which is included in the forward grammatical representation sequence, and the backward grammatical representation data of the characters in the text to be processed, which is included in the backward grammatical representation sequence, through a grammatical relation coding model so as to obtain grammatical relation characteristic data of the characters in the text to be processed.
6. The text processing method of claim 5, wherein the grammatical relation coding model comprises a gated bidirectional recursion unit,
the coding, by the grammatical relation coding model, the grammatical relations of the characters in the text to be processed based on the forward grammatical representation data of the characters in the text to be processed included in the forward grammatical representation sequence and the backward grammatical representation data of the characters in the text to be processed included in the backward grammatical representation sequence includes:
coding, by a forward recursion unit in the gated bidirectional recursion unit, forward grammatical relations of the characters in the text to be processed based on the forward grammatical representation data of the characters included in the forward grammatical representation sequence, to obtain forward grammatical relation feature data of the characters in the text to be processed;
coding, by a backward recursion unit in the gated bidirectional recursion unit, backward grammatical relations of the characters in the text to be processed based on the backward grammatical representation data of the characters included in the backward grammatical representation sequence, to obtain backward grammatical relation feature data of the characters in the text to be processed;
and determining grammatical relation characteristic data of the characters in the text to be processed based on the forward grammatical relation characteristic data and the backward grammatical relation characteristic data of the characters in the text to be processed.
7. The text processing method according to claim 1, wherein the performing syntax enhancement processing on the syntactic relation of the character based on the syntactic relation characteristic data of the character and the obtained semantic characteristic data of the character to obtain the syntactic enhancement characteristic data of the character comprises:
splicing the syntactic relation characteristic data of the characters and the semantic characteristic data of the characters to obtain spliced characteristic data of the characters;
constructing graph structure data corresponding to the text to be processed based on the spliced characteristic data of the characters;
and carrying out grammar enhancement processing on the grammar relationship of the characters based on graph structure data corresponding to the text to be processed through a graph attention network so as to obtain grammar enhancement feature data of the characters.
8. The text processing method according to claim 7, wherein the constructing graph structure data corresponding to the text to be processed based on the spliced characteristic data of the characters comprises:
taking the characters in the text to be processed as the nodes of the graph structure data and the positional relations of the characters as its edges, and taking the spliced characteristic data of the characters as the characteristic data of the nodes of the graph structure data.
9. The text processing method according to claim 7, wherein the performing, through a graph attention network, grammar enhancement processing on the grammatical relation of the characters based on graph structure data corresponding to the text to be processed to obtain grammar enhancement feature data of the characters comprises:
determining, by the graph attention network, an attention autocorrelation coefficient of a current node, and attention cross-correlation coefficients between the current node and its neighboring nodes, based on feature data of the current node and feature data of the neighboring nodes of the current node in the graph structure data;
determining, by the graph attention network, new feature data of the current node based on the feature data of the current node, the feature data of its neighboring nodes, the attention autocorrelation coefficient, and the attention cross-correlation coefficients;
and determining the new feature data of the current node as the grammar enhancement feature data of the character corresponding to the current node.
10. The text processing method of claim 7, wherein after obtaining the grammatical enhancement feature data for the character, the method further comprises:
and performing secondary grammar enhancement processing on the grammar relationship of the characters based on the grammar enhancement feature data of the characters through a feedforward neural network to obtain the secondary grammar enhancement feature data of the characters.
11. The text processing method of claim 1, wherein the performing prosodic boundary prediction and polyphonic disambiguation on the character based on the grammar enhancement feature data for the character to obtain prosodic boundary prediction results and polyphonic disambiguation results for the character comprises:
performing prosodic boundary prediction on the character based on grammar enhancement feature data of the character through a first conditional random field to obtain a prosodic boundary prediction result of the character;
and carrying out polyphonic disambiguation on the character based on the grammar enhancement feature data of the character through a second conditional random field to obtain a polyphonic disambiguation result of the character.
12. A text processing apparatus, the apparatus comprising:
the generating module is used for generating a grammar tree of the text to be processed through a grammar tool;
the conversion module is used for converting the grammar tree of the text to be processed to obtain a grammar graph of the text to be processed;
the grammar relation coding module is used for coding grammar relations of characters in the text to be processed based on a grammar graph of the text to be processed so as to obtain grammar relation feature data of the characters in the text to be processed;
the grammar enhancement processing module is used for carrying out grammar enhancement processing on the grammar relation of the characters based on the grammar relation characteristic data of the characters and the obtained semantic characteristic data of the characters so as to obtain the grammar enhancement characteristic data of the characters;
and the text processing module is used for carrying out prosodic boundary prediction and polyphonic disambiguation on the characters based on the grammar enhancement feature data of the characters so as to obtain prosodic boundary prediction results and polyphonic disambiguation results of the characters.
13. An electronic device, comprising:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-11.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-11.
CN202110742177.9A 2021-07-01 2021-07-01 Text processing method and device, electronic equipment and storage medium Active CN113191140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110742177.9A CN113191140B (en) 2021-07-01 2021-07-01 Text processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110742177.9A CN113191140B (en) 2021-07-01 2021-07-01 Text processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113191140A true CN113191140A (en) 2021-07-30
CN113191140B CN113191140B (en) 2021-10-15

Family

ID=76976829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110742177.9A Active CN113191140B (en) 2021-07-01 2021-07-01 Text processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113191140B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333763A (en) * 2022-03-16 2022-04-12 广东电网有限责任公司佛山供电局 Stress-based voice synthesis method and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030163477A1 (en) * 2002-02-25 2003-08-28 Visharam Mohammed Zubair Method and apparatus for supporting advanced coding formats in media files
CN108287820A (en) * 2018-01-12 2018-07-17 北京神州泰岳软件股份有限公司 A kind of generation method and device of text representation
CN111783475A (en) * 2020-07-28 2020-10-16 北京深睿博联科技有限责任公司 Semantic visual positioning method and device based on phrase relation propagation
CN111951779A (en) * 2020-08-19 2020-11-17 广州华多网络科技有限公司 Front-end processing method for speech synthesis and related equipment

Also Published As

Publication number Publication date
CN113191140B (en) 2021-10-15

Similar Documents

Publication Publication Date Title
US20220292269A1 (en) Method and apparatus for acquiring pre-trained model
CN110782870B (en) Speech synthesis method, device, electronic equipment and storage medium
JP6776448B2 (en) Implicit bridging of machine learning tasks
KR102496817B1 (en) Method and apparatus for training model, method and apparatus for synthesizing speech, device and storage medium
JP2021168124A (en) Entity linking method, device, electronic device, storage medium, and computer program
JP7108675B2 (en) Semantic matching method, device, electronic device, storage medium and computer program
CN113205817B (en) Speech semantic recognition method, system, device and medium
KR102472708B1 (en) Pre-training method for emotion analysis model, apparatus and electronic device
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
WO2021155662A1 (en) Text information processing method and apparatus, computer device, and readable storage medium
EP4109324A2 (en) Method and apparatus for identifying noise samples, electronic device, and storage medium
JP2021106016A (en) Dialog generation method, device, electronic equipment, and medium
CN111192568A (en) Speech synthesis method and speech synthesis device
CN115309877B (en) Dialogue generation method, dialogue model training method and device
JP2023012493A (en) Language model pre-training method, apparatus, device, and storage medium
US20220375453A1 (en) Method and apparatus for speech synthesis, and storage medium
JP7337979B2 (en) Model training method and apparatus, text prediction method and apparatus, electronic device, computer readable storage medium, and computer program
CN112528654A (en) Natural language processing method and device and electronic equipment
CN115687565A (en) Text generation method and device
CN113191140B (en) Text processing method and device, electronic equipment and storage medium
Galley et al. Hybrid natural language generation for spoken dialogue systems
CN115169370B (en) Corpus data enhancement method and device, computer equipment and medium
US20220179889A1 (en) Method for generating query statement, electronic device and storage medium
CN114613351A (en) Rhythm prediction method, device, readable medium and electronic equipment
CN114373445B (en) Voice generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant