CN117131876A - Text processing method, device, computer equipment and storage medium - Google Patents

Text processing method, device, computer equipment and storage medium

Info

Publication number
CN117131876A
Authority
CN
China
Prior art keywords
text
features
node
knowledge graph
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311141812.3A
Other languages
Chinese (zh)
Inventor
王笑
刘智
翟毅腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311141812.3A
Publication of CN117131876A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)

Abstract

The application relates to a text processing method, apparatus, computer device and storage medium, in which a first text is input into a trained semantic learning model to obtain semantic information. Training the semantic learning model includes: obtaining target entities in a second text and a first knowledge graph, determining the corresponding nodes in the first knowledge graph according to the target entities, and generating a second knowledge graph; obtaining instance features, including text features and graph features, according to the second text and the second knowledge graph; predicting partial vectors of the instance features according to a first self-supervision task to obtain a first loss function; predicting the connection relations of the second knowledge graph in the instance features according to a second self-supervision task to obtain a second loss function; and adjusting the parameters of the semantic learning model according to the first loss function and the second loss function until the model converges. The semantic learning model trained on the instance features and the two self-supervision tasks generalizes well, and the semantic information obtained by processing text with this model is highly accurate.

Description

Text processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text processing method, a text processing device, a computer device, and a storage medium.
Background
When the scenario behind the text data spans multiple fields, and those fields are highly interrelated and complex, conventional techniques struggle to understand and analyze every element of the scenario comprehensively and accurately, which hurts both the efficiency and the precision of text processing tasks. For example, with the vigorous development of the green industry in recent years, the green industry involves numerous financial products and services, including bonds, stocks, funds and insurance. The green industry scenario therefore spans multiple fields and produces many kinds of text data that are voluminous and semantically complex, and making decisions and assessing risk on the basis of such text data is a challenging task.
Conventional techniques rely on deep learning and natural language processing: a large number of texts are used for training, key words are extracted from the texts, and a semantic learning model is built so as to analyze and judge the corresponding scenario intelligently. However, a scenario contains entities of different dimensions, perspectives and granularities, and there are relations among these entities that go beyond sequential text semantics. Existing mainstream semantic learning methods cannot model such entity-relation information, so the accuracy of their text semantic analysis results is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a text processing method, apparatus, computer device, and storage medium capable of improving accuracy of text semantic analysis results.
In a first aspect, the present application provides a text processing method, the method comprising:
inputting the first text into a trained semantic learning model to obtain semantic information; wherein training the semantic learning model comprises:
acquiring a target entity and a first knowledge graph in a second text, determining a node corresponding to the target entity in the first knowledge graph, and generating a second knowledge graph according to the corresponding node;
obtaining example features according to the second text and the second knowledge graph, wherein the example features comprise text features and graph features;
predicting partial vectors in the example features according to a first self-supervision task to obtain a first loss function;
predicting the connection relation of the second knowledge graph in the example characteristics according to a second self-supervision task to obtain a second loss function;
and adjusting parameters of the semantic learning model according to the first loss function and the second loss function until the semantic learning model reaches a convergence condition.
In one embodiment, obtaining the example feature according to the second text and the second knowledge-graph includes:
vectorizing the second knowledge graph and the second text to obtain an instance;
adding noise elements into the text vector of the instance, and adding interaction nodes into the second knowledge-graph vector of the instance;
and extracting text features and graph features of the instance, and fusing the text features and the graph features to obtain instance features.
In one embodiment, extracting text features and graph features of the instance, and fusing the text features and the graph features to obtain instance features, includes:
generating a word symbol according to the text vector of the instance;
generating node characterization of each node in the second knowledge-graph according to the second knowledge-graph vector of the example;
and carrying out fusion coding on the word symbol and the node representation to obtain the example feature.
In one embodiment, generating the second knowledge-graph includes:
acquiring a corresponding first node and a second node associated with the first node from the first knowledge graph according to the target entity in the second text;
and respectively connecting the interaction node with the first node, and generating the second knowledge graph according to the interaction node, the first node and the second node.
In one embodiment, predicting, according to a first self-supervision task, a partial vector in the example feature, to obtain a first loss function, including:
masking a portion of the vectors in the example feature, predicting a value of the portion of the vectors;
the first loss function is derived from the prediction of the partial vector.
In one embodiment, predicting, according to a second self-supervision task, a connection relationship of a second knowledge graph in the example feature, to obtain a second loss function, including:
mapping the nodes of the second knowledge graph in the example features and the node relation to obtain a mapping vector;
constructing a scoring function according to the mapping vector;
constructing a positive triplet and a negative triplet based on the mapping vector and the scoring function;
and predicting the connection relation of a second knowledge graph in the example characteristic according to the positive triplet and the negative triplet, and obtaining the second loss function.
In one embodiment, obtaining the second text includes:
acquiring a file to be processed, and extracting a text in the file to be processed;
modifying the text according to a cleaning rule;
classifying the modified text according to the category and the theme of the file to be processed to obtain a plurality of corpora;
and obtaining the second text according to the corpus.
In one embodiment, inputting the first text into the trained semantic learning model to obtain semantic information includes:
generating a third knowledge graph according to the first text and the first knowledge graph;
the first text and the first knowledge graph are coded in a combined mode, and a word symbol of the first text and vector characterization of each node in the third knowledge graph are obtained;
according to the word symbol of the first text, carrying out pooling operation on the vector characterization of each node;
and fusing the word symbol of the noise element in the first text, the vector representation of the interaction node in the knowledge graph and the vector representation after pooling.
In a second aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the text processing method according to the first aspect.
In a third aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the text processing method described in the first aspect.
In the text processing method, apparatus, computer device and computer-readable storage medium, a first text is input into a trained semantic learning model to obtain semantic information. Training the semantic learning model includes: obtaining target entities in a second text and a first knowledge graph, determining the corresponding nodes in the first knowledge graph according to the target entities, and generating a second knowledge graph according to the corresponding nodes; obtaining instance features, including text features and graph features, according to the second text and the second knowledge graph; predicting partial vectors of the instance features according to the first self-supervision task to obtain a first loss function; predicting the connection relations of the second knowledge graph in the instance features according to the second self-supervision task to obtain a second loss function; and adjusting the parameters of the semantic learning model according to the first loss function and the second loss function until the semantic learning model reaches the convergence condition. Because the instance features are derived from two modalities of data and the model is trained with two self-supervision tasks that realize bidirectional interaction between those modalities, the trained semantic learning model generalizes well, the semantic information obtained by processing text with it is highly accurate, and the problem of low accuracy in text semantic analysis results is solved.
Drawings
FIG. 1 is a flow diagram of a text processing method in one embodiment;
FIG. 2 is a flow diagram of training a semantic learning model in one embodiment;
FIG. 3 is a schematic diagram of the training process of a green finance scenario semantic learning model in one embodiment;
FIG. 4 is a schematic diagram of a knowledge graph of the green finance domain in one embodiment;
FIG. 5 is a schematic diagram of a green finance scenario text processing method in one embodiment;
FIG. 6 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in FIG. 1, a text processing method is provided. The method is described here as applied to a terminal; it is understood that it may also be applied to a server, or to a system including a terminal and a server and implemented through interaction between them. In this embodiment, the method includes: inputting the first text into the trained semantic learning model to obtain semantic information. The first text is a natural-language text, the semantic learning model analyzes and learns from the first text, and the semantic information includes the prediction results output for the first text, such as text classification results or multiple-choice question-answering results. Optionally, when the application scenario of the semantic learning model is the green finance field, the semantic information may be a judgment of whether a company's business qualifies as green finance. FIG. 2 is a flowchart of training the semantic learning model in this embodiment, which includes the following steps.
Step S201, a target entity and a first knowledge graph in a second text are obtained, a node corresponding to the target entity is determined in the first knowledge graph, and a second knowledge graph is generated according to the corresponding node.
The first knowledge graph is generated from standard data of the field to which the second text belongs. The second text is unstructured text data used to train the semantic learning model, and it is generated from the large body of texts, other than the standard data, in the field corresponding to the first text.
Illustratively, the semantic learning model is used to process green finance-related text. Unstructured text data is extracted from books, articles, announcements, news and documents related to the green finance field to obtain the second text. Industry, financial and national standard data of the green finance field, such as enterprise business-registration data, announcement data and the industry classification standard of the Green Industry Guidance Catalogue (2019), are obtained, and a knowledge graph about green finance is constructed from these data, yielding the first knowledge graph.
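As a rough illustration of this step, the sketch below builds a small knowledge graph from (head, relation, tail) triples using networkx. The triples, entity names and relation names are invented placeholders, not actual catalogue entries; a real first knowledge graph would be built from the normalized standard data described above.

```python
# Illustrative only: build a first knowledge graph from standard-data triples.
# The triples, entity names and relation names here are hypothetical placeholders.
import networkx as nx

triples = [
    ("Company A", "belongs_to_industry", "Solar Power Generation"),
    ("Solar Power Generation", "listed_in", "Green Industry Guidance Catalogue (2019)"),
    ("Company A", "issued", "Green Bond X"),
]

def build_knowledge_graph(triples):
    """Store each (head, relation, tail) triple as a labeled directed edge."""
    graph = nx.MultiDiGraph()
    for head, relation, tail in triples:
        graph.add_edge(head, tail, relation=relation)
    return graph

first_kg = build_knowledge_graph(triples)
print(first_kg.number_of_nodes(), first_kg.number_of_edges())
```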
The target entities are obtained from the words that make up the second text; optionally, the words of the second text are extracted to obtain the target entities corresponding to those words. Nodes associated with the target entities are selected from the first knowledge graph, and the target entities are connected with these nodes of the first knowledge graph to obtain the second knowledge graph.
Step S202, instance features are obtained according to the second text and the second knowledge graph, wherein the instance features include text features and graph features.
The instance is used to train the semantic learning model; the text features are the text-sequence semantics of the instance, and the graph features are the knowledge-graph structure of the instance.
The second text and the second knowledge graph are combined to obtain an instance. By combining the text features and the graph features, the text features supplement the semantic information of the second knowledge graph and the graph features supplement the relation structure of the target entities in the second text, realizing cross-modal interaction between the two different modalities of text and graph structure and yielding the instance features.
Step S203, partial vectors of the instance features are predicted according to the first self-supervision task to obtain a first loss function. By predicting partial vectors of the instance features, the first self-supervision task improves the trained semantic learning model's ability to understand the first text.
Step S204, the connection relations of the second knowledge graph in the instance features are predicted according to the second self-supervision task to obtain a second loss function. By predicting these connection relations, the second self-supervision task completes the missing nodes and node connections of the second knowledge graph.
Step S205, the parameters of the semantic learning model are adjusted according to the first loss function and the second loss function until the semantic learning model reaches the convergence condition. The semantic learning model is trained with the two self-supervision tasks, and the model is adjusted by combining the first and second loss functions. Taking the self-supervision tasks as objectives realizes bidirectional interaction between the two cross-modal sources of information in the instance, namely the knowledge-graph structure and the text-sequence semantics, and improves the generalization of the semantic learning model.
In this text processing method, the input first text is processed by the trained semantic learning model. The training data input into the semantic learning model are instance features obtained by fusing two modalities of data, text and graph structure. Because the instance features are obtained through interaction between the two modalities, the relationships among the entities they contain are more accurate, providing better data support for the semantic learning model. The training objectives of the semantic learning model are two self-supervision tasks, one for improving text-understanding ability and one for completing the connection structure of the graph. Training the semantic learning model with the combined loss functions generated by the two self-supervision tasks promotes bidirectional information flow between the text and the knowledge graph, so the trained model can reason jointly and accurately over both. As a result, the accuracy of the semantic information output by the trained semantic learning model is high, which solves the problem of low accuracy in text semantic analysis results.
In one embodiment, generating the second knowledge-graph includes: according to the target entity in the second text, a corresponding first node and a second node associated with the first node are obtained from the first knowledge graph; and generating a second knowledge graph according to the first node, the second node and the edges among the nodes.
The first node is a node directly corresponding to the target entity in the first knowledge graph, and the second node associated with the first node may be a second-order neighborhood node of the first node.
Illustratively, the second text is a text segment W. The target entities in the second text are linked to the corresponding entity nodes in the first knowledge graph to obtain the first nodes V_el, and the second-order neighborhood (bridge) nodes of these initial nodes are retrieved from the first knowledge graph to obtain the second nodes. The first nodes and the second nodes together form the total node set V of the second knowledge-graph structure, the edges E connecting these nodes are extracted from the first knowledge graph, and the second knowledge graph G = (V, E) is finally obtained.
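A minimal sketch of this subgraph extraction is given below, reusing the first_kg graph from the earlier sketch. Linking target entities by exact node-name match and expanding by a two-hop neighborhood are simplifying assumptions about the procedure; a real system would use entity linking.

```python
# Sketch: build the second knowledge graph G = (V, E) from the target entities.
# Exact-name matching and the two-hop expansion are assumptions for this sketch.
import networkx as nx

def extract_second_kg(first_kg: nx.MultiDiGraph, target_entities):
    first_nodes = {e for e in target_entities if e in first_kg}   # first nodes V_el
    undirected = first_kg.to_undirected(as_view=True)
    node_set = set(first_nodes)
    for node in first_nodes:
        for hop1 in undirected.neighbors(node):
            node_set.add(hop1)
            node_set.update(undirected.neighbors(hop1))           # second-order (bridge) nodes
    # total node set V and the edges E it induces in the first knowledge graph
    return first_kg.subgraph(node_set).copy()

second_kg = extract_second_kg(first_kg, ["Company A", "Green Bond X"])  # reuses first_kg from the earlier sketch
```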
In one embodiment, obtaining the instance features according to the second text and the second knowledge graph includes: vectorizing the second knowledge graph and the second text to obtain an instance; adding a noise element into the text vector of the instance, and adding an interaction node into the second knowledge-graph vector of the instance; and extracting the text features and graph features of the instance, and fusing the text features and the graph features to obtain the instance features.
Vectorizing the second text W and the second knowledge graph G yields the instance x = (W, G). The noise element w_int serves as the information pooling point of the instance's text vector and is added to the instance text vector w = {w_1, ..., w_i}. Because the text features extracted from the instance after w_int is added contain the noise element, the trained semantic learning model acquires the ability to resist data noise and interference, which improves the accuracy of text processing.
The interaction node v_int is the information pooling point of the instance's second knowledge-graph vector. It is added to the instance's second knowledge-graph vector G = (V, E) and connected to every node of the second knowledge graph through a new relation type r_el. From the instance augmented with v_int, the graph features are extracted, including the node features and the connection relations between nodes in the second knowledge graph.
After the noise element is added to the text vector and the interaction node is added to the second knowledge-graph vector, the instance consists of the text sequence information {w_int, w_1, ..., w_i} and the graph structure information {v_int, v_1, ..., v_j}, where w denotes the target-entity vectors of the second text in the instance and v denotes the second knowledge-graph node vectors in the instance.
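The fragment below illustrates one plausible way to assemble such an instance: a reserved noise-element id is prepended to the text ids, and an interaction node is connected to every graph node through the new relation type. The id conventions (NOISE_ID, INTERACTION_NODE, R_EL) are assumptions made for the sketch, not values from the patent.

```python
# Hypothetical instance assembly: the ids and the reserved-slot convention are assumptions.
import torch

NOISE_ID = 1          # assumed vocabulary id for the noise element w_int
INTERACTION_NODE = 0  # assumed node id reserved for the interaction node v_int
R_EL = 0              # assumed relation id for the new relation type r_el

def build_instance(text_ids, node_ids, edges):
    """Return ({w_int, w_1, ..., w_i}, {v_int, v_1, ..., v_j}, edge list)."""
    text_ids = [NOISE_ID] + list(text_ids)
    shifted = [n + 1 for n in node_ids]                            # shift ids so 0 is free for v_int
    new_edges = [(h + 1, r, t + 1) for h, r, t in edges]
    new_edges += [(INTERACTION_NODE, R_EL, v) for v in shifted]    # link v_int to every node via r_el
    return torch.tensor(text_ids), torch.tensor([INTERACTION_NODE] + shifted), new_edges

text, nodes, edges = build_instance([101, 102, 103], [7, 8], [(7, 2, 8)])
```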
Optionally, obtaining the instance features includes exchanging and jointly encoding the text sequence information and the graph structure information in the instance with a cross-modal encoder f_encoder:
(H_int, H_1, ..., H_i), (V_int, V_1, ..., V_j) = f_encoder({w_int, w_1, ..., w_i}, {v_int, v_1, ..., v_j})
The cross-modal encoder f_encoder exchanges the two modalities of data: the sequence information supplements the semantics of the graph structure, and the graph structure supplements the entity-relationship structure of the sequence information, yielding the instance features (H_int, H_1, ..., H_i), (V_int, V_1, ..., V_j). Because the instance features are obtained by interaction and joint modeling of two different modalities of information, they contain entity relationships beyond sequential text semantics, so they can be applied to different scenario tasks; a semantic learning model trained on such instance features generalizes well and can learn and predict text in a new scenario quickly.
Extracting the text features and graph features of the instance and fusing them to obtain the instance features includes: generating word symbols according to the text vector of the instance; generating node representations of each node in the second knowledge graph according to the second knowledge-graph vector of the instance; and fusion-encoding the word symbols and node representations to obtain the instance features.
Text features are modeled with a Transformer, graph features are obtained with a graph neural network, and the two are then combined. Obtaining the text features includes converting the text into initial token representations with an N-layer Transformer:
(h_int^0, h_1^0, ..., h_i^0) = LM({w_int, w_1, ..., w_i})
where LM is the language model, h_int^0 is the initial token representation of the noise element w_int, h_i^0 is the initial token representation of the target entity w_i in the instance, and i is the index of the target entity.
Obtaining the graph features includes converting the knowledge-graph nodes of the input instance into initial node representations with a node embedding layer (node_embedding):
(e_int^0, e_1^0, ..., e_j^0) = node_embedding({v_int, v_1, ..., v_j})
where e_int^0 is the initial node representation of the interaction node v_int, e_j^0 is the initial node representation of the j-th second-knowledge-graph node in the instance, and j is the index of the node.
Fusing the text features and graph features to obtain the instance features includes: jointly encoding the token representations and node representations through an M-layer fusion module to obtain the instance features (H_int, H_1, ..., H_i), (V_int, V_1, ..., V_j).
Optionally, the noise element and the interaction node serve as the two interfaces for modal interaction. The word symbol generated from the noise element and the node representation generated from the interaction node are obtained and fused to produce the instance features. Word symbols are generated by the Transformer, node representations by a GNN (Graph Neural Network), and the word symbol corresponding to the noise element and the node representation corresponding to the interaction node are fused through an L-layer fusion module built from an MLP (Multilayer Perceptron):
[h_int^l, v_int^l] = MLP([t_int^l, g_int^l])
where h_int^l is the representation of the noise element w_int at fusion layer l, v_int^l is the representation of the interaction node v_int at fusion layer l, t_int^l is the token representation of the noise element produced by Transformer layer l, g_int^l is the representation of the interaction node in the knowledge graph produced by GNN layer l, and [h_int^l, v_int^l] is the representation of the noise element and the graph interaction node after fusion by the multi-layer perceptron.
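A compact PyTorch sketch of one such fusion layer is shown below. It is our reading of the described structure, not the patented implementation; in particular the plain linear layer stands in for a real GNN layer, and the dimensions are arbitrary.

```python
# Sketch of a single fusion layer: a Transformer layer updates the word symbols,
# a stand-in "GNN" layer updates the node representations, and an MLP exchanges
# information between the noise-element slot and the interaction-node slot.
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.graph_layer = nn.Linear(dim, dim)   # placeholder for an actual GNN layer
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 2 * dim))

    def forward(self, tokens, nodes):
        tokens = self.text_layer(tokens)              # (B, i+1, dim); position 0 is w_int
        nodes = torch.relu(self.graph_layer(nodes))   # (B, j+1, dim); position 0 is v_int
        fused = self.mlp(torch.cat([tokens[:, 0], nodes[:, 0]], dim=-1))
        h_int, v_int = fused.chunk(2, dim=-1)         # exchanged pool-point representations
        tokens = torch.cat([h_int.unsqueeze(1), tokens[:, 1:]], dim=1)
        nodes = torch.cat([v_int.unsqueeze(1), nodes[:, 1:]], dim=1)
        return tokens, nodes

layer = FusionLayer()
toks, nds = layer(torch.randn(2, 12, 256), torch.randn(2, 6, 256))
```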
In one embodiment, predicting a portion of vectors in the instance feature according to a first self-supervision task to obtain a first loss function includes: masking a portion of the vectors in the instance feature, predicting a value of the portion of the vectors; a first loss function is derived from the prediction of the partial vector.
The first self-supervision task is a natural language processing task: part of the instance content is masked, the masked partial vectors are marked, and their values are predicted from the text adjacent to the masked positions. Training on this task teaches the semantic learning model to predict the true values of the masked partial vectors, which helps the model understand and generate text.
Illustratively, the first self-supervision task is masked language modeling (Masked Language Modeling), a natural language processing task in which parts of the instance are masked and labeled [MASK], and the true values are predicted from the vectors at the corresponding masked positions so that the model can learn the relationships among entities.
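For concreteness, a minimal sketch of the masking and loss computation is given below. The mask probability and the [MASK] id are assumptions; a real tokenizer's special-token ids would be used instead.

```python
# Sketch of the first self-supervision task (masked language modeling).
import torch
import torch.nn.functional as F

MASK_ID = 103        # assumed [MASK] vocabulary id
IGNORE_INDEX = -100  # unmasked positions contribute no loss

def mask_tokens(input_ids, mask_prob=0.15):
    """Randomly mask positions and return (masked inputs, labels for the loss)."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = IGNORE_INDEX
    corrupted = input_ids.clone()
    corrupted[masked] = MASK_ID
    return corrupted, labels

def mlm_loss(logits, labels):
    # logits: (B, seq_len, vocab); predict the true value of each masked vector
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                           ignore_index=IGNORE_INDEX)
```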
According to a second self-supervision task, predicting the connection relation of a second knowledge graph in the example features to obtain a second loss function, wherein the method comprises the following steps: mapping the nodes of the second knowledge graph in the example features and the node relation to obtain a mapping vector; constructing a scoring function according to the mapping vector; constructing a positive triplet and a negative triplet based on the mapping vector and the scoring function; and predicting the connection relation of the second knowledge graph in the example characteristic according to the positive triplet and the negative triplet, and obtaining a second loss function.
Each entity node (h or t) of the knowledge graph and each relation (r) between nodes are mapped to obtain the vectors h, t and r, and a scoring function Φ_r(h, t) is defined; the scoring function converts a prediction result (for example, whether a relation exists between two entity nodes) into a concrete score that measures the quality of the prediction. Positive/negative triplets are obtained by modeling with the scoring function, and the missing links in the knowledge graph are predicted by completing the positive/negative triplets. Optionally, the second self-supervision task is a knowledge-graph link prediction task. Optionally, the scoring function is constructed with the TransE method: Φ_r(h, t) = -||h + r - t||.
The losses of training the semantic learning model with the first self-supervision task and with the second self-supervision task are computed separately, the two loss functions are added to obtain the total training loss of the semantic learning model, and back-propagation and model-parameter optimization are performed according to the total loss to obtain the trained semantic learning model. Optionally, the cross-entropy loss Loss_MLM of the masked-language-modeling task and the loss Loss_LinkPred of the knowledge-graph link-prediction process are calculated separately; Loss_MLM and Loss_LinkPred are added to obtain the total training loss for back-propagation and model-parameter optimization: Loss = Loss_LinkPred + Loss_MLM.
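The scoring function and the combined loss can be sketched as follows. The patent specifies Φ_r(h, t) = -||h + r - t|| and Loss = Loss_LinkPred + Loss_MLM; the margin-ranking form of the link-prediction loss over positive and negative triplets is a common TransE choice assumed here, not stated in the source.

```python
# Sketch: TransE scoring, a margin-based link-prediction loss over positive and
# negative triplets, and the combined training loss.
import torch
import torch.nn.functional as F

def transe_score(h, r, t):
    """phi_r(h, t) = -||h + r - t||; higher means a more plausible triplet."""
    return -torch.norm(h + r - t, p=2, dim=-1)

def link_pred_loss(pos, neg, margin=1.0):
    """pos / neg are (h, r, t) tuples of embedding tensors for positive / negative triplets."""
    pos_score = transe_score(*pos)
    neg_score = transe_score(*neg)
    return F.relu(margin - pos_score + neg_score).mean()

def total_loss(loss_mlm, loss_link_pred):
    return loss_link_pred + loss_mlm   # Loss = Loss_LinkPred + Loss_MLM
```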
In one embodiment, obtaining the second text includes: acquiring a file to be processed and extracting the text in it; modifying the text according to cleaning rules; classifying the modified text according to the category and topic of the file to be processed to obtain a plurality of corpora; and obtaining the second text from the corpora. The cleaning rules include: unifying Chinese and English punctuation marks, deleting garbled characters, and deleting stop words. A plurality of corpora are built from the cleaned texts, and the second text is a corpus entry drawn from one of them. Training the semantic learning model on corpora from the corpus database helps improve the generalization of the model.
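A small cleaning pass consistent with these rules might look like the following. The punctuation map and stop-word list are illustrative placeholders, and a real pipeline would segment the Chinese text (for example with a word segmenter) before removing stop words.

```python
# Illustrative text cleaning: unify punctuation, drop garbled characters, remove stop words.
import re

PUNCT_MAP = {"，": ",", "。": ".", "；": ";", "：": ":", "！": "!", "？": "?"}
STOP_WORDS = ["的", "了", "和"]   # hypothetical stop-word list

def clean_text(text: str) -> str:
    for zh, en in PUNCT_MAP.items():
        text = text.replace(zh, en)
    # keep CJK, ASCII letters/digits, whitespace and common punctuation; drop the rest
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9\s,.;:!?()%\-]", "", text)
    for word in STOP_WORDS:                      # crude stop-word removal for the sketch
        text = text.replace(word, "")
    return text.strip()

print(clean_text("该公司发行了绿色债券，规模约10亿元！"))
```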
In one embodiment, inputting the first text into the trained semantic learning model to obtain semantic information includes: generating a third knowledge graph according to the first text and the first knowledge graph; the first text and the first knowledge graph are jointly coded, and a word symbol of the first text and vector characterization of each node in the third knowledge graph are obtained; according to the word symbol of the first text, carrying out pooling operation on the vector representation of each node; and fusing the word symbol of the noise element in the first text, the vector representation of the interaction node in the knowledge graph and the vector representation after pooling.
The first text is processed by the trained semantic learning model to obtain a vector representation X for executing the downstream task, where X = MLP(H_int, V_int, G). A noise element is added to the first text, and H_int is the word symbol corresponding to that noise element. An interaction node is added to the third knowledge graph, and V_int is the representation corresponding to that interaction node. G is obtained by letting H_int query the nodes {v_j | v_j ∈ {v_1, ..., v_j}} of the third knowledge graph with an attention-based pooling operation; H_int, V_int and G are then fused to obtain the vector representation X.
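A hedged sketch of this downstream readout is given below: H_int attends over the third-knowledge-graph node representations to produce the pooled vector G, and an MLP fuses H_int, V_int and G into X. The dimensions and the scaled-dot-product form of the attention are assumptions for the sketch.

```python
# Sketch of the downstream readout X = MLP(H_int, V_int, G) with attention-based pooling.
import torch
import torch.nn as nn

class DownstreamReadout(nn.Module):
    def __init__(self, dim=256, out_dim=2):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, out_dim))

    def forward(self, h_int, v_int, node_reprs):
        # h_int, v_int: (B, dim); node_reprs: (B, j, dim)
        scores = node_reprs @ h_int.unsqueeze(-1) / node_reprs.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=1)              # H_int queries the graph nodes
        g = (attn * node_reprs).sum(dim=1)               # attention-pooled graph vector G
        return self.fuse(torch.cat([h_int, v_int, g], dim=-1))

readout = DownstreamReadout()
x = readout(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 10, 256))
```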
Optionally, the semantic learning model includes an application module, which fine-tunes the semantic learning model according to the downstream task so that the model can be applied to different downstream tasks.
In one embodiment, a training flow for a green finance scenario semantic learning model is provided, as shown in FIG. 3; the steps include:
Step S301: obtain green finance-related text, build a corpus from it, and clean and preprocess the text. The corpus consists of key words and semantic features of the green finance-related text;
Step S302: construct a knowledge graph about green finance based on the green finance-related data. The green finance-related data include enterprise business-registration data, announcement data, the industry classification standard of the Green Industry Guidance Catalogue (2019), and the like; FIG. 4 is a schematic diagram of the green finance domain knowledge graph;
Step S303: extract a corpus entry from the corpus database, extract the subgraph related to that entry from the knowledge graph, and vectorize the entry and the subgraph to create an input instance for the model;
Step S304: based on the input instance, use a cross-modal encoder combining sequence and graph to obtain the encoded instance features, and pre-train the model with two self-supervised reasoning tasks to build the semantic learning model. The two self-supervised reasoning tasks are masked language modeling and knowledge-graph link prediction.
Step S305: generalize and fine-tune the semantic model. The trained semantic model is applied to a new green finance scenario to realize rapid learning and prediction in the new scenario, for example judging whether a company's business qualifies as green finance.
FIG. 5 is a schematic diagram of the green finance scenario text processing method of this embodiment. As shown in FIG. 5, the text "[start] xxxxxxxx, xxxxx, [mask]" is input into the trained semantic learning model to obtain semantic information. Here "xxxxxxxx, xxxxx" is a natural sentence, [start] and [mask] are marker information for the natural sentence, [start] marks the start position of the natural sentence, and [mask] covers and replaces part of the words in the natural sentence. Specifically, pre-training the semantic learning model includes: inputting the text ([start] xxxxxxx, xxxxxxx, [mask]) into the LM layer (language-model layer), and obtaining the text features output by the LM layer together with the green finance knowledge graph. The text features and the knowledge graph are input into a fusion layer, which exchanges information between the two different modalities of text and graph structure and outputs the instance features. To ensure information flow between the instance features, the instance features are trained according to the two self-supervised reasoning tasks, masked language modeling and knowledge-graph link prediction, to build the semantic learning model. The semantic learning model is fine-tuned according to the downstream task to obtain the trained semantic learning model.
In this embodiment, key words and semantic features are automatically extracted by training on a large volume of financial texts, and a knowledge graph related to green finance is constructed according to industry, financial and national standards. The information of the two different modalities, text and graph structure, is made to interact and is jointly modeled, yielding instance features that can be applied to specific downstream financial scenario tasks. The model is pre-trained according to the instance features and the two self-supervised reasoning tasks, which ensures information flow between the input instances, and is further generalized according to the scenario requirements, realizing intelligent learning and modeling of semantic information in the green finance scenario.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store a semantic learning model. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a text processing method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A method of text processing, the method comprising:
inputting the first text into a trained semantic learning model to obtain semantic information; wherein training the semantic learning model comprises:
acquiring a target entity and a first knowledge graph in a second text, determining a node corresponding to the target entity in the first knowledge graph, and generating a second knowledge graph according to the corresponding node;
obtaining example features according to the second text and the second knowledge graph, wherein the example features comprise text features and graph features;
predicting partial vectors in the example features according to a first self-supervision task to obtain a first loss function;
predicting the connection relation of the second knowledge graph in the example characteristics according to a second self-supervision task to obtain a second loss function;
and adjusting parameters of the semantic learning model according to the first loss function and the second loss function until the semantic learning model reaches a convergence condition.
2. The method of claim 1, wherein deriving instance features from the second text and the second knowledge-graph comprises:
vectorizing the second knowledge graph and the second text to obtain an instance;
adding noise elements into the text vector of the instance, and adding interaction nodes into the second knowledge-graph vector of the instance;
and extracting text features and graph features of the instance, and fusing the text features and the graph features to obtain instance features.
3. The method of claim 2, wherein extracting text features and graph features of the instance, fusing the text features and the graph features, obtaining instance features, comprises:
generating a word symbol according to the text vector of the instance;
generating node characterization of each node in the second knowledge-graph according to the second knowledge-graph vector of the example;
and carrying out fusion coding on the word symbol and the node representation to obtain the example feature.
4. The method of claim 1, wherein generating a second knowledge-graph comprises:
acquiring a corresponding first node and a second node associated with the first node from the first knowledge graph according to the target entity in the second text;
and generating the second knowledge graph according to the first node, the second node and the edges among the nodes.
5. The method of claim 1, wherein predicting the partial vectors in the instance feature according to a first self-supervising task results in a first loss function, comprising:
masking a portion of the vectors in the example feature, predicting a value of the portion of the vectors;
the first loss function is derived from the prediction of the partial vector.
6. The method of claim 1, wherein predicting the connection relationship of the second knowledge-graph in the example feature according to the second self-supervision task, to obtain the second loss function, comprises:
mapping the nodes of the second knowledge graph in the example features and the node relation to obtain a mapping vector;
constructing a scoring function according to the mapping vector;
constructing a positive triplet and a negative triplet based on the mapping vector and the scoring function;
and predicting the connection relation of a second knowledge graph in the example characteristic according to the positive triplet and the negative triplet, and obtaining the second loss function.
7. The method of claim 1, wherein obtaining the second text comprises:
acquiring a file to be processed, and extracting a text in the file to be processed;
modifying the text according to a cleaning rule;
classifying the modified text according to the category and the theme of the file to be processed to obtain a plurality of corpora;
and obtaining the second text according to the corpus.
8. The method of claim 1, wherein inputting the first text into the trained semantic learning model to obtain the semantic information comprises:
generating a third knowledge graph according to the first text and the first knowledge graph;
the first text and the first knowledge graph are coded in a combined mode, and a word symbol of the first text and vector characterization of each node in the third knowledge graph are obtained;
according to the word symbol of the first text, carrying out pooling operation on the vector characterization of each node;
and fusing the word symbol of the noise element in the first text, the vector representation of the interaction node in the knowledge graph and the vector representation after pooling.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 8.
CN202311141812.3A 2023-09-05 2023-09-05 Text processing method, device, computer equipment and storage medium Pending CN117131876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311141812.3A CN117131876A (en) 2023-09-05 2023-09-05 Text processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311141812.3A CN117131876A (en) 2023-09-05 2023-09-05 Text processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117131876A true CN117131876A (en) 2023-11-28

Family

ID=88862696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311141812.3A Pending CN117131876A (en) 2023-09-05 2023-09-05 Text processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117131876A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination