CN111858898A - Text processing method and device based on artificial intelligence and electronic equipment

Publication number: CN111858898A
Authority: CN (China)
Prior art keywords: sentence, word, text, words, processing
Legal status: Pending
Application number: CN202010753509.9A
Other languages: Chinese (zh)
Inventors: 陈玉博, 刘康, 赵军, 曹鹏飞, 闭玮, 刘晓江, 邸欣晨
Current Assignee: Tencent Technology Shenzhen Co Ltd; Institute of Automation of Chinese Academy of Science
Original Assignee: Tencent Technology Shenzhen Co Ltd; Institute of Automation of Chinese Academy of Science
Application filed by Tencent Technology Shenzhen Co Ltd and Institute of Automation of Chinese Academy of Science
Priority to CN202010753509.9A
Publication of CN111858898A

Classifications

    • G06F16/3329: Natural language query formulation or dialogue systems (G06F Electric digital data processing; G06F16/00 Information retrieval; G06F16/33 Querying; G06F16/332 Query formulation)
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G06F40/00 Handling natural language data; G06F40/20 Natural language analysis; G06F40/205 Parsing)
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
    • G06N3/044: Recurrent networks, e.g. Hopfield networks (G06N Computing arrangements based on specific computational models; G06N3/02 Neural networks; G06N3/04 Architecture)
    • G06N3/045: Combinations of networks (G06N3/02 Neural networks; G06N3/04 Architecture)

Abstract

The application provides a text processing method and apparatus, an electronic device, and a computer-readable storage medium based on artificial intelligence. The method includes: performing feature extraction processing on a plurality of words belonging to the same sentence in a text to obtain feature representations of the words as sentence-level information of the sentence; performing feature extraction processing on a plurality of sentences in the text to obtain feature representations of the sentences as text-level information; acquiring, from a knowledge base, setting feature representations of the words belonging to the same sentence as setting information of the sentence; and, for each word in the text, updating the feature representation of the word according to the sentence-level information of the sentence containing the word, the text-level information, and the setting information of that sentence, and performing type prediction processing according to the updated feature representation to obtain the predicted type of the word. The application improves the accuracy of the obtained predicted types and, in turn, the degree of intelligence of question-answering services.

Description

Text processing method and device based on artificial intelligence and electronic equipment
Technical Field
The present application relates to artificial intelligence and natural language processing technologies, and in particular, to a text processing method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) is a body of theories, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. Natural Language Processing (NLP) is an important direction in the field of artificial intelligence; it studies theories and methods that enable effective communication between humans and computers in natural language.
Natural language processing often involves predicting the types of words in a text, and the predicted types can be applied in various scenarios; for example, a knowledge graph can be constructed from them to provide question-answering services such as intelligent customer service and medical consultation robots. In the related art, for each sentence in the text, feature extraction is typically performed on the words of that sentence alone, and the types of the words are predicted from the obtained features. However, because language itself is complex and ambiguous, type prediction under the scheme provided by the related art yields types of low accuracy, which in turn degrades the intelligence of the question-answering service.
Disclosure of Invention
The embodiments of the present application provide a text processing method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium, which can improve the accuracy of word type prediction and thereby the degree of intelligence of question-answering services.
The technical scheme of the embodiment of the application is realized as follows:
An embodiment of the present application provides a text processing method based on artificial intelligence, including:
performing feature extraction processing on a plurality of words belonging to the same sentence in a text to obtain feature representations of the words as sentence-level information of the sentence;
performing feature extraction processing on a plurality of sentences in the text to obtain feature representations of the sentences as text-level information;
acquiring, from a knowledge base, setting feature representations of a plurality of words belonging to the same sentence in the text as setting information of the sentence; and
for each word in the text, updating the feature representation of the word according to the sentence-level information of the sentence containing the word, the text-level information, and the setting information of that sentence, and performing type prediction processing according to the updated feature representation of the word to obtain the predicted type of the word.
An embodiment of the present application provides a text processing apparatus based on artificial intelligence, including:
a first extraction module, configured to perform feature extraction processing on a plurality of words belonging to the same sentence in a text to obtain feature representations of the words as sentence-level information of the sentence;
a second extraction module, configured to perform feature extraction processing on a plurality of sentences in the text to obtain feature representations of the sentences as text-level information;
a third extraction module, configured to acquire, from a knowledge base, setting feature representations of a plurality of words belonging to the same sentence in the text as setting information of the sentence; and
a prediction module, configured to, for each word in the text, update the feature representation of the word according to the sentence-level information of the sentence containing the word, the text-level information, and the setting information of that sentence, and perform type prediction processing according to the updated feature representation of the word to obtain the predicted type of the word.
An embodiment of the present application provides an electronic device, including:
a memory, configured to store executable instructions; and
a processor, configured to implement the artificial-intelligence-based text processing method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial-intelligence-based text processing method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects:
The types of words in the text are predicted by jointly using information from multiple levels of the text, which enriches the amount of information in the feature representations of the text. Consequently, when type prediction processing is performed on the updated feature representations, the accuracy of the obtained predicted types is improved, and accurate, effective question-answering services can be provided. A device running the question-answering service no longer needs repeated trial and error, which reduces the consumption of computing and communication resources and further improves the degree of intelligence of the question-answering service.
Drawings
FIG. 1 is a schematic diagram of an architecture of an event recognition model provided in the related art;
FIG. 2 is an alternative architecture diagram of an artificial intelligence based text processing system provided by an embodiment of the present application;
FIG. 3 is an alternative architecture diagram of a terminal device provided in an embodiment of the present application;
FIG. 4 is an alternative architecture diagram of an artificial intelligence based text processing apparatus according to an embodiment of the present application;
FIG. 5A is a schematic flow chart of an alternative artificial intelligence based text processing method according to an embodiment of the present application;
FIG. 5B is a schematic flow chart of an alternative artificial intelligence based text processing method according to an embodiment of the present application;
FIG. 5C is an alternative flow diagram for performing association operations as provided by embodiments of the present application;
FIG. 5D is a schematic flow chart diagram illustrating an alternative artificial intelligence based text processing method according to an embodiment of the present application;
FIG. 5E is a schematic flow chart of an alternative artificial intelligence based text processing method according to an embodiment of the present application;
FIG. 6 is an alternative schematic diagram of text processing provided by embodiments of the present application;
FIG. 7 is an alternative architecture diagram of a text processing model provided in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first \ second \ third" merely distinguish similar objects and do not denote a particular order; where permitted, the specific order or sequence may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described. In the following description, the term "plurality" means at least two.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Knowledge base: a library storing a setting feature representation associated with each of a plurality of words; the knowledge base may be, for example, a WordNet library or a HowNet library.
2) Attention coding: essentially automatic weighting, i.e., associating two forms of data by means of weights.
3) Word Embedding encoding: mapping words into a vector space to obtain word vectors, which serve as the storage form of words in an algorithm and facilitate subsequent processing by the electronic device (illustrated in the sketch after this list).
4) Memory encoding: for each vector in a sequence of vectors, combining the vector with the vectors before or after it to convert it into a new vector.
5) Type prediction processing: determining, among a plurality of set types, the predicted type corresponding to a word in the text. The set types take different forms in different application scenarios. For example, event detection may involve set types such as birth event, movement event, attack event, and non-event (i.e., not any event); entity recognition may involve set types such as person name, place name, organization name, and non-entity.
6) Sequence Tagging: labeling the words in a text according to a specific tagging scheme; it is applicable to type prediction processing. For example, in the BIO scheme, B denotes the beginning of a word of a certain type, I denotes the inside of a word of a certain type, and O denotes that the word does not belong to any type.
7) Knowledge Graph: a semantic network revealing relationships between entities. A knowledge graph is composed of many pieces of knowledge, and each piece of knowledge can be represented as a Subject-Predicate-Object (SPO) triple (see the sketch below).
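To make terms 3), 6), and 7) concrete, the following minimal Python sketch shows a word-embedding lookup, a BIO-tagged sentence, and an SPO triple; all names and toy values are hypothetical illustrations, not data from the application.

```python
import numpy as np

# 3) Word embedding: map each word to a row of an embedding matrix.
vocab = {"company": 0, "was": 1, "founded": 2, "in": 3, "Beijing": 4}
embedding_matrix = np.random.rand(len(vocab), 8)  # toy 8-dimensional vectors

def embed(word):
    # The word vector is the storage form of the word for subsequent processing.
    return embedding_matrix[vocab[word]]

# 6) Sequence tagging in the BIO scheme: B = beginning, I = inside,
# O = does not belong to any type.
tagged_sentence = [("company", "O"), ("was", "O"), ("founded", "O"),
                   ("in", "O"), ("Beijing", "B-PLACE")]

# 7) One piece of knowledge as a Subject-Predicate-Object (SPO) triple.
spo_triple = ("company", "founded_in", "Beijing")
```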
To predict the types of words in a text, the solutions provided in the related art usually perform, for each sentence in the text, feature extraction on the words of that sentence and predict the types of the words from the obtained features. As an example, fig. 1 shows a schematic diagram of a Dynamic Multi-pooling Convolutional Neural Network (DMCNN) model. In fig. 1, taking event recognition on a sentence comprising words 1 to 9 as an example, word embedding encoding is first applied to each word to obtain a word vector. Then, on the one hand, the word vectors are spliced together; on the other hand, the network structure of the DMCNN model mines the associations among the word vectors to obtain a feature representation of the sentence, to which convolution and dynamic multi-pooling are applied. Finally, the splicing result of the word vectors and the result of the dynamic multi-pooling are spliced, and classification is performed on the spliced result to obtain the predicted type of each word in the sentence. However, owing to the complexity and ambiguity of language itself, predicting word types from sentence-level information alone achieves low accuracy. If a knowledge graph is constructed from the resulting predicted types and question-answering services are provided on top of it, the accuracy of those services is poor: user questions cannot be answered effectively, and a user may still fail to obtain an accurate answer after calling the question-answering service many times.
The embodiments of the present application provide a text processing method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium, which can improve the accuracy of word type prediction and thus provide accurate and effective question-answering services. Exemplary applications of the electronic device provided by the embodiments of the present application are described below. The electronic device may be implemented as various types of terminal devices, such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (e.g., a mobile phone, a personal digital assistant, or a dedicated messaging device), and may also be implemented as a server.
By running the text processing scheme provided by the embodiments of the present application, the electronic device can extract effective information (the predicted types of words) from a text, i.e., its text processing performance is improved, which suits various natural language processing scenarios. For example, words can be selected from the text according to their predicted types to form an SPO triple, and the triple can be added to a knowledge graph to provide an accurate and effective question-answering service: a question input by a user is queried against the knowledge graph, and the accurate answer obtained is fed back to the user. As another example, in event recognition, after the predicted type of a word is obtained, the text containing the word can be recommended to users interested in that type, i.e., an intelligent recommendation service is provided, where whether a user is interested in the predicted type can be determined through portrait analysis or similar means.
Referring to fig. 2, fig. 2 is an architecture diagram of an alternative artificial intelligence based text processing system 100 according to an embodiment of the present application, in which a terminal device 400 is connected to a server 200 through a network 300, and the server 200 is connected to a database 500, where the network 300 may be a wide area network or a local area network, or a combination of both.
In some embodiments, taking the case where the electronic device provided by the present application is a terminal device as an example, the artificial-intelligence-based text processing method provided by the embodiments of the present application may be implemented by the terminal device. For example, after acquiring a text, the terminal device 400 extracts, for each word in the text, the sentence-level information of the sentence containing the word, the text-level information of the text, and the setting information of that sentence; updates the feature representation of the word according to these three types of information; and performs type prediction processing on the updated feature representation to obtain the predicted type of the word. The setting information may be pre-stored locally in the terminal device 400, or may be obtained by the terminal device 400 from the outside (such as the database 500) in real time. After obtaining the predicted types of the words in the text, the terminal device 400 may apply them in various natural language processing scenarios, for example, to provide question-answering services, intelligent recommendation services, and the like.
In some embodiments, taking the case where the electronic device provided by the present application is a server as an example, the artificial-intelligence-based text processing method provided by the embodiments of the present application may also be implemented by the server. For example, the server 200 runs various forms of computer programs, such as a cloud computing program, that the terminal device 400 calls, so as to implement the text processing scheme provided by the embodiments of the present application in cooperation with the terminal device 400. During text processing, the server 200 acquires, in response to a call from the terminal device 400, the text sent by the terminal device 400. Then, for each word in the text, the server 200 updates the feature representation of the word according to the three types of information described above and performs type prediction processing on the updated feature representation to obtain the predicted type of the word. The server 200 may send the obtained predicted types to the terminal device 400 so that the terminal device 400 provides question-answering, intelligent recommendation, and similar services according to them; alternatively, the server 200 may provide such services locally based on the obtained predicted types for the terminal device 400 to call. It is worth mentioning that the server 200 may also retrieve the text from locations other than the terminal device 400, for example from the database 500.
The terminal device 400 displays intermediate and final results of the text processing in the graphical interface 410. In fig. 2, taking the question-answering service provided by the server 200 as an example, the figure shows a question sent by the terminal device 400 to the server 200 when invoking the question-answering service and the answer the server 200 obtains by querying the knowledge graph with that question.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, where the cloud service may be a text processing service that is called by the terminal device 400 to process a text sent by the terminal device 400, and send a prediction type of an obtained word to the terminal device 400. The terminal device 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The following takes the case where the electronic device provided by the embodiments of the present application is a terminal device as an example. It can be understood that, when the electronic device is a server, some parts of the structure shown in fig. 3 (such as the user interface, the presentation module, and the input processing module) may be omitted. Referring to fig. 3, fig. 3 is a schematic structural diagram of a terminal device 400 provided by an embodiment of the present application. The terminal device 400 shown in fig. 3 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The components of the terminal device 400 are coupled together by a bus system 440, which enables communication among them. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 440 in fig. 3.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452, configured to reach other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the artificial intelligence based text processing apparatus provided by the embodiments of the present application can be implemented in software, and fig. 3 shows an artificial intelligence based text processing apparatus 455 stored in a memory 450, which can be software in the form of programs and plug-ins, and the like, and includes the following software modules: a first extraction module 4551, a second extraction module 4552, a third extraction module 4553 and a prediction module 4554, which are logical and thus can be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the artificial intelligence based text processing apparatus provided in the embodiments of the present Application may be implemented in hardware, for example, the artificial intelligence based text processing apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the artificial intelligence based text processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The text processing method based on artificial intelligence provided by the embodiment of the application will be described in conjunction with exemplary application and implementation of the electronic device provided by the embodiment of the application.
Referring to fig. 4 and fig. 5A, fig. 4 is a schematic structural diagram of an artificial intelligence based text processing apparatus 455 provided in an embodiment of the present application, and illustrates a flow of text processing through a series of modules, and fig. 5A is a schematic flow diagram of an artificial intelligence based text processing method provided in an embodiment of the present application, and the steps illustrated in fig. 5A will be described with reference to fig. 4.
In step 101, a plurality of words belonging to the same sentence in the text are subjected to feature extraction processing, and feature representations of the plurality of words are obtained as sentence-level information of the sentence.
As an example, referring to fig. 4, in the first extraction module 4551, a text comprising a plurality of consecutive sentences is obtained, and feature extraction processing is performed on all words belonging to the same sentence in the text to obtain a feature representation corresponding to each word; the feature representations of the words, which are in vector form, together constitute the sentence-level information of the sentence. Thus, corresponding sentence-level information is available for each sentence in the text. The word feature extraction processing may be performed with a language model used in natural language processing, for example a Word2vec model, a Bidirectional Encoder Representations from Transformers (BERT) model, or a Long Short-Term Memory (LSTM) model, which is not limited in this embodiment.
It should be noted that the sentences in the text may be divided according to a set rule, for example by punctuation marks such as commas and periods. In addition, for an English text, a word may be an English word; for a Chinese text, a word may be a single character, or words may be obtained by performing word segmentation on each sentence of the text, where the segmentation method is not limited.
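As a minimal sketch of step 101, assuming PyTorch and a Bi-LSTM encoder (one of the language models mentioned above; the vocabulary and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=10000, embedding_dim=128)
word_encoder = nn.LSTM(input_size=128, hidden_size=64,
                       bidirectional=True, batch_first=True)

# Indices of the words of one five-word sentence, shape (1, 5).
word_ids = torch.tensor([[12, 305, 77, 4021, 9]])
word_vectors = embed(word_ids)                  # (1, 5, 128)
word_features, _ = word_encoder(word_vectors)   # (1, 5, 128): one feature
# representation per word; together they form the sentence-level information.
```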
In step 102, a plurality of sentences in the text are subjected to feature extraction processing, and feature representations of the plurality of sentences are obtained as text-level information.
For example, feature extraction processing is performed on all sentences in the text to obtain a feature representation corresponding to each sentence; the feature representations of the sentences together constitute the text-level information of the text. The manner of feature extraction for sentences is likewise not limited: it may be implemented with a BERT model, for example, or the feature representations of the words in a sentence may be spliced into the feature representation of that sentence.
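Continuing the previous sketch, step 102 can be illustrated, for example, by mean-pooling the word features of each sentence and then encoding the sequence of sentence vectors; the pooling choice is an assumption, since this embodiment also allows a BERT model or simple splicing:

```python
# One (num_words, 128) tensor of word features per sentence, from the
# sketch above; mean pooling turns each into a single sentence vector.
word_features_per_sentence = [word_features[0], word_features[0]]
sentence_vectors = torch.stack([wf.mean(dim=0)
                                for wf in word_features_per_sentence])

doc_encoder = nn.LSTM(input_size=128, hidden_size=64,
                      bidirectional=True, batch_first=True)
text_level_info, _ = doc_encoder(sentence_vectors.unsqueeze(0))
# (1, num_sentences, 128): one feature representation per sentence,
# together constituting the text-level information of the text.
```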
In step 103, setting characteristic representations of a plurality of words belonging to the same sentence in the text are acquired from the knowledge base as setting information of the sentence.
For example, referring to fig. 4, in the third extraction module 4553, setting feature representations of a plurality of words are stored in a knowledge base, which may be, for example, a WordNet library or a HowNet library. For each sentence in the text, the knowledge base is queried with the words of the sentence to obtain the corresponding setting feature representations. The setting feature representations of all words in the sentence together constitute the setting information of the sentence.
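A sketch of step 103 using the NLTK interface to WordNet; the mapping from synsets to stored vectors (synset_vectors) is a hypothetical lookup table, since the application does not specify how the setting feature representations are stored:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def setting_feature(word, synset_vectors):
    """Query the knowledge base for the setting feature representation of a word."""
    synsets = wn.synsets(word)
    if not synsets:
        return None  # the word has no entry in the knowledge base
    # Assumed convention: use the vector stored for the first synset.
    return synset_vectors.get(synsets[0].name())

# The setting information of a sentence is the collection of setting
# feature representations of all of its words, e.g.:
# setting_info = [setting_feature(w, synset_vectors) for w in sentence_words]
```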
In step 104, for each word in the text, the feature representation of the word is updated according to the sentence level information of the sentence in which the word is located, the text level information, and the setting information of the sentence in which the word is located, and the type prediction processing is performed according to the updated feature representation of the word to obtain the predicted type of the word.
As an example, referring to fig. 4, in the prediction module 4554, for each word in the text, the feature representation of the word is updated by combining three types of information: the sentence-level information of the sentence containing the word, the text-level information of the text, and the setting information of that sentence. The updated feature representation can thus express the semantics of the word more accurately. Then, type prediction processing is performed on the updated feature representation of the word, and the predicted type of the word is determined from a plurality of set types. The manner of type prediction processing is not limited in the embodiments of the present application: for example, a Softmax function may map the updated feature representation of the word to probabilities over the set types, and the set type with the maximum probability is determined as the predicted type; alternatively, the updated feature representation may first be further processed, for example by an LSTM model or a Conditional Random Field (CRF) model, before the probability mapping.
It should be noted that the set types may be customized according to the actual natural language processing scenario. For example, in event recognition, the set types may include birth event, movement event, attack event, and the like; in entity recognition, they may include person name, place name, organization name, and the like; in relationship extraction, they may include spouse, birthplace, and nationality. Set types of different scenarios may also be used in combination, for example person name, place name, organization name, spouse, birthplace, and nationality at the same time, in which case performing the type prediction processing is equivalent to performing entity recognition and relationship extraction simultaneously.
As shown in fig. 5A, the embodiment of the present application combines the sentence-level information, the text-level information, and the setting information to update the feature representation of a word, so that the updated feature representation accurately expresses the semantics of the word. This improves the accuracy of the finally obtained predicted type and facilitates accurate, effective question-answering, intelligent recommendation, and similar services.
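Continuing the PyTorch sketches above, the final classification of step 104 can be illustrated as a Softmax mapping over the set types; the input size and the type list are assumptions:

```python
set_types = ["birth event", "movement event", "attack event", "non-event"]
classifier = nn.Linear(in_features=128, out_features=len(set_types))

def predict_type(updated_feature):          # updated_feature: shape (128,)
    logits = classifier(updated_feature)
    probs = torch.softmax(logits, dim=-1)   # probability of each set type
    return set_types[int(probs.argmax())]   # set type with maximum probability
```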
In some embodiments, referring to fig. 5B, fig. 5B is an optional flowchart of the text processing method based on artificial intelligence provided in the embodiment of the present application, and step 104 shown in fig. 5A may be implemented by steps 201 to 206, which will be described in conjunction with the steps.
In step 201, a first level of association operations is performed on the feature representations of the words and sentence level information of sentences in which the words are located, resulting in a first intermediate vector.
As an example, referring to fig. 4, in the prediction module 4554, the feature representation of the words may be updated in a hierarchical progression manner while combining sentence level information, text level information, and setting information. For example, a first level of association operations is performed on the feature representations of the words and the sentence-level information of the sentences in which the words are located to learn associations between the feature representations of the words and the sentence-level information of the sentences in which the words are located, resulting in a first intermediate vector. Wherein the correlation operation may be performed using an attention mechanism in a neural network.
In step 202, a second level of correlation is performed on the first intermediate vector and the text-level information to obtain a second intermediate vector.
After the first intermediate vector is obtained, the second-level association operation is performed on the first intermediate vector and the text-level information of the text to obtain a second intermediate vector. The first-level and second-level association operations are similar in principle; the different names merely distinguish different stages.
In step 203, the first intermediate vector and the second intermediate vector are spliced to obtain a spliced vector.
And before executing the association operation of the third level, splicing the first intermediate vector and the second intermediate vector to obtain a spliced vector. The purpose of this step is to merge the association learned in step 201 and step 202 into the association operation of the third level, so that the result obtained by the association operation of the third level can be close to the correct semantic meaning of the word.
In step 204, a third level of association operation is performed on the splicing vector and the setting information of the sentence where the word is located, so as to obtain a third intermediate vector.
Here, a third level of association operation is performed on the concatenation vector and the setting information of the sentence where the word is located, that is, the association between the concatenation vector and the setting information of the sentence where the word is located is learned, so as to obtain a third intermediate vector.
Of course, in the embodiment of the present application, a non-hierarchical progressive manner may also be applied, for example, a first-level association operation is performed on the feature representation of the word and the sentence-level information of the sentence in which the word is located, so as to obtain a first intermediate vector; performing second-level association operation on the feature representation of the words and the text-level information to obtain a second intermediate vector; and performing third-level association operation on the feature representation of the words and the setting information of the sentences in which the words are positioned to obtain a third intermediate vector.
In step 205, the feature representation of the word is updated based on the first intermediate vector, the second intermediate vector, and the third intermediate vector.
And updating the feature representation of the word according to the results obtained by the association operation of the three levels, namely the first intermediate vector, the second intermediate vector and the third intermediate vector. Here, the result obtained by stitching the first intermediate vector, the second intermediate vector, and the third intermediate vector may be directly used as the updated feature representation of the word.
In some embodiments, updating the feature representation of the word based on the first intermediate vector, the second intermediate vector, and the third intermediate vector as described above may be accomplished by: splicing the first intermediate vector, the second intermediate vector and the third intermediate vector, and performing weighting processing and activation processing on a result obtained by splicing; and splicing the result obtained by the activation processing and the characteristic representation of the word to update the characteristic representation of the word.
When updating the feature representation of the word according to the first, second, and third intermediate vectors, the three intermediate vectors may be spliced, the spliced result weighted, and the weighted result activated through an activation function. It should be noted that, in the embodiments of the present application, the network parameters used in the weighting process comprise a weight and a bias: weighting an object essentially means performing product processing on the object and the weight and adding the bias to the product. The activation processing strengthens the nonlinear relationship between the data before and after it, which facilitates training the network parameters and reduces the probability of overfitting. The activation function used is not limited in the embodiments of the present application; for example, a Rectified Linear Unit (ReLU) function may be used.
After the activation processing, the obtained result is spliced with the feature representation of the word to obtain the updated feature representation. In this way, the feature representation of the word can be effectively updated by combining it with the first, second, and third intermediate vectors.
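A minimal PyTorch sketch of the update described in steps 203 to 205, under the reading above (splice, weight, ReLU activation, splice with the original feature); all dimensions are illustrative:

```python
update_layer = nn.Linear(in_features=3 * 128, out_features=128)

def update_feature(word_feature, q1, q2, q3):
    # Splice the three intermediate vectors, then weight them (the linear
    # layer holds the weight and the bias) and activate with ReLU.
    spliced = torch.cat([q1, q2, q3], dim=-1)
    activated = torch.relu(update_layer(spliced))
    # Splice the activated result with the original feature representation
    # of the word to obtain the updated feature representation.
    return torch.cat([word_feature, activated], dim=-1)
```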
In step 206, a type prediction process is performed based on the updated characterization of the term to obtain a predicted type of the term.
As shown in fig. 5B, in the embodiment of the present application, the feature representation of the word, the sentence level information, the text level information, and the setting information are associated in a hierarchical progressive manner to obtain an updated feature representation of the word, so that the updated feature representation can accurately represent the actual semantics of the word.
In some embodiments, referring to fig. 5C, fig. 5C is an optional flowchart illustrating the performing of the association operation provided in the embodiment of the present application, and step 201 shown in fig. 5B may be implemented by steps 301 to 305, which will be described in conjunction with the steps.
In step 301, a weighting process is performed on the feature representation of the sentence where the word is located according to the network parameter corresponding to the first level.
The embodiments of the present application involve three levels of association operations, and the operations at different levels are similar in principle; for ease of understanding, the first-level association operation is taken as an example. First, the feature representation of the sentence containing the word is weighted according to the network parameters corresponding to the first level, where the network parameters comprise a weight and a bias. It is worth noting that, in the second-level association operation, the feature representation of the sentence containing the word is weighted according to the network parameters corresponding to the second level; in the third-level association operation, it is weighted according to the network parameters corresponding to the third level.
In step 302, the weighted feature representation is mapped to obtain a probability distribution; each value in the probability distribution corresponds to the selection probability of a set times threshold.
For example, the weighted feature representation is mapped to a probability distribution through a Softmax function, and each value in the distribution is the selection probability of a set times threshold. For example, if the set times thresholds are 1, 2, and 3, a mapped probability distribution of (20%, 60%, 20%) means a 20% probability of selecting the times threshold 1, a 60% probability of selecting 2, and a 20% probability of selecting 3.
In step 303, a times threshold is selected according to the probability distribution, and the selected times threshold is applied in the first-level association operation.
Here, a times threshold is selected based on the obtained probability distribution, and the finally selected times threshold serves as the times threshold corresponding to the first level. In this way, the times threshold of each level can be adjusted adaptively based on the complexity of the sentence containing the word (as embodied by the feature representation of the sentence), improving the effect of the subsequent operations.
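Steps 301 to 303 can be sketched as follows (PyTorch; the candidate thresholds 1, 2, and 3 follow the example in step 302, and the layer size is an assumption):

```python
times_thresholds = [1, 2, 3]                     # candidate times thresholds
threshold_layer = nn.Linear(128, len(times_thresholds))

def select_times_threshold(sentence_feature):    # shape (128,)
    # Weight the feature representation of the sentence, then map the
    # result to a probability distribution over the candidate thresholds.
    probs = torch.softmax(threshold_layer(sentence_feature), dim=-1)
    index = torch.distributions.Categorical(probs).sample()
    return times_thresholds[int(index)]
```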
In some embodiments, before step 301, the method further includes: acquiring a text comprising a plurality of sample sentences, together with the set type of each word in each sample sentence; determining a reward value according to the set types and predicted types of the words in each sample sentence, and fusing the reward value with the times thresholds selected at the three levels to obtain a target value; and summing the target values corresponding to the sample sentences and updating the network parameters corresponding to the three levels according to the summed result, where the three levels are the first, second, and third levels.
In the embodiments of the present application, the network parameters corresponding to each level may be trained in advance. For example, a text comprising a plurality of consecutive sample sentences is acquired, together with the set type of each word in each sample sentence, which may be obtained through manual annotation. Steps 101 to 104 (involving steps 301 to 305) are performed on this text, producing a predicted type for each word. For each sample sentence in the text, a reward value is determined according to the set types and predicted types of the words in the sample sentence, and the reward value is fused with the times thresholds selected at the three levels to obtain a target value, so that the network parameters corresponding to the three levels can be optimized simultaneously. Thus, a corresponding target value is obtained for each sample sentence. The target values corresponding to the sample sentences are then summed, a gradient is determined from the summed result, and the network parameters corresponding to the three levels are updated along the descending or ascending direction of the gradient; the summed result is equivalent to a loss value.
It should be noted that whether the network parameters are updated in the gradient-descending or gradient-ascending direction depends on how the reward value and the target value are determined. If the target value is positively correlated with the effect of the type prediction processing (the larger the target value, the better the effect), the network parameters are updated in the ascending direction of the gradient, so as to raise subsequent target values as much as possible; if the target value is negatively correlated with that effect, the network parameters are updated in the descending direction, so as to reduce subsequent target values as much as possible. In this way, the network parameters of the three levels can be updated effectively.
In some embodiments, determining the reward value according to the set types and predicted types of the words in the sample sentence may be implemented as follows: determining the precision and recall of the type prediction processing on the words in the sample sentence according to their set types and predicted types; and performing harmonic averaging on the precision and recall to obtain the reward value corresponding to the sample sentence.
After the set type and predicted type of each word in a sample sentence are obtained, the precision and recall corresponding to the sample sentence can be calculated (the calculation of precision and recall is not repeated here). The precision and recall are then subjected to harmonic averaging to obtain the F1 score of the sample sentence, which is used as the reward value; for example, one form of harmonic averaging is: reward value = 2 × precision × recall / (precision + recall). The reward value thus takes both precision and recall into account, which improves the training effect when the network parameters are trained according to it. Note that when training the network parameters with this kind of reward value, they are updated in the ascending direction of the gradient.
In some embodiments, the above fusion of the reward value with the times thresholds selected at the three levels to obtain the target value can be implemented as follows: traversing the three levels; performing product processing on the reward value and the times threshold selected at the traversed level, and performing logarithm processing on the product to obtain the sub-target value corresponding to the traversed level; and summing the sub-target values corresponding to the three levels to obtain the target value.
The embodiments of the present application provide an implementation of the fusion processing: the three levels are traversed; the reward value and the times threshold selected at the traversed level are subjected to product processing; and the product is subjected to logarithm processing to obtain the sub-target value corresponding to the traversed level. The base of the logarithm may be 2, the natural constant e, 10, or the like, which is not limited. After the sub-target values corresponding to the three levels are obtained, the three sub-target values are summed to obtain the target value corresponding to the sample sentence. In this way, the target value combines the three levels, which facilitates jointly optimizing the network parameters of the three levels later.
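A plain-Python sketch of the reward value and target value described above; the natural logarithm is one of the allowed base choices:

```python
import math

def reward_value(precision, recall):
    # Harmonic mean of precision and recall, i.e. the F1 score.
    return 2 * precision * recall / (precision + recall)

def target_value(reward, selected_thresholds):
    # selected_thresholds: the times thresholds chosen at the three levels;
    # each level contributes log(reward * threshold) as its sub-target value.
    return sum(math.log(reward * t) for t in selected_thresholds)

# Example: reward from precision 0.8 / recall 0.6, thresholds 2, 1, 3.
total = target_value(reward_value(0.8, 0.6), [2, 1, 3])
```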
In step 304, in each association operation of the first level, attention coding is performed on the query vector and the sentence-level information of the sentence containing the word to obtain an attention result; the attention result and the query vector are spliced, and the spliced result is weighted and activated to obtain a new query vector for the next association operation; the initial query vector is the feature representation of the word.
Taking the first level as an example: in each association operation of the first level, the attention mechanism performs attention coding on the query vector and the sentence-level information of the sentence containing the word, so as to learn the association between them, and produces an attention result, where the initial query vector is the feature representation of the word. Then, the attention result and the query vector are spliced, and the spliced result is weighted and activated to obtain a new query vector for executing the next association operation.
It is worth mentioning that, in the second-level association operation, attention coding is performed on the query vector and the text-level information, where the initial query vector at the second level is the first intermediate vector; in the third-level association operation, attention coding is performed on the query vector and the setting information of the sentence containing the word, where the initial query vector at the third level is the spliced vector.
In some embodiments, the above-mentioned attention encoding of the query vector and the sentence-level information of the sentence in which the word is located can be implemented in such a way that an attention result is obtained: determining similarity between the query vector and each feature representation in the sentence-level information; carrying out normalization processing on each obtained similarity; and taking the similarity after the normalization processing as the weight of the corresponding feature representation in the sentence-level information, and carrying out weighted summation on a plurality of feature representations in the sentence-level information to obtain an attention result.
Here, an implementation of attention coding is provided. Firstly, the similarity between the query vector and each feature representation in the sentence-level information of the sentence in which the words are located is determined, and normalization processing is carried out on each similarity. The embodiment of the present application does not limit the normalization processing manner, for example, a certain similarity may be divided by the sum of all similarities, so as to complete the normalization processing on the similarity. And taking the similarity after the normalization processing as the weight of the corresponding feature representation in the sentence-level information, and carrying out weighted summation on all feature representations in the sentence-level information to obtain an attention result.
For example, suppose the sentence-level information includes the feature representations R1, R2, and R3, whose similarities to the query vector are S1, S2, and S3, respectively. The attention result may then be R1 × S1/(S1+S2+S3) + R2 × S2/(S1+S2+S3) + R3 × S3/(S1+S2+S3). In this way, the association between the query vector and the sentence-level information of the sentence containing the word can be effectively learned.
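The worked example above translates directly into the following numpy sketch; the dot-product similarity is an assumed choice, since the embodiment does not fix the similarity measure:

```python
import numpy as np

def attention(query, feature_representations):
    # Similarity between the query vector and each feature representation.
    sims = np.array([query @ r for r in feature_representations])
    weights = sims / sims.sum()              # normalization of the similarities
    # Weighted sum of the feature representations: the attention result.
    return sum(w * r for w, r in zip(weights, feature_representations))

R = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
result = attention(np.array([0.5, 0.5]), R)
```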
In step 305, when the number of times of execution of the association operation reaches a threshold number of times, a query vector obtained by the last association operation is determined as a first intermediate vector.
When the number of executions of the association operation in the first level reaches the time threshold corresponding to the first level, the query vector obtained by the last association operation in the first level is determined as the first intermediate vector. The same applies to the second level and the third level.
As shown in fig. 5C, in the embodiment of the present application, the time threshold of each level is adaptively adjusted according to the complexity of the sentence in which the word is located, and the association operation is performed at each level accordingly, which effectively avoids the degradation of the text processing effect caused by a time threshold that is too small or too large.
In some embodiments, referring to fig. 5D, fig. 5D is an optional flowchart of the text processing method based on artificial intelligence provided in the embodiment of the present application, and step 101 shown in fig. 5A may be implemented by steps 401 to 404, which will be described in conjunction with the steps.
In step 401, word embedding encoding is performed on a plurality of words belonging to the same sentence in the text, so as to obtain a word vector of each word.
As an example, referring to fig. 4, in the first extraction module 4551, word embedding encoding is performed on a plurality of words belonging to the same sentence in the text, and a word vector of each word is obtained, for example, word embedding encoding may be performed through a word2vec model or other models.
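A minimal sketch of such word-embedding encoding, assuming the gensim implementation of word2vec (the toy corpus and hyperparameters are illustrative only):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of words (illustrative only).
sentences = [["zhang", "san", "was", "born", "in", "china"],
             ["the", "plane", "bombed", "the", "region"]]

# Train a small word2vec model; vector_size is the word-vector dimension.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

e_j = model.wv["china"]   # the word vector of one word
print(e_j.shape)          # (100,)
```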
In step 402, a word vector of a plurality of words is memory-encoded according to the sequence from beginning to end of the sentence in the sentence, so as to obtain a first feature representation of each word.
In the embodiment of the present application, the word vectors of the words may be further encoded through an LSTM model, a Gated Recurrent Unit (GRU) model, or a Transformer model, so as to obtain the feature representations of the words.
Here, the description will be given taking an example of encoding a word vector by a Bi-directional Long Short-Term Memory (Bi-LSTM) model. The coding process comprises two aspects, wherein on one hand, the word vectors of a plurality of words are subjected to memory coding according to the sequence from beginning to end of the words in the sentence, and the first characteristic representation of each word is obtained.
In some embodiments, the above memory encoding of the word vectors of the plurality of words according to the sequence from beginning to end of the sentence in the sentence can be implemented in such a way that the first feature representation of each word is obtained: traversing a plurality of words in the sentence according to the sequence from the beginning to the end of the sentence in the sentence; performing full-connection processing on the word vector of the traversed word and the first feature representation of the previous traversed word for multiple times to respectively obtain a forgetting gate result, an input gate result, an output gate result and a candidate memory unit; performing the integration processing on the memory unit of the previous traversed word and the forgetting gate result, performing the integration processing on the candidate memory unit and the input gate result, and summing the results obtained by the two integration processing to obtain the memory unit of the traversed word; and activating the output gate result, and performing integration processing on the activated output gate result and the memory unit of the traversed word to obtain a first feature representation of the traversed word.
In the Bi-LSTM model, three gate functions are included, namely a forgetting gate, an input gate, and an output gate, and each gate function is equivalent to a full link layer. When memory coding is carried out according to the sequence from the beginning to the end of a sentence in a sentence, traversing a plurality of words in the sentence according to the sequence, and carrying out full connection processing on word vectors of the traversed words and first feature representations of the previous traversed words to obtain candidate memory units of the traversed words; meanwhile, carrying out full connection processing on the word vector of the traversed word and the first characteristic representation of the previous traversed word according to a forgetting gate to obtain a forgetting gate result, wherein the forgetting gate result is used for determining which contents in the memory unit of the previous traversed word are stored in the memory unit of the traversed word; performing full-connection processing on the word vector of the traversed word and the first characteristic representation of the previous traversed word according to an input gate to obtain an input gate result, wherein the input gate result is used for determining which contents in the candidate memory unit of the traversed word are stored in the memory unit of the traversed word; and performing full connection processing on the word vector of the traversed word and the first characteristic representation of the previous traversed word according to an output gate to obtain an output gate result, wherein the output gate result is used for determining which contents in the memory unit of the traversed word are stored in the first characteristic representation of the traversed word. Wherein, the memory unit is also called as cell state, which is a way for Bi-LSTM model to preserve memory.
After the forgetting gate result, the input gate result, the output gate result and the candidate memory unit are obtained, the memory unit of the previously traversed word is integrated with the forgetting gate result, the candidate memory unit of the traversed word is integrated with the input gate result, and the results of the two integration processes are summed to obtain the memory unit of the traversed word. Then, the output gate result is activated, and the activated output gate result is integrated with the memory unit of the traversed word to obtain the hidden layer representation of the traversed word, which is taken as the first feature representation. The integration process may be a Hadamard product, and the activation function used for the activation processing may be a hyperbolic tangent function. In this way, the word vector of the traversed word and the first feature representation of the previously traversed word are combined for memory coding, improving the accuracy of the obtained first feature representation.
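The gate computations described above can be sketched as follows (a forward step only; the parameter names and dictionary layout are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One memory-coding step: x_t is the word vector of the traversed word,
    h_prev the first feature representation of the previously traversed word,
    c_prev its memory unit; W and b hold the four full-connection parameters."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W["f"] @ z + b["f"])        # forgetting gate result
    i = sigmoid(W["i"] @ z + b["i"])        # input gate result
    o = sigmoid(W["o"] @ z + b["o"])        # output gate result
    c_cand = np.tanh(W["c"] @ z + b["c"])   # candidate memory unit
    # Two Hadamard products ("integration processing") and their sum.
    c_t = f * c_prev + i * c_cand           # memory unit of the traversed word
    # Activate the memory unit and gate it with the output gate result.
    h_t = o * np.tanh(c_t)                  # first feature representation
    return h_t, c_t
```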
In step 403, the word vectors of the plurality of words are memory-coded according to the sequence from end to beginning of the sentence in the sentence, so as to obtain a second feature representation of each word.
In the other aspect of the encoding process, the word vectors of the plurality of words are memory-encoded according to the sequence from the end of the sentence to the beginning of the sentence, resulting in a second feature representation of each word. Similarly, the memory coding in this end-to-beginning order can be performed through the Bi-LSTM model.
In step 404, the first feature representation and the second feature representation of the word are spliced into a feature representation of the word, and feature representations of a plurality of words belonging to the same sentence in the text are determined as sentence-level information of the sentence.
And for each word in the text, splicing the first characteristic representation and the second characteristic representation of the word to obtain the characteristic representation of the word. The feature representations of all words in the text belonging to the same sentence together constitute sentence-level information of the sentence.
In fig. 5D, step 102 shown in fig. 5A may be implemented by steps 405 to 406.
In step 405, for each sentence in the text, the first feature representation of the last word in the sentence and the second feature representation of the first word in the sentence are concatenated into a feature representation of the sentence.
In the embodiment of the present application, the feature representation of the word at the beginning of the sentence (i.e., the first word in the sentence) and the feature representation of the word at the end of the sentence (i.e., the last word in the sentence) may be concatenated to obtain the feature representation of the sentence. For example, a first feature representation of the last word in the sentence is concatenated with a second feature representation of the first word in the sentence into a feature representation of the sentence.
In step 406, the feature representations of the plurality of sentences in the text are determined as text level information.
For example, the feature representations of all sentences in the text are determined as text-level information of the text.
As shown in fig. 5D, in the embodiment of the present application, further encoding processing is performed on the word vectors of the words according to two directions, so that the obtained feature representation of the words can reflect the real semantics of the words, and the accuracy of the sentence-level information and the text-level information is improved.
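A sketch of this two-direction encoding using PyTorch's bidirectional LSTM (the dimensions and variable names are assumptions for illustration):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, n_words = 100, 128, 6
bilstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)

word_vectors = torch.randn(1, n_words, emb_dim)      # one sentence
out, _ = bilstm(word_vectors)                        # (1, n_words, 2*hid_dim)

# Feature representation of each word: forward and backward states spliced.
sentence_level_info = out.squeeze(0)                 # (n_words, 2*hid_dim)

# Feature representation of the sentence: first feature representation of
# the last word (forward direction) spliced with the second feature
# representation of the first word (backward direction).
forward_last = out[0, -1, :hid_dim]
backward_first = out[0, 0, hid_dim:]
sentence_feature = torch.cat([forward_last, backward_first])  # (2*hid_dim,)
```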
In some embodiments, referring to fig. 5E, fig. 5E is an optional flowchart of the text processing method based on artificial intelligence provided in the embodiment of the present application, and step 104 shown in fig. 5A may be implemented through step 501 to step 505, which will be described in conjunction with each step.
In step 501, for each word in the text, the feature representation of the word is updated according to the sentence-level information of the sentence in which the word is located, the text-level information, and the setting information of the sentence in which the word is located.
In step 502, the updated feature representations of the words are memory coded according to the sequence from beginning to end of the sentence in the sentence, so as to obtain hidden layer representations of the words.
In embodiments of the present application, the updated feature representations of the words may be further processed to obtain prediction vectors, and the prediction types of the words may be determined based on the prediction vectors. For example, according to the sequence from beginning to end of the sentence in the sentence, the updated feature representation of the word is subjected to memory coding, and the hidden layer representation of the word is obtained.
In some embodiments, the above-mentioned memory encoding of the updated feature representation of the word according to the sequence from beginning to end of the sentence in the sentence may be implemented in such a way that a hidden representation of the word is obtained: traversing a plurality of words in the sentence according to the sequence from the beginning to the end of the sentence in the sentence; performing full-connection processing on the updated feature representation of the traversed words, the hidden representation of the previous traversed words and the prediction vectors of the previous traversed words for multiple times to respectively obtain a forgetting gate result, an input gate result, an output gate result and a candidate memory unit; performing the integration processing on the memory unit of the previous traversed word and the forgetting gate result, performing the integration processing on the candidate memory unit and the input gate result, and summing the results obtained by the two integration processing to obtain the memory unit of the traversed word; and activating the output gate result, and performing integration processing on the activated output gate result and the memory unit of the traversed words to obtain the hidden layer representation of the traversed words.
In the embodiment of the present application, the updated feature representation may be memory-coded by the LSTM model, and unlike the above memory-coding of the word vector, a prediction vector of a word traversed before may be introduced, where the determination of the prediction vector is described later. In the process of memorizing and coding, traversing a plurality of words in the sentence according to the sequence from the beginning to the end of the sentence in the sentence, and carrying out full connection processing on the updated feature representation of the traversed word, the hidden representation of the previous traversed word and the prediction vector of the previous traversed word for a plurality of times to respectively obtain a forgetting gate result, an input gate result, an output gate result and a candidate memory unit of the traversed word. And then, carrying out the integration processing on the memory unit of the previous traversed word and the forgetting gate result, carrying out the integration processing on the candidate memory unit of the traversed word and the input gate result, and summing the results obtained by the two integration processing to obtain the memory unit of the traversed word. And then, activating the output gate result, and performing product processing on the activated output gate result and the memory unit of the traversed word to obtain the hidden layer representation of the traversed word. By the memory coding mode, the accuracy of the obtained hidden layer representation can be improved.
In step 503, the hidden representation of the word is weighted to obtain a prediction vector of the word.
For each word in the text, the hidden representation of the word obtained in step 502 is weighted to obtain a prediction vector of the word.
In step 504, the prediction vector of the word is weighted, and each value in the weighted prediction vector is normalized to obtain normalized values, where each normalized value corresponds to a set type.
Here, the prediction vector of the word is further weighted and each value in the weighted prediction vector is normalized, for example, a value in the weighted prediction vector may be divided by the sum of all values to complete the normalization of the value. Each numerical value in the weighted prediction vector corresponds to a set type, and after normalization processing, the numerical value after normalization processing represents the probability that the word belongs to the corresponding set type.
In step 505, the setting type corresponding to the maximum normalized value is determined as the prediction type of the word.
A higher probability indicates a more reliable prediction; therefore, the maximum normalized value is selected from the plurality of normalized values corresponding to the word, and the set type corresponding to the selected value is determined as the prediction type of the word.
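Steps 503 to 505 can be sketched as follows (a softmax realizes the normalization, consistent with the labeling-layer formulas later in this document; all parameter names are illustrative):

```python
import numpy as np

def predict_type(hidden, W1, W2, b2, type_names):
    """Weight the hidden layer representation into a prediction vector,
    weight and normalize it, then take the set type with the largest
    normalized value."""
    pred_vec = W1 @ hidden                 # prediction vector of the word
    scores = W2 @ pred_vec + b2            # weighted prediction vector
    exp_s = np.exp(scores - scores.max())  # softmax-style normalization
    probs = exp_s / exp_s.sum()            # one value per set type
    return type_names[int(np.argmax(probs))], probs
```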
As shown in fig. 5E, in the embodiment of the present application, by performing further memory coding and weighting processing on the updated feature representation of the word, the accuracy of the finally obtained prediction type is improved, and based on the obtained prediction type, services such as an accurate and effective question and answer service and an intelligent recommendation service can be provided.
The embodiment of the application can be applied to various application scenarios of natural language processing. For example, if a sentence in the text is "Zhang San was born in China", type prediction processing yields that the predicted type of the word "Zhang San" is a person name, the predicted type of the word "born" is a birthplace relation, and the predicted type of the word "China" is a place name; an SPO triple "Zhang San - birthplace - China" may then be constructed and added to the knowledge graph, and a question-and-answer service may be provided based on the knowledge graph, for example, feeding back the accurate answer "China" in response to the question "Where was Zhang San born?".
For another example, in a practical application scenario, the source of the text is the service records of human customer service (e.g., the human customer service of an online shopping platform). By predicting the types of words in the text, a knowledge graph can be constructed, and a question-and-answer service (intelligent customer service) can be provided based on the knowledge graph. In this way, the experience and knowledge of human customer service can be fully learned, improving the intelligence of the intelligent customer service. When the user calls the intelligent customer service, the user can quickly and accurately obtain the information actually needed through question and answer.
For another example, the text may be a piece of news, and a sentence in the news is "xx country aircraft bombs xx region, causing xx casualties". After the type prediction processing, the predicted type of the word "bombing" is obtained as an attack event, and the news can then be recommended to users whose preference degree for attack events is greater than a set threshold, realizing intelligent recommendation that matches the reading interests of users. The preference degree of a user for different set types can be obtained by means of user-profile analysis and the like; for example, the preference degree of a user for a certain set type may be set to be positively correlated with the frequency with which the user reads news of that set type.
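A toy sketch of this recommendation rule (the preference scores, user names and threshold are illustrative assumptions):

```python
def recommend(news_event_types, user_preferences, threshold=0.5):
    """Return the users whose preference for any event type contained in
    the news exceeds the set threshold."""
    return [user for user, prefs in user_preferences.items()
            if any(prefs.get(t, 0.0) > threshold for t in news_event_types)]

users = {"user_A": {"wedding": 0.9, "attack": 0.1},
         "user_B": {"wedding": 0.2, "attack": 0.8}}
print(recommend({"attack"}, users))   # -> ['user_B']
```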
In the above scenario, the prediction of word types may be achieved by constructing a text processing model, which is described in detail below. For convenience of understanding, the scenario of event identification is taken as an example, but it should be understood that the embodiment of the present application can be applied to other scenarios such as entity identification and relationship extraction.
The embodiment of the present application provides a schematic diagram of text processing as shown in fig. 6, and logic of text processing in the embodiment of the present application may be integrated into one text processing model, and an input unstructured natural language text may be processed through the text processing model, and an event type (corresponding to the above prediction type) included in the natural language text and a trigger word of the event type (i.e., a word belonging to the event type) are identified. Based on the obtained event type and the trigger word, multiple services of natural language processing can be realized, for example, a knowledge graph is constructed to provide question-answering services, and for example, targeted recommendation of natural language texts is performed according to the event type, which is not limited to this.
The embodiment of the present application provides an architecture diagram of a text processing model as shown in fig. 7, where the text processing model may include the following four modules:
1) embedding Layer (Embedding Layer): vectorizing and expressing the input natural language text, namely embedding and coding the words corresponding to the text;
2) Bidirectional Long Short-Term Memory layer (Bi-LSTM Layer): a Bi-LSTM model is used to encode the vectors output by the embedding layer and extract their features;
3) hierarchical multi-channel memory network and self-adaptive inference layer: the hierarchical multi-channel memory network is used for modeling different levels of information, including sentence-level information, chapter-level information (corresponding to the text-level information above) and vocabulary knowledge (corresponding to the setting information above), wherein the vocabulary knowledge can be obtained from an external knowledge base, such as a WordNet knowledge base. Meanwhile, in the layer, self-adaptive reasoning is realized according to the complexity of the natural language text.
4) Labeling Layer: the type prediction processing is performed using a unidirectional LSTM model.
Next, four modules included in the text processing model are explained:
1) In the embedding layer, since the input natural language text consists of discrete word symbols, the word symbols are converted into vector representations by the embedding layer. For example, the input natural language text is d = {s_1, s_2, …, s_{N_s}}, where N_s is an integer greater than 1 and represents the number of sentences in the natural language text d. The i-th sentence in d is represented as s_i = {w_1, w_2, …, w_{N_w}}, where i is an integer greater than 0 and N_w is an integer greater than 1, representing the number of words in the i-th sentence. In fig. 7, taking the type prediction processing of the words in the i-th sentence as an example, each word in the i-th sentence is first converted into a vector (corresponding to the word vector above) by the embedding layer; for example, the vector corresponding to the word w_j is e_j.
2) In the bidirectional long short-term memory layer, a Bi-LSTM model is used to capture the semantic information of words; the Bi-LSTM model may be replaced by another model capable of capturing semantic information, such as a GRU model or a Transformer model, and the Bi-LSTM model is used here only as an example. The Bi-LSTM model comprises a forward LSTM model and a backward LSTM model, where the forward LSTM model follows the order from the beginning of the sentence to the end, and the backward LSTM model follows the order from the end of the sentence to the beginning. For the word w_j, the forward LSTM model encodes e_j into →h_j, and the backward LSTM model encodes e_j into ←h_j; then →h_j and ←h_j are spliced together as the encoded representation of the word w_j, i.e. h_j = [→h_j : ←h_j]. Here, →h_j, ←h_j and h_j correspond to the first feature representation, the second feature representation and the feature representation of the word above, respectively. For the sentence s_i, the representation obtained by Bi-LSTM encoding is h_{s_i} = [→h_{N_w} : ←h_1], i.e. the feature representation corresponding to the sentence above.
3) In the hierarchical multi-channel memory network and the adaptive inference layer, the hierarchical multi-channel memory network includes a sentence-level channel, a chapter-level channel and a vocabulary knowledge channel, which correspond to the first, second and third levels above, respectively. The three channels are similar in structure, so the sentence-level channel is taken as an example for detailed explanation.
The sentence-level channel mainly comprises three sub-modules: sentence-level memory, multi-round reasoning, and an adaptive round-number determiner. The sentence-level memory is used to store the initial sentence-level information; multi-round reasoning is used to learn the most effective clues from the sentence-level information; the adaptive round-number determiner is used to automatically decide the number of inference rounds according to the complexity of the input content. The multi-round reasoning and the adaptive round-number determiner together constitute an adaptive multi-round reasoning mechanism. Next, the three sub-modules included in the sentence-level channel are set forth:
Firstly, sentence-level memory: for the word w_j to be type-predicted in the current sentence s_i, the feature representations {h_1, h_2, …, h_{N_w}} of all words in the sentence s_i are taken as the sentence-level information, expressed as SM ∈ R^{N_w×d_h}, where d_h represents the dimension of the vectors output by the Bi-LSTM model. It should be noted that the chapter-level memory stores the feature representations of all sentences in the natural language text d as the chapter-level information, and the vocabulary knowledge memory stores the setting feature representations, acquired from the knowledge base, corresponding to all the words in the i-th sentence, as the vocabulary knowledge.
Secondly, multi-round reasoning: the sentence-level channel contains K_s rounds of reasoning, where K_s is an integer greater than 0 determined by the adaptive round-number determiner, and one round of reasoning corresponds to one association operation above. For the word w_j to be type-predicted, its feature representation h_j is taken as the initial query vector, i.e. q_j^{s,0} = h_j. In the k-th round of reasoning, the query vector q_j^{s,k-1} is used to infer sentence-level clues from the sentence-level information SM:

o_j^{s,k} = Attention(q_j^{s,k-1}, SM)
q_j^{s,k} = tanh(W_s[q_j^{s,k-1} : o_j^{s,k}] + b_s)

where k is an integer greater than 0 and not greater than K_s, [:] denotes the splicing operation, and W_s and b_s are trainable network parameters; specifically, W_s is a weight and b_s is a bias. Attention(·) weights the sentence-level information with the query vector and can be obtained by the following formulas:

u_t^k = tanh(W_sa[q_j^{s,k-1} : h_t] + b_sa)
α_t^k = exp(u_t^k) / Σ_{t'} exp(u_{t'}^k)
o_j^{s,k} = Σ_t α_t^k · h_t

where W_sa and b_sa are trainable network parameters, and α_t^k weighs the degree of closeness (similarity) between the k-th round query vector of the j-th word and the feature representation h_t of the t-th word; the calculation of α_t^k corresponds to the attention layer in fig. 7. After K_s rounds of reasoning, the output q_j^{s,K_s} of the last round of reasoning is taken as the sentence-level clue information, which corresponds to the first intermediate vector above.
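A minimal sketch of this multi-round reasoning in one channel (numpy; the shapes, names and the externally supplied round number are illustrative assumptions — in the embodiment K_s comes from the adaptive round-number determiner described next):

```python
import numpy as np

def channel_reasoning(h_j, memory, Ws, bs, Wsa, bsa, n_rounds):
    """h_j: initial query vector (feature representation of the word);
    memory: (n, d) stored feature representations, e.g. SM;
    n_rounds: K_s, the number of reasoning rounds."""
    q = h_j
    for _ in range(n_rounds):
        # Attention over the memory conditioned on the current query:
        # one scalar score u_t per stored feature representation h_t.
        z = np.concatenate([np.tile(q, (len(memory), 1)), memory], axis=1)
        u = np.tanh(z @ Wsa + bsa)                 # shape: (n,)
        alpha = np.exp(u) / np.exp(u).sum()        # normalized weights
        o = alpha @ memory                         # attention result
        # Splice, weight and activate to obtain the new query vector.
        q = np.tanh(Ws @ np.concatenate([q, o]) + bs)
    return q   # clue information (the first intermediate vector)
```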
Thirdly, the adaptive round-number determiner: it is used to automatically determine the number of inference rounds according to the complexity of the input content. In the embodiment of the present application, the adaptive round-number determiner can be constructed using the principle of reinforcement learning, mainly involving a state s_s, an action a_s, a policy π_s, and a reward value r_s.
State: for the word w_j from the sentence s_i, the feature representation h_{s_i} of the sentence s_i is used as the state s_s.
Action: the set number of inference rounds is used as the action, i.e. a_s ∈ {1, 2, …, N_p}, where N_p is an integer greater than 1 and denotes the set maximum number of inference rounds.
Policy: according to the complexity of the input content, a policy network is used to select the appropriate action, i.e.

π_s(a_s|s_s) = Softmax(W_πs · s_s + b_πs)

where W_πs and b_πs are the network parameters corresponding to the first level above, which can be optimized through training. After processing by the Softmax function, a probability distribution is output, and each value in the probability distribution is the probability that the corresponding number of inference rounds is selected; for example, if in the probability distribution the number of inference rounds N_p corresponds to a probability of 0.6, then there is a probability of 0.6 of selecting the number of inference rounds N_p.
Reward value: in the embodiment of the present application, the reward value is used to train the network parameters of the policy network. The reward value may be calculated based on the set types of the words in the sample sentence and the predicted event types, and may be a precision rate, a recall rate, an F1 score, or the like.
Given T sample sentences, the objective function corresponding to the policy network in the sentence-level channel can be represented as:

J(θ_s) = Σ_{l=1}^{T} r_s^{(l)} · log π_s(a_s^{(l)} | s_s^{(l)})

where a_s^{(l)}, s_s^{(l)} and r_s^{(l)} are respectively the action (i.e., the selected number of inference rounds), the state and the reward value of the l-th sample sentence in the sentence-level channel. In the embodiment of the application, the network parameters θ_s of the policy network in the sentence-level channel can be updated by means of the policy gradient, where θ_s represents W_πs and b_πs.
Correspondingly, the objective functions corresponding to the policy networks in the chapter-level channel and the vocabulary knowledge channel are, respectively:

J(θ_d) = Σ_{l=1}^{T} r_d^{(l)} · log π_d(a_d^{(l)} | s_d^{(l)})
J(θ_lk) = Σ_{l=1}^{T} r_lk^{(l)} · log π_lk(a_lk^{(l)} | s_lk^{(l)})

where a_d^{(l)}, s_d^{(l)} and r_d^{(l)} are respectively the action, state and reward value of the l-th sample sentence in the chapter-level channel, and a_lk^{(l)}, s_lk^{(l)} and r_lk^{(l)} are respectively the action, state and reward value of the l-th sample sentence in the vocabulary knowledge channel. It should be noted that when the text processing model is trained, the loss functions of all network layers in the text processing model may be added together for joint optimization.
Similar to the sentence-level channel, the clue information q_j^{d,K_d} of the chapter-level channel (corresponding to the second intermediate vector above) and the clue information q_j^{lk,K_lk} of the vocabulary knowledge channel (corresponding to the third intermediate vector above) can be obtained. Then, the overall clue information of the three channels is calculated:

cl_j = tanh(W_ac[q_j^{s,K_s} : q_j^{d,K_d} : q_j^{lk,K_lk}] + b_ac)

where W_ac and b_ac are trainable network parameters. For the word w_j to be type-predicted, the word vector e_j corresponding to the word w_j and the overall clue information cl_j are spliced together to obtain the updated representation x_rj of the word w_j, i.e.

x_rj = [e_j : cl_j]
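A compact sketch of this fusion step (the parameter shapes and names are illustrative):

```python
import numpy as np

def fuse_clues(q_s, q_d, q_lk, W_ac, b_ac, e_j):
    """Splice the three channels' clue information, weight and activate it
    into the overall clue information, then splice with the word vector."""
    cl_j = np.tanh(W_ac @ np.concatenate([q_s, q_d, q_lk]) + b_ac)
    return np.concatenate([e_j, cl_j])     # x_rj = [e_j : cl_j]
```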
4) In the labeling layer, the event type of each word in the i-th sentence is automatically labeled. Here, the automatic labeling can be implemented with a model such as an LSTM model or a CRF model; for ease of understanding, a unidirectional LSTM model is taken as an example. For the word w_j, the labeling process is as follows:

i_j = σ(W_i[x_rj : h̃_{j-1} : T_{j-1}] + b_i)
f_j = σ(W_f[x_rj : h̃_{j-1} : T_{j-1}] + b_f)
o_j = σ(W_o[x_rj : h̃_{j-1} : T_{j-1}] + b_o)
c̃_j = tanh(W_c[x_rj : h̃_{j-1} : T_{j-1}] + b_c)
c_j = f_j ∘ c_{j-1} + i_j ∘ c̃_j
h̃_j = o_j ∘ tanh(c_j)
T_j = W_T · h̃_j + b_T

where i_j, f_j and o_j are respectively the input gate result, the forgetting gate result and the output gate result; c̃_j is the candidate (temporary) memory unit of the word w_j; W_i and b_i in the formula of i_j are trainable network parameters, and so on for the other gates; σ denotes the Sigmoid function, tanh denotes the hyperbolic tangent function, and ∘ denotes the Hadamard product. The obtained c_j is the memory unit of the word w_j, h̃_j is the hidden layer representation of the word w_j, and T_j is the prediction vector of the word w_j.
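The labeling step can be sketched as follows (numpy; all parameter names are illustrative). Unlike the encoder LSTM earlier, the previous word's prediction vector also enters the gates:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def labeling_step(x_rj, h_prev, T_prev, c_prev, W, b, W_T, b_T):
    """One labeling-layer step for word w_j: x_rj is its updated
    representation; h_prev, T_prev and c_prev are the previous word's hidden
    representation, prediction vector and memory unit."""
    z = np.concatenate([x_rj, h_prev, T_prev])
    i = sigmoid(W["i"] @ z + b["i"])         # input gate result
    f = sigmoid(W["f"] @ z + b["f"])         # forgetting gate result
    o = sigmoid(W["o"] @ z + b["o"])         # output gate result
    c_cand = np.tanh(W["c"] @ z + b["c"])    # candidate memory unit
    c_j = f * c_prev + i * c_cand            # memory unit
    h_j = o * np.tanh(c_j)                   # hidden layer representation
    T_j = W_T @ h_j + b_T                    # prediction vector
    return h_j, T_j, c_j
```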
Based on the obtained prediction vector, the final prediction result can be obtained:

O_j = W_y · T_j + b_y
p_j^t = exp(O_j^t) / Σ_{t'=1}^{N_t} exp(O_j^{t'})

where p_j^t represents the probability that the word w_j in the sentence s_i belongs to the t-th label (this probability corresponds to the normalized value above), and N_t denotes the number of set labels (a label corresponds to the set type above). For the word w_j, the label with the highest probability among the N_t labels is selected as the event label of the word. As shown in fig. 7, the event label of the word w_j is "B-Attack", meaning that the word w_j is a trigger of an attack event and is the beginning part of the attack event; the event label of the word w_1 is "O", meaning that the word w_1 does not belong to any event.
For ease of understanding, the source of the natural language text is a news Application (APP), i.e., the natural language text is news. After the news is edited, relevant personnel (e.g., editors) of the news APP can publish the news to a backend server of the news APP, where the text processing model shown in fig. 7 is deployed in the backend server, and of course, the text processing model may also be deployed in the terminal device, which is described here by taking the server as an example. The background server performs a series of processing on the news through the text processing model to obtain event tags of each word in the news, and based on the event tags, the background server can further determine what types of events the news comprises. For example, if the event tag for a word in the news is "B-Attack," then the news is determined to include an Attack event.
According to the events included in the news, the background server can purposefully send the news to the news APP clients of users interested in those events. For example, the background server prestores the preference information of a plurality of users: user A has a high preference degree for wedding events and a low preference degree for attack events, while user B has a low preference degree for wedding events and a high preference degree for attack events, where the preference degree of a user for a specific event can be determined according to the frequency with which the user reads news including the specific event on the news APP client. Meanwhile, the recommendation rule set in the background server is: send news including a certain event to the news APP clients of users with a high preference degree for that event. For news including attack events, the background server sends it to the news APP client of user B, so that it is presented there; for news including wedding events, the background server sends it to the news APP client of user A. In this way, when news recommendation is performed, the preferences of different users can be satisfied, and the intelligence of the recommendation service is improved.
Compared with the schemes provided by the related art, the embodiment of the present application fuses multi-source information and realizes an adaptive reasoning function according to the complexity of the input content, which can improve the accuracy of type prediction. The inventors verified the text processing model shown in fig. 7 on an internationally published English event recognition data set. The models used for comparison include a joint event extraction model, a cross-entity model, a DMCNN model, a joint event extraction model based on recurrent neural networks (JRNN), a recurrent neural network model based on dependency analysis (DBRNN), a cross-event model, a JointDEE model, a trigger detection model based on a dynamic memory network (TD-DMN), a hierarchical and bias tagging network (HBT) model, a knowledge-base LSTM (KBLSTM) model, and a PSL-ANN model. The experimental results are shown below:
[Table: experimental results of each model — accuracy, recall rate and F1 score]
the units of the values in the table are percentages (%). As can be seen, in the international published data set, compared with the event recognition model provided in the related art, all indexes (accuracy, recall rate, and F1 score) of the text processing model provided in the embodiment of the present application have advantages, that is, the embodiment of the present application can improve the accuracy of event recognition.
In addition, for the text processing model shown in fig. 7, the inventors performed comparative experiments (based on the international published event recognition data set) on the channels applied in the hierarchical multi-channel memory network and the adaptive inference layer, and the experimental results are as follows:
[Table: comparative results for different channel combinations]
the sentence-level channel and the chapter-level channel are used independently, and results obtained by the two channels are spliced or otherwise processed; "sentence-level channel + chapter-level channel + hierarchical manner" means that the sentence-level channel and chapter-level channel are utilized in a hierarchical progressive manner, for example, the output of the sentence-level channel is used as the input of the chapter-level channel, and so on. Therefore, the optimal effect can be achieved by a hierarchical progressive mode and simultaneously utilizing a sentence level channel, a chapter level channel and a vocabulary knowledge channel.
In addition, for the text processing model shown in fig. 7, the inventor performed comparative experiments (based on the international published event recognition data set) on the number of inference rounds selected in the hierarchical multi-channel memory network and the adaptive inference layer, and the experimental results are as follows:
[Table: comparative results for different numbers of inference rounds]
therefore, in a certain range, the performance of the text processing model can be improved by increasing the number of inference rounds; if the number of the inference rounds is increased continuously, the performance of the model is reduced, because the phenomenon of overfitting is caused by too many inference rounds. Compared with a fixed inference round number, the self-adaptive multi-round inference mechanism constructed in the embodiment of the application can automatically determine a proper inference round number according to the complexity of input contents, and can improve the performance of a model to a certain extent.
Continuing with the exemplary structure of the artificial intelligence based text processing apparatus 455 provided by the embodiments of the present application as implemented as software modules, in some embodiments, as shown in fig. 3, the software modules stored in the artificial intelligence based text processing apparatus 455 of the memory 450 may include: a first extraction module 4551, configured to perform feature extraction processing on multiple words belonging to the same sentence in a text to obtain feature representations of the multiple words, where the feature representations are used as sentence-level information of the sentence; a second extraction module 4552, configured to perform feature extraction processing on multiple sentences in the text to obtain feature representations of the multiple sentences, where the feature representations are used as text-level information; a third extraction module 4553, configured to obtain setting feature representations of multiple words belonging to the same sentence in the text from the knowledge base, so as to serve as setting information of the sentence; and the prediction module 4554 is configured to update, for each word in the text, the feature representation of the word according to the sentence-level information of the sentence in which the word is located, the text-level information, and the setting information of the sentence in which the word is located, and perform type prediction processing according to the updated feature representation of the word to obtain a predicted type of the word.
In some embodiments, the prediction module 4554 is further configured to: performing first-level association operation on the feature representation of the words and sentence-level information of sentences in which the words are located to obtain a first intermediate vector; performing second-level association operation on the first intermediate vector and the text-level information to obtain a second intermediate vector; splicing the first intermediate vector and the second intermediate vector to obtain a spliced vector; executing third-level correlation operation on the splicing vector and the set information of the sentence where the word is located to obtain a third intermediate vector; and updating the feature representation of the words according to the first intermediate vector, the second intermediate vector and the third intermediate vector.
In some embodiments, the first hierarchy includes at least one association operation; the prediction module 4554 is further configured to: in each association operation of the first level, perform attention coding on the query vector and sentence-level information of the sentence in which the word is located to obtain an attention result, perform splicing processing on the attention result and the query vector, and perform weighting processing and activation processing on the result obtained by the splicing processing to obtain a new query vector so as to execute the next association operation; when the number of executions of the association operation reaches a time threshold, determine the query vector obtained by the last association operation as a first intermediate vector; where the initial query vector is consistent with the feature representation of the word.
In some embodiments, the prediction module 4554 is further configured to: determining similarity between the query vector and each feature representation in the sentence-level information; carrying out normalization processing on each obtained similarity; and taking the similarity after the normalization processing as the weight of the corresponding feature representation in the sentence-level information, and carrying out weighted summation on a plurality of feature representations in the sentence-level information to obtain an attention result.
In some embodiments, the prediction module 4554 is further configured to: according to the network parameters corresponding to the first level, carrying out weighting processing on the characteristic representation of the sentence where the word is located; mapping the weighted feature representation to obtain probability distribution; each numerical value in the probability distribution corresponds to the selection probability of a set frequency threshold; and selecting a time threshold according to the probability distribution so as to apply the selected time threshold to the correlation operation of the first level.
In some embodiments, artificial intelligence based text processing device 455 further comprises: the system comprises a sample acquisition module, a data processing module and a data processing module, wherein the sample acquisition module is used for acquiring texts comprising a plurality of sample sentences and setting types of words in each sample sentence; the reward determining module is used for determining a reward value according to the set type and the predicted prediction type of the words in the sample sentences aiming at each sample sentence, and fusing the reward value and the times threshold value selected from the three levels to obtain a target value; the parameter updating module is used for summing the target values corresponding to the sample sentences and updating the network parameters corresponding to the three levels according to the result obtained by summing; wherein the three levels include a first level, a second level, and a third level.
In some embodiments, the reward determination module is further to: determining the accuracy rate and the recall rate of type prediction processing on the words in the sample sentence according to the set types of the words in the sample sentence and the predicted prediction types; and carrying out harmonic average processing on the accuracy rate and the recall rate to obtain the reward value corresponding to the sample sentence.
In some embodiments, the reward determination module is further to: traversing the three levels, performing product processing on the reward value and a selected time threshold value in the traversed levels, and performing logarithm processing on a result obtained by the product processing to obtain sub-target values corresponding to the traversed levels; and summing the sub-target values corresponding to the three levels to obtain the target value.
In some embodiments, the prediction module 4554 is further configured to: splicing the first intermediate vector, the second intermediate vector and the third intermediate vector, and performing weighting processing and activation processing on a result obtained by splicing; and splicing the result obtained by the activation processing and the characteristic representation of the word to update the characteristic representation of the word.
In some embodiments, the first extraction module 4551 is further configured to: respectively carrying out word embedding coding on a plurality of words belonging to the same sentence in the text to obtain a word vector of each word; according to the sequence from the beginning to the end of the sentence in the sentence, carrying out memory coding on the word vectors of a plurality of words to obtain a first characteristic representation of each word; according to the sequence from the tail of a sentence to the head of the sentence in the sentence, carrying out memory coding on the word vectors of a plurality of words to obtain a second characteristic representation of each word; and splicing the first characteristic representation and the second characteristic representation of the word into the characteristic representation of the word.
In some embodiments, the first extraction module 4551 is further configured to: traversing a plurality of words in the sentence according to the sequence from the beginning to the end of the sentence in the sentence; performing full-connection processing on the word vector of the traversed word and the first feature representation of the previous traversed word for multiple times to respectively obtain a forgetting gate result, an input gate result, an output gate result and a candidate memory unit; performing the integration processing on the memory unit of the previous traversed word and the forgetting gate result, performing the integration processing on the candidate memory unit and the input gate result, and summing the results obtained by the two integration processing to obtain the memory unit of the traversed word; and activating the output gate result, and performing integration processing on the activated output gate result and the memory unit of the traversed word to obtain a first feature representation of the traversed word.
In some embodiments, the second extraction module 4552 is further configured to: and for each sentence in the text, splicing the first characteristic representation of the last word in the sentence and the second characteristic representation of the first word in the sentence into the characteristic representation of the sentence.
In some embodiments, the prediction module 4554 is further configured to: according to the sequence from beginning to end of the sentence in the sentence, carrying out memory coding on the updated feature representation of the words to obtain the hidden layer representation of the words; carrying out weighting processing on the hidden layer representation of the words to obtain a prediction vector of the words; weighting the prediction vectors of the words, and normalizing each numerical value in the weighted prediction vectors to obtain a normalized numerical value; each normalized value corresponds to a set type; and determining the set type corresponding to the maximum normalized numerical value as the prediction type of the word.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the artificial intelligence based text processing method according to the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, an artificial intelligence based text processing method as shown in fig. 5A, 5B, 5D, and 5E. Note that the computer includes various computing devices including a terminal device and a server.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the following technical effects can be achieved through the embodiments of the present application:
1) sentence level information, text level information and setting information are combined in a hierarchical progressive mode, namely the feature representation of the words is updated according to the multi-source information, so that the updated feature representation can accurately represent the semantics of the words, the accuracy of the finally obtained prediction type is improved, accurate and effective question and answer services, intelligent recommendation services and other services are conveniently provided, and the intelligent degree of the services is enhanced.
2) According to the complexity of the sentence where the word is located, the threshold of the number of times of executing the association operation in each level is self-adaptively adjusted, and the problem that the text processing effect is poor due to too few or too many times of executing the association operation is effectively avoided.
3) During feature extraction, the word vectors are further encoded according to two directions, so that the obtained feature representation of the words can reflect the real semantics of the words, and the accuracy of the obtained sentence-level information and text-level information is improved.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A text processing method based on artificial intelligence is characterized by comprising the following steps:
carrying out feature extraction processing on a plurality of words belonging to the same sentence in the text to obtain feature representations of the words to be used as sentence-level information of the sentence;
carrying out feature extraction processing on a plurality of sentences in the text to obtain feature representations of the sentences as text-level information;
acquiring setting characteristic representations of a plurality of words belonging to the same sentence in the text from a knowledge base to serve as setting information of the sentence;
for each word in the text, updating the feature representation of the word according to the sentence level information of the sentence in which the word is positioned, the text level information and the setting information of the sentence in which the word is positioned, and
and performing type prediction processing according to the updated feature representation of the words to obtain the prediction types of the words.
2. The method of claim 1, wherein the updating the feature representation of the word according to the sentence-level information of the sentence in which the word is located, the text-level information, and the setting information of the sentence in which the word is located comprises:
performing first-level association operation on the feature representation of the words and sentence-level information of sentences in which the words are located to obtain first intermediate vectors;
performing second-level association operation on the first intermediate vector and the text-level information to obtain a second intermediate vector;
splicing the first intermediate vector and the second intermediate vector to obtain a spliced vector;
performing third-level association operation on the splicing vector and the setting information of the sentence where the word is located to obtain a third intermediate vector;
updating a feature representation of the word according to the first intermediate vector, the second intermediate vector, and the third intermediate vector.
3. The text processing method according to claim 2,
the first hierarchy comprises at least one association operation;
the performing a first-level association operation on the feature representation of the word and sentence-level information of a sentence in which the word is located to obtain a first intermediate vector includes:
in each association operation of the first level, performing attention coding on the query vector and sentence-level information of a sentence where the words are located to obtain an attention result, splicing the attention result and the query vector, and performing weighting processing and activation processing on the result obtained by splicing to obtain a new query vector so as to execute the next association operation;
when the execution times of the association operation reach a time threshold value, determining a query vector obtained by the last association operation as a first intermediate vector;
wherein the initial query vector is consistent with the feature representation of the term.
4. The method of claim 3, wherein the attention coding the query vector and the sentence-level information of the sentence in which the word is located to obtain the attention result comprises:
determining a similarity between the query vector and each feature representation in the sentence-level information;
normalizing each obtained similarity;
taking the similarity after normalization processing as the weight of the corresponding feature representation in the sentence-level information, and carrying out weighted summation on the plurality of feature representations in the sentence-level information to obtain an attention result.
5. The text processing method according to claim 3, further comprising:
according to the network parameters corresponding to the first level, carrying out weighting processing on the characteristic representation of the sentence where the word is located;
mapping the weighted feature representation to obtain probability distribution; each numerical value in the probability distribution corresponds to the selection probability of a set frequency threshold;
and selecting a time threshold according to the probability distribution so as to apply the selected time threshold to the correlation operation of the first level.
6. The text processing method according to claim 5, further comprising:
acquiring a text comprising a plurality of sample sentences and a set type of words in each sample sentence;
for each sample sentence, determining a reward value according to the set type of words in the sample sentence and the predicted prediction type, and
fusing the reward value and a frequency threshold value selected from three levels to obtain a target value;
summing the target values corresponding to the sample sentences, and updating the network parameters corresponding to the three levels according to the result obtained by the summation;
wherein the three levels include the first level, the second level, and the third level.
7. The text processing method according to claim 6, wherein
the determining a reward value according to the set types and the predicted types of the words in the sample sentence comprises:
determining the precision and the recall of the type prediction processing on the words in the sample sentence according to the set types and the predicted types of those words;
taking the harmonic mean of the precision and the recall to obtain the reward value corresponding to the sample sentence;
the fusing the reward value with the count thresholds selected at the three levels to obtain a target value comprises:
traversing the three levels, performing product processing on the reward value and the count threshold selected at the traversed level, and performing logarithm processing on the product to obtain a sub-target value corresponding to the traversed level;
and summing the sub-target values corresponding to the three levels to obtain the target value.
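The reward in claim 7 is thus the F1 score of the per-word type predictions. A sketch, with the added assumption that words whose set type is a null label "O" count as unlabeled for both precision and recall:

def f1_reward(set_types, predicted_types, null_type="O"):
    correct = sum(1 for g, p in zip(set_types, predicted_types)
                  if g == p and g != null_type)
    n_pred = sum(1 for p in predicted_types if p != null_type)
    n_gold = sum(1 for g in set_types if g != null_type)
    if correct == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)  # harmonic mean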
8. The text processing method according to claim 2, wherein the updating the feature representation of the word according to the first intermediate vector, the second intermediate vector, and the third intermediate vector comprises:
splicing the first intermediate vector, the second intermediate vector, and the third intermediate vector, and performing weighting processing and activation processing on the spliced result;
and splicing the result of the activation processing with the feature representation of the word to update the feature representation of the word.
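A sketch of the update in claim 8; tanh is an assumed activation, and W and b are illustrative parameters:

import numpy as np

def update_word_representation(word_vec, v1, v2, v3, W, b):
    spliced = np.concatenate([v1, v2, v3])        # splice the three intermediate vectors
    activated = np.tanh(W @ spliced + b)          # weighting + activation
    return np.concatenate([activated, word_vec])  # splice with the original representation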
9. The text processing method according to any one of claims 1 to 8, wherein the performing feature extraction processing on a plurality of words belonging to the same sentence in the text to obtain feature representations of the plurality of words comprises:
performing word embedding coding on each of the plurality of words belonging to the same sentence in the text to obtain a word vector of each word;
performing memory coding on the word vectors of the plurality of words in order from the beginning of the sentence to its end to obtain a first feature representation of each word;
performing memory coding on the word vectors of the plurality of words in order from the end of the sentence to its beginning to obtain a second feature representation of each word;
and splicing the first feature representation and the second feature representation of each word into the feature representation of that word.
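The two memory-coding passes of claim 9 read like a bidirectional recurrent encoder. A sketch that treats the per-direction step functions as callables (one consistent choice is the LSTM-style step sketched after claim 10); d_hidden is an assumed state size:

import numpy as np

def bidirectional_features(word_vectors, forward_step, backward_step, d_hidden):
    n = len(word_vectors)
    h, c = np.zeros(d_hidden), np.zeros(d_hidden)
    first = []                                    # beginning-to-end pass
    for x in word_vectors:
        h, c = forward_step(x, h, c)
        first.append(h)
    h, c = np.zeros(d_hidden), np.zeros(d_hidden)
    second = [None] * n                           # end-to-beginning pass
    for i in range(n - 1, -1, -1):
        h, c = backward_step(word_vectors[i], h, c)
        second[i] = h
    spliced = [np.concatenate(pair) for pair in zip(first, second)]
    return spliced, first, second  # per-word features plus the per-direction lists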
10. The text processing method according to claim 9, wherein the performing memory coding on the word vectors of the plurality of words in order from the beginning of the sentence to its end to obtain a first feature representation of each word comprises:
traversing the plurality of words in the sentence in order from the beginning of the sentence to its end;
performing full-connection processing, multiple times, on the word vector of the traversed word and the first feature representation of the previously traversed word to obtain a forget gate result, an input gate result, an output gate result, and a candidate memory unit, respectively;
performing product processing on the memory unit of the previously traversed word and the forget gate result, performing product processing on the candidate memory unit and the input gate result, and summing the two products to obtain the memory unit of the traversed word;
and activating the output gate result, and performing product processing on the activated output gate result and the memory unit of the traversed word to obtain the first feature representation of the traversed word.
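Claim 10 spells out an LSTM-style memory cell. A sketch follows; the sigmoid/tanh choices, and the tanh applied to the memory unit before the final product, follow the standard LSTM formulation, which the claim leaves implicit:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    # p holds the full-connection parameters, e.g. p["Wf"]: (h, d + h), p["bf"]: (h,)
    z = np.concatenate([x, h_prev])               # word vector + previous first feature
    f = sigmoid(p["Wf"] @ z + p["bf"])            # forget gate result
    i = sigmoid(p["Wi"] @ z + p["bi"])            # input gate result
    o = sigmoid(p["Wo"] @ z + p["bo"])            # output gate result
    c_cand = np.tanh(p["Wc"] @ z + p["bc"])       # candidate memory unit
    c = f * c_prev + i * c_cand                   # sum of the two product results
    h = o * np.tanh(c)                            # activated output gate x memory unit
    return h, c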
11. The text processing method according to claim 9, wherein the performing feature extraction processing on a plurality of sentences in the text to obtain feature representations of the plurality of sentences comprises:
for each sentence in the text, splicing the first feature representation of the last word in the sentence and the second feature representation of the first word in the sentence into the feature representation of the sentence.
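Given the per-direction feature lists returned by the claim 9 sketch, the sentence representation of claim 11 is a single splice:

import numpy as np

def sentence_representation(first_feats, second_feats):
    # first_feats: beginning-to-end features; second_feats: end-to-beginning features
    return np.concatenate([first_feats[-1], second_feats[0]])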
12. The text processing method according to any one of claims 1 to 8, wherein the performing type prediction processing according to the updated feature representation of the word to obtain a predicted type of the word comprises:
performing memory coding on the updated feature representations of the words in order from the beginning of the sentence to its end to obtain a hidden-layer representation of each word;
weighting the hidden-layer representation of the word to obtain a prediction vector of the word;
weighting the prediction vector of the word, and normalizing each value in the weighted prediction vector to obtain normalized values, wherein each normalized value corresponds to one set type;
and determining the set type corresponding to the largest normalized value as the predicted type of the word.
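A sketch of the prediction head in claim 12, assuming the two weightings are linear maps and the normalization is a softmax; hidden stands for the word's hidden-layer representation from the further memory-coding pass:

import numpy as np

def predict_type(hidden, W_pred, W_out, set_types):
    pred_vec = W_pred @ hidden                    # prediction vector of the word
    logits = W_out @ pred_vec                     # second weighting
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()                   # normalized values, one per set type
    return set_types[int(np.argmax(probs))]      # type with the largest normalized value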
13. An artificial intelligence based text processing apparatus, comprising:
a first extraction module, configured to perform feature extraction processing on a plurality of words belonging to the same sentence in a text to obtain feature representations of the plurality of words as sentence-level information of the sentence;
a second extraction module, configured to perform feature extraction processing on a plurality of sentences in the text to obtain feature representations of the plurality of sentences as text-level information;
a third extraction module, configured to acquire, from a knowledge base, set feature representations of the plurality of words belonging to the same sentence in the text as setting information of the sentence;
and a prediction module, configured to, for each word in the text, update the feature representation of the word according to the sentence-level information of the sentence in which the word is located, the text-level information, and the setting information of the sentence in which the word is located, and perform type prediction processing according to the updated feature representation of the word to obtain a predicted type of the word.
14. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence based text processing method of any one of claims 1 to 12 when executing the executable instructions stored in the memory.
15. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial intelligence based text processing method of any one of claims 1 to 12.
CN202010753509.9A 2020-07-30 2020-07-30 Text processing method and device based on artificial intelligence and electronic equipment Pending CN111858898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010753509.9A CN111858898A (en) 2020-07-30 2020-07-30 Text processing method and device based on artificial intelligence and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010753509.9A CN111858898A (en) 2020-07-30 2020-07-30 Text processing method and device based on artificial intelligence and electronic equipment

Publications (1)

Publication Number Publication Date
CN111858898A true CN111858898A (en) 2020-10-30

Family

ID=72945280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753509.9A Pending CN111858898A (en) 2020-07-30 2020-07-30 Text processing method and device based on artificial intelligence and electronic equipment

Country Status (1)

Country Link
CN (1) CN111858898A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633604A (en) * 2021-01-04 2021-04-09 重庆邮电大学 Short-term power consumption prediction method based on I-LSTM
CN112633604B (en) * 2021-01-04 2022-04-22 重庆邮电大学 Short-term power consumption prediction method based on I-LSTM
CN113806549A (en) * 2021-02-09 2021-12-17 京东科技控股股份有限公司 Method and device for constructing personnel relationship map and electronic equipment
WO2022171093A1 (en) * 2021-02-09 2022-08-18 京东科技控股股份有限公司 Method and apparatus for constructing personnel relational graph, and electronic device
CN113077526A (en) * 2021-03-30 2021-07-06 太原理工大学 Knowledge graph embedded composite neighbor link prediction method
CN113946681A (en) * 2021-12-20 2022-01-18 军工保密资格审查认证中心 Text data event extraction method and device, electronic equipment and readable medium
CN113946681B (en) * 2021-12-20 2022-03-29 军工保密资格审查认证中心 Text data event extraction method and device, electronic equipment and readable medium
CN114818659A (en) * 2022-06-29 2022-07-29 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium
CN114818659B (en) * 2022-06-29 2022-09-23 北京澜舟科技有限公司 Text emotion source analysis method and system and storage medium

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
US20230100376A1 (en) Text sentence processing method and apparatus, computer device, and storage medium
CN112487182B (en) Training method of text processing model, text processing method and device
AU2018214675B2 (en) Systems and methods for automatic semantic token tagging
CN110427461B (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN110825849A (en) Text information emotion analysis method, device, medium and electronic equipment
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN114281957A (en) Natural language data query method and device, electronic equipment and storage medium
CN111597341A (en) Document level relation extraction method, device, equipment and storage medium
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN114398899A (en) Training method and device for pre-training language model, computer equipment and medium
CN112395887A (en) Dialogue response method, dialogue response device, computer equipment and storage medium
CN114444476B (en) Information processing method, apparatus, and computer-readable storage medium
CN115129862A (en) Statement entity processing method and device, computer equipment and storage medium
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN117216544A (en) Model training method, natural language processing method, device and storage medium
CN117033649A (en) Training method and device for text processing model, electronic equipment and storage medium
CN112132269B (en) Model processing method, device, equipment and storage medium
CN114911940A (en) Text emotion recognition method and device, electronic equipment and storage medium
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
CN113657092A (en) Method, apparatus, device and medium for identifying label

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination