CN113761829A - Natural language processing method, device, equipment and computer readable storage medium


Info

Publication number
CN113761829A
Authority
CN
China
Prior art keywords
hyperbolic
word vector
attention
vector
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110457927.8A
Other languages
Chinese (zh)
Inventor
刘知远
陈泽
韩旭
林衍凯
李鹏
孙茂松
周杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Tencent Technology Shenzhen Co Ltd
Priority to CN202110457927.8A
Publication of CN113761829A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

The application provides a natural language processing method, apparatus, device, and computer-readable storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: obtaining a hyperbolic word vector sequence according to a preset word vector table; performing attention coding on the current hyperbolic word vector to obtain the attention coding feature corresponding to the current hyperbolic word vector; performing linear transformation on the attention coding feature through a preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature, where the preset Lorentz transformation function is obtained by training a linear transformation matrix model that satisfies a preset constraint condition, and the preset constraint condition constrains the output feature vector, obtained after the linear transformation matrix model processes an input feature vector in the hyperbolic space, to remain in the hyperbolic space; and obtaining a target processing result corresponding to the sentence to be processed based on the linear coding feature. By means of the method and the device, the stability and efficiency of natural language processing using a hyperbolic neural network can be improved.

Description

Natural language processing method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a natural language processing method, apparatus, device, and computer-readable storage medium.
Background
In recent years, a growing body of work has explored how to learn representations of complex data structures with non-Euclidean geometric features in hyperbolic space. Current artificial intelligence research has shown that hyperbolic geometry can provide more flexibility than Euclidean geometry when modeling complex data structures.
At present, when a hyperbolic neural network performs data processing, for example in a natural language processing scenario, it operates by mapping points back and forth between the hyperbolic space and a Euclidean space using exponential and logarithmic mappings: a point in the hyperbolic space is mapped to the tangent space at some point using the logarithmic mapping, the necessary Euclidean neural operations (such as matrix-vector multiplication) are performed in the tangent space, and the result is mapped back to the corresponding point in the hyperbolic space by the exponential mapping, so that the operations of the hyperbolic neural network are defined in a hybrid manner. However, the logarithmic and exponential mappings require a series of hyperbolic and inverse hyperbolic functions. These functions are rather complex in composition and usually have an unbounded range, which seriously weakens the stability and convergence of the hyperbolic neural network and thus affects the stability and efficiency of natural language processing with the hyperbolic neural network.
Disclosure of Invention
Embodiments of the present application provide a natural language processing method, apparatus, device, and computer-readable storage medium, which can improve the stability and efficiency of natural language processing using a hyperbolic neural network.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a natural language processing method, which comprises the following steps:
obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the to-be-processed statement by processing each hyperbolic word vector.
An embodiment of the present application provides a natural language processing apparatus, including:
A hyperbolic vector extraction module, an attention module, a hyperbolic transformation module and a decoding prediction module, wherein,
the hyperbolic vector extraction module is used for obtaining a hyperbolic word vector sequence corresponding to a statement to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module is used for carrying out attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module is used for performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and the decoding prediction module is used for decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the statement to be processed by processing each hyperbolic word vector.
In the above apparatus, the attention module is further configured to perform key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set, and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector; acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set; obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set; obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In the above apparatus, the attention module is further configured to calculate a lorentz squared distance between the current query word vector and each key vector; and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
In the above apparatus, the attention module is further configured to perform weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space; and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
In the above apparatus, the hyperbolic vector extraction module is further configured to, after obtaining a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space, transmit a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through Lorentz parallel transport, so as to obtain a plurality of position vector information corresponding to each hyperbolic word vector; updating each hyperbolic word vector using the plurality of location vector information.
In the above apparatus, the natural language processing apparatus further includes a preset residual network, where the preset residual network is configured to perform linear transformation on the attention coding feature through a preset lorentz transformation function to obtain a linear coding feature corresponding to the attention coding feature, and then perform linear correction on the linear coding feature to obtain a coding residual correction result corresponding to the linear coding feature; combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature; updating the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding features to be in the hyperbolic space.
In the above apparatus, the preset residual error network is configured to perform attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector, and then perform linear correction on the attention coding feature to obtain an attention residual error correction result corresponding to the attention coding feature; combining the attention residual error correction result with the current hyperbolic word vector to obtain intermediate attention coding features; updating the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention-coding feature to be in the hyperbolic space.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the natural language processing method provided by the embodiment of the application when the processor executes the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute, so as to implement the natural language processing method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects: the hyperbolic space is represented by a Lorentz model, and a linear transformation matrix model satisfying the condition that the input and output feature vectors both conform to the Lorentz model is trained to obtain the preset Lorentz transformation function; therefore, when the preset Lorentz transformation function performs linear transformation on the input attention coding feature, the output linear coding feature is guaranteed to remain in the hyperbolic space, so that the model calculation of the hyperbolic neural network can be completed entirely in the hyperbolic space, improving the stability and operational efficiency of the hyperbolic neural network model and thus the stability and efficiency of natural language processing using the hyperbolic neural network.
Drawings
FIG. 1 is an alternative architectural diagram of a natural language processing system architecture provided by embodiments of the present application;
FIG. 2 is an alternative structural diagram of a natural language processing apparatus according to an embodiment of the present application;
FIG. 3 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 4 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 5 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 6 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 7(a) is a schematic diagram of experimental results of convergence comparison experiments performed on the hyperbolic space model provided in the embodiment of the present application and other network models based on the WN18RR data set;
FIG. 7(b) is a schematic diagram of experimental results of convergence comparison experiments of hyperbolic space models and other network models provided by embodiments of the present application based on FB15k-237 data sets;
FIG. 7(c) is a schematic diagram of experimental results of convergence comparison experiments performed on the hyperbolic space model provided by the embodiment of the present application and other network models based on IWSLT14 data set;
FIG. 7(d) is a schematic diagram of an experimental result of a convergence comparison experiment performed on the hyperbolic space model and other network models provided in the embodiment of the present application based on an Open Entity dataset.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
2) Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field therefore involves natural language, i.e., the language people use every day, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
3) Knowledge graph completion: a knowledge graph is a set of fact triples, where each triple (h, r, t) states that a relation r holds between a head entity h and a tail entity t. Because knowledge graphs are generally incomplete, predicting missing triples is an important research problem. Specifically, the purpose of the knowledge graph completion task is to solve problems of the form (h, r, ?) and (?, r, t), where ? denotes the missing element at the corresponding position in the triple.
4) Fine-grained entity classification: given a sentence containing an entity e, the purpose of entity classification is to predict the type of e from a list of candidate types based on the information provided by the sentence; this is a multi-label classification problem because multiple types can be assigned to e. For fine-grained entity classification, the type labels are further divided at a fine granularity, so that the candidate type list contains thousands of types.
5) Machine translation, also known as automatic translation, is the process of converting one natural language (the source language) into another (the target language) using a computer. It is a branch of computational linguistics, one of the ultimate goals of artificial intelligence, and has important scientific research value. With the progress of deep learning, neural machine translation based on artificial neural networks has gradually emerged. Its technical core is a deep neural network with a massive number of nodes (neurons) that can automatically learn translation knowledge from a corpus. After a sentence in one language is vectorized, it is transmitted layer by layer through the network and converted into a representation that the computer can understand, and a translation in the other language is then generated through multiple layers of complex operations, realizing a translation mode of "understanding the language and generating the translation". Neural machine translation typically employs an encoder-decoder architecture to model variable-length input sentences. The encoder realizes the "understanding" of the source-language sentence and forms a floating-point vector of a specific dimension, and the decoder then generates the target-language translation word by word from this vector. Currently, the mainstream machine translation framework in industry adopts the self-attention network (Transformer), which is applied not only to machine translation but also performs outstandingly in fields such as self-supervised learning.
6) Lorentz model $\mathbb{L}^n_K$: a geometric model in which all points of the hyperbolic space are defined on the upper sheet of a two-sheeted hyperboloid with curvature K.
7) Poincare ball model: a geometric model in which all points in the hyperbolic space are defined in a unit sphere.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing technology, including knowledge graph related representation learning technology, natural language processing field machine translation technology and entity classification technology, and the like, and is specifically explained by the following embodiments:
most of the existing hyperbolic neural networks are obtained by redefining basic algebraic operations such as vector addition in the neural network by using a rotation vector space (Gyrovector space) in a Poincare (Poincare) sphere model, constructing modules such as a feedforward neural network and polynomial logistic regression by using a sphere model frame, performing a series of subsequent processing, and adapting various Euclidean neural networks to the hyperbolic space. These hyperbolic neural networks can cover a wide range of scenarios, such as shallow neural networks or simple neural components, as well as word-embedded word vector representations, graph-embedded vector representations, knowledge graph-embedded vector representations, deep neural networks such as attention modules and variational autocoders, and so on. The hyperbolic neural network can achieve performance equivalent to or even better than that of a high-dimensional Euclidean neural network in a low-dimensional hyperbolic characteristic space.
However, in current hyperbolic neural networks, not all arithmetic operations are completed in the hyperbolic space. In practical applications, some arithmetic operations in Euclidean neural networks, such as matrix-vector multiplication, are difficult to map directly onto operations in hyperbolic geometry. Since the tangent space at each point of the hyperbolic space is a Euclidean subspace, all Euclidean neural components can operate in these tangent spaces. Therefore, the operations of the hyperbolic neural network are mainly realized by mapping points back and forth between the hyperbolic space and the Euclidean space using exponential and logarithmic mappings. Specifically, a point in the hyperbolic space is mapped to the tangent space at a certain point using the logarithmic mapping, the necessary Euclidean neural operations (such as matrix-vector multiplication) are performed in the tangent space, and the result is then mapped back to the corresponding point in the hyperbolic space by the exponential mapping, so that the operations of the hyperbolic neural network are realized in a hybrid manner. However, the logarithmic and exponential mappings have to be implemented by a series of hyperbolic and inverse hyperbolic function calculations; these functions are rather complex in composition and usually have an unbounded range, which seriously impairs the stability and convergence of the hyperbolic neural network.
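For illustration only, the following sketch shows the conventional hybrid pipeline described above on the Lorentz model with curvature -1 (this example is not part of the patent text; the helper names and the choice of curvature are assumptions). The arccosh, cosh, and sinh calls are exactly the functions whose unbounded ranges cause the numerical issues mentioned here.

```python
import numpy as np

def lorentz_inner(a, b):
    # Minkowski (Lorentzian) inner product: -a0*b0 + <a_1:, b_1:>
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def logmap0(x):
    # Log map at the origin o = (1, 0, ..., 0) of the curvature -1 Lorentz model.
    # Returns a tangent vector whose 0-th component is 0.
    spatial = x[1:]
    norm = np.linalg.norm(spatial)
    if norm == 0:
        return np.zeros_like(x)
    return np.concatenate(([0.0], np.arccosh(x[0]) * spatial / norm))

def expmap0(v):
    # Exp map at the origin; v is a tangent vector with v[0] == 0.
    norm = np.linalg.norm(v[1:])
    if norm == 0:
        out = np.zeros_like(v)
        out[0] = 1.0
        return out
    return np.concatenate(([np.cosh(norm)], np.sinh(norm) * v[1:] / norm))

# Hybrid "tangent-space" layer: log map -> Euclidean matmul -> exp map.
rng = np.random.default_rng(0)
x_space = rng.normal(size=3)
x = np.concatenate(([np.sqrt(1.0 + x_space @ x_space)], x_space))  # point on the hyperboloid
W = rng.normal(size=(4, 4))
v = W @ logmap0(x)
y = expmap0(np.concatenate(([0.0], v[1:])))   # project back to the tangent space, then exp map
print(lorentz_inner(x, x), lorentz_inner(y, y))  # both approximately -1
```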
In order to solve the problem of the complex exponential and logarithmic mapping transformations, the applicant, drawing on the special theory of relativity, devised a method of defining the neural network operations directly in the hyperbolic space: special relativity uses Minkowski space (a Lorentz model) to describe space-time and defines the linear transformations on space-time as Lorentz transformations. The applicant uses a Lorentz model as the feature space of the hyperbolic neural network, builds the hyperbolic neural network through Lorentz transformations, and constructs neural network components that operate entirely in the hyperbolic space, so as to perform natural language processing on feature vectors in the hyperbolic space.
Embodiments of the present application provide a natural language processing method, apparatus, device, and computer-readable storage medium, which can improve the stability and convergence of a hyperbolic neural network. An exemplary application of the electronic device provided in an embodiment of the present application is described below, taking the case where the device is implemented as a server.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of a natural language processing system 100 provided in this embodiment of the present application, in order to support a natural language processing application, such as a machine translation application, a terminal 400 is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of both.
The terminal 400 is configured to run a client 410 of the machine translation application, receive a sentence to be translated, which is input by a user through voice or manually, as a sentence to be processed through the client 410, and send the sentence to be processed to the server 200 through the network 300.
The server 200 is configured to obtain a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence accords with a Lorentz model; performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain attention coding features corresponding to the current hyperbolic word vector; performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model; and decoding and predicting the linear coding characteristics to obtain a current processing result of the current hyperbolic word vector, and processing each hyperbolic word vector to obtain a target translation statement corresponding to the statement to be processed as a target processing result. The server 200 may further send the target translation sentence to the terminal 400 through the network 300, and the terminal 400 may display the target translation sentence to the user through the client 410 in a manner of voice or interface display.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating to other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB), etc.;
a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the natural language processing device provided in the embodiments of the present application may be implemented in software, and fig. 2 shows a natural language processing device 255 stored in a memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: hyperbolic vector extraction module 2551, attention module 2552, hyperbolic transformation module 2553 and decoding prediction module 2554, which are logical and therefore can be arbitrarily combined or further split depending on the functions implemented. In some embodiments, the natural language processing device may be implemented as a neural network model, such as a natural language processing model; the hyperbolic vector extraction module, the attention module, the hyperbolic transformation module and the decoding prediction module can be implemented as a hyperbolic vector extraction layer, an attention layer, a hyperbolic transformation layer and a decoding prediction layer in a natural language processing model.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the natural language processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The natural language processing method provided by the embodiment of the present application will be described by taking an example of implementing the natural language processing device 255 on the server as a natural language processing model in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is an alternative flowchart of a natural language processing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
S101, obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a lorentz model.
In the embodiment of the present application, the sentence to be processed may be a natural language sentence with semantic information, which includes at least one word. The preset word vector table contains the corresponding relation between a plurality of embedded word vectors and hyperbolic space word vectors, and is used for mapping the embedded word vectors in the to-be-processed sentences to hyperbolic space. The natural language processing model can extract an embedded word vector sequence (word embedding) from the sentence to be processed through the hyperbolic vector layer, and look up in a preset word vector table according to each embedded word vector in the embedded word vector sequence to obtain a hyperbolic word vector corresponding to each embedded word vector, and further obtain the hyperbolic word vector sequence according to the hyperbolic word vector corresponding to each embedded word vector.
Here, the hyperbolic space has strong expressive power for structured data (for example, data that can be represented as trees or networks). In the embodiment of the present application, the structural features in the data can be expressed in the hyperbolic space through the Lorentz model. For example, a set of social network data has an inherent hierarchical network structure and can therefore be expressed well by the Lorentz model. Likewise, in the text domain, each sentence contains structured features such as a syntax tree, so representing the word vectors of the text with a Lorentz model allows the model to better learn syntactic knowledge.
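As a minimal illustration (not from the patent; the lifting scheme, names, and curvature K = -1 are assumptions), a Euclidean row from a word vector table can be placed on the Lorentz hyperboloid by computing the time-like 0-th component from the spatial components, so that every hyperbolic word vector in the sequence satisfies the Lorentz model constraint:

```python
import numpy as np

K = -1.0  # preset (negative) curvature of the hyperbolic space

def to_lorentz(spatial, K=K):
    # Lift a Euclidean vector onto the hyperboloid: choose x0 so that
    # -x0^2 + ||spatial||^2 = 1/K  (with K < 0).
    x0 = np.sqrt(np.dot(spatial, spatial) - 1.0 / K)
    return np.concatenate(([x0], spatial))

# Toy "preset word vector table": one Euclidean row per vocabulary id.
vocab = {"hyperbolic": 0, "space": 1, "model": 2}
table = np.random.default_rng(0).normal(scale=0.1, size=(len(vocab), 4))

sentence = ["hyperbolic", "space", "model"]
hyperbolic_word_vectors = [to_lorentz(table[vocab[w]]) for w in sentence]

for x in hyperbolic_word_vectors:
    # Each vector conforms to the Lorentz model: <x, x>_L == 1/K
    print(-x[0] ** 2 + np.dot(x[1:], x[1:]))  # approximately -1.0
```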
In some embodiments, for a training scenario or an application scenario of the natural language processing model, the preset word vector table may be subjected to gradient update based on a training result and an error between an application result and an expected result each time, so as to maintain the content accuracy of the preset word vector table, and further improve the accuracy of the model processing result.
S102, performing attention coding on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector.
In this embodiment of the application, the natural language processing model may perform linear transformation on each hyperbolic word vector in the hyperbolic word vector sequence to obtain a key vector, a value vector, and an inquiry word vector corresponding to each hyperbolic word vector, and further perform normalization processing according to the key vector and the inquiry word vector of each hyperbolic word vector when encoding the current hyperbolic word vector based on an attention mechanism, calculate an attention weight of each hyperbolic word vector relative to the current hyperbolic word vector, and further perform weighted summation on the value vector corresponding to each hyperbolic word vector according to the attention weight corresponding to each hyperbolic word vector to obtain an attention encoding feature corresponding to the current hyperbolic word vector. Therefore, the attention coding features corresponding to the current hyperbolic word vectors contain the correlation information of other hyperbolic word vectors and the current hyperbolic word vectors, so that the attention coding features can capture the relationship between the vectors, and the feature expression of the hyperbolic word vectors is enriched.
In some embodiments, the natural language processing model may use the calculation method of an existing hyperbolic neural network model: treat the value vector x_i corresponding to each hyperbolic word vector as a point in the hyperbolic space, use the logarithmic mapping to map the process of weighting x_i by the attention weight v_i and summing into the tangent space at that point, complete the weighted-summation calculation in the tangent space using Euclidean-space operations, and then map the calculation result in the tangent space back to the hyperbolic space through the exponential mapping to obtain the attention coding feature corresponding to the current hyperbolic word vector.
Here, it should be noted that the process of solving the centroid position of the point set in the hyperbolic space is a process of performing weighted summation on the lorentz squared distance between the point in the point set and the candidate centroid point according to different weight values, and taking the candidate centroid capable of minimizing the weighted summation result as the centroid of the point set. Therefore, the applicant finds that the weighted summation process for performing centroid solution in the hyperbolic space is similar to the process for performing weighted summation on each value vector according to the attention weight in the attention mechanism, so that the process for performing weighted summation on each value vector according to the attention weight in the attention mechanism of natural language processing can be realized by using the centroid solution process in the hyperbolic space to obtain the attention coding feature, and the details will be described in S1021-S1024.
S103, performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; and the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model.
In the embodiment of the application, the natural language processing model performs linear transformation on the attention coding feature through the preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature. In some embodiments, the hyperbolic transformation module in FIG. 2 may be implemented as a linear transformation layer in the natural language processing model. The linear transformation layer may also be called a fully connected layer: each neuron is connected to all neurons of the previous layer, realizing a linear combination or linear transformation of the previous layer. The core operation of the linear transformation layer is the matrix-vector product y = Mx; in essence, the input feature vector is weighted and summed, linearly transforming it from one feature space to another to obtain the output feature vector.
In the embodiment of the application, the natural language processing model can pre-train the linear transformation matrix meeting the preset constraint condition, so that after the input feature vector in the hyperbolic space is processed through the linear transformation matrix meeting the preset constraint condition, the obtained output feature vector is still located in the hyperbolic space.
In the embodiment of the application, the hyperbolic space can be represented by the Lorentz model

$$\mathbb{L}^n_K = \{x \in \mathbb{R}^{n+1} : \langle x, x\rangle_{\mathcal{L}} = 1/K,\ x_0 > 0\},$$

so that an input feature vector x in the hyperbolic space satisfies $x \in \mathbb{L}^n_K$. The constraint on the linear transformation matrix M can then be expressed as

$$\forall x \in \mathbb{L}^n_K:\ Mx \in \mathbb{L}^m_K,$$

that is, the input feature vector x and the corresponding output feature vector Mx are both located in the hyperbolic space. In other words, when an input feature vector in the hyperbolic space is processed by the trained linear transformation matrix M, it does not need to be mapped into the Euclidean tangent space; the computation can be carried out directly in the hyperbolic space to obtain the processing result.
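For intuition only (not part of the patent text; curvature K = -1 and all names are assumptions), the following check shows why the constraint is needed: an unconstrained matrix generally maps a point of the Lorentz model off the hyperboloid, whereas the constrained transformation must keep the Lorentzian inner product of the output equal to 1/K.

```python
import numpy as np

def lorentz_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

rng = np.random.default_rng(0)
spatial = rng.normal(size=3)
x = np.concatenate(([np.sqrt(1.0 + spatial @ spatial)], spatial))  # <x, x>_L = -1 = 1/K

M = rng.normal(size=(4, 4))      # arbitrary, unconstrained matrix
y = M @ x
print(lorentz_inner(x, x))       # -1.0: x lies on the hyperboloid
print(lorentz_inner(y, y))       # generally != -1.0: Mx has left the hyperboloid
```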
Here, it should be noted that in special relativity the Lorentz transformation is a linear transformation defined on the Lorentz model, and therefore a linear transformation in the hyperbolic space can be realized using Lorentz transformations. A Lorentz transformation maps an event in space-time from one space-time frame to another frame moving at constant velocity relative to it. By polar decomposition, any Lorentz transformation can be decomposed into the combination of a Lorentz boost (Lorentz acceleration) and a Lorentz rotation. However, the existing hyperbolic neural network calculation method of performing the linear transformation in the tangent space at the origin of the hyperbolic space is, even under relaxed restrictions, equivalent only to a Lorentz rotation and does not take Lorentz boosts into account, so the expressive power of existing hyperbolic neural networks is limited and insufficient to express all transformations in the hyperbolic space. In order to enhance the expressive power of the model, the applicant proposes a linear transformation method in the hyperbolic space that simultaneously covers Lorentz rotations and Lorentz boosts, which will be described in detail below.
In the embodiment of the application, a Lorentz rotation describes the transformation rule for relative motion in which the spatial coordinate axes are rotated. The Lorentz rotation matrix R can be written as formula (1):

$$R = \begin{bmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & \tilde{R} \end{bmatrix} \tag{1}$$

where $\tilde{R}^\top \tilde{R} = I$ and $\det(\tilde{R}) = 1$, namely $\tilde{R} \in SO(n)$.
In the embodiment of the application, a Lorentz boost (Lorentz acceleration) describes the transformation rule for relative motion between space-time frames at constant velocity, without rotation of the spatial coordinate axes. The Lorentz boost matrix B can be written as formula (2):

$$B = \begin{bmatrix} \gamma & -\gamma \boldsymbol{\beta}^\top \\ -\gamma \boldsymbol{\beta} & I + (\gamma - 1)\dfrac{\boldsymbol{\beta}\boldsymbol{\beta}^\top}{\|\boldsymbol{\beta}\|^2} \end{bmatrix} \tag{2}$$

where $\gamma = \dfrac{1}{\sqrt{1-\|\boldsymbol{\beta}\|^2}}$, c is the speed of light in vacuum, $\boldsymbol{\beta} = \mathbf{v}/c$ is the ratio of a given velocity vector v to the speed of light, and v is an arbitrary vector of the n-dimensional real vector space (i.e., $\mathbf{v} \in \mathbb{R}^n$) satisfying $\|\mathbf{v}\| < c$.
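As an illustrative check (not from the patent; the construction below is a standard one and the specific numbers are assumptions), both kinds of matrices satisfy the defining property of a Lorentz transformation, $M^\top \eta M = \eta$ with $\eta = \mathrm{diag}(-1, 1, \ldots, 1)$, and therefore map points of the hyperboloid to points of the hyperboloid:

```python
import numpy as np

n = 3
eta = np.diag([-1.0] + [1.0] * n)          # Minkowski metric diag(-1, 1, ..., 1)

# Lorentz rotation (formula (1)): embed a special orthogonal matrix in the spatial block.
theta = 0.3
R_tilde = np.eye(n)
R_tilde[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]]
R = np.block([[np.ones((1, 1)), np.zeros((1, n))],
              [np.zeros((n, 1)), R_tilde]])

# Lorentz boost (formula (2)) for a velocity ratio beta with ||beta|| < 1.
beta = np.array([0.3, 0.1, 0.0])
gamma = 1.0 / np.sqrt(1.0 - beta @ beta)
B = np.block([[np.array([[gamma]]), -gamma * beta[None, :]],
              [-gamma * beta[:, None],
               np.eye(n) + (gamma - 1.0) * np.outer(beta, beta) / (beta @ beta)]])

for M in (R, B):
    print(np.allclose(M.T @ eta @ M, eta))  # True: the Minkowski form is preserved
```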
It should be noted that although Lorentz boosts and Lorentz rotations are linear transformations in the Lorentz model, they cannot be applied directly to the neural network. On the one hand, a Lorentz transformation can only change the coordinate frame without changing the dimensionality. On the other hand, the complex requirements of the Lorentz transformation (such as the special orthogonal matrix of the Lorentz rotation) mean that the calculation would need constrained optimization, which cannot be realized under the current gradient-descent training framework. For this purpose, the problem of training a matrix M that satisfies the preset constraint $\forall x \in \mathbb{L}^n_K:\ Mx \in \mathbb{L}^m_K$ can be converted into training a matrix $M' = [\mathbf{u}^\top; W]$, where $\mathbf{u} \in \mathbb{R}^{n+1}$ and $W \in \mathbb{R}^{m \times (n+1)}$; $\mathbf{u}^\top$ is the 0-th row of the matrix M', and W contains the matrix elements of M' other than its 0-th row, providing the weights of the linear transformation. The matrix M' needs to satisfy the preset constraint $\forall x \in \mathbb{L}^n_K:\ f_x(M')\,x \in \mathbb{L}^m_K$, where $f_x(\cdot)$ can map any matrix to a suitable hyperbolic-space linear transformation matrix.
In some embodiments, for an input feature vector $x \in \mathbb{L}^n_K$ in the hyperbolic space, based on the above formula (1) and formula (2), an expression for $f_x(M')$ covering both the Lorentz rotation and the Lorentz boost, i.e., the preset Lorentz transformation function, can be obtained as shown in formula (3):

$$f_x(M') = f_x([\mathbf{u}^\top; W]) = \begin{bmatrix} \dfrac{\sqrt{\|Wx\|^2 - 1/K}}{\mathbf{u}^\top x}\, \mathbf{u}^\top \\[2ex] W \end{bmatrix} \tag{3}$$

where K is the preset curvature value of the hyperbolic space.
It can be understood that the linear transformation $f_x(M')$ of formula (3) in the hyperbolic space can cover the two transformation forms of Lorentz rotation and Lorentz boost, thereby improving the expressive power of the model and the accuracy of natural language processing.
In this embodiment of the application, the natural language processing model may use the attention coding feature obtained in S102 as an input feature vector, and perform linear transformation on the attention coding feature by using a preset lorentz transformation function of formula (3) to obtain a linear coding feature corresponding to the attention coding feature.
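A minimal numerical sketch of the linear transformation of formula (3) is given below, assuming curvature K = -1; all function and variable names are illustrative and not the patent's. The point to observe is that the transformed vector always lands back on the hyperboloid, so no logarithmic or exponential mapping is needed:

```python
import numpy as np

K = -1.0  # preset curvature of the hyperbolic space

def lorentz_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def lorentz_linear(x, u, W, K=K):
    # Formula (3): the transformed point f_x(M') x = [ sqrt(||Wx||^2 - 1/K) ; Wx ].
    # Note: u only appears in the rescaled 0-th row of f_x(M') and cancels when
    # computing the transformed point itself.
    Wx = W @ x
    time = np.sqrt(np.dot(Wx, Wx) - 1.0 / K)
    return np.concatenate(([time], Wx))

rng = np.random.default_rng(0)
spatial = rng.normal(size=4)
x = np.concatenate(([np.sqrt(spatial @ spatial - 1.0 / K)], spatial))  # x in the Lorentz model

u = rng.normal(size=5)          # 0-th row of M' (unconstrained)
W = rng.normal(size=(3, 5))     # remaining rows of M' (unconstrained)

y = lorentz_linear(x, u, W)
print(lorentz_inner(y, y))      # approximately 1/K = -1.0: the output stays in hyperbolic space
```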
S104, decoding and predicting the linear coding characteristics to obtain a current processing result of the current hyperbolic word vector, and processing each hyperbolic word vector to obtain a target processing result corresponding to the to-be-processed statement.
In the embodiment of the application, the natural language processing model may input the linear coding features corresponding to the current hyperbolic word vector into the decoding model of the corresponding neural network, and perform decoding prediction on the linear coding features through the decoding model to obtain a decoding prediction result corresponding to the current hyperbolic word vector as a current processing result. The natural language processing model processes each hyperbolic word vector in the hyperbolic word vector sequence in the same process to obtain a decoding prediction result corresponding to the whole hyperbolic word vector sequence, and the decoding prediction result is used as a target processing result corresponding to a sentence to be processed, so that natural language processing work under scenes such as machine translation, knowledge graph completion, fine-grained entity classification and the like is realized.
It can be understood that the hyperbolic space is represented by a Lorentz model, and a linear transformation matrix model satisfying the condition that the input and output feature vectors are both expressed by the Lorentz model is trained to obtain the preset Lorentz transformation function; therefore, when the preset Lorentz transformation function performs linear transformation on the input attention coding feature, the output linear coding feature is guaranteed to remain in the hyperbolic space, so that the model calculation of the hyperbolic neural network can be completed in the hyperbolic space, improving the stability and operational efficiency of the hyperbolic neural network model and thus the stability and efficiency of natural language processing using the hyperbolic neural network.
In some embodiments, referring to fig. 4, fig. 4 is an optional flowchart of the natural language processing method provided in the embodiment of the present application, and S102 in fig. 3 may be implemented by performing S1021 to S1024, which will be described with reference to each step.
S1021, performing key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set contains a query word vector corresponding to each hyperbolic word vector.
In the embodiment of the present application, the attention module in fig. 2 may be implemented as an attention layer in a natural language processing model. The attention layer may include three linear sub-network layers, and the natural language processing model may perform Key-Value pair conversion processing on each hyperbolic word vector in the hyperbolic word vector sequence through the three linear sub-network layers in the attention layer to obtain a Query word (Query) vector, a Key (Key) vector, and a Value (Value) vector corresponding to each hyperbolic word vector, and further obtain a Key set, a Query set, and a Value set corresponding to the hyperbolic word vector sequence.
In some embodiments, the key set formed by the key vectors corresponding to each hyperbolic word vector may be represented as $K = \{k_1, \ldots, k_{|K|}\}$, the query set formed by the query word vectors corresponding to each hyperbolic word vector may be represented as $Q = \{q_1, \ldots, q_{|Q|}\}$, and the value set formed by the value vectors corresponding to each hyperbolic word vector may be represented as $V = \{v_1, \ldots, v_{|V|}\}$.
And S1022, acquiring the current query word vector corresponding to the current hyperbolic word vector from the query set.
In the embodiment of the present application, when performing attention coding on the current hyperbolic word vector, the natural language processing model may determine, from the query set $Q = \{q_1, \ldots, q_{|Q|}\}$, the query word vector corresponding to the current hyperbolic word vector as the current query word vector.
And S1023, obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set.
In the embodiment of the application, the natural language processing model may obtain, when performing attention coding, an attention weight corresponding to each hyperbolic word vector with respect to the current hyperbolic word vector by calculating the Lorentzian squared distance between the current query word vector and each key vector in the key set, as the attention weight set corresponding to the current hyperbolic word vector.
In some embodiments, S1023 may be implemented by performing S1023-1 to S1023-2, which will be described in connection with the steps.
And S1023-1, calculating the Lorentzian squared distance between the current query word vector and each key vector.
In the embodiment of the present application, for the current query word vector $q_i$ in the query set $Q = \{q_1, \ldots, q_{|Q|}\}$, the natural language processing model can calculate, according to the calculation method of the Lorentzian squared distance, the Lorentzian squared distance $d_{\mathcal{L}}^2(q_i, k_j)$ between the current query word vector $q_i$ and each key vector $k_j$.
In the embodiment of the present application, the Lorentzian squared distance in the hyperbolic space can be expressed as $d_{\mathcal{L}}^2(a, b)$, and its calculation formula can be as shown in formula (4-1):

$$d_{\mathcal{L}}^2(a, b) = \|a - b\|_{\mathcal{L}}^2 = 2/K - 2\langle a, b\rangle_{\mathcal{L}} \tag{4-1}$$

In formula (4-1), K is the preset curvature value of the hyperbolic space, and $\langle a, b\rangle_{\mathcal{L}}$ is the Minkowski (Lorentzian) inner product of a and b, which can be written as formula (4-2):

$$\langle a, b\rangle_{\mathcal{L}} = -a_0 b_0 + \sum_{i=1}^{n} a_i b_i \tag{4-2}$$
In the embodiment of the application, the natural language processing model can take the current query word vector $q_i$ as a and each key vector $k_j$ in the key set as b, and calculate the Lorentzian squared distance between the current query word vector and each key vector in the key set according to formula (4-1) and formula (4-2).
S1023-2, negating the Lorentz squared distance, and performing normalized exponential processing on the ratio of the negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as an attention weight set.
In the embodiment of the application, the natural language processing model can negate the Lorentz squared distance between the current query word vector and each key vector to obtain a negated result; divide the negated result by a preset constant and then perform normalized exponential (softmax) processing; take the calculation result as the attention weight of each key vector relative to the current query word vector; and further obtain, over all key vectors in the key set, the attention weight set corresponding to the current hyperbolic word vector.
In some embodiments, the preset constant may take the value of the square root of the preset vector dimension. For example, for a preset vector dimension n, the preset constant may take the value √n.
The process of calculating the attention weight of each key vector with respect to the current query word vector in S1023-2 may be as shown in equation (4-3) as follows:
$w_{ij} = \frac{\exp\left(-d_{\mathcal{L}}^{2}(q_i, k_j)/\sqrt{n}\right)}{\sum_{l} \exp\left(-d_{\mathcal{L}}^{2}(q_i, k_l)/\sqrt{n}\right)}$    (4-3)

In equation (4-3), the natural language processing model negates the Lorentz squared distance between the current query word vector q_i and each key vector k_j to obtain $-d_{\mathcal{L}}^{2}(q_i, k_j)$, takes the ratio of the negated result to the square root of the preset vector dimension $\sqrt{n}$, and performs normalized exponential (softmax) processing on this ratio to obtain the attention weight w_ij of the key vector k_j relative to q_i, i.e. the attention weight of the jth hyperbolic word vector relative to the current ith hyperbolic word vector. The natural language processing model can thus calculate, according to formula (4-3), the attention weights of all key vectors in the key set relative to q_i, as the attention weight set w_i corresponding to the current hyperbolic word vector.
In some embodiments, the preset constant may also be set to another value according to the actual situation, which is not limited in the embodiments of the present application.
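A short sketch of the weight computation in S1023-2 under the same assumed conventions (√n as the preset constant; helper names are illustrative):

```python
import numpy as np

K = -1.0  # assumed curvature, as in the earlier sketch

def lorentz_sq_dist(a, b):
    return 2.0 / K - 2.0 * (-a[0] * b[0] + np.dot(a[1:], b[1:]))

def attention_weights(q_i, keys, n):
    """Formula (4-3): softmax over -d_L^2(q_i, k_j) / sqrt(n) across all key vectors."""
    scores = np.array([-lorentz_sq_dist(q_i, k) / np.sqrt(n) for k in keys])
    scores -= scores.max()            # stabilise the exponentials before the softmax
    w = np.exp(scores)
    return w / w.sum()                # attention weight set w_i for the current word vector
```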
S1024, obtaining attention coding features through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In the embodiment of the application, since solving the centroid in the hyperbolic space is itself a weighted summation process, similar to the weighted summation of value vectors according to attention weights in the attention mechanism, the calculation of weighting and summing each value vector according to its attention weight can be carried out directly in the hyperbolic space by using a preset hyperbolic space centroid calculation method.
In the embodiment of the present application, the preset hyperbolic space centroid calculation method can be as shown in formula (5-1), as follows:

$\mu_{c} = \arg\min_{\mu \in \mathbb{L}^{n}_{K}} \sum_{i} \nu_{i}\, d_{\mathcal{L}}^{2}(x_{i}, \mu)$    (5-1)

In the formula (5-1), x_i represents the ith point in the hyperbolic space point set, ν_i represents the weight of the ith point, the summation index i runs over the number of point vectors, and μ_c represents the centroid position of the hyperbolic space. The formula (5-1) characterizes that, in the hyperbolic space $\mathbb{L}^{n}_{K}$, the point μ is solved such that the weighted sum of the squared distances from the other points x_i in the hyperbolic space to the point μ reaches the minimum value, and this point μ is taken as the centroid position μ_c of the hyperbolic space.
Here, when the centroid solving formula (5-1) of the hyperbolic space is applied to the scenario of neural network model operation, x_i may represent any vector in the hyperbolic feature space of the neural network, and the weighted summation process in the hyperbolic space is realized through formula (5-1). In some embodiments, when applying formula (5-1) to the weighted summation process in the attention mechanism, the value vector corresponding to each hyperbolic word vector may be taken as x_i, and the attention weight w_i may be taken as ν_i. In order to obtain an analytic solution of formula (5-1), the natural language processing model may perform weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculate an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set, where the attention weight normalization factor is used to constrain the initial attention feature to be in the hyperbolic space; and take the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector. The above process can be shown as equation (5-2), as follows:
$\mu = \frac{\sum_{i} w_{i} x_{i}}{\sqrt{-K}\,\left|\left\|\sum_{i} w_{i} x_{i}\right\|_{\mathcal{L}}\right|}$    (5-2)

In equation (5-2), $\|a\|_{\mathcal{L}} = \sqrt{|\langle a, a\rangle_{\mathcal{L}}|}$ denotes the Lorentzian norm. The natural language processing model may, based on the attention weight w_i corresponding to each hyperbolic word vector in the attention weight set, perform weighted summation on the value vectors x_i corresponding to each hyperbolic word vector to obtain $\sum_{i} w_{i} x_{i}$ as the initial attention feature. The natural language processing model then calculates, based on the preset curvature value K of the hyperbolic space, the attention weight set and the value set, the attention weight normalization factor $\sqrt{-K}\,\left|\left\|\sum_{i} w_{i} x_{i}\right\|_{\mathcal{L}}\right|$ corresponding to the current hyperbolic word vector, i.e. the denominator portion in equation (5-2). The attention weight normalization factor is used to constrain the initial attention feature to be in the hyperbolic space. Finally, the natural language processing model takes the ratio μ of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
It can be understood that, in the embodiment of the present application, the attention weight corresponding to the current hyperbolic word vector in the hyperbolic space is defined by the Lorentz squared distance, and the attention coding feature corresponding to the current hyperbolic word vector is then obtained by using the centroid solving process in the hyperbolic space, so that an attention layer component of the hyperbolic neural network that performs its operations completely in the hyperbolic space can be constructed, the coding operation of the attention layer in the hyperbolic space is achieved, mapping to the Euclidean space corresponding to the tangent space is not required, the stability and efficiency of the hyperbolic neural network model are further improved, and the stability and efficiency of natural language processing are further improved.
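A sketch of the centroid-style aggregation of S1024 under the same assumptions (value vectors lying on the hyperboloid with ⟨x, x⟩_L = 1/K, K < 0); the normalisation mirrors the description of formula (5-2), and the names are illustrative:

```python
import numpy as np

K = -1.0  # assumed curvature value

def lorentz_norm(a):
    """Absolute Lorentzian norm |<a, a>_L| ** 0.5."""
    return np.sqrt(abs(-a[0] * a[0] + np.dot(a[1:], a[1:])))

def hyperbolic_centroid(weights, values):
    """Formula (5-2): weighted sum of value vectors renormalised back onto the hyperboloid."""
    num = np.sum([w * v for w, v in zip(weights, values)], axis=0)  # initial attention feature
    denom = np.sqrt(-K) * lorentz_norm(num)                         # attention weight normalisation factor
    return num / denom                                              # attention coding feature
```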
In some embodiments, referring to fig. 5, fig. 5 is an optional flowchart of the natural language processing method provided in the embodiments of the present application, and based on fig. 3 or fig. 4, after S101, S201 to S202 may also be executed, which will be described with reference to each step.
S201, transmitting a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through Lorentz parallel transmission, to obtain a plurality of position vector information corresponding to each hyperbolic word vector.
In the embodiment of the application, for each hyperbolic word vector, the natural language processing model can obtain the position of each hyperbolic word vector in the sentence and use it to construct the position embedding vector p_i corresponding to each hyperbolic word vector, with p_i constrained to lie in the origin tangent space. In order to enhance the expressive power of the model by using multiple tangent spaces instead of only the origin tangent space, the natural language processing model can transmit the position embedding vector in parallel, by a Lorentz parallel transmission method, to different tangent spaces corresponding to different points in the hyperbolic space, so as to obtain a plurality of position vector information corresponding to each hyperbolic word vector. In some embodiments, the Lorentz parallel transmission method may be as shown in formula (6).
$P_{o \to y}(p) = p - \frac{\langle y, p\rangle_{\mathcal{L}}}{\langle o, y\rangle_{\mathcal{L}} + 1/K}\,(o + y)$    (6)

The natural language processing model may transmit the position embedding vector in parallel, by using formula (6), to different tangent spaces corresponding to different points in the hyperbolic space, so as to obtain the plurality of position vector information r_i shown in formula (7), as follows:

$r_{i} = P_{o \to y_{i}}(p_{i}),\quad y_{i} = f(z_{i})$    (7)

In formula (7), o denotes the origin of the hyperbolic space, p_i is the position embedding vector located in the origin tangent space, and f determines the point y_i at whose tangent space the position embedding vector should be transmitted.
S202, updating each hyperbolic word vector by using the plurality of position vector information.
In this embodiment of the application, the natural language processing model may combine the plurality of location vector information obtained in S201 with each hyperbolic word vector, and update each hyperbolic word vector, so that when encoding and decoding each hyperbolic word vector, the accuracy of encoding and decoding may be further improved by using the location information corresponding to each hyperbolic word vector.
It can be understood that in the embodiment of the application, through Lorentz parallel transmission, the position information in the multiple tangent spaces corresponding to each hyperbolic word vector in the hyperbolic space can be obtained, so that the expressive capacity of the model is enhanced, and the accuracy of the natural language processing task performed by the natural language processing model is improved.
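A rough sketch of the parallel-transmission step: a position embedding that lives in the origin tangent space (its 0th coordinate is zero) is moved to the tangent space at another hyperboloid point. The closed-form transport used below is the standard one for this sign convention and is an assumption about how formula (6) is instantiated, as are the dimensionality and function names.

```python
import numpy as np

K = -1.0      # assumed curvature value
DIM = 3       # assumed spatial dimensionality, for illustration only

def minkowski_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def origin():
    """Origin of the hyperboloid: (1/sqrt(-K), 0, ..., 0)."""
    o = np.zeros(DIM + 1)
    o[0] = 1.0 / np.sqrt(-K)
    return o

def transport_from_origin(p, y):
    """Move a vector p from the origin tangent space to the tangent space at point y."""
    o = origin()
    coef = minkowski_inner(y, p) / (minkowski_inner(o, y) + 1.0 / K)
    return p - coef * (o + y)
```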
In some embodiments, referring to fig. 6, fig. 6 is an optional flowchart of the natural language processing method provided in the embodiments of the present application, and after S103, S301 to S303 may be further executed, which will be described with reference to the steps.
S301, linear correction is carried out on the linear coding features through a preset residual error network, and coding residual error correction results corresponding to the linear coding features are obtained.
In the embodiment of the application, the output of the hyperbolic linear transformation layer in the natural language processing model may be connected to a preset residual error network, that is, a residual error layer. The hyperbolic transformation layer correspondingly realizes a linear transformation method of a preset Lorentz transformation function, and because vector addition is difficult to define in a hyperbolic space, the application provides a method for combining a residual error layer with a previous network layer to perform residual error processing in the hyperbolic space.
In this embodiment of the application, when the upper network layer of the residual layer is the hyperbolic linear transformation layer, the natural language processing model may combine the hyperbolic linear transformation layer and the residual layer as the first network combination layer, use the attention coding feature as the input of the first network combination layer, and first perform linear transformation on the attention coding feature through the hyperbolic linear transformation layer and using a preset lorentz transformation function to obtain the linear coding feature. And then, linear correction is carried out on the linear coding characteristics by using a preset residual correction function through a residual layer in the first network combination layer, namely a preset residual network, so as to obtain a coding residual correction result.
S302, combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature.
In the embodiment of the application, the natural language processing model takes the input, namely the attention coding characteristic, of the first network combination layer as the residual corresponding to the first network combination layer, and combines the coding residual correction result and the attention coding characteristic to realize the residual calculation of the first network combination layer, so as to obtain the intermediate linear coding characteristic.
S303, updating the linear coding characteristics according to the first constraint factor and the intermediate linear coding characteristics; the first constraint factor is used to constrain the linear coding features to be in a hyperbolic space.
In the embodiment of the application, the natural language processing model can perform feature matrix construction according to the first constraint factor and the intermediate linear coding feature, and update the linear coding feature, so that subsequent steps can be performed based on the updated linear coding feature. The first constraint factor is used to constrain the linear coding feature to be in the hyperbolic space, that is, the processing of the residual layer in the embodiment of the present application is also completed in the hyperbolic space.
In some embodiments, denoting the linear coding feature output in S103 as y_i and the attention coding feature as x_i, the process of S301-S303 can be as shown in equation (8), as follows:

$\tilde{y}_{i} = \left[\sqrt{\left\|\phi(W y_{i}, u) + x_{i}\right\|^{2} - 1/K}\ ;\ \ \phi(W y_{i}, u) + x_{i}\right]$    (8)

In the formula (8), φ(Wy_i, u) is the coding residual correction result obtained by the preset residual network, φ(Wy_i, u) + x_i is the intermediate linear coding feature, and the first constraint factor $\sqrt{\left\|\phi(W y_{i}, u) + x_{i}\right\|^{2} - 1/K}$ is used as the 0th dimension of the feature matrix corresponding to the updated linear coding feature, so that the updated linear coding feature is still constrained in the hyperbolic space.
In some embodiments, a residual layer may also be combined with the attention layer to perform residual correction on the attention coding features output by the attention layer. That is, after the attention coding feature is obtained in S102, the following steps may also be performed: performing linear correction on the attention coding features through a preset residual error network to obtain an attention residual error correction result corresponding to the attention coding features; combining the attention residual error correction result with the current hyperbolic word vector to obtain an intermediate attention coding feature; updating the attention coding feature according to the second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention coding feature to be in the hyperbolic space. Here, the attention layer usually includes a linear processing sublayer, and the attention coding feature is output to the residual layer by interfacing the linear processing sublayer with the residual layer, so that the process of performing residual processing on the output of the attention layer by the residual layer is similar to the process of performing residual processing in S301 to S303, and is not described herein again.
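An illustrative sketch of the residual update of S301-S303, under the assumption that the residual correction and the skip connection are added in the spatial coordinates and that the first constraint factor recomputes the 0th coordinate so that the result again satisfies ⟨x, x⟩_L = 1/K; φ is modelled here as a ReLU, and W, u and the function name are stand-ins rather than the patent's exact residual network:

```python
import numpy as np

K = -1.0  # assumed curvature value

def residual_update(linear_feature, attention_feature, W, u):
    """Hyperbolic residual step: correction + skip connection, then re-projection.

    linear_feature    : y_i, output of the hyperbolic linear transformation layer
    attention_feature : x_i, input of the first network combination layer
    """
    correction = np.maximum(W @ linear_feature + u, 0.0)   # coding residual correction result
    spatial = correction + attention_feature[1:]           # intermediate linear coding feature (spatial part)
    t = np.sqrt(np.dot(spatial, spatial) - 1.0 / K)        # first constraint factor -> 0th coordinate
    return np.concatenate(([t], spatial))                  # updated linear coding feature
```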
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In an actual natural language processing project, a vector extraction layer, an attention layer, a linear transformation layer and a residual error layer which can perform data operations completely in the hyperbolic space can be respectively constructed according to the method in the embodiment of the application. In some embodiments, a fully hyperbolic network model HYBONET may be constructed by using all of these hyperbolic-space neural network components. In order to verify the effectiveness of HYBONET, a comparison test against current natural language processing models can be carried out in a plurality of representative scenarios, including knowledge graph completion, machine translation and fine-grained entity classification.
In some embodiments, for the natural language processing scenario of knowledge graph completion, given a head entity h and a relationship r, all tail entities may be scored and ranked by the natural language processing model as prediction results. The applicant uses HYBONET provided in the embodiment of the present application to perform a comparison test with current natural language processing models in other hyperbolic spaces or in Euclidean space on two common knowledge graphs, FB15k-237 and WN18RR, and evaluates the experimental results through two evaluation indexes, MRR (Mean Reciprocal Rank) and H@k, where the comparison results can be shown in Table 1:
TABLE 1 (comparison results of HYBONET and baseline models on FB15k-237 and WN18RR, reported as MRR and H@k for k = 10, 3 and 1; table data not reproduced in this text)
In Table 1, the MRR index represents the average reciprocal rank of the correct entity in the predictions; H@k characterizes the percentage of cases in which the correct entity appears in the top k positions of the predicted ranking. The H@k indexes corresponding to k = 10, 3 and 1 are shown in Table 1, respectively. MURP (Balazevic et al., 2019a) and LORENTZ (TANGENT) are current hyperbolic-space neural network models, while TRANSE (Bordes et al., 2013), DISTMULT (Yang et al., 2015), COMPLEX (Trouillon et al., 2017), CONVE (Dettmers et al., 2018), ROTATE (Sun et al., 2019) and TUCKER (Balazevic et al., 2019b) are neural network models in Euclidean space. The optimal values of the indicators among the hyperbolic neural network models are shown in bold in Table 1, and the optimal values of the indicators among all the neural network models are shown in bold and underlined.
In some embodiments, for the fine-grained entity classification scenario, the applicant performs a comparison experiment, based on the Open Entity dataset, between hyperbolic neural network models of different scales (HY BASE, HY LARGE and HY XLARGE) constructed by the method of the embodiment of the present application and other natural language processing models, where the Open Entity dataset divides types into three levels: coarse, fine and ultra-fine granularity. For each entity e, the model gives a score s(t_i | e) for every type and converts the score into a probability p(t_i | e) = σ(s(t_i | e)) using a sigmoid function; the entity e can be considered to belong to a certain type if the probability of that type is greater than 0.5. The results of the comparative experiments are shown in Table 2, where C, F and UF respectively represent the F1 index scores at the three different granularity levels of entity classification, and Total represents the F1 index score over the three levels combined. In some embodiments, the calculation formula of the F1 index may be F1 = 2 × precision × recall / (precision + recall), where precision represents the precision rate and recall represents the recall rate. The underlines in Table 2 show the optimal values of the indicators among all the neural network models, and the optimal values of the indicators among the hyperbolic neural network models are shown in bold.
TABLE 2 (fine-grained entity classification results on the Open Entity dataset, reported as F1 scores for C, F, UF and Total; table data not reproduced in this text)
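As a small worked sketch of the entity-typing evaluation described above (the raw scores below are invented for illustration): type scores are squashed with a sigmoid, types with probability above 0.5 are predicted, and F1 combines precision and recall.

```python
import numpy as np

def predict_types(scores, threshold=0.5):
    """Turn raw type scores s(t_i | e) into predicted types via the sigmoid p = sigma(s)."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return probs > threshold

def f1(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall)."""
    return 2.0 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(predict_types([2.3, -0.7, 0.1]))   # -> [ True False  True ]
print(round(f1(0.8, 0.6), 3))            # -> 0.686
```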
In some embodiments, for the machine translation scenario, the applicant conducted comparison experiments on three models, HYBONET, TRANSFORMER and LORENTZ (TANGENT), on the IWSLT'14 EN-DE and WMT'16 EN-DE translation benchmark datasets, scored using the Bilingual Evaluation Understudy (BLEU) index, with the experimental results of each model shown in Table 3:
TABLE 3 (BLEU scores of HYBONET, TRANSFORMER and LORENTZ (TANGENT) on IWSLT'14 EN-DE and WMT'16 EN-DE; table data not reproduced in this text)
In some embodiments, based on the WN18RR data set, the FB15k-237 data set, the IWSLT14 data set, and the Open Entity data set, respectively, the applicant performs comparative training between a hyperbolic space model such as HyboNet, constructed from the hyperbolic neural network components provided in the embodiments of the present application, and other current network models, such as the MuRP model or a Euclidean model, to obtain convergence comparison results of the models during training, as shown in fig. 7(a), 7(b), 7(c), and 7(d).
Based on the experimental results, the hyperbolic network model constructed by the embodiment of the application is superior to the Euclidean network model under the condition that the parameters are equivalent or less. Compared with the existing hyperbolic network model depending on the tangent space, the full-hyperbolic network model of the embodiment of the application has better convergence, and simultaneously realizes equivalent or even better performance.
Continuing with the exemplary structure of the natural language processing device 255 implemented as software modules provided in embodiments of the present application, in some embodiments, as shown in fig. 2, the software modules stored in the natural language processing device 255 of the memory 240 may include:
a hyperbolic vector extraction module 2551, an attention module 2552, a hyperbolic linear transformation module 2553 and a decoding prediction module 2554, wherein,
the hyperbolic vector extraction module 2551 is configured to obtain a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module 2552 is configured to perform attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module 2553 is configured to perform linear transformation on the attention coding feature through a preset lorentz transformation function to obtain a linear coding feature corresponding to the attention coding feature; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
the decoding prediction module 2554 is configured to perform decoding prediction on the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtain a target processing result corresponding to the to-be-processed sentence by processing each hyperbolic word vector.
In some embodiments, the attention module 2552 is further configured to perform key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set, and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector; acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set; obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set; obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In some embodiments, the attention module 2552 is further configured to calculate a lorentz squared distance between the current query word vector and each of the key vectors; and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
In some embodiments, the attention module 2552 is further configured to perform a weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space; and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
In some embodiments, the hyperbolic vector extraction module 2551 is further configured to, after obtaining a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space, transmit a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through lorentz parallel transmission to obtain a plurality of position vector information corresponding to each hyperbolic word vector; updating each hyperbolic word vector using the plurality of location vector information.
In some embodiments, the natural language processing apparatus further includes a preset residual network, where the preset residual network is configured to, after the attention coding feature is linearly transformed through the preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature, perform linear correction on the linear coding feature to obtain a coding residual correction result corresponding to the linear coding feature; combine the coding residual correction result with the attention coding feature to obtain an intermediate linear coding feature; and update the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding feature to be in the hyperbolic space.
In some embodiments, the preset residual error network is configured to, after attention coding is performed on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector, perform linear correction on the attention coding feature to obtain an attention residual error correction result corresponding to the attention coding feature; combine the attention residual error correction result with the current hyperbolic word vector to obtain an intermediate attention coding feature; and update the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention coding feature to be in the hyperbolic space.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform the natural language processing method provided by embodiments of the present application, for example, the method as shown in fig. 3, 4, 5, and 6.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiment of the application, the hyperbolic space is represented by the lorentz model, the linear transformation matrix model satisfying that the input and output feature vectors all conform to the lorentz model expression is trained, and the preset lorentz transformation function can be obtained; therefore, when the preset Lorentz transformation function is used for carrying out linear transformation processing on the input attention coding features, the output linear coding features can be ensured to be still located in the hyperbolic space, so that model calculation of the hyperbolic neural network can be completed in the hyperbolic space, the stability and the operation efficiency of the hyperbolic neural network model are improved, and the stability and the efficiency of natural language processing by using the hyperbolic neural network are improved. In addition, the linear transformation process in the embodiment of the application covers the expressions of Lorentz rotation and Lorentz acceleration, and the accuracy of natural language processing is improved. Furthermore, the attention coding weight corresponding to the current hyperbolic word vector in the hyperbolic space is defined through the Lorentz square distance, the attention coding feature corresponding to the current hyperbolic word vector is obtained through the centroid solving process in the hyperbolic space, an attention layer component of the hyperbolic neural network which is operated completely in the hyperbolic space can be constructed, the coding operation of the attention layer in the hyperbolic space is achieved, mapping to the Euclidean space corresponding to the tangent space is not needed, the stability and the efficiency of the hyperbolic neural network model are further improved, and the stability and the efficiency of natural language processing are further improved. Furthermore, through Lorentz parallel transmission, the position information in a plurality of tangent spaces corresponding to each hyperbolic word vector in the hyperbolic space can be obtained, so that the model expression capacity is enhanced, and the accuracy of the natural language processing task of the natural language processing model is improved. Further, the embodiment of the application also defines a residual error processing method in the hyperbolic space. An experimental result in a practical scene shows that the hyperbolic network model constructed by the embodiment of the application is superior to the Euclidean network model under the condition of equivalent or less parameters. Compared with the existing hyperbolic network model depending on the tangent space, the full-hyperbolic network model of the embodiment of the application has better convergence, and simultaneously realizes equivalent or even better performance.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A natural language processing method, comprising:
obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the to-be-processed statement by processing each hyperbolic word vector.
2. The method of claim 1, wherein said performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain attention coding features corresponding to the current hyperbolic word vector comprises:
performing key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set and an inquiry set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector;
acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set;
obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set;
obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
3. The method of claim 2, wherein obtaining the attention weight set corresponding to the current hyperbolic word vector by calculating a Lorentzian squared distance between the current query word vector and each key vector in the set of keys comprises:
calculating a Lorentzian squared distance between the current query word vector and each of the key vectors;
and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
4. The method of claim 2, wherein obtaining the attention-coding feature by a predetermined hyperbolic spatial centroid calculation method based on the set of attention weights and the set of values comprises:
carrying out weighted summation on the value set according to the attention weight set to obtain an initial attention feature;
calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space;
and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
5. The method according to any one of claims 1 to 4, wherein after obtaining the hyperbolic word vector sequence corresponding to the sentence to be processed according to the preset word vector table in the hyperbolic space, the method further comprises:
transmitting a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence, through Lorentz parallel transmission, to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space, to obtain a plurality of position vector information corresponding to each hyperbolic word vector;
updating each hyperbolic word vector using the plurality of location vector information.
6. The method according to any one of claims 1 to 4, wherein after the attention-coding feature is linearly transformed by a preset Lorentzian transformation function to obtain a linear coding feature corresponding to the attention-coding feature, the method further comprises:
performing linear correction on the linear coding features through a preset residual error network to obtain coding residual error correction results corresponding to the linear coding features;
combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature;
updating the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding features to be in the hyperbolic space.
7. The method according to any one of claims 1-4, wherein after the attention coding is performed on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector, the method further comprises:
performing linear correction on the attention coding features through a preset residual error network to obtain an attention residual error correction result corresponding to the attention coding features;
combining the attention residual error correction result with the current hyperbolic word vector to obtain intermediate attention coding features;
updating the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention-coding feature to be in the hyperbolic space.
8. A natural language processing apparatus, comprising: a hyperbolic vector extraction module, an attention module, a hyperbolic linear transformation module and a decoding prediction module, wherein,
the hyperbolic vector extraction module is used for obtaining a hyperbolic word vector sequence corresponding to a statement to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module is used for carrying out attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module is used for performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and the decoding prediction module is used for decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the statement to be processed by processing each hyperbolic word vector.
9. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in the memory.
10. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 7.
CN202110457927.8A 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium Pending CN113761829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110457927.8A CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110457927.8A CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113761829A true CN113761829A (en) 2021-12-07

Family

ID=78786895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110457927.8A Pending CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113761829A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933697A (en) * 2023-09-18 2023-10-24 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language
CN116933697B (en) * 2023-09-18 2023-12-08 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination