CN113761829A - Natural language processing method, device, equipment and computer readable storage medium


Info

Publication number
CN113761829A
Authority
CN
China
Prior art keywords
hyperbolic
word vector
attention
vector
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110457927.8A
Other languages
Chinese (zh)
Inventor
刘知远
陈泽
韩旭
林衍凯
李鹏
孙茂松
周杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Tencent Technology Shenzhen Co Ltd
Priority to CN202110457927.8A
Publication of CN113761829A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

The application provides a natural language processing method, apparatus, device, and computer-readable storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: obtaining a hyperbolic word vector sequence according to a preset word vector table; performing attention coding on the current hyperbolic word vector to obtain the attention coding feature corresponding to the current hyperbolic word vector; performing linear transformation on the attention coding feature through a preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature, where the preset Lorentz transformation function is obtained by training a linear transformation matrix model that satisfies a preset constraint condition, and the preset constraint condition constrains the output feature vector, obtained after the linear transformation matrix model processes an input feature vector in the hyperbolic space, to remain in the hyperbolic space; and obtaining a target processing result corresponding to the sentence to be processed based on the linear coding feature. By means of the method and the device, the stability and efficiency of natural language processing using a hyperbolic neural network can be improved.

Description

Natural language processing method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a natural language processing method, apparatus, device, and computer-readable storage medium.
Background
In recent years, a growing body of work has explored how to learn representations of complex data structures with non-Euclidean geometric features in hyperbolic space. Current artificial intelligence research has shown that hyperbolic geometry can provide more flexibility than Euclidean geometry when modeling complex data structures.
At present, when a hyperbolic neural network performs data processing, for example in a natural language processing scenario, it operates by mapping points back and forth between the hyperbolic space and a Euclidean space using exponential and logarithmic mappings: a point in the hyperbolic space is mapped to the tangent space at some point using the logarithmic mapping, the necessary Euclidean neural operations (such as matrix-vector multiplication) are performed in the tangent space, and the result is mapped back to the corresponding point in the hyperbolic space by the exponential mapping, so that the operations of the hyperbolic neural network are defined in a hybrid manner. However, the logarithmic and exponential mappings require a series of hyperbolic and inverse hyperbolic functions. These functions are rather complex in composition and usually have an unbounded range, which seriously weakens the stability and convergence of the hyperbolic neural network and thus affects the stability and efficiency of natural language processing with the hyperbolic neural network.
Disclosure of Invention
Embodiments of the present application provide a natural language processing method, apparatus, device, and computer-readable storage medium, which can improve the stability and efficiency of natural language processing using a hyperbolic neural network.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a natural language processing method, which comprises the following steps:
obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the to-be-processed statement by processing each hyperbolic word vector.
An embodiment of the present application provides a natural language processing apparatus, including:
A hyperbolic vector extraction module, an attention module, a hyperbolic transformation module and a decoding prediction module, wherein,
the hyperbolic vector extraction module is used for obtaining a hyperbolic word vector sequence corresponding to a statement to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module is used for carrying out attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module is used for performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and the decoding prediction module is used for decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the statement to be processed by processing each hyperbolic word vector.
In the above apparatus, the attention module is further configured to perform key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set, and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector; acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set; obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set; obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In the above apparatus, the attention module is further configured to calculate a lorentz squared distance between the current query word vector and each key vector; and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
In the above apparatus, the attention module is further configured to perform weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space; and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
In the above apparatus, the hyperbolic vector extraction module is further configured to, after obtaining a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space, transmit a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through Lorentz parallel transport, so as to obtain a plurality of position vector information corresponding to each hyperbolic word vector; updating each hyperbolic word vector using the plurality of location vector information.
In the above apparatus, the natural language processing apparatus further includes a preset residual network, where the preset residual network is configured to perform linear transformation on the attention coding feature through a preset lorentz transformation function to obtain a linear coding feature corresponding to the attention coding feature, and then perform linear correction on the linear coding feature to obtain a coding residual correction result corresponding to the linear coding feature; combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature; updating the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding features to be in the hyperbolic space.
In the above apparatus, the preset residual error network is configured to perform attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector, and then perform linear correction on the attention coding feature to obtain an attention residual error correction result corresponding to the attention coding feature; combining the attention residual error correction result with the current hyperbolic word vector to obtain intermediate attention coding features; updating the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention-coding feature to be in the hyperbolic space.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the natural language processing method provided by the embodiment of the application when the processor executes the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute, so as to implement the natural language processing method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects: the hyperbolic space is represented by a Lorentz model, and a linear transformation matrix model satisfying the condition that the input and output feature vectors both conform to the Lorentz model is trained to obtain the preset Lorentz transformation function; therefore, when the preset Lorentz transformation function performs linear transformation on the input attention coding feature, the output linear coding feature is guaranteed to remain in the hyperbolic space, so that the model calculation of the hyperbolic neural network can be completed entirely in the hyperbolic space, improving the stability and operational efficiency of the hyperbolic neural network model and thus the stability and efficiency of natural language processing using the hyperbolic neural network.
Drawings
FIG. 1 is an alternative architectural diagram of a natural language processing system architecture provided by embodiments of the present application;
FIG. 2 is an alternative structural diagram of a natural language processing apparatus according to an embodiment of the present application;
FIG. 3 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 4 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 5 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 6 is an alternative flow chart of a natural language processing method provided by an embodiment of the present application;
FIG. 7(a) is a schematic diagram of experimental results of convergence comparison experiments performed on the hyperbolic space model provided in the embodiment of the present application and other network models based on the WN18RR data set;
FIG. 7(b) is a schematic diagram of experimental results of convergence comparison experiments of hyperbolic space models and other network models provided by embodiments of the present application based on FB15k-237 data sets;
FIG. 7(c) is a schematic diagram of experimental results of convergence comparison experiments performed on the hyperbolic space model provided by the embodiment of the present application and other network models based on IWSLT14 data set;
FIG. 7(d) is a schematic diagram of an experimental result of a convergence comparison experiment performed on the hyperbolic space model and other network models provided in the embodiment of the present application based on an Open Entity dataset.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
2) Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field therefore involves natural language, i.e., the language people use every day, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
3) Knowledge graph completion: a knowledge graph is a set of fact triples, where each triple (h, r, t) states that a relation r holds between a head entity h and a tail entity t. Because knowledge graphs are generally incomplete, predicting missing triples is an important research problem. Specifically, the purpose of the knowledge graph completion task is to solve problems of the form (h, r, ?) and (?, r, t), where ? denotes the missing element at the corresponding position in the triple.
4) Fine-grained entity classification: given a sentence containing an entity e, the purpose of entity classification is to predict the type of e from a list of candidate types based on the information provided by the sentence; this is a multi-label classification problem because multiple types can be assigned to e. For fine-grained entity classification, the type labels are further divided at a fine granularity, so that the candidate type list contains thousands of types.
5) Machine translation, also known as automatic translation, is the process of converting one natural language (the source language) into another (the target language) using a computer. It is a branch of computational linguistics, one of the ultimate goals of artificial intelligence, and has important scientific research value. With the progress of deep learning, neural machine translation based on artificial neural networks has gradually emerged. Its technical core is a deep neural network with a massive number of nodes (neurons) that can automatically learn translation knowledge from a corpus. After a sentence in one language is vectorized, it is transmitted layer by layer through the network and converted into a representation that the computer can understand, and a translation in the other language is then generated through multiple layers of complex operations, realizing a translation mode of "understanding the language and generating the translation". Neural machine translation typically employs an encoder-decoder architecture to model variable-length input sentences. The encoder realizes the "understanding" of the source-language sentence and forms a floating-point vector of a specific dimension, and the decoder then generates the target-language translation word by word from this vector. Currently, the mainstream machine translation framework in industry adopts the self-attention network (Transformer), which is applied not only to machine translation but also performs outstandingly in fields such as self-supervised learning.
6) Lorentz model $\mathbb{L}^n_K$: a geometric model in which all points of the hyperbolic space are defined on the upper sheet of a two-sheeted hyperboloid with curvature K.
7) Poincare ball model: a geometric model in which all points in the hyperbolic space are defined in a unit sphere.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing technology, including knowledge graph related representation learning technology, natural language processing field machine translation technology and entity classification technology, and the like, and is specifically explained by the following embodiments:
most of the existing hyperbolic neural networks are obtained by redefining basic algebraic operations such as vector addition in the neural network by using a rotation vector space (Gyrovector space) in a Poincare (Poincare) sphere model, constructing modules such as a feedforward neural network and polynomial logistic regression by using a sphere model frame, performing a series of subsequent processing, and adapting various Euclidean neural networks to the hyperbolic space. These hyperbolic neural networks can cover a wide range of scenarios, such as shallow neural networks or simple neural components, as well as word-embedded word vector representations, graph-embedded vector representations, knowledge graph-embedded vector representations, deep neural networks such as attention modules and variational autocoders, and so on. The hyperbolic neural network can achieve performance equivalent to or even better than that of a high-dimensional Euclidean neural network in a low-dimensional hyperbolic characteristic space.
However, in current hyperbolic neural networks, not all arithmetic operations are completed in the hyperbolic space. In practical applications, some arithmetic operations in Euclidean neural networks, such as matrix-vector multiplication, are difficult to map directly onto operations in hyperbolic geometry. Since the tangent space at each point of the hyperbolic space is a Euclidean subspace, all Euclidean neural components can operate in these tangent spaces. Therefore, the operations of the hyperbolic neural network are mainly realized by mapping points back and forth between the hyperbolic space and the Euclidean space using exponential and logarithmic mappings. Specifically, a point in the hyperbolic space is mapped to the tangent space at a certain point using the logarithmic mapping, the necessary Euclidean neural operations (such as matrix-vector multiplication) are performed in the tangent space, and the result is then mapped back to the corresponding point in the hyperbolic space by the exponential mapping, so that the operations of the hyperbolic neural network are realized in a hybrid manner. However, the logarithmic and exponential mappings have to be implemented by a series of hyperbolic and inverse hyperbolic function calculations; these functions are rather complex in composition and usually have an unbounded range, which seriously impairs the stability and convergence of the hyperbolic neural network.
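For illustration only, the following sketch shows the conventional hybrid pipeline described above on the Lorentz model with curvature -1 (this example is not part of the patent text; the helper names and the choice of curvature are assumptions). The arccosh, cosh, and sinh calls are exactly the functions whose unbounded ranges cause the numerical issues mentioned here.

```python
import numpy as np

def lorentz_inner(a, b):
    # Minkowski (Lorentzian) inner product: -a0*b0 + <a_1:, b_1:>
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def logmap0(x):
    # Log map at the origin o = (1, 0, ..., 0) of the curvature -1 Lorentz model.
    # Returns a tangent vector whose 0-th component is 0.
    spatial = x[1:]
    norm = np.linalg.norm(spatial)
    if norm == 0:
        return np.zeros_like(x)
    return np.concatenate(([0.0], np.arccosh(x[0]) * spatial / norm))

def expmap0(v):
    # Exp map at the origin; v is a tangent vector with v[0] == 0.
    norm = np.linalg.norm(v[1:])
    if norm == 0:
        out = np.zeros_like(v)
        out[0] = 1.0
        return out
    return np.concatenate(([np.cosh(norm)], np.sinh(norm) * v[1:] / norm))

# Hybrid "tangent-space" layer: log map -> Euclidean matmul -> exp map.
rng = np.random.default_rng(0)
x_space = rng.normal(size=3)
x = np.concatenate(([np.sqrt(1.0 + x_space @ x_space)], x_space))  # point on the hyperboloid
W = rng.normal(size=(4, 4))
v = W @ logmap0(x)
y = expmap0(np.concatenate(([0.0], v[1:])))   # project back to the tangent space, then exp map
print(lorentz_inner(x, x), lorentz_inner(y, y))  # both approximately -1
```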
In order to solve the problem of the complex exponential and logarithmic mapping transformations, the applicant, drawing on the special theory of relativity, devised a method of defining the neural network operations directly in the hyperbolic space: special relativity uses Minkowski space (a Lorentz model) to describe space-time and defines the linear transformations on space-time as Lorentz transformations. The applicant uses a Lorentz model as the feature space of the hyperbolic neural network, builds the hyperbolic neural network through Lorentz transformations, and constructs neural network components that operate entirely in the hyperbolic space, so as to perform natural language processing on feature vectors in the hyperbolic space.
Embodiments of the present application provide a natural language processing method, apparatus, device, and computer-readable storage medium, which can improve the stability and convergence of a hyperbolic neural network. An exemplary application of the electronic device provided in an embodiment of the present application is described below, taking the case where the device is implemented as a server.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of a natural language processing system 100 provided in this embodiment of the present application, in order to support a natural language processing application, such as a machine translation application, a terminal 400 is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of both.
The terminal 400 is configured to run a client 410 of the machine translation application, receive a sentence to be translated, which is input by a user through voice or manually, as a sentence to be processed through the client 410, and send the sentence to be processed to the server 200 through the network 300.
The server 200 is configured to obtain a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence accords with a Lorentz model; performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain attention coding features corresponding to the current hyperbolic word vector; performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model; and decoding and predicting the linear coding characteristics to obtain a current processing result of the current hyperbolic word vector, and processing each hyperbolic word vector to obtain a target translation statement corresponding to the statement to be processed as a target processing result. The server 200 may further send the target translation sentence to the terminal 400 through the network 300, and the terminal 400 may display the target translation sentence to the user through the client 410 in a manner of voice or interface display.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating to other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB), etc.;
a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the natural language processing device provided in the embodiments of the present application may be implemented in software, and fig. 2 shows a natural language processing device 255 stored in a memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: hyperbolic vector extraction module 2551, attention module 2552, hyperbolic transformation module 2553 and decoding prediction module 2554, which are logical and therefore can be arbitrarily combined or further split depending on the functions implemented. In some embodiments, the natural language processing device may be implemented as a neural network model, such as a natural language processing model; the hyperbolic vector extraction module, the attention module, the hyperbolic transformation module and the decoding prediction module can be implemented as a hyperbolic vector extraction layer, an attention layer, a hyperbolic transformation layer and a decoding prediction layer in a natural language processing model.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the natural language processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The natural language processing method provided by the embodiment of the present application will be described by taking an example of implementing the natural language processing device 255 on the server as a natural language processing model in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is an alternative flowchart of a natural language processing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
S101, obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a lorentz model.
In the embodiment of the present application, the sentence to be processed may be a natural language sentence with semantic information, which includes at least one word. The preset word vector table contains the corresponding relation between a plurality of embedded word vectors and hyperbolic space word vectors, and is used for mapping the embedded word vectors in the to-be-processed sentences to hyperbolic space. The natural language processing model can extract an embedded word vector sequence (word embedding) from the sentence to be processed through the hyperbolic vector layer, and look up in a preset word vector table according to each embedded word vector in the embedded word vector sequence to obtain a hyperbolic word vector corresponding to each embedded word vector, and further obtain the hyperbolic word vector sequence according to the hyperbolic word vector corresponding to each embedded word vector.
Here, the hyperbolic space has strong expressive power for structured data (for example, data that can be represented as trees or networks). In the embodiment of the present application, the structural features in the data can be expressed in the hyperbolic space through the Lorentz model. For example, a set of social network data has an inherent hierarchical network structure and can therefore be expressed well by the Lorentz model. Likewise, in the text domain, each sentence contains structured features such as a syntax tree, so representing the word vectors of the text with a Lorentz model allows the model to better learn syntactic knowledge.
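As a minimal illustration (not from the patent; the lifting scheme, names, and curvature K = -1 are assumptions), a Euclidean row from a word vector table can be placed on the Lorentz hyperboloid by computing the time-like 0-th component from the spatial components, so that every hyperbolic word vector in the sequence satisfies the Lorentz model constraint:

```python
import numpy as np

K = -1.0  # preset (negative) curvature of the hyperbolic space

def to_lorentz(spatial, K=K):
    # Lift a Euclidean vector onto the hyperboloid: choose x0 so that
    # -x0^2 + ||spatial||^2 = 1/K  (with K < 0).
    x0 = np.sqrt(np.dot(spatial, spatial) - 1.0 / K)
    return np.concatenate(([x0], spatial))

# Toy "preset word vector table": one Euclidean row per vocabulary id.
vocab = {"hyperbolic": 0, "space": 1, "model": 2}
table = np.random.default_rng(0).normal(scale=0.1, size=(len(vocab), 4))

sentence = ["hyperbolic", "space", "model"]
hyperbolic_word_vectors = [to_lorentz(table[vocab[w]]) for w in sentence]

for x in hyperbolic_word_vectors:
    # Each vector conforms to the Lorentz model: <x, x>_L == 1/K
    print(-x[0] ** 2 + np.dot(x[1:], x[1:]))  # approximately -1.0
```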
In some embodiments, for a training scenario or an application scenario of the natural language processing model, the preset word vector table may be subjected to gradient update based on a training result and an error between an application result and an expected result each time, so as to maintain the content accuracy of the preset word vector table, and further improve the accuracy of the model processing result.
S102, performing attention coding on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector.
In this embodiment of the application, the natural language processing model may perform linear transformation on each hyperbolic word vector in the hyperbolic word vector sequence to obtain a key vector, a value vector, and an inquiry word vector corresponding to each hyperbolic word vector, and further perform normalization processing according to the key vector and the inquiry word vector of each hyperbolic word vector when encoding the current hyperbolic word vector based on an attention mechanism, calculate an attention weight of each hyperbolic word vector relative to the current hyperbolic word vector, and further perform weighted summation on the value vector corresponding to each hyperbolic word vector according to the attention weight corresponding to each hyperbolic word vector to obtain an attention encoding feature corresponding to the current hyperbolic word vector. Therefore, the attention coding features corresponding to the current hyperbolic word vectors contain the correlation information of other hyperbolic word vectors and the current hyperbolic word vectors, so that the attention coding features can capture the relationship between the vectors, and the feature expression of the hyperbolic word vectors is enriched.
In some embodiments, the natural language processing model may use the calculation method of an existing hyperbolic neural network model: treat the value vector x_i corresponding to each hyperbolic word vector as a point in the hyperbolic space, use the logarithmic mapping to map the process of weighting x_i by the attention weight v_i and summing into the tangent space at that point, complete the weighted-summation calculation in the tangent space using Euclidean-space operations, and then map the calculation result in the tangent space back to the hyperbolic space through the exponential mapping to obtain the attention coding feature corresponding to the current hyperbolic word vector.
Here, it should be noted that the process of solving the centroid position of the point set in the hyperbolic space is a process of performing weighted summation on the lorentz squared distance between the point in the point set and the candidate centroid point according to different weight values, and taking the candidate centroid capable of minimizing the weighted summation result as the centroid of the point set. Therefore, the applicant finds that the weighted summation process for performing centroid solution in the hyperbolic space is similar to the process for performing weighted summation on each value vector according to the attention weight in the attention mechanism, so that the process for performing weighted summation on each value vector according to the attention weight in the attention mechanism of natural language processing can be realized by using the centroid solution process in the hyperbolic space to obtain the attention coding feature, and the details will be described in S1021-S1024.
S103, performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; and the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model.
In the embodiment of the application, the natural language processing model performs linear transformation on the attention coding feature through the preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature. In some embodiments, the hyperbolic transformation module in FIG. 2 may be implemented as a linear transformation layer in the natural language processing model. The linear transformation layer may also be called a fully connected layer: each neuron is connected to all neurons of the previous layer, realizing a linear combination or linear transformation of the previous layer. The core operation of the linear transformation layer is the matrix-vector product y = Mx; in essence, the input feature vector is weighted and summed, linearly transforming it from one feature space to another to obtain the output feature vector.
In the embodiment of the application, the natural language processing model can pre-train the linear transformation matrix meeting the preset constraint condition, so that after the input feature vector in the hyperbolic space is processed through the linear transformation matrix meeting the preset constraint condition, the obtained output feature vector is still located in the hyperbolic space.
In the embodiment of the application, the hyperbolic space can be represented by the Lorentz model

$$\mathbb{L}^n_K = \{x \in \mathbb{R}^{n+1} : \langle x, x\rangle_{\mathcal{L}} = 1/K,\ x_0 > 0\},$$

so that an input feature vector x in the hyperbolic space satisfies $x \in \mathbb{L}^n_K$. The constraint on the linear transformation matrix M can then be expressed as

$$\forall x \in \mathbb{L}^n_K:\ Mx \in \mathbb{L}^m_K,$$

that is, the input feature vector x and the corresponding output feature vector Mx are both located in the hyperbolic space. In other words, when an input feature vector in the hyperbolic space is processed by the trained linear transformation matrix M, it does not need to be mapped into the Euclidean tangent space; the computation can be carried out directly in the hyperbolic space to obtain the processing result.
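For intuition only (not part of the patent text; curvature K = -1 and all names are assumptions), the following check shows why the constraint is needed: an unconstrained matrix generally maps a point of the Lorentz model off the hyperboloid, whereas the constrained transformation must keep the Lorentzian inner product of the output equal to 1/K.

```python
import numpy as np

def lorentz_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

rng = np.random.default_rng(0)
spatial = rng.normal(size=3)
x = np.concatenate(([np.sqrt(1.0 + spatial @ spatial)], spatial))  # <x, x>_L = -1 = 1/K

M = rng.normal(size=(4, 4))      # arbitrary, unconstrained matrix
y = M @ x
print(lorentz_inner(x, x))       # -1.0: x lies on the hyperboloid
print(lorentz_inner(y, y))       # generally != -1.0: Mx has left the hyperboloid
```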
Here, it should be noted that in special relativity the Lorentz transformation is a linear transformation defined on the Lorentz model, and therefore a linear transformation in the hyperbolic space can be realized using Lorentz transformations. A Lorentz transformation maps an event in space-time from one space-time frame to another frame moving at constant velocity relative to it. By polar decomposition, any Lorentz transformation can be decomposed into the combination of a Lorentz boost (Lorentz acceleration) and a Lorentz rotation. However, the existing hyperbolic neural network calculation method of performing the linear transformation in the tangent space at the origin of the hyperbolic space is, even under relaxed restrictions, equivalent only to a Lorentz rotation and does not take Lorentz boosts into account, so the expressive power of existing hyperbolic neural networks is limited and insufficient to express all transformations in the hyperbolic space. In order to enhance the expressive power of the model, the applicant proposes a linear transformation method in the hyperbolic space that simultaneously covers Lorentz rotations and Lorentz boosts, which will be described in detail below.
In the embodiment of the application, a Lorentz rotation describes the transformation rule for relative motion in which the spatial coordinate axes are rotated. The Lorentz rotation matrix R can be written as formula (1):

$$R = \begin{bmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & \tilde{R} \end{bmatrix} \tag{1}$$

where $\tilde{R}^\top \tilde{R} = I$ and $\det(\tilde{R}) = 1$, namely $\tilde{R} \in SO(n)$.
In the embodiment of the application, a Lorentz boost (Lorentz acceleration) describes the transformation rule for relative motion between space-time frames at constant velocity, without rotation of the spatial coordinate axes. The Lorentz boost matrix B can be written as formula (2):

$$B = \begin{bmatrix} \gamma & -\gamma \boldsymbol{\beta}^\top \\ -\gamma \boldsymbol{\beta} & I + (\gamma - 1)\dfrac{\boldsymbol{\beta}\boldsymbol{\beta}^\top}{\|\boldsymbol{\beta}\|^2} \end{bmatrix} \tag{2}$$

where $\gamma = \dfrac{1}{\sqrt{1-\|\boldsymbol{\beta}\|^2}}$, c is the speed of light in vacuum, $\boldsymbol{\beta} = \mathbf{v}/c$ is the ratio of a given velocity vector v to the speed of light, and v is an arbitrary vector of the n-dimensional real vector space (i.e., $\mathbf{v} \in \mathbb{R}^n$) satisfying $\|\mathbf{v}\| < c$.
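As an illustrative check (not from the patent; the construction below is a standard one and the specific numbers are assumptions), both kinds of matrices satisfy the defining property of a Lorentz transformation, $M^\top \eta M = \eta$ with $\eta = \mathrm{diag}(-1, 1, \ldots, 1)$, and therefore map points of the hyperboloid to points of the hyperboloid:

```python
import numpy as np

n = 3
eta = np.diag([-1.0] + [1.0] * n)          # Minkowski metric diag(-1, 1, ..., 1)

# Lorentz rotation (formula (1)): embed a special orthogonal matrix in the spatial block.
theta = 0.3
R_tilde = np.eye(n)
R_tilde[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]]
R = np.block([[np.ones((1, 1)), np.zeros((1, n))],
              [np.zeros((n, 1)), R_tilde]])

# Lorentz boost (formula (2)) for a velocity ratio beta with ||beta|| < 1.
beta = np.array([0.3, 0.1, 0.0])
gamma = 1.0 / np.sqrt(1.0 - beta @ beta)
B = np.block([[np.array([[gamma]]), -gamma * beta[None, :]],
              [-gamma * beta[:, None],
               np.eye(n) + (gamma - 1.0) * np.outer(beta, beta) / (beta @ beta)]])

for M in (R, B):
    print(np.allclose(M.T @ eta @ M, eta))  # True: the Minkowski form is preserved
```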
It should be noted that although Lorentz boosts and Lorentz rotations are linear transformations in the Lorentz model, they cannot be applied directly to the neural network. On the one hand, a Lorentz transformation can only change the coordinate frame without changing the dimensionality. On the other hand, the complex requirements of the Lorentz transformation (such as the special orthogonal matrix of the Lorentz rotation) mean that the calculation would need constrained optimization, which cannot be realized under the current gradient-descent training framework. For this purpose, the problem of training a matrix M that satisfies the preset constraint $\forall x \in \mathbb{L}^n_K:\ Mx \in \mathbb{L}^m_K$ can be converted into training a matrix $M' = [\mathbf{u}^\top; W]$, where $\mathbf{u} \in \mathbb{R}^{n+1}$ and $W \in \mathbb{R}^{m \times (n+1)}$; $\mathbf{u}^\top$ is the 0-th row of the matrix M', and W contains the matrix elements of M' other than its 0-th row, providing the weights of the linear transformation. The matrix M' needs to satisfy the preset constraint $\forall x \in \mathbb{L}^n_K:\ f_x(M')\,x \in \mathbb{L}^m_K$, where $f_x(\cdot)$ can map any matrix to a suitable hyperbolic-space linear transformation matrix.
In some embodiments, for an input feature vector $x \in \mathbb{L}^n_K$ in the hyperbolic space, based on the above formula (1) and formula (2), an expression for $f_x(M')$ covering both the Lorentz rotation and the Lorentz boost, i.e., the preset Lorentz transformation function, can be obtained as shown in formula (3):

$$f_x(M') = f_x([\mathbf{u}^\top; W]) = \begin{bmatrix} \dfrac{\sqrt{\|Wx\|^2 - 1/K}}{\mathbf{u}^\top x}\, \mathbf{u}^\top \\[2ex] W \end{bmatrix} \tag{3}$$

where K is the preset curvature value of the hyperbolic space.
It can be understood that the linear transformation $f_x(M')$ of formula (3) in the hyperbolic space can cover the two transformation forms of Lorentz rotation and Lorentz boost, thereby improving the expressive power of the model and the accuracy of natural language processing.
In this embodiment of the application, the natural language processing model may use the attention coding feature obtained in S102 as an input feature vector, and perform linear transformation on the attention coding feature by using a preset lorentz transformation function of formula (3) to obtain a linear coding feature corresponding to the attention coding feature.
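A minimal numerical sketch of the linear transformation of formula (3) is given below, assuming curvature K = -1; all function and variable names are illustrative and not the patent's. The point to observe is that the transformed vector always lands back on the hyperboloid, so no logarithmic or exponential mapping is needed:

```python
import numpy as np

K = -1.0  # preset curvature of the hyperbolic space

def lorentz_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def lorentz_linear(x, u, W, K=K):
    # Formula (3): the transformed point f_x(M') x = [ sqrt(||Wx||^2 - 1/K) ; Wx ].
    # Note: u only appears in the rescaled 0-th row of f_x(M') and cancels when
    # computing the transformed point itself.
    Wx = W @ x
    time = np.sqrt(np.dot(Wx, Wx) - 1.0 / K)
    return np.concatenate(([time], Wx))

rng = np.random.default_rng(0)
spatial = rng.normal(size=4)
x = np.concatenate(([np.sqrt(spatial @ spatial - 1.0 / K)], spatial))  # x in the Lorentz model

u = rng.normal(size=5)          # 0-th row of M' (unconstrained)
W = rng.normal(size=(3, 5))     # remaining rows of M' (unconstrained)

y = lorentz_linear(x, u, W)
print(lorentz_inner(y, y))      # approximately 1/K = -1.0: the output stays in hyperbolic space
```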
S104, decoding and predicting the linear coding characteristics to obtain a current processing result of the current hyperbolic word vector, and processing each hyperbolic word vector to obtain a target processing result corresponding to the to-be-processed statement.
In the embodiment of the application, the natural language processing model may input the linear coding features corresponding to the current hyperbolic word vector into the decoding model of the corresponding neural network, and perform decoding prediction on the linear coding features through the decoding model to obtain a decoding prediction result corresponding to the current hyperbolic word vector as a current processing result. The natural language processing model processes each hyperbolic word vector in the hyperbolic word vector sequence in the same process to obtain a decoding prediction result corresponding to the whole hyperbolic word vector sequence, and the decoding prediction result is used as a target processing result corresponding to a sentence to be processed, so that natural language processing work under scenes such as machine translation, knowledge graph completion, fine-grained entity classification and the like is realized.
It can be understood that the hyperbolic space is represented by a Lorentz model, and a linear transformation matrix model satisfying the condition that the input and output feature vectors are both expressed by the Lorentz model is trained to obtain the preset Lorentz transformation function; therefore, when the preset Lorentz transformation function performs linear transformation on the input attention coding feature, the output linear coding feature is guaranteed to remain in the hyperbolic space, so that the model calculation of the hyperbolic neural network can be completed in the hyperbolic space, improving the stability and operational efficiency of the hyperbolic neural network model and thus the stability and efficiency of natural language processing using the hyperbolic neural network.
In some embodiments, referring to fig. 4, fig. 4 is an optional flowchart of the natural language processing method provided in the embodiment of the present application, and S102 in fig. 3 may be implemented by performing S1021 to S1024, which will be described with reference to each step.
S1021, performing key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set contains a query word vector corresponding to each hyperbolic word vector.
In the embodiment of the present application, the attention module in fig. 2 may be implemented as an attention layer in a natural language processing model. The attention layer may include three linear sub-network layers, and the natural language processing model may perform Key-Value pair conversion processing on each hyperbolic word vector in the hyperbolic word vector sequence through the three linear sub-network layers in the attention layer to obtain a Query word (Query) vector, a Key (Key) vector, and a Value (Value) vector corresponding to each hyperbolic word vector, and further obtain a Key set, a Query set, and a Value set corresponding to the hyperbolic word vector sequence.
In some embodiments, the key set formed by the key vectors corresponding to each hyperbolic word vector may be represented as $K = \{k_1, \ldots, k_{|K|}\}$, the query set formed by the query word vectors corresponding to each hyperbolic word vector may be represented as $Q = \{q_1, \ldots, q_{|Q|}\}$, and the value set formed by the value vectors corresponding to each hyperbolic word vector may be represented as $V = \{v_1, \ldots, v_{|V|}\}$.
And S1022, acquiring the current query word vector corresponding to the current hyperbolic word vector from the query set.
In the embodiment of the present application, when performing attention coding on the current hyperbolic word vector, the natural language processing model may determine, from the query set $Q = \{q_1, \ldots, q_{|Q|}\}$, the query word vector corresponding to the current hyperbolic word vector as the current query word vector.
And S1023, obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set.
In the embodiment of the application, the natural language processing model may obtain, when performing attention coding, an attention weight corresponding to each hyperbolic word vector with respect to the current hyperbolic word vector by calculating the Lorentzian squared distance between the current query word vector and each key vector in the key set, as the attention weight set corresponding to the current hyperbolic word vector.
In some embodiments, S1023 may be implemented by performing S1023-1 to S1023-2, which will be described in connection with the steps.
And S1023-1, calculating the Lorentzian squared distance between the current query word vector and each key vector.
In the embodiment of the present application, for the current query word vector $q_i$ in the query set $Q = \{q_1, \ldots, q_{|Q|}\}$, the natural language processing model can calculate, according to the calculation method of the Lorentzian squared distance, the Lorentzian squared distance $d_{\mathcal{L}}^2(q_i, k_j)$ between the current query word vector $q_i$ and each key vector $k_j$.
In the embodiment of the present application, the Lorentzian squared distance in the hyperbolic space can be expressed as $d_{\mathcal{L}}^2(a, b)$, and its calculation formula can be as shown in formula (4-1):

$$d_{\mathcal{L}}^2(a, b) = \|a - b\|_{\mathcal{L}}^2 = 2/K - 2\langle a, b\rangle_{\mathcal{L}} \tag{4-1}$$

In formula (4-1), K is the preset curvature value of the hyperbolic space, and $\langle a, b\rangle_{\mathcal{L}}$ is the Minkowski (Lorentzian) inner product of a and b, which can be written as formula (4-2):

$$\langle a, b\rangle_{\mathcal{L}} = -a_0 b_0 + \sum_{i=1}^{n} a_i b_i \tag{4-2}$$
In the embodiment of the application, the natural language processing model can take the current query word vector $q_i$ as a and each key vector $k_j$ in the key set as b, and calculate the Lorentzian squared distance between the current query word vector and each key vector in the key set according to formula (4-1) and formula (4-2).
S1023-2, negating the Lorentz squared distance, and performing normalized exponential processing on the ratio of the negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as an attention weight set.
In the embodiment of the application, the natural language processing model can negate the Lorentz squared distance between the current query word vector and each key vector to obtain a negated result; divide the negated result by a preset constant and then perform normalized exponential (softmax) processing; take the calculation result as the attention weight of each key vector relative to the current query word vector; and further obtain, over all key vectors in the key set, the attention weight set corresponding to the current hyperbolic word vector.
In some embodiments, the preset constant may take the value of the square root of the preset vector dimension. For example, for a preset vector dimension n, the preset constant may take the value √n.
The process of calculating the attention weight of each key vector with respect to the current query word vector in S1023-2 may be as shown in equation (4-3) as follows:
$w_{ij} = \frac{\exp\left(-d_{\mathcal{L}}^{2}(q_i, k_j)/\sqrt{n}\right)}{\sum_{l} \exp\left(-d_{\mathcal{L}}^{2}(q_i, k_l)/\sqrt{n}\right)}$    (4-3)

In equation (4-3), the natural language processing model negates the Lorentz squared distance between the current query word vector q_i and each key vector k_j to obtain $-d_{\mathcal{L}}^{2}(q_i, k_j)$, takes the ratio of the negated result to the square root of the preset vector dimension $\sqrt{n}$, and performs normalized exponential (softmax) processing on this ratio to obtain the attention weight w_ij of the key vector k_j relative to q_i, i.e. the attention weight of the jth hyperbolic word vector relative to the current ith hyperbolic word vector. The natural language processing model can thus calculate, according to formula (4-3), the attention weights of all key vectors in the key set relative to q_i, as the attention weight set w_i corresponding to the current hyperbolic word vector.
In some embodiments, the preset constant may also be set to another value according to the actual situation, which is not limited in the embodiments of the present application.
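A short sketch of the weight computation in S1023-2 under the same assumed conventions (√n as the preset constant; helper names are illustrative):

```python
import numpy as np

K = -1.0  # assumed curvature, as in the earlier sketch

def lorentz_sq_dist(a, b):
    return 2.0 / K - 2.0 * (-a[0] * b[0] + np.dot(a[1:], b[1:]))

def attention_weights(q_i, keys, n):
    """Formula (4-3): softmax over -d_L^2(q_i, k_j) / sqrt(n) across all key vectors."""
    scores = np.array([-lorentz_sq_dist(q_i, k) / np.sqrt(n) for k in keys])
    scores -= scores.max()            # stabilise the exponentials before the softmax
    w = np.exp(scores)
    return w / w.sum()                # attention weight set w_i for the current word vector
```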
S1024, obtaining attention coding features through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In the embodiment of the application, since solving the centroid in the hyperbolic space is itself a weighted summation process, similar to the weighted summation of value vectors according to attention weights in the attention mechanism, the calculation of weighting and summing each value vector according to its attention weight can be carried out directly in the hyperbolic space by using a preset hyperbolic space centroid calculation method.
In the embodiment of the present application, the preset hyperbolic space centroid calculation method can be as shown in formula (5-1), as follows:

$\mu_{c} = \arg\min_{\mu \in \mathbb{L}^{n}_{K}} \sum_{i} \nu_{i}\, d_{\mathcal{L}}^{2}(x_{i}, \mu)$    (5-1)

In the formula (5-1), x_i represents the ith point in the hyperbolic space point set, ν_i represents the weight of the ith point, the summation index i runs over the number of point vectors, and μ_c represents the centroid position of the hyperbolic space. The formula (5-1) characterizes that, in the hyperbolic space $\mathbb{L}^{n}_{K}$, the point μ is solved such that the weighted sum of the squared distances from the other points x_i in the hyperbolic space to the point μ reaches the minimum value, and this point μ is taken as the centroid position μ_c of the hyperbolic space.
Here, when the centroid solving formula (5-1) of the hyperbolic space is applied to the scenario of neural network model operation, x_i may represent any vector in the hyperbolic feature space of the neural network, and the weighted summation process in the hyperbolic space is realized through formula (5-1). In some embodiments, when applying formula (5-1) to the weighted summation process in the attention mechanism, the value vector corresponding to each hyperbolic word vector may be taken as x_i, and the attention weight w_i may be taken as ν_i. In order to obtain an analytic solution of formula (5-1), the natural language processing model may perform weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculate an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set, where the attention weight normalization factor is used to constrain the initial attention feature to be in the hyperbolic space; and take the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector. The above process can be shown as equation (5-2), as follows:
$\mu = \frac{\sum_{i} w_{i} x_{i}}{\sqrt{-K}\,\left|\left\|\sum_{i} w_{i} x_{i}\right\|_{\mathcal{L}}\right|}$    (5-2)

In equation (5-2), $\|a\|_{\mathcal{L}} = \sqrt{|\langle a, a\rangle_{\mathcal{L}}|}$ denotes the Lorentzian norm. The natural language processing model may, based on the attention weight w_i corresponding to each hyperbolic word vector in the attention weight set, perform weighted summation on the value vectors x_i corresponding to each hyperbolic word vector to obtain $\sum_{i} w_{i} x_{i}$ as the initial attention feature. The natural language processing model then calculates, based on the preset curvature value K of the hyperbolic space, the attention weight set and the value set, the attention weight normalization factor $\sqrt{-K}\,\left|\left\|\sum_{i} w_{i} x_{i}\right\|_{\mathcal{L}}\right|$ corresponding to the current hyperbolic word vector, i.e. the denominator portion in equation (5-2). The attention weight normalization factor is used to constrain the initial attention feature to be in the hyperbolic space. Finally, the natural language processing model takes the ratio μ of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
It can be understood that, in the embodiment of the present application, the attention weight corresponding to the current hyperbolic word vector in the hyperbolic space is defined by the Lorentz squared distance, and the attention coding feature corresponding to the current hyperbolic word vector is then obtained by using the centroid solving process in the hyperbolic space, so that an attention layer component of the hyperbolic neural network that performs its operations completely in the hyperbolic space can be constructed, the coding operation of the attention layer in the hyperbolic space is achieved, mapping to the Euclidean space corresponding to the tangent space is not required, the stability and efficiency of the hyperbolic neural network model are further improved, and the stability and efficiency of natural language processing are further improved.
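A sketch of the centroid-style aggregation of S1024 under the same assumptions (value vectors lying on the hyperboloid with ⟨x, x⟩_L = 1/K, K < 0); the normalisation mirrors the description of formula (5-2), and the names are illustrative:

```python
import numpy as np

K = -1.0  # assumed curvature value

def lorentz_norm(a):
    """Absolute Lorentzian norm |<a, a>_L| ** 0.5."""
    return np.sqrt(abs(-a[0] * a[0] + np.dot(a[1:], a[1:])))

def hyperbolic_centroid(weights, values):
    """Formula (5-2): weighted sum of value vectors renormalised back onto the hyperboloid."""
    num = np.sum([w * v for w, v in zip(weights, values)], axis=0)  # initial attention feature
    denom = np.sqrt(-K) * lorentz_norm(num)                         # attention weight normalisation factor
    return num / denom                                              # attention coding feature
```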
In some embodiments, referring to fig. 5, fig. 5 is an optional flowchart of the natural language processing method provided in the embodiments of the present application, and based on fig. 3 or fig. 4, after S101, S201 to S202 may also be executed, which will be described with reference to each step.
S201, transmitting a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through Lorentz parallel transmission, to obtain a plurality of position vector information corresponding to each hyperbolic word vector.
In the embodiment of the application, for each hyperbolic word vector, the natural language processing model can obtain the position of each hyperbolic word vector in the sentence and use it to construct the position embedding vector p_i corresponding to each hyperbolic word vector, with p_i constrained to lie in the origin tangent space. In order to enhance the expressive power of the model by using multiple tangent spaces instead of only the origin tangent space, the natural language processing model can transmit the position embedding vector in parallel, by a Lorentz parallel transmission method, to different tangent spaces corresponding to different points in the hyperbolic space, so as to obtain a plurality of position vector information corresponding to each hyperbolic word vector. In some embodiments, the Lorentz parallel transmission method may be as shown in formula (6).
$P_{o \to y}(p) = p - \frac{\langle y, p\rangle_{\mathcal{L}}}{\langle o, y\rangle_{\mathcal{L}} + 1/K}\,(o + y)$    (6)

The natural language processing model may transmit the position embedding vector in parallel, by using formula (6), to different tangent spaces corresponding to different points in the hyperbolic space, so as to obtain the plurality of position vector information r_i shown in formula (7), as follows:

$r_{i} = P_{o \to y_{i}}(p_{i}),\quad y_{i} = f(z_{i})$    (7)

In formula (7), o denotes the origin of the hyperbolic space, p_i is the position embedding vector located in the origin tangent space, and f determines the point y_i at whose tangent space the position embedding vector should be transmitted.
S202, updating each hyperbolic word vector by using the plurality of position vector information.
In this embodiment of the application, the natural language processing model may combine the plurality of location vector information obtained in S201 with each hyperbolic word vector, and update each hyperbolic word vector, so that when encoding and decoding each hyperbolic word vector, the accuracy of encoding and decoding may be further improved by using the location information corresponding to each hyperbolic word vector.
It can be understood that in the embodiment of the application, through Lorentz parallel transmission, the position information in the multiple tangent spaces corresponding to each hyperbolic word vector in the hyperbolic space can be obtained, so that the expressive capacity of the model is enhanced, and the accuracy of the natural language processing task performed by the natural language processing model is improved.
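A rough sketch of the parallel-transmission step: a position embedding that lives in the origin tangent space (its 0th coordinate is zero) is moved to the tangent space at another hyperboloid point. The closed-form transport used below is the standard one for this sign convention and is an assumption about how formula (6) is instantiated, as are the dimensionality and function names.

```python
import numpy as np

K = -1.0      # assumed curvature value
DIM = 3       # assumed spatial dimensionality, for illustration only

def minkowski_inner(a, b):
    return -a[0] * b[0] + np.dot(a[1:], b[1:])

def origin():
    """Origin of the hyperboloid: (1/sqrt(-K), 0, ..., 0)."""
    o = np.zeros(DIM + 1)
    o[0] = 1.0 / np.sqrt(-K)
    return o

def transport_from_origin(p, y):
    """Move a vector p from the origin tangent space to the tangent space at point y."""
    o = origin()
    coef = minkowski_inner(y, p) / (minkowski_inner(o, y) + 1.0 / K)
    return p - coef * (o + y)
```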
In some embodiments, referring to fig. 6, fig. 6 is an optional flowchart of the natural language processing method provided in the embodiments of the present application, and after S103, S301 to S303 may be further executed, which will be described with reference to the steps.
S301, linear correction is carried out on the linear coding features through a preset residual error network, and coding residual error correction results corresponding to the linear coding features are obtained.
In the embodiment of the application, the output of the hyperbolic linear transformation layer in the natural language processing model may be connected to a preset residual error network, that is, a residual error layer. The hyperbolic transformation layer correspondingly realizes a linear transformation method of a preset Lorentz transformation function, and because vector addition is difficult to define in a hyperbolic space, the application provides a method for combining a residual error layer with a previous network layer to perform residual error processing in the hyperbolic space.
In this embodiment of the application, when the upper network layer of the residual layer is the hyperbolic linear transformation layer, the natural language processing model may combine the hyperbolic linear transformation layer and the residual layer as the first network combination layer, use the attention coding feature as the input of the first network combination layer, and first perform linear transformation on the attention coding feature through the hyperbolic linear transformation layer and using a preset lorentz transformation function to obtain the linear coding feature. And then, linear correction is carried out on the linear coding characteristics by using a preset residual correction function through a residual layer in the first network combination layer, namely a preset residual network, so as to obtain a coding residual correction result.
S302, combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature.
In the embodiment of the application, the natural language processing model takes the input, namely the attention coding characteristic, of the first network combination layer as the residual corresponding to the first network combination layer, and combines the coding residual correction result and the attention coding characteristic to realize the residual calculation of the first network combination layer, so as to obtain the intermediate linear coding characteristic.
S303, updating the linear coding characteristics according to the first constraint factor and the intermediate linear coding characteristics; the first constraint factor is used to constrain the linear coding features to be in a hyperbolic space.
In the embodiment of the application, the natural language processing model can perform feature matrix construction according to the first constraint factor and the intermediate linear coding feature, and update the linear coding feature, so that subsequent steps can be performed based on the updated linear coding feature. The first constraint factor is used to constrain the linear coding feature to be in the hyperbolic space, that is, the processing of the residual layer in the embodiment of the present application is also completed in the hyperbolic space.
In some embodiments, denoting the linear coding feature output in S103 as y_i and the attention coding feature as x_i, the process of S301-S303 can be as shown in equation (8), as follows:

$\tilde{y}_{i} = \left[\sqrt{\left\|\phi(W y_{i}, u) + x_{i}\right\|^{2} - 1/K}\ ;\ \ \phi(W y_{i}, u) + x_{i}\right]$    (8)

In the formula (8), φ(Wy_i, u) is the coding residual correction result obtained by the preset residual network, φ(Wy_i, u) + x_i is the intermediate linear coding feature, and the first constraint factor $\sqrt{\left\|\phi(W y_{i}, u) + x_{i}\right\|^{2} - 1/K}$ is used as the 0th dimension of the feature matrix corresponding to the updated linear coding feature, so that the updated linear coding feature is still constrained in the hyperbolic space.
In some embodiments, a residual layer may also be combined with the attention layer to perform residual correction on the attention coding features output by the attention layer. That is, after the attention coding feature is obtained in S102, the following steps may also be performed: performing linear correction on the attention coding features through a preset residual error network to obtain an attention residual error correction result corresponding to the attention coding features; combining the attention residual error correction result with the current hyperbolic word vector to obtain an intermediate attention coding feature; updating the attention coding feature according to the second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention coding feature to be in the hyperbolic space. Here, the attention layer usually includes a linear processing sublayer, and the attention coding feature is output to the residual layer by interfacing the linear processing sublayer with the residual layer, so that the process of performing residual processing on the output of the attention layer by the residual layer is similar to the process of performing residual processing in S301 to S303, and is not described herein again.
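An illustrative sketch of the residual update of S301-S303, under the assumption that the residual correction and the skip connection are added in the spatial coordinates and that the first constraint factor recomputes the 0th coordinate so that the result again satisfies ⟨x, x⟩_L = 1/K; φ is modelled here as a ReLU, and W, u and the function name are stand-ins rather than the patent's exact residual network:

```python
import numpy as np

K = -1.0  # assumed curvature value

def residual_update(linear_feature, attention_feature, W, u):
    """Hyperbolic residual step: correction + skip connection, then re-projection.

    linear_feature    : y_i, output of the hyperbolic linear transformation layer
    attention_feature : x_i, input of the first network combination layer
    """
    correction = np.maximum(W @ linear_feature + u, 0.0)   # coding residual correction result
    spatial = correction + attention_feature[1:]           # intermediate linear coding feature (spatial part)
    t = np.sqrt(np.dot(spatial, spatial) - 1.0 / K)        # first constraint factor -> 0th coordinate
    return np.concatenate(([t], spatial))                  # updated linear coding feature
```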
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In an actual natural language processing project, a vector extraction layer, an attention layer, a linear transformation layer and a residual error layer which can perform data operations completely in the hyperbolic space can be respectively constructed according to the method in the embodiment of the application. In some embodiments, a fully hyperbolic network model HYBONET may be constructed by using all of these hyperbolic-space neural network components. In order to verify the effectiveness of HYBONET, a comparison test against current natural language processing models can be carried out in a plurality of representative scenarios, including knowledge graph completion, machine translation and fine-grained entity classification.
In some embodiments, for the natural language processing scenario of knowledge graph completion, given a head entity h and a relationship r, all tail entities may be scored and ranked by the natural language processing model as prediction results. The applicant uses HYBONET provided in the embodiment of the present application to perform a comparison test with current natural language processing models in other hyperbolic spaces or in Euclidean space on two common knowledge graphs, FB15k-237 and WN18RR, and evaluates the experimental results through two evaluation indexes, MRR (Mean Reciprocal Rank) and H@k, where the comparison results can be shown in Table 1:
TABLE 1 (comparison results of HYBONET and baseline models on FB15k-237 and WN18RR, reported as MRR and H@k for k = 10, 3 and 1; table data not reproduced in this text)
In Table 1, the MRR index represents the average reciprocal rank of the correct entity in the predictions; H@k characterizes the percentage of cases in which the correct entity appears in the top k positions of the predicted ranking. The H@k indexes corresponding to k = 10, 3 and 1 are shown in Table 1, respectively. MURP (Balazevic et al., 2019a) and LORENTZ (TANGENT) are current hyperbolic-space neural network models, while TRANSE (Bordes et al., 2013), DISTMULT (Yang et al., 2015), COMPLEX (Trouillon et al., 2017), CONVE (Dettmers et al., 2018), ROTATE (Sun et al., 2019) and TUCKER (Balazevic et al., 2019b) are neural network models in Euclidean space. The optimal values of the indicators among the hyperbolic neural network models are shown in bold in Table 1, and the optimal values of the indicators among all the neural network models are shown in bold and underlined.
In some embodiments, for the fine-grained entity classification scenario, the applicant performs a comparison experiment, based on the Open Entity dataset, between hyperbolic neural network models of different scales (HY BASE, HY LARGE and HY XLARGE) constructed by the method of the embodiment of the present application and other natural language processing models, where the Open Entity dataset divides types into three levels: coarse, fine and ultra-fine granularity. For each entity e, the model gives a score s(t_i | e) for every type and converts the score into a probability p(t_i | e) = σ(s(t_i | e)) using a sigmoid function; the entity e can be considered to belong to a certain type if the probability of that type is greater than 0.5. The results of the comparative experiments are shown in Table 2, where C, F and UF respectively represent the F1 index scores at the three different granularity levels of entity classification, and Total represents the F1 index score over the three levels combined. In some embodiments, the calculation formula of the F1 index may be F1 = 2 × precision × recall / (precision + recall), where precision represents the precision rate and recall represents the recall rate. The underlines in Table 2 show the optimal values of the indicators among all the neural network models, and the optimal values of the indicators among the hyperbolic neural network models are shown in bold.
TABLE 2 (fine-grained entity classification results on the Open Entity dataset, reported as F1 scores for C, F, UF and Total; table data not reproduced in this text)
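As a small worked sketch of the entity-typing evaluation described above (the raw scores below are invented for illustration): type scores are squashed with a sigmoid, types with probability above 0.5 are predicted, and F1 combines precision and recall.

```python
import numpy as np

def predict_types(scores, threshold=0.5):
    """Turn raw type scores s(t_i | e) into predicted types via the sigmoid p = sigma(s)."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return probs > threshold

def f1(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall)."""
    return 2.0 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(predict_types([2.3, -0.7, 0.1]))   # -> [ True False  True ]
print(round(f1(0.8, 0.6), 3))            # -> 0.686
```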
In some embodiments, for the machine translation scenario, the applicant conducted comparison experiments on three models, HYBONET, TRANSFORMER and LORENTZ (TANGENT), on the IWSLT'14 EN-DE and WMT'16 EN-DE translation benchmark datasets, scored using the Bilingual Evaluation Understudy (BLEU) index, with the experimental results of each model shown in Table 3:
TABLE 3 (BLEU scores of HYBONET, TRANSFORMER and LORENTZ (TANGENT) on IWSLT'14 EN-DE and WMT'16 EN-DE; table data not reproduced in this text)
In some embodiments, based on the WN18RR data set, the FB15k-237 data set, the IWSLT14 data set, and the Open Entity data set, respectively, the applicant performs comparative training between a hyperbolic space model such as HyboNet, constructed from the hyperbolic neural network components provided in the embodiments of the present application, and other current network models, such as the MuRP model or a Euclidean model, to obtain convergence comparison results of the models during training, as shown in fig. 7(a), 7(b), 7(c), and 7(d).
Based on the experimental results, the hyperbolic network model constructed by the embodiment of the application is superior to the Euclidean network model under the condition that the parameters are equivalent or less. Compared with the existing hyperbolic network model depending on the tangent space, the full-hyperbolic network model of the embodiment of the application has better convergence, and simultaneously realizes equivalent or even better performance.
Continuing with the exemplary structure of the natural language processing device 255 implemented as software modules provided in embodiments of the present application, in some embodiments, as shown in fig. 2, the software modules stored in the natural language processing device 255 of the memory 240 may include:
a hyperbolic vector extraction module 2551, an attention module 2552, a hyperbolic linear transformation module 2553 and a decoding prediction module 2554, wherein,
the hyperbolic vector extraction module 2551 is configured to obtain a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module 2552 is configured to perform attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module 2553 is configured to perform linear transformation on the attention coding feature through a preset lorentz transformation function to obtain a linear coding feature corresponding to the attention coding feature; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
the decoding prediction module 2554 is configured to perform decoding prediction on the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtain a target processing result corresponding to the to-be-processed sentence by processing each hyperbolic word vector.
In some embodiments, the attention module 2552 is further configured to perform key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set, and a query set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector; acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set; obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set; obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
In some embodiments, the attention module 2552 is further configured to calculate a lorentz squared distance between the current query word vector and each of the key vectors; and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
In some embodiments, the attention module 2552 is further configured to perform a weighted summation on the value set according to the attention weight set to obtain an initial attention feature; calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space; and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
In some embodiments, the hyperbolic vector extraction module 2551 is further configured to, after obtaining a hyperbolic word vector sequence corresponding to a to-be-processed sentence according to a preset word vector table in a hyperbolic space, transmit a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space through lorentz parallel transmission to obtain a plurality of position vector information corresponding to each hyperbolic word vector; updating each hyperbolic word vector using the plurality of location vector information.
In some embodiments, the natural language processing apparatus further includes a preset residual network, where the preset residual network is configured to, after the attention coding feature is linearly transformed through the preset Lorentz transformation function to obtain the linear coding feature corresponding to the attention coding feature, perform linear correction on the linear coding feature to obtain a coding residual correction result corresponding to the linear coding feature; combine the coding residual correction result with the attention coding feature to obtain an intermediate linear coding feature; and update the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding feature to be in the hyperbolic space.
In some embodiments, the preset residual error network is configured to, after attention coding is performed on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector, perform linear correction on the attention coding feature to obtain an attention residual error correction result corresponding to the attention coding feature; combine the attention residual error correction result with the current hyperbolic word vector to obtain an intermediate attention coding feature; and update the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention coding feature to be in the hyperbolic space.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform the natural language processing method provided by embodiments of the present application, for example, the method as shown in fig. 3, 4, 5, and 6.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiment of the application, the hyperbolic space is represented by the lorentz model, the linear transformation matrix model satisfying that the input and output feature vectors all conform to the lorentz model expression is trained, and the preset lorentz transformation function can be obtained; therefore, when the preset Lorentz transformation function is used for carrying out linear transformation processing on the input attention coding features, the output linear coding features can be ensured to be still located in the hyperbolic space, so that model calculation of the hyperbolic neural network can be completed in the hyperbolic space, the stability and the operation efficiency of the hyperbolic neural network model are improved, and the stability and the efficiency of natural language processing by using the hyperbolic neural network are improved. In addition, the linear transformation process in the embodiment of the application covers the expressions of Lorentz rotation and Lorentz acceleration, and the accuracy of natural language processing is improved. Furthermore, the attention coding weight corresponding to the current hyperbolic word vector in the hyperbolic space is defined through the Lorentz square distance, the attention coding feature corresponding to the current hyperbolic word vector is obtained through the centroid solving process in the hyperbolic space, an attention layer component of the hyperbolic neural network which is operated completely in the hyperbolic space can be constructed, the coding operation of the attention layer in the hyperbolic space is achieved, mapping to the Euclidean space corresponding to the tangent space is not needed, the stability and the efficiency of the hyperbolic neural network model are further improved, and the stability and the efficiency of natural language processing are further improved. Furthermore, through Lorentz parallel transmission, the position information in a plurality of tangent spaces corresponding to each hyperbolic word vector in the hyperbolic space can be obtained, so that the model expression capacity is enhanced, and the accuracy of the natural language processing task of the natural language processing model is improved. Further, the embodiment of the application also defines a residual error processing method in the hyperbolic space. An experimental result in a practical scene shows that the hyperbolic network model constructed by the embodiment of the application is superior to the Euclidean network model under the condition of equivalent or less parameters. Compared with the existing hyperbolic network model depending on the tangent space, the full-hyperbolic network model of the embodiment of the application has better convergence, and simultaneously realizes equivalent or even better performance.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A natural language processing method, comprising:
obtaining a hyperbolic word vector sequence corresponding to a sentence to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the to-be-processed statement by processing each hyperbolic word vector.
2. The method of claim 1, wherein said performing attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain attention coding features corresponding to the current hyperbolic word vector comprises:
performing key-value pair conversion on the hyperbolic word vector sequence to obtain a key set, a value set and an inquiry set corresponding to the hyperbolic word vector sequence; the key set comprises key vectors corresponding to each hyperbolic word vector; the query set comprises query word vectors corresponding to each hyperbolic word vector;
acquiring a current query word vector corresponding to the current hyperbolic word vector from the query set;
obtaining an attention weight set corresponding to the current hyperbolic word vector by calculating the Lorentz square distance between the current query word vector and each key vector in the key set;
obtaining the attention coding feature through a preset hyperbolic space centroid calculation method based on the attention weight set and the value set; the preset hyperbolic space centroid calculation method is used for calculating the centroid position of the hyperbolic space through weighted summation.
3. The method of claim 2, wherein obtaining the attention weight set corresponding to the current hyperbolic word vector by calculating a Lorentzian squared distance between the current query word vector and each key vector in the set of keys comprises:
calculating a Lorentzian squared distance between the current query word vector and each of the key vectors;
and negating the Lorentz square distance, and performing normalization exponential processing on the ratio of a negation result to a preset constant to obtain the attention weight of each key vector relative to the current query word vector as the attention weight set.
4. The method of claim 2, wherein obtaining the attention-coding feature by a predetermined hyperbolic spatial centroid calculation method based on the set of attention weights and the set of values comprises:
carrying out weighted summation on the value set according to the attention weight set to obtain an initial attention feature;
calculating an attention weight normalization factor based on the preset curvature of the hyperbolic space, the attention weight set and the value set; the attention weight normalization factor is used to constrain the initial attention feature to be in hyperbolic space;
and taking the ratio of the initial attention feature to the attention weight normalization factor as the attention coding feature corresponding to the current hyperbolic word vector.
5. The method according to any one of claims 1 to 4, wherein after obtaining the hyperbolic word vector sequence corresponding to the sentence to be processed according to the preset word vector table in the hyperbolic space, the method further comprises:
transmitting a position embedding vector corresponding to each hyperbolic word vector in the hyperbolic word vector sequence, through Lorentz parallel transmission, to a plurality of tangent spaces corresponding to a plurality of points in the hyperbolic space, to obtain a plurality of position vector information corresponding to each hyperbolic word vector;
updating each hyperbolic word vector using the plurality of location vector information.
6. The method according to any one of claims 1 to 4, wherein after the attention-coding feature is linearly transformed by a preset Lorentzian transformation function to obtain a linear coding feature corresponding to the attention-coding feature, the method further comprises:
performing linear correction on the linear coding features through a preset residual error network to obtain coding residual error correction results corresponding to the linear coding features;
combining the coding residual error correction result with the attention coding feature to obtain an intermediate linear coding feature;
updating the linear coding feature according to a first constraint factor and the intermediate linear coding feature; the first constraint factor is used for constraining the linear coding features to be in the hyperbolic space.
7. The method according to any one of claims 1-4, wherein after the attention coding is performed on the current hyperbolic word vector in the hyperbolic word vector sequence to obtain the attention coding feature corresponding to the current hyperbolic word vector, the method further comprises:
performing linear correction on the attention coding features through a preset residual error network to obtain an attention residual error correction result corresponding to the attention coding features;
combining the attention residual error correction result with the current hyperbolic word vector to obtain intermediate attention coding features;
updating the attention coding feature according to a second constraint factor and the intermediate attention coding feature; the second constraint factor is used to constrain the attention-coding feature to be in the hyperbolic space.
8. A natural language processing apparatus, comprising: a hyperbolic vector extraction module, an attention module, a hyperbolic linear transformation module and a decoding prediction module, wherein,
the hyperbolic vector extraction module is used for obtaining a hyperbolic word vector sequence corresponding to a statement to be processed according to a preset word vector table in a hyperbolic space; the expression of each hyperbolic word vector in the hyperbolic word vector sequence conforms to a Lorentz model;
the attention module is used for carrying out attention coding on a current hyperbolic word vector in the hyperbolic word vector sequence to obtain an attention coding feature corresponding to the current hyperbolic word vector;
the hyperbolic linear transformation module is used for performing linear transformation on the attention coding features through a preset Lorentz transformation function to obtain linear coding features corresponding to the attention coding features; the preset Lorentz transformation function is obtained by training a linear transformation matrix model meeting preset constraint conditions; the preset constraint condition is used for constraining the linear transformation matrix model to process the input characteristic vector which accords with the Lorentz model, and the obtained output characteristic vector accords with the Lorentz model;
and the decoding prediction module is used for decoding and predicting the linear coding features to obtain a current processing result of the current hyperbolic word vector, and obtaining a target processing result corresponding to the statement to be processed by processing each hyperbolic word vector.
9. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in the memory.
10. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 7.
CN202110457927.8A 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium Pending CN113761829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110457927.8A CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110457927.8A CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113761829A true CN113761829A (en) 2021-12-07

Family

ID=78786895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110457927.8A Pending CN113761829A (en) 2021-04-26 2021-04-26 Natural language processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113761829A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933697A (en) * 2023-09-18 2023-10-24 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language
CN116933697B (en) * 2023-09-18 2023-12-08 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination