WO2022001724A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2022001724A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
conformal
data
space
model
Prior art date
Application number
PCT/CN2021/101225
Other languages
English (en)
French (fr)
Inventor
朱煜东 (Zhu Yudong)
肖镜辉 (Xiao Jinghui)
周迪 (Zhou Di)
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21832153.7A (EP4152212A4)
Publication of WO2022001724A1
Priority to US18/084,267 (US20230117973A1)

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING; G06F18/00 Pattern recognition; G06F18/20 Analysing
        • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods (under G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation)
        • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (under G06F18/24 Classification techniques)
        • G06F18/2451 Classification techniques relating to the decision surface linear, e.g. hyperplane (under G06F18/245 Classification techniques relating to the decision surface)
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks
        • G06N3/045 Combinations of networks (under G06N3/04 Architecture, e.g. interconnection topology)
        • G06N3/048 Activation functions (under G06N3/04)
        • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means (under G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons)
        • G06N3/084 Backpropagation, e.g. using gradient descent (under G06N3/08 Learning methods)
        • G06N3/09 Supervised learning (under G06N3/08)
        • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn (under G06N3/08)

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a data processing method and device.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
  • Shallow networks are characterized by fast training and prediction and low resource consumption.
  • In such networks, the input and intermediate layers are constructed in Euclidean space, so the model's descriptive capacity and parameter distribution are constrained by the properties of Euclidean geometry.
  • The present application provides a data processing method, the method comprising:
  • The training device may acquire the data to be processed and the corresponding category labels.
  • The data to be processed may include at least one of the following: natural language data, knowledge graph data, genetic data, or image data.
  • The category label depends on the type of task the neural network is being trained to perform. For example, for a neural network used for text classification, the category label is the category of the data to be processed; for a neural network used for semantic recognition, the category label is the semantics of the data to be processed.
  • Natural language data includes multiple words, and hierarchical (hypernym-hyponym) relationships exist between the words.
  • Natural language data can therefore be understood as data with a tree-like hierarchical structure.
  • The neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed, expressed in hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result.
  • An embodiment of the present application provides a data processing method, the method including: acquiring data to be processed and corresponding category labels; processing the data to be processed by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract the feature vector of the data to be processed, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result; and obtaining a loss based on the category labels and the processing result, and updating the neural network based on the loss to obtain an updated neural network.
  • Owing to the properties of hyperbolic space itself, expressing the feature vector in hyperbolic space can enhance the fitting ability of the neural network model and improve its data processing accuracy on data sets containing tree-like hierarchical structures; for example, it can improve text classification accuracy. Moreover, a neural network model constructed in hyperbolic space can greatly reduce the number of model parameters while improving the model's fitting ability.
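One way to see why hyperbolic space suits tree-like data is to look at its distance function. The Python sketch below assumes the Poincaré ball model with curvature -c and the standard closed-form geodesic distance; neither the model choice nor the function name `poincare_distance` comes from the application. It compares two pairs of points with the same Euclidean gap of 0.05, one pair near the origin and one near the boundary.

```python
import math

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between points x, y of the Poincare ball with curvature -c.

    Both points must lie strictly inside the ball (c * ||p||^2 < 1).
    """
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    # Argument of arcosh in the closed-form Poincare-ball distance.
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)

# Same Euclidean gap (0.05): one pair near the origin, one near the boundary.
near_center = poincare_distance([0.0, 0.0], [0.05, 0.0])
near_edge = poincare_distance([0.90, 0.0], [0.95, 0.0])
print(f"{near_center:.3f} {near_edge:.3f}")
```

The pair near the boundary comes out roughly seven times farther apart (about 0.72 versus 0.10), because distances grow rapidly as points approach the edge of the ball. That extra room near the boundary is what lets a low-dimensional ball hold the exponentially many leaves of a tree.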
  • The data to be processed includes at least one of the following:
  • Natural language data, knowledge graph data, genetic data, or image data.
  • The classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, where the activation function includes an operation rule of hyperbolic space.
  • That is, the activation function can be expressed using the operation rules of hyperbolic space.
  • The operation rule of hyperbolic space includes at least one of the following: Möbius matrix multiplication and Möbius addition.
  • In the formulas, *Mobius denotes Möbius matrix multiplication and +Mobius denotes Möbius addition.
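The two operation rules can be sketched in Python on the Poincaré ball with curvature -c, using the closed forms of Möbius addition and Möbius matrix-vector multiplication commonly used in the hyperbolic neural network literature. The application does not state these formulas, so they are an assumption here. So is `hyperbolic_unit`, a hypothetical neural unit that computes (M *Mobius x) +Mobius b and then applies an ordinary nonlinearity in the tangent space at the origin (log map, activation, exp map).

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    """Mobius addition of x and y on the Poincare ball of curvature -c."""
    xy, xx, yy = _dot(x, y), _dot(x, x), _dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * yy
    coef_y = 1.0 - c * xx
    den = 1.0 + 2.0 * c * xy + c * c * xx * yy
    return [(coef_x * a + coef_y * b) / den for a, b in zip(x, y)]

def mobius_matvec(m, x, c=1.0):
    """Mobius matrix-vector multiplication; m is a list of rows."""
    mx = [_dot(row, x) for row in m]
    x_norm = math.sqrt(_dot(x, x))
    mx_norm = math.sqrt(_dot(mx, mx))
    if x_norm == 0.0 or mx_norm == 0.0:
        return [0.0] * len(mx)
    sc = math.sqrt(c)
    # (1/sqrt(c)) * tanh(|Mx|/|x| * artanh(sqrt(c)|x|)) * Mx/|Mx|
    scale = math.tanh(mx_norm / x_norm * math.atanh(sc * x_norm)) / (sc * mx_norm)
    return [scale * v for v in mx]

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point on the ball."""
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        return list(v)
    sc = math.sqrt(c)
    return [math.tanh(sc * n) / (sc * n) * a for a in v]

def logmap0(y, c=1.0):
    """Logarithmic map at the origin: point on the ball -> tangent vector."""
    n = math.sqrt(_dot(y, y))
    if n == 0.0:
        return list(y)
    sc = math.sqrt(c)
    return [math.atanh(sc * n) / (sc * n) * a for a in y]

def hyperbolic_unit(m, x, b, c=1.0, act=math.tanh):
    """Hypothetical neural unit: act applied in the tangent space to (M * x) + b."""
    h = mobius_add(mobius_matvec(m, x, c), b, c)
    return expmap0([act(a) for a in logmap0(h, c)], c)
```

Routing the activation through the tangent space at the origin is only one option for building hyperbolic activations; the application does not say which construction it uses.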
  • The feature extraction network includes a first processing layer and a second processing layer; the first processing layer is configured to process the data to be processed to obtain an embedding vector that represents the data to be processed in hyperbolic space; the second processing layer is configured to calculate the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
  • The first processing layer may be an input layer, configured to process the data to be processed to obtain the embedding vector corresponding to the data to be processed.
  • The embedding vector can be computed in hyperbolic space.
  • The feature vector can be extracted by using the geometric-mean (geometric center) extraction method of hyperbolic space.
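A minimal sketch of the first processing layer follows, assuming (this is not stated in the application) that each token's embedding is stored as an unconstrained Euclidean vector and placed on the Poincaré ball with the exponential map at the origin. The `project` step, which keeps points a small margin `eps` inside the boundary, is a common numerical safeguard; `embed_tokens`, the toy vocabulary, and the table values are all hypothetical.

```python
import math

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c)."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        return list(v)
    sc = math.sqrt(c)
    return [math.tanh(sc * n) / (sc * n) * a for a in v]

def project(p, c=1.0, eps=1e-5):
    """Pull a point back inside the ball if it lies too close to the boundary."""
    n = math.sqrt(sum(a * a for a in p))
    max_norm = (1.0 - eps) / math.sqrt(c)
    if n > max_norm:
        return [a * max_norm / n for a in p]
    return list(p)

def embed_tokens(tokens, table, c=1.0):
    """Hypothetical first processing layer: tokens -> points on the Poincare ball."""
    return [project(expmap0(table[t], c), c) for t in tokens]

# Toy 2-D embedding table (made-up values for illustration only).
table = {"animal": [0.1, 0.0], "dog": [1.5, 0.4], "cat": [1.4, -0.5]}
points = embed_tokens(["animal", "dog", "cat"], table)
```

Without the projection, a large parameter such as `[50.0, 0.0]` maps to a point whose norm rounds to exactly 1.0 in double precision, that is, onto the boundary, where the logarithmic map and Möbius operations are undefined.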
  • The embedding vector is expressed based on a first conformal model.
  • The feature extraction network further includes a conformal transformation layer.
  • The conformal transformation layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and to input the vector expressed based on the second conformal model to the second processing layer.
  • The second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector.
  • The conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • The first conformal model and the second conformal model may each be the Poincaré model, the hyperboloid model, or the Klein model.
  • A conformal model is used to describe hyperbolic space; it defines a set of vector-algebraic transformations and geometric constraints of the hyperbolic gyrovector space. Different conformal models have different properties.
  • For example, the embedding vector output by the first processing layer needs to be converted into an embedding vector expressed based on the Klein model; the geometric center of the embedding vectors expressed in the Klein model is then calculated using the Einstein midpoint to obtain the feature vector, which at this point is also expressed based on the Klein model.
  • The conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model and to input that vector to the classification network. In this case, the conformal transformation layer can convert the feature vector expressed based on the Klein model into a vector expressed based on the Poincaré model and input it to the classification network.
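The Klein-model route described above can be sketched as follows, assuming the Poincaré model is the first conformal model and the Klein model the second. The conversion formulas and the Lorentz-factor weighting of the Einstein midpoint are the standard ones for curvature -c; the function names and the optional `weights` argument are illustrative, not taken from the application. Strictly speaking, the Klein model preserves straight lines rather than angles, so it is not conformal in the usual geometric sense, even though the application groups it with the conformal models.

```python
import math

def _sq(v):
    return sum(a * a for a in v)

def poincare_to_klein(p, c=1.0):
    """Map a Poincare-ball point to the Klein model."""
    s = 2.0 / (1.0 + c * _sq(p))
    return [s * a for a in p]

def klein_to_poincare(k, c=1.0):
    """Map a Klein-model point back to the Poincare ball."""
    s = 1.0 / (1.0 + math.sqrt(1.0 - c * _sq(k)))
    return [s * a for a in k]

def einstein_midpoint(points_k, c=1.0, weights=None):
    """Einstein midpoint of Klein-model points, weighted by Lorentz factors."""
    if weights is None:
        weights = [1.0] * len(points_k)
    gammas = [w / math.sqrt(1.0 - c * _sq(k)) for w, k in zip(weights, points_k)]
    total = sum(gammas)
    dim = len(points_k[0])
    return [sum(g * k[i] for g, k in zip(gammas, points_k)) / total for i in range(dim)]

def hyperbolic_center(points_p, c=1.0):
    """Poincare in -> Klein -> Einstein midpoint -> Poincare out."""
    klein = [poincare_to_klein(p, c) for p in points_p]
    return klein_to_poincare(einstein_midpoint(klein, c), c)
```

In the Klein model the midpoint reduces to a simple weighted average, which is why the center is computed there; mapping the result back to the Poincaré ball lets the classification network apply Möbius operations to it.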
  • Alternatively, the embedding vector is expressed based on a second conformal model.
  • In this case, the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • The classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain a vector to be normalized, expressed in hyperbolic space.
  • The vector to be normalized is mapped into Euclidean space, and the mapped vector is normalized to obtain the processing result.
  • In this way, the output of the classification layer is converted from hyperbolic space into Euclidean space, which keeps it consistent with the subsequent target loss function.
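The hand-off from the hyperbolic classification network to the Euclidean loss might look like this minimal sketch. It assumes the logarithmic map at the origin as the mapping into Euclidean space, softmax as the normalization, and cross-entropy as the target loss function; the application names none of these, and the vector `h` is a made-up classifier output.

```python
import math

def logmap0(y, c=1.0):
    """Map a Poincare-ball point to the tangent (Euclidean) space at the origin."""
    n = math.sqrt(sum(a * a for a in y))
    if n == 0.0:
        return list(y)
    sc = math.sqrt(c)
    return [math.atanh(sc * n) / (sc * n) * a for a in y]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    s = sum(exps)
    return [e / s for e in exps]

def classify(h, c=1.0):
    """Map the hyperbolic vector to be normalized into Euclidean space, then normalize."""
    return softmax(logmap0(h, c))

def cross_entropy(probs, label):
    """Euclidean target loss: negative log-probability of the true class."""
    return -math.log(probs[label])

h = [0.6, -0.2, 0.1]  # hypothetical output of the hyperbolic classification network
probs = classify(h)
loss = cross_entropy(probs, label=0)
```

Because the log map at the origin rescales the vector by a positive factor without changing its direction, the predicted class matches the largest coordinate of `h`; the map mainly changes the magnitudes that the softmax and the loss see.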
  • The present application further provides a data processing method, the method comprising:
  • The neural network includes a feature extraction network and a classification network.
  • The feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result.
  • A gradient expressed in hyperbolic space is obtained based on the loss, and the neural network is updated based on the gradient to obtain an updated neural network.
  • Updating the neural network based on the gradient to obtain an updated neural network includes:
  • The feature extraction network in the neural network is updated based on the gradient to obtain an updated feature extraction network, and the updated feature extraction network is configured to extract a feature vector of the training data, expressed in hyperbolic space.
  • The training data includes at least one of the following:
  • Natural language data, knowledge graph data, genetic data, or image data.
  • The classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, where the activation function includes an operation rule of hyperbolic space.
  • The operation rule of hyperbolic space includes at least one of the following: Möbius matrix multiplication and Möbius addition.
  • The feature extraction network includes a first processing layer and a second processing layer.
  • The first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data.
  • The second processing layer is configured to calculate the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
  • The embedding vector is expressed based on a first conformal model.
  • The feature extraction network further includes a conformal transformation layer.
  • The conformal transformation layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and to input the vector expressed based on the second conformal model to the second processing layer.
  • The second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector.
  • The conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • Alternatively, data in hyperbolic space can be expressed based on a second conformal model, where the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping, and the embedding vector is expressed based on the second conformal model.
  • The second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  • The classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain a vector to be normalized, expressed in hyperbolic space.
  • The vector to be normalized is mapped into Euclidean space, and the mapped vector is normalized to obtain the processing result.
  • Obtaining a loss based on the category label and the processing result includes:
  • A loss is obtained based on the category label, the processing result, and a target loss function, where the target loss function is a function expressed in Euclidean space.
  • Updating the neural network based on the loss includes:
  • The neural network is updated based on the gradient expressed in hyperbolic space.
  • The present application further provides a data classification device, the device comprising:
  • an acquisition module, configured to acquire data to be processed; and
  • a processing module, configured to process the data to be processed by using a neural network obtained through training, to output a processing result,
  • where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed, expressed in hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result.
  • The data to be processed includes at least one of the following:
  • Natural language data, knowledge graph data, genetic data, or image data.
  • The classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, where the activation function includes an operation rule of hyperbolic space.
  • The feature extraction network includes a first processing layer and a second processing layer.
  • The first processing layer is configured to process the data to be processed to obtain an embedding vector that represents the data to be processed in hyperbolic space.
  • The second processing layer is configured to calculate the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
  • The embedding vector is expressed based on a first conformal model.
  • The feature extraction network further includes a conformal transformation layer.
  • The conformal transformation layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and to input the vector expressed based on the second conformal model to the second processing layer.
  • The second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector.
  • The conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • Alternatively, the embedding vector is expressed based on a second conformal model.
  • In this case, the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • The classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain a vector to be normalized, expressed in hyperbolic space.
  • The vector to be normalized is mapped into Euclidean space, and the mapped vector is normalized to obtain the processing result.
  • The present application further provides a data processing device, the device comprising:
  • an acquisition module, configured to acquire training data and corresponding category labels;
  • a processing module, configured to process the training data by using a neural network to output a processing result,
  • where the neural network includes a feature extraction network and a classification network;
  • the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result; and
  • a model updating module, configured to obtain a gradient expressed in hyperbolic space based on the loss and to update the neural network based on the gradient to obtain an updated neural network.
  • The model updating module is configured to:
  • update the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract a feature vector of the training data, expressed in hyperbolic space.
  • The training data includes at least one of the following:
  • Natural language data, knowledge graph data, genetic data, or image data.
  • The classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, where the activation function includes an operation rule of hyperbolic space.
  • The feature extraction network includes a first processing layer and a second processing layer.
  • The first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data.
  • The second processing layer is configured to calculate the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
  • The embedding vector is expressed based on a first conformal model.
  • The feature extraction network further includes a conformal transformation layer.
  • The conformal transformation layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and to input the vector expressed based on the second conformal model to the second processing layer.
  • The second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector.
  • The conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
  • Alternatively, data in hyperbolic space can be expressed based on a second conformal model, where the second conformal model represents that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping, and the embedding vector is expressed based on the second conformal model.
  • The second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  • The classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain a vector to be normalized, expressed in hyperbolic space.
  • The vector to be normalized is mapped into Euclidean space, and the mapped vector is normalized to obtain the processing result.
  • The loss obtaining module is configured to obtain a loss based on the category label, the processing result, and a target loss function, where the target loss function is a function expressed in Euclidean space.
  • The model updating module is configured to calculate the gradient corresponding to the loss, where the gradient is expressed in Euclidean space; convert the gradient into a gradient expressed in hyperbolic space; and update the neural network based on the gradient expressed in hyperbolic space.
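The gradient conversion described above can be sketched as one step of Riemannian SGD on the Poincaré ball, a standard choice assumed here because the application gives no formulas. The Euclidean gradient is multiplied by the inverse of the squared conformal factor, (1 - c‖θ‖²)²/4, to obtain the Riemannian gradient, and the parameter is then updated with the first-order retraction θ - lr·g followed by projection back into the ball. The learning rate, the retraction used in place of the exact exponential map, and all function names are illustrative assumptions.

```python
import math

def riemannian_grad(theta, euclid_grad, c=1.0):
    """Convert a Euclidean gradient into the Poincare-ball Riemannian gradient."""
    sq = sum(a * a for a in theta)
    factor = (1.0 - c * sq) ** 2 / 4.0  # inverse of the squared conformal factor
    return [factor * g for g in euclid_grad]

def project(p, c=1.0, eps=1e-5):
    """Keep the parameter strictly inside the ball."""
    n = math.sqrt(sum(a * a for a in p))
    max_norm = (1.0 - eps) / math.sqrt(c)
    return [a * max_norm / n for a in p] if n > max_norm else list(p)

def rsgd_step(theta, euclid_grad, lr=0.1, c=1.0):
    """One retraction-based Riemannian SGD update for a parameter on the ball."""
    rg = riemannian_grad(theta, euclid_grad, c)
    return project([t - lr * g for t, g in zip(theta, rg)], c)

theta = [0.5, 0.5]
new_theta = rsgd_step(theta, [1.0, -2.0])  # Euclidean gradient from the loss
```

The rescaling factor shrinks toward zero as θ approaches the boundary, so updates for parameters near the edge are automatically damped; the projection handles cases where a large gradient would still push a parameter out of the ball.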
  • The present application discloses a data processing method, including: acquiring data to be processed; and processing the data to be processed by using a neural network obtained through training to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed, expressed in hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result.
  • The present application can improve the data processing accuracy of the model on data sets containing tree-like hierarchical structures and reduce the number of model parameters.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main framework.
  • FIG. 2a shows a natural language processing system.
  • FIG. 2b shows another natural language processing system.
  • FIG. 2c is a schematic diagram of a related device for natural language processing provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the architecture of a system 100 provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a feature extraction network provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a classification network provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a Riemann optimizer provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a system provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a system provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • FIG. 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • The artificial intelligence main framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data-information-knowledge-wisdom".
  • The "IT value chain", running from the underlying infrastructure of human intelligence and information (the technologies that provide and process it) to the industrial ecosystem of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through the basic platform. Communication with the outside world takes place through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes a distributed computing framework and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with external parties to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
  • Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • The data involves graphics, images, voice, and text, as well as Internet of Things data from conventional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and the like.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to a reasoning control strategy; its typical functions are search and matching.
  • Decision-making refers to the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of data processing, some general capabilities can be formed, such as an algorithm or a general-purpose system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and putting it into practical use. Application areas mainly include smart terminals, intelligent transportation, smart healthcare, autonomous driving, safe cities, and the like.
  • Figure 2a shows a natural language processing system.
  • The natural language processing system includes user equipment and a data processing device.
  • The user equipment includes smart terminals such as mobile phones, personal computers, or information processing centers.
  • The user equipment is the initiator of natural language data processing: as the initiator of requests such as language question answering or queries, the user usually initiates a request through the user equipment.
  • The data processing device may be a device or server with a data processing function, such as a cloud server, a network server, an application server, or a management server.
  • The data processing device receives questions such as query statements, speech, or text from the intelligent terminal through an interactive interface, and then performs language data processing such as machine learning, deep learning, search, reasoning, and decision-making by using a memory that stores data and a processor that processes data.
  • The memory in the data processing device may be a general term that includes local storage and a database storing historical data.
  • The database may be located on the data processing device or on another network server.
  • The user equipment can receive an instruction from the user.
  • For example, the user equipment can receive a piece of text input by the user and then initiate a request to the data processing device, so that the data processing device executes a natural language processing application (such as text classification, text reasoning, named entity recognition, or translation) on the text obtained by the user equipment, thereby obtaining the corresponding processing result of that application for the text (such as a classification result, an inference result, a named entity recognition result, or a translation result).
  • For example, the user equipment may receive a segment of Chinese text input by the user and then initiate a request to the data processing device, so that the data processing device performs entity classification on the segment, thereby obtaining an entity processing result for it.
  • As another example, the user equipment may receive a segment of Chinese text input by the user and then initiate a request to the data processing device, so that the data processing device translates the segment into English, thereby obtaining an English translation of it.
  • the data processing device may execute the data processing method of the embodiment of the present application.
  • Figure 2b shows another natural language processing system.
  • the user equipment is directly used as a data processing device.
  • the user equipment can directly receive input from the user and process it directly by the hardware of the user equipment itself.
  • The specific process is similar to that in FIG. 2a; reference may be made to the foregoing description, and details are not repeated here.
  • The user equipment can receive an instruction from the user. For example, the user equipment can receive a piece of text input by the user and then itself execute a natural language processing application (such as text classification, text reasoning, named entity recognition, or translation) on the text, thereby obtaining the corresponding processing result (such as a classification result, an inference result, a named entity recognition result, or a translation result).
  • For example, the user equipment may receive a segment of Chinese text input by the user and perform entity classification on it, thereby obtaining an entity processing result for the segment.
  • Alternatively, the user equipment may translate the segment of Chinese into English, thereby obtaining an English translation of it.
  • the user equipment itself can execute the data processing method of the embodiment of the present application.
  • FIG. 2c is a schematic diagram of a related device for natural language processing provided by an embodiment of the present application.
  • The data storage system 350 may be integrated on the execution device 310, or may be deployed in the cloud or on another network server.
  • The processors in FIG. 2a and FIG. 2b may perform data training, machine learning, or deep learning through a neural network model or another model (for example, a model based on a support vector machine), and use the model finally trained or learned from the data to execute natural language processing applications such as text classification, sequence tagging, reading comprehension, text generation, text reasoning, and translation on text sequences, thereby obtaining the corresponding processing results.
  • this application can also be applied to knowledge graph data processing, genetic data processing, image classification processing, and so on.
  • FIG. 3 is a schematic diagram of the architecture of a system 100 provided by an embodiment of the present application.
  • the execution device 110 is configured with an input/output (I/O) interface 112, which is used for data interaction with external devices.
  • the user may input data to the I/O interface 112 through the client device 140, and the input data may include: various tasks to be scheduled, callable resources, and other parameters in this embodiment of the present application.
  • The execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, code, and the like obtained through that processing in the data storage system 150.
  • the I/O interface 112 returns the processing results to the client device 140 for provision to the user.
  • The training device 120 can generate corresponding target models/rules based on different training data for different goals or tasks, and the corresponding target models/rules can be used to achieve those goals or complete those tasks, thereby providing the user with the desired result.
  • The user can manually specify the input data through an interface provided by the I/O interface 112.
  • Alternatively, the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 needs the user's authorization to send input data automatically, the user can set the corresponding permission in the client device 140.
  • The user can view, on the client device 140, the result output by the execution device 110, and the result may be presented as a display, a sound, an action, or another specific form.
  • The client device 140 can also serve as a data collection terminal that collects the input data fed into the I/O interface 112 and the output results from the I/O interface 112, as shown in the figure, as new sample data and stores them in the database 130.
  • Alternatively, the I/O interface 112 may directly store the input data fed into the I/O interface 112 and the output results from the I/O interface 112, as shown in the figure, in the database 130 as new sample data.
  • FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation.
  • For example, in FIG. 3 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • The neural network may be obtained through training by the training device 120.
  • An embodiment of the present application also provides a chip, where the chip includes a neural network processor NPU 50.
  • the chip can be set in the execution device 110 as shown in FIG. 3 to complete the calculation work of the calculation module 111 .
  • the chip can also be set in the training device 120 as shown in FIG. 3 to complete the training work of the training device 120 and output the target model/rule.
  • The neural network processor NPU 40 is mounted on the host central processing unit (host CPU) as a coprocessor, and the host CPU allocates tasks to it.
  • The core part of the NPU is the operation circuit 403; the controller 404 controls the operation circuit 403 to fetch data from memory (the weight memory or the input memory) and perform operations.
  • the arithmetic circuit 403 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 403 is a two-dimensional systolic array. The arithmetic circuit 403 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 403 is a general-purpose matrix processor.
  • For example, assume there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 402 and buffers it on each PE in the operation circuit.
  • The operation circuit then fetches the data of matrix A from the input memory 401, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 408.
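To make the roles of the weight memory, the input memory, and the accumulator concrete, here is a minimal Python sketch that uses plain nested lists instead of hardware buffers. It only mimics the arithmetic: the product C = A × B is built by adding partial products over tiles of the shared dimension into an accumulator, in the same way that partial results collect in the accumulator 408. The function name `matmul_with_accumulator` and the `tile_k` parameter are hypothetical and do not describe the NPU's actual dataflow.

```python
from typing import List

Matrix = List[List[float]]


def matmul_with_accumulator(a: Matrix, b: Matrix, tile_k: int = 2) -> Matrix:
    """Compute C = A x B by accumulating partial products over tiles of K.

    Illustrative only: B plays the role of the weights held by the PEs, rows
    of A stream in, and `acc` stands in for the accumulator that holds partial
    results until every tile of the shared dimension K has been added.
    """
    if not a or not b:
        raise ValueError("matrices must be non-empty")
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])

    acc = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, tile_k):
        k1 = min(k0 + tile_k, k)
        # Add this tile's partial products to the running (partial) result.
        for i in range(m):
            for j in range(n):
                acc[i][j] += sum(a[i][t] * b[t][j] for t in range(k0, k1))
    return acc  # after the last tile, the accumulator holds the final C


print(matmul_with_accumulator([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```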
  • the vector calculation unit 407 can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and the like.
  • The vector computation unit 407 can be used for network computation of non-convolutional/non-fully-connected (non-FC) layers in the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector computation unit 407 can store the processed output vectors to the unified buffer 406 .
  • the vector calculation unit 407 may apply a nonlinear function to the output of the arithmetic circuit 403, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 407 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as an activation input to the arithmetic circuit 403, such as for use in subsequent layers in a neural network.
  • Unified memory 406 is used to store input data and output data.
  • The storage unit access controller 405 (direct memory access controller, DMAC) transfers input data in the external memory to the input memory 401 and/or the unified memory 406, stores weight data in the external memory into the weight memory 402, and stores data in the unified memory 406 into the external memory.
  • the bus interface unit (bus interface unit, BIU) 410 is used to realize the interaction between the main CPU, the DMAC and the instruction fetch memory 409 through the bus.
  • the instruction fetch memory (instruction fetch buffer) 409 connected with the controller 404 is used to store the instructions used by the controller 404;
  • The controller 404 is used to invoke the instructions cached in the instruction fetch memory 409 to control the working process of the operation accelerator.
  • The unified memory 406, the input memory 401, the weight memory 402, and the instruction fetch memory 409 are all on-chip memories. The external memory is a memory outside the NPU, and may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:

$$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

  • where $s = 1, 2, \ldots, n$, and n is a natural number greater than 1;
  • $W_s$ is the weight of $x_s$;
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
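As a minimal sketch of the neural unit described above, the following Python function computes f(Σ W_s·x_s + b) with a sigmoid activation. The names `neural_unit` and `sigmoid` are illustrative and do not come from the source.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic activation f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b), with f = sigmoid."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)


print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0.0, so the output is 0.5
```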
  • a neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • The work of each layer in a neural network can be described mathematically by $\vec{y} = a(W \cdot \vec{x} + \vec{b})$. At the physical level, the work of each layer can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to its column space) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by $W \cdot \vec{x}$, operation 4 is performed by $+\vec{b}$, and operation 5 is performed by $a()$.
  • W is the weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • the vector W determines the space transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed.
  • the purpose of training the neural network is to finally obtain the weight matrix of all layers of the trained neural network (the weight matrix formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning the way to control the spatial transformation, and more specifically, learning the weight matrix.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
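The forward and backward cycle described above can be illustrated with a single sigmoid unit trained on one sample. This is a minimal sketch that assumes a squared-error loss L = ½(y − t)², since the source does not fix a loss here. The error term is back-propagated through the sigmoid by the chain rule, and the weights move against the gradient. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x: Sequence[float], w: List[float], b: float, target: float,
               lr: float = 0.5) -> Tuple[List[float], float, float]:
    """One forward pass plus one back-propagation update; returns (w, b, loss)."""
    # Forward pass: propagate the input until the output produces an error loss.
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2

    # Backward pass: dL/dz = (y - t) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
    delta = (y - target) * y * (1.0 - y)
    new_w = [w_s - lr * delta * x_s for w_s, x_s in zip(w, x)]  # dL/dW_s = delta * x_s
    new_b = b - lr * delta                                       # dL/db = delta
    return new_w, new_b, loss


w, b = [0.0, 0.0], 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step([1.0, 2.0], w, b, target=1.0)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```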
  • Natural language is human language, and natural language processing (NLP) is the processing of human language.
  • Natural language processing is the process of systematically analyzing, understanding, and extracting information from text data in an intelligent and efficient manner.
  • Typical natural language processing tasks include machine translation (MT), named entity recognition (NER), relation extraction (RE), information extraction (IE), sentiment analysis, speech recognition, and question answering.
  • natural language processing tasks can fall into the following categories.
  • Sequence tagging: for each word in a sentence, the model must output a category based on the context, for example in Chinese word segmentation, part-of-speech tagging, named entity recognition, and semantic role labeling.
  • Classification tasks: a single classification value is output for the entire sentence, for example in text classification.
  • Sentence relationship inference: given two sentences, determine whether they have a certain relationship, for example in entailment, question answering (QA), semantic paraphrasing, and natural language inference.
  • Generative tasks: given one piece of text as input, generate another piece of text.
  • Word segmentation (word segmentation or word breaker, WB): divides continuous natural language data into word sequences that are semantically reasonable and complete, which can resolve overlapping (crossing) ambiguity. For example, for the Chinese phrase 给毕业和尚未毕业的同学 ("to classmates who have graduated and who have not yet graduated"), segmentation 1 is 给/毕业/和/尚未/毕业/的/同学, which gives the intended reading, whereas segmentation 2, 给/毕业/和尚/未/毕业/的/同学, wrongly extracts the word 和尚 ("monk").
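  • Named entity recognition (NER): identifies entities with specific meanings in natural language data, such as people, places, organizations, times, and works.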
  • Part-of-speech tagging: assigns a part of speech (noun, verb, adjective, and the like) to each word in natural language data. Dependency parsing: automatically analyzes the syntactic components of a sentence (subject, predicate, object, attributive, adverbial, complement, and the like), which can resolve structural ambiguity. For example, the comment "you can also enjoy the sunrise in the room" admits ambiguity 1, "the room is okay", and ambiguity 2, "(you) can enjoy the sunrise"; the parse is: in the room (subject), can also (predicate), enjoy the sunrise (verb-object phrase).
  • Word embedding and semantic similarity: represent words as vectors and compute the semantic similarity between words on that basis, which addresses word-level semantic similarity. For example, is "watermelon" closer to "dummy" (literally "dumb melon") or to "strawberry"?
  • Vectorized representation: watermelon (0.1222, 0.22333, ...); similarity calculation: dummy (0.115), strawberry (0.325); vectorized representations: (-0.333, 0.1223, ...), (0.333, 0.3333, ...).
  • Text semantic similarity: relying on massive web-scale data and deep neural network technology, compute the semantic similarity between texts, which addresses text-level semantic similarity. For example, is "how to prevent the license plate from the front of the car" closer to "how to install the front license plate" or to "how to apply for a Beijing license plate"?
  • Vectorized representation: how to prevent the license plate from the front of the car (0.1222, 0.22333, ...); similarity calculation: how to install the front license plate (0.762), how to apply for a Beijing license plate (0.486); vectorized representations: (-0.333, 0.1223, ...), (0.333, 0.3333, ...).
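Cosine similarity is a common way to measure similarity between vectorized representations. The sketch below assumes that metric, because the source does not name one, and uses made-up two-dimensional vectors purely for illustration. The scores it produces are unrelated to the figures quoted above.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = <u, v> / (|u| * |v|), a value in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def closest(query: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate whose vector is most similar to the query vector."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical toy vectors, not real embeddings.
toy = {"candidate_a": [0.9, 0.1], "candidate_b": [-0.2, 0.8]}
print(closest([1.0, 0.2], toy))  # candidate_a
```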
  • The language model (LM) is a basic model in NLP. After training on a large corpus, an LM can infer the probability of an unknown word from existing information (for example, textual information such as the words that have already appeared in the context). An LM can also be understood as a probabilistic model used to compute the probability of a sentence.
  • A language model is a probability distribution over sequences of natural language data that characterizes how likely a given text sequence of a given length is. In short, a language model predicts the next word based on the context. Because the corpus does not need to be manually labeled, a language model can learn rich semantic knowledge from virtually unlimited large-scale corpora.
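The following minimal example shows how a language model assigns a probability to a sentence. It is a bigram model estimated by counting, which factorizes P(w_1 ... w_n) as the product of P(w_i | w_{i-1}) with sentence-boundary markers. A count-based bigram model is a deliberately simple stand-in for the neural language models discussed here. The class name, the `<s>`/`</s>` markers, and the toy corpus are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Maximum-likelihood bigram language model (no smoothing)."""

    def __init__(self, corpus: List[List[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sentence in corpus:
            tokens = [BOS] + sentence + [EOS]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob_next(self, prev: str, word: str) -> float:
        """P(word | prev) = count(prev, word) / count(prev)."""
        if self.context_counts[prev] == 0:
            return 0.0
        return self.bigram_counts[(prev, word)] / self.context_counts[prev]

    def sentence_prob(self, sentence: List[str]) -> float:
        """Chain rule: product of P(w_i | w_{i-1}) over the padded sentence."""
        tokens = [BOS] + sentence + [EOS]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob_next(prev, word)
        return p

    def predict_next(self, prev: str) -> Optional[str]:
        """Most likely next word given the previous word, or None if unseen."""
        options = {w: c for (p, w), c in self.bigram_counts.items() if p == prev}
        return max(options, key=options.get) if options else None


lm = BigramLM([["i", "like", "tea"], ["i", "like", "coffee"]])
print(lm.sentence_prob(["i", "like", "tea"]))  # 1.0 * 1.0 * 0.5 * 1.0 = 0.5
print(lm.predict_next("i"))                    # like
```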
  • Large-scale pre-trained language models, also known as large-scale language pre-training models, generally refer to models obtained by using a large-scale corpus (such as sentences, paragraphs, and other language training materials) to design language model training tasks and train a large-scale neural network. The resulting large-scale neural network is the large-scale pre-trained language model, and other tasks can perform feature extraction or task fine-tuning on top of it to accomplish specific tasks.
  • The idea of pre-training is to first train a model on one task to obtain a set of model parameters, use that set of parameters to initialize a network model, and then train the initialized network model on other tasks to obtain models suited to those tasks.
  • the neural language representation model can learn powerful language representation capabilities, and can extract rich syntactic and semantic information from texts.
  • A large-scale pre-trained language model can provide tokens and sentence-level features that contain rich semantic information for downstream tasks, or can be fine-tuned directly for a downstream task, so that a task-specific downstream model is obtained quickly and conveniently.
  • Knowledge graph aims to describe various entities or concepts and their relationships in the real world, which constitute a huge semantic network graph.
  • Nodes represent entities or concepts, and edges are composed of attributes or relationships.
  • Relations describe the association between two entities, such as the relationship between Beijing and China. The attributes of an entity are described with "attribute-value pairs" that capture its intrinsic characteristics; for example, a person has attributes such as age, height, and weight.
  • The term knowledge graph is now also used to refer to various large-scale knowledge bases.
  • Entity: something that is distinguishable and exists independently, such as a person, a city, a plant, or a commodity. Everything in the world consists of concrete things, which are entities, such as "China", "America", and "Japan". Entities are the most basic elements of a knowledge graph, and different relationships exist between different entities.
  • Semantic class (concept): a collection of entities with the same characteristics, such as countries, nations, books, and computers. Concepts mainly refer to collections, categories, object types, and kinds of things, such as people and geography.
  • Property (value): the value of a property that an entity points to. Different property types correspond to edges of different types.
  • An attribute value is the value of a specified attribute of an object. For example, "area", "population", and "capital" are several different attributes of the entity "China", and the value of the "area" attribute of "China" is "9.6 million square kilometers".
  • Relation: formalized as a function that maps k graph nodes (entities, semantic classes, or attribute values) to a Boolean value.
  • Triple: the triple is a general representation of knowledge in a knowledge graph.
  • the basic forms of triples mainly include (entity 1-relation-entity 2) and (entity-attribute-attribute value).
  • Here, an entity is the extension of a concept, and an attribute together with its value forms an attribute-value pair (AVP).
  • For example, China is an entity and Beijing is an entity, and (China-capital-Beijing) is an example of an (entity-relation-entity) triple.
  • As another example, Beijing is an entity, population is an attribute, and 20.693 million is an attribute value, so (Beijing-population-20.693 million) is an example of an (entity-attribute-attribute value) triple.
  • The difference between an attribute and a relation is that the two elements of a triple containing an attribute are mostly an entity and a string, whereas the two elements of a triple containing a relation are mostly two entities.
  • In some cases, the attribute value in a triple containing an attribute is also regarded as an entity, and the attribute is then regarded as a connection between two entities.
  • The knowledge represented by a triple indicates an association between two entities. The association may be a relation between the two entities (for example, (entity 1-relation-entity 2)), or it may be an attribute of one entity whose attribute value is the other entity (for example, (entity-attribute-attribute value)).
  • the knowledge represented by triples in the embodiments of the present application may also be referred to as structured knowledge.
  • The representation form of a triple is not limited to the (entity 1-relation-entity 2) and (entity-attribute-attribute value) forms above; for example, a triple can also be expressed as (entity 1-entity 2-relation) or (entity-attribute value-attribute).
  • attributes can also be viewed as a generalized relationship.
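The triple forms above map naturally onto a small data structure. The sketch below is an illustrative assumption and does not represent the application's storage format. It stores the two example triples from the text in a set, treats an attribute as a generalized relation, and implements a relation as a Boolean-valued lookup over graph nodes (here k = 2).

```python
from typing import NamedTuple, Set, Tuple


class Triple(NamedTuple):
    head: str      # entity 1, or the entity that owns an attribute
    relation: str  # a relation, or an attribute treated as a generalized relation
    tail: str      # entity 2, or an attribute value


knowledge_graph: Set[Triple] = {
    Triple("China", "capital", "Beijing"),              # (entity-relation-entity)
    Triple("Beijing", "population", "20.693 million"),  # (entity-attribute-attribute value)
}


def holds(kg: Set[Triple], head: str, relation: str, tail: str) -> bool:
    """A relation as a Boolean function over a pair of graph nodes."""
    return Triple(head, relation, tail) in kg


def outgoing(kg: Set[Triple], entity: str) -> Set[Tuple[str, str]]:
    """All (relation or attribute, target) pairs leaving an entity."""
    return {(t.relation, t.tail) for t in kg if t.head == entity}


print(holds(knowledge_graph, "China", "capital", "Beijing"))  # True
print(outgoing(knowledge_graph, "Beijing"))  # {('population', '20.693 million')}
```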
  • The text processing method of the present application can be used to perform natural language processing tasks on natural language data sequences. The target processing model that processes the natural language data sequences differs for different natural language processing tasks (that is, the target tasks in this application).
  • the method provided by the present application will be described below from the training side of the neural network and the application side of the neural network.
  • The neural network training method provided in the embodiments of the present application involves the processing of natural language data. It can be specifically applied to data processing methods such as data training, machine learning, and deep learning, which perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data (for example, natural language data or knowledge data) to finally obtain a trained target processing model. In addition, the text processing method provided in the embodiments of the present application can use this trained target processing model.
  • Specifically, input data (such as the text to be processed in this application) is fed into the trained target processing model to obtain output data (such as the processing result corresponding to the target task in this application).
  • The training method of the target processing model and the text processing method provided in the embodiments of this application are inventions based on the same concept. They can also be understood as two parts of one system, or as two stages of an overall process, such as a model training stage and a model application stage.
  • FIG. 4 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the data processing method provided by the embodiment of the present application includes:
  • the training device may acquire training data and corresponding category labels.
  • the training data may include at least one of the following: natural language data, knowledge graph data, genetic data or image data.
  • The category labels depend on the type of task that the neural network to be trained is to perform. For example, if the neural network is to perform text classification, the category label is the category of the training data; if the neural network is to be used for semantic recognition, the category label is the semantics of the training data.
  • For example, natural language data includes multiple words, and hypernym-hyponym (hierarchical) relationships exist between words.
  • Natural language data can therefore be understood as data with a tree-like hierarchical structure.
  • The neural network includes a feature extraction network and a classification network. The feature extraction network is configured to extract the feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of the hyperbolic space to obtain the processing result.
  • the neural network to be trained may include a feature extraction network, and the feature extraction network is configured to extract the feature vector expressed by the training data in the hyperbolic space, and then transmit the obtained feature vector to the classification network.
  • FIG. 5 is a schematic structural diagram of a feature extraction network provided by an embodiment of the application.
  • The feature extraction network may include a first processing layer and a second processing layer. The first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data, and the second processing layer is configured to calculate the geometric center of the embedding vectors in the hyperbolic space to obtain the feature vector.
  • The first processing layer may be an input layer that processes the training data to obtain the corresponding embedding vector. After the embedding vector is obtained, the second processing layer may calculate the geometric center of the embedding vectors in the hyperbolic space and use it as the feature vector.
  • the feature vector can be extracted by using the geometric mean extraction method of hyperbolic space.
  • the embedded vector output by the first processing layer can be processed by the second processing layer.
  • Data in the hyperbolic space can be expressed based on a first conformal model or based on a second conformal model. The first conformal model represents the hyperbolic space mapped to a Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to a Euclidean space through a second conformal mapping. The embedding vector is expressed based on the first conformal model.
  • The feature extraction network further includes a conformal conversion layer. The conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on the second conformal model, and to input that vector to the second processing layer.
  • Correspondingly, the second processing layer is configured to calculate the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector. The conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input that vector to the classification network.
  • The first conformal model and the second conformal model may each be the Poincaré model, the hyperboloid model, or the Klein model.
  • A conformal model is used to describe the hyperbolic space and defines a series of vector algebraic transformations and geometric constraints of the hyperbolic gyrovector space. Different conformal models have different properties.
  • For example, when the first conformal model is the Poincaré model and the second conformal model is the Klein model, the embedding vector output by the first processing layer needs to be converted into an embedding vector expressed based on the Klein model. The geometric center of the embedding vectors expressed based on the Klein model is then calculated using the Einstein midpoint to obtain the feature vector, which at this point is expressed based on the Klein model.
  • The conformal conversion layer then converts the feature vector expressed based on the Klein model into a vector expressed based on the Poincaré model (the first conformal model), and inputs it to the classification network.
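  • The Einstein midpoint may be calculated as follows, where $x_i$ are the vectors expressed in the Klein model and $\gamma_i$ is the Lorentz factor of $x_i$:

$$m = \frac{\sum_{i} \gamma_i x_i}{\sum_{i} \gamma_i}, \qquad \gamma_i = \frac{1}{\sqrt{1 - \lVert x_i \rVert^2}}$$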
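The conformal conversion and geometric-center steps can be sketched end to end. This minimal Python version assumes curvature −1 and uses the standard Poincaré/Klein conversions x_K = 2x_P / (1 + ‖x_P‖²) and x_P = x_K / (1 + √(1 − ‖x_K‖²)). It maps Poincaré embeddings to the Klein model, takes the Einstein midpoint m = Σγᵢxᵢ / Σγᵢ with Lorentz factor γᵢ = 1/√(1 − ‖xᵢ‖²), and maps the result back. The function names are hypothetical, and the sketch weights every point uniformly, which the source does not specify.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _sq_norm(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def poincare_to_klein(x: Sequence[float]) -> Vector:
    s = _sq_norm(x)
    if s >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return [2.0 * v / (1.0 + s) for v in x]


def klein_to_poincare(x: Sequence[float]) -> Vector:
    s = _sq_norm(x)
    if s >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    denom = 1.0 + math.sqrt(1.0 - s)
    return [v / denom for v in x]


def einstein_midpoint(points: Sequence[Sequence[float]]) -> Vector:
    """Gamma-weighted average of points that are already in the Klein model."""
    if not points:
        raise ValueError("need at least one point")
    gammas = [1.0 / math.sqrt(1.0 - _sq_norm(p)) for p in points]
    total = sum(gammas)
    dim = len(points[0])
    return [sum(g * p[d] for g, p in zip(gammas, points)) / total for d in range(dim)]


def hyperbolic_centroid(poincare_points: Sequence[Sequence[float]]) -> Vector:
    """Poincare -> Klein, Einstein midpoint in Klein, then Klein -> Poincare."""
    klein = [poincare_to_klein(p) for p in poincare_points]
    return klein_to_poincare(einstein_midpoint(klein))


print(hyperbolic_centroid([[0.3, 0.0], [-0.3, 0.0]]))  # [0.0, 0.0]
```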
  • the classification network may include a plurality of neural units, and each neural unit is configured to process input data based on an activation function, wherein the activation function is configured to be expressed based on an arithmetic rule in a hyperbolic space.
  • the hyperbolic space-based operation rule includes at least one of the following: Mobius matrix multiplication and Mobius addition.
  • *Mobius means Mobius matrix multiplication
  • +Mobius means Mobius addition.
  • the mathematical definitions of Mobius matrix multiplication and Mobius addition differ between conformal models. For example, on the Poincare Model, Mobius matrix multiplication and Mobius addition can be defined as:
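  • For reference, the definitions most commonly used in the hyperbolic neural network literature for the Poincare ball of curvature -c (c > 0) are x ⊕_c y = ((1 + 2c⟨x,y⟩ + c‖y‖²)x + (1 - c‖x‖²)y) / (1 + 2c⟨x,y⟩ + c²‖x‖²‖y‖²) and M ⊗_c x = (1/√c)·tanh((‖Mx‖/‖x‖)·artanh(√c‖x‖))·Mx/‖Mx‖. The sketch below implements these two operations in plain Python, plus a hypothetical `mobius_linear` unit (M ⊗_c x) ⊕_c b of the kind such an activation function could be built from; it is an illustration, not the application's confirmed formula.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball of curvature -c."""
    xy, x2, y2 = _dot(x, y), _dot(x, x), _dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def mobius_matvec(M, x, c=1.0):
    """Mobius matrix-vector multiplication M (x)_c x; M is a list of rows."""
    mx = [_dot(row, x) for row in M]
    x_norm = math.sqrt(_dot(x, x))
    mx_norm = math.sqrt(_dot(mx, mx))
    if x_norm == 0.0 or mx_norm == 0.0:
        return [0.0] * len(mx)  # the origin (or a zero image) maps to the origin
    sqrt_c = math.sqrt(c)
    t = math.tanh(mx_norm / x_norm * math.atanh(sqrt_c * x_norm))
    return [t * a / (sqrt_c * mx_norm) for a in mx]

def mobius_linear(M, x, b, c=1.0):
    """Hypothetical hyperbolic neuron (M (x)_c x) (+)_c b; b must lie inside the ball."""
    return mobius_add(mobius_matvec(M, x, c), b, c)
```

  • With M equal to the identity, `mobius_matvec` returns x unchanged, and x ⊕_c (-x) is the origin; both are quick sanity checks that the formulas were transcribed correctly.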
  • FIG. 6 is a schematic structural diagram of a classification network provided by an embodiment of the present application.
  • the classification network may be configured to process the feature vector based on an operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space, map the vector to be normalized into the Euclidean space, and normalize the mapped vector to obtain the processing result.
  • the output of the classification layer can be converted from the hyperbolic space to the Euclidean space so that it is consistent with the subsequent target loss function; the conversion can be defined mathematically as:
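  • One common choice for this hyperbolic-to-Euclidean conversion, assumed here rather than taken from the application, is the logarithmic map at the origin of the Poincare ball, log_0(y) = artanh(√c‖y‖)·y/(√c‖y‖). An ordinary softmax then performs the normalization, and a Euclidean cross-entropy loss can be applied to the result. A minimal sketch:

```python
import math

def logmap0(y, c=1.0):
    """Log map at the origin of the Poincare ball (curvature -c):
    hyperbolic point -> tangent vector at 0, i.e. a Euclidean vector."""
    n = math.sqrt(sum(a * a for a in y))
    if n == 0.0:
        return [0.0] * len(y)
    sqrt_c = math.sqrt(c)
    s = math.atanh(sqrt_c * n) / (sqrt_c * n)
    return [s * a for a in y]

def softmax(z):
    """Numerically stable softmax (normalization step)."""
    m = max(z)
    e = [math.exp(a - m) for a in z]
    total = sum(e)
    return [a / total for a in e]

def cross_entropy(probs, label):
    """Euclidean target loss for a single example with integer class label."""
    return -math.log(probs[label])
```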
  • a loss may be obtained through a target loss function based on the category label and the processing result, where the target loss function is a function expressed in the Euclidean space; the gradient corresponding to the loss is calculated, where the gradient is expressed in the Euclidean space; the gradient is converted into a gradient expressed in the hyperbolic space; and the neural network is updated based on the gradient expressed in the hyperbolic space.
  • FIG. 7 is a schematic structural diagram of a Riemannian optimizer provided by an embodiment of the present application.
  • the Euclidean gradient is first calculated and then mathematically converted into a Riemannian gradient (that is, the gradient expressed in the hyperbolic space); the Riemannian gradient is then retracted onto the conformal model, and a parallel-transport step is performed according to the Riemannian gradient (that is, the weights in the neural network are updated) to obtain the updated neural network.
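  • A minimal sketch of one Riemannian SGD step on the Poincare ball, assuming curvature -c: the Euclidean gradient is rescaled by the inverse of the Poincare metric, ((1 - c‖x‖²)²/4)·∇_E, and the update x - η·∇_R is then projected back inside the ball. This simple retraction-plus-projection is the variant commonly used for Poincare embeddings; it stands in for, and does not reproduce, the optimizer of FIG. 7, and it omits the parallel transport that momentum-based Riemannian optimizers would need.

```python
import math

def riemannian_grad(x, euclid_grad, c=1.0):
    """Convert a Euclidean gradient at x into the Riemannian gradient of the
    Poincare ball (curvature -c) by dividing by the squared conformal factor
    lambda_x^2 = (2 / (1 - c * ||x||^2))^2."""
    factor = (1.0 - c * sum(a * a for a in x)) ** 2 / 4.0
    return [factor * g for g in euclid_grad]

def project_to_ball(x, c=1.0, eps=1e-5):
    """Pull a point back strictly inside the ball of radius 1/sqrt(c)."""
    max_norm = (1.0 - eps) / math.sqrt(c)
    n = math.sqrt(sum(a * a for a in x))
    return list(x) if n <= max_norm else [a * max_norm / n for a in x]

def rsgd_step(x, euclid_grad, lr, c=1.0):
    """One update: x <- proj(x - lr * riemannian_grad(x))."""
    g = riemannian_grad(x, euclid_grad, c)
    return project_to_ball([a - lr * b for a, b in zip(x, g)], c)
```

  • Near the boundary the factor (1 - c‖x‖²)² shrinks the step sharply, which is why the projection is rarely triggered in practice but is kept as a safeguard against leaving the ball.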
  • the feature extraction network in the neural network may be updated based on the gradient to obtain an updated feature extraction network, which is configured to extract feature vectors of the training data expressed in the hyperbolic space.
  • the updated neural network includes the updated feature extraction network, which is configured to extract feature vectors of the training data expressed in the hyperbolic space.
  • An embodiment of the present application provides a data processing method, including: acquiring training data and corresponding category labels; processing the training data by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result; and obtaining a loss based on the category label and the processing result, obtaining, based on the loss, a gradient expressed in the hyperbolic space, and updating the neural network based on the gradient to obtain an updated neural network.
  • Owing to the properties of the hyperbolic space itself, expressing the feature vector in the hyperbolic space can enhance the fitting ability of the neural network model and improve its data processing accuracy on data sets containing a tree-like hierarchical structure; for example, it can improve text classification accuracy. Moreover, a neural network model constructed based on the hyperbolic space can greatly reduce the number of model parameters while improving its fitting ability.
  • the Poincare Model can be used as the hyperbolic conformal model.
  • a schematic diagram of this embodiment is shown in FIG. 8.
  • the embedding vectors of the input text are looked up in an Embedding expressed based on the Poincare Model to obtain a set of text feature vectors, which is converted into the Klein Model;
  • the Einstein midpoint is used to calculate the hyperbolic geometric mean as the text feature vector representation, and this representation is converted back to the Poincare Model;
  • a Mobius Linear layer from hyperbolic geometry is used as the classification layer and, combined with the objective function, finds the classification surface; gradients are calculated with a Riemannian optimizer, based on which the feature extraction network and the classification network are updated.
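  • The FIG. 8 flow can be sketched end to end in plain Python. Everything here is an illustrative assumption rather than the application's implementation: curvature is fixed at -1, the embedding table `embedding`, weight matrix `W`, and bias `b` are hypothetical inputs, the midpoint is unweighted, and the hyperbolic-to-Euclidean step uses the logarithmic map at the origin followed by softmax.

```python
import math

# Compact vector helpers on plain Python lists; curvature fixed at -1 (c = 1).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(s, v):
    return [s * a for a in v]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def p2k(p):  # Poincare -> Klein
    return scale(2.0 / (1.0 + dot(p, p)), p)

def k2p(k):  # Klein -> Poincare
    return scale(1.0 / (1.0 + math.sqrt(1.0 - dot(k, k))), k)

def einstein_midpoint(ks):
    gammas = [1.0 / math.sqrt(1.0 - dot(k, k)) for k in ks]
    acc = [0.0] * len(ks[0])
    for g, k in zip(gammas, ks):
        acc = add(acc, scale(g, k))
    return scale(1.0 / sum(gammas), acc)

def mobius_add(x, y):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    num = add(scale(1.0 + 2.0 * xy + y2, x), scale(1.0 - x2, y))
    return scale(1.0 / (1.0 + 2.0 * xy + x2 * y2), num)

def mobius_matvec(M, x):
    mx = [dot(row, x) for row in M]
    nx, nmx = math.sqrt(dot(x, x)), math.sqrt(dot(mx, mx))
    if nx == 0.0 or nmx == 0.0:
        return [0.0] * len(mx)
    return scale(math.tanh(nmx / nx * math.atanh(nx)) / nmx, mx)

def logmap0(y):
    n = math.sqrt(dot(y, y))
    return [0.0] * len(y) if n == 0.0 else scale(math.atanh(n) / n, y)

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    return scale(1.0 / sum(e), e)

def classify(token_ids, embedding, W, b):
    """Illustrative FIG. 8-style forward pass returning class probabilities."""
    poincare_vecs = [embedding[t] for t in token_ids]        # lookup (Poincare Model)
    klein_vecs = [p2k(p) for p in poincare_vecs]              # convert to Klein Model
    feature = k2p(einstein_midpoint(klein_vecs))              # midpoint, back to Poincare
    hyper_logits = mobius_add(mobius_matvec(W, feature), b)   # Mobius Linear layer
    return softmax(logmap0(hyper_logits))                     # to Euclidean, normalize
```

  • For example, `classify([0, 1, 2], {0: [0.1, 0.2], 1: [-0.3, 0.05], 2: [0.0, -0.4]}, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])` returns three positive probabilities that sum to 1. Training would then pair this forward pass with a Euclidean loss and a Riemannian update of the embedding table.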
  • the Hyperboloid Model can be used as the hyperbolic conformal model.
  • a schematic diagram of this embodiment is shown in FIG. 9.
  • input Embedding vectors expressed based on the Hyperboloid Model are looked up for the input text corpus, and the resulting set of feature vectors is converted into the Klein Model;
  • the Einstein midpoint of the Klein Model vectors is used to calculate the hyperbolic geometric mean as the feature vector representation, and this representation is converted back to the Hyperboloid Model.
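  • For the FIG. 9 variant, converting between the Hyperboloid Model and the Klein Model is a simple projection under the standard conventions: a hyperboloid point (x_0, x_1, ..., x_n) with -x_0² + Σ x_i² = -1 and x_0 > 0 maps to k = (x_1, ..., x_n)/x_0, and the Lorentz factor of k equals x_0. A minimal sketch assuming curvature -1 and an unweighted midpoint; `lift` is a hypothetical helper for building test points, not part of the application:

```python
import math

def lift(v):
    """Hypothetical helper: place a Euclidean vector v on the upper hyperboloid sheet."""
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)

def hyperboloid_to_klein(x):
    """(x0, x1..xn) -> (x1..xn) / x0."""
    return [a / x[0] for a in x[1:]]

def klein_to_hyperboloid(k):
    """k -> gamma * (1, k), where gamma = 1 / sqrt(1 - ||k||^2)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(a * a for a in k))
    return [gamma] + [gamma * a for a in k]

def einstein_midpoint(klein_points):
    """Unweighted Einstein midpoint in Klein coordinates."""
    gammas = [1.0 / math.sqrt(1.0 - sum(a * a for a in k)) for k in klein_points]
    total = sum(gammas)
    return [sum(g * k[i] for g, k in zip(gammas, klein_points)) / total
            for i in range(len(klein_points[0]))]

def hyperboloid_geometric_center(points):
    """Hyperboloid -> Klein -> Einstein midpoint -> back to the Hyperboloid."""
    klein = [hyperboloid_to_klein(x) for x in points]
    return klein_to_hyperboloid(einstein_midpoint(klein))
```

  • Since γ_i = x_{i,0}, the midpoint in Klein coordinates reduces to Σ_i (x_{i,1}, ..., x_{i,n}) / Σ_i x_{i,0}, so no square roots are needed on the way in.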
  • the Mobius Linear classification layer in hyperbolic geometry is used in combination with the objective function to find the classification surface.
  • gradients are calculated with a Riemannian optimizer.
  • FIG. 10 is a schematic flowchart of a data processing method provided by an embodiment of the present application. As shown in FIG. 10, the method includes:
  • the training device may acquire the data to be processed and the corresponding category label.
  • the data to be processed may include at least one of the following: natural language data, knowledge graph data, genetic data or image data.
  • category labels depend on the type of task the trained neural network is to perform, for example, text classification.
  • natural language data includes multiple words, and there are hypernym-hyponym relationships between words.
  • natural language data can therefore be understood as data with a tree-like hierarchical structure.
  • the neural network obtained by training is used to process the data to be processed to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in a hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  • the neural network obtained by training may include a feature extraction network configured to extract a feature vector of the data to be processed expressed in the hyperbolic space and pass the obtained feature vector to the classification network.
  • the data to be processed includes at least one of the following:
  • natural language data, knowledge graph data, genetic data, or image data.
  • the classification network includes a plurality of neural units, each configured to process input data based on an activation function, where the activation function is expressed based on an operation rule of the hyperbolic space.
  • the feature extraction network includes: a first processing layer and a second processing layer;
  • the first processing layer is configured to process the data to be processed to obtain an embedding vector representing the data to be processed in a hyperbolic space;
  • the second processing layer is configured to calculate the geometric center of the embedding vector on the hyperbolic space to obtain the feature vector.
  • the first processing layer may be an input layer, which is configured to process the data to be processed to obtain an embedding vector corresponding to the data to be processed.
  • the embedding vector can be calculated in the hyperbolic space.
  • the feature vector can be extracted by using the geometric mean extraction method of hyperbolic space.
  • the data in the hyperbolic space can be expressed based on a first conformal model and based on a second conformal model, where the first conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping; the embedding vector is expressed based on the first conformal model.
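  • Expressing data in the first or the second model changes only its coordinates, not its geometry: the hyperbolic distance between two points is the same in every model. The sketch below checks this numerically with the standard distance formulas for the Poincare, Klein, and Hyperboloid models at curvature -1; the function names are illustrative and not from the application.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poincare_to_klein(p):
    s = 1.0 + _dot(p, p)
    return [2.0 * a / s for a in p]

def poincare_to_hyperboloid(p):
    r = _dot(p, p)
    return [(1.0 + r) / (1.0 - r)] + [2.0 * a / (1.0 - r) for a in p]

def dist_poincare(x, y):
    diff = [a - b for a, b in zip(x, y)]
    arg = 1.0 + 2.0 * _dot(diff, diff) / ((1.0 - _dot(x, x)) * (1.0 - _dot(y, y)))
    return math.acosh(max(1.0, arg))  # guard against round-off below 1

def dist_klein(x, y):
    arg = (1.0 - _dot(x, y)) / math.sqrt((1.0 - _dot(x, x)) * (1.0 - _dot(y, y)))
    return math.acosh(max(1.0, arg))

def dist_hyperboloid(x, y):
    lorentz = -x[0] * y[0] + _dot(x[1:], y[1:])  # Minkowski inner product
    return math.acosh(max(1.0, -lorentz))
```

  • For p = [0.3, 0.1] and q = [-0.2, 0.4], all three functions agree to within floating-point round-off once p and q are converted into the corresponding model.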
  • the feature extraction network further includes: a conformal transformation layer;
  • the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on the second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
  • the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector;
  • the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model to the classification network.
  • the first conformal model and the second conformal model may be a Poincare Model, a Hyperboloid Model, or a Klein Model.
  • a conformal model is used to describe the hyperbolic space; it defines a series of vector algebraic transformations and geometric constraints of the hyperbolic gyrovector space. Different conformal models have different properties.
  • the embedding vector output by the first processing layer needs to be converted into an embedding vector expressed based on the Klein Model; the geometric center of the Klein Model embedding vectors is then calculated using the Einstein midpoint to obtain the feature vector, which at this point is expressed based on the Klein Model.
  • the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model and input that vector to the classification network; in this case, the conformal conversion layer can convert the feature vector expressed based on the Klein Model into a vector expressed based on the Poincare Model and input it to the classification network.
  • the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space;
  • the vector to be normalized is mapped into the Euclidean space, and the vector to be normalized mapped into the Euclidean space is normalized to obtain the processing result.
  • An embodiment of the present application provides a data processing method, including: acquiring data to be processed; and processing the data to be processed by using a neural network obtained by training to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in the hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  • the present application can improve the data processing accuracy of the model on data sets containing a tree-like hierarchical structure and reduce the number of model parameters.
  • FIG. 11 is a schematic diagram of a data processing device 1100 provided by the embodiment of the present application.
  • a data processing apparatus 1100 provided by an embodiment of the application includes:
  • the processing module 1102 is configured to use the neural network obtained by training to process the data to be processed to output a processing result
  • the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be classified expressed in the hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  • the data to be processed includes at least one of the following:
  • natural language data, knowledge graph data, genetic data, or image data.
  • the classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, wherein the activation function includes an arithmetic rule based on the hyperbolic space.
  • the feature extraction network includes: a first processing layer and a second processing layer;
  • the first processing layer is configured to process the data to be processed to obtain an embedding vector representing the data to be processed in the hyperbolic space;
  • the second processing layer is configured to calculate the geometric center of the embedding vector on the hyperbolic space to obtain the feature vector.
  • the embedding vector is expressed based on the first conformal model
  • the feature extraction network further includes: a conformal transformation layer;
  • the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
  • the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector;
  • the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping.
  • the embedding vector is expressed based on a second conformal model
  • the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of the second conformal mapping.
  • the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space;
  • the vector to be normalized is mapped into the Euclidean space, and the vector to be normalized mapped into the Euclidean space is normalized to obtain the processing result.
  • An embodiment of the present application provides a data classification device, including: an acquisition module configured to acquire data to be processed; and a processing module configured to process the data to be processed by using a neural network obtained by training, to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be classified expressed in the hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result. The present application can improve the data processing accuracy of the model on data sets containing a tree-like hierarchical structure and reduce the number of model parameters.
  • FIG. 12 is a schematic diagram of a data processing apparatus 1200 provided by an embodiment of the present application.
  • a data processing apparatus 1200 provided by an embodiment of the present application includes:
  • the obtaining module 1201 is configured to obtain the data to be processed and the corresponding category label
  • the processing module 1202 is configured to process the data to be processed by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result;
  • a model update module 1203, configured to obtain a loss based on the category label and the processing result, obtain, based on the loss, a gradient expressed in the hyperbolic space, and update the neural network based on the gradient to obtain an updated neural network.
  • the model update module is configured to:
  • update the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract a feature vector of the training data expressed in the hyperbolic space.
  • the training data includes at least one of the following:
  • natural language data, knowledge graph data, genetic data, or image data.
  • the classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, wherein the activation function includes an arithmetic rule based on the hyperbolic space.
  • the feature extraction network includes: a first processing layer and a second processing layer;
  • the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data
  • the second processing layer is configured to calculate the geometric center of the embedding vector on the hyperbolic space to obtain the feature vector.
  • the embedding vector is expressed based on the first conformal model
  • the feature extraction network further includes: a conformal transformation layer;
  • the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
  • the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector;
  • the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping.
  • the data in the hyperbolic space can be expressed based on a second conformal model, and the second conformal model indicates that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping; Wherein, the embedding vector is expressed based on the second conformal model;
  • the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  • the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space;
  • the vector to be normalized is mapped into the Euclidean space, and the vector to be normalized mapped into the Euclidean space is normalized to obtain the processing result.
  • the loss obtaining module is configured to obtain a loss based on the category label, the processing result, and a target loss function, where the target loss function is a function expressed in the Euclidean space.
  • the model updating module is configured to calculate the gradient corresponding to the loss, where the gradient is expressed in the Euclidean space; convert the gradient into a gradient expressed in the hyperbolic space; and update the neural network based on the gradient expressed in the hyperbolic space.
  • An embodiment of the present application provides a data processing device, including: an acquisition module configured to acquire data to be processed and corresponding category labels; a processing module configured to process the data to be processed by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the data to be processed, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result; and a model updating module configured to obtain a loss based on the category label and the processing result and update the neural network based on the loss to obtain an updated neural network.
  • Owing to the properties of the hyperbolic space itself, expressing the feature vector in the hyperbolic space can enhance the fitting ability of the neural network model and improve its data processing accuracy on data sets containing a tree-like hierarchical structure; for example, it can improve text classification accuracy. Moreover, a neural network model constructed based on the hyperbolic space can greatly reduce the number of model parameters while improving its fitting ability.
  • FIG. 13 is a schematic structural diagram of the execution device provided by an embodiment of the present application. The execution device may be, for example, a smart wearable device or a server; this is not limited here.
  • the data processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1300 to implement the data processing function in the embodiment corresponding to FIG. 10 .
  • the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 may be one or more, and one processor is taken as an example in FIG. 13 ) , wherein the processor 1303 may include an application processor 13031 and a communication processor 13032 .
  • the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or otherwise.
  • Memory 1304 may include read-only memory and random access memory, and provides instructions and data to processor 1303 .
  • a portion of memory 1304 may also include non-volatile random access memory (NVRAM).
  • the memory 1304 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof; the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1303 controls the operation of the execution device.
  • various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1303 or implemented by the processor 1303 .
  • the processor 1303 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1303 or an instruction in the form of software.
  • the above-mentioned processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1303 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304, and completes the steps of the above method in combination with its hardware.
  • the receiver 1301 can be used to receive input numeric or character information and to generate signal inputs related to settings and function control of the execution device.
  • the transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen .
  • the processor 1303 is configured to execute the data processing method executed by the execution device in the embodiment corresponding to FIG. 4 . Specifically, the processor 1303 may execute the following steps:
  • the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in the hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  • the data to be processed includes at least one of the following:
  • natural language data, knowledge graph data, genetic data, or image data.
  • the classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, wherein the activation function includes an arithmetic rule based on the hyperbolic space.
  • the feature extraction network includes a first processing layer and a second processing layer; the first processing layer is configured to process the data to be processed to obtain an embedding vector representing the data to be processed in the hyperbolic space; the second processing layer is configured to calculate the geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
  • the first processing layer may be an input layer, which is configured to process the data to be processed to obtain an embedding vector corresponding to the data to be processed.
  • the embedding vector can be calculated in the hyperbolic space.
  • the feature vector can be extracted by using the geometric mean extraction method of hyperbolic space.
  • the embedding vector is expressed based on a first conformal model;
  • the feature extraction network further includes: a conformal transformation layer;
  • the conformal transformation layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on the second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
  • the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector;
  • the conformal transformation layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model to the classification network, where the first conformal model represents that the hyperbolic space is mapped to the Euclidean space through the first conformal mapping, and the second conformal model represents that the hyperbolic space is mapped to the Euclidean space through the second conformal mapping.
  • the embedding vector is expressed based on a second conformal model
  • the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents that the hyperbolic space is mapped to the Euclidean space by means of the second conformal mapping.
  • the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space;
  • the vector to be normalized is mapped into the Euclidean space, and the vector to be normalized mapped into the Euclidean space is normalized to obtain the processing result.
  • the output of the classification layer can be converted from the hyperbolic space to the Euclidean space, which is consistent with the subsequent objective loss function.
  • FIG. 14 is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 1400 is implemented by one or more servers.
  • the training device 1400 can vary widely by configuration or performance, and can include one or more central processing units (CPUs) 1414 (e.g., one or more processors), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing application programs 1442 or data 1444.
  • the memory 1432 and the storage medium 1430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the training device. Further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400 .
  • the training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
  • the training device may perform the following steps:
  • the neural network includes a feature extraction network and a classification network;
  • the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an arithmetic rule of hyperbolic space to obtain the processing result;
  • based on the loss, a gradient expressed in the hyperbolic space is obtained, and the neural network is updated based on the gradient to obtain an updated neural network.
  • the feature extraction network in the neural network may be updated based on the gradient to obtain an updated feature extraction network, and the updated feature extraction network is configured to extract feature vectors of the training data expressed in the hyperbolic space.
  • the training data includes at least one of the following:
  • natural language data, knowledge graph data, genetic data, or image data.
  • the classification network includes a plurality of neural units, each of which is configured to process input data based on an activation function, wherein the activation function includes an arithmetic rule based on the hyperbolic space.
  • the hyperbolic space-based operation rule includes at least one of the following: Mobius matrix multiplication and Mobius addition.
  • the feature extraction network includes: a first processing layer and a second processing layer;
  • the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data
  • the second processing layer is configured to calculate the geometric center of the embedding vector on the hyperbolic space to obtain the feature vector.
  • the embedding vector is expressed based on the first conformal model
  • the feature extraction network further includes: a conformal transformation layer;
  • the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
  • the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model to obtain the feature vector;
  • the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input that vector to the classification network, wherein the first conformal model indicates that the hyperbolic space is mapped to the Euclidean space by means of a first conformal mapping, and the second conformal model indicates that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping.
  • the data in the hyperbolic space can be expressed based on a second conformal model, where the second conformal model indicates that the hyperbolic space is mapped to the Euclidean space by means of a second conformal mapping, and the embedding vector is expressed based on the second conformal model;
  • the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  • the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain a vector to be normalized expressed in the hyperbolic space;
  • the vector to be normalized is mapped into the Euclidean space, and the vector to be normalized mapped into the Euclidean space is normalized to obtain the processing result.
  • obtaining a loss based on the category label and the processing result includes:
  • a loss is obtained based on the class label, the processing result, and a target loss function, wherein the target loss function is a function expressed in the Euclidean space.
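  • updating the neural network based on the loss includes:
  • calculating a gradient corresponding to the loss, wherein the gradient is expressed in the Euclidean space;
  • converting the gradient into a gradient expressed in the hyperbolic space;
  • updating the neural network based on the gradient expressed in the hyperbolic space.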
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
  • Embodiments of the present application further provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, the computer is caused to perform the steps performed by the aforementioned execution device, or the steps performed by the aforementioned training device.
  • the execution device, training device, or terminal device provided in this embodiment of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuits, etc.
  • the processing unit can execute the computer executable instructions stored in the storage unit, so that the chip in the execution device executes the data processing method described in the above embodiments, or the chip in the training device executes the data processing method described in the above embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the application.
  • the chip may be represented as a neural network processor NPU 1500; the NPU 1500 is mounted as a co-processor on the host CPU, and the host CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 1503, which is controlled by the controller 1504 to extract the matrix data in the memory and perform multiplication operations.
  • the arithmetic circuit 1503 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 1503 is a general-purpose matrix processor.
  • for example, assume an input matrix A, a weight matrix B, and an output matrix C: the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial or final result of the resulting matrix in the accumulator 1508.
  • Unified memory 1506 is used to store input data and output data.
  • weight data is transferred into the weight memory 1502 through the storage unit access controller (Direct Memory Access Controller, DMAC) 1505.
  • Input data is also moved into unified memory 1506 via the DMAC.
  • the bus interface unit (BIU) 1510 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1509.
  • the bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and also for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1506 or the weight data to the weight memory 1502 or the input data to the input memory 1501 .
  • the vector calculation unit 1507 includes a plurality of operation processing units and, if necessary, further processes the output of the arithmetic circuit 1503, for example, by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for network computation of non-convolutional/non-fully-connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 1507 can store the vector of processed outputs to the unified memory 1506 .
  • the vector calculation unit 1507 may apply a linear or non-linear function to the output of the arithmetic circuit 1503, for example, performing linear interpolation on a feature plane extracted by a convolutional layer, or processing a vector of accumulated values, to generate activation values.
  • the vector computation unit 1507 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 1503, such as for use in subsequent layers in a neural network.
  • the instruction fetch buffer (instruction fetch buffer) 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504;
  • the unified memory 1506, the input memory 1501, the weight memory 1502 and the instruction fetch memory 1509 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above program.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more usable media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

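This application discloses a data processing method, applied to the field of artificial intelligence, including: obtaining to-be-processed data; and processing the to-be-processed data by using a trained neural network, to output a processing result. The neural network includes a feature extraction network and a classification network. The feature extraction network is configured to extract a feature vector of the to-be-processed data expressed in a hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result. This application can improve the data processing accuracy of a model on data sets that contain a tree-like hierarchical structure, and can reduce the number of model parameters.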

Description

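Data processing method and apparatus
This application claims priority to Chinese Patent Application No. 202010596738.4, filed with the Chinese Patent Office on June 28, 2020 and entitled "Data processing method and apparatus", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
This application relates to the field of artificial intelligence, and in particular, to a data processing method and apparatus.
BACKGROUND
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Although deep neural networks with complex structures have stronger fitting capability than shallow neural networks, in many industrial scenarios that require a balance between efficiency and effectiveness, shallow neural networks with simple and efficient architectures are a commonly adopted choice. Shallow networks train and predict quickly and occupy few resources.
However, in existing shallow neural network models, the input and intermediate layers are all constructed in Euclidean space, so the descriptive capability of the model and the distribution of its parameters are constrained by the properties of Euclidean geometry.
SUMMARY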
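According to a first aspect, this application provides a data processing method. The method includes:
obtaining to-be-processed data.
In this embodiment of this application, a training device may obtain the to-be-processed data and corresponding category labels. The to-be-processed data may include at least one of the following: natural language data, knowledge graph data, genetic data, or image data. The category label depends on the type of task that the neural network to be trained is to perform. For example, for a neural network that performs text classification, the category label is the category of the to-be-processed data; for a neural network that performs semantic recognition, the category label is the semantics of the to-be-processed data; and so on.
It should be noted that tree-like hierarchical structures are common in data types such as natural language data, gene sequences, and knowledge graphs. For example, natural language data includes multiple words, and hypernym-hyponym relationships exist between words; therefore, natural language data can be understood as data with tree-like hierarchical structure characteristics.
processing the to-be-processed data by using a trained neural network, to output a processing result,
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the to-be-processed data expressed in a hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space, to obtain the processing result.
An embodiment of this application provides a data processing method. The method includes: obtaining to-be-processed data and corresponding category labels; processing the to-be-processed data by using a neural network, to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the to-be-processed data, and the classification network is configured to process the feature vector based on an operation rule of a hyperbolic space to obtain the processing result; and obtaining a loss based on the category labels and the processing result, and updating the neural network based on the loss to obtain an updated neural network. Because of the intrinsic properties of hyperbolic space, expressing feature vectors in hyperbolic space can enhance the fitting capability of the neural network model and improve the model's data processing accuracy on data sets that contain a tree-like hierarchical structure; for example, it can improve the accuracy of text classification. In addition, a neural network model built on hyperbolic space improves the model's fitting capability while greatly reducing the number of model parameters.
In an optional implementation, the to-be-processed data includes at least one of the following:
natural language data, knowledge graph data, genetic data, or image data.
In an optional implementation, the classification network includes multiple neural units, each neural unit is configured to process input data based on an activation function, and the activation function includes an operation rule based on the hyperbolic space.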
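The activation function may be configured to be expressed based on an operation rule of the hyperbolic space. The operation rule based on the hyperbolic space includes at least one of the following: Mobius matrix multiplication and Mobius addition. The classification network may use vector algebra transformations in hyperbolic geometry to find the classification layer, for example, a Mobius Linear classification layer of the form $O = W \otimes_{\mathrm{Mobius}} X \oplus_{\mathrm{Mobius}} b$, where O is the output of Mobius Linear, W is a weight parameter, X is the input, b is a bias term, $\otimes_{\mathrm{Mobius}}$ denotes Mobius matrix multiplication, and $\oplus_{\mathrm{Mobius}}$ denotes Mobius addition.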
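The Mobius Linear layer can be illustrated with a short Python sketch. It assumes the Poincare ball of curvature -1 (c = 1) and uses the standard gyrovector formulas for Mobius addition and Mobius matrix-vector multiplication. The function names are hypothetical. The definitions in this application differ between conformal models and may use a different curvature or parameterization.

```python
import math

# Minimal sketch: Poincare ball with curvature -1 (c = 1 assumed).
# Points are plain lists of floats with Euclidean norm < 1.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def mobius_add(x, y):
    """Mobius addition x (+) y on the Poincare ball."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * xy + y2
    coef_y = 1.0 - x2
    denom = 1.0 + 2.0 * xy + x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def mobius_matvec(W, x):
    """Mobius matrix-vector multiplication W (x) x (W given as a list of rows)."""
    wx = [dot(row, x) for row in W]
    x_norm, wx_norm = norm(x), norm(wx)
    if x_norm == 0.0 or wx_norm == 0.0:
        return [0.0] * len(wx)
    # tanh(||Wx|| / ||x|| * artanh(||x||)) * Wx / ||Wx||
    scale = math.tanh(wx_norm / x_norm * math.atanh(x_norm)) / wx_norm
    return [scale * v for v in wx]


def mobius_linear(W, x, b):
    """O = (W (x) X) (+) b, the form of the Mobius Linear layer."""
    return mobius_add(mobius_matvec(W, x), b)
```

Both operations keep points inside the open unit ball. As a result, the output of `mobius_linear` is again a valid Poincare-ball point whenever the input X and the bias b lie in the ball.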
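In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer. The first processing layer is configured to process the to-be-processed data to obtain an embedding vector of the to-be-processed data represented in the hyperbolic space. The second processing layer is configured to calculate the geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
The first processing layer may be an input layer configured to process the to-be-processed data to obtain the embedding vectors corresponding to the to-be-processed data. After the second processing layer obtains the embedding vectors, it may compute their geometric center in the hyperbolic space (the feature vector). Specifically, a geometric-mean extraction method for hyperbolic space may be used to extract the feature vector.
In an optional implementation, the embedding vector is expressed based on a first conformal model, and the feature extraction network further includes a conformal conversion layer. The conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and to input that vector to the second processing layer. The second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model, to obtain the feature vector. The conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input that vector to the classification network. The first conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
For example, the first conformal model and the second conformal model may each be the Poincare Model, the Hyperboloid Model, or the Klein Model. A conformal model is used to describe hyperbolic space, and it defines a series of vector algebra transformations and geometric constraints of the hyperbolic gyrovector space; different conformal models have different properties. Suppose the embedding vector output by the first processing layer is expressed based on the Poincare Model, and the second processing layer is configured to compute the geometric mean by using the Einstein midpoint. Because the Einstein midpoint relies on the Klein Model, the embedding vector output by the first processing layer needs to be converted into an embedding vector expressed based on the Klein Model. The Einstein midpoint is then used to compute the geometric center of the Klein-model embedding vectors to obtain the feature vector, which at this point is expressed based on the Klein Model. The conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input that vector to the classification network. In this case, the conformal conversion layer may convert the Klein-model feature vector into a vector expressed based on the Poincare Model and input it to the classification network.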
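In an optional implementation, the embedding vector is expressed based on a second conformal model;
the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model, to obtain the feature vector, where the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space, to obtain a to-be-normalized vector expressed in the hyperbolic space;
the to-be-normalized vector is mapped into Euclidean space, and the to-be-normalized vector mapped into Euclidean space is normalized to obtain the processing result.
In this embodiment of this application, the output of the classification layer can be converted from hyperbolic space to Euclidean space, so that it is consistent with the subsequent target loss function.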
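The mapping-then-normalization step can be sketched as follows. The application does not state which map brings the to-be-normalized vector into Euclidean space. This sketch assumes the logarithmic map at the origin of the Poincare ball (curvature -1), followed by a standard softmax. The function names are hypothetical.

```python
import math


def log_map_origin(y):
    """Assumed map to Euclidean space: log_0(y) = artanh(||y||) * y / ||y||."""
    n = math.sqrt(sum(a * a for a in y))
    if n == 0.0:
        return [0.0] * len(y)
    scale = math.atanh(n) / n
    return [scale * a for a in y]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def hyperbolic_logits_to_probs(y):
    """Map a Poincare-ball vector to Euclidean space, then normalize it."""
    return softmax(log_map_origin(y))
```

The log map rescales all coordinates by the same positive factor. The class ranking is therefore preserved, and the normalized output can be fed to a Euclidean loss such as cross-entropy.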
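According to a second aspect, this application provides a data processing method. The method includes:
obtaining training data and corresponding category labels;
processing the training data by using a neural network, to output a processing result,
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of a hyperbolic space, to obtain the processing result;
obtaining a loss based on the category labels and the processing result; and
obtaining, based on the loss, a gradient expressed in the hyperbolic space, and updating the neural network based on the gradient to obtain an updated neural network.
In an optional implementation, the updating the neural network based on the gradient to obtain an updated neural network includes:
updating the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract a feature vector of the training data expressed in the hyperbolic space.
In an optional implementation, the training data includes at least one of the following:
natural language data, knowledge graph data, genetic data, or image data.
In an optional implementation, the classification network includes multiple neural units, each neural unit is configured to process input data based on an activation function, and the activation function includes an operation rule based on the hyperbolic space.
In an optional implementation, the operation rule based on the hyperbolic space includes at least one of the following: Mobius matrix multiplication and Mobius addition.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data;
the second processing layer is configured to calculate the geometric center of the embedding vector in the hyperbolic space, to obtain the feature vector.
In an optional implementation, the embedding vector is expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
correspondingly, the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model, to obtain the feature vector;
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input that vector to the classification network, where the first conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
In an optional implementation, data in the hyperbolic space may be expressed based on a second conformal model, where the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping, and the embedding vector is expressed based on the second conformal model;
the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model, to obtain the feature vector.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space, to obtain a to-be-normalized vector expressed in the hyperbolic space;
the to-be-normalized vector is mapped into Euclidean space, and the to-be-normalized vector mapped into Euclidean space is normalized to obtain the processing result.
In an optional implementation, the obtaining a loss based on the category labels and the processing result includes:
obtaining the loss based on the category labels, the processing result, and a target loss function, where the target loss function is a function expressed in Euclidean space.
In an optional implementation, the updating the neural network based on the loss includes:
calculating a gradient corresponding to the loss, where the gradient is expressed in Euclidean space;
converting the gradient into a gradient expressed in the hyperbolic space; and
updating the neural network based on the gradient expressed in the hyperbolic space.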
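The gradient conversion in the last three steps can be illustrated with a Riemannian-SGD-style update on the Poincare ball. This sketch rests on three assumptions: curvature -1, the conformal metric factor of the Poincare ball, and a simple projection back into the ball as the retraction. The application does not give these formulas here, and `rsgd_step` is a hypothetical name.

```python
import math


def rsgd_step(theta, euclid_grad, lr=0.1, eps=1e-5):
    """One Riemannian SGD step on the Poincare ball (c = 1 assumed)."""
    sq = sum(a * a for a in theta)
    # Euclidean -> hyperbolic gradient: scale by the inverse metric ((1 - ||theta||^2) / 2)^2
    factor = ((1.0 - sq) ** 2) / 4.0
    riem_grad = [factor * g for g in euclid_grad]
    updated = [t - lr * g for t, g in zip(theta, riem_grad)]
    # Retraction: if the step leaves the open unit ball, project it back inside
    n = math.sqrt(sum(a * a for a in updated))
    if n >= 1.0 - eps:
        updated = [a / n * (1.0 - eps) for a in updated]
    return updated
```

The rescaling factor shrinks toward zero near the boundary of the ball. Points that encode deep levels of a hierarchy therefore move in smaller Euclidean steps.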
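According to a third aspect, this application provides a data classification apparatus. The apparatus includes:
an obtaining module, configured to obtain to-be-processed data; and
a processing module, configured to process the to-be-processed data by using a trained neural network, to output a processing result,
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the to-be-processed data expressed in a hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space, to obtain the processing result.
In an optional implementation, the to-be-processed data includes at least one of the following:
natural language data, knowledge graph data, genetic data, or image data.
In an optional implementation, the classification network includes multiple neural units, each neural unit is configured to process input data based on an activation function, and the activation function includes an operation rule based on the hyperbolic space.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the to-be-processed data to obtain an embedding vector of the to-be-processed data represented in the hyperbolic space;
the second processing layer is configured to calculate the geometric center of the embedding vector in the hyperbolic space, to obtain the feature vector.
In an optional implementation, the embedding vector is expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model, to obtain the feature vector;
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input that vector to the classification network, where the first conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
In an optional implementation, the embedding vector is expressed based on a second conformal model;
the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model, to obtain the feature vector, where the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space, to obtain a to-be-normalized vector expressed in the hyperbolic space;
the to-be-normalized vector is mapped into Euclidean space, and the to-be-normalized vector mapped into Euclidean space is normalized to obtain the processing result.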
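According to a fourth aspect, this application provides a data processing apparatus. The apparatus includes:
an obtaining module, configured to obtain training data and corresponding category labels;
a processing module, configured to process the training data by using a neural network, to output a processing result,
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of a hyperbolic space, to obtain the processing result;
a loss obtaining module, configured to obtain a loss based on the category labels and the processing result; and
a model update module, configured to obtain, based on the loss, a gradient expressed in the hyperbolic space, and update the neural network based on the gradient to obtain an updated neural network.
In an optional implementation, the model update module is configured to:
update the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract a feature vector of the training data expressed in the hyperbolic space.
In an optional implementation, the training data includes at least one of the following:
natural language data, knowledge graph data, genetic data, or image data.
In an optional implementation, the classification network includes multiple neural units, each neural unit is configured to process input data based on an activation function, and the activation function includes an operation rule based on the hyperbolic space.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data;
the second processing layer is configured to calculate the geometric center of the embedding vector in the hyperbolic space, to obtain the feature vector.
In an optional implementation, the embedding vector is expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model to the second processing layer;
correspondingly, the second processing layer is configured to calculate the geometric center of the vector expressed based on the second conformal model, to obtain the feature vector;
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input that vector to the classification network, where the first conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a first conformal mapping, and the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping.
In an optional implementation, data in the hyperbolic space may be expressed based on a second conformal model, where the second conformal model indicates that the hyperbolic space is mapped to Euclidean space by means of a second conformal mapping, and the embedding vector is expressed based on the second conformal model;
the second processing layer is configured to calculate the geometric center of the embedding vector expressed based on the second conformal model, to obtain the feature vector.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space, to obtain a to-be-normalized vector expressed in the hyperbolic space;
the to-be-normalized vector is mapped into Euclidean space, and the to-be-normalized vector mapped into Euclidean space is normalized to obtain the processing result.
In an optional implementation, the loss obtaining module is configured to obtain the loss based on the category labels, the processing result, and a target loss function, where the target loss function is a function expressed in Euclidean space.
In an optional implementation, the model update module is configured to: calculate a gradient corresponding to the loss, where the gradient is expressed in Euclidean space; convert the gradient into a gradient expressed in the hyperbolic space; and update the neural network based on the gradient expressed in the hyperbolic space.
This application discloses a data processing method, including: obtaining to-be-processed data; and processing the to-be-processed data by using a trained neural network, to output a processing result. The neural network includes a feature extraction network and a classification network. The feature extraction network is configured to extract a feature vector of the to-be-processed data expressed in a hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result. This application can improve the data processing accuracy of a model on data sets that contain a tree-like hierarchical structure, and can reduce the number of model parameters.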
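BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework;
FIG. 2a shows a natural language processing system;
FIG. 2b shows another natural language processing system;
FIG. 2c is a schematic diagram of a device related to natural language processing according to an embodiment of this application;
FIG. 3 is a schematic diagram of an architecture of a system 100 according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a feature extraction network according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a classification network according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a Riemannian optimizer according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a system according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a system according to an embodiment of this application;
FIG. 10 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 11 is a schematic diagram of a data processing apparatus according to an embodiment of this application;
FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a chip according to an embodiment of this application.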
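DESCRIPTION OF EMBODIMENTS
The following describes embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Terms used in the implementation part of the present invention are merely intended to explain specific embodiments of the present invention, and are not intended to limit the present invention.
The following describes embodiments of this application with reference to the accompanying drawings. A person of ordinary skill in the art may know that, as technologies develop and new scenarios emerge, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to such a process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an artificial intelligence main framework. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent inference, intelligent decision-making, and intelligent execution and output. In this process, data is refined along the path "data → information → knowledge → wisdom". The "IT value chain" runs from the underlying infrastructure of artificial intelligence and information (the technologies that provide and process it) up to the industrial ecosystem of the system, and reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing capability for the artificial intelligence system, enables communication with the outside world, and provides support through a basic platform. It communicates with the outside through sensors. Computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs). The basic platform includes platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, inference, decision-making, and the like.
Machine learning and deep learning may perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference is the process of simulating human intelligent reasoning in a computer or an intelligent system: according to an inference control strategy, the machine uses formalized information to think and solve problems. Typical functions are searching and matching.
Decision-making is the process of making decisions after intelligent information has undergone inference, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities may further be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and put it into practical use. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
The following describes several application scenarios of this application.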
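FIG. 2a shows a natural language processing system, which includes user equipment and a data processing device. The user equipment includes intelligent terminals such as mobile phones, personal computers, or information processing centers. The user equipment initiates natural language data processing and acts as the initiator of requests such as language questions and answers or queries; usually, a user initiates a request through the user equipment.
The data processing device may be a device or server with a data processing function, such as a cloud server, a network server, an application server, or a management server. The data processing device receives questions such as query sentences, speech, or text from the intelligent terminal through an interaction interface. It then performs language data processing by means of machine learning, deep learning, searching, inference, decision-making, and the like, using a memory that stores data and a processor that processes data. The memory in the data processing device may be a general term that includes local storage and a database storing historical data; the database may be on the data processing device or on another network server.
In the natural language processing system shown in FIG. 2a, the user equipment may receive an instruction from a user. For example, the user equipment may receive a piece of text input by the user and then send a request to the data processing device. The data processing device then performs a natural language processing application (for example, text classification, text inference, named entity recognition, or translation) on the text obtained by the user equipment, to obtain a processing result of the corresponding application for that text (for example, a processing result, an inference result, a named entity recognition result, or a translation result). For example, the user equipment may receive a piece of Chinese text input by the user and send a request to the data processing device, so that the data processing device performs entity classification on the Chinese text to obtain an entity processing result. Alternatively, the user equipment may receive a piece of Chinese text input by the user and send a request to the data processing device, so that the data processing device translates the Chinese text into English to obtain an English translation.
In FIG. 2a, the data processing device may perform the data processing method in the embodiments of this application.
FIG. 2b shows another natural language processing system. In FIG. 2b, the user equipment directly serves as the data processing device: it can directly receive input from the user, and the input is processed directly by the hardware of the user equipment itself. The specific process is similar to that in FIG. 2a; refer to the foregoing description. Details are not described herein again.
In the natural language processing system shown in FIG. 2b, the user equipment may receive an instruction from a user. For example, the user equipment may receive a piece of text input by the user and then itself perform a natural language processing application (for example, text classification, text inference, named entity recognition, or translation) on the text, to obtain a processing result of the corresponding application (for example, a processing result, an inference result, a named entity recognition result, or a translation result). For example, the user equipment may receive a piece of Chinese text input by the user and perform entity classification on it to obtain an entity processing result. Alternatively, the user equipment may receive a piece of Chinese text input by the user and translate it into English to obtain an English translation.
In FIG. 2b, the user equipment itself can perform the data processing method in the embodiments of this application.
FIG. 2c is a schematic diagram of a device related to natural language processing according to an embodiment of this application.
The user equipment in FIG. 2a and FIG. 2b may specifically be the local device 301 or the local device 302 in FIG. 2c, and the data processing device in FIG. 2a may specifically be the execution device 310 in FIG. 2c. The data storage system 350 may store the to-be-processed data of the execution device 310, and may be integrated on the execution device 310 or deployed on a cloud or another network server.
The processors in FIG. 2a and FIG. 2b may perform data training, machine learning, or deep learning by using a neural network model or another model (for example, a model based on a support vector machine). They then use the model finally obtained through training or learning to perform natural language processing applications (for example, text classification, sequence labeling, reading comprehension, text generation, text inference, and translation) on text sequences, to obtain corresponding processing results.
In addition, this application may also be applied to knowledge graph data processing, genetic data processing, image classification processing, and the like.
FIG. 3 is a schematic diagram of an architecture of a system 100 according to an embodiment of this application. In FIG. 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through the client device 140. In the embodiments of this application, the input data may include tasks to be scheduled, callable resources, and other parameters.
When the execution device 110 preprocesses the input data, or when the computing module 111 of the execution device 110 performs computation or other related processing (for example, implementing the functions of the neural network in this application), the execution device 110 may invoke data, code, and the like in the data storage system 150 for the corresponding processing. It may also store data, instructions, and the like obtained through that processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing result to the client device 140, so as to provide it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules based on different training data for different goals, or different tasks. The corresponding target models/rules may be used to achieve those goals or complete those tasks, thereby providing the user with the desired results.
In the case shown in FIG. 3, the user may manually specify input data through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if the client device 140 needs the user's authorization to send input data automatically, the user may set the corresponding permission in the client device 140. The user may view, on the client device 140, the result output by the execution device 110, and the result may be presented as display, sound, action, or the like. The client device 140 may also serve as a data collection end: as shown in the figure, it collects the input data fed into the I/O interface 112 and the output result produced by the I/O interface 112 as new sample data, and stores them in the database 130. Alternatively, the collection need not go through the client device 140; instead, the I/O interface 112 directly stores the input data fed into it and the output result it produces, as shown in the figure, in the database 130 as new sample data.
It should be noted that FIG. 3 is merely a schematic diagram of a system architecture according to an embodiment of this application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may alternatively be placed in the execution device 110. As shown in FIG. 3, the neural network may be obtained through training by the training device 120.
An embodiment of this application further provides a chip, and the chip includes a neural network processor NPU 50. The chip may be disposed in the execution device 110 shown in FIG. 3 to complete the computing work of the computing module 111. The chip may also be disposed in the training device 120 shown in FIG. 3 to complete the training work of the training device 120 and output the target models/rules.
The neural network processor NPU 40 is mounted as a coprocessor on the host central processing unit (host CPU), and the host CPU allocates tasks. The core part of the NPU is the arithmetic circuit 403, and the controller 404 controls the arithmetic circuit 403 to extract data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 403 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 403 is a two-dimensional systolic array. The arithmetic circuit 403 may alternatively be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 403 is a general-purpose matrix processor.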
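For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 401, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 408.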
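The partial-result accumulation can be mimicked in software. The sketch below is only a functional model, not a description of the PE array. It walks over the shared dimension one slice at a time and adds each slice's partial products into an accumulator matrix, which ends up holding C = A·B. The tile size is an arbitrary, hypothetical parameter.

```python
def tiled_matmul(A, B, tile=2):
    """Compute C = A @ B by accumulating partial products one k-slice at a time."""
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0.0] * m for _ in range(n)]  # plays the role of the accumulator
    for k0 in range(0, k, tile):  # one slice of B is "held" at a time
        k1 = min(k0 + tile, k)
        for i in range(n):
            for j in range(m):
                # partial result for this slice, added to the running total
                acc[i][j] += sum(A[i][kk] * B[kk][j] for kk in range(k0, k1))
    return acc
```

For example, `tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58.0, 64.0], [139.0, 154.0]]`, whatever the tile size.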
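The vector calculation unit 407 may further process the output of the arithmetic circuit, for example, by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 407 may be used for network computation of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 407 can store the processed output vectors in the unified buffer 406. For example, the vector calculation unit 407 may apply a non-linear function to the output of the arithmetic circuit 403, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 407 generates normalized values, merged values, or both. In some implementations, the processed output vectors can be used as activation inputs to the arithmetic circuit 403, for example, for use in subsequent layers of the neural network.
The unified memory 406 is configured to store input data and output data.
The storage unit access controller 405 (direct memory access controller, DMAC) moves input data in the external memory to the input memory 401 and/or the unified memory 406, stores weight data in the external memory into the weight memory 402, and stores data in the unified memory 406 into the external memory.
The bus interface unit (BIU) 410 is configured to implement interaction among the host CPU, the DMAC, and the instruction fetch buffer 409 through a bus.
The instruction fetch buffer 409 connected to the controller 404 is configured to store instructions used by the controller 404;
the controller 404 is configured to invoke the instructions buffered in the instruction fetch buffer 409, to control the working process of the operation accelerator.
Generally, the unified memory 406, the input memory 401, the weight memory 402, and the instruction fetch buffer 409 are all on-chip memories, and the external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
Because the embodiments of this application involve extensive application of neural networks, for ease of understanding, the following first describes related terms and concepts, such as neural networks, involved in the embodiments of this application.
(1) Neural network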
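A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:
$$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit; it introduces non-linear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is formed by connecting many such single neural units together; that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of that field, and a local receptive field may be a region composed of several neural units.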
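The work of each layer in a neural network can be described by the mathematical expression
$$\vec{y} = a(W \cdot \vec{x} + \vec{b})$$
At the physical level, the work of each layer can be understood as transforming the input space (the set of input vectors) into the output space (that is, from the row space to the column space of the matrix) through five operations on the input space: 1. raising/lowering dimensionality; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by $W \cdot \vec{x}$, operation 4 by $+\vec{b}$, and operation 5 by $a()$. The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the set of all individuals of that class. W is a weight vector, and each value in it represents the weight of one neuron in that layer. The vector W determines the spatial transformation from the input space to the output space described above; that is, the weight W of each layer controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained network (weight matrices formed by the vectors W of many layers). Therefore, training a neural network is essentially learning how to control spatial transformation, and more specifically, learning the weight matrices.
Because the output of the neural network should be as close as possible to the value that is actually to be predicted, the predicted value of the current network is compared with the desired target value, and the weight vector of each layer is then updated based on the difference between the two. (There is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the neural network.) For example, if the network's predicted value is too high, the weight vector is adjusted to lower the prediction, and the adjustment continues until the neural network can predict the desired target value. Therefore, "how to compare the difference between the predicted value and the target value" must be defined in advance. This is the loss function or objective function: an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the neural network becomes a process of reducing this loss as much as possible.
(2) Back propagation algorithm
A neural network may use the error back propagation (BP) algorithm during training to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, and its purpose is to obtain optimal parameters of the neural network model, such as the weight matrices.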
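The loss-driven weight update and back propagation can be illustrated on a single sigmoid neuron. The sketch assumes a squared-error loss L = 0.5(h - t)^2 and plain gradient descent with a hypothetical learning rate. It illustrates the chain rule only and is not the training procedure of this application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b):
    """Forward pass of one sigmoid neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(x, w, b, target):
    """Assumed squared-error loss L = 0.5 * (h - t)^2."""
    return 0.5 * (forward(x, w, b) - target) ** 2


def backprop_step(x, w, b, target, lr=0.5):
    """One gradient-descent update computed by back-propagating the error."""
    h = forward(x, w, b)
    # Chain rule: dL/dz = (h - t) * h * (1 - h), since sigmoid'(z) = h * (1 - h)
    delta = (h - target) * h * (1.0 - h)
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]  # dL/dw_i = delta * x_i
    new_b = b - lr * delta                                  # dL/db = delta
    return new_w, new_b
```

Repeating `backprop_step` on a sample drives the prediction toward its target, which is exactly the "reduce the loss as much as possible" process described above.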
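(3) Natural language processing (NLP)
Natural language is human language, and natural language processing (NLP) is the processing of human language. NLP is the process of systematically analyzing, understanding, and extracting information from text data in an intelligent and efficient way. By using NLP and its components, we can manage very large amounts of text data, perform a large number of automated tasks, and solve a wide variety of problems, such as automatic summarization, machine translation (MT), named entity recognition (NER), relation extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering, and topic segmentation.
For example, natural language processing tasks can be divided into the following categories.
Sequence labeling: for each word in a sentence, the model must output a category based on the context, as in Chinese word segmentation, part-of-speech tagging, named entity recognition, and semantic role labeling.
Classification tasks: a single classification value is output for an entire sentence, as in text classification.
Sentence relationship inference: given two sentences, determine whether they have a certain nominal relationship, as in entailment, QA, semantic rewriting, and natural language inference.
Generative tasks: given a piece of input, generate a piece of text, as in machine translation, text summarization, poem and sentence composition, and image captioning.
The following lists some natural language processing examples.
Word segmentation (WB, word breaker): divides continuous natural language data into a sequence of words that is semantically reasonable and complete, which can resolve crossing ambiguity. Example sentence: 致毕业和尚未毕业的同学 ("to classmates who have graduated and who have not yet graduated"); segmentation 1: 致 / 毕业 / 和 / 尚未 / 毕业 / 的 / 同学; segmentation 2: 致 / 毕业 / 和尚 / 未 / 毕业 / 的 / 同学 (wrongly reading 和尚, "monk").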
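This kind of crossing ambiguity is easy to reproduce with the classic dictionary-based forward and backward maximum matching baselines. The sketch below is illustrative only; it is not the segmentation method of this application, and the tiny vocabulary is hypothetical.

```python
# Hypothetical toy vocabulary covering the multi-character words in the example.
VOCAB = {"毕业", "和尚", "尚未", "同学"}


def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right longest match; unknown characters become 1-char words."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left longest match."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words
```

Forward matching greedily takes 和尚 ("monk") and produces the wrong reading. Backward matching happens to recover 和 / 尚未 on this sentence.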
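Named entity recognition (NER): identifies entities with specific meanings (people, places, organizations, times, works, and so on) in natural language data, and can integrate out-of-vocabulary words at the appropriate granularity. Example sentence: 天使爱美丽在线观看 ("watch Amélie online"); segmentation: 天使 / 爱 / 美丽 / 在线 / 观看; entity: 天使爱美丽 -> movie.
Part-of-speech tagging: assigns a part of speech (noun, verb, adjective, and so on) to each word in natural language data. Dependency parsing: automatically analyzes the syntactic components of a sentence (subject, predicate, object, attributive, adverbial, complement, and so on), which can resolve structural ambiguity. Comment: 房间里还可以欣赏日出 ("in the room you can also watch the sunrise"); ambiguity 1: 房间还可以 ("the room is okay"); ambiguity 2: 可以欣赏日出 ("you can watch the sunrise"); parts of speech: 房间里 (subject), 还可以 (predicate), 欣赏日出 (verb-object phrase).
Word embedding and semantic similarity: represents words as vectors and computes the semantic similarity between words on that basis, which addresses word-level similarity. For example: which is closer to 西瓜 (watermelon), 呆瓜 (fool) or 草莓 (strawberry)? Vector representation: 西瓜 (0.1222, 0.22333, ...); similarity: 呆瓜 (0.115), 草莓 (0.325); vector representations: (-0.333, 0.1223, ...), (0.333, 0.3333, ...).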
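Similarity between word vectors is commonly measured with cosine similarity. The sketch below assumes that measure, since the text does not name one, and uses hypothetical toy vectors rather than the numbers quoted in the example.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = <u, v> / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def most_similar(query, candidates):
    """Return the key in `candidates` whose vector is closest to `query`."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical toy embeddings, not real word vectors.
toy = {"word_a": [0.9, 0.1, 0.0], "word_b": [0.0, 0.2, 0.95]}
```

With real embeddings, `most_similar` answers questions like "which candidate is closer to 西瓜?" by picking the highest-scoring candidate.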
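Text semantic similarity: relying on massive web-wide data and deep neural network technology, computes the semantic similarity between texts, which addresses text-level semantic similarity. For example: which is closer to 车头如何防止车牌 ("how to fit a license plate on the front of a car"), 前牌照怎么装 ("how to install a front license plate") or 如何办理北京牌照 ("how to apply for a Beijing license plate")? Vector representation: 车头如何防止车牌 (0.1222, 0.22333, ...); similarity: 前牌照怎么装 (0.762), 如何办理北京牌照 (0.486); vector representations: (-0.333, 0.1223, ...), (0.333, 0.3333, ...).
(4) Language model (LM)
A language model is a basic model in NLP. Through training on a large corpus, an LM can infer the probability of unknown words from existing information (for example, text information such as words that have already appeared in the context). An LM can also be understood as a probability model used to calculate the probability of a sentence. In other words, a language model is a probability distribution over natural language data sequences, representing how likely a text of a given sequence and length is. In short, a language model predicts the next word from the context. Because it needs no manually labeled corpus, a language model can learn rich semantic knowledge from unrestricted large-scale corpora.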
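As a concrete instance of predicting the next word from context, the sketch below builds a count-based bigram language model with maximum-likelihood estimates and uses it to score a sentence. The bigram order, the `<s>`/`</s>` boundary tokens, and the toy corpus are assumptions for illustration. Neural language models replace the counts with learned parameters.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with boundary tokens."""
    unigram, bigram = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return unigram, bigram


def sentence_prob(tokens, unigram, bigram):
    """P(sentence) = product of P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        if unigram[prev] == 0:
            return 0.0  # unseen history: the MLE model assigns zero probability
        p *= bigram[(prev, cur)] / unigram[prev]
    return p
```

Trained on the two sentences "I like tea" and "I like coffee", the model gives "I like tea" a probability of 0.5, because "like" is followed by "tea" in one of its two occurrences.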
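(5) Large-scale pretrained language model
A large-scale pretrained language model, also called a large-scale language pretraining model, generally refers to a large-scale neural network structure obtained by using large-scale corpora (for example, language training materials such as sentences and paragraphs), designing language model training tasks, and training the network on them. Other tasks can then perform feature extraction or task fine-tuning on top of this model to achieve specific task goals. The idea of pretraining is to first train on one task to obtain a set of model parameters, use that set of parameters to initialize a network model, and then train the initialized network model on other tasks to obtain models adapted to those tasks. Through pretraining on large-scale corpora, a neural language representation model can learn powerful language representation capability and extract rich syntactic and semantic information from text. A large-scale pretrained language model can provide tokens and sentence-level features containing rich semantic information for downstream tasks, or can be fine-tuned directly for a downstream task, which conveniently yields a model dedicated to that task.
(6) Knowledge graph
A knowledge graph aims to describe the various entities or concepts that exist in the real world and their relationships, forming a huge semantic network graph in which nodes represent entities or concepts and edges are formed by attributes or relationships. A relationship describes the association between two entities, for example, the relationship between Beijing and China. An attribute of an entity is described by an "attribute-value pair" that captures its intrinsic characteristics; for example, a person has attributes such as age, height, and weight. Today, "knowledge graph" is used broadly to refer to various large-scale knowledge bases.
Entity: something that is distinguishable and exists independently, such as a particular person, city, plant, or commodity. Everything in the world is composed of concrete things, which are called entities, for example, "China", "United States", and "Japan". Entities are the most basic elements of a knowledge graph, and different relationships exist between different entities.
Semantic class (concept): a set of entities with the same characteristics, such as countries, ethnic groups, books, and computers. Concepts mainly refer to sets, categories, object types, and kinds of things, for example, people and geography.
Content: usually the name, description, explanation, and so on of an entity or a semantic class, which may be expressed by text, images, audio, video, and the like.
Attribute (value) (property): points from an entity to its attribute value. Different attribute types correspond to edges of different attribute types. The attribute value is the value of a specified attribute of an object. For example, "area", "population", and "capital" are several different attributes of the entity "China", and the value of the "area" attribute of "China" is "9.6 million square kilometers".
Relation: formalized as a function that maps k points to a Boolean value. In a knowledge graph, a relation is a function that maps k graph nodes (entities, semantic classes, or attribute values) to a Boolean value.
Based on the foregoing definitions, to make knowledge easier for computers to process and understand, it can be represented in a more formal and concise way, namely as triples; the triple is a general representation of a knowledge graph. The basic forms of a triple include (entity 1-relationship-entity 2) and (entity-attribute-attribute value). Each entity (the extension of a concept) may be identified by a globally unique ID, each attribute-value pair (AVP) may describe an intrinsic characteristic of an entity, and a relationship may connect two entities and describe the association between them. For example, China is an entity and Beijing is an entity, and (China-capital-Beijing) is an example of an (entity-relationship-entity) triple. Beijing is an entity, population is an attribute, and 20.693 million is an attribute value, so (Beijing-population-20.693 million) is an example of an (entity-attribute-attribute value) triple. The difference between an attribute and a relationship is that the two elements of an attribute triple are usually an entity and a string, whereas the two elements of a relationship triple are usually two entities. In the embodiments of this application, for ease of understanding and description, the attribute value in an attribute triple is also regarded as an entity, and the attribute is regarded as a connection between two entities. In other words, knowledge represented as triples in the embodiments of this application indicates a connection between two entities. That connection may be a relationship between the two entities (for example, (entity 1-relationship-entity 2)), or it may be an attribute of one entity whose value is the other entity (for example, (entity-attribute-attribute value)). Knowledge represented as triples in the embodiments of this application may also be called structured knowledge. It should also be understood that triples are not limited to the forms (entity 1-relationship-entity 2) and (entity-attribute-attribute value); for example, they may also be represented as (entity 1-entity 2-relationship) or (entity-attribute value-attribute). In some embodiments, an attribute may also be regarded as a generalized relationship.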
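The triple representation can be modeled with a small data structure. This minimal sketch uses the two example triples from the paragraph above and stores the attribute value as an entity, as the paragraph describes. The `Triple` type and `query` helper are hypothetical names.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    head: str      # entity 1, or the entity that owns an attribute
    relation: str  # relationship name, or attribute name
    tail: str      # entity 2, or the attribute value (treated as an entity)


def query(triples: List[Triple], head: str, relation: str) -> List[str]:
    """Return every tail connected to `head` through `relation`."""
    return [t.tail for t in triples if t.head == head and t.relation == relation]


kg = [
    Triple("China", "capital", "Beijing"),              # (entity 1-relationship-entity 2)
    Triple("Beijing", "population", "20.693 million"),  # (entity-attribute-attribute value)
]
```

Both triple forms share one shape, so the same lookup serves relationships (`query(kg, "China", "capital")`) and attributes (`query(kg, "Beijing", "population")`).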
本申请的文本处理方法可用于对自然语言数据序列执行自然语言处理任务,其中对应于不同的自然语言处理任务(即本申请中的目标任务),用于对自然语言数据序列进行处理的目标处理模型是不同的。下面从神经网络的训练侧和神经网络的应用侧对本申请提供的方法进行描述。
本申请实施例提供的神经网络的训练方法,涉及自然语言数据的处理,具体可以应用于数据训练、机器学习、深度学习等数据处理方法,对训练数据(如本申请中的训练文本和第一知识数据)进行符号化和形式化的智能信息建模、抽取、预处理、训练等,最终得到训练好的目标处理模型;并且,本申请实施例提供的文本处理的方法可以运用上述训练好的目标处理模型,将输入数据(如本申请中待处理文本)输入到所述训练好的目标处理模型中,得到输出数据(如本申请中与目标任务对应的处理结果)。需要说明的是,本申请实施例提供的目标处理模型的训练方法和文本处理的方法是基于同一个构思产生的发明,也可以理解为一个系统中的两个部分,或一个整体流程的两个阶段:如模型训练阶段和模型应用阶段。
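Refer to FIG. 4, which is a schematic flowchart of a data processing method according to an embodiment of this application. As shown in FIG. 4, the data processing method provided in this embodiment includes the following steps.
401. Obtain training data and the corresponding class labels.
In this embodiment of this application, a training device may obtain training data and the corresponding class labels. The training data may include at least one of the following: natural language data, knowledge graph data, gene data, or image data. The class labels depend on the type of task that the neural network to be trained is meant to perform. For example, for a neural network that performs text classification, the class label is the category of the training data; for a neural network that performs semantic recognition, the class label is the semantics of the training data; and so on.
It should be noted that tree-like hierarchical structures are common in data types such as natural language data, gene sequences, and knowledge graphs. For example, natural language data contains many words, and hypernym-hyponym relations exist between words. Natural language data can therefore be understood as data with a tree-like hierarchical structure.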
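402. Process the training data by using a neural network to output a processing result. The neural network includes a feature extraction network and a classification network. The feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result.
In this embodiment of this application, the neural network to be trained may include a feature extraction network. The feature extraction network is configured to extract a feature vector of the training data expressed in hyperbolic space and then pass the obtained feature vector to the classification network.
Refer to FIG. 5, which is a schematic structural diagram of a feature extraction network according to an embodiment of this application. As shown in FIG. 5, in one implementation, the feature extraction network may include a first processing layer and a second processing layer. The first processing layer is configured to process the training data to obtain the embedding vectors corresponding to the training data. The second processing layer is configured to compute the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
The first processing layer may be an input layer configured to process the training data to obtain the corresponding embedding vectors. After receiving the embedding vectors, the second processing layer may compute their geometric center in hyperbolic space, which serves as the feature vector. Specifically, a hyperbolic geometric-mean method may be used to extract the feature vector.
As shown in FIG. 5, the embedding vectors output by the first processing layer may be further processed by the second processing layer.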
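In one implementation, data in hyperbolic space may be expressed based on a first conformal model and based on a second conformal model. The first conformal model represents hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents hyperbolic space mapped to Euclidean space through a second conformal mapping. The embedding vectors are expressed based on the first conformal model. The feature extraction network further includes a conformal conversion layer. The conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on the second conformal model, and to input those vectors into the second processing layer. Correspondingly, the second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector. The conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input that vector into the classification network.
For example, the first conformal model and the second conformal model may each be the Poincaré model, the hyperboloid model, or the Klein model. A conformal model describes hyperbolic space and defines a series of vector-algebra transformations and geometric constraints of the hyperbolic gyrovector space, and different conformal models have different properties. Suppose the embedding vectors output by the first processing layer are expressed in the Poincaré model and the second processing layer computes the geometric mean with the Einstein midpoint. Because the Einstein midpoint is defined in the Klein model, the embedding vectors output by the first processing layer must first be converted into the Klein model. The Einstein midpoint of these Klein-model vectors is then computed as their geometric center to obtain the feature vector, which at this point is expressed in the Klein model. The conformal conversion layer then converts this Klein-model feature vector back into a vector expressed in the Poincaré model (the first conformal model) and inputs it into the classification network. For example, the Einstein midpoint may be computed as follows: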
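$$P = \frac{\sum_{i=1}^{n} \gamma(x_i)\, x_i}{\sum_{i=1}^{n} \gamma(x_i)}$$
where
$$\gamma(x_i) = \frac{1}{\sqrt{1 - \lVert x_i \rVert^2}}$$
is the Lorentz factor, $x_1, \dots, x_n$ are the embedding vectors expressed in the Klein model, and P is the feature representation computed by the Einstein midpoint.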
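The conformal conversion and Einstein-midpoint steps can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this application. It assumes the standard Poincaré-to-Klein map k = 2p / (1 + c‖p‖²) and its inverse. It also generalizes the Lorentz factor to 1/√(1 − c‖x‖²) with a hypothetical curvature parameter `c`; setting c = 1 recovers the formula above. All function names are illustrative.

```python
import numpy as np


def poincare_to_klein(p, c=1.0):
    """Map Poincaré-ball points (last axis = coordinates) to the Klein model."""
    sq = np.sum(p * p, axis=-1, keepdims=True)
    return 2.0 * p / (1.0 + c * sq)


def klein_to_poincare(k, c=1.0):
    """Inverse map from the Klein model back to the Poincaré ball."""
    sq = np.sum(k * k, axis=-1, keepdims=True)
    return k / (1.0 + np.sqrt(1.0 - c * sq))


def einstein_midpoint(k, c=1.0):
    """Lorentz-factor-weighted mean of Klein-model points (one point per row)."""
    sq = np.sum(k * k, axis=-1, keepdims=True)
    gamma = 1.0 / np.sqrt(1.0 - c * sq)  # Lorentz factor of each point
    return np.sum(gamma * k, axis=0) / np.sum(gamma)


def hyperbolic_mean(poincare_embeddings, c=1.0):
    """Poincaré -> Klein -> Einstein midpoint -> Poincaré, as described above."""
    k = poincare_to_klein(poincare_embeddings, c)
    m = einstein_midpoint(k, c)
    return klein_to_poincare(m, c)
```

Working in the Klein model means the midpoint reduces to a weighted average. A convex combination of points inside the ball stays inside the ball, so the result is always a valid hyperbolic point.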
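In this embodiment of this application, the classification network may include a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function is expressed based on the operation rules of hyperbolic space. The operation rules of hyperbolic space include at least one of the following: Möbius matrix multiplication and Möbius addition.
The classification network may use vector-algebra transformations of hyperbolic geometry to find the classification layer, for example, a Mobius Linear classification layer of the form $O = (W \otimes_c X) \oplus_c b$. Here O is the output of Mobius Linear, W is the weight parameter, X is the input, and b is the bias term; $\otimes_c$ denotes Möbius matrix multiplication and $\oplus_c$ denotes Möbius addition. The mathematical definitions of Möbius matrix multiplication and Möbius addition differ across conformal models. For example, in the Poincaré model they may be defined as: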
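$$x \oplus_c y = \frac{\left(1 + 2c\langle x, y\rangle + c\lVert y\rVert^2\right) x + \left(1 - c\lVert x\rVert^2\right) y}{1 + 2c\langle x, y\rangle + c^2\lVert x\rVert^2\lVert y\rVert^2}$$
$$M \otimes_c x = \frac{1}{\sqrt{c}} \tanh\!\left(\frac{\lVert Mx\rVert}{\lVert x\rVert}\tanh^{-1}\!\left(\sqrt{c}\,\lVert x\rVert\right)\right)\frac{Mx}{\lVert Mx\rVert}$$
where c > 0 is the curvature parameter; the Poincaré ball of radius $1/\sqrt{c}$ has constant curvature $-c$.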
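The two Möbius operations, and a Mobius Linear layer built from them, can be sketched in NumPy directly from the Poincaré-ball definitions above. The function names, the `EPS` guard, and the choice to return zero when Mx = 0 are assumptions made for this illustration. A practical implementation would typically also batch inputs and clip norms for numerical stability.

```python
import numpy as np

EPS = 1e-15  # guard against division by zero


def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y on the Poincaré ball."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / max(den, EPS)


def mobius_matvec(M, x, c=1.0):
    """Möbius matrix-vector multiplication M ⊗_c x (x must lie inside the ball)."""
    x_norm = np.linalg.norm(x)
    mx = M @ x
    mx_norm = np.linalg.norm(mx)
    if x_norm < EPS or mx_norm < EPS:
        return np.zeros_like(mx)  # convention for the degenerate case
    sqrt_c = np.sqrt(c)
    scale = np.tanh(mx_norm / x_norm * np.arctanh(sqrt_c * x_norm))
    return scale * mx / (sqrt_c * mx_norm)


def mobius_linear(W, x, b, c=1.0):
    """Mobius Linear layer O = (W ⊗_c x) ⊕_c b; the bias b must lie in the ball."""
    return mobius_add(mobius_matvec(W, x, c), b, c)
```

Both operations map points of the ball back into the ball. The layer's output therefore remains a valid Poincaré-model point even when W changes the dimension.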
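Refer to FIG. 6, which is a schematic structural diagram of a classification network according to an embodiment of this application. As shown in FIG. 6, in this embodiment the classification network may be configured to process the feature vector based on the operation rules of hyperbolic space to obtain a to-be-normalized vector expressed in hyperbolic space. It then maps the to-be-normalized vector into Euclidean space and normalizes the mapped vector to obtain the processing result.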
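In this embodiment of this application, the output of the classification layer may be converted from hyperbolic space to Euclidean space so that it is consistent with the subsequent target loss function. The conversion may be defined mathematically, for example, as the logarithmic map at the origin of the Poincaré ball:
$$\log_0^c(y) = \frac{1}{\sqrt{c}}\tanh^{-1}\!\left(\sqrt{c}\,\lVert y\rVert\right)\frac{y}{\lVert y\rVert}$$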
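Mapping to Euclidean space followed by normalization might look like the NumPy sketch below. It assumes, as in the formula above, that the logarithmic map at the origin is used, and that the normalization is a softmax. Both choices, the clipping constant, and the names `logmap0` and `classify` are assumptions made for illustration.

```python
import numpy as np


def logmap0(y, c=1.0, eps=1e-15):
    """Logarithmic map at the origin: Poincaré ball -> Euclidean tangent space."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)  # avoid 0/0
    # Clip just below 1 so arctanh stays finite for points on the boundary.
    return np.arctanh(np.clip(sqrt_c * norm, 0.0, 1.0 - 1e-7)) * y / (sqrt_c * norm)


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def classify(hyperbolic_logits, c=1.0):
    """Map the to-be-normalized vector to Euclidean space, then normalize it."""
    return softmax(logmap0(hyperbolic_logits, c))
```

The logarithmic map at the origin rescales a vector by a positive factor without changing its direction. The ranking of the classes is therefore preserved, and only the softmax temperature changes.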
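403. Obtain a loss based on the class labels and the processing result; obtain, based on the loss, a gradient expressed in hyperbolic space; and update the neural network based on the gradient to obtain an updated neural network.
In this embodiment of this application, the loss may be obtained from the class labels and the processing result through a target loss function expressed in Euclidean space. The gradient corresponding to the loss, which is expressed in Euclidean space, is then computed and converted into a gradient expressed in hyperbolic space. The neural network is updated based on the gradient expressed in hyperbolic space.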
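The text does not name the target loss function. The sketch below assumes cross-entropy over softmax-normalized Euclidean logits, a common choice for classification, and uses the hypothetical name `cross_entropy_with_grad`. It returns the loss together with its Euclidean gradient with respect to the logits (softmax minus one-hot). In a full model, that gradient would be back-propagated to the hyperbolic parameters before the Riemannian conversion.

```python
import numpy as np


def cross_entropy_with_grad(z, label):
    """Cross-entropy loss on Euclidean logits z and its Euclidean gradient dL/dz."""
    z = z - np.max(z)                  # stabilize the exponentials
    p = np.exp(z) / np.sum(np.exp(z))  # softmax normalization
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= 1.0                 # dL/dz = softmax(z) - one_hot(label)
    return loss, grad
```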
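Refer to FIG. 7, which is a schematic structural diagram of a Riemannian optimizer according to an embodiment of this application. As shown in FIG. 7, the Euclidean gradient is computed first and then mathematically converted into a Riemannian gradient (that is, a gradient expressed in hyperbolic space). The Riemannian gradient is then retracted onto the conformal model, and parallel transport is performed according to the Riemannian gradient (that is, the weights of the neural network are updated) to obtain the updated neural network.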
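The optimizer in FIG. 7 is described only at block-diagram level. The sketch below assumes the Poincaré ball and plain Riemannian SGD. The Euclidean gradient is rescaled by the inverse metric, (1 − c‖x‖²)²/4, to obtain the Riemannian gradient. A simple projection back into the ball then serves as the retraction, in place of an exact exponential map and parallel transport. All names and the `eps` margin are hypothetical.

```python
import numpy as np


def riemannian_grad(x, euclidean_grad, c=1.0):
    """Rescale a Euclidean gradient by the inverse Poincaré metric.

    The Poincaré metric is lambda_x^2 times the Euclidean one, with
    lambda_x = 2 / (1 - c * ||x||^2), so the Riemannian gradient is
    euclidean_grad / lambda_x^2.
    """
    lam = 2.0 / (1.0 - c * np.dot(x, x))
    return euclidean_grad / lam ** 2


def project_to_ball(x, c=1.0, eps=1e-5):
    """Simple retraction: pull points that left the ball back just inside it."""
    max_norm = (1.0 - eps) / np.sqrt(c)
    norm = np.linalg.norm(x)
    return x * (max_norm / norm) if norm > max_norm else x


def rsgd_step(x, euclidean_grad, lr=0.1, c=1.0):
    """One Riemannian SGD step: convert the gradient, take a step, retract."""
    return project_to_ball(x - lr * riemannian_grad(x, euclidean_grad, c), c)
```

Near the boundary of the ball, the factor 1/λ_x² becomes very small. Parameters that are already far from the origin therefore move only slightly, which helps keep updates stable.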
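In this embodiment of this application, the feature extraction network in the neural network may be updated based on the gradient to obtain an updated feature extraction network. The updated feature extraction network is configured to extract the feature vector of the training data expressed in hyperbolic space.
The updated neural network includes the updated feature extraction network, which is configured to extract the feature vector of the training data expressed in hyperbolic space.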
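An embodiment of this application provides a data processing method, including: obtaining training data and the corresponding class labels; processing the training data by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result; and obtaining a loss based on the class labels and the processing result, obtaining a gradient expressed in hyperbolic space based on the loss, and updating the neural network based on the gradient to obtain an updated neural network. The volume of hyperbolic space grows exponentially with radius, much as the number of nodes in a tree grows exponentially with depth, so tree-like hierarchies can be embedded in it with little distortion. Expressing feature vectors in hyperbolic space can therefore enhance the fitting capability of the neural network model and improve data processing accuracy on datasets that contain tree-like hierarchical structure, for example by improving text classification accuracy. In addition, a neural network model built on hyperbolic space can improve fitting capability while substantially reducing the number of model parameters.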
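Taking natural language data as the training data, the following describes an embodiment in more detail than FIG. 4.
In one implementation, the Poincaré model may be used as the hyperbolic conformal model, as illustrated in FIG. 8. The input text is used to look up Poincaré-model embedding vectors. The resulting set of text feature vectors is converted into Klein-model vectors, the hyperbolic geometric mean is computed with the Einstein midpoint as the text feature representation, and this representation is converted back to the Poincaré model. A hyperbolic Mobius Linear layer is then used as the classification layer, together with the objective function, to find the classification surface. Gradients are computed with a Riemannian optimizer, and the feature extraction network and the classification network are updated through the Riemannian optimizer.
In another implementation, the hyperboloid model may be used as the hyperbolic conformal model, as illustrated in FIG. 9. The input text corpus is used to look up hyperbolic input embeddings in the hyperboloid model. The resulting set of feature vectors is converted into Klein-model vectors, the hyperbolic geometric mean is computed with the Einstein midpoint as the text feature representation, and this representation is converted back to the hyperboloid model. A Mobius Linear classification layer from hyperbolic geometry is then used, together with the objective function, to find the classification surface. Gradients are computed with a Riemannian optimizer.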
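For the hyperboloid variant, conversion to and from the Klein model is simple in the unit-curvature case. The sketch below assumes the hyperboloid −x₀² + Σᵢ xᵢ² = −1 with x₀ > 0. It relies on a convenient property: the Lorentz factor of the Klein point x[1:]/x₀ equals x₀. The names are hypothetical, and curvatures other than −1 are not handled.

```python
import numpy as np


def hyperboloid_to_klein(x):
    """(x0, x1..xn) with -x0^2 + sum(xi^2) = -1  ->  Klein point x[1:] / x0."""
    return x[..., 1:] / x[..., :1]


def klein_to_hyperboloid(k):
    """Inverse map: x0 = 1/sqrt(1 - ||k||^2) (the Lorentz factor), x[1:] = x0 * k."""
    x0 = 1.0 / np.sqrt(1.0 - np.sum(k * k, axis=-1, keepdims=True))
    return np.concatenate([x0, x0 * k], axis=-1)


def hyperboloid_mean(x):
    """Hyperboloid -> Klein -> Einstein midpoint -> hyperboloid (curvature -1)."""
    k = hyperboloid_to_klein(x)
    gamma = x[:, :1]  # the Lorentz factor of each Klein point equals its x0
    m = np.sum(gamma * k, axis=0) / np.sum(gamma)
    return klein_to_hyperboloid(m)
```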
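The following describes the data processing method provided in the embodiments of this application from the application side. Refer to FIG. 10, which is a schematic flowchart of a data processing method according to an embodiment of this application. As shown in FIG. 10, the method includes the following steps.
1001. Obtain data to be processed.
In this embodiment of this application, an execution device may obtain the data to be processed. The data to be processed may include at least one of the following: natural language data, knowledge graph data, gene data, or image data. How the data is processed depends on the type of task the trained neural network performs, for example, text classification.
It should be noted that tree-like hierarchical structures are common in data types such as natural language data, gene sequences, and knowledge graphs. For example, natural language data contains many words, and hypernym-hyponym relations exist between words. Natural language data can therefore be understood as data with a tree-like hierarchical structure.
1002. Process the data to be processed by using the trained neural network to output a processing result. The neural network includes a feature extraction network and a classification network. The feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result.
In this embodiment of this application, the trained neural network may include a feature extraction network. The feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space and then pass the obtained feature vector to the classification network.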
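In an optional implementation, the data to be processed includes at least one of the following:
natural language data, knowledge graph data, gene data, or image data.
In an optional implementation, the classification network includes a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function is expressed based on the operation rules of hyperbolic space.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the data to be processed to obtain embedding vectors of the data to be processed expressed in hyperbolic space; and
the second processing layer is configured to compute the geometric center of the embedding vectors in hyperbolic space to obtain the feature vector.
The first processing layer may be an input layer configured to process the data to be processed to obtain the corresponding embedding vectors. After receiving the embedding vectors, the second processing layer may compute their geometric center in hyperbolic space, which serves as the feature vector. Specifically, a hyperbolic geometric-mean method may be used to extract the feature vector.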
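In an optional implementation, data in hyperbolic space may be expressed based on a first conformal model and based on a second conformal model, where the first conformal model represents hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents hyperbolic space mapped to Euclidean space through a second conformal mapping; the embedding vectors are expressed based on the first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on the second conformal model, and to input the vectors expressed based on the second conformal model into the second processing layer;
correspondingly, the second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector; and
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model into the classification network.
In one implementation, the first conformal model and the second conformal model may each be the Poincaré model, the hyperboloid model, or the Klein model. A conformal model describes hyperbolic space and defines a series of vector-algebra transformations and geometric constraints of the hyperbolic gyrovector space, and different conformal models have different properties. Suppose the embedding vectors output by the first processing layer are expressed in the Poincaré model and the second processing layer computes the geometric mean with the Einstein midpoint. Because the Einstein midpoint is defined in the Klein model, the embedding vectors output by the first processing layer must first be converted into the Klein model. The Einstein midpoint of these Klein-model vectors is then computed as their geometric center to obtain the feature vector, which at this point is expressed in the Klein model. The conformal conversion layer then converts this Klein-model feature vector back into a vector expressed in the Poincaré model (the first conformal model) and inputs it into the classification network.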
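In an optional implementation, the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain a to-be-normalized vector expressed in hyperbolic space;
map the to-be-normalized vector into Euclidean space, and normalize the mapped vector to obtain the processing result.
An embodiment of this application provides a data processing method, including: obtaining data to be processed; and processing the data to be processed by using a trained neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result. This application can improve the data processing accuracy of the model on datasets that contain tree-like hierarchical structure and reduce the number of model parameters.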
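The following describes the data processing apparatus provided in the embodiments of this application from the apparatus perspective. Refer to FIG. 11, which is a schematic diagram of a data processing apparatus 1100 according to an embodiment of this application. As shown in FIG. 11, the data processing apparatus 1100 includes:
an obtaining module 1101, configured to obtain data to be processed; and
a processing module 1102, configured to process the data to be processed by using the trained neural network to output a processing result;
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result.
In an optional implementation, the data to be processed includes at least one of the following:
natural language data, knowledge graph data, gene data, or image data.
In an optional implementation, the classification network includes a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function includes the operation rules of the hyperbolic space.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the data to be processed to obtain embedding vectors of the data to be processed expressed in the hyperbolic space; and
the second processing layer is configured to compute the geometric center of the embedding vectors in the hyperbolic space to obtain the feature vector.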
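In an optional implementation, the embedding vectors are expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on a second conformal model, and to input the vectors expressed based on the second conformal model into the second processing layer;
the second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector; and
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model into the classification network, where the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, the embedding vectors are expressed based on a second conformal model;
the second processing layer is configured to compute the geometric center of the embedding vectors expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rules of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
map the to-be-normalized vector into Euclidean space, and normalize the mapped vector to obtain the processing result.
An embodiment of this application provides a data classification apparatus, including: an obtaining module, configured to obtain data to be processed; and a processing module, configured to process the data to be processed by using a trained neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result. This application can improve the data processing accuracy of the model on datasets that contain tree-like hierarchical structure and reduce the number of model parameters.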
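Refer to FIG. 12, which is a schematic diagram of a data processing apparatus 1200 according to an embodiment of this application. As shown in FIG. 12, the data processing apparatus 1200 includes:
an obtaining module 1201, configured to obtain training data and the corresponding class labels;
a processing module 1202, configured to process the training data by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result; and
a model update module 1203, configured to obtain a loss based on the class labels and the processing result, and
obtain, based on the loss, a gradient expressed in hyperbolic space, and update the neural network based on the gradient to obtain an updated neural network.
In an optional implementation, the model update module is configured to:
update the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract the feature vector of the training data expressed in the hyperbolic space.
In an optional implementation, the training data includes at least one of the following:
natural language data, knowledge graph data, gene data, or image data.
In an optional implementation, the classification network includes a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function includes the operation rules of the hyperbolic space.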
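In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the training data to obtain the embedding vectors corresponding to the training data; and
the second processing layer is configured to compute the geometric center of the embedding vectors in the hyperbolic space to obtain the feature vector.
In an optional implementation, the embedding vectors are expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on a second conformal model, and to input the vectors expressed based on the second conformal model into the second processing layer;
correspondingly, the second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector; and
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model into the classification network, where the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, data in the hyperbolic space may be expressed based on a second conformal model, where the second conformal model represents hyperbolic space mapped to Euclidean space through a second conformal mapping; the embedding vectors are expressed based on the second conformal model;
the second processing layer is configured to compute the geometric center of the embedding vectors expressed based on the second conformal model to obtain the feature vector.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
map the to-be-normalized vector into Euclidean space, and normalize the mapped vector to obtain the processing result.
In an optional implementation, the model update module is configured to obtain the loss based on the class labels, the processing result, and a target loss function, where the target loss function is a function expressed in the Euclidean space.
In an optional implementation, the model update module is configured to compute the gradient corresponding to the loss, where the gradient is expressed in Euclidean space; convert the gradient into a gradient expressed in the hyperbolic space; and update the neural network based on the gradient expressed in hyperbolic space.
An embodiment of this application provides a data processing apparatus, including: an obtaining module, configured to obtain training data and the corresponding class labels; a processing module, configured to process the training data by using a neural network to output a processing result, where the neural network includes a feature extraction network and a classification network, the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result; and a model update module, configured to obtain a loss based on the class labels and the processing result and update the neural network based on the loss to obtain an updated neural network. Because of the properties of hyperbolic space, expressing feature vectors in hyperbolic space can enhance the fitting capability of the neural network model and improve data processing accuracy on datasets that contain tree-like hierarchical structure, for example by improving text classification accuracy. In addition, a neural network model built on hyperbolic space can improve fitting capability while substantially reducing the number of model parameters.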
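The following describes an execution device provided in the embodiments of this application. Refer to FIG. 13, which is a schematic structural diagram of an execution device according to an embodiment of this application. The execution device 1300 may be a mobile phone, a tablet, a laptop, a smart wearable device, a server, or the like; this is not limited here. The data processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1300 to implement the data processing functions of that embodiment. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304. There may be one or more processors 1303 in the execution device 1300; one processor is used as an example in FIG. 13. The processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of this application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in another manner.
The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. Part of the memory 1304 may further include a non-volatile random access memory (NVRAM). The memory 1304 stores operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1303 controls the operations of the execution device. In a specific application, the components of the execution device are coupled together through a bus system. In addition to a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller. It may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1303 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads information from the memory 1304 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 1301 may be configured to receive input digit or character information and to generate signal input related to the settings and function control of the execution device. The transmitter 1302 may be configured to output digit or character information through a first interface. The transmitter 1302 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group, and the transmitter 1302 may further include a display device such as a display screen.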
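In this embodiment of this application, in one case, the processor 1303 is configured to perform the data processing method performed by the execution device in the embodiment corresponding to FIG. 10. Specifically, the processor 1303 may perform the following steps:
obtaining data to be processed, and processing the data to be processed by using the trained neural network to output a processing result, where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result.
In an optional implementation, the data to be processed includes at least one of the following:
natural language data, knowledge graph data, gene data, or image data.
In an optional implementation, the classification network includes a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function includes the operation rules of the hyperbolic space.
In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer. The first processing layer is configured to process the data to be processed to obtain embedding vectors of the data to be processed expressed in the hyperbolic space, and the second processing layer is configured to compute the geometric center of the embedding vectors in the hyperbolic space to obtain the feature vector.
The first processing layer may be an input layer configured to process the data to be processed to obtain the corresponding embedding vectors. After receiving the embedding vectors, the second processing layer may compute their geometric center in hyperbolic space, which serves as the feature vector. Specifically, a hyperbolic geometric-mean method may be used to extract the feature vector.
In an optional implementation, the embedding vectors are expressed based on a first conformal model, and the feature extraction network further includes a conformal conversion layer. The conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on a second conformal model, and to input the vectors expressed based on the second conformal model into the second processing layer. The second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector. The conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model into the classification network. The first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, the embedding vectors are expressed based on a second conformal model;
the second processing layer is configured to compute the geometric center of the embedding vectors expressed based on the second conformal model to obtain the feature vector, where the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rules of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
map the to-be-normalized vector into Euclidean space, and normalize the mapped vector to obtain the processing result.
In this embodiment of this application, the output of the classification layer may be converted from hyperbolic space to Euclidean space to be consistent with the subsequent target loss function.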
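An embodiment of this application further provides a training device. Refer to FIG. 14, which is a schematic structural diagram of a training device according to an embodiment of this application. Specifically, the training device 1400 is implemented by one or more servers and may vary considerably in configuration or performance. It may include one or more central processing units (CPUs) 1414 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) that store application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide transient or persistent storage. A program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and to execute, on the training device 1400, the series of instruction operations stored in the storage medium 1430.
The training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
Specifically, the training device may perform the following steps:
obtaining training data and the corresponding class labels;
processing the training data by using a neural network to output a processing result;
where the neural network includes a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain the processing result;
obtaining a loss based on the class labels and the processing result; and
obtaining, based on the loss, a gradient expressed in the hyperbolic space, and updating the neural network based on the gradient to obtain an updated neural network.
In an optional implementation, the feature extraction network in the neural network may be updated based on the gradient to obtain an updated feature extraction network, where the updated feature extraction network is configured to extract the feature vector of the training data expressed in the hyperbolic space.
In an optional implementation, the training data includes at least one of the following:
natural language data, knowledge graph data, gene data, or image data.
In an optional implementation, the classification network includes a plurality of neural units. Each neural unit is configured to process input data based on an activation function, and the activation function includes the operation rules of the hyperbolic space.
In an optional implementation, the operation rules of hyperbolic space include at least one of the following: Möbius matrix multiplication and Möbius addition.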
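In an optional implementation, the feature extraction network includes a first processing layer and a second processing layer;
the first processing layer is configured to process the training data to obtain the embedding vectors corresponding to the training data; and
the second processing layer is configured to compute the geometric center of the embedding vectors in the hyperbolic space to obtain the feature vector.
In an optional implementation, the embedding vectors are expressed based on a first conformal model;
the feature extraction network further includes a conformal conversion layer;
the conformal conversion layer is configured to convert the embedding vectors obtained by the first processing layer into vectors expressed based on a second conformal model, and to input the vectors expressed based on the second conformal model into the second processing layer;
correspondingly, the second processing layer is configured to compute the geometric center of the vectors expressed based on the second conformal model to obtain the feature vector; and
the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and to input the vector expressed based on the first conformal model into the classification network, where the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
In an optional implementation, data in the hyperbolic space may be expressed based on a second conformal model, where the second conformal model represents hyperbolic space mapped to Euclidean space through a second conformal mapping; the embedding vectors are expressed based on the second conformal model;
the second processing layer is configured to compute the geometric center of the embedding vectors expressed based on the second conformal model to obtain the feature vector.
In an optional implementation, the classification network is configured to process the feature vector based on the operation rules of hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
map the to-be-normalized vector into Euclidean space, and normalize the mapped vector to obtain the processing result.
In an optional implementation, obtaining the loss based on the class labels and the processing result includes:
obtaining the loss based on the class labels, the processing result, and a target loss function, where the target loss function is a function expressed in the Euclidean space.
In an optional implementation, updating the neural network based on the loss includes:
computing the gradient corresponding to the loss, where the gradient is expressed in Euclidean space;
converting the gradient into a gradient expressed in the hyperbolic space; and
updating the neural network based on the gradient expressed in hyperbolic space.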
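An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is enabled to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
An embodiment of this application further provides a computer-readable storage medium that stores a program for signal processing. When the program runs on a computer, the computer is enabled to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the data processing method described in the foregoing embodiments, or so that a chip in the training device performs the data processing method described in the foregoing embodiments. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. The storage unit may alternatively be a storage unit in the wireless access device that is located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, refer to FIG. 15, which is a schematic structural diagram of a chip according to an embodiment of this application. The chip may be a neural network processing unit (NPU) 1500. The NPU 1500 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is an operation circuit 1503. A controller 1504 controls the operation circuit 1503 to fetch matrix data from memory and perform multiplication.
In some implementations, the operation circuit 1503 internally includes a plurality of processing engines (PEs). In some implementations, the operation circuit 1503 is a two-dimensional systolic array. The operation circuit 1503 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1502 and caches it on each PE in the operation circuit. The operation circuit then fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in an accumulator 1508.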
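The accumulate-partial-results dataflow described above can be mimicked in software by splitting the shared (inner) dimension into tiles. This sketch is only a functional analogue under that assumption. It does not model the systolic array, PE caching, or the memory hierarchy, and the `tile` parameter and function name are hypothetical.

```python
import numpy as np


def tiled_matmul(A, B, tile=2):
    """Compute C = A @ B by streaming slices of the inner dimension.

    For each slice, the matching block of the weight matrix B stands in for
    the data cached on the PEs, the matching columns of A are streamed in, and
    the partial product is added into an accumulator until C is complete.
    """
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    acc = np.zeros((m, n), dtype=np.result_type(A, B))  # plays the accumulator's role
    for start in range(0, k, tile):
        b_tile = B[start:start + tile, :]  # weight slice "cached" for this pass
        a_tile = A[:, start:start + tile]  # matching slice of the input matrix
        acc += a_tile @ b_tile             # accumulate the partial result
    return acc
```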
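The unified memory 1506 stores input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
The bus interface unit (BIU) 1510 handles interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509.
The instruction fetch buffer 1509 also uses the bus interface unit 1510 to obtain instructions from external memory, and the direct memory access controller 1505 uses it to obtain the original data of input matrix A or weight matrix B from external memory.
The DMAC is mainly used to transfer input data from the external DDR memory to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes a plurality of arithmetic processing units. When necessary, it further processes the output of the operation circuit 1503, for example through vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for the non-convolutional and non-fully-connected layers of a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature maps.
In some implementations, the vector calculation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or non-linear function to the output of the operation circuit 1503, for example to perform linear interpolation on feature maps extracted by a convolutional layer, or apply such a function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as the activation input to the operation circuit 1503, for example for use in a subsequent layer of the neural network.
The instruction fetch buffer 1509 connected to the controller 1504 stores the instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.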
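It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take various forms, for example an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software program implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be implemented in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the embodiments may be implemented in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.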

Claims (36)

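  1. A data processing method, wherein the method comprises:
    obtaining data to be processed; and
    processing the data to be processed by using a trained neural network to output a processing result;
    wherein the neural network comprises a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  2. The method according to claim 1, wherein the data to be processed comprises at least one of the following:
    natural language data, knowledge graph data, gene data, or image data.
  3. The method according to claim 1 or 2, wherein the classification network comprises a plurality of neural units, each neural unit is configured to process input data based on an activation function, and the activation function comprises the operation rule of the hyperbolic space.
  4. The method according to any one of claims 1 to 3, wherein the feature extraction network comprises a first processing layer and a second processing layer;
    the first processing layer is configured to process the data to be processed to obtain an embedding vector of the data to be processed expressed in the hyperbolic space; and
    the second processing layer is configured to compute a geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
  5. The method according to claim 4, wherein the embedding vector is expressed based on a first conformal model;
    the feature extraction network further comprises a conformal conversion layer;
    the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model into the second processing layer;
    the second processing layer is configured to compute a geometric center of the vector expressed based on the second conformal model to obtain the feature vector; and
    the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model into the classification network, wherein the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  6. The method according to claim 4, wherein the embedding vector is expressed based on a second conformal model; and
    the second processing layer is configured to compute a geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, wherein the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  7. The method according to any one of claims 1 to 6, wherein the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
    map the to-be-normalized vector into Euclidean space, and normalize the to-be-normalized vector mapped into Euclidean space to obtain the processing result.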
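  8. A data processing method, wherein the method comprises:
    obtaining training data and corresponding class labels;
    processing the training data by using a neural network to output a processing result;
    wherein the neural network comprises a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result;
    obtaining a loss based on the class labels and the processing result; and
    obtaining, based on the loss, a gradient expressed in the hyperbolic space, and updating the neural network based on the gradient to obtain an updated neural network.
  9. The method according to claim 8, wherein the updating the neural network based on the gradient to obtain an updated neural network comprises:
    updating the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, wherein the updated feature extraction network is configured to extract a feature vector of the training data expressed in the hyperbolic space.
  10. The method according to claim 8 or 9, wherein the training data comprises at least one of the following:
    natural language data, knowledge graph data, gene data, or image data.
  11. The method according to any one of claims 8 to 10, wherein the classification network comprises a plurality of neural units, each neural unit is configured to process input data based on an activation function, and the activation function comprises the operation rule of the hyperbolic space.
  12. The method according to any one of claims 8 to 11, wherein the feature extraction network comprises a first processing layer and a second processing layer;
    the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data; and
    the second processing layer is configured to compute a geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
  13. The method according to claim 12, wherein the embedding vector is expressed based on a first conformal model;
    the feature extraction network further comprises a conformal conversion layer;
    the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model into the second processing layer;
    correspondingly, the second processing layer is configured to compute a geometric center of the vector expressed based on the second conformal model to obtain the feature vector; and
    the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model into the classification network, wherein the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  14. The method according to claim 12, wherein data in the hyperbolic space can be expressed based on a second conformal model, the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping, and the embedding vector is expressed based on the second conformal model; and
    the second processing layer is configured to compute a geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  15. The method according to any one of claims 8 to 14, wherein the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
    map the to-be-normalized vector into Euclidean space, and normalize the to-be-normalized vector mapped into Euclidean space to obtain the processing result.
  16. The method according to claim 15, wherein the obtaining a loss based on the class labels and the processing result comprises:
    obtaining the loss based on the class labels, the processing result, and a target loss function, wherein the target loss function is a function expressed in the Euclidean space.
  17. The method according to claim 15 or 16, wherein the updating the neural network based on the loss comprises:
    computing a gradient corresponding to the loss, wherein the gradient is expressed in Euclidean space;
    converting the gradient into a gradient expressed in the hyperbolic space; and
    updating the neural network based on the gradient expressed in the hyperbolic space.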
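  18. A data classification apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain data to be processed; and
    a processing module, configured to process the data to be processed by using a trained neural network to output a processing result;
    wherein the neural network comprises a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the data to be processed expressed in hyperbolic space, and the classification network is configured to process the feature vector based on an operation rule of the hyperbolic space to obtain the processing result.
  19. The apparatus according to claim 18, wherein the data to be processed comprises at least one of the following:
    natural language data, knowledge graph data, gene data, or image data.
  20. The apparatus according to claim 18 or 19, wherein the classification network comprises a plurality of neural units, each neural unit is configured to process input data based on an activation function, and the activation function comprises the operation rule of the hyperbolic space.
  21. The apparatus according to any one of claims 18 to 20, wherein the feature extraction network comprises a first processing layer and a second processing layer;
    the first processing layer is configured to process the data to be processed to obtain an embedding vector of the data to be processed expressed in the hyperbolic space; and
    the second processing layer is configured to compute a geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
  22. The apparatus according to claim 21, wherein the embedding vector is expressed based on a first conformal model;
    the feature extraction network further comprises a conformal conversion layer;
    the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model into the second processing layer;
    the second processing layer is configured to compute a geometric center of the vector expressed based on the second conformal model to obtain the feature vector; and
    the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model into the classification network, wherein the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  23. The apparatus according to claim 21, wherein the embedding vector is expressed based on a second conformal model; and
    the second processing layer is configured to compute a geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector, wherein the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  24. The apparatus according to any one of claims 18 to 23, wherein the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
    map the to-be-normalized vector into Euclidean space, and normalize the to-be-normalized vector mapped into Euclidean space to obtain the processing result.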
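  25. A data processing apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain training data and corresponding class labels;
    a processing module, configured to process the training data by using a neural network to output a processing result;
    wherein the neural network comprises a feature extraction network and a classification network; the feature extraction network is configured to extract a feature vector of the training data, and the classification network is configured to process the feature vector based on an operation rule of hyperbolic space to obtain the processing result;
    obtain a loss based on the class labels and the processing result; and
    a model update module, configured to obtain, based on the loss, a gradient expressed in the hyperbolic space, and update the neural network based on the gradient to obtain an updated neural network.
  26. The apparatus according to claim 25, wherein the model update module is configured to:
    update the feature extraction network in the neural network based on the gradient to obtain an updated feature extraction network, wherein the updated feature extraction network is configured to extract a feature vector of the training data expressed in the hyperbolic space.
  27. The apparatus according to claim 25 or 26, wherein the training data comprises at least one of the following:
    natural language data, knowledge graph data, gene data, or image data.
  28. The apparatus according to any one of claims 25 to 27, wherein the classification network comprises a plurality of neural units, each neural unit is configured to process input data based on an activation function, and the activation function comprises the operation rule of the hyperbolic space.
  29. The apparatus according to any one of claims 25 to 28, wherein the feature extraction network comprises a first processing layer and a second processing layer;
    the first processing layer is configured to process the training data to obtain an embedding vector corresponding to the training data; and
    the second processing layer is configured to compute a geometric center of the embedding vector in the hyperbolic space to obtain the feature vector.
  30. The apparatus according to claim 29, wherein the embedding vector is expressed based on a first conformal model;
    the feature extraction network further comprises a conformal conversion layer;
    the conformal conversion layer is configured to convert the embedding vector obtained by the first processing layer into a vector expressed based on a second conformal model, and input the vector expressed based on the second conformal model into the second processing layer;
    correspondingly, the second processing layer is configured to compute a geometric center of the vector expressed based on the second conformal model to obtain the feature vector; and
    the conformal conversion layer is further configured to convert the feature vector obtained by the second processing layer into a vector expressed based on the first conformal model, and input the vector expressed based on the first conformal model into the classification network, wherein the first conformal model represents the hyperbolic space mapped to Euclidean space through a first conformal mapping, and the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping.
  31. The apparatus according to claim 29, wherein data in the hyperbolic space can be expressed based on a second conformal model, the second conformal model represents the hyperbolic space mapped to Euclidean space through a second conformal mapping, and the embedding vector is expressed based on the second conformal model; and
    the second processing layer is configured to compute a geometric center of the embedding vector expressed based on the second conformal model to obtain the feature vector.
  32. The apparatus according to any one of claims 25 to 31, wherein the classification network is configured to process the feature vector based on the operation rule of the hyperbolic space to obtain a to-be-normalized vector expressed in the hyperbolic space;
    map the to-be-normalized vector into Euclidean space, and normalize the to-be-normalized vector mapped into Euclidean space to obtain the processing result.
  33. The apparatus according to claim 32, wherein the loss obtaining module is configured to obtain the loss based on the class labels, the processing result, and a target loss function, wherein the target loss function is a function expressed in the Euclidean space.
  34. The apparatus according to claim 32 or 33, wherein the model update module is configured to compute a gradient corresponding to the loss, wherein the gradient is expressed in Euclidean space; convert the gradient into a gradient expressed in the hyperbolic space; and update the neural network based on the gradient expressed in the hyperbolic space.
  35. A data processing apparatus, wherein the apparatus comprises a memory and a processor; the memory stores code, and the processor is configured to obtain the code and perform the method according to any one of claims 1 to 17.
  36. A computer storage medium, wherein the computer storage medium stores one or more instructions, and when the instructions are executed by one or more computers, the one or more computers are enabled to implement the method according to any one of claims 1 to 17.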
PCT/CN2021/101225 2020-06-28 2021-06-21 Data processing method and apparatus WO2022001724A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21832153.7A EP4152212A4 (en) 2020-06-28 2021-06-21 DATA PROCESSING METHOD AND DEVICE
US18/084,267 US20230117973A1 (en) 2020-06-28 2022-12-19 Data processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010596738.4 2020-06-28
CN202010596738.4A CN111898636B (zh) 2020-06-28 2020-06-28 Data processing method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/084,267 Continuation US20230117973A1 (en) 2020-06-28 2022-12-19 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2022001724A1 true WO2022001724A1 (zh) 2022-01-06

Family

ID=73206461

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101225 WO2022001724A1 (zh) 2020-06-28 2021-06-21 Data processing method and apparatus

Country Status (4)

Country Link
US (1) US20230117973A1 (zh)
EP (1) EP4152212A4 (zh)
CN (1) CN111898636B (zh)
WO (1) WO2022001724A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
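CN111898636B (zh) * 2020-06-28 2024-05-14 华为技术有限公司 Data processing method and apparatus
CN112287126B (zh) * 2020-12-24 2021-03-19 中国人民解放军国防科技大学 Entity alignment method and device suitable for multimodal knowledge graphs
CN113486189A (zh) * 2021-06-08 2021-10-08 广州数说故事信息科技有限公司 Open knowledge graph mining method and system
CN114723073B (zh) * 2022-06-07 2023-09-05 阿里健康科技(杭州)有限公司 Language model pre-training and product search methods and apparatuses, and computer device
CN117609902B (zh) * 2024-01-18 2024-04-05 北京知呱呱科技有限公司 Patent IPC classification method and system based on image-text multimodal hyperbolic embedding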

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544310A (zh) * 2019-08-23 2019-12-06 太原师范学院 Feature analysis method for three-dimensional point clouds under hyperbolic conformal mapping
CN111898636A (zh) * 2020-06-28 2020-11-06 华为技术有限公司 Data processing method and apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst: "Hyperbolic Attention Networks", arXiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, 24 May 2018 (2018-05-24), XP080882099 *
See also references of EP4152212A4
Yi Tay, Luu Anh Tuan, Siu Cheung Hui: "Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering", arXiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, 25 July 2017 (2017-07-25), XP081282295, DOI: 10.1145/3159652.3159664 *

Also Published As

Publication number Publication date
EP4152212A1 (en) 2023-03-22
CN111898636A (zh) 2020-11-06
EP4152212A4 (en) 2023-11-29
US20230117973A1 (en) 2023-04-20
CN111898636B (zh) 2024-05-14

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21832153

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021832153

Country of ref document: EP

Effective date: 20221216

NENP Non-entry into the national phase

Ref country code: DE