WO2021174774A1 - Neural network relation extraction method, computer device, and readable storage medium - Google Patents

Neural network relation extraction method, computer device, and readable storage medium

Info

Publication number
WO2021174774A1
WO2021174774A1 · PCT/CN2020/111513 · CN2020111513W
Authority
WO
WIPO (PCT)
Prior art keywords
channel
neural network
sentence
network model
extraction
Prior art date
Application number
PCT/CN2020/111513
Other languages
English (en)
French (fr)
Inventor
回艳菲
王健宗
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174774A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The embodiments of this application relate to the field of artificial intelligence technology, and in particular to a neural network relation extraction method, a computer device, and a readable storage medium.
  • Relation extraction is a very important research topic in natural language processing. As an important subtask, relation extraction aims to extract a predefined semantic relationship between two entities from text. The extracted relations and entities can be organized into triples, stored in a graph database, and applied to a medical knowledge graph based on the relevant knowledge-graph technology. Building a high-quality medical knowledge graph cannot be done without high-quality relation extraction, so relation extraction is particularly important for medical knowledge graphs.
  • The inventors found that traditional relation extraction tasks generally use a single model, such as a convolutional neural network (CNN) or a recurrent neural network (RNN), to produce a vector representation of the sentence, but the relation extraction quality of such a single model is not high.
  • The purpose of the embodiments of the present application is to provide a neural network relation extraction method capable of performing relation extraction with high quality.
  • The embodiments of the present application provide a neural network relation extraction method, the method including: constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; acquiring a sentence to be processed; performing dependency syntactic analysis on the sentence to generate a dependency parse tree, and finding from the dependency parse tree the two shortest dependency paths between the target entities, the two shortest paths representing two clauses of the sentence; inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information; inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information; and weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into a softmax layer to complete the classification of the relation categories between the target entities.
  • The embodiments of the present application also provide a neural network relation extraction system, including: an establishment module for constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; an acquisition module for acquiring a sentence to be processed; a shortest path generation module for performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence; a first extraction module for inputting the two clauses into the first channel and performing feature extraction through a convolutional neural network model to obtain first extraction information; a second extraction module for inputting the sentence into the second channel and performing feature extraction through a long short-term memory network model to obtain second extraction information; and a classification module for weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
  • The embodiments of the present application also provide a computer device that includes a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions, when executed by the processor, implementing the following steps: constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; acquiring a sentence to be processed; performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence; inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information; inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information; and weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
  • The embodiments of the present application also provide a computer-readable storage medium having computer-readable instructions stored therein, the computer-readable instructions being executable by at least one processor to cause the at least one processor to perform the following steps: constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; acquiring a sentence to be processed; performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence; inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information; inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information; and weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
  • The dual-channel neural network relation extraction model proposed in the embodiments of this application fuses the key information of the shortest dependency paths while using the original sentence to retain the information that the dependency paths cannot capture; it extracts local information through the CNN and gathers the most useful information with a pooling layer, extracting salient local information and retaining the key information for classifying relations. The LSTM is used to extract information from the entire sentence, so excellent representations can be extracted even for long sentences. The information extracted by the two models is weighted and summarized through the attention mechanism to obtain the final representation of the current sentence, which contains the information that contributes most to relation classification; finally, classification is performed through the softmax layer, achieving the extraction of the predetermined relations.
  • FIG. 1 is a schematic flowchart of the neural network relation extraction method according to the first embodiment of the present application;
  • FIG. 2 is a schematic diagram of the dual-channel neural network model in the first embodiment of the present application;
  • FIG. 3 is a schematic diagram of feature extraction performed by the convolutional neural network model in the first embodiment of the present application;
  • FIG. 4 is a program module diagram of the neural network relation extraction system according to the second embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of a computer device according to the third embodiment of the present application.
  • This application can be applied to smart government affairs/smart city management/smart communities/smart security/smart logistics/smart healthcare/smart education/smart environmental protection/smart transportation scenarios to promote the construction of smart cities.
  • The first embodiment of this application relates to a neural network relation extraction method. The core of this embodiment is to propose a dual-channel neural network relation extraction model: a convolutional neural network (Convolutional Neural Networks, CNN) model is used to extract the key information of the shortest dependency paths, and a long short-term memory (Long Short-Term Memory, LSTM) model is used to extract information from the entire sentence, so that excellent representations can be extracted even for long sentences; a pooling layer gathers the most useful information, extracting salient local information and retaining the key information for classifying relations.
  • The features extracted by the two models are weighted and summarized through the attention mechanism to obtain the final vector representation of the current sentence, which is finally classified through the softmax layer, achieving the extraction of the predetermined relations.
  • The implementation details of the neural network relation extraction method of this embodiment are described below. The following content is provided only for ease of understanding and is not required for implementing this solution. A schematic flowchart of the method is shown in FIG. 1; the method is applied to a computer device. Depending on requirements, the execution order of the steps in FIG. 1 may be changed, and some steps may be omitted.
  • Step S101: Construct a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel.
  • Relation extraction aims to extract a predefined semantic relationship between two entities from text. The extracted relations and entities can be organized into triples, stored in a graph database, and applied to a medical knowledge graph based on the relevant knowledge-graph technology. Building a high-quality medical knowledge graph cannot be done without high-quality relation extraction, so relation extraction is particularly important for medical knowledge graphs.
  • In the prior art, the convolutional neural network (CNN) and the recurrent neural network (RNN) are the two main architecture types of deep neural networks (DNN). In traditional relation extraction tasks, a single CNN or RNN model is generally used to vectorize the sentence, but a single model may fail to capture the key points, especially in the medical field, where sentence lengths vary and no single model is adaptable to all of them; moreover, not all words in a sentence contribute to the entity relation, and some sentences are too long, so the relation extraction quality of a single model is not high. Therefore, in this embodiment, in order to improve the quality of relation extraction, relation extraction is performed by establishing a dual-channel neural network model.
  • In this embodiment, after the dual-channel neural network model is constructed, the constructed model also needs to be trained. Specifically, training the constructed dual-channel neural network model includes: obtaining a training set; inputting the training set into the dual-channel neural network model to output the predicted relation categories of the training set; computing the cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set; and minimizing the cross-entropy loss function through an optimization algorithm to train the dual-channel neural network model.
  • The training set is a set of training data whose actual relation categories are known.
  • The loss function is:

$$J(\theta) = -\sum_{s \in S} \sum_{i=1}^{t} r_i \log p(r_i \mid s; \theta)$$

  • where $r_i$ indicates whether the relation between the entities belongs to the $i$-th category ($r_i = 1$ when the entity relation is the $i$-th category, and 0 otherwise), $s$ is a single sentence, $S$ is the set of sentences, and $t$ is the number of entity-relation categories. For example, if there are 10 possible relations between entities and the relation between entity 1 and entity 2 is the third category, then $r_3 = 1$.
  • Training the dual-channel neural network model by minimizing the cross-entropy of the loss function is in fact minimizing the cross-entropy of the loss function between the predicted relation categories and the actual relation categories.
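  • For concreteness, a minimal training sketch of the procedure just described is given below, assuming a PyTorch implementation; the model class, data loader, and hyperparameters are illustrative assumptions and are not part of the patent text.

```python
# A minimal training sketch, assuming a hypothetical PyTorch model that
# takes the two clauses (first channel) and the full sentence (second
# channel) and returns relation logits.
import torch
import torch.nn as nn

def train(model, loader, num_epochs=10, lr=1e-3):
    # CrossEntropyLoss is the cross-entropy described above: it compares
    # predicted relation categories with the actual categories.
    criterion = nn.CrossEntropyLoss()
    # Adam stands in for "an optimization algorithm" that minimizes the loss.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        for clauses, sentence, label in loader:   # label: actual relation category
            logits = model(clauses, sentence)     # predicted relation scores
            loss = criterion(logits, label)       # cross-entropy J(theta)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                      # minimize the cross-entropy
```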
  • Step S102: Acquire the sentence to be processed.
  • In this embodiment, the sentence to be processed refers to a sentence on which relation extraction needs to be performed.
  • Step S103: Perform dependency syntactic analysis on the sentence to obtain two clauses of the sentence.
  • Specifically, dependency syntactic analysis is performed on the sentence through a syntactic parser to generate a dependency parse tree; the two shortest dependency paths between the target entities are found from the dependency parse tree, and the two shortest paths represent the two clauses of the sentence.
  • Among current open-source Chinese syntactic parsers, the Stanford parser and the Berkeley parser are representative: the Stanford parser is based on a factored model, while the Berkeley parser is based on an unlexicalized analysis model. In this embodiment, dependency syntactic analysis is performed on the sentence through the Stanford parser. Of course, in other embodiments, other syntactic parsers may also be used for dependency syntactic analysis, which is not limited in this embodiment.
  • In this embodiment, dependency syntactic analysis of the sentence to be processed yields the two shortest dependency paths between the target entities. Since the shortest dependency path screens out unimportant modifier chunks and contains the backbone that expresses the relation pattern, the two shortest paths are in effect two clauses of the sentence. Moreover, obtaining the shortest dependency paths of the sentence captures the information that contributes most to relation classification and extracts the salient local information in the sentence.
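  • The path-finding step can be illustrated as follows. The patent uses the Stanford parser; spaCy and networkx are substituted here purely for brevity, and the example sentence and entity words are hypothetical. The same primitive would be applied to obtain each of the two shortest paths (clauses).

```python
# Illustrative sketch: find the shortest dependency path between two target
# entities over a dependency parse tree.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(sentence, entity1, entity2):
    doc = nlp(sentence)
    # Treat the dependency parse tree as an undirected graph over token indices.
    edges = [(token.i, child.i) for token in doc for child in token.children]
    graph = nx.Graph(edges)
    e1 = next(tok.i for tok in doc if tok.text == entity1)
    e2 = next(tok.i for tok in doc if tok.text == entity2)
    # Unimportant modifier chunks fall off the path; the relational backbone remains.
    return [doc[i].text for i in nx.shortest_path(graph, source=e1, target=e2)]

print(shortest_dependency_path(
    "Aspirin significantly reduces the risk of stroke.", "Aspirin", "stroke"))
```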
  • Step S104: Input the two clauses into the first channel, and perform feature extraction through the CNN model to obtain the first extraction information.
  • In this embodiment, FIG. 2 is a schematic diagram of the dual-channel model in a preferred embodiment of this application. As shown in FIG. 2, in the first channel (the left channel), the two clauses (the two shortest dependency paths) are respectively input into the CNN model for feature extraction to obtain the first extraction information. Specifically, this includes the following.
  • FIG. 3 is a schematic diagram of feature extraction performed by the convolutional neural network model in the first embodiment of the present application. As shown in FIG. 3, the two clauses are represented as vectors, and the vector representations of the two clauses are processed through a convolutional layer, a pooling layer, and a nonlinear layer. Specifically, a convolution operation is performed in the convolutional layer, and max pooling is used in the pooling layer, where the value obtained by the max-pooling operation on a given row vector is the largest value in that row. Further, as shown in FIG. 2, the processed vector representations of the two clauses are fused through a hidden layer to obtain the first extraction information $s_1$.
  • In this embodiment, the sentence to be processed is subjected to dependency parsing through the Stanford parser to obtain the two shortest dependency paths (which are also the two clauses of the sentence to be processed), and the two clauses are then represented as vectors. Specifically, the vector of a word $i$ on the two shortest dependency paths is defined as $\mathbf{x}_i = [\mathbf{w}_i; \mathbf{p}_i^1; \mathbf{p}_i^2]$, where $\mathbf{w}_i$ is the word embedding — using pre-trained word vectors, the word vector of the corresponding word can be looked up directly in an open-source word-vector file — and $\mathbf{p}_i^j$ is the position embedding. The position embedding refers to the relative distance between the current word on the shortest dependency path and each of the two entity words of the clause, where $d_i^j$ is the relative distance between the current word and the $j$-th entity.
  • Through this vector representation, a clause $x$ is transformed into vector form. Given a clause $x$ with $n$ words, the clause is represented as a vector by formula one:

$$Z_n = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_n]$$

  • where $n$ represents the number of words contained in each clause, $\mathbf{x}_i$ represents the vector of the $i$-th word in the clause $x$, and $Z_n$ represents the vector representation of the clause.
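  • A possible realization of this per-word input vector (a pre-trained word embedding concatenated with two position embeddings) is sketched below; the embedding dimensions and the distance-clipping range are illustrative assumptions.

```python
# Sketch of building Z_n: each word vector x_i concatenates a word embedding
# with two position embeddings (relative distances to the two entities).
import torch
import torch.nn as nn

class ClauseEmbedder(nn.Module):
    def __init__(self, vocab_size, word_dim=100, pos_dim=10, max_dist=30):
        super().__init__()
        # Loadable from an open-source word-vector file in practice.
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.max_dist = max_dist

    def forward(self, token_ids, e1_idx, e2_idx):  # token_ids: (n,) word indices of one clause
        positions = torch.arange(token_ids.size(0))
        d1 = (positions - e1_idx).clamp(-self.max_dist, self.max_dist) + self.max_dist
        d2 = (positions - e2_idx).clamp(-self.max_dist, self.max_dist) + self.max_dist
        # Z_n: one row per word, [word vector; position vector 1; position vector 2]
        return torch.cat([self.word_emb(token_ids),
                          self.pos_emb(d1), self.pos_emb(d2)], dim=-1)
```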
  • Then, the word vectors and position vectors are used as the input of the convolutional neural network, and the processed vector representation of the clause is obtained through the convolutional layer, the pooling layer, and the nonlinear layer. Specifically, the vector representations of the two clauses are processed through the convolutional layer, pooling layer, and nonlinear layer according to formula two:

$$[r_x^i]_j = \max\left[f(W_1 Z_n + b_1)\right]_j$$

  • where $[r_x^i]_j$ denotes the $j$-th component of the vector $r_x^i$, $r_x^i$ refers to the value obtained by the max-pooling operation on a row vector, $W_1$ is the weight matrix of the convolutional layer, $f$ is the nonlinear tanh transformation, $Z_n$ represents the vector representation of the clause, and $b_1$ is a bias, which is a constant. In this embodiment, the convolutional layer uses matrixed vectors to perform the convolution operation on $Z_n$ within windows of size $k$.
  • Further, as shown in FIG. 2, after the two clauses have been processed by the convolutional, pooling, and nonlinear layers, the vector representations of the two clauses are fused through a hidden layer to obtain the first extraction information, which is also the final sentence representation $s_1$ of the two clauses in the CNN model. In other words, the sentence representation $s_1$ is the final feature vector capable of representing the sentence to be processed; the information in the sentence to be processed is contained in this feature vector.
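  • Under the description above, the first channel can be sketched as follows: convolution over the clause matrix $Z_n$, max pooling, tanh nonlinearity, and hidden-layer fusion of the two clause vectors into $s_1$. The window size and dimensions are assumptions, not values fixed by the patent.

```python
# Sketch of the first (CNN) channel of the dual-channel model.
import torch
import torch.nn as nn

class CNNChannel(nn.Module):
    def __init__(self, in_dim=120, n_filters=128, k=3, out_dim=128):
        super().__init__()
        # W_1 and b_1: weight matrix and bias of the convolutional layer,
        # sliding a window of size k over the clause.
        self.conv = nn.Conv1d(in_dim, n_filters, kernel_size=k, padding=k // 2)
        self.hidden = nn.Linear(2 * n_filters, out_dim)  # hidden layer fusing the clauses

    def encode(self, z):                               # z: (n_words, in_dim), i.e. Z_n
        h = torch.tanh(self.conv(z.t().unsqueeze(0)))  # f = tanh after convolution
        return h.max(dim=2).values.squeeze(0)          # max pooling over word positions

    def forward(self, z1, z2):
        # Fuse the two pooled clause vectors to obtain the first extraction
        # information s_1.
        return self.hidden(torch.cat([self.encode(z1), self.encode(z2)]))
```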
  • Step S105: Input the sentence into the second channel, and perform feature extraction through the LSTM model to obtain the second extraction information.
  • Specifically, a word segmentation operation is performed on the sentence to obtain L tokens; word-vector mapping is performed on the L tokens respectively, each token being mapped to a d-dimensional word vector, so as to obtain an L×d word-vector matrix; and the d-dimensional word vectors of the L tokens are input in sequence into the long short-term memory network model for feature extraction to obtain the second extraction information.
  • In this embodiment, as shown in the right part of FIG. 2, the vector representation of the complete sentence to be processed is input into the LSTM model for feature extraction to obtain the second extraction information, that is, the final sentence representation $s_2$ of the sentence to be processed in the LSTM model. The schematic of this feature extraction is analogous to that in FIG. 3: the sentence to be processed is represented as vectors, which are processed through the convolutional layer, pooling layer, and nonlinear layer; the specific calculation method is the same as in step S104 and is not repeated here.
  • In this embodiment, the vector representation of the complete sentence to be processed is the word-embedding representation of the complete sentence to be processed.
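  • A minimal sketch of the second channel follows: the L tokens of the full sentence are mapped to d-dimensional word vectors (an L×d matrix) and fed to an LSTM in sequence, with the final hidden state taken as $s_2$. Sizes are illustrative assumptions.

```python
# Sketch of the second (LSTM) channel of the dual-channel model.
import torch
import torch.nn as nn

class LSTMChannel(nn.Module):
    def __init__(self, vocab_size, d=100, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d)    # word-vector mapping
        self.lstm = nn.LSTM(d, hidden_dim, batch_first=True)

    def forward(self, token_ids):                      # token_ids: (L,) for one sentence
        x = self.word_emb(token_ids).unsqueeze(0)      # (1, L, d) word-vector matrix
        _, (h_n, _) = self.lstm(x)                     # tokens consumed in sequence
        return h_n.squeeze(0).squeeze(0)               # second extraction information s_2
```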
  • Step S106: Weight and summarize the first extraction information and the second extraction information through the attention mechanism to obtain the final extracted features of the sentence, and input the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
  • In the prior art, the CNN model has great advantages for processing short sentences, while the LSTM model learns long-distance information more easily and performs superbly at extracting features from long sentences.
  • In this embodiment, in the first channel, the two clauses of the sentence to be processed are input into the CNN model for feature extraction to obtain the final sentence representation $s_1$ of the two clauses in the CNN model; in the second channel, the sentence is input into the LSTM model for feature extraction to obtain the final sentence representation $s_2$ of the sentence to be processed in the LSTM model.
  • Then, in order to handle both long and short sentences, and considering that the shortest dependency path sometimes misses information, the attention mechanism is used to weight and summarize the first extraction information and the second extraction information to obtain the final extracted feature of the sentence, that is, the final vector representation $s$ of the sentence.
  • Specifically, the first extraction information and the second extraction information are weighted and summarized through formula three and formula four. Formula three is:

$$s = \sum_i \alpha_i s_i$$

  • where $\alpha_i$ is the weight of each sentence representation in the final vector representation, and $s_i$ is the vector representation of a sentence after feature extraction, such as $s_1$ and $s_2$ above. Formula four is:

$$\alpha_i = \frac{\exp(t_i)}{\sum_k \exp(t_k)}$$

  • where $t_i$ is a query-based score obtained by matching the sentence representation $s_i$ against the predicted relation $r$, through formula five:

$$t_i = s_i A r$$

  • where $s_i$ is the vector representation of a sentence after feature extraction (for example, the first extraction information $s_1$ or the second extraction information $s_2$), $A$ is a weighted diagonal matrix, and $r$ is the query vector associated with the relation $r$, i.e., the vector representation of the relation $r$.
  • Further, the conditional probability is defined through the softmax layer, where the conditional probability is computed as:

$$p(r \mid s, \theta) = \frac{\exp(o_r)}{\sum_{k=1}^{n_r} \exp(o_k)}$$

  • where $n_r$ represents the number of predefined relations. After classification through the softmax layer, the probability values of all relation categories are also obtained as output through formula six:

$$o = M s + d$$

  • where $o$ contains the probability scores of all relation categories, $M$ is the relation matrix representation, and $d$ is a bias vector.
  • The output $o$ over all relation categories is essentially a one-dimensional column vector; each number in the column vector represents the probability value of one relation category, indicating how likely it is that the target entities have that relation category.
  • In this embodiment, the attention mechanism is used to fuse the representation output by the CNN model with the representation output by the LSTM model, extracting an excellent representation of the current sentence, so that the finally trained relation extraction model is suitable for both long and short sentences.
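  • A sketch of the attention fusion (formulas three to five) followed by the softmax classifier ($o = Ms + d$) is given below. The diagonal matrix $A$, the relation query vectors, and all dimensions are assumptions consistent with the description, not a definitive implementation; during training, the query index would typically be the known relation category.

```python
# Sketch of attention-weighted fusion of s_1 and s_2 and softmax classification.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=128, n_relations=10):
        super().__init__()
        self.A = nn.Parameter(torch.ones(dim))                # diagonal of the weighted matrix A
        self.r = nn.Parameter(torch.randn(n_relations, dim))  # query vector per relation
        self.M = nn.Linear(dim, n_relations)                  # o = M s + d (bias is d)

    def forward(self, s1, s2, rel_idx):
        reps = torch.stack([s1, s2])                 # s_i from the two channels
        t = reps @ (self.A * self.r[rel_idx])        # formula five: t_i = s_i A r
        alpha = torch.softmax(t, dim=0)              # formula four: attention weights
        s = (alpha.unsqueeze(1) * reps).sum(dim=0)   # formula three: s = sum_i alpha_i s_i
        o = self.M(s)                                # scores of all relation categories
        return torch.softmax(o, dim=0)               # conditional probability p(r | s)
```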
  • The dual-channel neural network model proposed in the embodiments of this application fuses the key information of the shortest dependency paths while using the original sentence to retain the information that cannot be captured by the dependency paths; the CNN extracts local information and the pooling layer gathers the most useful information, extracting salient local information and retaining the key information for classifying relations. The LSTM extracts information from the entire sentence and can extract excellent representations even for long sentences. The information extracted by the two models is weighted and summarized through the attention mechanism to obtain the final representation of the current sentence, which contains the information that contributes most to relation classification; finally, classification is performed through the softmax layer of the dual-channel neural network model, achieving the extraction of the predetermined relations.
  • In an exemplary embodiment, data related to the network relations may be uploaded to a blockchain. Corresponding summary (digest) information is obtained from the data related to the network relations; specifically, the summary information is obtained by hashing the data, for example, using the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to users. A user device can download the summary information from the blockchain to verify whether the data related to the network relations has been tampered with.
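  • A minimal sketch of producing such digest information with Python's standard hashlib follows; the payload is hypothetical.

```python
# Hash the relation-extraction output to produce the digest (summary)
# information that would be uploaded to the blockchain.
import hashlib
import json

record = {"entity_pair": ["aspirin", "stroke"], "relation": "reduces_risk_of"}
digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()
print(digest)  # user devices can recompute this to check for tampering
```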
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
  • The second embodiment of the present application relates to a neural network relation extraction system. The neural network relation extraction system of the computer device can be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the embodiments of the present application. The program module referred to in the embodiments of the present application is a series of computer-readable instruction segments capable of completing specific functions. The following description introduces the function of each program module in this embodiment.
  • As shown in FIG. 4, the neural network relation extraction system 400 may include an establishment module 410, an acquisition module 420, a shortest path generation module 430, a first extraction module 440, a second extraction module 450, a classification module 460, and a training module 470, in which:
  • the establishment module 410 is configured to construct a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel.
  • the obtaining module 420 is used to obtain sentences to be processed.
  • the shortest path generation module 430 is configured to perform dependency syntactic analysis on the sentence to obtain two clauses of the sentence.
  • Specifically, dependency syntactic analysis is performed on the sentence through a syntactic parser to generate a dependency parse tree, and the two shortest dependency paths between the target entities are found from the dependency parse tree, the two shortest paths representing the two clauses of the sentence.
  • The first extraction module 440 is configured to input the two clauses into the first channel and perform feature extraction through a convolutional neural network (CNN) model to obtain the first extraction information.
  • The second extraction module 450 is configured to input the sentence into the second channel and perform feature extraction through a long short-term memory (LSTM) model to obtain the second extraction information.
  • The classification module 460 is configured to weight and summarize the first extraction information and the second extraction information through the attention mechanism to obtain the final extracted features of the sentence, and to input the final extracted features into the softmax layer of the dual-channel neural network model to classify the relation categories between the target entities.
  • Further, the neural network relation extraction system 400 includes:
  • The training module 470, configured to train the constructed dual-channel neural network model.
  • The training module 470 is further configured to: obtain a training set; input the training set into the dual-channel neural network model to output the predicted relation categories of the training set; compute the cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set; and minimize the loss function through an optimization algorithm to train the dual-channel neural network model.
  • The first extraction module 440 is further configured to: represent the words of the two clauses as vectors; process the vector representations of the two clauses through the convolutional layer, pooling layer, and nonlinear layer; and fuse the processed vector representations of the two clauses through the hidden layer to obtain the first extraction information.
  • The third embodiment of the present application relates to a computer device. FIG. 5 is a schematic diagram of the hardware architecture of the computer device for neural network relation extraction of the present application.
  • In this embodiment, the computer device 500 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers).
  • As shown in FIG. 5, the computer device 500 at least includes, but is not limited to: a memory 510, a processor 520, and a network interface 530 that can be communicatively linked to each other through a system bus, in which:
  • The memory 510 includes at least one type of readable storage medium, the readable storage medium including flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like.
  • In some embodiments, the memory 510 may be an internal storage module of the computer device 500, such as the hard disk or memory of the computer device 500. In other embodiments, the memory 510 may also be an external storage device of the computer device 500, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 500. Of course, the memory 510 may also include both an internal storage module of the computer device 500 and an external storage device thereof.
  • In this embodiment, the memory 510 is generally used to store the operating system and various application software installed on the computer device 500, such as the program code of the neural network relation extraction method. In addition, the memory 510 may also be used to temporarily store various types of data that have been output or are to be output.
  • In some embodiments, the processor 520 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 520 is generally used to control the overall operation of the computer device 500, for example, performing control and processing related to data interaction or communication with the computer device 500. In this embodiment, the processor 520 is configured to run the program code stored in the memory 510 or to process data.
  • The network interface 530 may include a wireless network interface or a wired network interface, and is generally used to establish communication links between the computer device 500 and other computer devices. For example, the network interface 530 is used to connect the computer device 500 to an external terminal through a network, and to establish data transmission channels and communication links between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • FIG. 5 only shows a computer device with components 510-530, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
  • In this embodiment, the neural network relation extraction method stored in the memory 510 may also be divided into one or more program modules and executed by one or more processors (the processor 520 in this embodiment) to complete this application.
  • The memory 510 stores instructions executable by the at least one processor 520, and the instructions are executed by the at least one processor 520 so that the at least one processor 520 can execute the steps of the neural network relation extraction method described above.
  • The embodiments of the present application also provide a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions executable by at least one processor, so that the at least one processor executes the following steps: constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; acquiring a sentence to be processed; performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence; inputting the two clauses into the first channel, and performing feature extraction through the convolutional neural network model to obtain first extraction information; inputting the sentence into the second channel, and performing feature extraction through the long short-term memory network model to obtain second extraction information; and weighting and summarizing the first extraction information and the second extraction information through the attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
  • Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media capable of storing program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A neural network relation extraction method, a computer device, and a computer-readable storage medium. The method includes: constructing a dual-channel neural network model (S101); acquiring a sentence to be processed (S102); performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence (S103); inputting the two clauses into a first channel and performing feature extraction through a CNN model to obtain first extraction information (S104); inputting the sentence into a second channel and performing feature extraction through an LSTM model to obtain second extraction information (S105); and weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into a softmax layer to complete the classification of the relation categories between the target entities (S106). The neural network relation extraction method can perform relation extraction with high quality.

Description

Neural network relation extraction method, computer device, and readable storage medium
This application claims priority to the Chinese patent application No. 202010752459.2, filed on July 30, 2020 and entitled "Neural network relation extraction method, computer device, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of artificial intelligence technology, and in particular to a neural network relation extraction method, a computer device, and a readable storage medium.
Background
Relation extraction is a very important research topic in natural language processing. As an important subtask, relation extraction aims to extract a predefined semantic relationship between two entities from text. The extracted relations and entities can be organized into triples, stored in a graph database, and applied to a medical knowledge graph based on the relevant knowledge-graph technology. Building a high-quality medical knowledge graph cannot be done without high-quality relation extraction, so relation extraction is particularly important for medical knowledge graphs.
The inventors found that traditional relation extraction tasks generally use a single model, such as a convolutional neural network (Convolutional Neural Network, CNN) or a recurrent neural network (Recurrent Neural Network, RNN), to produce a vector representation of the sentence, but the relation extraction quality of a single model is not high.
Summary
The purpose of the embodiments of the present application is to provide a neural network relation extraction method capable of performing relation extraction with high quality.
To solve the above technical problem, the embodiments of the present application provide a neural network relation extraction method, the method including: constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; acquiring a sentence to be processed; performing dependency syntactic analysis on the sentence to generate a dependency parse tree, and finding from the dependency parse tree the two shortest dependency paths between the target entities, the two shortest paths representing two clauses of the sentence; inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information; inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information; and weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into a softmax layer to complete the classification of the relation categories between the target entities.
The embodiments of the present application also provide a neural network relation extraction system, including: an establishment module for constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel; an acquisition module for acquiring a sentence to be processed; a shortest path generation module for performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence; a first extraction module for inputting, in the first channel, the two clauses into the first channel and performing feature extraction through a convolutional neural network model to obtain first extraction information; a second extraction module for inputting, in the second channel, the sentence into the second channel and performing feature extraction through a long short-term memory network model to obtain second extraction information; and a classification module for weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
The embodiments of the present application also provide a computer device, the computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions, when executed by the processor, implementing the following steps:
constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel;
acquiring a sentence to be processed;
performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
The embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium storing computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to execute the following steps:
constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel;
acquiring a sentence to be processed;
performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
The dual-channel neural network relation extraction model proposed in the embodiments of the present application fuses the key information of the shortest dependency paths while using the original sentence to retain the information that the dependency paths cannot capture; it extracts local information through the CNN and gathers the most useful information with a pooling layer, extracting salient local information and retaining the key information for classifying relations. The LSTM extracts information from the entire sentence and can extract excellent representations even for long sentences. The information extracted by the two models is weighted and summarized through the attention mechanism to obtain the final representation of the current sentence, which contains the information that contributes most to relation classification; finally, classification is performed through the softmax layer, achieving the extraction of the predetermined relations.
Brief Description of the Drawings
One or more embodiments are exemplarily illustrated by the figures in the corresponding drawings; these exemplary illustrations do not constitute a limitation on the embodiments.
FIG. 1 is a schematic flowchart of the neural network relation extraction method according to the first embodiment of the present application;
FIG. 2 is a schematic diagram of the dual-channel neural network model in the first embodiment of the present application;
FIG. 3 is a schematic diagram of feature extraction performed by the convolutional neural network model in the first embodiment of the present application;
FIG. 4 is a program module diagram of the neural network relation extraction system according to the second embodiment of the present application;
FIG. 5 is a schematic structural diagram of the computer device according to the third embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the embodiments of this application are described in detail below with reference to the drawings. However, those of ordinary skill in the art can understand that many technical details are set forth in the embodiments of this application to help the reader better understand this application; the technical solutions claimed in this application can nevertheless be implemented even without these technical details and with various changes and modifications based on the following embodiments.
This application can be applied to smart government affairs, smart city management, smart community, smart security, smart logistics, smart healthcare, smart education, smart environmental protection, and smart transportation scenarios, thereby promoting the construction of smart cities.
The first embodiment of this application relates to a neural network relation extraction method. The core of this embodiment is to propose a dual-channel neural network relation extraction model: a convolutional neural network (Convolutional Neural Networks, CNN) model is used to extract the key information of the shortest dependency paths, and a long short-term memory (Long Short-Term Memory, LSTM) model is used to extract information from the entire sentence, so that excellent representations can be extracted even for long sentences; a pooling layer gathers the most useful information, extracting salient local information and retaining the key information for classifying relations. The features extracted by the two models are weighted and summarized through the attention mechanism to obtain the final vector representation of the current sentence, which is finally classified through the softmax layer, achieving the extraction of the predetermined relations. The implementation details of the neural network relation extraction method of this embodiment are described below; the following content is provided only for ease of understanding and is not required for implementing this solution.
A schematic flowchart of the neural network relation extraction method in this embodiment is shown in FIG. 1; the method is applied to a computer device.
In this embodiment, depending on requirements, the execution order of the steps in the flowchart shown in FIG. 1 may be changed, and some steps may be omitted.
Step S101: Construct a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel.
Relation extraction aims to extract a predefined semantic relationship between two entities from text. The extracted relations and entities can be organized into triples, stored in a graph database, and applied to a medical knowledge graph based on the relevant knowledge-graph technology. Building a high-quality medical knowledge graph cannot be done without high-quality relation extraction, so relation extraction is particularly important for medical knowledge graphs.
In the prior art, the convolutional neural network (CNN) and the recurrent neural network (RNN) are the two main architecture types of deep neural networks (Deep Neural Networks, DNN). In traditional relation extraction tasks, a single CNN or RNN model is generally used to vectorize the sentence, but a single model may fail to capture the key points, especially in the medical field, where sentence lengths vary and no single model is adaptable to all of them; moreover, not all words in a sentence contribute to the entity relation, and some sentences are too long, so the relation extraction quality of a single model is not high. Therefore, in this embodiment, in order to improve the quality of relation extraction, relation extraction is performed by establishing a dual-channel neural network model.
In this embodiment, after the dual-channel neural network model is constructed, the constructed dual-channel neural network model also needs to be trained. Specifically, training the constructed dual-channel neural network model includes: obtaining a training set; inputting the training set into the dual-channel neural network model to output the predicted relation categories of the training set; computing the cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set; and minimizing the cross-entropy loss function through an optimization algorithm to train the dual-channel neural network model.
In this embodiment, the training set is a set of training data whose actual relation categories are known.
In this embodiment, the loss function is:

$$J(\theta) = -\sum_{s \in S} \sum_{i=1}^{t} r_i \log p(r_i \mid s; \theta)$$

where $r_i$ represents whether the relation between the entities belongs to the $i$-th category, $s$ is a single sentence, $S$ is the set of sentences, and $t$ is the number of entity-relation categories. When the entity relation is the $i$-th category, $r_i$ is 1; otherwise it is 0.
For example, assuming there are 10 possible relations between entities, and the entity relation between entity 1 and entity 2 is the third category, then $r_3 = 1$.
In this embodiment, training the dual-channel neural network model by minimizing the cross-entropy of the loss function is in fact minimizing the cross-entropy of the loss function between the predicted relation categories and the actual relation categories.
Step S102: Acquire a sentence to be processed.
In this embodiment, the sentence to be processed refers to a sentence on which relation extraction needs to be performed.
Step S103: Perform dependency syntactic analysis on the sentence to obtain two clauses of the sentence.
Specifically, dependency syntactic analysis is performed on the sentence through a syntactic parser to generate a dependency parse tree; the two shortest dependency paths between the target entities are found from the dependency parse tree, and the two shortest paths represent the two clauses of the sentence.
Among current open-source Chinese syntactic parsers, the Stanford parser and the Berkeley parser are representative: the Stanford parser is based on a factored model, while the Berkeley parser is based on an unlexicalized analysis model. In this embodiment, dependency syntactic analysis is performed on the sentence through the Stanford parser. Of course, in other embodiments, other syntactic parsers may also be used for dependency syntactic analysis, which is not limited in this embodiment.
In this embodiment, dependency syntactic analysis of the sentence to be processed yields the two shortest dependency paths between the target entities. Since the shortest dependency path screens out unimportant modifier chunks and contains the backbone expressing the relation pattern, the two shortest paths are in effect two clauses of the sentence. Moreover, obtaining the shortest dependency paths of a sentence captures the information contributing most to relation classification and extracts the salient local information in the sentence.
Step S104: Input the two clauses into the first channel, and perform feature extraction through the CNN model to obtain first extraction information.
In this embodiment, FIG. 2 is a schematic diagram of the dual-channel model in a preferred embodiment of this application. As shown in FIG. 2, in the first channel (the left channel), the two clauses (the two shortest dependency paths) are respectively input into the CNN model for feature extraction to obtain the first extraction information. Specifically, this includes the following.
FIG. 3 is a schematic diagram of feature extraction performed by the convolutional neural network model in the first embodiment of the present application. As shown in FIG. 3, the two clauses are represented as vectors, and the vector representations of the two clauses are processed through a convolutional layer, a pooling layer, and a nonlinear layer. Specifically, a convolution operation is performed in the convolutional layer, and max pooling is used in the pooling layer, where the value obtained by the max-pooling operation on a given row vector is the largest value in that row. Further, as shown in FIG. 2, the processed vector representations of the two clauses are fused through a hidden layer to obtain the first extraction information $s_1$.
In this embodiment, the sentence to be processed is subjected to dependency parsing through the Stanford parser to obtain the two shortest dependency paths (which are also the two clauses of the sentence to be processed), and the two clauses are then represented as vectors. Specifically, the vector of a word $i$ on the two shortest dependency paths is defined as $\mathbf{x}_i = [\mathbf{w}_i; \mathbf{p}_i^1; \mathbf{p}_i^2]$, where $\mathbf{w}_i$ is the word embedding — using pre-trained word vectors, the word vector of the corresponding word can be looked up directly in an open-source word-vector file — and $\mathbf{p}_i^j$ is the position embedding. The position embedding refers to the relative distance between the current word on the shortest dependency path and each of the two entity words of the clause, where $d_i^j$ is the relative distance between the current word and the $j$-th entity.
Through the vector representation of the clause, a clause $x$ is transformed into vector form. Given a clause $x$, assuming the clause has $n$ words, the clause $x$ is represented as a vector by formula one:

$$Z_n = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_n]$$

where $n$ represents the number of words contained in each clause, $\mathbf{x}_i$ represents the vector of the $i$-th word in the clause $x$, and $Z_n$ represents the vector representation of the clause. Then, the word vectors and position vectors are used as the input of the convolutional neural network, and the processed vector representation of the clause is obtained through the convolutional layer, the pooling layer, and the nonlinear layer. Specifically, the vector representations of the two clauses are processed through the convolutional layer, pooling layer, and nonlinear layer according to formula two:

$$[r_x^i]_j = \max\left[f(W_1 Z_n + b_1)\right]_j$$

where $[r_x^i]_j$ denotes the $j$-th component of the vector $r_x^i$, $r_x^i$ refers to the value obtained by the max-pooling operation on a row vector, $W_1$ is the weight matrix of the convolutional layer, $f$ is the nonlinear tanh transformation, $Z_n$ represents the vector representation of the clause, and $b_1$ is a bias, which is a constant. In this embodiment, the convolutional layer uses matrixed vectors to perform the convolution operation on $Z_n$ within windows of size $k$.
Further, as shown in FIG. 2, after the two clauses have been processed by the convolutional, pooling, and nonlinear layers, the vector representations of the two clauses are fused through a hidden layer to obtain the first extraction information, which is also the final sentence representation $s_1$ of the two clauses in the CNN model. In other words, the sentence representation $s_1$ is the final feature vector capable of representing the sentence to be processed; the information in the sentence to be processed is contained in this feature vector.
Step S105: Input the sentence into the second channel, and perform feature extraction through the LSTM model to obtain second extraction information.
Specifically, a word segmentation operation is performed on the sentence to obtain L tokens; word-vector mapping is performed on the L tokens respectively, each token being mapped to a d-dimensional word vector, so as to obtain an L×d word-vector matrix; and the d-dimensional word vectors of the L tokens are input in sequence into the long short-term memory network model for feature extraction to obtain the second extraction information.
In this embodiment, as shown in the right part of FIG. 2, the vector representation of the complete sentence to be processed is input into the LSTM model for feature extraction to obtain the second extraction information, that is, the final sentence representation $s_2$ of the sentence to be processed in the LSTM model. The schematic of this feature extraction is analogous to that in FIG. 3: the sentence to be processed is represented as vectors, which are processed through the convolutional layer, pooling layer, and nonlinear layer; the specific calculation method is the same as in step S104 and is not repeated here.
In this embodiment, the vector representation of the complete sentence to be processed is the word-embedding representation of the complete sentence to be processed.
Step S106: Weight and summarize the first extraction information and the second extraction information through the attention mechanism to obtain the final extracted features of the sentence, and input the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
In the prior art, the CNN model has great advantages for processing short sentences, while the LSTM model learns long-distance information more easily and performs superbly at extracting features from long sentences. In this embodiment, in the first channel, the two clauses of the sentence to be processed are input into the CNN model for feature extraction to obtain the final sentence representation $s_1$ of the two clauses in the CNN model; in the second channel, the sentence is input into the LSTM model for feature extraction to obtain the final sentence representation $s_2$ of the sentence to be processed in the LSTM model. Then, in order to handle both long and short sentences, and considering that the shortest dependency path sometimes misses information, this embodiment uses the attention mechanism to weight and summarize the first extraction information and the second extraction information to obtain the final extracted feature of the sentence, that is, the final vector representation $s$ of the sentence. Specifically, the first extraction information and the second extraction information are weighted and summarized through formula three and formula four.
Formula three is:

$$s = \sum_i \alpha_i s_i$$

where $\alpha_i$ is the weight of each sentence representation in the final vector representation, and $s_i$ is the vector representation of a sentence after feature extraction, such as $s_1$ and $s_2$ above.
Formula four is:

$$\alpha_i = \frac{\exp(t_i)}{\sum_k \exp(t_k)}$$

where $t_i$ is a query-based score obtained by matching the sentence representation $s_i$ against the predicted relation $r$.
Formula five is:

$$t_i = s_i A r$$

where $s_i$ is the vector representation of a sentence after feature extraction (for example, the first extraction information $s_1$ or the second extraction information $s_2$), $A$ is a weighted diagonal matrix, and $r$ is the query vector associated with the relation $r$, i.e., the vector representation of the relation $r$.
Further, in this embodiment, the conditional probability is defined through the softmax layer, where the conditional probability is computed as:

$$p(r \mid s, \theta) = \frac{\exp(o_r)}{\sum_{k=1}^{n_r} \exp(o_k)}$$

where $n_r$ represents the number of predefined relations.
In this embodiment, after classification through the softmax layer, the probability values of all relation categories are also obtained as output through formula six:

$$o = M s + d$$

where $o$ contains the probability scores of all relation categories, $M$ is the relation matrix representation, and $d$ is a bias vector.
In this embodiment, the output $o$ over all relation categories is essentially a one-dimensional column vector; each number in the column vector represents the probability value of one relation category, indicating how likely it is that the target entities have that relation category.
In this embodiment, the attention mechanism is used to fuse the representation output by the CNN model with the representation output by the LSTM model, extracting an excellent representation of the current sentence, so that the finally trained relation extraction model is suitable for both long and short sentences.
The dual-channel neural network model proposed in the embodiments of this application fuses the key information of the shortest dependency paths while using the original sentence to retain the information that the dependency paths cannot capture; the CNN extracts local information and the pooling layer gathers the most useful information, extracting salient local information and retaining the key information for classifying relations. The LSTM extracts information from the entire sentence and can extract excellent representations even for long sentences. The information extracted by the two models is weighted and summarized through the attention mechanism to obtain the final representation of the current sentence, which contains the information contributing most to relation classification; finally, classification is performed through the softmax layer of the dual-channel neural network model, achieving the extraction of the predetermined relations.
The division of the steps in the above methods is only for clarity of description and does not limit the execution order of the steps; when implemented, steps may be combined into one step, or a step may be split into multiple steps — as long as the same logical relationship is included, it falls within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
In an exemplary embodiment, data related to the network relations may be uploaded to a blockchain. Corresponding summary (digest) information is obtained from the data related to the network relations; specifically, the summary information is obtained by hashing the data, for example, using the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to users. A user device can download the summary information from the blockchain to verify whether the data related to the network relations has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
The second embodiment of this application relates to a neural network relation extraction system. The neural network relation extraction system of the computer device can be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the embodiments of this application. The program module referred to in the embodiments of this application is a series of computer-readable instruction segments capable of completing specific functions; the following description introduces the function of each program module in this embodiment.
As shown in FIG. 4, the neural network relation extraction system 400 may include an establishment module 410, an acquisition module 420, a shortest path generation module 430, a first extraction module 440, a second extraction module 450, a classification module 460, and a training module 470, in which:
The establishment module 410 is configured to construct a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel.
The acquisition module 420 is configured to acquire a sentence to be processed.
The shortest path generation module 430 is configured to perform dependency syntactic analysis on the sentence to obtain two clauses of the sentence.
Specifically, dependency syntactic analysis is performed on the sentence through a syntactic parser to generate a dependency parse tree, and the two shortest dependency paths between the target entities are found from the dependency parse tree, the two shortest paths representing the two clauses of the sentence.
The first extraction module 440 is configured to input the two clauses into the first channel and perform feature extraction through a convolutional neural network (Convolutional Neural Networks, CNN) model to obtain the first extraction information.
The second extraction module 450 is configured to input the sentence into the second channel and perform feature extraction through a long short-term memory (Long Short-Term Memory, LSTM) model to obtain the second extraction information.
The classification module 460 is configured to weight and summarize the first extraction information and the second extraction information through the attention mechanism to obtain the final extracted features of the sentence, and to input the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
Further, the neural network relation extraction system 400 also includes:
The training module 470, configured to train the constructed dual-channel neural network model.
The training module 470 is further configured to: obtain a training set; input the training set into the dual-channel neural network model to output the predicted relation categories of the training set; compute the cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set; and minimize the loss function through an optimization algorithm to train the dual-channel neural network model.
The first extraction module 440 is further configured to: represent the words of the two clauses as vectors; process the vector representations of the two clauses through the convolutional layer, pooling layer, and nonlinear layer; and fuse the processed vector representations of the two clauses through the hidden layer to obtain the first extraction information.
The third embodiment of this application relates to a computer device. Refer to FIG. 5, which is a schematic diagram of the hardware architecture of the computer device for neural network relation extraction of this application.
In this embodiment, the computer device 500 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers). As shown in FIG. 5, the computer device 500 at least includes, but is not limited to: a memory 510, a processor 520, and a network interface 530 that can be communicatively linked to each other through a system bus, in which:
The memory 510 includes at least one type of readable storage medium, the readable storage medium including flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 510 may be an internal storage module of the computer device 500, such as the hard disk or memory of the computer device 500. In other embodiments, the memory 510 may also be an external storage device of the computer device 500, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 500. Of course, the memory 510 may also include both the internal storage module of the computer device 500 and its external storage device. In this embodiment, the memory 510 is generally used to store the operating system and various application software installed on the computer device 500, such as the program code of the neural network relation extraction method. In addition, the memory 510 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 520 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 520 is generally used to control the overall operation of the computer device 500, for example, performing control and processing related to data interaction or communication with the computer device 500. In this embodiment, the processor 520 is used to run the program code stored in the memory 510 or to process data.
The network interface 530 may include a wireless network interface or a wired network interface; the network interface 530 is generally used to establish communication links between the computer device 500 and other computer devices. For example, the network interface 530 is used to connect the computer device 500 to an external terminal through a network, and to establish data transmission channels and communication links between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be pointed out that FIG. 5 only shows a computer device with components 510-530, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
In this embodiment, the neural network relation extraction method stored in the memory 510 may also be divided into one or more program modules and executed by one or more processors (the processor 520 in this embodiment) to complete this application.
The memory 510 stores instructions executable by the at least one processor 520, and the instructions are executed by the at least one processor 520 so that the at least one processor 520 can execute the steps of the neural network relation extraction method described above.
The embodiments of this application also provide a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to execute the following steps:
constructing a dual-channel neural network model, the dual-channel neural network model including a first channel and a second channel;
acquiring a sentence to be processed;
performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain the final extracted features of the sentence, and inputting the final extracted features into the softmax layer of the dual-channel neural network model to complete the classification of the relation categories between the target entities.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include various media capable of storing program code, such as USB flash drives, removable hard disks, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, and optical disks.
Those of ordinary skill in the art can understand that the above embodiments are specific examples for implementing this application, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of this application.

Claims (20)

  1. A neural network relation extraction method, comprising:
    constructing a dual-channel neural network model, the dual-channel neural network model comprising a first channel and a second channel;
    acquiring a sentence to be processed;
    performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
    inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
    inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
    weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain final extracted features of the sentence, and inputting the final extracted features into a softmax layer of the dual-channel neural network model to complete classification of relation categories between the target entities.
  2. The neural network relation extraction method according to claim 1, further comprising:
    training the constructed dual-channel neural network model.
  3. The neural network relation extraction method according to claim 2, wherein the training the constructed dual-channel neural network model comprises:
    obtaining a training set;
    inputting the training set into the dual-channel neural network model to output predicted relation categories of the training set;
    computing a cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set;
    minimizing the loss function through an optimization algorithm to train the dual-channel neural network model.
  4. The neural network relation extraction method according to claim 1, wherein the inputting the two clauses into the first channel and performing feature extraction through the convolutional neural network model to obtain the first extraction information comprises:
    representing the words of the two clauses as vectors;
    processing the vector representations of the two clauses through a convolutional layer, a pooling layer, and a nonlinear layer;
    fusing the processed vector representations of the two clauses through a hidden layer to obtain the first extraction information.
  5. The neural network relation extraction method according to claim 1, wherein the inputting the sentence into the second channel and performing feature extraction through the long short-term memory network model to obtain the second extraction information comprises:
    performing a word segmentation operation on the sentence to obtain L tokens;
    performing word-vector mapping on the L tokens respectively to obtain an L×d word-vector matrix, each of the L tokens being mapped to a d-dimensional word vector;
    inputting the d-dimensional word vectors of the L tokens in sequence into the long short-term memory network model for feature extraction to obtain the second extraction information.
  6. The neural network relation extraction method according to claim 1, wherein the performing dependency syntactic analysis on the sentence to obtain the two clauses of the sentence comprises:
    performing dependency syntactic analysis on the sentence through a syntactic parser to generate a dependency parse tree;
    finding, from the dependency parse tree, the two shortest dependency paths between the target entities, the two shortest paths representing the two clauses of the sentence.
  7. A neural network relation extraction system, comprising:
    an establishment module, configured to construct a dual-channel neural network model, the dual-channel neural network model comprising a first channel and a second channel;
    an acquisition module, configured to acquire a sentence to be processed;
    a shortest path generation module, configured to perform dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
    a first extraction module, configured to input, in the first channel, the two clauses into the first channel and perform feature extraction through a convolutional neural network model to obtain first extraction information;
    a second extraction module, configured to input, in the second channel, the sentence into the second channel and perform feature extraction through a long short-term memory network model to obtain second extraction information;
    a classification module, configured to weight and summarize the first extraction information and the second extraction information through an attention mechanism to obtain final extracted features of the sentence, and to input the final extracted features into a softmax layer of the dual-channel neural network model to complete classification of relation categories between the target entities.
  8. The neural network relation extraction system according to claim 7, wherein the first extraction module is further configured to:
    represent the words of the two clauses as vectors;
    process the vector representations of the two clauses through a convolutional layer, a pooling layer, and a nonlinear layer;
    fuse the processed vector representations of the two clauses through a hidden layer to obtain the first extraction information.
  9. A computer device, wherein the computer device comprises a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions, when executed by the processor, implementing the following steps:
    constructing a dual-channel neural network model, the dual-channel neural network model comprising a first channel and a second channel;
    acquiring a sentence to be processed;
    performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
    inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
    inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
    weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain final extracted features of the sentence, and inputting the final extracted features into a softmax layer of the dual-channel neural network model to complete classification of relation categories between the target entities.
  10. The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following step:
    training the constructed dual-channel neural network model.
  11. The computer device according to claim 10, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining a training set;
    inputting the training set into the dual-channel neural network model to output predicted relation categories of the training set;
    computing a cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set;
    minimizing the loss function through an optimization algorithm to train the dual-channel neural network model.
  12. The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    representing the words of the two clauses as vectors;
    processing the vector representations of the two clauses through a convolutional layer, a pooling layer, and a nonlinear layer;
    fusing the processed vector representations of the two clauses through a hidden layer to obtain the first extraction information.
  13. The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing a word segmentation operation on the sentence to obtain L tokens;
    performing word-vector mapping on the L tokens respectively to obtain an L×d word-vector matrix, each of the L tokens being mapped to a d-dimensional word vector;
    inputting the d-dimensional word vectors of the L tokens in sequence into the long short-term memory network model for feature extraction to obtain the second extraction information.
  14. The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing dependency syntactic analysis on the sentence through a syntactic parser to generate a dependency parse tree;
    finding, from the dependency parse tree, the two shortest dependency paths between the target entities, the two shortest paths representing the two clauses of the sentence.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to execute the following steps:
    constructing a dual-channel neural network model, the dual-channel neural network model comprising a first channel and a second channel;
    acquiring a sentence to be processed;
    performing dependency syntactic analysis on the sentence to obtain two clauses of the sentence;
    inputting the two clauses into the first channel, and performing feature extraction through a convolutional neural network model to obtain first extraction information;
    inputting the sentence into the second channel, and performing feature extraction through a long short-term memory network model to obtain second extraction information;
    weighting and summarizing the first extraction information and the second extraction information through an attention mechanism to obtain final extracted features of the sentence, and inputting the final extracted features into a softmax layer of the dual-channel neural network model to complete classification of relation categories between the target entities.
  16. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further implement the following step:
    training the constructed dual-channel neural network model.
  17. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining a training set;
    inputting the training set into the dual-channel neural network model to output predicted relation categories of the training set;
    computing a cross-entropy loss function according to the predicted relation categories output by the dual-channel neural network model and the actual relation categories of the training set;
    minimizing the loss function through an optimization algorithm to train the dual-channel neural network model.
  18. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    representing the words of the two clauses as vectors;
    processing the vector representations of the two clauses through a convolutional layer, a pooling layer, and a nonlinear layer;
    fusing the processed vector representations of the two clauses through a hidden layer to obtain the first extraction information.
  19. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing a word segmentation operation on the sentence to obtain L tokens;
    performing word-vector mapping on the L tokens respectively to obtain an L×d word-vector matrix, each of the L tokens being mapped to a d-dimensional word vector;
    inputting the d-dimensional word vectors of the L tokens in sequence into the long short-term memory network model for feature extraction to obtain the second extraction information.
  20. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    performing dependency syntactic analysis on the sentence through a syntactic parser to generate a dependency parse tree;
    finding, from the dependency parse tree, the two shortest dependency paths between the target entities, the two shortest paths representing the two clauses of the sentence.
PCT/CN2020/111513 2020-07-30 2020-08-26 Neural network relation extraction method, computer device, and readable storage medium WO2021174774A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010752459.2 2020-07-30
CN202010752459.2A CN111898364B (zh) 2020-07-30 Neural network relation extraction method, computer device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021174774A1 true WO2021174774A1 (zh) 2021-09-10

Family

ID=73182595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111513 WO2021174774A1 (zh) 2020-07-30 2020-08-26 神经网络关系抽取方法、计算机设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN111898364B (zh)
WO (1) WO2021174774A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990473A (zh) * 2021-10-28 2022-01-28 上海昆亚医疗器械股份有限公司 Medical equipment operation and maintenance information collection and analysis system and use method thereof
CN114065702A (zh) * 2021-09-28 2022-02-18 南京邮电大学 Event detection method fusing entity relations and event elements
CN114417846A (zh) * 2021-11-25 2022-04-29 湘潭大学 Entity relation extraction method based on attention contribution degree and use thereof
CN114861630A (zh) * 2022-05-10 2022-08-05 马上消费金融股份有限公司 Information acquisition and related model training method and apparatus, electronic device, and medium
WO2023060633A1 (zh) * 2021-10-12 2023-04-20 深圳前海环融联易信息科技服务有限公司 Semantic-enhanced relation extraction method and apparatus, computer device, and storage medium
CN116108206A (zh) * 2023-04-13 2023-05-12 中南大学 Joint extraction method for financial data entity relations and related device
CN116386895A (zh) * 2023-04-06 2023-07-04 之江实验室 Epidemic public opinion entity recognition method and apparatus based on heterogeneous graph neural network
WO2023134069A1 (zh) * 2022-01-14 2023-07-20 平安科技(深圳)有限公司 Entity relationship recognition method, device, and readable storage medium
CN117054396A (zh) * 2023-10-11 2023-11-14 天津大学 Raman spectrum detection method and apparatus based on dual-path multiplication-free neural network

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528326B (zh) * 2020-12-09 2024-01-02 维沃移动通信有限公司 Information processing method and apparatus, and electronic device
CN112560481B (zh) * 2020-12-25 2024-05-31 北京百度网讯科技有限公司 Sentence processing method, device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563653A (zh) * 2017-12-21 2018-09-21 清华大学 Construction method and system for a knowledge acquisition model in a knowledge graph
CN109710932A (zh) * 2018-12-22 2019-05-03 北京工业大学 Medical entity relation extraction method based on feature fusion
WO2019220128A1 (en) * 2018-05-18 2019-11-21 Benevolentai Technology Limited Graph neutral networks with attention

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11574122B2 (en) * 2018-08-23 2023-02-07 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network
CN109783618B (zh) * 2018-12-11 2021-01-19 北京大学 Drug entity relation extraction method and system based on an attention-mechanism neural network
CN110020671B (zh) * 2019-03-08 2023-04-18 西北大学 Drug relation classification model construction and classification method based on a dual-channel CNN-LSTM network
CN110598001A (zh) * 2019-08-05 2019-12-20 平安科技(深圳)有限公司 Joint entity relation extraction method, apparatus, and storage medium
CN111428481A (zh) * 2020-03-26 2020-07-17 南京搜文信息技术有限公司 Entity relation extraction method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563653A (zh) * 2017-12-21 2018-09-21 清华大学 Construction method and system for a knowledge acquisition model in a knowledge graph
WO2019220128A1 (en) * 2018-05-18 2019-11-21 Benevolentai Technology Limited Graph neutral networks with attention
CN109710932A (zh) * 2018-12-22 2019-05-03 北京工业大学 Medical entity relation extraction method based on feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG XIAOBIN, CHEN FUCAI, HUANG RUIYANG: "Relation Extraction based on CNN and Bi-LSTM", CHINESE JOURNAL OF NETWORK AND INFORMATION SECURITY, vol. 4, no. 9, 30 September 2018 (2018-09-30), pages 44 - 51, XP055842237, ISSN: 2096-109X, DOI: 10.11959/j.issn.2096-109x.2018074 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065702A (zh) * 2021-09-28 2022-02-18 南京邮电大学 Event detection method fusing entity relations and event elements
WO2023060633A1 (zh) * 2021-10-12 2023-04-20 深圳前海环融联易信息科技服务有限公司 Semantic-enhanced relation extraction method and apparatus, computer device, and storage medium
CN113990473B (zh) * 2021-10-28 2022-09-30 上海昆亚医疗器械股份有限公司 Medical equipment operation and maintenance information collection and analysis system and use method thereof
CN113990473A (zh) * 2021-10-28 2022-01-28 上海昆亚医疗器械股份有限公司 Medical equipment operation and maintenance information collection and analysis system and use method thereof
CN114417846A (zh) * 2021-11-25 2022-04-29 湘潭大学 Entity relation extraction method based on attention contribution degree and use thereof
CN114417846B (zh) * 2021-11-25 2023-12-19 湘潭大学 Entity relation extraction method based on attention contribution degree
WO2023134069A1 (zh) * 2022-01-14 2023-07-20 平安科技(深圳)有限公司 Entity relationship recognition method, device, and readable storage medium
CN114861630A (zh) * 2022-05-10 2022-08-05 马上消费金融股份有限公司 Information acquisition and related model training method and apparatus, electronic device, and medium
CN116386895B (zh) * 2023-04-06 2023-11-28 之江实验室 Epidemic public opinion entity recognition method and apparatus based on heterogeneous graph neural network
CN116386895A (zh) * 2023-04-06 2023-07-04 之江实验室 Epidemic public opinion entity recognition method and apparatus based on heterogeneous graph neural network
CN116108206A (zh) * 2023-04-13 2023-05-12 中南大学 Joint extraction method for financial data entity relations and related device
CN117054396A (zh) * 2023-10-11 2023-11-14 天津大学 Raman spectrum detection method and apparatus based on dual-path multiplication-free neural network
CN117054396B (zh) * 2023-10-11 2024-01-05 天津大学 Raman spectrum detection method and apparatus based on dual-path multiplication-free neural network

Also Published As

Publication number Publication date
CN111898364A (zh) 2020-11-06
CN111898364B (zh) 2023-09-26

Similar Documents

Publication Publication Date Title
WO2021174774A1 (zh) Neural network relation extraction method, computer device, and readable storage medium
WO2023065545A1 (zh) Risk prediction method and apparatus, device, and storage medium
WO2020140386A1 (zh) TextCNN-based knowledge extraction method and apparatus, computer device, and storage medium
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
US7593927B2 (en) Unstructured data in a mining model language
US9875319B2 (en) Automated data parsing
US20220171936A1 (en) Analysis of natural language text in document
WO2022048363A1 (zh) Website classification method and apparatus, computer device, and storage medium
WO2020147409A1 (zh) Text classification method and apparatus, computer device, and storage medium
CN112287069B (zh) Voice-semantics-based information retrieval method and apparatus, and computer device
CN115858825A (zh) Machine-learning-based equipment fault diagnosis knowledge graph construction method and apparatus
CN113051914A (zh) Enterprise hidden label extraction method and apparatus based on multi-feature dynamic portraits
CN113051356A (zh) Open relation extraction method and apparatus, electronic device, and storage medium
CN113704420A (zh) Method and apparatus for recognizing roles in text, electronic device, and storage medium
CN115730597A (zh) Multi-level semantic intent recognition method and related device
CN116821373A (zh) Graph-based prompt recommendation method, apparatus, device, and medium
CN114818682A (zh) Document-level entity relation extraction method based on adaptive entity path awareness
WO2022073341A1 (zh) Voice-semantics-based disease entity matching method and apparatus, and computer device
CN113254649B (zh) Training method for a sensitive-content recognition model, text recognition method, and related apparatus
CN116383412B (zh) Knowledge-graph-based function point amplification method and system
CN112395401A (zh) Adaptive negative-sample-pair sampling method and apparatus, electronic device, and storage medium
WO2022127124A1 (zh) Meta-learning-based entity category recognition method and apparatus, device, and storage medium
CN112529743B (zh) Contract element extraction method and apparatus, electronic device, and medium
CN113434789B (zh) Search ranking method based on multi-dimensional text features and related device
CN115129885A (zh) Entity linking method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922939

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922939

Country of ref document: EP

Kind code of ref document: A1