WO2021159762A1 - Data relationship extraction method, apparatus, electronic device, and storage medium - Google Patents

Data relationship extraction method, apparatus, electronic device, and storage medium

Info

Publication number
WO2021159762A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
embedding vector
text
sequence
word embedding
Prior art date
Application number
PCT/CN2020/125342
Other languages
English (en)
French (fr)
Inventor
颜泽龙
王健宗
吴天博
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021159762A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application belongs to the field of artificial intelligence technology, and in particular relates to a data relationship extraction method, device, electronic equipment, and storage medium.
  • Information extraction refers to extracting various useful information from natural language processing texts, including but not limited to entities, relationships, events, etc.
  • relationship extraction is a task in information extraction, which is mainly used to extract relationships between entities.
  • The embodiments of the present application provide a data relationship extraction method, device, electronic device, and storage medium to solve the problems of the prior art: traditional feature-based methods require substantial expert effort and domain knowledge, easily introduce human errors, and leave many deeper features difficult to discover directly, resulting in poor information utilization and extraction.
  • the first aspect of the embodiments of the present application provides a data relationship extraction method, including:
  • a second aspect of the embodiments of the present application provides a data relationship extraction device, including:
  • the first obtaining module is used to obtain the text sequence obtained by word segmentation from the text to be processed
  • the second acquisition module is configured to acquire the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words based on the syntactic dependency tree;
  • the generating module is used to generate the target word embedding vector corresponding to the text to be processed according to each word, the related dependent words of each word, and the semantic relationship between each word and the related dependent words;
  • the third acquisition module is configured to input the target word embedding vector into a deep convolutional neural network, perform entity relationship information extraction on the to-be-processed text based on the target word embedding vector through the deep convolutional neural network, and obtain a target entity relationship output by the deep convolutional network with a set predicted probability value.
  • The third aspect of the embodiments of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method described in the first aspect are implemented.
  • the fourth aspect of the embodiments of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program implements the steps of the method described in the first aspect when the computer program is executed by a processor.
  • the fifth aspect of the present application provides a computer program product, which when the computer program product runs on an electronic device, causes the electronic device to perform the steps of the method described in the first aspect.
  • In the embodiments of the present application, the text sequence obtained by word segmentation of the to-be-processed text is used, based on the syntactic dependency tree, to obtain the related dependent words of each word in the text sequence and the semantic relationship between each word and its related dependent words.
  • From these, the word embedding vector corresponding to the text to be processed is generated; through the deep convolutional neural network, entity relationship information is extracted from the text to be processed based on the word embedding vector, and the target entity relationship with the set predicted probability value output by the deep convolutional network is obtained. The relation extraction task is thus solved by the deep convolutional network model, and the syntactic dependency tree is used to analyze the syntactic characteristics of the text information and construct the text features. The semantic relationships within the text are fully considered, no artificial feature construction is needed, better results can be achieved, and the convenience and accuracy of extracting the final entity relationship are improved.
  • FIG. 1 is a first flowchart of a data relationship extraction method provided by an embodiment of the present application
  • FIG. 2 is a second flowchart of a data relationship extraction method provided by an embodiment of the present application.
  • FIG. 3 is a structural diagram of a data relationship extraction device provided by an embodiment of the present application.
  • Fig. 4 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • the data relationship extraction method involved in the embodiments of the present application may be executed by a control terminal or an electronic device.
  • the data relationship extraction method involved in the embodiments of the present application is applied to smart medical scenarios, thereby promoting the construction of smart cities.
  • The term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.
  • Similarly, the phrase "if determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • the electronic devices described in the embodiments of the present application include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers with touch-sensitive surfaces (for example, touch screen displays and/or touch pads).
  • the device is not a portable communication device, but a desktop computer with a touch-sensitive surface (e.g., touch screen display and/or touch pad).
  • the electronic device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
  • Electronic devices support various applications, such as one or more of the following: drawing applications, presentation applications, word processing applications, website creation applications, disc burning applications, spreadsheet applications, game applications, telephone applications, video conferencing applications, email applications, instant messaging applications, exercise support applications, photo management applications, digital camera applications, digital video camera applications, web browsing applications, digital music player applications, and/or digital video player applications.
  • Various application programs that can be executed on the electronic device can use at least one common physical user interface device such as a touch-sensitive surface.
  • One or more functions of the touch-sensitive surface and corresponding information displayed on the electronic device can be adjusted and/or changed between applications and/or within corresponding applications.
  • The common physical architecture of electronic devices (for example, touch-sensitive surfaces) can support various applications with user interfaces that are intuitive and transparent to users.
  • FIG. 1 is a first flowchart of a data relationship extraction method provided by an embodiment of the present application. As shown in Figure 1, a data relationship extraction method includes the following steps:
  • Step 101 Obtain a text sequence obtained by word segmentation from a text to be processed.
  • the constituent elements in the text sequence are words obtained from word segmentation in the text to be processed. That is, the text sequence is specifically a sequence of words corresponding to the text to be processed.
  • the text to be processed may specifically be a text of a medical case, a text of historical documents, and so on. This step realizes the conversion of the to-be-processed text into a text sequence in units of words.
  • the text sequence obtained by word segmentation obtained from the text to be processed includes:
  • the to-be-processed text needs to be segmented.
  • The statistical word segmentation method is used to label the text sequence, converting the word segmentation problem into a character classification problem.
  • Each character can be assigned one of 4 categories: word beginning (B), word middle (M), word ending (E), and single-character word (S), so that each character in the text to be processed can be classified, obtaining labeling information that indicates the word-position category of each character.
  • the words contained in the text to be processed are obtained based on the labeling information, and a text sequence composed of these words is formed.
  • For example, for the text to be processed [小张的医生是小李] (Xiao Zhang's doctor is Xiao Li),
  • the text length is 8 characters; after word segmentation, the predicted label of each character is [B E S B E S B E],
  • so a text sequence of length 5, with words as units, [小张 的 医生 是 小李], is obtained.
  • The process of classifying each character in the text to be processed can be realized by using preset information such as word-formation structure and the meaning of the text.
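The BMES labeling scheme above can be sketched as a small decoder. This is an illustrative Python sketch, not the application's implementation; the tag names follow the four categories described, and the function name is hypothetical.

```python
def decode_bmes(chars, tags):
    """Group characters into words according to BMES tags."""
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            words.append(ch)
            buf = []
        elif tag == "B":          # word beginning: start a new buffer
            buf = [ch]
        elif tag == "M":          # word middle: extend the buffer
            buf.append(ch)
        else:                     # "E" word ending: close out the word
            buf.append(ch)
            words.append("".join(buf))
            buf = []
    return words

# The example from the text: 8 characters with predicted tags [B E S B E S B E]
chars = list("小张的医生是小李")
tags = ["B", "E", "S", "B", "E", "S", "B", "E"]
print(decode_bmes(chars, tags))  # ['小张', '的', '医生', '是', '小李'] — 5 words
```

Decoding the 8 predicted character tags yields the length-5 word sequence described in the example.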
  • Step 102 based on the syntactic dependency tree, obtain the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words.
  • the syntactic dependency tree is used to describe the dependency relationship between various words and is constructed in advance.
  • the related dependent words are also words in the text sequence.
  • the related dependent words are specifically words that have a syntactic dependency relationship with each word.
  • For example, the word that has a syntactic dependency with "Xiao Zhang" (小张) is "doctor" (医生),
  • and the word that has a syntactic dependency with "的" is also "doctor".
  • The related dependent words are determined based on the syntactic dependency relationships specified in the syntactic dependency tree.
  • each word has a semantic relationship with related dependent words, and the semantic relationship needs to be determined based on the syntactic dependence relationship specified in the syntactic dependence tree.
  • The semantic relationship can be of different types, such as subject-predicate or passive relationships, attributive collocation relationships, and so on.
  • Step 103 Generate a target word embedding vector corresponding to the text to be processed according to each word, the related dependent words of each word, and the semantic relationship between each word and the related dependent words.
  • Word embedding is an important concept in natural language processing (NLP).
  • the word embedding vector can be used to convert a word into a fixed-length vector representation to facilitate mathematical processing.
  • The target word embedding vector is jointly generated to facilitate subsequent mathematical analysis and processing by the deep convolutional neural network.
  • The syntactic dependency tree is used to analyze the syntactic characteristics of the text information; the text sequence, the related dependent words of each word in the text sequence, and the semantic relationship between each word and its related dependent words are obtained and jointly used to generate the target word embedding vector corresponding to the text to be processed.
  • This realizes the construction of text features while fully considering the semantic relationships within the text, requires no artificial feature construction, avoids introducing human errors, helps discover deeper text features directly, and improves the convenience and accuracy of extracting the final entity relationship.
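A minimal sketch of the fixed-length word-to-vector mapping described above. The vocabulary, the dimension M=4, and the random values are assumptions for illustration, not from the application.

```python
import numpy as np

# An embedding table maps each word to a fixed-length vector of dimension M,
# so that later layers can perform mathematical processing on the words.
rng = np.random.default_rng(0)
vocab = ["小张", "的", "医生", "是", "小李"]  # words from the running example
M = 4                                          # assumed embedding dimension
table = {w: rng.normal(size=M) for w in vocab}

vec = table["医生"]
print(vec.shape)  # (4,) — every word gets the same fixed-length representation
```

In practice such a table is a trainable lookup layer rather than random vectors; the sketch only shows the fixed-length property.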
  • Step 104: Input the target word embedding vector into the deep convolutional neural network, extract entity relationship information from the text to be processed based on the target word embedding vector through the deep convolutional neural network, and obtain the target entity relationship with the set predicted probability value output by the deep convolutional network.
  • the entity relationship information is specifically the relationship information between entities in the text to be processed.
  • the entities in the text to be processed [Xiao Zhang’s doctor is Xiao Li] are "Xiao Zhang”, “Doctor” and “Xiao Li”, and the target word embedding vector corresponding to the text to be processed is input to the deep convolutional neural network.
  • When obtaining the target entity relationship output by the deep convolutional network with a set predicted probability value, specifically, the predicted probability values of the L candidate entity relationships output by the deep convolutional network are obtained, and the entity relationship with the highest predicted probability value is determined as the target entity relationship.
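The selection among the L predicted probability values described above amounts to an argmax. The relation names and probability values below are illustrative assumptions, not from the application.

```python
import numpy as np

# The network outputs one predicted probability per candidate entity relationship
# (L relations in total); the relation with the highest probability is the target.
relations = ["doctor_of", "colleague_of", "unrelated"]  # assumed L=3 relations
probs = np.array([0.82, 0.11, 0.07])                     # assumed softmax output

target = relations[int(np.argmax(probs))]
print(target)  # doctor_of
```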
  • In the embodiments of the present application, the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words are obtained, and the word embedding vector corresponding to the text to be processed is generated.
  • Through the deep convolutional neural network, entity relationship information is extracted from the text to be processed based on the word embedding vector, and the target entity relationship with the set predicted probability value output by the deep convolutional network is obtained.
  • The deep convolutional network model is thus used to solve the relation extraction task, and the syntactic dependency tree is used to analyze the syntactic features of the text information and construct the text features.
  • The semantic relationships within the text are fully considered; better results can be obtained without artificially constructing features, improving the convenience and accuracy of extracting the final entity relationship.
  • the embodiments of the present application also provide different implementations of the data relationship extraction method.
  • FIG. 2 is a second flowchart of a data relationship extraction method provided by an embodiment of the present application.
  • a data relationship extraction method includes the following steps:
  • Step 201 Obtain a text sequence obtained by word segmentation from the text to be processed.
  • Step 202 based on the syntactic dependency tree, obtain the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words.
  • The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and will not be repeated here.
  • Step 203: Generate a first word embedding vector according to each word, generate a second word embedding vector corresponding to the first word embedding vector according to the related dependent words of each word, and generate a third word embedding vector according to the semantic relationship between each word and the related dependent words.
  • Optionally, generating the first word embedding vector according to each word, generating the second word embedding vector corresponding to the first word embedding vector according to the related dependent words of each word, and generating the third word embedding vector according to the semantic relationship between each word and the related dependent words includes:
  • This process generates the sequences corresponding to the related dependent words of each word and to the semantic relationship between each word and the related dependent words, namely the related word sequence and the semantic relation sequence. The text length is then standardized for the text sequence containing each word, the related word sequence containing the related dependent words, and the semantic relation sequence containing the semantic relationships, and the corresponding word embedding vectors are generated based on the standardized sequences.
  • The standardization of text length sets a standard text length N. Sequences longer than N are truncated, keeping only the first N words; sequences shorter than N are zero-filled. Three sequences of length N are thus obtained: the processed text sequence, related word sequence, and semantic relation sequence.
  • a word embedding vector corresponds to a word in the sequence, for example, corresponds to a word in a text sequence, or a related dependent word in a related word sequence, or a semantic relationship description word in a semantic relationship sequence.
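The truncate-or-zero-fill standardization described above can be sketched as follows. The helper name, the choice N=8, and the zero padding value are illustrative.

```python
def standardize(seq, n, pad=0):
    """Truncate to the first n elements, or zero-fill up to length n."""
    return list(seq[:n]) + [pad] * max(0, n - len(seq))

N = 8  # assumed standard text length

# A short sequence (length 5) is padded with zeros up to N
print(standardize(["小张", "的", "医生", "是", "小李"], N))
# ['小张', '的', '医生', '是', '小李', 0, 0, 0]

# A long sequence (length 12) keeps only its first N elements
print(standardize(list(range(12)), N))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The same helper is applied to the text sequence, the related word sequence, and the semantic relation sequence so all three end up with exactly N elements.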
  • the foregoing steps generate a related word sequence containing the related dependent words of each word and a semantic relationship sequence containing the semantic relationship between each word and the related dependent words, which specifically include:
  • According to each word, the related dependent words of each word, and the semantic relationship between each word and the related dependent words, the semantic triple of each word in the text sequence is obtained; the semantic triples of all words are then integrated to obtain the related word sequence containing the related dependent words of each word and the semantic relation sequence containing the semantic relationship between each word and the related dependent words.
  • the semantic triple is a combination of elements including a word in a text sequence, a related dependent word of the word, and a semantic relationship between the word and the related dependent word.
  • Each word in the text sequence corresponds to a semantic triple.
  • Specifically, the related dependent words in the triples of all words can be integrated to obtain the related word sequence, and the semantic relationships between the words and the related dependent words in the triples can be integrated to obtain the semantic relation sequence.
  • For example, for the text sequence [小张 的 医生 是 小李] (Xiao Zhang's doctor is Xiao Li), combining the syntactic dependency tree yields 5 triples in total, one per word: (小张, 医生, 1), (的, 医生, 2), (医生, 是, 3), (是, 是, 4), (小李, 是, 5).
  • The corresponding related word sequence is thus [医生 医生 是 是 是], and the corresponding semantic relation sequence is [1 2 3 4 5], where each number represents a semantic relationship.
  • the determination of the triples needs to be implemented specifically based on the syntactic dependency relationship specified in the syntactic dependency tree.
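Splitting the per-word semantic triples of the example above into the related word sequence and the semantic relation sequence is a simple unzip:

```python
# Each triple is (word, related dependent word, semantic relation id),
# following the 5 triples of the running example.
triples = [
    ("小张", "医生", 1),
    ("的",   "医生", 2),
    ("医生", "是",   3),
    ("是",   "是",   4),
    ("小李", "是",   5),
]

# zip(*...) transposes the list of triples into three parallel sequences
words, related, relations = map(list, zip(*triples))
print(related)    # ['医生', '医生', '是', '是', '是']  — related word sequence
print(relations)  # [1, 2, 3, 4, 5]                    — semantic relation sequence
```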
  • Step 204 Combine the first word embedding vector, the second word embedding vector, and the third word embedding vector to obtain a target word embedding vector corresponding to the text to be processed.
  • The first word embedding vector, the second word embedding vector, and the third word embedding vector are merged based on the positional correspondence of elements in the processed text sequence, related word sequence, and semantic relation sequence. For example, given the text sequence [小张 的 医生 是 小李], the related word sequence [医生 医生 是 是 是], and the semantic relation sequence [1 2 3 4 5],
  • the first word embedding vector of the first element "小张" in the text sequence, the second word embedding vector of the first element "医生" in the related word sequence, and the third word embedding vector of the first element "1" in the semantic relation sequence are combined to realize the merging process.
  • combining the first word embedding vector, the second word embedding vector, and the third word embedding vector to obtain the target word embedding vector corresponding to the text to be processed includes:
  • where N is the number of elements contained in the length-standardized text sequence, related word sequence, and semantic relation sequence,
  • and M is the vector dimension of the first word embedding vector, the second word embedding vector, and the third word embedding vector.
  • the number of elements contained in the standardized text sequence, related word sequence, and semantic relationship sequence is the same, and they are all N.
  • the vector dimensions of the first word embedding vector, the second word embedding vector, and the third word embedding vector are also the same, all of which are M, and the same vector dimension is used to realize the numerical expression of the word embedding of different words.
  • After merging, the vector dimension of the word embedding for each element increases to 3M. This process realizes the reasonable generation of the target word embedding vector corresponding to the text to be processed.
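The merge of three N×M embedding matrices into one N×3M target word embedding matrix can be sketched as an element-wise concatenation. N=8, M=4, and the random matrices are illustrative assumptions.

```python
import numpy as np

N, M = 8, 4  # assumed standardized length and per-vector embedding dimension
rng = np.random.default_rng(0)
e_word = rng.normal(size=(N, M))  # first word embedding vectors (text sequence)
e_dep = rng.normal(size=(N, M))   # second vectors (related word sequence)
e_rel = rng.normal(size=(N, M))   # third vectors (semantic relation sequence)

# Concatenating along the feature axis pairs up the i-th element of all three
# sequences, so each element's dimension grows from M to 3M.
target = np.concatenate([e_word, e_dep, e_rel], axis=1)
print(target.shape)  # (8, 12) — i.e. N × 3M
```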
  • Step 205: Input the target word embedding vector into the deep convolutional neural network, extract entity relationship information from the text to be processed based on the target word embedding vector through the deep convolutional neural network, and obtain the target entity relationship with the set predicted probability value output by the deep convolutional network.
  • The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and will not be repeated here.
  • In the embodiments of the present application, the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words are obtained; the corresponding word embedding vectors are generated according to each word, the related dependent words of each word, and the semantic relationship between each word and the related dependent words; and the word embedding vectors are merged to obtain the target word embedding vector corresponding to the text to be processed.
  • Through the deep convolutional neural network, based on the target word embedding vector, entity relationship information is extracted from the text to be processed, and the target entity relationship with the set predicted probability value output by the deep convolutional network is obtained; the relation extraction task is solved through the deep convolutional network model.
  • the corresponding target entity relationship is obtained based on the text to be processed.
  • The target entity relationship is obtained by entity relationship information extraction with the deep convolutional neural network, for example in combination with the syntactic dependency tree.
  • Uploading the target entity relationship to the blockchain can ensure its security and fairness and transparency to users.
  • the user equipment can download the target entity relationship from the blockchain to verify whether the target entity relationship has been tampered with.
  • the blockchain referred to in this example is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 3 is a structural diagram of a data relationship extraction device provided by an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the data relation extraction device 300 includes:
  • the first obtaining module 301 is configured to obtain a text sequence obtained by word segmentation from the text to be processed;
  • the second obtaining module 302 is configured to obtain the related dependent words of each word in the text sequence and the semantic relationship between each word and the related dependent words based on the syntactic dependency tree;
  • the generating module 303 is configured to generate a target corresponding to the to-be-processed text according to each word, the related dependent words of each word, and the semantic relationship between each word and the related dependent words Word embedding vector;
  • the third acquisition module 304 is configured to input the target word embedding vector to a deep convolutional neural network, and perform entity relationship information extraction on the to-be-processed text based on the target word embedding vector through the deep convolutional neural network , Acquiring the target entity relationship output by the deep convolutional network with a set predicted probability value.
  • the generating module includes:
  • the first generation sub-module is configured to generate a first word embedding vector according to each word, generate a second word embedding vector corresponding to the first word embedding vector according to the related dependent words of each word, and generate a third word embedding vector according to the semantic relationship between each word and the related dependent words;
  • the second generation sub-module is configured to merge the first word embedding vector, the second word embedding vector, and the third word embedding vector to obtain a target word embedding vector corresponding to the text to be processed.
  • the first generation sub-module is specifically used for:
  • the second generation sub-module is specifically used for:
  • where N is the number of elements included in the length-standardized text sequence, related word sequence, and semantic relation sequence;
  • and M is the vector dimension of the first word embedding vector, the second word embedding vector, and the third word embedding vector.
  • the first generation sub-module is more specifically used for:
  • the first acquisition module is specifically used for:
  • the to-be-processed text is parsed to obtain the words constituting the to-be-processed text, and a text sequence with the words as constituent elements is generated.
  • the data relationship extraction device provided in the embodiment of the present application can implement each process of the embodiment of the above data relationship extraction method, and can achieve the same technical effect. In order to avoid repetition, it will not be repeated here.
  • Fig. 4 is a structural diagram of an electronic device provided by an embodiment of the present application.
  • The electronic device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40.
  • When the processor 40 executes the computer program 42, the steps in any of the above data relationship extraction method embodiments are implemented, such as steps 101 to 104 shown in FIG. 1 or steps 201 to 205 shown in FIG. 2.
  • Alternatively, when the processor 40 executes the computer program 42, the functions of the units in the embodiment corresponding to FIG. 3 are implemented, for example, the functions of modules 301 to 304 shown in FIG. 3; for details, refer to the related description in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the computer program 42 may be divided into one or more units, and the one or more units are stored in the memory 41 and executed by the processor 40 to complete the application.
  • the one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 42 in the electronic device 4.
  • the computer program 42 may be divided into a first acquisition module, a second acquisition module, a generation module, and a third acquisition module, and the specific functions of each unit are as described above.
  • the electronic device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the electronic device 4 may include, but is not limited to, a processor 40 and a memory 41.
  • FIG. 4 is only an example of the electronic device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components.
  • the electronic device may also include input and output devices, network access devices, buses, and so on.
  • the processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 41 may be an internal storage unit of the electronic device 4, such as a hard disk or a memory of the electronic device 4.
  • the memory 41 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 4, and so on.
  • the memory 41 may also include both an internal storage unit of the electronic device 4 and an external storage device.
  • the memory 41 is used to store the computer program and other programs and data required by the electronic device.
  • the memory 41 can also be used to temporarily store data that has been output or will be output.
  • the disclosed device/electronic device and method may be implemented in other ways.
  • the device/electronic device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • this application can implement all or part of the processes in the methods of the above-mentioned embodiments by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, an executable file, or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , Random Access Memory (RAM, Random Access Memory), electrical carrier signal, telecommunications signal, and software distribution media, etc.
  • the content contained in the computer-readable medium can be appropriately added to or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
  • this application can also implement all or part of the processes in the methods of the above-mentioned embodiments through a computer program product; when the computer program product runs on an electronic device, the electronic device, when executing it, implements the steps in each of the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the field of artificial intelligence and provides a data relation extraction method, apparatus, electronic device, and storage medium. The method includes: obtaining a text sequence from the text to be processed, and, based on a syntactic dependency tree, obtaining the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word; generating a target word-embedding vector corresponding to the text to be processed according to each word, its related dependency word, and the semantic relation; and inputting the target word-embedding vector into a deep convolutional neural network, which extracts entity-relation information from the text based on the target word-embedding vector and outputs the target entity relation having the set prediction probability value. This application can be used in smart-medical scenarios to improve the convenience and accuracy of entity-relation extraction from the relevant information and to advance the construction of smart cities.

Description

Data relation extraction method, apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application No. 202010935378.6, entitled "Data relation extraction method, apparatus, electronic device and storage medium", filed with the Chinese Patent Office on September 8, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of artificial intelligence, and in particular relates to a data relation extraction method, apparatus, electronic device, and storage medium.
Background
Information extraction refers to extracting various kinds of useful information, including but not limited to entities, relations and events, from natural-language text. Relation extraction is one task within information extraction and is mainly used to extract the relations between entities.
Many domains, for example the medical domain, contain large volumes of text, including case records, medical experiment records and so on, full of useful information; an effective information-extraction method is therefore especially important.
The inventors found that, when processing domain-specific data such as medical data, relation-extraction methods usually rely on feature engineering. Traditional feature methods demand substantial effort and expertise from professionals yet deliver limited results: on the one hand, human error is easily introduced while constructing features; on the other hand, many deeper features are hard to discover directly, so the information is under-exploited and extraction quality suffers.
Technical Problem
In view of this, the embodiments of this application provide a data relation extraction method, apparatus, electronic device and storage medium, to solve the prior-art problems that traditional feature methods require substantial professional effort and expertise, easily introduce human error, and leave many deeper features hard to discover directly, resulting in poor information utilization and extraction quality.
Technical Solution
A first aspect of the embodiments of this application provides a data relation extraction method, including:
obtaining, from text to be processed, a text sequence produced by word segmentation;
obtaining, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
generating a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
inputting the target word-embedding vector into a deep convolutional neural network, extracting entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtaining the target entity relation, having the set prediction probability value, output by the deep convolutional network.
A second aspect of the embodiments of this application provides a data relation extraction apparatus, including:
a first acquisition module, configured to obtain, from text to be processed, a text sequence produced by word segmentation;
a second acquisition module, configured to obtain, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
a generation module, configured to generate a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
a third acquisition module, configured to input the target word-embedding vector into a deep convolutional neural network, extract entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtain the target entity relation, having the set prediction probability value, output by the deep convolutional network.
A third aspect of the embodiments of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the method described in the first aspect.
A fourth aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described in the first aspect.
A fifth aspect of this application provides a computer program product that, when run on an electronic device, causes the electronic device to execute the steps of the method described in the first aspect.
Beneficial Effects
As can be seen from the above, in the embodiments of this application, a segmented text sequence is obtained from the text to be processed; based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word are obtained; a word-embedding vector corresponding to the text to be processed is generated; and, through a deep convolutional neural network, entity-relation information is extracted from the text based on this word-embedding vector, obtaining the target entity relation with the set prediction probability value output by the network. The relation extraction task is thus solved with a deep convolutional network model, and the syntactic dependency tree enables analysis of the syntactic features of the text and the construction of text features that fully consider the semantic relations within the text. Better results are obtained without hand-crafted features, improving the convenience and accuracy of the final entity-relation extraction.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a first flowchart of a data relation extraction method provided by an embodiment of this application;
FIG. 2 is a second flowchart of a data relation extraction method provided by an embodiment of this application;
FIG. 3 is a structural diagram of a data relation extraction apparatus provided by an embodiment of this application;
FIG. 4 is a structural diagram of an electronic device provided by an embodiment of this application.
Embodiments of the Invention
To make the objectives, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The data relation extraction method involved in the embodiments of this application may be executed by a control terminal or an electronic device.
The data relation extraction method involved in the embodiments of this application is applied in smart-medical scenarios, thereby promoting the construction of smart cities.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is only for the purpose of describing particular embodiments and is not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
In specific implementations, the electronic devices described in the embodiments of this application include, but are not limited to, portable devices such as mobile phones, laptop computers or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that, in some embodiments, the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or a touch pad).
In the following discussion, an electronic device including a display and a touch-sensitive surface is described. However, it should be understood that the electronic device may include one or more other physical user interface devices such as a physical keyboard, a mouse and/or a joystick.
The electronic device supports various applications, for example one or more of the following: a drawing application, a presentation application, a word-processing application, a website-creation application, a disc-burning application, a spreadsheet application, a game application, a telephone application, a video-conferencing application, an e-mail application, an instant-messaging application, an exercise-support application, a photo-management application, a digital-camera application, a digital-video-camera application, a web-browsing application, a digital-music-player application and/or a digital-video-player application.
The various applications that can be executed on the electronic device may use at least one common physical user interface device such as the touch-sensitive surface. One or more functions of the touch-sensitive surface, and the corresponding information displayed on the electronic device, may be adjusted and/or changed between applications and/or within a corresponding application. In this way, the common physical architecture (e.g., the touch-sensitive surface) of the electronic device can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the numbering of the steps in this embodiment does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
To illustrate the technical solutions described in this application, specific embodiments are described below.
Referring to FIG. 1, FIG. 1 is a first flowchart of a data relation extraction method provided by an embodiment of this application. As shown in FIG. 1, the data relation extraction method includes the following steps:
Step 101: obtain, from the text to be processed, a text sequence produced by word segmentation.
The elements of this text sequence are the words obtained by segmenting the text to be processed; that is, the text sequence is the sequence of words corresponding to the text to be processed.
The text to be processed may specifically be a medical case text, a historical document text, and so on. This step converts the text to be processed into a text sequence whose units are words.
As an optional implementation, obtaining the segmented text sequence from the text to be processed includes:
annotating each character in the text to be processed to obtain annotation information indicating the word-composition category of each character; and parsing the text to be processed according to the annotation information to obtain the words that constitute it, generating a text sequence with those words as its elements.
When converting the text to be processed into a text sequence, the text must first be segmented into words. Specifically, a statistical segmentation method is used to annotate the sequence, turning the segmentation problem into a character classification problem. For example, each character may fall into one of four categories: word-begin (B), word-middle (M), word-end (E), and single-character word (S). Each character of the text to be processed is annotated with a category, yielding annotation information that indicates its word-composition category; based on this annotation information, the words contained in the text are obtained and assembled into a text sequence.
For example, for the text to be processed 【小张的医生是小李】 ("Xiao Zhang's doctor is Xiao Li"), whose length is 8 characters, segmentation predicts the per-character labels 【B E S B E S B E】; integrating these labels yields a word-level text sequence of length 5: 【小张 / 的 / 医生 / 是 / 小李】.
The category annotation of each character can be implemented using information such as preset word-composition structures and character meanings.
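The label-merging step above can be sketched as a small helper. This is an illustrative assumption of how B/M/E/S tags are integrated into words, not code from the patent itself; the function name is hypothetical.

```python
def tags_to_words(text, tags):
    """Merge per-character BMES tags into words (illustrative sketch).

    B = word-begin, M = word-middle, E = word-end, S = single-character word.
    """
    words, buf = [], ""
    for ch, tag in zip(text, tags):
        if tag == "S":        # a single character forms a word by itself
            words.append(ch)
        elif tag == "B":      # start buffering a multi-character word
            buf = ch
        elif tag == "M":      # continue the current word
            buf += ch
        else:                 # "E": close the current word
            words.append(buf + ch)
            buf = ""
    return words

# The example from the text: 8 characters, labels B E S B E S B E
print(tags_to_words("小张的医生是小李", list("BESBESBE")))
# ['小张', '的', '医生', '是', '小李']
```

Integrating the eight labels in this way yields the word-level sequence of length 5 described above.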
Step 102: based on a syntactic dependency tree, obtain the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word.
The syntactic dependency tree describes the dependency relations among the words and is constructed in advance.
A related dependency word must be obtained for every word contained in the text sequence; the related dependency word is itself a word in the text sequence.
Specifically, the related dependency word is the word that has a syntactic dependency relation with the given word. For example, in the text sequence 【小张的医生是小李】, the word that has a syntactic dependency relation with "小张" is "医生", and the word that has a syntactic dependency relation with "的" is also "医生"; the related dependency word is determined from the syntactic dependency relations specified in the syntactic dependency tree.
There is a semantic relation between each word and its related dependency word, which is likewise determined from the syntactic dependency relations specified in the tree; this semantic relation can be of different types, for example subject-passive relations, fixed collocations, and so on.
Step 103: generate a target word-embedding vector corresponding to the text to be processed according to each word, its related dependency word, and the semantic relation between them.
A word-embedding vector (word embedding) is an important concept in natural language processing (NLP): it converts a word into a fixed-length vector representation, which makes mathematical processing convenient.
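As a minimal sketch of this idea, a word can be mapped to a fixed-length vector through a lookup table. The vocabulary, the dimension M = 8, and the random initialization below are illustrative assumptions; in practice the embedding weights would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"小张": 0, "的": 1, "医生": 2, "是": 3, "小李": 4}
M = 8  # embedding dimension, illustrative

# One row per vocabulary entry; in practice these weights would be trained.
embedding_table = rng.normal(size=(len(vocab), M))

def embed(word):
    # Look up the fixed-length vector representation of a word.
    return embedding_table[vocab[word]]

vec = embed("医生")
print(vec.shape)  # (8,)
```

Every word thus becomes a vector of the same length M, regardless of how many characters it contains.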
Here, the target word-embedding vector is generated jointly from each word of the segmented text sequence, the related dependency word of each word, and the semantic relation between each word and its related dependency word, so that subsequent mathematical analysis can be performed by the deep convolutional neural network.
In the above process, the syntactic dependency tree is used to analyze the syntactic features of the text, yielding the text sequence, each word's related dependency word, and the semantic relation between each word and its related dependency word, which jointly generate the target word-embedding vector corresponding to the text to be processed. This constructs the text features while fully considering the semantic relations within the text; no hand-crafted features are needed, which avoids introducing human error, facilitates the direct discovery of deeper text features, and improves the convenience and accuracy of the final entity-relation extraction.
Step 104: input the target word-embedding vector into a deep convolutional neural network; through the deep convolutional neural network, extract entity-relation information from the text to be processed based on the target word-embedding vector, and obtain the target entity relation with the set prediction probability value output by the deep convolutional network.
This step performs the relation extraction on the text to be processed. The entity-relation information is the relation information among the entities in the text. For example, in the text 【小张的医生是小李】 the entities are "小张", "医生" and "小李"; by inputting the corresponding target word-embedding vector into the deep convolutional neural network, the relation information among these three entities is extracted.
When obtaining the target entity relation with the set prediction probability value, specifically, the prediction probability values of L entity relations output by the deep convolutional network are obtained, and the entity relation with the highest prediction probability value is determined as the target entity relation.
When extracting entity-relation information through the deep convolutional neural network based on the target word-embedding vector, features are extracted through multiple convolutional layers and corresponding pooling layers; a fully connected layer produces the hidden-layer output X; the probabilities are normalized from X via softmax; and the prediction probability values of the L entity relations are output (each between 0 and 1, where L is the number of entity-relation classes). The relation class with the highest probability in the softmax output is selected as the final prediction.
For example, if there are only three entity relations, 【doctor, patient, staff】, and the corresponding probability values are 【0.8, 0.15, 0.05】, then the relation between the entities in the text is considered to be 【doctor】.
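The final prediction step described above can be sketched as follows. The logits and relation labels are illustrative, and the softmax here stands in for the network's output layer rather than reproducing the full convolutional model.

```python
import numpy as np

def predict_relation(logits, labels):
    # Normalize the hidden-layer output into probabilities with softmax,
    # then take the relation class with the highest probability.
    shifted = logits - np.max(logits)            # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return labels[int(np.argmax(probs))], probs

# Example from the text: three relations with probabilities 0.8 / 0.15 / 0.05.
labels = ["doctor", "patient", "staff"]
relation, probs = predict_relation(np.log([0.8, 0.15, 0.05]), labels)
print(relation)  # doctor
```

Because 0.8 is the highest of the three probabilities, "doctor" is returned as the final prediction.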
This process improves the accuracy and speed of entity-relation extraction from the final text by exploiting the semantic relations within the text.
In this embodiment of the application, a segmented text sequence is obtained from the text to be processed; based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word are obtained; a word-embedding vector corresponding to the text to be processed is generated; through a deep convolutional neural network, entity-relation information is extracted from the text based on this word-embedding vector, and the target entity relation with the set prediction probability value output by the network is obtained. The relation extraction task is thus solved with a deep convolutional network model, and the syntactic dependency tree enables analysis of the syntactic features of the text and the construction of text features that fully consider the semantic relations within the text. Better results are achieved without hand-crafted features, improving the convenience and accuracy of the final entity-relation extraction.
The embodiments of this application also provide different implementations of the data relation extraction method.
Referring to FIG. 2, FIG. 2 is a second flowchart of a data relation extraction method provided by an embodiment of this application. As shown in FIG. 2, the data relation extraction method includes the following steps:
Step 201: obtain, from the text to be processed, a text sequence produced by word segmentation.
The implementation of this step is the same as that of step 101 in the foregoing embodiment and is not repeated here.
Step 202: based on a syntactic dependency tree, obtain the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word.
The implementation of this step is the same as that of step 102 in the foregoing embodiment and is not repeated here.
Step 203: generate a first word-embedding vector from each word; generate, from the related dependency word of each word, a second word-embedding vector corresponding to the first word-embedding vector; and generate a third word-embedding vector from the semantic relation between each word and its related dependency word.
When generating the target word-embedding vector corresponding to the text to be processed, word-embedding vectors corresponding to each word in the text sequence, to the related dependency word of each word, and to the semantic relation between each word and its related dependency word must first be generated separately.
As an optional implementation, generating the first word-embedding vector from each word, the second word-embedding vector from the related dependency word of each word, and the third word-embedding vector from the semantic relation between each word and its related dependency word includes:
generating a related-word sequence containing the related dependency word of each word and a semantic-relation sequence containing the semantic relation between each word and its related dependency word; performing text-length normalization on the text sequence, the related-word sequence and the semantic-relation sequence; and generating the first word-embedding vector corresponding to each word in the length-normalized text sequence, the second word-embedding vector corresponding to each related dependency word in the length-normalized related-word sequence, and the third word-embedding vector corresponding to each semantic relation in the length-normalized semantic-relation sequence.
This process first generates the sequences corresponding to the related dependency words and to the semantic relations, i.e., the related-word sequence and the semantic-relation sequence. Text-length normalization is then applied to the text sequence containing each word, the related-word sequence containing the related dependency words, and the semantic-relation sequence containing the semantic relations, so that the corresponding word-embedding vectors can be generated on the basis of the normalized sequences.
Specifically, text-length normalization sets a standard text length N: a sequence longer than N is truncated to keep only its first N elements, and a sequence shorter than N is zero-padded, yielding three sequences of length N, namely the processed text sequence, related-word sequence and semantic-relation sequence.
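The truncate-or-pad rule above can be sketched as a small helper; the function name and the use of 0 as the padding element are assumptions made for illustration.

```python
def normalize_length(seq, n, pad=0):
    # Keep only the first n elements, or right-pad with `pad` up to length n.
    return list(seq[:n]) + [pad] * max(0, n - len(seq))

print(normalize_length([5, 7, 9], 5))           # [5, 7, 9, 0, 0]
print(normalize_length([1, 2, 3, 4, 5, 6], 5))  # [1, 2, 3, 4, 5]
```

Applying the same rule to the text sequence, the related-word sequence and the semantic-relation sequence makes all three exactly N elements long.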
Each word-embedding vector corresponds to one element of a sequence: a word in the text sequence, a related dependency word in the related-word sequence, or a semantic-relation descriptor in the semantic-relation sequence.
In specific implementation, the foregoing step of generating the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and its related dependency word specifically includes:
obtaining a semantic triple of each word in the text sequence according to each word, its related dependency word and the semantic relation between them; and integrating the semantic triples of the words to obtain the related-word sequence containing the related dependency words and the semantic-relation sequence containing the semantic relations.
Specifically, a semantic triple is a combination of a word in the text sequence, the word's related dependency word, and the semantic relation between the word and its related dependency word; each word in the text sequence corresponds to one semantic triple.
When integrating the semantic triples of the words, the related dependency words in the triples can be integrated into the related-word sequence, and the semantic relations between the words and their related dependency words in the triples can be integrated into the semantic-relation sequence.
For example, for the text sequence 【小张的医生是小李】, combining the syntactic dependency tree yields five triples, one per word: (小张, 医生, 1), (的, 医生, 2), (医生, 是, 3), (是, 是, 4), (小李, 是, 5). Integrating these triples gives the related-word sequence 【医生 医生 是 是 是】 and the corresponding semantic-relation sequence 【1 2 3 4 5】, where each number denotes one semantic relation. The triples are determined based on the syntactic dependency relations specified in the syntactic dependency tree.
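A minimal sketch of this integration step, using the five triples from the example; the triple layout (word, related dependency word, relation id) follows the text, while the variable names are illustrative assumptions.

```python
# (word, related dependency word, semantic-relation id), one triple per word
triples = [
    ("小张", "医生", 1),
    ("的",   "医生", 2),
    ("医生", "是",   3),
    ("是",   "是",   4),
    ("小李", "是",   5),
]

# Integrating the triples column-wise yields the three aligned sequences.
text_sequence     = [word for word, _, _ in triples]
related_sequence  = [head for _, head, _ in triples]
relation_sequence = [rel for _, _, rel in triples]

print(related_sequence)   # ['医生', '医生', '是', '是', '是']
print(relation_sequence)  # [1, 2, 3, 4, 5]
```

The i-th elements of the three sequences stay aligned, which is what the later merging step relies on.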
Step 204: merge the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed.
The merging of the first, second and third word-embedding vectors is performed according to the element-wise correspondence among the processed text sequence, related-word sequence and semantic-relation sequence. For example, with the text sequence 【小张的医生是小李】, the related-word sequence 【医生 医生 是 是 是】 and the semantic-relation sequence 【1 2 3 4 5】, the first word-embedding vector of the first element "小张" of the text sequence, the second word-embedding vector of the first element "医生" of the related-word sequence, and the third word-embedding vector of the first element "1" of the semantic-relation sequence are stacked together to implement the merging.
As an optional implementation, merging the first, second and third word-embedding vectors to obtain the target word-embedding vector corresponding to the text to be processed includes:
merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain an N*3M target word-embedding vector corresponding to the text to be processed, where N is the number of elements contained in the length-normalized text sequence, related-word sequence and semantic-relation sequence, and M is the vector dimension of the first, second and third word-embedding vectors.
The length-normalized text sequence, related-word sequence and semantic-relation sequence all contain the same number of elements, N. The first, second and third word-embedding vectors also share the same dimension, M, so that the different words receive numerical embeddings of equal size; after merging, the dimension of the numerical embedding of each element in the target word-embedding vector grows to 3M. This process achieves a reasonable generation of the target word-embedding vector corresponding to the text to be processed.
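The merge can be sketched as a column-wise concatenation of three N x M matrices into one N x 3M matrix. N = 5 and M = 8 below are illustrative sizes taken from the running example, not values fixed by the method, and the random matrices stand in for the three sets of embeddings.

```python
import numpy as np

N, M = 5, 8  # 5 aligned sequence elements, embedding dimension 8 (illustrative)
rng = np.random.default_rng(0)

E1 = rng.normal(size=(N, M))  # first word-embedding vectors (text sequence)
E2 = rng.normal(size=(N, M))  # second word-embedding vectors (related-word sequence)
E3 = rng.normal(size=(N, M))  # third word-embedding vectors (semantic-relation sequence)

# Row i stacks the three embeddings of the i-th aligned elements side by side.
target = np.concatenate([E1, E2, E3], axis=1)
print(target.shape)  # (5, 24)
```

The resulting N x 3M matrix is the target word-embedding vector fed to the deep convolutional neural network in step 205.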
Step 205: input the target word-embedding vector into a deep convolutional neural network; through the deep convolutional neural network, extract entity-relation information from the text to be processed based on the target word-embedding vector, and obtain the target entity relation with the set prediction probability value output by the deep convolutional network.
The implementation of this step is the same as that of step 104 in the foregoing embodiment and is not repeated here.
In this embodiment of the application, a segmented text sequence is obtained from the text to be processed; based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and its related dependency word are obtained; corresponding word-embedding vectors are generated from each word, its related dependency word and the semantic relation between them, and these word-embedding vectors are merged to obtain the target word-embedding vector corresponding to the text to be processed; through a deep convolutional neural network, entity-relation information is extracted from the text based on the target word-embedding vector, and the target entity relation with the set prediction probability value output by the network is obtained. The relation extraction task is thus solved with a deep convolutional network model, and the syntactic dependency tree enables analysis of the syntactic features of the text and the construction of text features that fully consider the semantic relations within the text; better results are achieved without hand-crafted features, improving the convenience and accuracy of the final entity-relation extraction.
In addition, it should be noted that in all embodiments of this application the corresponding target entity relation is obtained from the text to be processed; specifically, the target entity relation is obtained by entity-relation information extraction with the deep convolutional neural network, for example using the syntactic dependency tree. Uploading the target entity relation to a blockchain can guarantee its security and its fairness and transparency to users. A user device can download the target entity relation from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with each other by cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain can comprise an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Referring to FIG. 3, FIG. 3 is a structural diagram of a data relation extraction apparatus provided by an embodiment of this application; for ease of description, only the parts related to the embodiments of this application are shown.
The data relation extraction apparatus 300 includes:
a first acquisition module 301, configured to obtain, from text to be processed, a text sequence produced by word segmentation;
a second acquisition module 302, configured to obtain, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
a generation module 303, configured to generate a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
a third acquisition module 304, configured to input the target word-embedding vector into a deep convolutional neural network, extract entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtain the target entity relation, having the set prediction probability value, output by the deep convolutional network.
The generation module includes:
a first generation submodule, configured to generate a first word-embedding vector from each word, generate a second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generate a third word-embedding vector from the semantic relation between each word and the related dependency word;
a second generation submodule, configured to merge the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed.
The first generation submodule is specifically configured to:
generate a related-word sequence containing the related dependency word of each word and a semantic-relation sequence containing the semantic relation between each word and the related dependency word;
perform text-length normalization on the text sequence, the related-word sequence and the semantic-relation sequence;
generate a first word-embedding vector corresponding to each word in the length-normalized text sequence, a second word-embedding vector corresponding to each related dependency word in the length-normalized related-word sequence, and a third word-embedding vector corresponding to each semantic relation in the length-normalized semantic-relation sequence.
The second generation submodule is specifically configured to:
merge the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain an N*3M target word-embedding vector corresponding to the text to be processed;
where N is the number of elements contained in the length-normalized text sequence, related-word sequence and semantic-relation sequence, and M is the vector dimension of the first, second and third word-embedding vectors.
The first generation submodule is further specifically configured to:
obtain a semantic triple of each word in the text sequence according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
integrate the semantic triples of the words to obtain the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word.
The first acquisition module is specifically configured to:
annotate each character in the text to be processed to obtain annotation information indicating the word-composition category of each character;
parse the text to be processed according to the annotation information to obtain the words constituting the text to be processed, and generate a text sequence with those words as its elements.
The data relation extraction apparatus provided by the embodiments of this application can implement each process of the above embodiments of the data relation extraction method and achieve the same technical effects; to avoid repetition, details are not repeated here.
FIG. 4 is a structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 4, the electronic device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and runnable on the at least one processor 40. When the processor 40 executes the computer program 42, the steps in any of the above data relation extraction method embodiments are implemented, such as steps 101 to 104 shown in FIG. 1 or steps 201 to 205 shown in FIG. 2. Alternatively, when the processor 40 executes the computer program 42, the functions of the units in the embodiment corresponding to FIG. 3 are implemented, for example the functions of modules 301 to 304 shown in FIG. 3; for details, refer to the related description in the embodiment corresponding to FIG. 3, which is not repeated here.
Exemplarily, the computer program 42 may be divided into one or more units, which are stored in the memory 41 and executed by the processor 40 to complete this application. The one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 42 in the electronic device 4. For example, the computer program 42 may be divided into a first acquisition module, a second acquisition module, a generation module and a third acquisition module, whose specific functions are as described above.
The electronic device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The electronic device 4 may include, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will understand that FIG. 4 is only an example of the electronic device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components; for example, the electronic device may also include input/output devices, network access devices, buses, and so on.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the electronic device 4, such as a hard disk or memory of the electronic device 4. The memory 41 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the electronic device 4. The memory 41 is used to store the computer program and the other programs and data required by the electronic device, and may also be used to temporarily store data that has been or will be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions can be assigned to different functional units and modules as needed, i.e., the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware or of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of this application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Professionals may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the apparatus/electronic device embodiments described above are only illustrative: the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. Based on this understanding, this application can implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and, when executed by a processor, it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium can be appropriately added to or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
This application can also implement all or part of the processes in the methods of the above embodiments through a computer program product; when the computer program product runs on an electronic device, the electronic device, when executing it, implements the steps in each of the above method embodiments.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A data relation extraction method, comprising:
    obtaining, from text to be processed, a text sequence produced by word segmentation;
    obtaining, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
    generating a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    inputting the target word-embedding vector into a deep convolutional neural network, extracting entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtaining a target entity relation, having a set prediction probability value, output by the deep convolutional network.
  2. The method according to claim 1, wherein generating the target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word comprises:
    generating a first word-embedding vector from each word, generating a second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generating a third word-embedding vector from the semantic relation between each word and the related dependency word;
    merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed.
  3. The method according to claim 2, wherein generating the first word-embedding vector from each word, generating the second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generating the third word-embedding vector from the semantic relation between each word and the related dependency word comprises:
    generating a related-word sequence containing the related dependency word of each word and a semantic-relation sequence containing the semantic relation between each word and the related dependency word;
    performing text-length normalization on the text sequence, the related-word sequence and the semantic-relation sequence;
    generating a first word-embedding vector corresponding to each word in the length-normalized text sequence, a second word-embedding vector corresponding to each related dependency word in the length-normalized related-word sequence, and a third word-embedding vector corresponding to each semantic relation in the length-normalized semantic-relation sequence.
  4. The method according to claim 3, wherein merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed comprises:
    merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain an N*3M target word-embedding vector corresponding to the text to be processed;
    wherein N is the number of elements contained in the length-normalized text sequence, related-word sequence and semantic-relation sequence, and M is the vector dimension of the first word-embedding vector, the second word-embedding vector and the third word-embedding vector.
  5. The method according to claim 3, wherein generating the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word comprises:
    obtaining a semantic triple of each word in the text sequence according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    integrating the semantic triples of the words to obtain the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word.
  6. The method according to claim 1, wherein obtaining, from the text to be processed, the text sequence produced by word segmentation comprises:
    annotating each character in the text to be processed to obtain annotation information indicating the word-composition category of each character;
    parsing the text to be processed according to the annotation information to obtain the words constituting the text to be processed, and generating a text sequence with the words as its elements.
  7. The method according to claim 1, wherein, after extracting the entity-relation information from the text to be processed based on the target word-embedding vector and obtaining the target entity relation with the set prediction probability value output by the deep convolutional network, the method further comprises:
    uploading the target entity relation to a blockchain.
  8. A data relation extraction apparatus, comprising:
    a first acquisition module, configured to obtain, from text to be processed, a text sequence produced by word segmentation;
    a second acquisition module, configured to obtain, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
    a generation module, configured to generate a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    a third acquisition module, configured to input the target word-embedding vector into a deep convolutional neural network, extract entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtain a target entity relation, having a set prediction probability value, output by the deep convolutional network.
  9. The data relation extraction apparatus according to claim 8, wherein the generation module comprises:
    a first generation submodule, configured to generate a first word-embedding vector from each word, generate a second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generate a third word-embedding vector from the semantic relation between each word and the related dependency word;
    a second generation submodule, configured to merge the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed.
  10. The data relation extraction apparatus according to claim 9, wherein the first generation submodule is specifically configured to:
    generate a related-word sequence containing the related dependency word of each word and a semantic-relation sequence containing the semantic relation between each word and the related dependency word;
    perform text-length normalization on the text sequence, the related-word sequence and the semantic-relation sequence;
    generate a first word-embedding vector corresponding to each word in the length-normalized text sequence, a second word-embedding vector corresponding to each related dependency word in the length-normalized related-word sequence, and a third word-embedding vector corresponding to each semantic relation in the length-normalized semantic-relation sequence.
  11. The data relation extraction apparatus according to claim 10, wherein the second generation submodule is specifically configured to:
    merge the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain an N*3M target word-embedding vector corresponding to the text to be processed;
    wherein N is the number of elements contained in the length-normalized text sequence, related-word sequence and semantic-relation sequence, and M is the vector dimension of the first word-embedding vector, the second word-embedding vector and the third word-embedding vector.
  12. The data relation extraction apparatus according to claim 10, wherein the first generation submodule is further specifically configured to:
    obtain a semantic triple of each word in the text sequence according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    integrate the semantic triples of the words to obtain the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word.
  13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    obtaining, from text to be processed, a text sequence produced by word segmentation;
    obtaining, based on a syntactic dependency tree, the related dependency word of each word in the text sequence and the semantic relation between each word and the related dependency word;
    generating a target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    inputting the target word-embedding vector into a deep convolutional neural network, extracting entity-relation information from the text to be processed through the deep convolutional neural network based on the target word-embedding vector, and obtaining a target entity relation, having a set prediction probability value, output by the deep convolutional network.
  14. The electronic device according to claim 13, wherein, when generating the target word-embedding vector corresponding to the text to be processed according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word, the processor, when executing the computer program, specifically implements the following steps:
    generating a first word-embedding vector from each word, generating a second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generating a third word-embedding vector from the semantic relation between each word and the related dependency word;
    merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed.
  15. The electronic device according to claim 14, wherein, when generating the first word-embedding vector from each word, generating the second word-embedding vector corresponding to the first word-embedding vector from the related dependency word of each word, and generating the third word-embedding vector from the semantic relation between each word and the related dependency word, the processor, when executing the computer program, specifically implements the following steps:
    generating a related-word sequence containing the related dependency word of each word and a semantic-relation sequence containing the semantic relation between each word and the related dependency word;
    performing text-length normalization on the text sequence, the related-word sequence and the semantic-relation sequence;
    generating a first word-embedding vector corresponding to each word in the length-normalized text sequence, a second word-embedding vector corresponding to each related dependency word in the length-normalized related-word sequence, and a third word-embedding vector corresponding to each semantic relation in the length-normalized semantic-relation sequence.
  16. The electronic device according to claim 15, wherein, when merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain the target word-embedding vector corresponding to the text to be processed, the processor, when executing the computer program, specifically implements the following steps:
    merging the first word-embedding vector, the second word-embedding vector and the third word-embedding vector to obtain an N*3M target word-embedding vector corresponding to the text to be processed;
    wherein N is the number of elements contained in the length-normalized text sequence, related-word sequence and semantic-relation sequence, and M is the vector dimension of the first word-embedding vector, the second word-embedding vector and the third word-embedding vector.
  17. The electronic device according to claim 15, wherein, when generating the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word, the processor, when executing the computer program, specifically implements the following steps:
    obtaining a semantic triple of each word in the text sequence according to each word, the related dependency word of each word, and the semantic relation between each word and the related dependency word;
    integrating the semantic triples of the words to obtain the related-word sequence containing the related dependency word of each word and the semantic-relation sequence containing the semantic relation between each word and the related dependency word.
  18. The electronic device according to claim 13, wherein, when obtaining, from the text to be processed, the text sequence produced by word segmentation, the processor, when executing the computer program, specifically implements the following steps:
    annotating each character in the text to be processed to obtain annotation information indicating the word-composition category of each character;
    parsing the text to be processed according to the annotation information to obtain the words constituting the text to be processed, and generating a text sequence with the words as its elements.
  19. The electronic device according to claim 13, wherein the processor, when executing the computer program, further implements the following step:
    uploading the target entity relation to a blockchain.
  20. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
PCT/CN2020/125342 2020-09-08 2020-10-30 Data relation extraction method, apparatus, electronic device and storage medium WO2021159762A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010935378.6A CN112016312B (zh) 2020-09-08 2020-09-08 Data relation extraction method, apparatus, electronic device and storage medium
CN202010935378.6 2020-09-08

Publications (1)

Publication Number Publication Date
WO2021159762A1 true WO2021159762A1 (zh) 2021-08-19

Family

ID=73516140

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125342 WO2021159762A1 (zh) 2020-09-08 2020-10-30 Data relation extraction method, apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112016312B (zh)
WO (1) WO2021159762A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303996A (zh) * 2023-05-25 2023-06-23 Jiangxi University of Finance and Economics Topic event extraction method based on a multi-focus graph neural network
CN116402019A (zh) * 2023-04-21 2023-07-07 Huazhong Agricultural University Joint entity-relation extraction method and apparatus based on multi-feature fusion
WO2024021334A1 (zh) * 2022-07-29 2024-02-01 Suzhou Sicui Artificial Intelligence Research Institute Co., Ltd. Relation extraction method, computer device and program product

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613032B (zh) * 2020-12-15 2024-03-26 Institute of Information Engineering, Chinese Academy of Sciences Host intrusion detection method and apparatus based on system-call sequences
CN113297373A (zh) * 2021-06-09 2021-08-24 Beijing University of Posts and Telecommunications Smart-city topic information extraction method, apparatus, electronic device and storage medium
CN113609846B (zh) * 2021-08-06 2022-10-04 Capital Normal University Method and apparatus for extracting entity relations in sentences
CN113792539B (zh) * 2021-09-15 2024-02-20 Ping An Technology (Shenzhen) Co., Ltd. Artificial-intelligence-based entity relation classification method, apparatus, electronic device and medium
CN115146068B (zh) * 2022-06-01 2023-10-03 Northwestern Polytechnical University Relation triple extraction method, apparatus, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015125209A1 (ja) * 2014-02-18 2015-08-27 Hitachi, Ltd. Information structuring system and information structuring method
CN110196913A (zh) * 2019-05-23 2019-09-03 Beijing University of Posts and Telecommunications Text-generation-based joint multi-entity-relation extraction method and apparatus
CN110705299A (zh) * 2019-09-26 2020-01-17 Beijing Mininglamp Software System Co., Ltd. Joint entity and relation extraction method, model, electronic device and storage medium
CN110874535A (zh) * 2018-08-28 2020-03-10 Alibaba Group Holding Limited Dependency alignment component, dependency alignment training method, device and medium
CN111241295A (zh) * 2020-01-03 2020-06-05 Zhejiang University Knowledge-graph relation data extraction method based on a semantic-syntactic interaction network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540438B2 (en) * 2017-12-22 2020-01-21 International Business Machines Corporation Cognitive framework to detect adverse events in free-form text
CN109165385B (zh) * 2018-08-29 2022-08-09 National University of Defense Technology Multi-triple extraction method based on a joint entity-relation extraction model
EP3660733B1 (en) * 2018-11-30 2023-06-28 Tata Consultancy Services Limited Method and system for information extraction from document images using conversational interface and database querying
CN111241294B (zh) * 2019-12-31 2023-05-26 China University of Geosciences (Wuhan) Relation extraction method using a graph convolutional network based on dependency parsing and keywords

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015125209A1 (ja) * 2014-02-18 2015-08-27 Hitachi, Ltd. Information structuring system and information structuring method
CN110874535A (zh) * 2018-08-28 2020-03-10 Alibaba Group Holding Limited Dependency alignment component, dependency alignment training method, device and medium
CN110196913A (zh) * 2019-05-23 2019-09-03 Beijing University of Posts and Telecommunications Text-generation-based joint multi-entity-relation extraction method and apparatus
CN110705299A (zh) * 2019-09-26 2020-01-17 Beijing Mininglamp Software System Co., Ltd. Joint entity and relation extraction method, model, electronic device and storage medium
CN111241295A (zh) * 2020-01-03 2020-06-05 Zhejiang University Knowledge-graph relation data extraction method based on a semantic-syntactic interaction network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024021334A1 (zh) * 2022-07-29 2024-02-01 Suzhou Sicui Artificial Intelligence Research Institute Co., Ltd. Relation extraction method, computer device and program product
CN116402019A (zh) * 2023-04-21 2023-07-07 Huazhong Agricultural University Joint entity-relation extraction method and apparatus based on multi-feature fusion
CN116402019B (zh) * 2023-04-21 2024-02-02 Huazhong Agricultural University Joint entity-relation extraction method and apparatus based on multi-feature fusion
CN116303996A (zh) * 2023-05-25 2023-06-23 Jiangxi University of Finance and Economics Topic event extraction method based on a multi-focus graph neural network
CN116303996B (zh) * 2023-05-25 2023-08-04 Jiangxi University of Finance and Economics Topic event extraction method based on a multi-focus graph neural network

Also Published As

Publication number Publication date
CN112016312A (zh) 2020-12-01
CN112016312B (zh) 2023-08-29

Similar Documents

Publication Publication Date Title
WO2021159762A1 (zh) Data relation extraction method, apparatus, electronic device and storage medium
WO2020232882A1 (zh) Named-entity recognition method, apparatus, device and computer-readable storage medium
WO2021121187A1 (zh) Electronic medical record duplicate-checking method, apparatus and computer device based on segmented text
WO2021212683A1 (zh) Query method, apparatus, electronic device and medium based on a legal knowledge graph
CN107784063B (zh) Algorithm generation method and terminal device
WO2022083093A1 (zh) Probability computation method, apparatus, computer device and storage medium for graphs
US11120215B2 (en) Identifying spans using visual recognition
US20230409744A1 (en) Privacy protection for regulated computing environments
US20120151382A1 (en) Generating and managing electronic documentation
CN114064923A (zh) Data processing method, apparatus, electronic device and storage medium
CN113626576A (zh) Relation feature extraction method, apparatus, terminal and storage medium in distant supervision
CN113192639A (zh) Training method, apparatus, device and storage medium for an information prediction model
CN116705304A (zh) Multimodal task processing method, apparatus, device and medium based on image text
CN116861875A (zh) Artificial-intelligence-based text processing method, apparatus, device and storage medium
CN116719904A (zh) Information query method, apparatus, device and storage medium based on combined image and text
CN116702776A (zh) Multi-task semantic partitioning method, apparatus, device and medium across Chinese and Western medicine
CN111046085A (zh) Data provenance processing method and apparatus, medium and device
CN116050359A (zh) Policy custody entry method, system, terminal device and storage medium
CN111063447B (zh) Query and text processing method and apparatus, electronic device and storage medium
CN112528647A (zh) Similar-text generation method, apparatus, electronic device and readable storage medium
CN112667721A (zh) Data analysis method, apparatus, device and storage medium
CN111933241A (zh) Medical data parsing method, apparatus, electronic device and storage medium
CN112328960B (zh) Data operation optimization method, apparatus, electronic device and storage medium
CN112214556B (zh) Label generation method, apparatus, electronic device and computer-readable storage medium
CN116364223B (zh) Feature processing method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918442

Country of ref document: EP

Kind code of ref document: A1