CN115129893A - Entity and/or relationship linking method based on prompt learning - Google Patents

Info

Publication number
CN115129893A
Authority
CN
China
Prior art keywords
entity
sparql
relation
learning
recall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210787603.5A
Other languages
Chinese (zh)
Inventor
张卫山
孙晨瑜
侯召祥
王振琦
陈涛
陈炳阳
李晓哲
公凡奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority claimed from CN202210787603.5A
Publication of CN115129893A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G06F 16/353 - Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity and/or relationship linking method based on prompt learning, which introduces prompt learning on top of an entity disambiguation method and comprises the following steps: preprocess the SPARQL generated by the question-answering system, split it into clauses, and judge the type of each SPARQL clause to decide which subsequent steps to perform; realize entity recall through a BM25 character-level short-text matching algorithm, design an entity and/or relationship linking method based on prompt learning, and link the entity mentioned in the SPARQL to a recalled entity; according to the obtained correct entity, query the knowledge base for the relations of that entity as recalled relations, and link the relation mentioned in the SPARQL to a recalled relation with the same method; finally, revise the SPARQL according to the linking results. By introducing prompt learning on top of the traditional entity disambiguation method, a single entity disambiguation model can be used for both entity linking and relation linking, and the generated SPARQL query is revised, thereby improving the accuracy of the question-answering system.

Description

Entity and/or relationship linking method based on prompt learning
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to an entity and/or relationship linking method based on prompt learning.
Background
In a knowledge-graph question-answering system, for one entity or relation in a question, there may be multiple corresponding entities or relations in the knowledge base. Because of such entity ambiguity, the ambiguity of the SPARQL queries generated by the question-answering system becomes increasingly pronounced. At present, an effective method for revising the SPARQL output by a question-answering model is lacking.
The entity linking method based on word vector correlation: this method trains word vectors for the words in a corpus with a continuous bag-of-words (CBOW) model and then takes words close to the query word as expansion words. Word vectors mine semantic relatedness between words from the corpus and thus complement rule-based query expansion methods for recalling candidate entities. When document similarity is computed, words related through the word vectors serve as vector dimensions, yielding semantic similarity features of the documents. Finally, a learning-to-rank model based on the pointwise (single-document) method links the query word to the corresponding candidate entity. However, this method generalizes poorly: training data of different styles may cause large fluctuations in accuracy, and because entities are represented by offline-trained word vectors, the model cannot be used for relation disambiguation.
The rule-based entity linking method: for each entity requiring entity linking, a candidate entity set is generated from a given knowledge base; the relatedness between the entity and each candidate entity in its candidate set is computed from the exact attributes, fuzzy attributes, and related entities of each candidate; and the linked entity of the current entity is obtained from its relatedness to each candidate entity and the number of candidate entities. However, this method relies on rules designed for specific data and therefore generalizes poorly.
Disclosure of Invention
Aiming at the above problems, the invention designs a general disambiguation model and revises the SPARQL in combination with rules. The invention provides an entity and/or relationship linking method based on prompt learning, which introduces prompt learning on the basis of the existing entity disambiguation method and comprises the following steps:
S1, preprocess the SPARQL generated by the question-answering system, split it into clauses, and judge the type of each SPARQL clause: if a clause is of the <subject> ?x <object> type, only steps S2 and S4 need to be performed; if it is of the <subject> <predicate> ?x type, steps S2, S3 and S4 need to be performed;
S2, realize entity recall through a BM25 character-level short-text matching algorithm, design an entity and/or relationship linking method based on prompt learning, and link the entity mentioned in the SPARQL to a recalled entity;
S3, according to the correct entity obtained in S2, query the knowledge base for the relations of that entity as recalled relations, and link the relation mentioned in the SPARQL to a recalled relation using the entity and/or relationship linking method of S2;
S4, revise the SPARQL according to the entity and/or relation linking results.
In one possible design, in step S1, the long SPARQL is divided into short SPARQL clauses by a regular expression, and the clause type is determined, specifically:
S11, split the generated SPARQL with a regular expression at the clause separator "." to obtain several SPARQL sub-clauses;
S12, locate the variable "?x" in each clause with a regular expression: if "?x" is at the end of the clause, the clause is judged to be of the <subject> <predicate> ?x type; otherwise it is of the <subject> ?x <object> type.
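As an illustrative sketch of S11 and S12 (function names, clause-type labels, and the assumption that "." separates triple patterns are our own, not from the patent), the clause splitting and type judgment can be done with Python regular expressions:

```python
import re

def split_clauses(sparql):
    """S11: extract the triple patterns inside the braces and split them
    into sub-clauses at the '.' separator (assumed clause delimiter)."""
    body = re.search(r"\{(.*)\}", sparql, re.S)
    text = body.group(1) if body else sparql
    return [c.strip() for c in text.split(".") if c.strip()]

def clause_type(clause):
    """S12: if the variable '?x' ends the clause, it is a
    <subject> <predicate> ?x pattern (steps S2, S3, S4 apply);
    otherwise it is <subject> ?x <object> (steps S2 and S4 only)."""
    return "subject-predicate-?x" if re.search(r"\?x\s*$", clause) else "subject-?x-object"
```

A clause whose variable sits in the predicate position therefore skips relation recall, since there is no linked relation to look up.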
In a possible design, in step S2, the entity disambiguation task is treated as a binary classification task, and the entity and/or relationship linking method based on prompt learning is designed as follows:
S21, extract the entity e1 from the SPARQL, compute the similarity between e1 and each node in the knowledge base with the BM25 character-level short-text matching algorithm, sort by similarity, and select the n entities with the highest similarity as the candidate entity set, denoted F = {f1, f2, …, fn}, realizing entity recall;
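S21's character-level recall can be sketched in pure Python. The formulation below is the standard Okapi BM25 scoring function applied to character tokens; the parameter values k1 and b and all function names are conventional choices assumed for illustration, not specified by the patent:

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every corpus entry against the query, tokenizing at the
    character level as the patent's short-text matcher does."""
    docs = [list(d) for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    df = Counter(ch for d in docs for ch in set(d))  # document frequency per character
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for ch in set(query):
            if ch not in tf:
                continue
            idf = math.log((n - df[ch] + 0.5) / (df[ch] + 0.5) + 1)
            s += idf * tf[ch] * (k1 + 1) / (tf[ch] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def recall_entities(mention, kb_nodes, top_n=5):
    """Entity recall: rank knowledge base nodes by BM25 similarity to the
    mention and keep the top n as the candidate set F."""
    ranked = sorted(zip(kb_nodes, bm25_scores(mention, kb_nodes)),
                    key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

Character-level tokenization suits short entity labels, where word segmentation is unreliable and partial character overlap is the useful signal.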
S22, select the best-fitting knowledge base entity with the prompt-learning-based entity disambiguation model to realize entity disambiguation: concatenate the candidate entity fi obtained in step S21, the input question Q, and the entity e1 extracted in step S21, using [SEP] as the separator, and finally append the prompt parameters [P1]-[Pn] and the [MASK] token; the result is used as the model input, denoted T, in the following form:
T = [Q][SEP][fi][SEP][e1][SEP][P1]…[Pn][MASK]
The entity disambiguation task is treated as a binary classification task, i.e. judging whether e1 and fi refer to the same entity. The answer labels for prompt learning can be expressed as the two categories "yes" and "no", each containing a different answer subspace: the category "yes" contains several words expressing relatedness, and the category "no" contains several words expressing unrelatedness. With T as input, RoBERTa serves as the base model for entity disambiguation and is responsible for semantic information extraction; the token at the [MASK] position is generated through the masked language model (MLM) task, and the generated token is finally mapped to an answer to obtain the binary classification result;
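A minimal sketch of assembling the input T and mapping the word generated at the [MASK] position back to the two answer categories. The concrete prompt-token spellings and answer words here are placeholders; in the real system the prompt parameters are learned embeddings and the generation comes from a RoBERTa MLM head:

```python
def build_prompt(question, candidate, mention, n_prompt=3):
    """Assemble the model input T = [Q][SEP][f_i][SEP][e_1][SEP][P_1]...[P_n][MASK]."""
    prompts = "".join("[P%d]" % i for i in range(1, n_prompt + 1))
    return "%s[SEP]%s[SEP]%s[SEP]%s[MASK]" % (question, candidate, mention, prompts)

# Answer subspaces: several surface words map onto each of the two classes.
ANSWER_MAP = {"same": "yes", "consistent": "yes",
              "unrelated": "no", "different": "no"}

def map_answer(word):
    """Map the word generated at the [MASK] position to a class label."""
    return ANSWER_MAP.get(word, "no")
```

Because the answer mapping collapses many surface words into two classes, the same model and template serve both entity pairs and relation pairs without task-specific heads.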
S23, compute probability values from the binary classification results of step S22 with the Softmax function; sort all entities labeled "yes" by probability, and select the entity linking result with the highest probability to replace the erroneous <subject> entity.
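S23's Softmax scoring and ranking can be sketched as follows; the logit values in the usage are illustrative, as a real system would take them from the disambiguation model's two-way classification output:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_entity(candidates):
    """candidates: list of (entity_name, [logit_no, logit_yes]) pairs.
    Keep the entities classified "yes" (p_yes > p_no) and return the one
    with the highest "yes" probability, or None if none qualify."""
    scored = []
    for name, logits in candidates:
        p_no, p_yes = softmax(logits)
        if p_yes > p_no:
            scored.append((p_yes, name))
    return max(scored)[1] if scored else None
```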
In one possible design, in step S3, all relations of the correct entity from S23 are obtained as the candidate set of recalled relations G = {g1, g2, …, gn}, and relation disambiguation is performed with the entity disambiguation model of step S22 to obtain the correct relation attribute and replace the erroneous <predicate> relation.
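A toy sketch of the relation recall in S3 and the substitution in S4; the knowledge base contents and helper names are invented for illustration only:

```python
# Hypothetical toy knowledge base: entity -> relation attributes.
KB = {
    "apple_(Apple Products Inc.)": ["chief executive officer",
                                    "chief financial officer",
                                    "chief information officer"],
}

def recall_relations(entity):
    """All relations attached to the linked entity form the candidate set G."""
    return list(KB.get(entity, []))

def revise_clause(clause, old, new):
    """S4: replace an erroneous <subject> or <predicate> term in the
    clause with the linked knowledge base term."""
    return clause.replace("<%s>" % old, "<%s>" % new)
```

Recalling relations only from the already-linked entity keeps G small, so the binary disambiguation model only scores a handful of plausible relation candidates.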
The second aspect of the present invention also provides a SPARQL revision device for use in a knowledge-graph question answering system, the device comprising at least one processor and at least one memory, the processor and memory being coupled; the memory has stored therein a computer program; the processor, when executing the computer program stored by the memory, causes the apparatus to perform the method according to the first aspect.
The third aspect of the present invention also provides a computer-readable storage medium having stored therein a program or instructions which, when executed by a processor, cause a computer to perform the method according to the first aspect.
Beneficial effects: the invention provides an entity and/or relationship linking method based on prompt learning. Introducing prompt learning on the basis of the traditional entity disambiguation method allows the model to achieve good task performance with few samples and simplifies the disambiguation task into a binary classification task, so that a single entity disambiguation model can be used for both entity linking and relation linking. The generated SPARQL query is revised, which effectively resolves the SPARQL query ambiguity problem and improves the accuracy of the question-answering system.
Drawings
FIG. 1 is a flow chart of a hint learning based entity and/or relationship linking method of the present invention.
FIG. 2 is a schematic diagram of an embodiment of an entity and/or relationship linking method based on prompt learning according to the present invention.
FIG. 3 is a schematic diagram of the entity disambiguation model architecture based on the prompt learning according to the present invention.
Fig. 4 is a simple block diagram of the SPARQL revision device applied to the knowledge-graph question-answering system according to the present invention.
Detailed Description
The knowledge-graph question-answering system generates a corresponding SPARQL query statement from the natural language question input by the user and runs it against a knowledge base to obtain the answer. For one entity or relation in a question, there may be multiple corresponding entities or relations in the knowledge base. Taking the question "When was the apple put on the market?" as an example, the question involves the entity "apple", but "apple" has multiple interpretations in the knowledge base: it may refer to Apple Inc. or to the apple (a plant of the genus Malus in the family Rosaceae). Because of such entity ambiguity, the ambiguity of the SPARQL queries generated by the question-answering system becomes increasingly pronounced. Therefore, the entities involved in the SPARQL must be linked one by one to knowledge base entities, and its relations to knowledge base relations. Entity linking is divided into three steps: candidate entity recall, entity disambiguation, and entity ranking. Based on the short-text matching algorithm BM25, an entity recall model is designed to obtain candidate entities with a high degree of match in the knowledge base. An entity disambiguation model based on prompt learning is designed to obtain the relatedness score of each candidate entity. Through entity ranking, the entity with the highest confidence is selected for knowledge base entity linking; all relation attributes of that entity are then queried, and the entity disambiguation model is used again to select the relation with the highest confidence for knowledge base relation linking. Finally, the SPARQL query is revised according to the entity and relation linking results, effectively improving the accuracy of the question-answering system.
The invention is further illustrated by the following specific examples.
Example 1:
This embodiment takes the question "Who is the CEO of apple?" as an example and describes the specific flow of the prompt-learning-based entity and/or relationship linking method of the present invention with reference to FIGS. 1 to 3:
S1, preprocess the generated SPARQL "select ?x where { <apple_(apple company)> <chief executive officer> ?x }", split it into clauses, and judge the SPARQL clause type: a <subject> ?x <object> clause requires only S2 and S4, while a <subject> <predicate> ?x clause requires S2, S3 and S4;
S2, realize entity recall through the BM25 character-level short-text matching algorithm, design an entity and relation linking method based on prompt learning, and link the entity mentioned in the SPARQL to a recalled entity;
S3, according to the correct entity obtained in S2, query the knowledge base for the relations of that entity as recalled relations, and link the relation mentioned in the SPARQL to a recalled relation using the entity and relation linking method of S2;
S4, revise the SPARQL according to the entity and relation linking results.
As shown in FIGS. 2 and 3, the prompt-learning-based disambiguation model of the present invention can be applied to both entity linking and relation linking, so as to revise the SPARQL. SPARQL is an RDF query language; RDF (Resource Description Framework) is a data model expressed in XML syntax for describing the characteristics of Web resources and the relationships between resources.
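To make the SPARQL/RDF terminology concrete, the sketch below models an RDF graph as a list of (subject, predicate, object) triples and evaluates a single <subject> <predicate> ?x pattern; the data and names are invented for illustration, not taken from the patent:

```python
# Hypothetical in-memory RDF graph: a list of (subject, predicate, object)
# triples standing in for a real triple store.
TRIPLES = [
    ("apple_(Apple Products Inc.)", "chief executive officer", "Tim Cook"),
    ("apple_(Apple Products Inc.)", "founder", "Steve Jobs"),
]

def query(subject, predicate):
    """Evaluate a <subject> <predicate> ?x pattern: bind ?x to every
    object whose subject and predicate match the pattern."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]
```

Linking the SPARQL's mention to the exact node label matters precisely because this matching is literal: a query against a wrong or ambiguous label returns nothing.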
In step S1, the long SPARQL is divided into short SPARQL clauses by a regular expression, and the clause type is determined, which mainly includes:
S11, split the generated SPARQL with a regular expression at the clause separator "." to obtain the SPARQL sub-clauses; in this embodiment the generated SPARQL has only one clause, so no splitting is needed;
S12, locate the variable "?x" with a regular expression; since "?x" is at the end of the clause, the clause is judged to be of the <subject> <predicate> ?x type;
In step S2, the entity disambiguation task is treated as a binary classification task, and an entity and relation linking method based on prompt learning is designed, mainly comprising:
S21, extract the entity "apple_(apple company)" in the SPARQL as e1, compute the similarity between e1 and each node in the knowledge base with the BM25 character-level short-text matching algorithm, sort by similarity, and select the 5 entities with the highest similarity as the candidate entity set {"apple tree (plant of the family Rosaceae)", "apple (fruit of the genus Malus of the family Rosaceae)", "apple (Apple Products Inc.)", "apple (song dunlisin)", "apple (song anderson and soldier)"}, denoted F = {f1, f2, …, f5}, realizing entity recall;
S22, select the best-fitting knowledge base entity with the prompt-learning-based entity disambiguation model to realize entity disambiguation: concatenate the candidate entity fi obtained in S21, the input question Q, and the entity e1 obtained in S21, using [SEP] as the separator, and finally append the prompt parameters [P1]-[Pn] and the [MASK] token; the result is used as the model input, denoted T, in the following form:
T = [Q][SEP][fi][SEP][e1][SEP][P1]…[Pn][MASK]
The entity disambiguation task is treated as a binary classification task, i.e. judging whether e1 and fi refer to the same entity. The answer labels for prompt learning can be expressed as "yes" and "no", each containing a different answer subspace: the category "yes" contains several words expressing relatedness, such as "same" and "consistent", and the category "no" contains several words expressing unrelatedness, such as "unrelated" and "different". With T as input, RoBERTa (A Robustly Optimized BERT Pretraining Approach) serves as the base model for entity disambiguation and is responsible for semantic information extraction; the token at the [MASK] position is generated through the masked language model (MLM) task, and the generated token is finally mapped to an answer to obtain the binary classification result;
S23, compute probability values from the binary classification results of step S22 with the Softmax function. All entities labeled "yes" are sorted by probability, and the entity linking result with the highest probability is selected; in this embodiment, the entity with the highest probability, "apple_(Apple Products Inc.)", is taken as the correct entity and replaces the erroneous entity "apple_(apple company)" in the original SPARQL;
In step S3, all relations of the correct entity from S23 are obtained as the candidate set of recalled relations {"chief executive officer", "chief financial officer", "chief information officer", "chief technology officer"}, denoted G = {g1, g2, …, gn}; relation disambiguation is performed with the entity disambiguation model of step S22 to obtain the correct relation attribute and replace the erroneous <predicate> relation;
According to the invention, prompt learning is introduced on the basis of the traditional entity disambiguation method, so that the entity disambiguation model can be used for both entity linking and relation linking, the generated SPARQL query is revised, and the SPARQL query ambiguity problem is effectively resolved. The algorithm aims to exploit the potential of the pre-trained model through prompt learning to the greatest extent, improve the generality of the model, simplify the SPARQL revision process, and improve question-answering accuracy.
Example 2:
As shown in fig. 4, the present invention also provides a SPARQL revision device applied in the knowledge-graph question-answering system. The device comprises at least one processor and at least one memory, as well as a communication interface and an internal bus; the memory stores a computer-executable program; when the processor executes the program stored in the memory, the device performs the method of Embodiment 1. The internal bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Enhanced ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the bus in the figures of the present application is not limited to only one bus or one type of bus. The memory may include high-speed RAM and may further include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB disk, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
The device may be provided as a terminal, server, or other form of device.
Fig. 4 is a block diagram of an apparatus shown for illustration. The device may include one or more of the following components: processing components, memory, power components, multimedia components, audio components, interfaces for input/output (I/O), sensor components, and communication components. The processing components typically control overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components may include one or more processors to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component may include one or more modules that facilitate interaction between the processing component and other components. For example, the processing component may include a multimedia module to facilitate interaction between the multimedia component and the processing component.
The memory is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component provides power to various components of the electronic device. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device. The multimedia component includes a screen providing an output interface between the electronic device and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component is configured to output and/or input an audio signal. For example, the audio assembly includes a Microphone (MIC) configured to receive an external audio signal when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. The I/O interface provides an interface between the processing component and a peripheral interface module, which may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly includes one or more sensors for providing various aspects of status assessment for the electronic device. For example, the sensor assembly may detect an open/closed state of the electronic device, the relative positioning of components, such as a display and keypad of the electronic device, the sensor assembly may also detect a change in position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, orientation or acceleration/deceleration of the electronic device, and a change in temperature of the electronic device. The sensor assembly may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
Example 3:
the present invention also provides a non-transitory computer-readable storage medium having stored therein a program or instructions which, when executed by a processor, cause a computer to perform the method of embodiment 1.
In particular, a system, apparatus or device may be provided which is provided with a readable storage medium on which software program code implementing the functionality of any of the embodiments described above is stored and which causes a computer or processor of the system, apparatus or device to read out and execute instructions stored in the readable storage medium. In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW), magnetic tape, or the like. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules.
It should be understood that a storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuits (ASIC). Of course, the processor and the storage medium may reside as discrete components in a terminal or server.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions and implement aspects of the present disclosure by utilizing the state information of the instructions to personalize the electronic circuitry.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present invention has been described with reference to the specific embodiments, it should be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (6)

1. A prompt-learning-based entity and/or relationship linking method, characterized in that prompt learning is introduced on the basis of an existing entity disambiguation method, and comprising the following steps:
S1, preprocessing the SPARQL generated by the question-answering system, dividing it into clauses, and judging the type of each SPARQL clause; if the clause is of the <subject> ?x <object> type, steps S2 and S4 need to be performed; if it is of the <subject> <predicate> ?x form, steps S2, S3 and S4 need to be performed;
S2, realizing entity recall through the BM25 character-level short-text matching algorithm, designing an entity and/or relation linking method based on prompt learning, and linking the entity mentioned in the SPARQL to the recalled entities;
S3, according to the correct entity obtained in S2, querying the knowledge base for the relations associated with that entity as recalled relations, and linking the relation mentioned in the SPARQL to the recalled relations using the entity and/or relation linking method of S2;
and S4, revising the SPARQL according to the entity and/or relation linking results.
2. The entity and/or relationship linking method based on prompt learning according to claim 1, characterized in that: in step S1, the long SPARQL is divided into short SPARQL clauses by a regular expression, and the clause type is determined, specifically:
S11, performing regular-expression segmentation of the generated SPARQL on the separator character ".", obtaining a plurality of SPARQL sub-clauses;
S12, using a regular expression to judge the position of "?x": if "?x" is located at the end of the clause, the clause is judged to be of the <subject> <predicate> ?x type; otherwise it is of the <subject> ?x <object> type.
3. The entity and/or relationship linking method based on prompt learning according to claim 1, characterized in that: in step S2, the entity disambiguation task is treated as a binary classification task, and the entity and/or relation linking method based on prompt learning is designed as follows:
S21, extracting the entity e1 from the SPARQL, calculating the similarity between e1 and each node in the knowledge base through the BM25 character-level short-text matching algorithm, sorting by similarity, and selecting the n entities with the highest similarity as the candidate entity set, denoted F = {f1, f2, …, fn}, thereby realizing entity recall;
S22, selecting the most appropriate knowledge-base entity using the prompt-learning-based entity disambiguation model to realize entity disambiguation; the candidate entity fi obtained in step S21, the input question Q, and the entity e1 extracted in step S21 are concatenated, using [SEP] as a separator, and finally the prompt parameters [P1]-[Pn] and the [MASK] token are appended; the result serves as the model input, denoted T, with the formula:
T = [Q][SEP][fi][SEP][e1][SEP][P1]…[Pn][MASK]
The entity disambiguation task is treated as a binary classification task, i.e., judging whether e1 and fi refer to the same entity; the answer labels for prompt learning can be expressed as the two categories "yes" and "no", each comprising a different answer subspace, for example, the category "yes" comprising a plurality of words expressing relevance and the category "no" comprising a plurality of words expressing irrelevance. With T as input, RoBERTa serves as the base model for entity disambiguation, responsible for semantic-information extraction; the [MASK] token is predicted through the MLM (masked language modeling) task, and finally answer mapping is performed on the generated characters to obtain the binary classification result;
S23, calculating probability values with the Softmax function from the binary classification results obtained in step S22; all entities labeled "yes" are sorted by probability, and the entity-linking result with the highest probability is selected to replace the erroneous <subject> entity.
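As a rough illustration of steps S21 and S22, the sketch below implements character-level BM25 recall and assembles the prompt input T. It is a simplified stand-in, not the claimed implementation: all names and example data are hypothetical, and the RoBERTa/MLM disambiguation and Softmax ranking of steps S22-S23 are not shown.

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Character-level BM25 (step S21): score each knowledge-base node
    name against the SPARQL entity mention, treating every character
    as a term."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each character across all node names.
    df = {}
    for d in docs:
        for ch in set(d):
            df[ch] = df.get(ch, 0) + 1
    scores = []
    for d in docs:
        s = 0.0
        for ch in set(query):
            tf = d.count(ch)
            if tf == 0:
                continue
            idf = math.log((N - df[ch] + 0.5) / (df[ch] + 0.5) + 1)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def recall_entities(mention, kb_nodes, n=3):
    """Sort knowledge-base nodes by BM25 similarity and keep the
    top-n as the candidate set F = {f1, ..., fn}."""
    ranked = sorted(zip(kb_nodes, bm25_scores(mention, kb_nodes)),
                    key=lambda x: -x[1])
    return [node for node, _ in ranked[:n]]

def build_prompt_input(question, candidate, mention, n_prompt=4):
    """Assemble the model input of step S22:
    T = [Q][SEP][f_i][SEP][e_1][SEP][P_1]...[P_n][MASK]"""
    prompts = "".join(f"[P{i}]" for i in range(1, n_prompt + 1))
    return f"{question}[SEP]{candidate}[SEP]{mention}[SEP]{prompts}[MASK]"
```

For example, recalling candidates for a mention and building one input string:

```python
kb = ["Albert Einstein", "Einstein (crater)", "Isaac Newton"]
F = recall_entities("Albert Einstein", kb, n=2)
T = build_prompt_input("Who developed relativity?", F[0], "Albert Einstein", n_prompt=2)
```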
4. The prompt-learning-based entity and/or relation linking method according to claim 3, characterized in that: in step S3, all relations of the correct entity obtained in S23 are acquired as the candidate set of recalled relations G = {g1, g2, …, gn}; the entity disambiguation model of step S22 is used to perform relation disambiguation and obtain the correct relation, replacing the erroneous <predicate> relation.
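The relation recall of claim 4, i.e. gathering every predicate attached to the linked entity as the candidate set G, can be sketched as follows. The toy triples and function name are hypothetical, and the subsequent disambiguation step, which reuses the model of step S22, is omitted.

```python
def recall_relations(kb_triples, entity):
    """Collect every predicate that touches the linked entity,
    forming the candidate relation set G = {g1, ..., gn}."""
    return sorted({p for (s, p, o) in kb_triples if entity in (s, o)})

# Toy knowledge base as (subject, predicate, object) triples.
triples = [
    ("Beijing", "capitalOf", "China"),
    ("Beijing", "locatedIn", "Asia"),
    ("Shanghai", "locatedIn", "China"),
]
G = recall_relations(triples, "Beijing")  # ['capitalOf', 'locatedIn']
```

Each g in G would then be scored against the SPARQL-mentioned relation with the same prompt-based binary classifier, and the highest-probability "yes" candidate replaces the erroneous predicate.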
5. A SPARQL revision device for use in a knowledge-graph question-answering system, characterized by: the apparatus comprises at least one processor and at least one memory, the processor and memory coupled; the memory has stored therein a computer program; the processor, when executing the computer program stored by the memory, causes the apparatus to perform the method of any of claims 1 to 4.
6. A computer-readable storage medium, in which a program or instructions are stored, which, when executed by a processor, cause a computer to carry out the method of any one of claims 1 to 4.
CN202210787603.5A 2022-07-06 2022-07-06 Entity and/or relationship linking method based on prompt learning Pending CN115129893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210787603.5A CN115129893A (en) 2022-07-06 2022-07-06 Entity and/or relationship linking method based on prompt learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210787603.5A CN115129893A (en) 2022-07-06 2022-07-06 Entity and/or relationship linking method based on prompt learning

Publications (1)

Publication Number Publication Date
CN115129893A true CN115129893A (en) 2022-09-30

Family

ID=83382697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210787603.5A Pending CN115129893A (en) 2022-07-06 2022-07-06 Entity and/or relationship linking method based on prompt learning

Country Status (1)

Country Link
CN (1) CN115129893A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687988A (en) * 2023-11-21 2024-03-12 羚羊工业互联网股份有限公司 Knowledge chain base construction method, question answering method, related device and equipment


Similar Documents

Publication Publication Date Title
Poongodi et al. Chat-bot-based natural language interface for blogs and information networks
US10956683B2 (en) Systems and method for vocabulary management in a natural learning framework
US10599767B1 (en) System for providing intelligent part of speech processing of complex natural language
US10162816B1 (en) Computerized system and method for automatically transforming and providing domain specific chatbot responses
WO2020147428A1 (en) Interactive content generation method and apparatus, computer device, and storage medium
US10262062B2 (en) Natural language system question classifier, semantic representations, and logical form templates
US20220012296A1 (en) Systems and methods to automatically categorize social media posts and recommend social media posts
US11482212B2 (en) Electronic device for analyzing meaning of speech, and operation method therefor
US20190103111A1 (en) Natural Language Processing Systems and Methods
US11151183B2 (en) Processing a request
Chen et al. Mining user requirements to facilitate mobile app quality upgrades with big data
US20160062982A1 (en) Natural language processing system and method
CN107861954B (en) Information output method and device based on artificial intelligence
CN110532573A (en) A kind of interpretation method and system
US10977155B1 (en) System for providing autonomous discovery of field or navigation constraints
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
CN107239447B (en) Junk information identification method, device and system
CN116244344B (en) Retrieval method and device based on user requirements and electronic equipment
Fu et al. Image-text surgery: Efficient concept learning in image captioning by generating pseudopairs
JP2022548624A (en) Linguistic speech processing in computer systems
Sokolova Big text advantages and challenges: classification perspective
Shekhar et al. An effective cybernated word embedding system for analysis and language identification in code-mixed social media text
CN115129893A (en) Entity and/or relationship linking method based on prompt learning
Yang et al. PurExt: Automated Extraction of the Purpose‐Aware Rule from the Natural Language Privacy Policy in IoT
CN108197100B (en) Emotion analysis method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination