CN110909544A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN110909544A
CN110909544A CN201911143298.0A
Authority
CN
China
Prior art keywords
text
words
question
common
question sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911143298.0A
Other languages
Chinese (zh)
Inventor
韩庆宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shannon Huiyu Technology Co Ltd
Original Assignee
Beijing Shannon Huiyu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shannon Huiyu Technology Co Ltd filed Critical Beijing Shannon Huiyu Technology Co Ltd
Priority to CN201911143298.0A priority Critical patent/CN110909544A/en
Publication of CN110909544A publication Critical patent/CN110909544A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a data processing method and a data processing device. The method comprises the following steps: acquiring a text and a word needing coreference resolution; generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word; and extracting the co-referents by using a candidate text extractor, thereby completing the coreference resolution of the word. With the data processing method and the data processing device, the co-referents of the word can be found in the text in a question-and-answer manner, which greatly improves the accuracy of coreference resolution.

Description

Data processing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a data processing method and device.
Background
Currently, to avoid repetition, it is customary in text to use pronouns, referring expressions and abbreviations to refer to words mentioned earlier. For example, a text may begin with "Harbin Institute of Technology" and later use "HIT" and "Gongda", or refer to "this university", "it", and so on; this phenomenon is called coreference. It is very difficult for a computer performing natural language processing to recognize coreferring words in text. Only by performing coreference resolution on the text can the computer identify the words involved in a coreference. Coreference resolution is the task of finding, in the text, all expressions that refer to the same word.
In the related art, coreference resolution methods often obtain their results by similarity comparison of tuples, which leads to low accuracy of coreference resolution.
Disclosure of Invention
In order to solve the above problem, embodiments of the present invention provide a data processing method and apparatus.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring a text and a word needing coreference resolution;
generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word;
and extracting the co-referents by using a candidate text extractor, to complete the coreference resolution of the word.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, including:
the acquisition module is used for acquiring the text and the word needing coreference resolution;
the processing module is used for generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word;
and the extraction module is used for extracting the co-referents by using a candidate text extractor, to complete the coreference resolution of the word.
In a third aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method in the first aspect.
In a fourth aspect, embodiments of the present invention also provide a data processing apparatus, which includes a memory, a processor, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method according to the first aspect.
In the solutions provided in the foregoing first to fourth aspects of the embodiments of the present invention, a question sentence is generated according to the acquired word, and characters capable of answering the question sentence are found in the text as co-referents of the word. Compared with the related-art method of performing coreference resolution based on tuple similarity comparison, the method and the apparatus find the co-referents of the word in the text in a question-and-answer manner, thereby greatly improving the accuracy of coreference resolution.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a data processing method according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a data processing apparatus according to embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of another data processing apparatus provided in embodiment 3 of the present invention.
Detailed Description
Currently, to avoid repetition, it is customary in text to use pronouns, referring expressions and abbreviations to refer to words mentioned earlier. For example, a text may begin with "Harbin Institute of Technology" and later use "HIT" and "Gongda", or refer to "this university", "it", and so on; this phenomenon is called coreference. It is very difficult for a computer performing natural language processing to recognize coreferring words in text. Only by performing coreference resolution on the text can the computer identify the words involved in a coreference. Coreference resolution is the task of finding, in the text, all expressions that refer to the same word. In the related art, coreference resolution methods often obtain their results by similarity comparison of tuples, which leads to low accuracy of coreference resolution.
Based on this, the present embodiment provides a data processing method and device that generate question sentences from the word needing coreference resolution and find out, from the text, characters capable of answering those question sentences as the co-referents of the word, thereby greatly improving the accuracy of coreference resolution.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
Example 1
The present embodiment provides a data processing method that is executed by a server.
The server may adopt any computing device in the prior art that can process the text according to the words and perform coreference resolution on the words, and the details are not repeated here.
Referring to a flowchart of a data processing method shown in fig. 1, the present embodiment provides a data processing method, including the following specific steps:
and step 100, acquiring the text and the words needing coreference resolution.
In step 100, the text may be a text input into the server by an operator.
In one embodiment, the text may be: "Donald Trump, born June 8, 1946 in New York, is a Republican politician, entrepreneur and businessman, and the 45th President of the United States. … Trump launched the China-US trade war; … the Trump administration announced a 10% tariff on 200 billion dollars of goods imported from China, which formally took effect on September 24, 2018, and the tariff rate was raised to 25% in 2019 … Trump … he …".
After reading the above text, one finds that if "Donald Trump" is taken as the word, the co-referents of "Donald Trump" in the text include, but are not limited to: "Donald Trump", "the 45th President of the United States", "Chuanpu" (川普, a transliterated alias of Trump), "Trump" and "he".
To enable the server to resolve the coreferences of "Donald Trump", the operator may input "Donald Trump" as the word into the server, so that the server finds all the co-referents of "Donald Trump" in the text, thereby completing the coreference resolution of "Donald Trump".
Step 102: generate a question sentence according to the word, and find out, from the text, characters capable of answering the question sentence as co-referents of the word.
In order to find the co-referents of the word in the text, step 102 may be implemented as the following steps (1) to (4):
(1) acquiring a question template, filling the word into the question template, and generating a question sentence related to the word;
(2) splicing the question sentence with the characters in the text to obtain a spliced text;
(3) processing the spliced text by using a pre-training model (BERT) to obtain a vector representation of each character in the spliced text;
(4) finding out, from the spliced text, characters capable of answering the question sentence as co-referents of the word.
In step (1), the question template is cached in the server and stores question frame sentences that prompt the server to find the co-referents of the word in the text.
A question frame sentence is an incomplete question sentence containing a blank to be filled. For example, the question frame sentence may be, but is not limited to: "Which words in the text refer to ()?" or "All the pronouns of () refer to which words?".
Therefore, by filling the word needing coreference resolution into the blank "()" of a question frame sentence in the question template, a question sentence related to the word can be generated.
In one embodiment, when the word needing coreference resolution is "Donald Trump", the question sentence obtained by filling "Donald Trump" into the question frame sentence "All the pronouns of () refer to which words?" is: "All the pronouns of Donald Trump refer to which words?".
As can be seen from the description of step (1), the word needing coreference resolution can be filled into the question template to generate a question sentence related to the word, so that different words can each be coreference-resolved; the method is flexible, convenient to operate and interpretable.
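The template filling of step (1) can be sketched as follows. The template wording and the function name are illustrative assumptions (an English rendering of a question frame sentence), not the patent's actual implementation:

```python
# Minimal sketch of step (1): filling the word needing coreference
# resolution into the blank "()" of a question frame sentence.
# The default template text is an assumed example, not the real one.

def build_question(word: str,
                   template: str = "Which words in the text refer to ()?") -> str:
    """Fill the word into the blank of the question frame sentence."""
    return template.replace("()", word)

# e.g. build_question("Donald Trump")
# -> "Which words in the text refer to Donald Trump?"
```

Swapping in a different frame sentence for `template` yields a different question for the same word, which is what makes the scheme flexible across words and templates.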
In step (2), the server may splice the question sentence with the characters in the text by any existing character-splicing method to obtain the spliced text, which is not described in detail here.
In step (3), the pre-training model BERT runs in the server.
The process by which the server uses BERT to obtain the vector representation of each character in the spliced text is known in the art and is not repeated here.
The characters may be, but are not limited to: single characters, words and phrases.
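Steps (2) and (3) can be illustrated with the following sketch. It shows only the splicing, using BERT's conventional two-segment input layout ([CLS] question [SEP] text [SEP]) at character level; the actual BERT pass that maps each position to a vector is omitted, and the marker strings are the standard BERT conventions rather than anything the patent specifies:

```python
# Sketch of step (2): splicing the question sentence with the characters
# of the text in the two-segment format BERT conventionally consumes.
# Step (3) would then run BERT over this sequence to obtain one vector
# per position; that part is not reproduced here.

def splice(question: str, text: str) -> list:
    """Concatenate question and text character by character with BERT markers."""
    return ["[CLS]"] + list(question) + ["[SEP]"] + list(text) + ["[SEP]"]
```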
Specifically, step (4) may be performed as follows:
processing the vector representation of each character in the spliced text by using a machine reading comprehension model, and finding out, from the characters of the text, characters capable of answering the question sentence as co-referents of the word.
The machine reading comprehension model runs in the server.
Here, using the machine reading comprehension model to find, from the vector representations of the characters in the spliced text, the characters capable of answering the question sentence amounts to finding, in the text portion of the spliced text, the answer to the question sentence that the spliced text contains. That is, the co-referents of the word are extracted from the text in a question-and-answer manner. The specific processing of the machine reading comprehension model is known in the art and is not described in detail here.
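The span selection in step (4) can be sketched as generic extractive question answering: a reading comprehension head scores every position as a possible answer start and as a possible answer end, and the best-scoring span is taken as the answer. The score lists below stand in for a trained model's logits; this is an assumed textbook formulation, not the patent's exact model:

```python
# Sketch of step (4): pick the answer span (start, end) that maximizes
# start_score[i] + end_score[j] with j >= i and a bounded span length.
# In practice the scores would come from a model head applied to the
# per-character BERT vectors; here they are plain numbers.

def best_span(start_scores, end_scores, max_len=10):
    """Return (start, end) indices of the highest-scoring valid span."""
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best[0], best[1]
```

The characters between the chosen start and end positions would then be read off the spliced text as the co-referent.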
As can be seen from steps (1) to (4), a question-answering framework based on a machine reading comprehension model is used: a question sentence is generated from the word needing coreference resolution, and the machine reading comprehension model uses this question sentence to find, in the spliced text, characters capable of answering it as co-referents of the word. By skillfully exploiting the question-and-answer mechanism of natural language, the co-referents of the word can be extracted from the text more accurately. Moreover, because the text and the question sentence are processed by a pre-training model and a machine reading comprehension model, both at the forefront of natural language processing, the accuracy of extracting the co-referents of the word from the text can be further improved, achieving an optimal effect.
Step 104: extract the co-referents by using a candidate text extractor, completing the coreference resolution of the word.
In step 104, the candidate text extractor can be regarded as a sequence labeling model that uses BIEO tags, where B, I, E and O respectively mark the start position of a co-referent, a middle position of a co-referent, the end position of a co-referent, and a character not belonging to any co-referent.
After receiving the spliced text, the sequence labeling model encodes the characters in the spliced text and assigns each character one of the tags B, I, E or O, so that the co-referents of the word can be extracted. The specific process is known in the art and is not repeated here.
For example, after the candidate text extractor encodes the sentence "Trump launched the China-US trade war" (川普发动中美贸易战) in the text, the result of tagging each character with BIEO is "川/B 普/E 发/O 动/O 中/O 美/O 贸/O 易/O 战/O". "Chuanpu" (川普) is thus labeled "BE", the start position and the end position of the answer, with no tag "O" appearing in between, so "Chuanpu" is a legal co-referent. Note that the process of extracting co-referents also needs to check the validity of the labeling. In a legal labeling, the characters between any pair of "B … E" tags may carry no tag other than "I"; sequences such as "BOE" and "BBE" are illegal. In other words, a legal labeling must have the form "B I … I E", where the number of "I" tags is zero or more.
The process of extracting co-referents from the other sentences of the text is similar to the process of extracting them from the sentence "Trump launched the China-US trade war" and is not repeated here.
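The validity rule described above, that a legal labeling must have the form "B I … I E" with zero or more "I" tags, can be checked mechanically. A minimal sketch, with the function name chosen for illustration:

```python
# Check whether a candidate span's BIEO tag sequence is legal, i.e.
# matches the pattern B I* E: one B, zero or more I, one E, nothing else.
# "BOE" and "BBE" therefore fail, as described in the text above.

import re

def is_legal_span(tags: str) -> bool:
    """Return True if the tag string has the form B I* E."""
    return re.fullmatch(r"BI*E", tags) is not None
```

Under this rule, "BE" and "BIIE" are legal labelings while "BOE" and "BBE" are rejected.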
In summary, the present embodiment provides a data processing method in which a question sentence is generated according to the acquired word, and characters capable of answering the question sentence are found in the text as co-referents of the word. Compared with the related-art method of performing coreference resolution based on tuple similarity comparison, this method finds the co-referents of the word in the text in a question-and-answer manner, thereby greatly improving the accuracy of coreference resolution.
Example 2
The present embodiment proposes a data processing apparatus for executing the data processing method.
Referring to a schematic structural diagram of a data processing apparatus shown in fig. 2, the present embodiment provides a data processing apparatus, including:
an obtaining module 200, configured to obtain the text and the word needing coreference resolution;
a processing module 202, configured to generate a question sentence according to the word, and find out, from the text, characters capable of answering the question sentence as co-referents of the word;
and an extraction module 204, configured to extract the co-referents by using a candidate text extractor, to complete the coreference resolution of the word.
Specifically, in order to find the co-referents of the word in the spliced text, the processing module is specifically configured to:
acquiring a question template, filling the words into the question template, and generating question sentences related to the words;
splicing the question sentence with characters in the text to obtain a spliced text;
processing the spliced text by using a pre-training model BERT to obtain vector representation of each character in the spliced text;
and finding out characters capable of answering the question sentence from the spliced text as common referents of the words.
Specifically, the operation in which the extraction module finds out, from the spliced text, characters capable of answering the question sentence as co-referents of the word includes:
processing the vector representation of each character in the spliced text by using a machine reading comprehension model, and finding out, from the characters of the text, characters capable of answering the question sentence as co-referents of the word.
It can be seen from the above description that a question-answering framework based on a machine reading comprehension model is used: a question sentence is generated from the word needing coreference resolution, and the machine reading comprehension model uses this question sentence to find, in the spliced text, characters capable of answering it as co-referents of the word. By skillfully exploiting the question-and-answer mechanism of natural language, the co-referents of the word can be extracted from the text more accurately. Moreover, because the text and the question sentence are processed by a pre-training model and a machine reading comprehension model, both at the forefront of natural language processing, the accuracy of extracting the co-referents of the word from the text can be further improved, achieving an optimal effect.
In summary, the data processing apparatus provided in this embodiment generates a question sentence according to the acquired word and finds, in the text, characters capable of answering the question sentence as co-referents of the word. Compared with the related-art method of performing coreference resolution based on tuple similarity comparison, the apparatus finds the co-referents of the word in the text in a question-and-answer manner, thereby greatly improving the accuracy of coreference resolution.
Example 3
The present embodiment proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data processing method described in embodiment 1 above. For specific implementation, refer to method embodiment 1, which is not described herein again.
In addition, referring to another schematic structural diagram of the data processing apparatus shown in fig. 3, the present embodiment further provides a data processing apparatus, which includes a bus 51, a processor 52, a transceiver 53, a bus interface 54, a memory 55, and a user interface 56. The data processing means comprise a memory 55.
In this embodiment, the data processing apparatus further includes: one or more programs stored on the memory 55 and executable on the processor 52, configured to be executed by the processor for performing the following steps (1) to (3):
(1) acquiring a text and words needing coreference resolution;
(2) generating a question sentence according to the words, and finding out characters capable of answering the question sentence from the text as common referents of the words;
(3) and extracting the co-reference words by using a candidate text extractor, and completing the co-reference resolution of the words.
A transceiver 53 for receiving and transmitting data under the control of the processor 52.
Fig. 3 shows a bus architecture (represented by bus 51). The bus 51 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by the general-purpose processor 52 and memory represented by the memory 55. The bus 51 may also link various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further in this embodiment. A bus interface 54 provides an interface between the bus 51 and the transceiver 53. The transceiver 53 may be one element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. For example, the transceiver 53 receives external data from other devices and transmits data processed by the processor 52 to other devices. Depending on the nature of the computing system, a user interface 56, such as a keypad, display, speaker, microphone or joystick, may also be provided.
The processor 52 is responsible for managing the bus 51 and the usual processing, running a general-purpose operating system as described above. And memory 55 may be used to store data used by processor 52 in performing operations.
Alternatively, processor 52 may be, but is not limited to: a central processing unit, a singlechip, a microprocessor or a programmable logic device.
It will be appreciated that the memory 55 in embodiments of the invention may be volatile memory or nonvolatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 55 of the systems and methods described in this embodiment is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 55 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 551 and application programs 552.
The operating system 551 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 552 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 552.
In summary, with the computer-readable storage medium and the data processing apparatus provided in this embodiment, a question sentence is generated according to the acquired word, and characters capable of answering the question sentence are found in the text as co-referents of the word. Compared with the related-art method of performing coreference resolution based on tuple similarity comparison, this approach finds the co-referents of the word in the text in a question-and-answer manner, thereby greatly improving the accuracy of coreference resolution.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A data processing method, comprising:
acquiring a text and a word needing coreference resolution;
generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word;
and extracting the co-referents by using a candidate text extractor, to complete the coreference resolution of the word.
2. The method of claim 1, wherein generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word, comprises:
acquiring a question template, filling the words into the question template, and generating question sentences related to the words;
splicing the question sentence with characters in the text to obtain a spliced text;
processing the spliced text by using a pre-training model BERT to obtain vector representation of each character in the spliced text;
and finding out characters capable of answering the question sentence from the spliced text as common referents of the words.
3. The method of claim 2, wherein finding out, from the spliced text, characters capable of answering the question sentence as co-referents of the word comprises:
processing the vector representation of each character in the spliced text by using a machine reading comprehension model, and finding out, from the characters of the text, characters capable of answering the question sentence as co-referents of the word.
4. A data processing apparatus, comprising:
the acquisition module is used for acquiring the text and the word needing coreference resolution;
the processing module is used for generating a question sentence according to the word, and finding out, from the text, characters capable of answering the question sentence as co-referents of the word;
and the extraction module is used for extracting the co-referents by using a candidate text extractor, to complete the coreference resolution of the word.
5. The apparatus according to claim 4, wherein the processing module is specifically configured to:
acquiring a question template, filling the words into the question template, and generating question sentences related to the words;
splicing the question sentence with characters in the text to obtain a spliced text;
processing the spliced text by using a pre-training model BERT to obtain vector representation of each character in the spliced text;
and finding out characters capable of answering the question sentence from the spliced text as common referents of the words.
6. The apparatus of claim 5, wherein the extraction module being configured to find out, from the spliced text, characters capable of answering the question sentence as co-referents of the word includes:
processing the vector representation of each character in the spliced text by using a machine reading comprehension model, and finding out, from the characters of the text, characters capable of answering the question sentence as co-referents of the word.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1-3.
8. A data processing apparatus comprising a memory, a processor and one or more programs, wherein the one or more programs are stored in the memory and configured to cause the processor to perform the steps of the method of any of claims 1-3.
CN201911143298.0A 2019-11-20 2019-11-20 Data processing method and device Pending CN110909544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143298.0A CN110909544A (en) 2019-11-20 2019-11-20 Data processing method and device

Publications (1)

Publication Number Publication Date
CN110909544A true CN110909544A (en) 2020-03-24

Family

ID=69816681

Country Status (1)

Country Link
CN (1) CN110909544A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761920A (en) * 2020-06-05 2021-12-07 Beijing Kingsoft Digital Entertainment Technology Co Ltd Word processing method and device based on a dual-task model

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012220666A (en) * 2011-04-07 2012-11-12 Nippon Telegr & Teleph Corp <Ntt> Reading comprehension question answering device, method, and program
US20140257792A1 (en) * 2013-03-11 2014-09-11 Nuance Communications, Inc. Anaphora Resolution Using Linguisitic Cues, Dialogue Context, and General Knowledge
CN105589844A (en) * 2015-12-18 2016-05-18 Beijing Zhongke Huilian Technology Co Ltd Missing semantic supplementing method for multi-round question-answering system
CN106462607A (en) * 2014-05-12 2017-02-22 Google Inc Automated reading comprehension
US20170351663A1 (en) * 2016-06-03 2017-12-07 Maluuba Inc. Iterative alternating neural attention for machine reading
CN107766320A (en) * 2016-08-23 2018-03-06 ZTE Corp Chinese pronoun resolution model building method and device
CN108491421A (en) * 2018-02-07 2018-09-04 Beijing Baidu Netcom Science and Technology Co Ltd Method, apparatus, device and computer storage medium for generating questions and answers
CN109947912A (en) * 2019-01-25 2019-06-28 Sichuan University Model method based on intra-paragraph reasoning and joint question-answer matching
CN110188362A (en) * 2019-06-10 2019-08-30 Beijing Baidu Netcom Science and Technology Co Ltd Text processing method and device
CN113627147A (en) * 2021-08-18 2021-11-09 Shanghai Minglue Artificial Intelligence (Group) Co Ltd Entity alignment method and device based on multi-round reading comprehension

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WEI WU: "Coreference Resolution as Query-based Span Prediction", IEEE, pages 1-6 *
LIU YUJIANG; FU LIJUN; LIU JUNMING; LYU PENGFEI: "Anaphora resolution algorithm based on a multi-layer attention mechanism", Computer Engineering, no. 02 *
LI YING et al.: "Coreference resolution in interactive question-answering text based on centering theory and discourse structure", Journal of Chinese Information Processing *
LI YING et al.: "Coreference resolution in interactive question-answering text based on centering theory and discourse structure", Journal of Chinese Information Processing, no. 04, 15 July 2016 (2016-07-15) *

Similar Documents

Publication Publication Date Title
CN112417102B (en) Voice query method, device, server and readable storage medium
US10417335B2 (en) Automated quantitative assessment of text complexity
US20170286376A1 (en) Checking Grammar Using an Encoder and Decoder
Mori Word-based partial annotation for efficient corpus construction
CN114036300A (en) Language model training method and device, electronic equipment and storage medium
US11327971B2 (en) Assertion-based question answering
CN110750977A (en) Text similarity calculation method and system
EP2447854A1 (en) Method and system of automatic diacritization of Arabic
CN116861242A (en) Language perception multi-language pre-training and fine tuning method based on language discrimination prompt
KR20240006688A (en) Correct multilingual grammar errors
Dong et al. Revisit input perturbation problems for llms: A unified robustness evaluation framework for noisy slot filling task
Kubis et al. Open challenge for correcting errors of speech recognition systems
CN112395866B (en) Customs clearance sheet data matching method and device
CN110909544A (en) Data processing method and device
US11416556B2 (en) Natural language dialogue system perturbation testing
CN115640810A (en) Method, system and storage medium for identifying communication sensitive information of power system
Zayyan et al. Automatic diacritics restoration for modern standard Arabic text
Wan et al. IBM research at the CoNLL 2018 shared task on multilingual parsing
CN112530406A (en) Voice synthesis method, voice synthesis device and intelligent equipment
Kim et al. How to utilize syllable distribution patterns as the input of LSTM for Korean morphological analysis
Reynolds et al. Automatic word stress annotation of Russian unrestricted text
Lazareva et al. Technology for mastering russian vocabulary by chinese students in the field of international trade
Rahman et al. Dense word representation utilization in Indonesian dependency parsing
CN117077664B (en) Method and device for constructing text error correction data and storage medium
CN110866390B (en) Method and device for recognizing Chinese grammar error, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination