WO2021164293A1 - Big data-based zero-reference resolution method, apparatus, device, and medium (基于大数据的零指代消解方法、装置、设备及介质) - Google Patents

Big data-based zero-reference resolution method, apparatus, device, and medium

Info

Publication number
WO2021164293A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
anaphoric
sentence
probability
item
Prior art date
Application number
PCT/CN2020/123173
Other languages
English (en)
French (fr)
Inventor
楼星雨
许开河
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021164293A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of information technology, and in particular to a big data-based method, apparatus, device, and medium for zero-reference resolution.
  • Reference resolution is one of the longest-studied technologies in the field of natural language processing, with very broad application scenarios.
  • In customer-service robots, dialogue robots, and intelligent outbound-call platforms, reference resolution is one of the most essential technologies.
  • Reference resolution comprises two parts: zero-reference resolution and coreference resolution.
  • Zero-reference resolution finds, for each zero-reference item, the corresponding language unit in the preceding text.
  • The zero-reference resolution task is usually divided into two sub-tasks: zero-reference position detection and resolution.
  • The resolution task aims, on the basis of the detected zero-reference positions, to identify the specific anaphoric item of each zero-reference item that has one in the preceding text.
  • A traditional resolution model usually first constructs a candidate set of anaphoric items, then uses classification or ranking to select the most likely candidate from that set as the final recognition result.
  • The candidate set of anaphoric items is often composed of the maximal noun phrases and modifying noun phrases in the two sentences preceding the zero-reference item. The inventors realized that the accuracy of this approach depends heavily on the accuracy of the candidate set: if the set does not contain the correct anaphoric item, the subsequent recognition is bound to fail. Because the candidate set consists of only a few simple noun phrases, traditional resolution methods are unstable and inaccurate.
  • The embodiments of the present application provide a big data-based zero-reference resolution method, apparatus, device, and medium, to solve the problem that existing zero-reference resolution technology depends too heavily on the candidate set of anaphoric items and yields resolution results of low accuracy and stability.
  • A big data-based zero-reference resolution method, including:
  • selecting the continuous text segment with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • A big data-based zero-reference resolution apparatus, including:
  • a vectorization module, used to obtain the sentence to be resolved and its preceding information, and to vectorize the sentence to be resolved and its preceding information, obtaining the context vector representation of each word in the sentence to be resolved and the context vector representation of each word in the preceding information;
  • an enhancement module, used to input the context vector representation of each word in the sentence to be resolved and in the preceding information into a bidirectional long short-term memory network to enhance each word's contextual expression and position information, obtaining the enhanced context vector representation of each word;
  • a prediction module, used to traverse the enhanced context vector representation of each word and predict, from the parameter vectors in the BERT model, each word's anaphoric-item head-word probability and anaphoric-item tail-word probability;
  • a construction module, used to traverse each word, construct continuous text segments, and calculate the anaphoric-item probability of each continuous text segment from each word's anaphoric-item head-word and tail-word probabilities;
  • a selection module, used to select, from the continuous text segments, the one with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and runnable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • selecting the continuous text segment with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • One or more non-volatile readable storage media storing computer-readable instructions.
  • When executed, the computer-readable instructions perform the following steps:
  • selecting the continuous text segment with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • FIG. 1 is a flowchart of a big data-based zero-reference resolution method in an embodiment of the present application;
  • FIG. 2 is a flowchart of step S101 in a big data-based zero-reference resolution method in another embodiment of the present application;
  • FIG. 3 is a flowchart of step S103 in a big data-based zero-reference resolution method in another embodiment of the present application;
  • FIG. 4 is a flowchart of step S104 in a big data-based zero-reference resolution method in another embodiment of the present application;
  • FIG. 5 is a flowchart of step S105 in a big data-based zero-reference resolution method in another embodiment of the present application;
  • FIG. 6 is a functional block diagram of a big data-based zero-reference resolution apparatus in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a computer device in an embodiment of the present application.
  • The big data-based zero-reference resolution method provided by the embodiments of the present application is applied on a server.
  • The server can be an independent server or a server cluster composed of multiple servers.
  • In an embodiment, a big data-based zero-reference resolution method is provided, including the following steps:
  • Step S101: obtain the sentence to be resolved and its preceding information, and vectorize the sentence to be resolved and its preceding information to obtain the context vector representation of each word in the sentence to be resolved and of each word in the preceding information.
  • The preceding information refers to the text before the sentence to be resolved in its passage, and may be one or more sentences preceding it.
  • The sentence to be resolved may be text typed by the customer during the current consultation chat, or the text obtained by converting the customer's speech.
  • The preceding information may be the full text of the customer's consultation chat.
  • Each word corresponds to one context vector representation.
  • The context vector representation is the feature vector of each word.
  • By vectorizing the sentence to be resolved and the preceding information, this embodiment obtains the context vector representation of each word; through the word-embedding method and the BERT model it introduces the correlations between words, making each word's context vector representation more accurate while reducing its feature dimensionality.
  • The vectorization in step S101, obtaining the context vector representation of each word in the sentence to be resolved and of each word in the preceding information, includes:
  • Step S201: represent each word in the sentence to be resolved and in the preceding information in one-hot form, obtaining a high-dimensional discrete word representation matrix for the sentence to be resolved and one for the preceding information.
  • This embodiment uses the one-hot form to convert the sentence to be resolved and the preceding information into a mathematical representation, obtaining the high-dimensional discrete word representation matrix corresponding to each.
  • The dictionary contains at least all the words in the sentence to be resolved, and each word is assigned a number.
  • Each word in the sentence to be resolved is converted into a one-hot vector in which the position given by the word's number in the dictionary is 1.
  • The processing logic for the preceding information is the same and will not be repeated here.
  • The one-hot representation is very intuitive.
  • However, the length of each word's one-hot vector equals the length of the dictionary: if the dictionary contains 10,000 words, each word's one-hot form is a 1×10000 vector in which only one position is 1 and the rest are 0, which wastes space and hinders computation; moreover, the one-hot form cannot reflect relationships between words.
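The one-hot construction of step S201 can be sketched in a few lines of Python (the dictionary and sentences below are illustrative toy values, not from the application):

```python
def build_dictionary(sentences):
    """Assign each distinct character a number, in order of first appearance."""
    index = {}
    for sentence in sentences:
        for char in sentence:
            if char not in index:
                index[char] = len(index)
    return index

def one_hot(sentence, index):
    """Convert each character into a 1 x len(dictionary) vector whose only 1
    sits at the position of the character's number in the dictionary."""
    matrix = []
    for char in sentence:
        row = [0] * len(index)
        row[index[char]] = 1
        matrix.append(row)
    return matrix

# Toy dictionary built from two short sentences: each one-hot vector is as
# long as the dictionary, with exactly one 1 and the rest 0.
vocab = build_dictionary(["他去了学校", "她也去了"])
matrix = one_hot("他去了", vocab)
```

As the surrounding text notes, with a 10,000-word dictionary each row would be a 1×10000 vector, which motivates the dense embedding of the next step.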
  • This embodiment therefore further reduces the dimensionality of the two high-dimensional discrete word representation matrices and introduces correlations between words.
  • Step S202: use a word-embedding method to embed the high-dimensional discrete word representation matrix of the sentence to be resolved and that of the preceding information into low-dimensional dense representation matrices.
  • This embodiment adopts the word2vec word-embedding method.
  • A preset shallow neural network is trained; through it the dense feature vector of each word is learned, yielding word vector representations that reflect the relationship between any two words.
  • Traverse the one-hot form of each word in the sentence to be resolved, transform each word into its dense feature vector through the trained shallow neural network, and combine the dense feature vectors of all the words to obtain the result.
  • The processing logic for the preceding information is the same, and will not be repeated here.
  • The dense feature vector assigns each word a fixed-length vector representation.
  • The fixed length can be set as needed, for example 300, which is far smaller than the dictionary length used in the one-hot form; and the relationship between two words can be expressed by the angle between their vectors, computed with a simple cosine function. Thus this embodiment introduces correlations between words through the dense feature vectors while reducing the feature dimensionality of the sentence to be resolved and the preceding information.
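The "simple cosine function" mentioned above measures the angle between two dense word vectors. A minimal sketch (the 4-dimensional embeddings are toy values; a real word2vec model would learn, e.g., 300-dimensional vectors from a corpus):

```python
import math

def cosine_similarity(u, v):
    """Relatedness of two word vectors as the cosine of the angle between them."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative embeddings: related words point in similar directions.
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.85, 0.75, 0.2, 0.1]
apple = [0.1, 0.2, 0.9, 0.8]

sim_related = cosine_similarity(king, queen)
sim_unrelated = cosine_similarity(king, apple)
```

Unlike one-hot vectors, whose pairwise dot product is always 0, these dense vectors let semantically close words score higher than unrelated ones.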
  • Step S203: input the low-dimensional dense representation matrices of the sentence to be resolved and the preceding information into a preset BERT model for bidirectional encoding, obtaining the context vector representation of each word in the sentence to be resolved and in the preceding information.
  • Here the context vector representation is a feature vector that incorporates the text both before and after the word.
  • The BERT model performs deep bidirectional encoding of each word vector representation, turning a word vector that originally contains no context information into a context vector that incorporates the text before and after the word.
  • The BERT model maps the input low-dimensional dense representation matrix into a hidden space through 24 Transformer-block transformation modules.
  • Each Transformer-block consists of a multi-head attention module, a residual connection, layer normalization, and a feedforward neural network.
  • The low-dimensional dense representation matrix can thereby learn the interaction information between contexts, with position encoding added, so that each word vector in the hidden space produced by the BERT model is a context-based vector representation.
  • Each word corresponds to one context vector representation, which makes each word's vector representation more accurate.
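The context mixing performed inside each Transformer-block can be illustrated with a minimal single-head scaled dot-product self-attention pass (pure Python; the multi-head split, residual connection, layer normalization, and feedforward sub-layers named above are omitted for brevity, and the vectors are toy values):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention: each output vector is a
    weighted mix of ALL input vectors, so every position sees its context."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(mixed)
    return outputs

# Three toy word vectors; after attention each row depends on the whole sequence.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(words)
```

This is why each word vector in the hidden space becomes context-based: the output at every position is a convex combination of all input positions.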
  • Step S102: input the context vector representation of each word in the sentence to be resolved and in the preceding information into a bidirectional long short-term memory (BiLSTM) network to enhance each word's contextual expression and position information, obtaining the enhanced context vector representation of each word.
  • The BERT model uses multi-head attention to let the low-dimensional dense representation matrix learn interaction information between contexts, but position-encoded information is only added at the input; after 24 Transformer-block layers the final output carries insufficient position information, and the contextual expression and position information are weak.
  • This embodiment therefore inputs the context vector representation of each word in the sentence to be resolved and in the preceding information into a bidirectional LSTM network, which directly learns the dependency relations between words.
  • When the context vectors of the words are fed in following their order in the sentence to be resolved, the network's strong position-encoding ability further enhances each word's contextual expression and position information, yielding the enhanced context vector representation.
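The bidirectional pass can be sketched with a simplified recurrent cell (a plain tanh cell with fixed toy weights stands in for the LSTM's gated cell; the point is only that forward and backward states are concatenated onto each word's vector, as the BiLSTM does):

```python
import math

def rnn_pass(vectors, reverse=False):
    """Run a minimal tanh recurrent cell over the sequence; each hidden state
    accumulates everything read so far, which encodes word order."""
    seq = list(reversed(vectors)) if reverse else vectors
    hidden, states = 0.0, []
    for v in seq:
        hidden = math.tanh(0.5 * hidden + 0.5 * sum(v))
        states.append(hidden)
    return list(reversed(states)) if reverse else states

def bidirectional(vectors):
    """Concatenate forward and backward states onto each word's vector, so
    every word sees both its left and right context."""
    fwd = rnn_pass(vectors)
    bwd = rnn_pass(vectors, reverse=True)
    return [vec + [f, b] for vec, f, b in zip(vectors, fwd, bwd)]

# Toy 2-dimensional context vectors for three words.
words = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
enhanced = bidirectional(words)
```

Because the inputs are consumed in sentence order, the recurrent states carry position information that the attention-only encoding lacked.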
  • Step S103: traverse the enhanced context vector representation of each word, and predict each word's anaphoric-item head-word probability and anaphoric-item tail-word probability from the parameter vectors in the BERT model.
  • The BERT model obtains the context vector representation of each word in the sentence to be resolved and in the preceding information by establishing two parameter vectors to be learned.
  • The two parameter vectors to be learned are the head-word parameter vector and the tail-word parameter vector; these are the parameter vectors in the BERT model.
  • This embodiment further uses these parameter vectors to predict each word's anaphoric-item head-word probability and anaphoric-item tail-word probability.
  • Traversing the enhanced context vector representation of each word in step S103 and predicting each word's anaphoric-item head-word and tail-word probabilities from the parameter vectors in the BERT model includes:
  • Step S301: obtain the head-word parameter vector and the tail-word parameter vector from the BERT model.
  • The head-word and tail-word parameter vectors are two randomly initialized vectors in the BERT model, learned continually by optimizing the objective function.
  • Step S302: perform a dot-product operation between each word's enhanced context vector representation and the head-word parameter vector, and apply Softmax to the results to obtain each word's anaphoric-item head-word probability.
  • Each word is traversed, and the dot product of that word's enhanced context vector representation with the head-word parameter vector is calculated, giving one dot product per word.
  • This embodiment then normalizes these dot products through the Softmax function, converting them into relative probabilities; each word's normalized value is its anaphoric-item head-word probability.
  • Step S303: perform a dot-product operation between each word's enhanced context vector representation and the tail-word parameter vector, and apply Softmax to the results to obtain each word's anaphoric-item tail-word probability.
  • The calculation of the tail-word probability is the same as that of the head-word probability.
  • Each word is traversed, and the dot product of that word's enhanced context vector representation with the tail-word parameter vector is calculated, giving one dot product per word.
  • This embodiment then normalizes these dot products through the Softmax function, converting them into relative probabilities; each word's normalized value is its anaphoric-item tail-word probability.
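Steps S302 and S303 amount to scoring every word against a learned parameter vector and normalizing with Softmax. A minimal sketch, in which the parameter vectors are illustrative stand-ins for the randomly initialized, learned vectors of step S301:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def position_probabilities(word_vectors, param_vector):
    """Dot each word's enhanced vector with the head-word (or tail-word)
    parameter vector, then Softmax over all words into relative probabilities."""
    scores = [sum(a * b for a, b in zip(w, param_vector)) for w in word_vectors]
    return softmax(scores)

# Toy enhanced context vectors for four words, plus illustrative head-word
# and tail-word parameter vectors (learned in the real model).
words = [[0.2, 0.1], [0.9, 0.3], [0.1, 0.8], [0.4, 0.4]]
head_param = [1.0, 0.0]
tail_param = [0.0, 1.0]

head_probs = position_probabilities(words, head_param)
tail_probs = position_probabilities(words, tail_param)
```

Each word thus receives one head-word probability and one tail-word probability, and each distribution sums to 1 across the words.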
  • Step S104: traverse each word to construct continuous text segments, and calculate each continuous text segment's anaphoric-item probability from the words' anaphoric-item head-word and tail-word probabilities.
  • A continuous text segment is the continuous stretch between a word taken as the anaphoric item's head word and another word taken as its tail word.
  • The anaphoric-item probability of a continuous text segment is the product of the head word's anaphoric-item head-word probability and the tail word's anaphoric-item tail-word probability.
  • Traversing each word to construct continuous text segments and calculating their anaphoric-item probabilities from each word's head-word and tail-word probabilities includes:
  • Step S401: traverse each word; take the word as the anaphoric item's head word, and take that word and each subsequent word in turn as the anaphoric item's tail word, constructing continuous text segments.
  • A continuous text segment can be a single character, a word, a sentence, or a paragraph, providing a new way to build the set of anaphoric-item candidates: any segment of the preceding information serves as a candidate, with no need to build a candidate set manually, which effectively expands the range of anaphoric-item candidates.
  • Step S402: calculate the product of the head-word probability at the beginning of the continuous text segment and the tail-word probability at its end, obtaining the continuous text segment's anaphoric-item probability.
  • This embodiment calculates the anaphoric-item probability of a continuous text segment from its beginning and end: the probability is the product of the first word's anaphoric-item head-word probability and the last word's anaphoric-item tail-word probability.
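Steps S401 and S402 enumerate every continuous segment (head index ≤ tail index) and score it as the product just described. A minimal sketch over toy per-word probabilities (in practice these come from the Softmax step):

```python
def span_probabilities(head_probs, tail_probs):
    """Every word can open a segment; that word and any later word can close
    it. A segment (i, j) scores head_probs[i] * tail_probs[j]."""
    spans = {}
    n = len(head_probs)
    for i in range(n):
        for j in range(i, n):
            spans[(i, j)] = head_probs[i] * tail_probs[j]
    return spans

# Toy head-word and tail-word probabilities for a four-word text.
head_probs = [0.1, 0.6, 0.1, 0.2]
tail_probs = [0.1, 0.2, 0.5, 0.2]

spans = span_probabilities(head_probs, tail_probs)
best = max(spans, key=spans.get)
```

For n words this yields n(n+1)/2 candidate segments, which is why the set of candidates is far larger than a handful of hand-picked noun phrases.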
  • Step S105: select, from the continuous text segments, the one with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • Selecting the continuous text segment with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved in step S105 includes:
  • Step S501: filter out, from the continuous text segments, those that intersect the sentence to be resolved.
  • Such segments are deleted to complete the filtering. Specifically, it can be determined whether the head and/or tail word of a continuous text segment falls within the sentence to be resolved; segments whose head and/or tail word falls within the sentence are deleted, ensuring that the head and tail words of the retained segments are not in the current sentence to be resolved. Since the number of continuous text segments is very large, and an anaphoric item normally does not overlap the sentence to be resolved, this embodiment removes clearly non-anaphoric segments by deleting those that overlap the sentence, reducing the candidate segments, improving the efficiency of selecting the anaphoric item of the sentence to be resolved, and thereby improving the efficiency of zero-reference resolution.
  • Step S502: select, from the filtered continuous text segments, the one with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • The continuous text segment with the largest anaphoric-item probability among the retained segments is selected as the anaphoric item of the sentence to be resolved.
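Steps S501 and S502 can be sketched as dropping any segment whose head or tail index falls inside the sentence to be resolved, then taking the most probable remaining segment (indices are positions in the concatenated preceding text plus sentence; the candidate scores are toy values):

```python
def resolve(spans, sentence_start):
    """Keep only segments lying entirely before the sentence to be resolved
    (an anaphoric item normally does not overlap it), then pick the most
    probable remaining segment."""
    kept = {span: p for span, p in spans.items()
            if span[0] < sentence_start and span[1] < sentence_start}
    return max(kept, key=kept.get)

# Toy candidate segments over a text whose last two positions (4 and 5)
# form the sentence to be resolved; (3, 5) is dropped despite its high
# score because its tail falls inside that sentence.
spans = {(0, 1): 0.2, (1, 2): 0.3, (3, 5): 0.9, (4, 5): 0.8}
antecedent = resolve(spans, sentence_start=4)
```

The filtering both prevents the sentence from "answering itself" and shrinks the candidate pool, matching the efficiency argument above.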
  • The embodiments of the present application provide a new method of determining anaphoric items. Based on an extractive reading-comprehension model, every continuous segment of the preceding text can serve as a candidate anaphoric item, without using rules to pre-build a candidate set; the number and coverage of candidates are therefore greater. This effectively avoids the recognition failures that occur in the prior art when the candidate set does not contain the correct anaphoric item, solves the problem that existing zero-reference resolution methods depend too heavily on the candidate set of anaphoric items, and improves the accuracy and reliability of the zero-reference resolution results.
  • In an embodiment, a big data-based zero-reference resolution apparatus is provided, corresponding one-to-one to the big data-based zero-reference resolution method of the above embodiments.
  • The big data-based zero-reference resolution apparatus includes a vectorization module 61, an enhancement module 62, a prediction module 63, a construction module 64, and a selection module 65.
  • The detailed description of each functional module is as follows:
  • the vectorization module 61 is used to obtain the sentence to be resolved and its preceding information, and to vectorize both, obtaining the context vector representation of each word in the sentence to be resolved and of each word in the preceding information;
  • the enhancement module 62 is used to input the context vector representation of each word in the sentence to be resolved and in the preceding information into the bidirectional long short-term memory network to enhance each word's contextual expression and position information, obtaining the enhanced context vector representation of each word;
  • the prediction module 63 is used to traverse the enhanced context vector representation of each word and predict each word's anaphoric-item head-word and tail-word probabilities from the parameter vectors in the BERT model;
  • the construction module 64 is used to traverse each word, construct continuous text segments, and calculate their anaphoric-item probabilities from each word's anaphoric-item head-word and tail-word probabilities;
  • the selection module 65 is configured to select, from the continuous text segments, the one with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • The vectorization module 61 includes:
  • a representation unit, used to represent each word of the sentence to be resolved and the preceding information in one-hot form, obtaining the high-dimensional discrete word representation matrix corresponding to each;
  • an embedding unit, configured to use a word-embedding method to embed the two high-dimensional discrete word representation matrices into low-dimensional dense representation matrices;
  • an encoding unit, used to input the low-dimensional dense representation matrices of the sentence to be resolved and the preceding information into a preset BERT model for bidirectional encoding, obtaining the context vector representation of each word in both.
  • The prediction module 63 includes:
  • an acquisition unit, used to obtain the head-word parameter vector and the tail-word parameter vector from the BERT model;
  • a head-word probability calculation unit, configured to perform a dot-product operation between each word's enhanced context vector representation and the head-word parameter vector and apply Softmax to the results, obtaining each word's anaphoric-item head-word probability;
  • a tail-word probability calculation unit, configured to perform a dot-product operation between each word's enhanced context vector representation and the tail-word parameter vector and apply Softmax to the results, obtaining each word's anaphoric-item tail-word probability.
  • The construction module 64 includes:
  • a construction unit, used to traverse each word, take the word as the anaphoric item's head word, and take that word and each subsequent word as the anaphoric item's tail word, constructing continuous text segments;
  • a calculation unit, used to calculate the product of the head-word probability at the beginning and the tail-word probability at the end of a continuous text segment, obtaining its anaphoric-item probability.
  • The selection module 65 includes:
  • a filtering unit, configured to filter out, from the continuous text segments, those that intersect the sentence to be resolved;
  • a selecting unit, configured to select, from the filtered continuous text segments, the one with the largest anaphoric-item probability as the anaphoric item of the sentence to be resolved.
  • The various modules of the above big data-based zero-reference resolution apparatus can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above modules may be embedded in hardware in, or independent of, the processor of the computer device, or stored in software in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, a big data-based zero-reference resolution method is realized.
  • A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and runnable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
  • a continuous text segment with the largest anaphoric item probability is selected as the anaphoric item of the sentence to be resolved.
  • one or more non-volatile readable storage media storing computer readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • a continuous text segment with the largest anaphoric item probability is selected as the anaphoric item of the sentence to be resolved.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A big-data-based zero-anaphora resolution method, the method comprising: obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context (S101); inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character (S102); traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model (S103); traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities (S104); and selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved (S105). The method solves the problems that existing zero-anaphora resolution techniques rely too heavily on a candidate antecedent set and produce inaccurate, unstable resolution results.

Description

Big-data-based zero-anaphora resolution method, apparatus, device, and medium
This application claims priority to Chinese patent application No. 202010099118.X, filed with the China National Intellectual Property Administration on February 18, 2020 and entitled "Big-data-based zero-anaphora resolution method, apparatus, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information technology, and in particular to a big-data-based zero-anaphora resolution method, apparatus, device, and medium.
 
Background
Anaphora resolution is one of the longest-studied techniques in the natural language field and has very broad application scenarios. In customer-service bots, dialogue bots, and intelligent outbound-call platforms, anaphora resolution is one of the most central technologies. Anaphora resolution comprises two parts: zero-anaphora resolution and coreference resolution.
In pro-drop languages such as Chinese, elements that can be inferred from the context are often omitted, yet the omitted element still fills a syntactic role in the sentence and refers back to some linguistic unit in the preceding text. The omitted element is called a zero anaphor. Zero-anaphora resolution is the task of finding, for each zero anaphor, the corresponding linguistic unit in the preceding text. The task is usually further divided into two subtasks: zero-anaphor position detection and resolution.
The resolution subtask aims, on the basis of the position-detection results, to identify the concrete antecedent of each zero anaphor that has one in the preceding text. Traditional resolution models typically first construct a candidate antecedent set, and then use classification or ranking to select the most likely candidate from the set as the final result. The candidate set is usually built from the maximal noun phrases and modifying noun phrases in the two sentences preceding the zero anaphor. The inventors realized that the accuracy of this approach depends heavily on the accuracy of the candidate set: if the set does not contain the correct antecedent, the subsequent identification is bound to fail. Because the candidate set consists of only a few simple noun phrases, traditional resolution methods suffer from high instability and low accuracy.
Therefore, finding a way to solve the problems of existing zero-anaphora resolution techniques, namely over-reliance on the candidate antecedent set and low, unstable resolution accuracy, has become an urgent technical problem for those skilled in the art.
 
Summary
The embodiments of this application provide a big-data-based zero-anaphora resolution method, apparatus, device, and medium, to solve the problems that existing zero-anaphora resolution techniques rely too heavily on a candidate antecedent set and produce inaccurate, unstable resolution results.
A big-data-based zero-anaphora resolution method includes:
obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
A big-data-based zero-anaphora resolution apparatus includes:
a vectorization module, configured to obtain a sentence to be resolved and its preceding context, and perform vectorization on them to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
an enhancement module, configured to input the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
a prediction module, configured to traverse the enhanced contextual vector representation of each character and predict each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
a construction module, configured to traverse each character, construct contiguous text spans, and compute the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
a selection module, configured to select, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
One or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
Details of one or more embodiments of this application are set forth in the drawings and the description below; other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
 
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are introduced briefly below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a big-data-based zero-anaphora resolution method in an embodiment of this application;
Fig. 2 is a flowchart of step S101 of the big-data-based zero-anaphora resolution method in another embodiment of this application;
Fig. 3 is a flowchart of step S103 of the big-data-based zero-anaphora resolution method in another embodiment of this application;
Fig. 4 is a flowchart of step S104 of the big-data-based zero-anaphora resolution method in another embodiment of this application;
Fig. 5 is a flowchart of step S105 of the big-data-based zero-anaphora resolution method in another embodiment of this application;
Fig. 6 is a schematic block diagram of a big-data-based zero-anaphora resolution apparatus in an embodiment of this application;
Fig. 7 is a schematic diagram of a computer device in an embodiment of this application.
 
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
The big-data-based zero-anaphora resolution method provided by the embodiments of this application is applied to a server. The server may be implemented as a standalone server or a cluster of multiple servers. In one embodiment, as shown in Fig. 1, a big-data-based zero-anaphora resolution method is provided, including the following steps:
In step S101, a sentence to be resolved and its preceding context are obtained, and vectorization is performed on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context.
Here, the preceding context is the text before the paragraph containing the sentence to be resolved, and may be one or more sentences preceding that sentence. In a customer-service-bot scenario, the sentence to be resolved may be the text the customer types in the current query or chat, or the transcript produced by speech-to-text. The context information may be all the text of the customer's queries and chat.
Each character corresponds to one contextual vector representation, which is that character's feature vector. As a preferred example of this application, the embodiments vectorize the sentence to be resolved and its preceding context to obtain the contextual vector representation of each character, and use a word-embedding method together with a BERT model to introduce character-to-character correlations, which makes each character's contextual vector representation more accurate while reducing its feature dimensionality. Optionally, as shown in Fig. 2, the vectorization described in step S101, which yields a contextual vector representation for each character in the sentence to be resolved and in the preceding context, includes:
In step S201, each character in the sentence to be resolved and its preceding context is represented in one-hot form, yielding a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context.
Here, the embodiments use the one-hot form to turn the sentence to be resolved and its preceding context into a mathematical representation, obtaining the corresponding high-dimensional discrete character representation matrices. Specifically, a dictionary is built in advance that contains at least all the characters in the sentence to be resolved, each character being assigned an index. When encoding the sentence, each of its characters is converted into the one-hot form in which the position corresponding to that character's dictionary index is 1. The same logic applies to the preceding context and is not repeated here. The one-hot representation is very intuitive: the length of each character's one-hot vector equals the dictionary length. If the dictionary contains 10,000 characters, each character's one-hot form is a 1x10000 vector with a single 1 and zeros elsewhere, which wastes space and is unfriendly to computation; moreover, the relations between characters cannot be expressed in one-hot form.
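The one-hot scheme described above can be sketched in a few lines of Python. This is a minimal illustration only; the sample sentence and the first-appearance dictionary are invented for the example, not taken from the application:

```python
def build_dictionary(text):
    """Assign each distinct character an integer index, in order of first appearance."""
    index = {}
    for ch in text:
        if ch not in index:
            index[ch] = len(index)
    return index

def one_hot(ch, index):
    """Return the one-hot vector of a character: all zeros except a 1 at its dictionary index."""
    vec = [0] * len(index)
    vec[index[ch]] = 1
    return vec

sentence = "he bought it"
index = build_dictionary(sentence)
# high-dimensional discrete character representation matrix: one row per character
matrix = [one_hot(ch, index) for ch in sentence]
```

Each row is as long as the dictionary and contains a single 1, which shows concretely why the representation is sparse, wasteful, and unable to express relations between characters.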
In view of this, the embodiments further reduce the dimensionality of the high-dimensional discrete character representation matrices of the sentence to be resolved and its preceding context, and introduce character-to-character correlations.
In step S202, a word-embedding method is used to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices.
Optionally, the embodiments use the word2vec method among word-embedding methods. Specifically, a preset shallow neural network is trained to learn a dense feature vector for each character, producing character vector representations that can reflect the relation between any two characters. The one-hot form of each character in the sentence to be resolved is then traversed, the trained shallow network converts each character into its corresponding dense feature vector, and the dense feature vectors of all characters are combined to obtain the low-dimensional dense representation matrix of the sentence to be resolved. The same logic applies to the context information and is not repeated here.
Here, the dense feature vector assigns each character a fixed-length vector representation; the length is configurable, for example 300, far smaller than the dictionary length of the one-hot form. Moreover, the relation between two characters can be expressed by the angle between their vectors, specifically through a simple cosine function. Thus, this embodiment introduces character-to-character correlation through dense feature vectors and reduces the feature dimensionality of the sentence to be resolved and its preceding context.
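The angle-based relatedness mentioned here is the cosine similarity between two dense vectors. A minimal sketch follows; the three 3-dimensional vectors are toy values chosen for illustration, not trained embeddings:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two dense vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

king = [0.9, 0.1, 0.4]    # toy "dense feature vectors"
queen = [0.8, 0.2, 0.5]
table = [-0.7, 0.9, -0.2]

# related items point in similar directions (cosine near 1), unrelated ones do not
similar = cosine(king, queen)
dissimilar = cosine(king, table)
```

Unlike the one-hot form, this gives a graded notion of relatedness at a fraction of the dimensionality.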
In step S203, the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context are input into a preset BERT model for bidirectional encoding, yielding the contextual vector representation of each character in the sentence to be resolved and its preceding context.
Here, the contextual vector representation is a feature vector that incorporates both preceding and following context. The BERT model performs deep bidirectional encoding on each character vector, turning character vectors that originally carry no context into contextual vectors that incorporate the character's preceding and following text. Specifically, the BERT model maps the input low-dimensional dense representation matrix into a latent space through 24 Transformer blocks, each of which consists of a multi-head attention module, a residual connection, layer normalization, and a feed-forward neural network. In the multi-head attention module, the low-dimensional dense representation matrix learns the interaction information between contexts, and positional encodings are added, so that each character vector in the latent space produced by the BERT model is a context-dependent vector representation. Each character corresponds to one contextual vector representation, making each character's vector representation more accurate.
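The multi-head attention at the heart of each Transformer block is built from scaled dot-product attention. The toy sketch below (a single head, invented 2-dimensional vectors, no learned projection matrices) shows the mechanism by which each position's output becomes a context-weighted mixture of all positions:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key, the scores
    are softmax-normalized, and the output is the weighted mix of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# self-attention over a toy sequence of three 2-dimensional character vectors
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(x, x, x)  # each position now mixes in every other position
```

Self-attention alone is order-invariant, which is why the model must add positional encodings at the input, and why the embodiment later reinforces positional information with a bidirectional LSTM.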
In step S102, the contextual vector representation of each character in the sentence to be resolved and the preceding context is input into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, yielding an enhanced contextual vector representation of each character.
As noted above, the BERT model makes the low-dimensional dense representation matrix learn contextual interactions through multi-head attention and injects positional encodings only at the input; after 24 Transformer blocks, the final output is somewhat deficient in positional information, with weak contextual expression and position signals. To address this, the embodiments input the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional LSTM network, which learns character-to-character dependencies directly. When each character's contextual vector representation is fed in according to its order in the sentence to be resolved, it acquires a strong positional encoding capability, which further strengthens each character's contextual expression and positional information and yields the enhanced contextual vector representation.
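A bidirectional LSTM runs one pass left-to-right and one pass right-to-left and pairs the two hidden states at each position. The sketch below shows the mechanics with a single scalar-state cell and hand-picked weights; the weight values are invented for illustration and are not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM cell step on a scalar input and scalar state.
    Gates: input i, forget f, output o, candidate g."""
    i = sigmoid(w["wi"] * x + w["ui"] * h)
    f = sigmoid(w["wf"] * x + w["uf"] * h)
    o = sigmoid(w["wo"] * x + w["uo"] * h)
    g = math.tanh(w["wg"] * x + w["ug"] * h)
    c = f * c + i * g         # new cell state
    h = o * math.tanh(c)      # new hidden state
    return h, c

def lstm_run(seq, w):
    """Run the cell over a sequence, collecting the hidden state at each step."""
    h, c, outs = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, w)
        outs.append(h)
    return outs

def bilstm(seq, w):
    """Pair forward and backward hidden states per position, so each position's
    representation carries both left and right context plus order information."""
    fwd = lstm_run(seq, w)
    bwd = list(reversed(lstm_run(list(reversed(seq)), w)))
    return list(zip(fwd, bwd))

# toy hand-initialized gate weights and a 3-step sequence of scalar features
w = {"wi": 0.5, "ui": 0.1, "wf": 0.4, "uf": 0.2,
     "wo": 0.6, "uo": 0.1, "wg": 0.9, "ug": 0.3}
enhanced = bilstm([0.2, -0.1, 0.7], w)  # one (forward, backward) pair per character
```

Because the hidden state at step t depends on every earlier (or, in the backward pass, later) step, the concatenated output encodes order directly, which is exactly the positional reinforcement the embodiment relies on.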
In step S103, the enhanced contextual vector representation of each character is traversed, and each character's antecedent head-character probability and antecedent tail-character probability are predicted from parameter vectors in the BERT model.
In step S203 above, the BERT model establishes two learnable parameter vectors in the course of producing the contextual vector representation of each character in the sentence to be resolved and its preceding context. These two learnable parameter vectors are the head-character parameter vector and the tail-character parameter vector, which are parameter vectors in the BERT model. The embodiments further use these parameter vectors in the BERT model to predict each character's antecedent head-character probability and antecedent tail-character probability. Optionally, as shown in Fig. 3, the traversal and prediction described in step S103 include:
In step S301, the head-character parameter vector and the tail-character parameter vector in the BERT model are obtained.
Here, both the head-character parameter vector and the tail-character parameter vector are randomly initialized vectors in the BERT model, which can be learned continuously by optimizing the objective function.
In step S302, a dot product is computed between each character's enhanced contextual vector representation and the head-character parameter vector, and softmax is applied to the dot-product results to obtain each character's antecedent head-character probability.
After the head-character parameter vector is obtained, each character is traversed and the dot product between that character's enhanced contextual vector representation and the head-character parameter vector is computed. In the BERT model there are multiple head-character parameter vectors, each corresponding to one dot product, so each character yields multiple dot products. This embodiment further applies the softmax function to these dot products to convert them into relative probabilities, and selects the maximum among them as the character's antecedent head-character probability.
In step S303, a dot product is computed between each character's enhanced contextual vector representation and the tail-character parameter vector, and softmax is applied to the dot-product results to obtain each character's antecedent tail-character probability.
The tail-character probability is computed in the same way as the head-character probability. After the tail-character parameter vectors are obtained, each character is traversed and the dot product between that character's enhanced contextual vector representation and the tail-character parameter vector is computed. In the BERT model there are multiple tail-character parameter vectors, each corresponding to one dot product, so each character yields multiple dot products. This embodiment further applies the softmax function to these dot products to convert them into relative probabilities, and selects the maximum among them as the character's antecedent tail-character probability.
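Steps S302 and S303 both reduce to the same computation: dot products against parameter vectors, softmax into relative probabilities, then take the maximum. A sketch under toy values follows; the character vectors and parameter vectors are illustrative only, not learned BERT parameters:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def head_or_tail_probability(char_vec, param_vecs):
    """Dot the character's enhanced vector against every parameter vector,
    softmax the scores into relative probabilities, and keep the maximum.
    The same routine serves both the head-character and tail-character case."""
    scores = [dot(char_vec, p) for p in param_vecs]
    return max(softmax(scores))

# toy enhanced contextual vectors for three characters, and two head parameter vectors
chars = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
head_params = [[1.0, 0.0], [0.0, 1.0]]
head_probs = [head_or_tail_probability(c, head_params) for c in chars]
```

With two parameter vectors, the softmax maximum is always at least 0.5, and it grows as the character vector aligns more strongly with one parameter vector than the other.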
In step S104, each character is traversed, contiguous text spans are constructed, and the antecedent probability of each contiguous text span is computed from the characters' antecedent head-character probabilities and antecedent tail-character probabilities.
Here, for each character, a contiguous text span is the contiguous segment between that character, taken as the antecedent head character, and another character taken as the antecedent tail character. The span's antecedent probability is the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability. Optionally, as shown in Fig. 4, the traversal, construction, and computation described in step S104 include:
In step S401, each character is traversed, and contiguous text spans are constructed with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character.
Here, a contiguous text span may be a single character, a word, a sentence, or a passage, which provides a new way of creating the candidate antecedent set: any span in the preceding context of the sentence to be resolved can serve as a candidate antecedent, with no need to build the candidate set manually, effectively extending the range of candidate antecedents.
In step S402, the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability is computed for each contiguous text span, obtaining the span's antecedent probability.
For each contiguous span, this embodiment computes the span's antecedent probability from the span's head character and tail character. The antecedent probability equals the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability; the larger this product, the larger the span's antecedent probability, and vice versa.
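Steps S401 and S402 amount to enumerating every span with head index less than or equal to tail index and scoring it by the product of the head's head probability and the tail's tail probability. A minimal sketch with made-up per-character probabilities:

```python
def build_spans(head_probs, tail_probs):
    """Enumerate all contiguous spans (start, end) with start <= end and
    score each one by head_probs[start] * tail_probs[end]."""
    spans = {}
    n = len(head_probs)
    for start in range(n):
        for end in range(start, n):
            spans[(start, end)] = head_probs[start] * tail_probs[end]
    return spans

# made-up head/tail probabilities over a 4-character context
head_probs = [0.1, 0.7, 0.2, 0.3]
tail_probs = [0.2, 0.1, 0.8, 0.4]
spans = build_spans(head_probs, tail_probs)
best = max(spans, key=spans.get)  # span with the largest antecedent probability
```

Note that a text of n characters yields n(n+1)/2 candidate spans, which is why the filtering in step S501 matters for efficiency.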
In step S105, the contiguous text span with the highest antecedent probability is selected from the contiguous text spans as the antecedent of the sentence to be resolved.
Optionally, as shown in Fig. 5, the selection described in step S105 includes:
In step S501, contiguous text spans that intersect the sentence to be resolved are filtered out.
Here, if any part of a contiguous text span overlaps the sentence to be resolved, that span is deleted, completing the filtering. Specifically, one may check whether the span's head character and/or tail character falls inside the sentence to be resolved, and delete any span whose head and/or tail character does, ensuring that neither the head nor the tail character of any retained span lies in the current sentence to be resolved. Since the number of contiguous text spans is very large and an antecedent normally does not overlap the sentence to be resolved, this embodiment deletes the spans that intersect the sentence, filtering out spans that are definitely not antecedents. This reduces the number of candidate spans, improves the efficiency of selecting the antecedent of the sentence to be resolved, and thereby improves the efficiency of zero-anaphora resolution.
In step S502, the span with the highest antecedent probability is selected from the filtered contiguous text spans as the antecedent of the sentence to be resolved.
Then, based on the antecedent probabilities computed in step S104, the span with the highest antecedent probability among the retained contiguous text spans is chosen as the antecedent of the sentence to be resolved.
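Steps S501 and S502 drop every span whose head or tail index falls inside the sentence to be resolved and take the arg-max over what remains. A sketch with invented span scores and sentence boundaries:

```python
def resolve(spans, sentence_range):
    """Filter out spans that intersect the sentence to be resolved (head and/or
    tail index inside it), then pick the highest-probability remaining span."""
    lo, hi = sentence_range  # character indices [lo, hi) of the sentence to be resolved
    kept = {(s, e): p for (s, e), p in spans.items()
            if not (lo <= s < hi or lo <= e < hi)}
    return max(kept, key=kept.get)

# toy scored spans over a 10-character text whose last 4 characters (indices 6..9)
# form the sentence to be resolved
spans = {(0, 2): 0.30, (3, 5): 0.45, (4, 8): 0.90, (7, 9): 0.80}
antecedent = resolve(spans, (6, 10))
```

The highest-scoring span (4, 8) is discarded because its tail lies inside the sentence, so the best non-overlapping span wins, which mirrors why the filtering must happen before the arg-max rather than after.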
The embodiments of this application provide a new way of determining antecedents. Based on an extractive reading-comprehension model, all contiguous spans in the preceding text can serve as candidate antecedents, with no need to pre-build a candidate antecedent set with rules, so the number and coverage of candidate antecedents are much greater. This effectively avoids the identification failure that occurs in the prior art when the candidate set does not contain the correct antecedent, solves the existing zero-anaphora resolution methods' over-reliance on the candidate antecedent set, and improves the accuracy and reliability of zero-anaphora resolution results.
 
It should be understood that the ordinal numbers of the steps in the above embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of this application.
 
In one embodiment, a big-data-based zero-anaphora resolution apparatus is provided, corresponding one-to-one to the big-data-based zero-anaphora resolution method in the above embodiments. As shown in Fig. 6, the apparatus includes a vectorization module 61, an enhancement module 62, a prediction module 63, a construction module 64, and a selection module 65. The functional modules are described in detail as follows:
the vectorization module 61, configured to obtain a sentence to be resolved and its preceding context, and perform vectorization on them to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
the enhancement module 62, configured to input the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
the prediction module 63, configured to traverse the enhanced contextual vector representation of each character and predict each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
the construction module 64, configured to traverse each character, construct contiguous text spans, and compute the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
the selection module 65, configured to select, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
Optionally, the vectorization module 61 includes:
a representation unit, configured to represent each character in the sentence to be resolved and its preceding context in one-hot form, obtaining a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context;
an embedding unit, configured to use a word-embedding method to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices;
an encoding unit, configured to input the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context into a preset BERT model for bidirectional encoding, obtaining the contextual vector representation of each character in the sentence to be resolved and its preceding context.
Optionally, the prediction module 63 includes:
an obtaining unit, configured to obtain the head-character parameter vector and the tail-character parameter vector in the BERT model;
a head-character probability computation unit, configured to compute a dot product between each character's enhanced contextual vector representation and the head-character parameter vector, and apply softmax to the dot-product results to obtain each character's antecedent head-character probability;
a tail-character probability computation unit, configured to compute a dot product between each character's enhanced contextual vector representation and the tail-character parameter vector, and apply softmax to the dot-product results to obtain each character's antecedent tail-character probability.
Optionally, the construction module 64 includes:
a construction unit, configured to traverse each character and construct contiguous text spans with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character;
a computation unit, configured to compute the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability of each contiguous text span, obtaining the span's antecedent probability.
Optionally, the selection module 65 includes:
a filtering unit, configured to filter out, from the contiguous text spans, those that intersect the sentence to be resolved;
a selection unit, configured to select, from the filtered contiguous text spans, the span with the highest antecedent probability as the antecedent of the sentence to be resolved.
For the specific limitations of the big-data-based zero-anaphora resolution apparatus, see the limitations of the big-data-based zero-anaphora resolution method above, which are not repeated here. The modules of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and perform the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, a big-data-based zero-anaphora resolution method is implemented.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; the processor implements the following steps when executing the computer-readable instructions:
obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
The computer-readable storage medium may be non-volatile or volatile.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware; the computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to a memory, storage, database, or other medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration rather than limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example; in practice, the above functions may be allocated to different functional units or modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.

Claims (20)

  1. A big-data-based zero-anaphora resolution method, comprising:
    obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
    inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
    traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
    traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
    selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  2. The big-data-based zero-anaphora resolution method according to claim 1, wherein performing vectorization on the sentence to be resolved and its preceding context to obtain the contextual vector representation of each character in the sentence to be resolved and the contextual vector representation of each character in the preceding context comprises:
    representing each character in the sentence to be resolved and its preceding context in one-hot form, obtaining a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context;
    using a word-embedding method to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices;
    inputting the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context into a preset BERT model for bidirectional encoding, obtaining the contextual vector representation of each character in the sentence to be resolved and its preceding context.
  3. The big-data-based zero-anaphora resolution method according to claim 1 or 2, wherein traversing the enhanced contextual vector representation of each character and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in the BERT model comprises:
    obtaining the head-character parameter vector and the tail-character parameter vector in the BERT model;
    computing a dot product between each character's enhanced contextual vector representation and the head-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent head-character probability;
    computing a dot product between each character's enhanced contextual vector representation and the tail-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent tail-character probability.
  4. The big-data-based zero-anaphora resolution method according to claim 3, wherein traversing each character, constructing contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities comprises:
    traversing each character, and constructing contiguous text spans with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character;
    computing the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability of each contiguous text span, obtaining the span's antecedent probability.
  5. The big-data-based zero-anaphora resolution method according to claim 4, wherein selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved comprises:
    filtering out, from the contiguous text spans, those that intersect the sentence to be resolved;
    selecting, from the filtered contiguous text spans, the span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  6. A big-data-based zero-anaphora resolution apparatus, comprising:
    a vectorization module, configured to obtain a sentence to be resolved and its preceding context, and perform vectorization on them to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
    an enhancement module, configured to input the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
    a prediction module, configured to traverse the enhanced contextual vector representation of each character and predict each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
    a construction module, configured to traverse each character, construct contiguous text spans, and compute the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
    a selection module, configured to select, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  7. The big-data-based zero-anaphora resolution apparatus according to claim 6, wherein the vectorization module comprises:
    a representation unit, configured to represent each character in the sentence to be resolved and its preceding context in one-hot form, obtaining a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context;
    an embedding unit, configured to use a word-embedding method to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices;
    an encoding unit, configured to input the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context into a preset BERT model for bidirectional encoding, obtaining the contextual vector representation of each character in the sentence to be resolved and its preceding context.
  8. The big-data-based zero-anaphora resolution apparatus according to claim 6 or 7, wherein the prediction module comprises:
    an obtaining unit, configured to obtain the head-character parameter vector and the tail-character parameter vector in the BERT model;
    a head-character probability computation unit, configured to compute a dot product between each character's enhanced contextual vector representation and the head-character parameter vector, and apply softmax to the dot-product results to obtain each character's antecedent head-character probability;
    a tail-character probability computation unit, configured to compute a dot product between each character's enhanced contextual vector representation and the tail-character parameter vector, and apply softmax to the dot-product results to obtain each character's antecedent tail-character probability.
  9. The big-data-based zero-anaphora resolution apparatus according to claim 8, wherein the construction module comprises:
    a construction unit, configured to traverse each character and construct contiguous text spans with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character;
    a computation unit, configured to compute the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability of each contiguous text span, obtaining the span's antecedent probability.
  10. The big-data-based zero-anaphora resolution apparatus according to claim 9, wherein the selection module comprises:
    a filtering unit, configured to filter out, from the contiguous text spans, those that intersect the sentence to be resolved;
    a selection unit, configured to select, from the filtered contiguous text spans, the span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
    obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
    inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
    traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
    traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
    selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  12. The computer device according to claim 11, wherein performing vectorization on the sentence to be resolved and its preceding context to obtain the contextual vector representation of each character in the sentence to be resolved and the contextual vector representation of each character in the preceding context comprises:
    representing each character in the sentence to be resolved and its preceding context in one-hot form, obtaining a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context;
    using a word-embedding method to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices;
    inputting the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context into a preset BERT model for bidirectional encoding, obtaining the contextual vector representation of each character in the sentence to be resolved and its preceding context.
  13. The computer device according to claim 11 or 12, wherein traversing the enhanced contextual vector representation of each character and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in the BERT model comprises:
    obtaining the head-character parameter vector and the tail-character parameter vector in the BERT model;
    computing a dot product between each character's enhanced contextual vector representation and the head-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent head-character probability;
    computing a dot product between each character's enhanced contextual vector representation and the tail-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent tail-character probability.
  14. The computer device according to claim 13, wherein traversing each character, constructing contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities comprises:
    traversing each character, and constructing contiguous text spans with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character;
    computing the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability of each contiguous text span, obtaining the span's antecedent probability.
  15. The computer device according to claim 14, wherein selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved comprises:
    filtering out, from the contiguous text spans, those that intersect the sentence to be resolved;
    selecting, from the filtered contiguous text spans, the span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  16. One or more non-volatile readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a sentence to be resolved and its preceding context, and performing vectorization on the sentence to be resolved and its preceding context to obtain a contextual vector representation of each character in the sentence to be resolved and a contextual vector representation of each character in the preceding context;
    inputting the contextual vector representation of each character in the sentence to be resolved and the preceding context into a bidirectional long short-term memory network to strengthen each character's contextual expression and positional information, obtaining an enhanced contextual vector representation of each character;
    traversing the enhanced contextual vector representation of each character, and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in a BERT model;
    traversing each character to construct contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities;
    selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved.
  17. The non-volatile readable storage medium according to claim 16, wherein performing vectorization on the sentence to be resolved and its preceding context to obtain the contextual vector representation of each character in the sentence to be resolved and the contextual vector representation of each character in the preceding context comprises:
    representing each character in the sentence to be resolved and its preceding context in one-hot form, obtaining a high-dimensional discrete character representation matrix for the sentence to be resolved and one for its preceding context;
    using a word-embedding method to embed the high-dimensional discrete character representation matrix of the sentence to be resolved and that of the preceding context into low-dimensional dense representation matrices;
    inputting the low-dimensional dense representation matrices of the sentence to be resolved and its preceding context into a preset BERT model for bidirectional encoding, obtaining the contextual vector representation of each character in the sentence to be resolved and its preceding context.
  18. The non-volatile readable storage medium according to claim 16 or 17, wherein traversing the enhanced contextual vector representation of each character and predicting each character's antecedent head-character probability and antecedent tail-character probability from parameter vectors in the BERT model comprises:
    obtaining the head-character parameter vector and the tail-character parameter vector in the BERT model;
    computing a dot product between each character's enhanced contextual vector representation and the head-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent head-character probability;
    computing a dot product between each character's enhanced contextual vector representation and the tail-character parameter vector, and applying softmax to the dot-product results to obtain each character's antecedent tail-character probability.
  19. The non-volatile readable storage medium according to claim 18, wherein traversing each character, constructing contiguous text spans, and computing the antecedent probability of each contiguous text span from the characters' antecedent head-character probabilities and antecedent tail-character probabilities comprises:
    traversing each character, and constructing contiguous text spans with that character as the antecedent head character and with that character or any subsequent character as the antecedent tail character;
    computing the product of the head character's antecedent head-character probability and the tail character's antecedent tail-character probability of each contiguous text span, obtaining the span's antecedent probability.
  20. The non-volatile readable storage medium according to claim 19, wherein selecting, from the contiguous text spans, the contiguous text span with the highest antecedent probability as the antecedent of the sentence to be resolved comprises:
    filtering out, from the contiguous text spans, those that intersect the sentence to be resolved;
    selecting, from the filtered contiguous text spans, the span with the highest antecedent probability as the antecedent of the sentence to be resolved.
     
PCT/CN2020/123173 2020-02-18 2020-10-23 Big-data-based zero-anaphora resolution method, apparatus, device, and medium WO2021164293A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010099118.X 2020-02-18
CN202010099118.XA CN111401035A (zh) 2020-02-18 Big-data-based zero-anaphora resolution method, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2021164293A1 true WO2021164293A1 (zh) 2021-08-26

Family

ID=71430335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123173 WO2021164293A1 (zh) 2020-02-18 2020-10-23 Big-data-based zero-anaphora resolution method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN111401035A (zh)
WO (1) WO2021164293A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511798A (zh) * 2021-12-10 2022-05-17 Anhui University Transformer-based driver distraction detection method and apparatus

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401035A (zh) 2020-02-18 2020-07-10 Ping An Technology (Shenzhen) Co., Ltd. Big-data-based zero-anaphora resolution method, apparatus, device, and medium
CN112256868A (zh) 2020-09-30 2021-01-22 Huawei Technologies Co., Ltd. Zero-anaphora resolution method, method for training a zero-anaphora resolution model, and electronic device
US11645465B2 (en) * 2020-12-10 2023-05-09 International Business Machines Corporation Anaphora resolution for enhanced context switching
CN112633014B (zh) 2020-12-11 2024-04-05 Xiamen Yuanting Information Technology Co., Ltd. A neural-network-based long-text coreference resolution method and apparatus
CN112463942A (zh) 2020-12-11 2021-03-09 Shenzhen Huantai Technology Co., Ltd. Text processing method and apparatus, electronic device, and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294322A (zh) * 2016-08-04 2017-01-04 Harbin Institute of Technology An LSTM-based Chinese zero-anaphora resolution method
CN109165386A (zh) * 2017-08-30 2019-01-08 Harbin Institute of Technology A Chinese zero-pronoun resolution method and system
CN110162785A (zh) * 2019-04-19 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and pronoun resolution neural network training method
US20190354574A1 (en) * 2018-05-17 2019-11-21 Oracle International Corporation Systems and methods for scalable hierarchical coreference
CN111401035A (zh) * 2020-02-18 2020-07-10 Ping An Technology (Shenzhen) Co., Ltd. Big-data-based zero-anaphora resolution method, apparatus, device, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294322A (zh) * 2016-08-04 2017-01-04 Harbin Institute of Technology An LSTM-based Chinese zero-anaphora resolution method
CN109165386A (zh) * 2017-08-30 2019-01-08 Harbin Institute of Technology A Chinese zero-pronoun resolution method and system
US20190354574A1 (en) * 2018-05-17 2019-11-21 Oracle International Corporation Systems and methods for scalable hierarchical coreference
CN110162785A (zh) * 2019-04-19 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and pronoun resolution neural network training method
CN111401035A (zh) * 2020-02-18 2020-07-10 Ping An Technology (Shenzhen) Co., Ltd. Big-data-based zero-anaphora resolution method, apparatus, device, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FU Jian, KONG Fang: "End to End Chinese Coreference Resolution with Structural Information", Computer Engineering, vol. 46, no. 1, 31 January 2020 (2020-01-31), pages 45-51, XP055839851 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511798A (zh) * 2021-12-10 2022-05-17 Anhui University Transformer-based driver distraction detection method and apparatus
CN114511798B (zh) * 2021-12-10 2024-04-26 Anhui University Transformer-based driver distraction detection method and apparatus

Also Published As

Publication number Publication date
CN111401035A (zh) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021164293A1 (zh) Big-data-based zero-anaphora resolution method, apparatus, device, and medium
JP5901001B1 (ja) Method and device for acoustic language model training
WO2021042503A1 (zh) Information classification and extraction method and apparatus, computer device, and storage medium
CN110413788B (zh) Method, system, device, and storage medium for predicting the scene category of conversation text
CN110955761A (zh) Method and apparatus for obtaining question-answer data from documents, computer device, and storage medium
CN111291195B (zh) Data processing method, apparatus, terminal, and readable storage medium
EP1475778A1 (en) Rules-based grammar for slots and statistical model for preterminals in natural language understanding system
WO2020253669A1 (zh) Machine-translation-model-based translation method, apparatus, device, and storage medium
WO2022121251A1 (zh) Text processing model training method and apparatus, computer device, and storage medium
CN111108501A (zh) Context-based multi-turn dialogue method, apparatus, device, and storage medium
CN110704576A (zh) Text-based entity relation extraction method and apparatus
US11636272B2 (en) Hybrid natural language understanding
CN110309504B (zh) Word-segmentation-based text processing method, apparatus, device, and storage medium
CA3180493A1 (en) Training method and device of intention recognition model and intention recognition method and device
EP4191544A1 (en) Method and apparatus for recognizing token, electronic device and storage medium
CN111949762A (zh) Method and system for context-based emotional dialogue, and storage medium
KR20210125449A (ko) Method for incrementally updating industry text, related apparatus, and computer program stored on a medium
WO2023246719A1 (zh) Meeting minutes processing method, apparatus, device, and storage medium
CN113609873A (zh) Translation model training method, apparatus, and medium
US11989528B2 (en) Method and server for training a machine learning algorithm for executing translation
CN115858776A (zh) Variant text classification and recognition method, system, storage medium, and electronic device
WO2019148797A1 (zh) Natural language processing method, apparatus, computer device, and storage medium
Han et al. [Retracted] Machine English Translation Evaluation System Based on BP Neural Network Algorithm
CN115203372A (zh) Text intent classification method, apparatus, computer device, and storage medium
CN115879480A (zh) Semantically constrained machine translation method, apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20920698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20920698

Country of ref document: EP

Kind code of ref document: A1