WO2020151685A1 - Encoding method, apparatus, device and storage medium - Google Patents

Encoding method, apparatus, device and storage medium

Info

Publication number
WO2020151685A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
paragraph
vector set
memory
matrix
Application number
PCT/CN2020/073360
Other languages
English (en)
French (fr)
Inventor
谭翊章
孙硕
曹杰
田乐
牛成
周杰
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2021517331A (patent JP7324838B2, ja)
Publication of WO2020151685A1
Priority to US17/356,482 (patent US11995406B2, en)

Classifications

    • G06F40/126 Character encoding (G06F40/00 Handling natural language data; G06F40/10 Text processing; G06F40/12 Use of codes for handling textual entities)
    • G06F40/30 Semantic analysis (G06F40/00 Handling natural language data)
    • G06F16/3347 Query execution using vector based model (G06F16/30 Information retrieval of unstructured textual data; G06F16/33 Querying; G06F16/3331 Query processing; G06F16/334 Query execution)
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G06F40/20 Natural language analysis; G06F40/205 Parsing)
    • G06N3/02 Neural networks (G06N3/00 Computing arrangements based on biological models)
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of natural language processing, in particular to an encoding method, device, equipment and storage medium.
  • Encoding is the process of converting text into coded values to obtain a vector that can accurately describe the meaning of the text.
  • Through encoding, text can be converted into a vector form that is convenient for calculation and processing; encoding has been widely used in many fields such as sentence selection and sentence generation.
  • In a hierarchical coding scheme, the word vector of each word in each sentence in the target paragraph is obtained.
  • The above-mentioned hierarchical coding scheme encodes the word vectors of each sentence in the target paragraph in turn in a serial manner, and then encodes the multiple sentence vectors in a serial manner.
  • As a result, the encoding speed is slow and the accuracy is low.
  • An encoding method includes:
  • acquiring a target paragraph and a preset database, and inputting the target paragraph and the preset database into a memory coding model, the target paragraph including at least one sentence, and the memory coding model including at least an input layer, a first memory layer, and an output layer;
  • in the input layer, obtaining an original vector set of the target paragraph and a knowledge vector set of the preset database, the original vector set including the sentence vector of each sentence in the target paragraph, and the knowledge vector set including knowledge vectors of multiple pieces of knowledge data in the preset database;
  • in the first memory layer, obtaining a first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set, the first target sentence matrix being used to describe the target paragraph according to the association relationship between the original vector set and the knowledge vector set;
  • in the output layer, obtaining a paragraph vector of the target paragraph according to the first target sentence matrix; and
  • processing based on the paragraph vector.
  • An encoding device comprising:
  • An acquiring module configured to acquire a target paragraph and a preset database, and input the target paragraph and the preset database into a memory coding model, and the target paragraph includes at least one sentence;
  • the input layer module is used to obtain the original vector set of the target paragraph and the knowledge vector set of the preset database.
  • the original vector set includes the sentence vector of each sentence in the target paragraph;
  • the knowledge vector set includes knowledge vectors of multiple pieces of knowledge data in the preset database;
  • the first memory layer module is used to obtain a first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set, the first target sentence matrix being used to describe the target paragraph according to the association relationship between the original vector set and the knowledge vector set;
  • the output layer module is used to obtain the paragraph vector of the target paragraph according to the first target sentence matrix
  • the processing module is used for processing based on the paragraph vector.
  • An encoding device includes a memory and one or more processors.
  • the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • acquiring a target paragraph and a preset database, and inputting the target paragraph and the preset database into a memory coding model, the target paragraph including at least one sentence, and the memory coding model including at least an input layer, a first memory layer, and an output layer;
  • in the input layer, obtaining an original vector set of the target paragraph and a knowledge vector set of the preset database, the original vector set including the sentence vector of each sentence in the target paragraph, and the knowledge vector set including knowledge vectors of multiple pieces of knowledge data in the preset database;
  • in the first memory layer, obtaining a first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set, the first target sentence matrix being used to describe the target paragraph according to the association relationship between the original vector set and the knowledge vector set;
  • in the output layer, obtaining a paragraph vector of the target paragraph according to the first target sentence matrix; and
  • processing based on the paragraph vector.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • acquiring a target paragraph and a preset database, and inputting the target paragraph and the preset database into a memory coding model, the target paragraph including at least one sentence, and the memory coding model including at least an input layer, a first memory layer, and an output layer;
  • in the input layer, obtaining an original vector set of the target paragraph and a knowledge vector set of the preset database, the original vector set including the sentence vector of each sentence in the target paragraph, and the knowledge vector set including knowledge vectors of multiple pieces of knowledge data in the preset database;
  • in the first memory layer, obtaining a first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set, the first target sentence matrix being used to describe the target paragraph according to the association relationship between the original vector set and the knowledge vector set;
  • in the output layer, obtaining a paragraph vector of the target paragraph according to the first target sentence matrix; and
  • processing based on the paragraph vector.
  • FIG. 1 is a schematic structural diagram of a memory coding model provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of another memory coding model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another memory coding model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of yet another memory coding model provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another memory coding model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another memory coding model provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 7A is a schematic flowchart of obtaining the original vector set, the memory vector set and the knowledge vector set of the target paragraph in the input layer according to an embodiment of the present application.
  • FIG. 7B is a schematic flowchart of obtaining a third target sentence matrix of the original vector set in the second memory layer according to an embodiment of the present application.
  • FIG. 7C is a schematic flowchart of obtaining a fourth target sentence matrix in the second gating layer according to an embodiment of the present application.
  • FIG. 7D is a schematic flowchart of obtaining the first target sentence matrix of the original vector set in the first memory layer according to an embodiment of the present application.
  • FIG. 7E is a schematic flowchart of obtaining a second target sentence matrix in the first gating layer according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a sentence coding model provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a sentence coding model provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a memory coding model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a process for acquiring knowledge vectors provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a memory layer provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a gating layer provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a memory coding model provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a memory coding model provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a memory coding model provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 18 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The embodiment of the application provides a memory coding model. A target paragraph and a preset database are obtained and input into the memory coding model.
  • The target paragraph can be encoded using the memory coding model to obtain the paragraph vector of the target paragraph, and processing can then be performed based on the paragraph vector.
  • The target paragraph is used as a unit, and the memory coding model encodes the target paragraph in one pass, without the need to separately encode each sentence in the target paragraph in a serial manner.
  • Not only the meaning of each sentence in the target paragraph is considered, but also the knowledge data in the preset database, so that the obtained paragraph vector can not only express the meaning of the target paragraph but also incorporate relevant knowledge data extracted from external knowledge data. This enables the obtained paragraph vector to more accurately express the meaning of the target paragraph and improves the accuracy of subsequent processing based on the paragraph vector.
  • the memory coding model includes an input layer 101, a first memory layer 102 and an output layer 103.
  • the input layer 101 is connected to the first memory layer 102, and the first memory layer 102 is connected to the output layer 103.
  • the input layer 101 extracts a sentence vector representing the meaning of the sentence according to each sentence in the target paragraph, obtains the original vector set of the target paragraph, and inputs the original vector set into the first memory layer 102.
  • the input layer 101 also obtains the knowledge vector of each piece of knowledge data according to each piece of knowledge data in the preset database, and composes the multiple obtained knowledge vectors into a knowledge vector set, and inputs it into the first memory layer 102.
  • the first memory layer 102 obtains the first target sentence matrix according to the input original vector set and the knowledge vector set, and inputs the first target sentence matrix to the output layer 103; the output layer 103 obtains the target paragraph according to the first target sentence matrix Paragraph vector.
  • Since the first memory layer 102 adopts an attention learning mechanism, it can extract knowledge data related to the original vector set from the knowledge vector set, so that a more accurate paragraph vector can be obtained.
  • The memory coding model runs the first memory layer 102 repeatedly: the first target sentence matrix output by the first memory layer 102 is used as the original vector set of the first memory layer 102, and the knowledge vector set is kept unchanged or updated to obtain an updated knowledge vector set; these are re-input into the first memory layer 102, and the first memory layer 102 is run repeatedly until the number of repetitions reaches the preset number of times, at which point the current target sentence matrix is input to the output layer 103 to obtain the paragraph vector of the target paragraph.
  • the preset number of times may be 2 times or 3 times, or may be other values.
  • The memory coding model further includes a first gating layer 104. The input layer 101 is connected to the first memory layer 102 and the first gating layer 104, the first memory layer 102 is connected to the first gating layer 104, and the first gating layer 104 is connected to the output layer 103.
  • GSMN (Gated Self-Attentive Memory Network), a gated self-attentive memory network.
  • After the first memory layer 102 obtains the first target sentence matrix, it is input to the first gating layer 104.
  • The first gating layer 104 performs a weighted summation on the original vector set and the first target sentence matrix to obtain the second target sentence matrix, the second target sentence matrix is input to the output layer 103, and the output layer 103 obtains the paragraph vector of the target paragraph according to the second target sentence matrix.
  • The memory coding model runs the first memory layer 102 and the first gating layer 104 repeatedly: the second target sentence matrix output by the first gating layer 104 is used as the original vector set and, together with the knowledge vector set, is re-input to the first memory layer 102. The first memory layer 102 and the first gating layer 104 are run repeatedly until the number of repetitions reaches the preset number of times, at which point the current target sentence matrix is input to the output layer 103 to obtain the paragraph vector of the target paragraph.
  • The memory coding model further includes a second memory layer 105, the second memory layer 105 being located before the first memory layer 102. The input layer 101 is connected to the first memory layer 102 and the second memory layer 105, the second memory layer 105 is connected to the first memory layer 102, and the first memory layer 102 is connected to the output layer 103.
  • The input layer 101 obtains the word vector of each word in the context sentence of the target paragraph, composes the obtained multiple word vectors into a memory vector set, and inputs the original vector set and the memory vector set into the second memory layer 105.
  • the second memory layer 105 obtains the third target sentence matrix according to the input original vector set and the memory vector set, and inputs the third target sentence matrix into the first memory layer 102.
  • the first memory layer 102 obtains the first target sentence matrix according to the input third target sentence matrix and the knowledge vector set, and inputs the first target sentence matrix to the output layer 103, and the output layer 103 obtains the target according to the first target sentence matrix The paragraph vector of the paragraph.
  • The memory coding model runs the second memory layer 105 and the first memory layer 102 repeatedly: the first target sentence matrix output by the first memory layer 102 is used as the original vector set of the second memory layer 105 and, together with the memory vector set, is re-input to the second memory layer 105. The second memory layer 105 and the first memory layer 102 are run repeatedly until the number of repetitions reaches the preset number of times, at which point the current target sentence matrix is input to the output layer 103 to obtain the paragraph vector of the target paragraph.
  • the memory coding model further includes a second gating layer 106, and the second gating layer 106 is located before the first memory layer 102 and behind the second memory layer 105.
  • The input layer 101 is connected to the second memory layer 105, the second gating layer 106, and the first memory layer 102; the second memory layer 105 is connected to the second gating layer 106; the second gating layer 106 is connected to the first memory layer 102; and the first memory layer 102 is connected to the output layer 103.
  • After the second memory layer 105 obtains the third target sentence matrix, it is input into the second gating layer 106.
  • the second gating layer 106 performs a weighted summation on the original vector set and the third target sentence matrix to obtain a fourth target sentence matrix, and inputs the fourth target sentence matrix into the first memory layer 102.
  • the first memory layer 102 obtains the first target sentence matrix according to the fourth target sentence matrix and the knowledge vector set, and inputs the first target sentence matrix to the output layer 103, and the output layer 103 obtains the information of the target paragraph according to the first target sentence matrix Paragraph vector.
  • The memory coding model repeatedly runs the second memory layer 105, the second gating layer 106, and the first memory layer 102.
  • The first target sentence matrix output by the first memory layer 102 is used as the original vector set and memory vector set of the second memory layer 105 and re-input into the second memory layer 105; the second memory layer 105, the second gating layer 106, and the first memory layer 102 are run repeatedly until the number of repetitions reaches the preset number of times, at which point the current target sentence matrix is input to the output layer 103 to obtain the paragraph vector of the target paragraph.
  • The memory coding model shown in FIG. 3 or FIG. 4 above can be combined with the memory coding model shown in FIG. 2 above; the resulting memory coding model includes an input layer, a second memory layer, a first memory layer, a first gating layer, and an output layer, or includes an input layer, a second memory layer, a second gating layer, a first memory layer, a first gating layer, and an output layer.
  • the processing method in this case is similar to the processing method of the memory coding model described above, and will not be repeated here.
  • The memory coding model further includes a third memory layer 107, which is located behind the first memory layer 102.
  • The input layer 101 is connected to the first memory layer 102 and the third memory layer 107, and the first memory layer 102 is connected to the third memory layer 107.
  • After the first memory layer 102 obtains the first target sentence matrix, it is input to the third memory layer 107.
  • The third memory layer 107 obtains the fifth target sentence matrix according to the memory vector set and the first target sentence matrix, the fifth target sentence matrix is input to the output layer 103, and the output layer 103 obtains the paragraph vector of the target paragraph according to the fifth target sentence matrix.
  • The memory coding model runs the third memory layer 107 repeatedly. After the third memory layer 107 obtains the fifth target sentence matrix, the fifth target sentence matrix is used as the updated first target sentence matrix, and the third memory layer 107 repeats the step of obtaining a target sentence matrix according to the updated first target sentence matrix and the memory vector set, until the number of repetitions reaches the preset number, at which point the current target sentence matrix is input into the output layer 103. The output layer 103 obtains the paragraph vector of the target paragraph according to the current target sentence matrix.
  • The memory coding model further includes a third gating layer 108, the third gating layer 108 being located behind the third memory layer 107. The input layer 101 is connected to the first memory layer 102 and the third memory layer 107, the first memory layer 102 is connected to the third memory layer 107 and the third gating layer 108, and the third memory layer 107 is connected to the third gating layer 108.
  • After the third memory layer 107 obtains the fifth target sentence matrix, it inputs the fifth target sentence matrix into the third gating layer 108. The third gating layer 108 performs a weighted summation on the fifth target sentence matrix and the first target sentence matrix to obtain the sixth target sentence matrix, the sixth target sentence matrix is input to the output layer 103, and the output layer 103 obtains the paragraph vector of the target paragraph according to the sixth target sentence matrix.
  • the memory coding model will repeatedly run the third memory layer 107 and the third gating layer 108.
  • The third gating layer 108 uses the sixth target sentence matrix as the updated first target sentence matrix, and the third memory layer 107 and the third gating layer 108 repeat the step of obtaining a target sentence matrix according to the updated first target sentence matrix and the memory vector set, until the number of repetitions reaches the preset number of times, at which point the current target sentence matrix is input into the output layer 103.
  • the output layer 103 obtains the paragraph vector of the target paragraph according to the current target sentence matrix.
  • The memory coding model shown in FIG. 5 or FIG. 6 above can be combined with the memory coding model shown in FIG. 2 above; the resulting memory coding model includes an input layer, a first memory layer, a first gating layer, a third memory layer, and an output layer, or includes an input layer, a first memory layer, a first gating layer, a third memory layer, a third gating layer, and an output layer.
  • the processing method in this case is similar to the processing method of the memory coding model described above, and will not be repeated here.
  • Multiple repeated-operation methods can be used: any one or more memory layers in the memory coding model, or any one or more gating layers, can be run repeatedly. For multiple layers, the next layer can be run after each layer has been repeatedly run, or the multiple layers can be regarded as a whole and run multiple times together.
  • a user conducts a dialogue with a chat robot, and the chat robot can obtain a text message input by the user as a target paragraph.
  • The target paragraph is encoded to obtain a paragraph vector, the paragraph vector is matched with the vectors of multiple reply messages in the corpus database, and the reply message whose vector matches the paragraph vector is obtained and displayed to the user, thus realizing a dialogue between the user and the chat robot.
  • The generated paragraph vector is more accurate, which enables the chat robot to better understand the meaning that the user wants to express. According to the paragraph vector, a better-matching reply message can be obtained, so that a reply more in line with the user's needs is given for the text message entered by the user, enhancing the dialogue effect.
  • the target paragraph to be classified is obtained, and the target paragraph is encoded using the method provided in the embodiment of the present application to obtain a paragraph vector.
  • According to the paragraph vector, the category to which the target paragraph belongs can be determined.
  • the generated paragraph vector is more accurate, and the meaning of the target paragraph can be better understood. Classification according to the paragraph vector can improve the classification accuracy.
  • each target paragraph is encoded using the method provided in the embodiment of the present application to obtain a paragraph vector.
  • According to the paragraph vectors of the multiple target paragraphs, the target paragraph that meets the requirements is selected from the target paragraphs.
  • The generated paragraph vector is more accurate and can better reflect the meaning of the target paragraph. Selecting the target paragraph that meets the requirements according to the paragraph vector avoids the problem of selecting the wrong paragraph.
  • FIG. 7 is a flowchart of an encoding method provided by an embodiment of the present application.
  • the embodiment of the present application illustrates the process of applying a memory encoding model to encode a target paragraph.
  • The memory coding model includes an input layer, a second memory layer, a second gating layer, a first memory layer, a first gating layer, and an output layer.
  • the execution subject of the embodiments of the present application is an encoding device, which may be a server or a terminal such as a mobile phone or a computer. Referring to Figure 7, the method includes:
  • the target paragraph includes at least one sentence, and each sentence includes at least one word.
  • the contextual sentence of the target paragraph may include a sentence in one or more paragraphs before the target paragraph, a sentence in one or more paragraphs after the target paragraph, or may also include one or more sentences in the target paragraph.
  • the context sentence of the target paragraph may be the original text of the target paragraph.
  • When the target paragraph is a paragraph in a certain article, the context sentence may include sentences before or after the paragraph in the article, or may also include sentences in the paragraph.
  • When the target paragraph is a piece of text input by the user in an intelligent dialogue scenario, the context sentence may include the text input by the user before the target paragraph, the text in the target paragraph, or the text that the chat robot replied to the user before the target paragraph, etc.
  • the preset database includes at least one piece of knowledge data, and the at least one piece of knowledge data may include multiple types, such as news, entertainment, and professional knowledge.
  • the knowledge data in the preset database can be uploaded by maintenance personnel, or the data uploaded by multiple network users can be collected by the encoding device, or set in other ways. And in the process of use, the knowledge data in the preset database can be fixed, or can be updated according to demand.
  • Each piece of knowledge data may include at least one sentence, each sentence including at least one word; or each piece of knowledge data includes at least one set of key-value pairs, and each set of key-value pairs includes a key (Key) and a value (Value).
  • a piece of knowledge data in the preset database may be shown in Table 1 below.
  • the target paragraph, the context sentence of the target paragraph and the preset database are acquired, and the target paragraph, the context sentence of the target paragraph and the preset database are input into the memory coding model.
  • the input layer is the first layer in the memory coding model.
  • The target paragraph, the context sentence of the target paragraph, and the preset database are input into the input layer, and the target paragraph, the context sentence of the target paragraph, and the preset database are processed separately to obtain the original vector set, the memory vector set, and the knowledge vector set of the target paragraph.
  • the original vector set includes the sentence vector of each sentence in the target paragraph
  • the memory vector set includes the word vector of each word in the context sentence of the target paragraph
  • the knowledge vector set includes knowledge vectors of multiple pieces of knowledge data in the preset database.
  • step 701 may include the following steps 7011-7013:
  • The target paragraph is preprocessed. The preprocessing process includes: dividing the target paragraph into sentences to obtain each sentence in the target paragraph, dividing each sentence into words to obtain each word in each sentence, and obtaining the word vector of each word.
  • A word segmentation algorithm can be used to segment each sentence. The word segmentation algorithm can include multiple algorithms, such as the bidirectional maximum matching method and the minimum segmentation method, or other methods can be used to divide words.
  • For each word, the word vector corresponding to the word can be queried according to a word vector dictionary. The word vector dictionary can include the correspondence between words and word vectors, or the word vector dictionary can be a word vector acquisition model, such as a recurrent neural network model, a deep learning network model, or a convolutional neural network model, and the word vector acquisition model is used to obtain the word vectors of words.
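  • A minimal sketch of querying word vectors from a word vector dictionary (the dictionary contents, dimensions, and the zero-vector fallback for unknown words are illustrative assumptions, not part of this application):

```python
import numpy as np

# Hypothetical word vector dictionary mapping words to vectors.
word_vector_dict = {
    "编码": np.array([0.1, 0.3, -0.2]),
    "方法": np.array([0.0, 0.5, 0.4]),
}

def lookup(word, dim=3):
    # Return the word vector for a known word; fall back to a zero vector otherwise.
    return word_vector_dict.get(word, np.zeros(dim))

print(lookup("编码"))   # [ 0.1  0.3 -0.2]
```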
  • After preprocessing the target paragraph, for each sentence, the sentence coding model is applied to process the word vectors of the words in the sentence to obtain the sentence vector of the sentence, so as to obtain the sentence vector of each sentence in the target paragraph; the original vector set is formed according to the sentence vectors of the sentences.
  • The sentence coding model is used to compress the word vectors of multiple words in any sentence into a sentence vector representing the meaning of the sentence. It can be a recurrent neural network model, a deep learning network model, a convolutional neural network model, a transform neural network model, a word-based GSMN model, or other types of models.
  • the sentence coding model includes the first sentence coding sub-model and the second sentence coding sub-model.
  • The process of obtaining the sentence vector of a sentence may include: for each sentence in the target paragraph, obtaining the word vector of each word in the sentence to obtain multiple word vectors; applying the first sentence coding sub-model to encode the multiple word vectors in positive order to obtain the first vector, and applying the second sentence coding sub-model to encode the multiple word vectors in reverse order to obtain the second vector; and obtaining the sentence vector of the sentence according to the first vector and the second vector. The above steps are repeated to obtain the sentence vector of each sentence in the target paragraph.
  • the first sentence coding sub-model is a positive-order coding model
  • the second sentence coding sub-model is a reverse-order coding model.
  • The word vectors of the multiple words in the sentence are arranged in order, and the first sentence coding sub-model is applied to encode the multiple word vectors in positive order according to their sequence to obtain the first vector.
  • For the second sentence coding sub-model, the multiple word vectors are arranged in reverse order, and are then encoded in reverse order according to the reversed sequence to obtain the second vector.
  • The first vector and the second vector can be concatenated to obtain the sentence vector, or the first vector and the second vector can be added to obtain the sentence vector, or other methods can be used to obtain the sentence vector.
  • Take the sentence coding model being a bidirectional recurrent neural network model as an example. The bidirectional recurrent neural network model includes a forward recurrent neural network model and a backward recurrent neural network model.
  • The forward recurrent neural network model encodes the multiple word vectors of the sentence in positive order to obtain the first vector, and the backward recurrent neural network model encodes the multiple word vectors of the sentence in reverse order to obtain the second vector.
  • The first vector and the second vector are concatenated to obtain the sentence vector of the sentence.
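  • A minimal sketch of such a bidirectional sentence encoder, using a GRU as the recurrent unit (the unit type, layer sizes, and class name are illustrative assumptions, not the specific model of this application):

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, word_dim=128, hidden_dim=128):
        super().__init__()
        # One bidirectional GRU plays the roles of the positive-order and
        # reverse-order sentence coding sub-models.
        self.rnn = nn.GRU(word_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, word_vectors):            # (batch, num_words, word_dim)
        _, h_n = self.rnn(word_vectors)         # h_n: (2, batch, hidden_dim)
        # Concatenate the forward and backward final states into the sentence vector.
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden_dim)

# Usage: encode a sentence of 6 words with 128-dimensional word vectors.
sentence = torch.randn(1, 6, 128)
print(SentenceEncoder()(sentence).shape)        # torch.Size([1, 256])
```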
  • the context sentence is divided into words to obtain each word in the context sentence, and then the word vector of each word is obtained, and a memory vector set is formed according to the obtained word vector.
  • the process of dividing words and obtaining word vectors of words is similar to the above step 7011, and will not be repeated here.
  • the original vector set and the memory vector set can be obtained by only processing the sentence in the target paragraph, without processing other sentences.
  • the memory vector set is obtained according to the word vector obtained after the preprocessing of the target paragraph.
  • the memory coding model uses the target paragraph as the unit to encode, so the input layer inputs the acquired original vector set and the memory vector set into the memory layer for processing.
  • the knowledge vector of each piece of knowledge data in the preset database is obtained, and the knowledge vector of at least one piece of knowledge data is formed into a knowledge vector set.
  • The knowledge vector of each piece of knowledge data can be obtained by preprocessing the preset database in advance, and there can be multiple ways of obtaining the knowledge vector.
  • Each piece of knowledge data in the preset database is obtained; for each piece of knowledge data, the knowledge data is divided into words to obtain at least one word in the knowledge data, the word vector of the at least one word is obtained, the knowledge vector of the knowledge data is obtained according to the word vector of the at least one word, and the knowledge vector and the knowledge data are correspondingly stored in the preset database.
  • A word segmentation algorithm can be used to segment each piece of knowledge data. The word segmentation algorithm can include multiple algorithms, such as the bidirectional maximum matching method and the minimum segmentation method, or other methods can be used to divide words.
  • each piece of knowledge data includes at least one set of key-value pairs, and word segmentation algorithms can be used to segment the keys and values in the key-value pairs of each piece of knowledge data.
  • For each word, the word vector corresponding to the word can be queried according to a word vector dictionary. The word vector dictionary can include the correspondence between words and word vectors, or the word vector dictionary can be a word vector acquisition model, such as a recurrent neural network model, a deep learning network model, or a convolutional neural network model, and the word vector acquisition model is used to obtain the word vectors of words.
  • The word vectors of the at least one word in the knowledge data are concatenated to obtain the knowledge vector of the knowledge data.
  • When the knowledge data includes multiple sets of key-value pairs, the word vectors of the words in each key-value pair are formed into one vector, which is the vector of that key-value pair, so that multiple key-value pair vectors can be obtained.
  • the vectors of multiple sets of key-value pairs are compressed to obtain the knowledge vector of the knowledge data, and the knowledge vector of each piece of knowledge data in the preset database can be obtained in a similar manner.
  • The multiple key-value pair vectors can be formed into a matrix, and a column-wise summation is performed on the matrix: the matrix is divided into multiple column vectors, the sum of the values in each column vector is calculated to obtain the total value of each column vector, and the total values of the multiple column vectors form a vector, which is the knowledge vector.
  • the coding model can be applied to compress vectors of multiple sets of key-value pairs.
  • The coding model is used to compress multiple vectors into one vector. It can be a recurrent neural network model, a deep learning network model, a convolutional neural network model, a transform neural network model, a word-based GSMN model, or other types of models.
  • The process of obtaining the knowledge vector of the knowledge data can be as shown in FIG. 11.
  • The key and value in each key-value pair of the knowledge data are respectively segmented into words, and for each word, the word vector of the word is obtained (each of the symbols shown in FIG. 11 represents a word vector).
  • The word vectors of each group of key-value pairs are formed into one vector by concatenation, and then the vectors of these three key-value pairs are compressed to obtain the knowledge vector of the knowledge data.
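  • A minimal sketch of building a knowledge vector from key-value pairs by concatenation and column-wise summation, as described above (function names and dimensions are illustrative; the sketch assumes all pair vectors have the same length, whereas in practice padding or a coding model would handle varying lengths):

```python
import numpy as np

def key_value_vector(key_word_vecs, value_word_vecs):
    # Concatenate the word vectors of the key and of the value into one pair vector.
    return np.concatenate(key_word_vecs + value_word_vecs)

def knowledge_vector(pair_vectors):
    # Stack the equal-length pair vectors into a matrix and sum each column,
    # compressing multiple key-value pair vectors into a single knowledge vector.
    return np.stack(pair_vectors).sum(axis=0)

# Example: a knowledge entry with two key-value pairs and 4-dimensional word vectors.
pair1 = key_value_vector([np.ones(4)], [np.ones(4), np.ones(4)])
pair2 = key_value_vector([np.ones(4) * 2], [np.ones(4) * 2, np.ones(4) * 2])
print(knowledge_vector([pair1, pair2]).shape)   # (12,)
```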
  • the input layer inputs the original vector set and the memory vector set into the second memory layer, and obtains the third target sentence matrix in the second memory layer.
  • The third target sentence matrix is used to describe the target paragraph according to the association between the original vector set and the memory vector set. It can memorize sentences with high similarity to the context sentence so that more attention is paid to those sentences in subsequent processing, which is equivalent to applying an attention mechanism to obtain the third target sentence matrix, so that the third target sentence matrix describes the target paragraph more accurately.
  • step 702 may include the following steps 7021-7024:
  • The second memory layer includes a memory model, and the first memory matrix and the second memory matrix corresponding to the memory vector set can be obtained by using the memory model. The first memory matrix and the second memory matrix are used to describe the memory vector set, and the first memory matrix and the second memory matrix can be the same or different.
  • As for the method of obtaining the first memory matrix: the word vector of each word in the context sentence can be obtained according to the memory vector set, the sentence coding model is applied to obtain the sentence vector of each sentence, and the first memory matrix can be obtained according to the sentence vector of each sentence.
  • the sentence coding model includes a third sentence coding sub-model and a fourth sentence coding sub-model.
  • The process of obtaining the sentence vector of a sentence may include: for each context sentence, obtaining the word vector of each word in the sentence to obtain multiple word vectors; applying the third sentence coding sub-model to encode the multiple word vectors in positive order to obtain the third vector, and applying the fourth sentence coding sub-model to encode the multiple word vectors in reverse order to obtain the fourth vector; and obtaining the sentence vector of the sentence according to the third vector and the fourth vector.
  • the sentence vectors of these sentences are combined to obtain the first memory matrix.
  • the acquisition method of the second memory matrix is similar to the acquisition method of the first memory matrix, except that the sentence encoding model used can be the same or different from the sentence encoding model used when acquiring the first memory matrix.
  • The sentence coding models used when obtaining the first memory matrix and the second memory matrix are both bidirectional recurrent neural network models, and the two bidirectional recurrent neural network models are used to process the memory vector set separately to obtain the first memory matrix and the second memory matrix.
  • the parameters of the two bidirectional cyclic neural network models can be the same or different, so the first memory matrix and the second memory matrix obtained can be the same or different.
  • The first memory matrix and the second memory matrix can both describe the memory vector set, and processing according to the first memory matrix, the second memory matrix, and the original vector set can take the relationship between the context sentence and the target paragraph into account, so as to obtain a representation that more accurately describes the target paragraph.
  • When the target paragraph is the same as the context sentence, the original vector set is the same as the memory vector set.
  • the following steps 7022-7024 can be performed to obtain the third target sentence matrix used to describe the target paragraph.
  • the third target sentence matrix can also be obtained in multiple ways.
  • There are many ways to obtain the similarity matrix, such as matrix multiplication and matrix subtraction.
  • the sentence vectors in the original vector set are combined to obtain the original sentence matrix of the target paragraph, and the original sentence matrix is multiplied by the first memory matrix, and the obtained matrix is used as the similarity matrix.
  • the original sentence matrix and the transpose of the first memory matrix may be multiplied, and the obtained matrix is used as the similarity matrix.
  • Each value in the similarity matrix represents the similarity between a sentence in the original vector set and the corresponding context sentence. The higher the similarity, the closer the two sentences are related, and the more attention should be paid to that sentence in subsequent processing.
  • the similarity matrix includes multiple similarities, and the probability distribution calculation is performed on the similarity matrix to obtain a probability matrix.
  • the probability matrix includes the probability corresponding to each similarity, and the sum of the probabilities of all similarities is 1.
  • A Softmax (normalized exponential) function is used to calculate the similarity matrix to obtain a probability matrix corresponding to the similarity matrix.
  • For each position, the ratio of the similarity at that position to the sum of all similarities in the similarity matrix is obtained as the probability corresponding to the similarity at that position, so that the probability corresponding to the similarity at each position is obtained, and the obtained probabilities compose the probability matrix.
  • The probability matrix is multiplied by the second memory matrix to obtain the third target sentence matrix, a sentence matrix of the same size as the original sentence matrix of the target paragraph.
  • The original vector set includes the sentence vectors of J sentences of the target paragraph, and the memory vector set includes the word vectors of the words in K context sentences, where J and K are positive integers.
  • The matrix X corresponding to the original vector set is a J*D matrix, the matrix M corresponding to the memory vector set is a K*D matrix, and D is the number of dimensions of the sentence vector.
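  • Following the dimensions above (X is J*D and the memory matrices are K*D), a minimal sketch of the second memory layer computation could look as follows; normalizing the Softmax over the whole similarity matrix follows the description above, and all function and variable names are illustrative assumptions:

```python
import numpy as np

def memory_layer(X, M1, M2):
    # X:  (J, D) original sentence matrix of the target paragraph
    # M1: (K, D) first memory matrix, M2: (K, D) second memory matrix
    S = X @ M1.T                           # (J, K) similarity matrix (matrix multiplication)
    P = np.exp(S) / np.exp(S).sum()        # probability matrix; all entries sum to 1
    return P @ M2                          # (J, D) third target sentence matrix

J, K, D = 3, 5, 8
rng = np.random.default_rng(0)
O3 = memory_layer(rng.normal(size=(J, D)), rng.normal(size=(K, D)), rng.normal(size=(K, D)))
print(O3.shape)   # (3, 8)
```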
  • In the second gating layer, a weighted summation is performed on the original vector set and the third target sentence matrix to obtain a fourth target sentence matrix, so that each value in the fourth target sentence matrix belongs to a preset value range.
  • The input layer inputs the original vector set to the second gating layer, and the second memory layer inputs the third target sentence matrix to the second gating layer.
  • Processing according to the original vector set and the third target sentence matrix adjusts the proportion between the memory-enhanced third target sentence matrix and the original vector set, so as to adjust the proportion of the sentences in the target paragraph that are more similar to the context sentence.
  • step 703 may include the following steps 7031-7033:
  • the linear network model may be a linear neural network model, or other linear network models. After linear processing is performed on the original vector set, the linear value obtained can describe the original vector set.
  • After obtaining the linear value, a preset function is used to process the linear value to obtain the first weight of the original vector set.
  • the preset function is used to compress the linear value to a preset value range, so that the obtained first weight belongs to the preset value range.
  • The preset function may be a sigmoid function (the nonlinear activation function of a neuron) or another function.
  • the preset value range may be a value range from 0 to 1, and the first weight is greater than 0 and less than 1.
  • the first weight is the weight occupied by the original vector set
  • the second weight is the weight occupied by the third target sentence matrix.
  • The sum of the first weight and the second weight is 1. After the first weight is obtained, the difference between 1 and the first weight is calculated to obtain the second weight.
  • the sentence vectors in the original vector set are combined to obtain the original sentence matrix of the target paragraph.
  • the first weight is the weight of the original sentence matrix
  • the second weight is the weight of the third target sentence matrix.
  • According to the first weight and the second weight, a weighted summation of the original sentence matrix and the third target sentence matrix is performed to obtain the fourth target sentence matrix, so that each value in the fourth target sentence matrix belongs to the preset value range.
  • The weighted summation can be written as O' = G·X + (1 - G)·O, where O' is the fourth target sentence matrix, G is the first weight, X is the original sentence matrix of the target paragraph, and O is the third target sentence matrix.
  • the second gating layer can filter the information learned after memory enhancement, adjust the proportion between the target paragraph and the context sentence, control the flow of information, and avoid adding too much information that is not related to the target paragraph.
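  • A rough sketch of the gating computation just described, assuming the gate is obtained element-wise from a linear transform of the original sentence matrix (the exact form of the linear network model and the parameter names are assumptions):

```python
import numpy as np

def gating_layer(X, O, W, b):
    # X: (J, D) original sentence matrix, O: (J, D) third target sentence matrix
    # W: (D, D) weights and b: (D,) bias of the assumed linear network model
    G = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid gate; each value lies in (0, 1)
    return G * X + (1.0 - G) * O             # weighted summation -> fourth target sentence matrix

J, D = 3, 8
rng = np.random.default_rng(0)
X, O = rng.normal(size=(J, D)), rng.normal(size=(J, D))
O4 = gating_layer(X, O, W=rng.normal(size=(D, D)), b=np.zeros(D))
print(O4.shape)   # (3, 8)
```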
  • the input layer inputs the set of knowledge vectors into the first memory layer, and the second gating layer inputs the fourth target sentence matrix into the first memory layer.
  • the first target sentence matrix of the original vector set is obtained according to the fourth target sentence matrix and the knowledge vector set.
  • the first target sentence matrix is used to describe the target paragraph based on the fourth target sentence matrix combined with the knowledge vector set.
  • External knowledge data can be introduced, relevant knowledge data can be extracted from the external knowledge data, and the extracted knowledge data strengthens the target paragraph, so that the first target sentence matrix describes the target paragraph more accurately.
  • step 704 may include the following steps 7041-7044:
  • The first memory layer includes a memory model, and the first knowledge matrix and the second knowledge matrix corresponding to the knowledge vector set can be obtained by using the memory model. The first knowledge matrix and the second knowledge matrix are used to describe the knowledge vector set, and the first knowledge matrix and the second knowledge matrix can be the same or different.
  • As for the acquisition method of the first knowledge matrix: the knowledge vector of each piece of knowledge data in the preset database can be obtained according to the knowledge vector set, the sentence coding model can be used to obtain the first knowledge vector of each piece of knowledge data, and the first knowledge matrix is obtained according to the first knowledge vectors.
  • the sentence coding model includes a fifth sentence coding sub-model and a sixth sentence coding sub-model
  • The process of obtaining the first knowledge vector of the knowledge data may include: for each piece of knowledge data in the preset database, obtaining the knowledge vector of the knowledge data, so as to obtain at least one knowledge vector; applying the fifth sentence coding sub-model to encode the at least one knowledge vector in positive order to obtain the fifth vector of each knowledge vector, and applying the sixth sentence coding sub-model to encode the at least one knowledge vector in reverse order to obtain the sixth vector of each knowledge vector; and obtaining the first knowledge vector of the knowledge data according to the fifth vector and the sixth vector. After the first knowledge vectors of the pieces of knowledge data are obtained, they are combined to obtain the first knowledge matrix.
  • the acquisition method of the second knowledge matrix is similar to the acquisition method of the first knowledge matrix, except that the sentence coding model used can be the same or different from the sentence coding model used when acquiring the first knowledge matrix.
  • The sentence coding models used when obtaining the first knowledge matrix and the second knowledge matrix are both bidirectional recurrent neural network models, and the two bidirectional recurrent neural network models are used to process the knowledge vector set separately to obtain the first knowledge matrix and the second knowledge matrix.
  • the parameters of the two bidirectional cyclic neural network models can be the same or different, so the first knowledge matrix and the second knowledge matrix obtained can be the same or different.
  • The first knowledge matrix and the second knowledge matrix can both describe the knowledge vector set, and processing according to the first knowledge matrix, the second knowledge matrix, and the fourth target sentence matrix can introduce external knowledge data, extract relevant knowledge data from it, and strengthen the target paragraph according to the extracted relevant knowledge data, so as to obtain a paragraph vector that describes the target paragraph more accurately.
  • There are many ways to obtain the similarity matrix, such as matrix multiplication and matrix subtraction.
  • the fourth target sentence matrix is multiplied by the first knowledge matrix, and the obtained matrix is used as the similarity matrix.
  • the fourth target sentence matrix and the transpose of the first knowledge matrix may also be multiplied, and the obtained matrix is used as the similarity matrix.
  • Each value in the similarity matrix represents the similarity between a sentence in the original vector set and a piece of knowledge data in the preset database. The higher the similarity, the tighter the association, and the more the relevant knowledge data should be introduced in subsequent processing to strengthen the sentence in the target paragraph.
  • the similarity matrix includes multiple similarities, and the probability distribution calculation is performed on the similarity matrix to obtain a probability matrix.
  • the probability matrix includes the probability corresponding to each similarity, and the sum of the probabilities of all similarities is 1.
  • A Softmax (normalized exponential) function is used to calculate the similarity matrix to obtain a probability matrix corresponding to the similarity matrix.
  • For each position, the ratio of the similarity at that position to the sum of all similarities in the similarity matrix is obtained as the probability corresponding to the similarity at that position, so that the probability corresponding to the similarity at each position is obtained, and the obtained probabilities compose the probability matrix.
  • The probability matrix is multiplied by the second knowledge matrix to obtain the first target sentence matrix, which has the same size as the fourth target sentence matrix.
  • The first target sentence matrix extracts knowledge data related to the original vector set from the knowledge vector set, so its description of the target paragraph is more accurate. The more similar a vector in the fourth target sentence matrix is to a vector in the first knowledge matrix, the higher the corresponding probability; by multiplying the probability matrix by the second knowledge matrix, the relevant knowledge data can be introduced to strengthen the sentences in the target paragraph, so that the first target sentence matrix describes the target paragraph more accurately.
  • In the first gating layer, a weighted summation is performed on the fourth target sentence matrix and the first target sentence matrix to obtain a second target sentence matrix, so that each value in the second target sentence matrix belongs to a preset value range.
  • The second gating layer inputs the fourth target sentence matrix to the first gating layer, and the first memory layer inputs the first target sentence matrix to the first gating layer.
  • In the first gating layer, the proportion between the fourth target sentence matrix and the first target sentence matrix is adjusted, so as to adjust the proportion between the memory-enhanced target paragraph and the knowledge data.
  • step 705 may include the following steps 7051-7053:
  • In the first gating layer, the linear network model is applied to obtain the linear value corresponding to the fourth target sentence matrix, and the preset function is used to process the linear value to obtain the third weight of the fourth target sentence matrix, so that the third weight belongs to the preset value range.
  • the linear network model is applied to obtain the linear value corresponding to the fourth target sentence matrix.
  • the linear network model can be a linear neural network model or other linear network models. After the matrix is linearly processed, the linear value obtained can describe the fourth target sentence matrix.
  • the preset function is used to process the linear value to obtain the third weight of the fourth target sentence matrix.
  • the preset function is used to compress the linear value to the preset value range, so that the obtained third weight belongs to the preset value range.
  • The preset function may be a sigmoid function (the nonlinear activation function of a neuron) or another function.
  • the preset value range may be a value range of 0 to 1, and the third weight is greater than 0 and less than 1.
  • the third weight is the weight of the fourth target sentence matrix
  • the fourth weight is the weight of the first target sentence matrix
  • the sum of the third weight and the fourth weight is 1.
  • the third weight is the weight of the fourth target sentence matrix
  • the fourth weight is the weight of the first target sentence matrix.
  • the fourth target sentence matrix and the first target sentence matrix are Weighted summation is used to obtain the second target sentence matrix, so that each value in the second target sentence matrix belongs to the preset value range.
  • O' is the second target sentence matrix
  • G is the third weight
  • X is the fourth target sentence matrix
  • O is the first target sentence matrix
  • the information learned after the introduction of relevant knowledge data can be screened, and the proportion between the target paragraph with memory enhancement and the target paragraph with the introduction of relevant knowledge data can be adjusted, and the flow of information can be controlled to avoid adding too many and target paragraphs. Irrelevant information.
  • the second target sentence matrix is converted into a vector as the paragraph vector of the target paragraph.
  • the second target sentence matrix is summed in column direction, that is, the second target sentence matrix is divided into multiple column vectors, and each column vector is calculated. The sum of the values in to obtain the total value of each column vector, and the total value of the multiple column vectors to form a vector to obtain the paragraph vector.
  • the embodiment of the present application only uses the second memory layer, the second gating layer, the first memory layer, and the first gating layer as an example for description.
  • the second memory layer, the second gating layer, the first memory layer and the first gating layer can be run repeatedly, as shown in FIG. 15. That is, after the second target sentence matrix is obtained in the first gating layer, the second target sentence matrix is used as the updated original vector set and memory vector set, keeping the knowledge vector set unchanged, or the knowledge vector set can be Update to get the updated knowledge vector set.
  • the second memory layer, the second gating layer, the first memory layer and the first gating layer are executed repeatedly to obtain the target according to the updated original vector set, memory vector set and knowledge vector set In the sentence matrix steps, until the number of repetitions reaches the preset number of times, the current target sentence matrix is input to the output layer, and the output layer obtains the paragraph vector of the target paragraph according to the current target sentence matrix.
  • the preset number of times may be determined according to requirements, or a preferred value determined through experiments, and the preset number of times may be 2 or 3, etc.
  • the second memory layer and the second gating layer can be run repeatedly, that is, after the fourth target sentence matrix is obtained in the second gating layer, the fourth target sentence matrix is used as the updated original Vector set and memory vector set.
  • the second memory layer and the second gating layer repeat the steps of obtaining the target sentence matrix according to the updated original vector set and memory vector set, until the number of repetitions reaches the preset number of times, the current target The sentence matrix is input into the first memory layer. In the subsequent process, the processing is continued in the first memory layer and the first gate control layer.
  • the first memory layer and the first gating layer can be run repeatedly, and after the first gating layer obtains the second target sentence matrix, the second target sentence matrix is used as the updated fourth target
  • the sentence matrix keeps the knowledge vector set unchanged, or it can also update the knowledge vector set to obtain the updated knowledge vector set.
  • the first memory layer and the first gate control layer repeatedly execute according to the updated fourth target sentence matrix and The step of acquiring the target sentence matrix by the knowledge vector collection, until the number of repetitions reaches the preset number, the current target sentence matrix is input into the output layer.
  • the output layer obtains the paragraph vector of the target paragraph according to the current target sentence matrix.
  • the paragraph vector After obtaining the paragraph vector of the target paragraph, the paragraph vector is processed.
  • Different application scenarios have different processing methods for paragraph vectors.
  • the specific processing method can be determined according to requirements. For example: in a smart dialogue scenario, the target paragraph is the text message entered by the user. After the paragraph vector of the target paragraph is obtained, the matching reply message will be obtained according to the paragraph vector, and the text message entered by the user can be given in accordance with the user Demand response.
  • the coding method provided by the implementation of this application provides a memory coding model.
  • the memory coding model includes an input layer, a first memory layer and an output layer.
  • the target paragraph and the preset database are obtained, and the target paragraph and the preset database are input to the memory coding Model, the input layer obtains the original vector set of the target paragraph and the knowledge vector set of the preset database; the first memory layer obtains the first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set; the output layer obtains the first target sentence matrix of the original vector set according to the first target Sentence matrix, obtain the paragraph vector of the target paragraph, and process based on the paragraph vector.
  • the embodiment of the present application does not need to separately encode each sentence in a serial manner, but uses the target paragraph as a unit to encode the target paragraph using a memory coding model, thereby improving the encoding speed. Moreover, not only the target paragraph itself is considered in the encoding process, but also the knowledge data in the preset database, so that the obtained paragraph vector can not only express the meaning of the target paragraph, but also extract relevant knowledge data from external knowledge data. Improved encoding accuracy.
  • the memory coding model provided by the embodiment of the application has self-attention.
  • the self-attention mechanism is applied to the sentence level of the paragraph, and comprehensive processing is performed according to the target paragraph, the context sentence and the knowledge data in the preset database, which can guarantee the target paragraph
  • the paragraph vector expression is richer and can more accurately describe the meaning of the target paragraph.
  • the embodiments of the present application can be applied in various scenarios and have a wide range of applications.
  • the embodiment of the present application only takes the memory coding model including the second memory layer, the second gating layer, the first memory layer and the first gating layer as an example for description.
  • the memory coding model can also adopt other network architectures.
  • the memory coding model includes an input layer, a first memory layer, and an output layer.
  • the input layer inputs the original vector set and the knowledge vector set into the first memory layer
  • the first memory layer obtains the first target sentence matrix according to the original vector set and the knowledge vector set
  • the output layer obtains the paragraph vector of the target paragraph according to the first target sentence matrix.
  • the memory coding model includes an input layer, a first memory layer, a first gating layer, and an output layer.
  • the input layer inputs the original vector set and the knowledge vector set into the first memory layer.
  • the first memory layer obtains the first target sentence matrix according to the original vector set and the knowledge vector set, and inputs the first target sentence matrix to the first gate In the control layer.
  • the first gating layer obtains the second target sentence matrix according to the original vector set and the first target sentence matrix, and inputs the second target sentence matrix into the output layer.
  • the output layer obtains the paragraph vector of the target paragraph according to the second target sentence matrix.
  • the memory coding model includes an input layer, a second memory layer, a first memory layer, and an output layer.
  • the input layer inputs the original vector set and the memory vector set into the second memory layer.
  • the second memory layer obtains the third target sentence matrix according to the input original vector set and memory vector set, and inputs the third target sentence matrix into the first memory layer, and the input layer inputs the knowledge vector set to the first memory layer .
  • the first memory layer obtains the first target sentence matrix according to the input third target sentence matrix and the set of knowledge vectors, and inputs the first target sentence matrix to the output layer, and the output layer obtains the target sentence matrix according to the first target sentence matrix Paragraph vector.
  • the memory coding model includes an input layer, a second memory layer, a second gating layer, a first memory layer, and an output layer.
  • the input layer inputs the original vector set and the memory vector set into the second memory layer, and also inputs the knowledge vector set into the first memory layer.
  • the second memory layer obtains the third target sentence matrix according to the input original vector set and the memory vector set, and inputs it into the second gating layer.
  • the second gating layer performs a weighted summation on the original vector set and the third target sentence matrix to obtain the fourth target sentence matrix, and inputs the fourth target sentence matrix into the first memory layer.
  • the first memory layer obtains the first target sentence matrix according to the fourth target sentence matrix and the knowledge vector set, and inputs the first target sentence matrix to the output layer, and the output layer obtains the paragraph vector of the target paragraph according to the first target sentence matrix.
  • the memory coding model includes an input layer, a first memory layer, a third memory layer, and an output layer.
  • the input layer inputs the original vector set and the knowledge vector set into the first memory layer, and the memory vector set into the third memory layer.
  • the first memory layer obtains the first target sentence matrix according to the original vector set and the knowledge vector set, and inputs it to the third memory layer.
  • the third memory layer obtains the fifth target sentence matrix according to the memory vector set and the first target sentence matrix, and The fifth target sentence matrix is input to the output layer, and the output layer obtains the paragraph vector of the target paragraph according to the fifth target sentence matrix.
  • the memory coding model includes an input layer, a first memory layer, a third memory layer, a third gating layer, and an output layer.
  • the input layer inputs the original vector set and the knowledge vector set into the first memory layer, and the memory vector set into the third memory layer.
  • the first memory layer obtains the first target sentence matrix according to the original vector set and the knowledge vector set, and inputs it to the third memory layer.
  • the third memory layer obtains the fifth target sentence matrix according to the memory vector set and the first target sentence matrix, and The fifth target sentence matrix is input into the third gate control layer, and the third gate control layer performs a weighted summation on the fifth target sentence matrix and the first target sentence matrix to obtain the sixth target sentence matrix, and input the sixth target sentence matrix To the output layer, the output layer obtains the paragraph vector of the target paragraph according to the sixth target sentence matrix.
  • the embodiment of the present application provides a network architecture of a memory coding model, and the target paragraph can be coded by using the memory coding model.
  • the coding method provided in the foregoing embodiment can be applied to both the coding process and the process of training the memory coding model.
  • the process of training the memory coding model in the process of training the memory coding model, obtain the initialized memory coding model, or obtain the memory coding that has been trained one or more times but whose accuracy rate has not yet met the requirements model. And, obtain one or more sample paragraphs as the target paragraph.
  • the current memory coding model is used to process the target paragraph, and the coding method provided in the above embodiment is executed during the processing to obtain the paragraph vector of the target paragraph.
  • the paragraph vector of the target paragraph is decoded to obtain the test paragraph corresponding to the paragraph vector, and the model parameters in the memory coding model are corrected according to the error between the target paragraph and the test paragraph.
  • a decoding algorithm can be used to decode a paragraph vector
  • a decoding model can be used to decode a paragraph vector.
  • the decoding model can be a cyclic neural network model, a deep learning network model, or a convolutional neural network model. Wait.
  • the model parameters in the memory coding model can be determined, and a memory coding model with an accuracy rate that meets the requirements can be obtained.
  • the memory coding model has been trained and its accuracy meets the requirements. Then, the memory coding model is obtained, and when a certain target paragraph is to be encoded, the memory coding model is used to process the target paragraph, and the encoding method provided in the above embodiment is executed during the processing to obtain the paragraph vector of the target paragraph.
  • the memory coding model can be trained by the coding device, or sent to the coding device after being trained by the training device, and the training device can also be a terminal or a server.
  • Fig. 17 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • the device includes: an acquisition module 1700, an input layer module 1701, a first memory layer module 1702, an output layer module 1703, and a processing module 1704;
  • the obtaining module 1700 is configured to obtain a target paragraph and a preset database, and input the target paragraph and the preset database into the memory coding model, and the target paragraph includes at least one sentence;
  • the input layer module 1701 is used to obtain the original vector set of the target paragraph and the knowledge vector set of the preset database.
  • the original vector set includes the sentence vector of each sentence in the target paragraph; the knowledge vector set includes multiple pieces of knowledge data in the preset database.
  • the first memory layer module 1702 is used to obtain the first target sentence matrix of the original vector set according to the original vector set and the knowledge vector set.
  • the first target sentence matrix is used to obtain the association relationship between the original vector set and the knowledge vector set, Describe the target paragraph;
  • the output layer module 1703 is used to obtain the paragraph vector of the target paragraph according to the first target sentence matrix
  • the processing module 1704 is used for processing based on the paragraph vector.
  • the acquisition module acquires the target paragraph and the preset database, inputs the target paragraph and the preset database into the memory coding model, and the input layer module acquires the original vector set of the target paragraph and the memory in the context sentence of the target paragraph
  • the vector set the memory layer module obtains the first target sentence matrix of the original vector set according to the original vector set and the memory vector set
  • the output layer module obtains the paragraph vector of the target paragraph according to the first target sentence matrix
  • the processing module performs processing based on the paragraph vector.
  • the embodiment of the present application does not need to separately encode each sentence in a serial manner, but uses the target paragraph as a unit to encode the target paragraph using a memory coding model, thereby improving the encoding speed.
  • the target paragraph itself is considered in the encoding process, but also the knowledge data in the preset database and the context sentence of the target paragraph, so that the obtained paragraph vector can not only express the meaning of the target paragraph, but also from external knowledge data. Extract relevant knowledge data to improve coding accuracy.
  • the memory coding model provided by the embodiment of the application has self-attention.
  • the self-attention mechanism is applied to the sentence level of the paragraph, and comprehensive processing is performed according to the target paragraph and the context sentence, which can ensure that the paragraph vector expression of the target paragraph is richer. More accurately describe the meaning of the target paragraph.
  • the embodiments of the present application can be applied in various scenarios and have a wide range of applications.
  • the input layer module 1701 includes:
  • the original acquisition unit is used to apply the sentence coding model to obtain the sentence vector of each sentence according to the word vector of each word in each sentence in the target paragraph to obtain the original vector set;
  • the knowledge acquisition unit is used to acquire a set of knowledge vectors according to the knowledge vector of each piece of knowledge data in the preset database.
  • the device further includes:
  • the knowledge data acquisition module is used to acquire each piece of knowledge data in the preset database
  • the knowledge vector acquisition module is used to divide the knowledge data into words for each piece of knowledge data to obtain at least one word, obtain the word vector of the at least one word, and obtain the knowledge vector of the knowledge data according to the word vector of the at least one word;
  • the storage module is used to store the knowledge vector and the knowledge data in a preset database.
  • the first memory layer module 1702 includes:
  • the knowledge matrix acquisition unit is used to apply the first memory model to acquire the first knowledge matrix and the second knowledge matrix corresponding to the knowledge vector set;
  • the first target obtaining unit is configured to obtain the first target sentence matrix of the original vector set according to the original vector set, the first knowledge matrix and the second knowledge matrix.
  • the device further includes a first gating layer module
  • the first gate control layer module is used to perform a weighted summation on the original vector set and the first target sentence matrix to obtain the second target sentence matrix, so that each value in the second target sentence matrix belongs to the preset value range;
  • the output layer module 1703 is used to obtain the paragraph vector of the target paragraph according to the second target sentence matrix.
  • the first gating layer module includes:
  • the first weight obtaining unit is used to apply the linear network model to obtain the linear value corresponding to the original vector set, and use the preset function to process the linear value to obtain the first weight of the original vector set, so that the first weight belongs to the preset value range;
  • the second weight obtaining unit is used to calculate the difference between 1 and the first weight to obtain the second weight of the first target sentence matrix
  • the weighting unit is used to perform a weighted summation on the original vector set and the first target sentence matrix according to the first weight and the second weight to obtain the second target sentence matrix.
  • the output layer module 1703 includes:
  • the column-wise summation unit is used to perform column-wise summation on the first target sentence matrix to obtain a paragraph vector.
  • the first memory layer module 1702 is also used to use the first target sentence matrix as the updated original vector set and knowledge vector set, and repeatedly execute to obtain the target according to the updated original vector set and knowledge vector set. The steps of the sentence matrix, until the number of repetitions reaches the preset number of times, trigger the output layer module 1703;
  • the output layer module 1703 is also used to obtain the paragraph vector of the target paragraph according to the current target sentence matrix.
  • the device further includes a second memory layer module
  • the input layer module 1701 is also used to obtain a memory vector set of the target paragraph, and the memory vector set includes the word vector of each word in the context sentence of the target paragraph;
  • the second memory layer module is used to obtain the third target sentence matrix of the original vector set according to the original vector set and the memory vector set.
  • the third target sentence matrix is used to determine the relationship between the original vector set and the memory vector set. Describe the target paragraph;
  • the first memory layer module 1702 is also used to obtain the first target sentence matrix of the original vector set according to the third target sentence matrix and the knowledge vector set.
  • the device further includes a second gating layer module
  • the second gate control layer module is used to perform a weighted summation on the original vector set and the third target sentence matrix to obtain the fourth target sentence matrix, so that each value in the fourth target sentence matrix belongs to the preset value range;
  • the first memory layer module is also used to obtain the first target sentence matrix of the original vector set according to the fourth target sentence matrix and the knowledge vector set.
  • the device further includes a third memory layer module
  • the input layer module 1701 is also used to obtain a memory vector set of the target paragraph, and the memory vector set includes the word vector of each word in the context sentence of the target paragraph;
  • the third memory layer module is used to obtain the fifth target sentence matrix of the original vector set according to the first target sentence matrix and the memory vector set.
  • the fifth target sentence matrix is used to obtain the fifth target sentence matrix according to the original vector set, the knowledge vector set and the memory vector set. The relationship between the two, describing the target paragraph;
  • the output layer module 1703 is also used to obtain the paragraph vector of the target paragraph according to the fifth target sentence matrix.
  • the device further includes a third gating layer module
  • the third gate control layer module is used to perform a weighted summation on the first target sentence matrix and the fifth target sentence matrix to obtain the sixth target sentence matrix, so that each value in the sixth target sentence matrix belongs to the preset value range;
  • the output layer module 1703 is also used to obtain the paragraph vector of the target paragraph according to the sixth target sentence matrix.
  • the encoding device provided in the above embodiment encodes a paragraph, only the division of the above functional modules is used as an example for illustration. In actual applications, the above functions can be allocated by different functional modules as needed. That is, the internal structure of the encoding device is divided into different functional modules to complete all or part of the functions described above.
  • the encoding device provided in the foregoing embodiment and the encoding method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 18 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • the terminal 1800 is used to perform the steps performed by the encoding device in the foregoing embodiment, and may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, and moving picture experts compress standard audio layer 3) ), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture experts compress standard audio layer 4) player, laptop or desktop computer.
  • the terminal 1800 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 1800 includes a processor 1801 and a memory 1802.
  • the processor 1801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 1801 can adopt at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array, Programmable Logic Array). achieve.
  • the processor 1801 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is A low-power processor used to process data in the standby state.
  • the processor 1801 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used to render and draw content that needs to be displayed on the display screen.
  • the processor 1801 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • AI Artificial Intelligence
  • the memory 1802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 1802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1802 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1801 to implement the encoding method provided in the method embodiment of the present application .
  • the terminal 1800 may optionally further include: a peripheral device interface 1803 and at least one peripheral device.
  • the processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1803 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1804, a touch display screen 1805, a camera 1806, an audio circuit 1807, a positioning component 1808, and a power supply 1809.
  • the peripheral device interface 1803 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1801 and the memory 1802.
  • the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one of the processor 1801, the memory 1802, and the peripheral device interface 1803 or The two can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1804 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1804 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 1804 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network.
  • the radio frequency circuit 1804 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 1805 is used for displaying UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 1805 also has the ability to collect touch signals on or above the surface of the display screen 1805.
  • the touch signal may be input to the processor 1801 as a control signal for processing.
  • the display screen 1805 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • there may be one display screen 1805 which is provided with the front panel of the terminal 1800; in other embodiments, there may be at least two display screens 1805, which are respectively arranged on different surfaces of the terminal 1800 or have a folding design; In still other embodiments, the display screen 1805 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 1800. Moreover, the display screen 1805 can also be set as a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 1805 can be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 1806 is used to capture images or videos.
  • the camera assembly 1806 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 1806 may also include a flash.
  • the flash can be a single-color flash or a dual-color flash. Dual color temperature flash refers to a combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
  • the audio circuit 1807 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 1801 for processing, or input to the radio frequency circuit 1804 to realize voice communication.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 1801 or the radio frequency circuit 1804 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • the speaker When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for distance measurement and other purposes.
  • the audio circuit 1807 may also include a headphone jack.
  • the positioning component 1808 is used to locate the current geographic location of the terminal 1800 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 1808 may be a positioning component based on the GPS (Global Positioning System, Global Positioning System) of the United States, the Beidou system of China, the Granus system of Russia, or the Galileo system of the European Union.
  • the power supply 1809 is used to supply power to various components in the terminal 1800.
  • the power supply 1809 may be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 1800 further includes one or more sensors 1810.
  • the one or more sensors 1810 include, but are not limited to: an acceleration sensor 1811, a gyroscope sensor 1812, a pressure sensor 1813, a fingerprint sensor 1814, an optical sensor 1815, and a proximity sensor 1816.
  • the acceleration sensor 1811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1800.
  • the acceleration sensor 1811 can be used to detect the components of the gravitational acceleration on three coordinate axes.
  • the processor 1801 may control the touch screen 1805 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1811.
  • the acceleration sensor 1811 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 1812 can detect the body direction and rotation angle of the terminal 1800, and the gyroscope sensor 1812 can cooperate with the acceleration sensor 1811 to collect the user's 3D actions on the terminal 1800. Based on the data collected by the gyroscope sensor 1812, the processor 1801 can implement the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1813 may be disposed on the side frame of the terminal 1800 and/or the lower layer of the touch screen 1805.
  • the processor 1801 performs left and right hand recognition or quick operation according to the holding signal collected by the pressure sensor 1813.
  • the processor 1801 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 1805.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 1814 is used to collect the user's fingerprint.
  • the processor 1801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1814, or the fingerprint sensor 1814 identifies the user's identity according to the collected fingerprint. When it is recognized that the user's identity is a trusted identity, the processor 1801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 1814 may be provided on the front, back or side of the terminal 1800. When a physical button or manufacturer logo is provided on the terminal 1800, the fingerprint sensor 1814 can be integrated with the physical button or manufacturer logo.
  • the optical sensor 1815 is used to collect the ambient light intensity.
  • the processor 1801 may control the display brightness of the touch screen 1805 according to the ambient light intensity collected by the optical sensor 1815. Specifically, when the ambient light intensity is high, the display brightness of the touch screen 1805 is increased; when the ambient light intensity is low, the display brightness of the touch screen 1805 is decreased.
  • the processor 1801 may also dynamically adjust the shooting parameters of the camera assembly 1806 according to the ambient light intensity collected by the optical sensor 1815.
  • the proximity sensor 1816 also called a distance sensor, is usually set on the front panel of the terminal 1800.
  • the proximity sensor 1816 is used to collect the distance between the user and the front of the terminal 1800.
  • the processor 1801 controls the touch screen 1805 to switch from the on-screen state to the off-screen state; when the proximity sensor 1816 detects When the distance between the user and the front of the terminal 1800 gradually increases, the processor 1801 controls the touch display screen 1805 to switch from the on-screen state to the on-screen state.
  • FIG. 18 does not constitute a limitation on the terminal 1800, and may include more or fewer components than shown in the figure, or combine some components, or adopt different component arrangements.
  • FIG. 19 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 1900 may have relatively large differences due to different configurations or performance, and may include one or more processors (central processing units, CPU) 1901 and one Or more than one memory 1902, where at least one instruction is stored in the memory 1902, and the at least one instruction is loaded and executed by the processor 1901 to implement the methods provided by the foregoing method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, an input and output interface for input and output, and the server may also include other components for implementing device functions, which will not be repeated here.
  • the server 1900 may be used to execute the steps performed by the encoding device in the foregoing encoding method.
  • the embodiment of the present application also provides an encoding device, which includes one or more processors and one or more memories, and at least one computer-readable instruction, code set or instruction set is stored in the one or more memories, The computer-readable instruction, the code set, or the instruction set is loaded and executed by the one or more processors to implement the operations performed in the encoding method of the foregoing embodiment.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores at least one computer-readable instruction, code set, or instruction set, the computer-readable instruction, the code set, or the instruction set It is loaded and executed by one or more processors to implement the operations performed in the encoding method of the foregoing embodiment.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种编码方法,包括:获取目标段落和预设数据库,将目标段落和预设数据库输入至记忆编码模型;在输入层中获取目标段落的原始向量集合和预设数据库的知识向量集合;在第一记忆层中根据原始向量集合和知识向量集合,获取第一目标语句矩阵;在输出层中根据第一目标语句矩阵,获取目标段落的段落向量,基于段落向量进行处理。

Description

编码方法、装置、设备及存储介质
本申请要求于2019年01月24日提交中国专利局,申请号为201910069751.1,申请名称为“编码方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及自然语言处理领域,特别涉及一种编码方法、装置、设备及存储介质。
背景技术
编码是将文字转换成编码值,从而得到能够准确描述该文本含义的向量的过程。通过进行编码,可以将文字转换为便于运算处理的向量形式,现已广泛应用于语句选取、语句生成等多种领域。
目前,当要对包括多个语句的目标段落进行编码时,获取目标段落中每个语句中每个词语的词向量。对于目标段落中的每个语句,应用基于词语层面的第一编码模型,将该语句中每个词语的词向量编码成一个向量,得到该语句的语句向量,进而得到目标段落中多个语句的语句向量。再应用基于语句层面的第二编码模型,将该多个语句的语句向量编码成一个向量,得到目标段落的段落向量。
上述分级编码的方案采用串行的方式,依次对目标段落中每个语句的词向量分别进行编码,再采用串行的方式,对多个语句向量进行编码,编码速度较慢,且准确率较低。
发明内容
一种编码方法,所述方法包括:
获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句,所述记忆编码模型至少包括输入层、第一记忆层和输出层;
在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向 量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
基于所述段落向量进行处理。
一种编码装置,所述装置包括:
获取模块,用于获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句;
输入层模块,用于获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
第一记忆层模块,用于根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
输出层模块,用于根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
处理模块,用于基于所述段落向量进行处理。
一种编码设备,包括存储器和一个或多个处理器,存储器中储存有计算机可读指令,计算机可读指令被处理器执行时,使得一个或多个处理器执行以下步骤:
获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句,所述记忆编码模型至少包括输入层、第一记忆层和输出层;
在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库 的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
基于所述段落向量进行处理。
一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句,所述记忆编码模型至少包括输入层、第一记忆层和输出层;
在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
基于所述段落向量进行处理。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本 申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种记忆编码模型的结构示意图;
图2是本申请实施例提供的另一种记忆编码模型的结构示意图;
图3是本申请实施例提供的又一种记忆编码模型的结构示意图;
图4是本申请实施例提供的再一种记忆编码模型的结构示意图;
图5是本申请实施例提供的还一种记忆编码模型的结构示意图;
图6是本申请实施例提供的又一个记忆编码模型的结构示意图;
图7是本申请实施例提供的一种编码方法的流程示意图;
图7A是本申请实施例提供的在输入层中获取目标段落的原始向量集合、记忆向量集合和知识向量集合的流程示意图;
图7B是本申请实施例提供的在第二记忆层中获取原始向量集合的第三目标语句矩阵的流程示意图;
图7C是本申请实施例提供的在第二门控层中获得第四目标语句矩阵的流程示意图;
图7D是本申请实施例提供的在第一记忆层中获取原始向量集合的第一目标语句矩阵的流程示意图;
图7E是本申请实施例提供的在第一门控层中获取第二目标语句矩阵的流程示意图;
图8是本申请实施例提供的一种语句编码模型的结构示意图;
图9是本申请实施例提供的一种语句编码模型的流程示意图;
图10是本申请实施例提供的一种记忆编码模型的结构示意图;
图11是本申请实施例提供的一种获取知识向量的流程示意图;
图12是本申请实施例提供的一种记忆层的结构示意图;
图13是本申请实施例提供的一种门控层的结构示意图;
图14是本申请实施例提供的一种记忆编码模型的结构示意图;
图15是本申请实施例提供的一种记忆编码模型的结构示意图;
图16是本申请实施例提供的一种记忆编码模型的结构示意图;
图17是本申请实施例提供的一种编码装置的结构示意图;
图18是本申请实施例提供的一种终端的结构框图;
图19是本申请实施例提供的一种服务器的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
本申请实施例提供了一种记忆编码模型,获取目标段落和预设数据库,输入至记忆编码模型,应用该记忆编码模型可以对该目标段落进行编码,得到该目标段落的段落向量,从而能够基于段落向量进行处理。编码过程中,可以目标段落为单位,一次性地应用该记忆编码模型对目标段落进行编码,无需采用串行的方式分别对目标段落中每个语句进行编码。而且,不仅考虑到目标段落中每个语句的含义,还考虑到预设数据库中的知识数据,从而使获取到的段落向量不仅能够表达目标段落的含义,还能够从外部知识数据中抽取相关的知识数据,使得获取到的段落向量更能准确表达目标段落的含义,基于段落向量进行处理时能够提高精确度。
参见图1,该记忆编码模型包括输入层101、第一记忆层102和输出层103,输入层101与第一记忆层102连接,第一记忆层102与输出层103连接。
其中,输入层101根据目标段落中的每个语句,提取代表语句含义的语句向量,得到该目标段落的原始向量集合,将原始向量集合输入至记忆层102中。输入层101还会根据预设数据库中每条知识数据,获取每条知识数据的知识向量,并将获取的多个知识向量组成知识向量集合,输入至第一记忆层102中。第一记忆层102根据输入的原始向量集合与知识向量集合,获取第一目标语句矩阵,并将第一目标语句矩阵输入至输出层103;输出层103根据第一目标语句矩阵,获取该目标段落的段落向量。
由于第一记忆层102采用了注意力学习机制,能够在知识向量集合中抽取与原始向量集合相关的知识数据,因此能够获取到更为准确的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第一记忆层102,将第一记忆层102输出的第一目标语句矩阵作为第一记忆层102的原始向量集合,保持知识向量集合不变,或者也可以对知识向量集合进行更新,得到更新后的知识向量集合,重新输入至第一记忆层102中,重复运行第一记忆层102,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出 层103,得到目标段落的段落向量。其中,预设次数可以是2次或3次,或者可以为其他数值。
在一种可能实现方式中,在图1所示的记忆编码模型的基础上,参见图2,该记忆编码模型还包括第一门控层104,输入层101与第一记忆层102和第一门控层104连接,第一记忆层102与第一门控层104连接,第一门控层104与输出层103连接。即本申请实施例提供了一种GSMN(Gated Self-attentive Memory Network,门控自注意力型记忆网络)模型。
第一记忆层102得到第一目标语句矩阵后,输入至第一门控层104,第一门控层104对原始向量集合和第一目标语句矩阵进行加权求和,得到第二目标语句矩阵,将第二目标语句矩阵输入至输出层103,输出层103根据第二目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第一记忆层102和第一门控层104,将第一门控层104输出的第二目标语句矩阵作为第一记忆层102的原始向量集合和知识向量集合,重新输入至第一记忆层102中,重复运行第一记忆层102和第一门控层104,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层103,得到目标段落的段落向量。
在一种可能实现方式中,参见图3,该记忆编码模型还包括第二记忆层105,第二记忆层105位于第一记忆层102之前,输入层101与第一记忆层102和第二记忆层105连接,第二记忆层105与第一记忆层102连接,第一记忆层102与输出层103连接。
输入层101根据目标段落的上下文语句中的每个词语,获取每个词语的词向量,并将获取的多个词向量组成记忆向量集合,将原始向量集合和记忆向量集合输入至第二记忆层105中。第二记忆层105根据输入的原始向量集合和记忆向量集合,获取第三目标语句矩阵,并将第三目标语句矩阵输入至第一记忆层102中。第一记忆层102根据输入的第三目标语句矩阵和知识向量集合,获取第一目标语句矩阵,将第一目标语句矩阵输入至输出层103,输出层103根据第一目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第二记忆层105和 第一记忆层102,将第一记忆层102输出的第一目标语句矩阵作为第二记忆层105的原始向量集合和记忆向量集合,重新输入至第二记忆层105中,重复运行第二记忆层105和第一记忆层102,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层103,得到目标段落的段落向量。
在一种可能实现方式中,参见图4,该记忆编码模型还包括第二门控层106,第二门控层106位于第一记忆层102之前、第二记忆层105之后。输入层101与第二记忆层105、第二门控层106和第一记忆层102连接,第二记忆层105与第二门控层106连接,第二门控层106与第一记忆层102连接,第一记忆层102与输出层103连接。
第二记忆层105得到第三目标语句矩阵后,输入至第二门控层106中。第二门控层106对原始向量集合和第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,将第四目标语句矩阵输入至第一记忆层102中。第一记忆层102根据第四目标语句矩阵和知识向量集合,获取第一目标语句矩阵,将第一目标语句矩阵输入至输出层103,输出层103根据第一目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第二记忆层105、第二门控层106和第一记忆层102。将第一记忆层102输出的第一目标语句矩阵作为第二记忆层105的原始向量集合和记忆向量集合,重新输入至第二记忆层105中,重复运行第二记忆层105、第二门控层106和第一记忆层102,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层103,得到目标段落的段落向量。
需要说明的是,上述图3或图4所示的记忆编码模型可以与上述图2所示的记忆编码模型进行结合,则得到的记忆编码模型包括输入层、第二记忆层、第一记忆层、第一门控层和输出层,或者记忆编码模型包括输入层、第二记忆层、第二门控层、第一记忆层、第一门控层和输出层。
此种情况下的处理方式与上述记忆编码模型的处理方式类似,在此不再赘述。
在另一种可能实现方式中,在图1所示的记忆编码模型的基础上,参见 图5,该记忆编码模型还包括第三记忆层107,第三记忆层107位于第一记忆层102之后,输入层101与第一记忆层102和第三记忆层107连接,第一记忆层102与第三记忆层107连接。第一记忆层102得到第一目标语句矩阵后,输入至第三记忆层107,第三记忆层107根据记忆向量集合和第一目标语句矩阵,获取第五目标语句矩阵,将第五目标语句矩阵输入至输出层103,输出层103根据第五目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第三记忆层107。第三记忆层107获取到第五目标语句矩阵后,将第五目标语句矩阵作为更新后的第一目标语句矩阵和记忆向量集合,第三记忆层107重复执行根据更新后的第五目标语句矩阵和记忆向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层103中。输出层103根据当前目标语句矩阵,获取目标段落的段落向量。
在一种可能实现方式中,参见图6,该记忆编码模型还包括第三门控层108,第三门控层108位于第三记忆层107之后,输入层101与第一记忆层102和第三记忆层107连接,第一记忆层102与第三记忆层107和第三门控层108连接,第三记忆层107与第三门控层108连接。
第三记忆层107得到第五目标语句矩阵后,将第五目标语句矩阵输入至第三门控层108中,第三门控层108对第五目标语句矩阵和第一目标语句矩阵进行加权求和,得到第六目标语句矩阵,将第六目标语句矩阵输入至输出层103,输出层103根据第六目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型会重复运行第三记忆层107和第三门控层108。第三门控层108获取到第六目标语句矩阵后,将第六目标语句矩阵作为更新后的第一目标语句矩阵和记忆向量集合,第三记忆层107和第三门控层108重复执行根据更新后的第一目标语句矩阵和记忆向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层103中。输出层103根据当前目标语句矩阵,获取目标段落的段落向量。
需要说明的是,上述图5或图6所示的记忆编码模型可以与上述图2所示的记忆编码模型进行结合,则得到的记忆编码模型包括输入层、第一记忆 层、第一门控层、第三记忆层和输出层,或者记忆编码模型包括输入层、第一记忆层、第一门控层、第三记忆层、第三门控层和输出层。
此种情况下的处理方式与上述记忆编码模型的处理方式类似,在此不再赘述。
需要说明的是,对于上述几种不同的模型架构,均可采用多种重复运行的方式,即该记忆编码模型中的任一个或多个记忆层,或者任一个或多个门控层均可以重复运行,且针对多层来说,可以在每一层重复运行完毕之后重复运行下一层,或者也可以将多层看做一个整体,一起重复运行多次。
本申请实施例可以应用于对任一段落进行编码的场景下。
例如,在智能对话的场景下,用户与聊天机器人进行对话,聊天机器人可以获取用户输入的文本消息,作为目标段落,采用本申请实施例提供的方法,对目标段落进行编码得到段落向量,将该段落向量与语料数据库中多个答复消息的向量进行匹配,得到其向量与该段落向量匹配的答复消息,展示给用户,实现了用户与聊天机器人进行对话的效果。
由于编码过程中会考虑用户输入的文本消息以及预设数据库中的知识数据,因此生成的段落向量较为准确,能够使聊天机器人更好地理解用户想要表达的含义,根据该段落向量可以获取到更为匹配的答复消息,能够针对用户输入的文本消息给出更符合用户需求的答复,提升对话效果。
又例如,在文本分类的场景下,获取待分类的目标段落,采用本申请实施例提供的方法,对目标段落进行编码得到段落向量,根据该段落向量进行分类,可以确定该目标段落所属的类别。
由于编码过程中会考虑预设数据库中的知识数据,因此生成的段落向量较为准确,能够更好地理解目标段落的含义,根据该段落向量进行分类,可以提高分类准确度。
再例如,在段落选取的场景下,获取待选取的多个目标段落,采用本申请实施例提供的方法,对每个目标段落进行编码得到段落向量,根据多个目标段落的段落向量,从多个目标段落中选取满足要求的目标段落。
由于编码过程中会预设数据库中的知识数据,因此生成的段落向量较为 准确,能够更好地理解目标段落的含义,根据段落向量进行选取,可以选取到满足要求的目标段落,避免选取错误的问题。
除上述场景之外,本申请实施例提供的方法还可以应用于阅读理解等其他场景下,本申请实施例对此不做限定。
图7是本申请实施例提供的一种编码方法的流程图,本申请实施例对应用记忆编码模型对目标段落进行编码的过程进行说明,该记忆编码模型包括输入层、第二记忆层、第二门控层、第一记忆层、第一门控层和输出层。本申请实施例的执行主体为编码设备,该编码设备可以为服务器或者还可以为手机、计算机等终端。参见图7,该方法包括:
700、获取目标段落、目标段落的上下文语句和预设数据库,将目标段落、目标段落的上下文语句和预设数据库输入至记忆编码模型。
其中,该目标段落包括至少一个语句,每个语句包括至少一个词语。目标段落的上下文语句可以包括目标段落之前的一个或多个段落中的语句、目标段落之后的一个或多个段落中的语句,或者还可以包括目标段落中的一个或多个语句。例如,目标段落的上下文语句可以为目标段落的原文。
在一种可能实现方式中,该目标段落为某一文章中的一个段落,则上下文语句可以包括该文章中在该段落之前或之后的语句,或者还可以包括该段落中的语句。或者,该目标段落为智能对话场景中用户输入的某一段文本,则上下文语句可以包括在该目标段落之前用户输入的文本,或者该目标段落中的文本,或者还可以包括在该目标段落之前聊天机器人回复用户的文本等。
另外,预设数据库中包括至少一条知识数据,该至少一条知识数据可以包括多种类型,如新闻类、娱乐类、专业知识类等。且预设数据库中的知识数据可以由维护人员上传,或者由编码设备收集多个网络用户上传的数据,或者采用其他方式设置。且使用过程中,预设数据库中的知识数据可以固定不变,也可以根据需求进行更新。
在一种可能实现方式中,每条知识数据可以包括至少一个语句,每个语句包括至少一个词语,或者,每条知识数据包括至少一组键值对,每组键值对包括键(Key)和值(Value)。
例如,预设数据库中的一条知识数据可以如下表1所示。
表1
Key Value
姓名 张三
职业 医院院长
国籍 中国
当要对目标段落进行编码时,获取该目标段落、目标段落的上下文语句和预设数据库,将目标段落、目标段落的上下文语句和预设数据库输入至记忆编码模型中。
701、在输入层中,获取目标段落的原始向量集合、记忆向量集合和知识向量集合。
输入层为记忆编码模型中的第一层,当要对目标段落进行编码时,将目标段落、目标段落的上下文语句和预设数据库输入至输入层中,在输入层中对目标段落、目标段落的上下文语句和预设数据库分别进行处理,获取目标段落的原始向量集合、记忆向量集合和知识向量集合。
该原始向量集合包括目标段落中每个语句的语句向量,记忆向量集合包括目标段落的上下文语句中每个词语的词向量,知识向量集合包括预设数据库中多条知识数据的知识向量。
本申请实施例中,在对目标段落进行编码时,不仅要考虑目标段落本身,还要考虑目标段落的上下文语句以及预设数据库中的知识数据,因此不仅要获取原始向量集合,还要获取记忆向量集合和知识向量集合,后续会根据原始向量集合、记忆向量集合和知识向量集合进行处理。
在一种可能实现方式中,如图7A所示,步骤701可以包括以下步骤7011-7013:
7011、根据目标段落中每个语句中每个词语的词向量,应用语句编码模型,获取每个语句的语句向量,得到原始向量集合。
首先对该目标段落进行预处理,该预处理过程包括:对目标段落进行语句划分,得到目标段落中的每个语句,对每个语句进行词语划分,得到每个语句中的每个词语,获取每个词语的词向量。
其中,针对语句划分过程:可以获取目标段落中能够代表对应的语句已经结束的标点符号,如句号、问号、感叹号等,根据获取的标点符号对目标 段落进行划分,即可得到目标段落中的语句。
针对词语划分过程:可以采用分词算法对每个语句进行分词,该分词算法可以包括多种算法,如双向最大匹配法、最少切分法等。或者采用其他方式进行词语划分。
针对获取词向量的过程:对于每个词语,可以根据词向量字典查询该词语对应的词向量,该词向量字典可以包括词语与词向量之间的对应关系,或者该词向量字典可以为词向量获取模型,如循环神经网络模型、深度学习网络模型、卷积神经网络模型等,应用该词向量获取模型可以获取词语的词向量。
对目标段落进行预处理之后,对于每个语句,应用语句编码模型,对该语句中每个词语的词向量进行处理,获取该语句的语句向量,从而能够得到目标段落中每个语句的语句向量,根据每个语句的语句向量,组成原始向量集合。
其中,语句编码模型用于将任一语句中多个词语的词向量压缩成一个代表该语句含义的语句向量,可以为循环神经网络模型、深度学习网络模型、卷积神经网络模型、变换神经网络模型、基于词语层面的GSMN模型等多种类型的模型。
在一种可能实现的方式中,语句编码模型包括第一语句编码子模型和第二语句编码子模型,获取语句的语句向量的过程可以包括:对于目标段落中的每个语句,获取该语句中每个词语的词向量,得到多个词向量;应用第一语句编码子模型,对多个词向量进行正序编码,得到第一向量,应用第二语句编码子模型,对多个词向量进行倒序编码,得到第二向量;根据第一向量和第二向量,获取语句的语句向量。重复执行上述步骤,可以获取到目标段落中每个语句的语句向量。
其中,该第一语句编码子模型为正序编码模型,该第二语句编码子模型为倒序编码模型。该语句中的多个词语的词向量按照顺序排列,则应用第一语句编码子模型,会按照多个词向量的排列顺序对多个词向量进行正序编码,得到第一向量。而应用第二语句编码子模型,会对多个词向量进行倒序处理,再按照倒序处理后的排列顺序对多个词向量进行倒序编码,得到第二向量。
另外,获取到第一向量和第二向量之后,可以将第一向量和第二向量串 联,得到语句向量,或者将第一向量和第二向量相加,得到语句向量,或者还可以采用其他方式得到语句向量。
以语句编码模型为双向循环神经网络模型为例进行说明,如图8和图9所示,双向循环神经网络模型包括一个前向循环神经网络模型和一个后向循环神经网络模型,通过前向循环神经网络模型对语句的多个词向量进行正序编码,获取第一向量,通过后向循环神经网络模型对语句的多个词向量进行倒序编码,获取第二向量,将第一向量和第二向量串联,得到该语句的语句向量。
7012、根据上下文语句中每个词语的词向量,获取记忆向量集合。
对上下文语句进行词语划分,得到上下文语句中的每个词语,之后获取每个词语的词向量,根据获取的词向量组成记忆向量集合。其中,进行词语划分以及获取词语的词向量的过程与上述步骤7011类似,在此不再赘述。
需要说明的是,如果目标段落与上下文语句相同,则只需对目标段落中的语句进行处理即可得到原始向量集合和记忆向量集合,而无需对其他的语句进行处理。如图10所示,根据目标段落进行预处理后得到的词向量获取记忆向量集合。
本申请实施例中,记忆编码模型以目标段落为单位进行编码,因此输入层将获取到的原始向量集合和记忆向量集合均输入到记忆层中进行处理。
7013、根据预设数据库中每条知识数据的知识向量,获取知识向量集合。
获取预设数据库中每条知识数据的知识向量,将至少一条知识数据的知识向量组成知识向量集合。
其中,每条知识数据的知识向量可以预先通过对预设数据库进行预处理得到,且获取知识向量的方式可以包括多种。在一种可能实现方式中,获取预设数据库中的每条知识数据,对于每条知识数据,对该知识数据进行词语划分,得到知识数据中的至少一个词语,获取至少一个词语的词向量,根据至少一个词语的词向量,获取该知识数据的知识向量,将该知识向量与该知识数据对应存储于预设数据库中。
针对词语划分过程,可以采用分词算法对每条知识数据进行分词,该分词算法可以包括多种算法,如双向最大匹配法、最少切分法等。或者采用其他方式进行词语划分。在一种可能实现方式中,每条知识数据包括至少一组 键值对,可以采用分词算法对每条知识数据的键值对中的键和值分别进行分词。
针对获取词向量的过程,对于每个词语,可以根据词向量字典查询该词语对应的词向量,该词向量字典可以包括词语与词向量之间的对应关系,或者该词向量字典可以为词向量获取模型,如循环神经网络模型、深度学习网络模型、卷积神经网络模型等,应用该词向量获取模型可以获取词语的词向量。
针对获取知识向量的过程,将知识数据中的至少一个词语的词向量串联,得到知识数据的知识向量。在一种可能实现方式中,知识数据包括多组键值对时,对于每组键值对,将键值对中的至少一个词语的词向量组成一个向量,即为该键值对的向量,采用类似方式即可得到多组键值对的向量。之后,对多组键值对的向量进行压缩处理,得到该知识数据的知识向量,采用类似方式即可得到预设数据库中每条知识数据的知识向量。
其中,进行压缩处理时可以采用多种方式,例如可以将多组键值对的向量组成一个矩阵,对该矩阵进行列向量求和,即将该矩阵划分为多个列向量,计算每个列向量中的数值之和,得到每个列向量的总数值,将该多个列向量的总数值组成一个向量,得到知识向量。或者还可以应用编码模型对多组键值对的向量进行压缩处理。其中,该编码模型用于将多个向量压缩成一个向量,可以为循环神经网络模型、深度学习网络模型、卷积神经网络模型、变换神经网络模型、基于词语层面的GSMN模型等多种类型的模型。
基于上述表1所示的知识数据,获取该知识数据的知识向量的流程可以如图11所示,对每条知识数据的键值对中的键和值分别进行分词,对于每个词语,获取该词语的词向量,其中,Φ表示词向量,通过串联的方式将每组键值对中的词语的词向量组成一个向量,之后,对这三组键值对的向量进行压缩处理,得到该知识数据的知识向量。
702、在第二记忆层中,根据原始向量集合和记忆向量集合,获取原始向量集合的第三目标语句矩阵。
输入层将原始向量集合和记忆向量集合输入至第二记忆层中,在第二记忆层中获取第三目标语句矩阵,该第三目标语句矩阵用于根据原始向量集合与记忆向量集合之间的关联关系,对目标段落进行描述,可以将与上下文语 句相似度较高的语句进行记忆强化,在后续处理过程中对该语句更加注意,相当于应用了注意力机制得到第三目标语句矩阵,使第三目标语句矩阵对该目标段落的描述更加准确。
在一种可能实现方式中,如图7B所示,步骤702可以包括以下步骤7021-7024:
7021、在第二记忆层中,应用记忆模型,获取记忆向量集合对应的第一记忆矩阵和第二记忆矩阵。
第二记忆层包括记忆模型,应用该记忆模型可以获取该记忆向量集合对应的第一记忆矩阵和第二记忆矩阵,其中,第一记忆矩阵和第二记忆矩阵用于对该记忆向量集合进行描述,且第一记忆矩阵和第二记忆矩阵可以相同,也可以不同。
针对第一记忆矩阵的获取方式:可以根据记忆向量集合获取上下文语句中每个词语的词向量,应用语句编码模型,获取每个语句的语句向量,根据每个语句的语句向量获取第一记忆矩阵。
在一种可能实现方式中,该语句编码模型包括第三语句编码子模型和第四语句编码子模型,获取语句的语句向量过程可以包括:对于上下文语句中的每个语句,获取该语句中每个词语的词向量,得到多个词向量;应用第三语句编码子模型,对多个词向量进行正序编码,得到第三向量,应用第四语句编码子模型,对多个词向量进行倒序编码,得到第四向量;根据第三向量和第四向量,获取语句的语句向量。
其中,获取语句向量的具体过程与上述步骤7011类似,在此不再赘述。
获取到上下文语句中的每个语句的语句向量后,将这些语句的语句向量进行组合,得到第一记忆矩阵。
另外,第二记忆矩阵的获取方式与第一记忆矩阵的获取方式类似,区别仅在于采用的语句编码模型可以与获取第一记忆矩阵时采用的语句编码模型相同或者不同。
参见图12,获取第一记忆矩阵和第二记忆矩阵时采用的语句编码模型均为双向循环神经网络模型,应用这两个双向循环神经网络模型对记忆向量集合分别进行处理,可以得到第一记忆矩阵和第二记忆矩阵。这两个双向循环神经网络模型的参数可以相同,也可以不同,因此得到的第一记忆矩阵和第 二记忆矩阵可以相同,也可以不同。
由于第一记忆矩阵和第二记忆矩阵能够对记忆向量集合进行描述,因此根据第一记忆矩阵和第二记忆矩阵以及原始向量集合进行处理,能够考虑上下文语句与目标段落之间的关联关系,以便得到能够更加准确地描述该目标段落的段落。
本申请实施例以目标段落与上下文语句相同为例,则原始向量集合与记忆向量集合相同。在此情况下,可以执行下述步骤7022-7024获取用于描述目标段落的第三目标语句矩阵。当然,在目标段落与上下文语句不同的情况下,还可以采用多种方式获取第三目标语句矩阵。
7022、获取原始向量集合与第一记忆矩阵的相似度矩阵。
其中,获取相似度矩阵的方式有多种,如矩阵相乘法、矩阵相减法等。在一种可能实现方式中,根据该原始向量集合中的语句向量进行组合,得到目标段落的原始语句矩阵,将该原始语句矩阵与第一记忆矩阵相乘,得到的矩阵作为相似度矩阵。或者,还可以将该原始语句矩阵与第一记忆矩阵的转置相乘,得到的矩阵作为相似度矩阵。
相似度矩阵中的每一个数值代表了原始向量集合中的语句与上下文语句中对应的语句之间的相似度,相似度越高,表示两个语句关联越紧密,在后续的处理过程中越应当注意该语句。
7023、对相似度矩阵进行概率分布计算,得到概率矩阵。
相似度矩阵中包括多个相似度,对相似度矩阵进行概率分布计算,可以得到概率矩阵,概率矩阵中包括每个相似度对应的概率,且所有相似度的概率之和为1。
其中,概率分布计算方式可以有多种,在一种可能实现方式中,采用Softmax(归一化指数)函数对相似度矩阵进行计算,得到与相似度矩阵对应的概率矩阵。或者,对于相似度矩阵中的每个位置,获取该位置上的相似度与相似度矩阵中所有相似度之和的比值,得到该位置上的相似度对应的概率,从而获取到每个位置上的相似度对应的概率,将获取到的概率组成概率矩阵。
7024、根据第二记忆矩阵和概率矩阵,获取第三目标语句矩阵。
根据第二记忆矩阵和概率矩阵,获取第三目标语句矩阵的方式有多种,在一种可能实现方式中,将概率矩阵与第二记忆矩阵相乘,得到与目标段落 的语句矩阵尺寸相同的第三目标语句矩阵。
由于目标段落中的语句与上下文语句的相似度越高,概率越大,因此将概率矩阵与第二记忆矩阵相乘,可以将与上下文语句相似度较高的语句进行记忆强化。
举例来说,原始向量集合中包括目标段落的J个语句的语句向量,记忆向量集合中包括上下文语句的K个语句的词向量,J和K为正整数,则原始向量集合对应的矩阵X为J*D的矩阵,记忆向量集合对应的矩阵M为K*D的矩阵,D为语句向量的维度数量。将这两个矩阵输入至记忆层,通过执行上述步骤7021-7023,得到的第三目标语句矩阵为O=Softmax(XΦ 1(M) T2(M),其中,Φ 1(M)为第一记忆矩阵,Φ 2(M)为第二记忆矩阵。
703、在第二门控层中,对原始向量集合与第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,使第四目标语句矩阵中的每个数值属于预设数值范围。
输入层将原始向量集合输入至第二门控层,第二记忆层将第三目标语句矩阵输入至第二门控层,在第二门控层中根据原始向量集合和第三目标语句矩阵进行处理,对经过记忆强化的第三目标语句矩阵和原始的原始向量集合所占的比重进行调整,从而对目标段落中与上下文语句相似度较高的语句所占的比重进行调整。
在一种可能实现方式中,如图7C所示,步骤703可以包括以下步骤7031-7033:
7031、应用线性网络模型,获取原始向量集合对应的线性数值,采用预设函数对线性数值进行处理,得到原始向量集合的第一权重,以使第一权重属于预设数值范围。
参见图13,线性网络模型可以是线性神经网络模型,或者还可以是其他线性网络模型,对原始向量集合进行线性处理后,得到的该线性数值能够对原始向量集合进行描述。
获取到线性数值之后,采用预设函数对线性数值进行处理,得到原始向量集合的第一权重。该预设函数用于将线性数值压缩到预设数值范围,以使得到的第一权重属于该预设数值范围。其中,该预设函数可以是sigmoid(神经元的非线性作用)函数或者其他函数,该预设数值范围可以为0至1的数 值范围,则第一权重大于0小于1。
7032、计算1与第一权重的差值,得到第一目标语句矩阵的第二权重。
其中,第一权重是原始向量集合所占的权重,第二权重是第三目标语句矩阵所占的权重,第一权重与第二权重之和为1,在得到第一权重之后,通过计算1与第一权重的差值,得到第二权重。
7033、按照第一权重和第二权重,对原始向量集合与第三目标语句矩阵进行加权求和,得到第四目标语句矩阵。
参见图14,根据该原始向量集合中的语句向量进行组合,得到目标段落的原始语句矩阵,第一权重即为该原始语句矩阵的权重,第二权重即为第三目标语句矩阵的权重,按照第一权重与第二权重,对该原始语句矩阵和该第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,使第四目标语句矩阵中的每个数值属于预设数值范围。
在一种可能实现方式中,采用以下公式进行加权求和:
O’=G*X+(1-G)*O;
其中,O’为第四目标语句矩阵,G为第一权重,X为目标段落的原始语句矩阵,O为第三目标语句矩阵。
通过第二门控层可以筛选经过记忆加强后学习到的信息,调整目标段落与上下文语句之间的比重,控制信息的流动,避免加入过多与目标段落不相关的信息。
704、在第一记忆层中,根据第四目标语句矩阵和知识向量集合,获取原始向量集合的第一目标语句矩阵。
输入层将知识向量集合输入至第一记忆层中,第二门控层将第四目标语句矩阵输入至第一记忆层中。在第一记忆层中,根据第四目标语句矩阵和知识向量集合,获取原始向量集合的第一目标语句矩阵。
其中,第一目标语句矩阵用于根据第四目标语句矩阵并结合知识向量集合,对目标段落进行描述,可以引入外部的知识数据,从外部的知识数据中抽取相关的知识数据,根据抽取的相关知识数据对目标段落进行强化,使第一目标语句矩阵对该目标段落的描述更加准确。
在一种可能实现方式中,如图7D所示,步骤704可以包括以下步骤7041-7044:
7041、在第一记忆层中,应用记忆模型,获取知识向量集合对应的第一知识矩阵和第二知识矩阵。
第一记忆层包括记忆模型,应用该记忆模型可以获取该知识向量集合对应的第一知识矩阵和第二知识矩阵,其中,第一知识矩阵和第二知识矩阵用于对该知识向量集合进行描述,且第一知识矩阵和第二知识矩阵可以相同,也可以不同。
针对第一知识矩阵的获取方式:可以根据知识向量集合获取预设数据库中每条知识数据的知识向量,应用语句编码模型,获取每条知识数据的第一知识向量,根据每个知识数据的第一知识向量获取第一知识矩阵。
在一种可能实现方式中,该语句编码模型包括第五语句编码子模型和第六语句编码子模型,获取知识数据的知识向量过程可以包括:对于预设数据库中的每条知识数据,获取每条知识数据的知识向量,得到至少一个知识向量;应用第五语句编码子模型,对至少一个知识向量进行正序编码,得到每个知识向量的第五向量,应用第六语句编码子模型,对至少一个知识向量进行倒序编码,得到每个知识向量的第六向量;根据第五向量和第六向量,获取知识数据的第一知识向量。获取知识数据的第一知识向量后,将这些知识数据的第一知识向量进行组合,得到第一知识矩阵。
另外,第二知识矩阵的获取方式与第一知识矩阵的获取方式类似,区别仅在于采用的语句编码模型可以与获取第一知识矩阵时采用的语句编码模型相同或者不同。
在一种可能实现方式中,获取第一知识矩阵和第二知识矩阵时采用的语句编码模型均为双向循环神经网络模型,应用这两个双向循环神经网络模型对知识向量集合分别进行处理,可以得到第一知识矩阵和第二知识矩阵。这两个双向循环神经网络模型的参数可以相同,也可以不同,因此得到的第一知识矩阵和第二知识矩阵可以相同,也可以不同。
由于第一知识矩阵和第二知识矩阵能够对知识向量集合进行描述,因此根据第一知识矩阵和第二知识矩阵以及第四目标语句矩阵进行处理,能够引入外部的知识数据,从外部的知识数据中抽取相关的知识数据,根据抽取的相关知识数据对目标段落进行强化,以便得到能够更加准确地描述该目标段落的段落向量。
7042、获取第四目标语句矩阵与第一知识矩阵的相似度矩阵。
其中,获取相似度矩阵的方式有多种,如矩阵相乘法、矩阵相减法等。在一种可能实现方式中,将第四目标语句矩阵与第一知识矩阵相乘,得到的矩阵作为相似度矩阵。或者,还可以将该第四目标语句矩阵与第一知识矩阵的转置相乘,得到的矩阵作为相似度矩阵。
相似度矩阵中的每一个数值代表了原始向量集合中的语句与预设数据库中的知识数据之间的相似度,相似度越高,表示关联越紧密,在后续的处理过程中越应当将知识数据中的相关的知识数据引入进来,对目标段落中的语句进行强化。
7043、对相似度矩阵进行概率分布计算,得到概率矩阵。
相似度矩阵中包括多个相似度,对相似度矩阵进行概率分布计算,可以得到概率矩阵,概率矩阵中包括每个相似度对应的概率,且所有相似度的概率之和为1。
其中,概率分布计算方式可以有多种,在一种可能实现方式中,采用Softmax(归一化指数)函数对相似度矩阵进行计算,得到与相似度矩阵对应的概率矩阵。或者,对于相似度矩阵中的每个位置,获取该位置上的相似度与相似度矩阵中所有相似度之和的比值,得到该位置上的相似度对应的概率,从而获取到每个位置上的相似度对应的概率,将获取到的概率组成概率矩阵。
7044、根据第二知识矩阵和概率矩阵,获取第一目标语句矩阵。
根据第二知识矩阵和概率矩阵,获取第一目标语句矩阵的方式有多种,在一种可能实现方式中,将概率矩阵与第二知识矩阵相乘,得到与第四目标语句矩阵尺寸相同的第一目标语句矩阵。
其中,与第四目标语句矩阵相比,第一目标语句矩阵从知识向量集合中抽取与原始向量集合相关的知识数据,对目标段落的描述更加准确。由于第四目标语句矩阵中的向量与第一知识矩阵中的向量的相似度越高,概率越大,因此将概率矩阵与第二知识矩阵相乘,可以将知识数据中的相关的知识数据引入进来,对目标段落中的语句进行强化,使第一目标语句矩阵对该目标段落的描述更加准确。
705、在第一门控层中，对第四目标语句矩阵与第一目标语句矩阵进行加权求和，得到第二目标语句矩阵，使第二目标语句矩阵中的每个数值属于预设数值范围。
第二门控层将第四目标语句矩阵输入至第一门控层,第一记忆层将第一目标语句矩阵输入至第一门控层,在第一门控层中对第四目标语句矩阵和第一目标语句矩阵所占的比重进行调整,从而对经过记忆加强的目标段落与知识数据所占的比重进行调整。
在一种可能实现方式中,如图7E所示,步骤705可以包括以下步骤7051-7053:
7051、在第一门控层中,应用线性网络模型,获取第四目标语句矩阵对应的线性数值,采用预设函数对线性数值进行处理,得到第四目标语句矩阵的第三权重,以使第三权重属于预设数值范围。
在第一门控层中,应用线性网络模型,获取第四目标语句矩阵对应的线性数值,其中,线性网络模型可以是线性神经网络模型,或者还可以是其他线性网络模型,对第四目标语句矩阵进行线性处理后,得到的该线性数值能够对第四目标语句矩阵进行描述。
获取到线性数值之后,采用预设函数对线性数值进行处理,得到第四目标语句矩阵的第三权重。该预设函数用于将线性数值压缩到预设数值范围,以使得到的第三权重属于该预设数值范围。其中,该预设函数可以是sigmoid(神经元的非线性作用)函数或者其他函数,该预设数值范围可以为0至1的数值范围,则第三权重大于0小于1。
7052、计算1与第三权重的差值,得到第一目标语句矩阵的第四权重。
其中,第三权重是第四目标语句矩阵所占的权重,第四权重是第一目标语句矩阵所占的权重,第三权重与第四权重之和为1,在得到第三权重之后,通过计算1与第三权重的差值,得到第四权重。
7053、按照第三权重和第四权重,对第四目标语句矩阵与第一目标语句矩阵进行加权求和,得到第二目标语句矩阵。
第三权重为该第四目标语句矩阵的权重,第四权重即为第一目标语句矩阵的权重,按照第三权重与第四权重,对该第四目标语句矩阵和该第一目标语句矩阵进行加权求和,得到第二目标语句矩阵,使第二目标语句矩阵中的每个数值属于预设数值范围。
在一种可能实现方式中,采用以下公式进行加权求和:
O’=G*X+(1-G)*O;
其中,O’为第二目标语句矩阵,G为第三权重,X为第四目标语句矩阵,O为第一目标语句矩阵。
通过第一门控层可以筛选引入相关知识数据后学习得到的信息,调整经过记忆加强的目标段落与引入相关知识数据的目标段落之间的比重,控制信息的流动,避免加入过多与目标段落不相关的信息。
706、在输出层中,根据第二目标语句矩阵,获取目标段落的段落向量。
在输出层中,将第二目标语句矩阵转换为一个向量,作为目标段落的段落向量。其中,获取段落向量的方式可以有多种,在一种可能实现方式中,对第二目标语句矩阵进行列向求和,即将第二目标语句矩阵划分为多个列向量,计算每个列向量中的数值之和,得到每个列向量的总数值,将该多个列向量的总数值组成一个向量,得到段落向量。
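作为参考，下面给出列向求和的一段示意性代码（矩阵数值为假设）：

    import numpy as np

    # 第二目标语句矩阵（示意为2*3），对每个列向量求和，得到长度为3的段落向量
    O2 = np.array([[1.0, 2.0, 3.0],
                   [0.5, 0.5, 1.0]])
    paragraph_vector = O2.sum(axis=0)
    print(paragraph_vector)           # [1.5 2.5 4. ]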
需要说明的一点是,本申请实施例仅是以运行一次第二记忆层、第二门控层、第一记忆层和第一门控层为例进行说明,而在另一实施例中,还可以重复运行第二记忆层、第二门控层、第一记忆层和第一门控层,如图15所示。即在第一门控层中获取到第二目标语句矩阵后,将第二目标语句矩阵作为更新后的原始向量集合和记忆向量集合,保持知识向量集合不变,或者也可以对知识向量集合进行更新,得到更新后的知识向量集合,第二记忆层、第二门控层、第一记忆层和第一门控层重复执行根据更新后的原始向量集合、记忆向量集合和知识向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至输出层中,输出层中根据当前的目标语句矩阵,获取目标段落的段落向量。其中,该预设次数可以根据需求确定,或者为通过实验确定的优选值,该预设次数可以为2或3等。
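作为参考，下面给出一段示意性代码，演示按预设次数重复运行"记忆层+门控层"的控制流程。示例中做了多处简化：第一记忆矩阵与第二记忆矩阵取同一矩阵、各层复用同一组门控参数、预设次数取2，这些均为示例假设：

    import numpy as np

    def softmax(x, axis=-1):
        x = x - np.max(x, axis=axis, keepdims=True)
        e = np.exp(x)
        return e / np.sum(e, axis=axis, keepdims=True)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def memory_block(X, M, W, b):
        # 一次"记忆层+门控层"：示例中第一、第二记忆（知识）矩阵取同一矩阵M
        O = softmax(X @ M.T, axis=-1) @ M
        G = sigmoid(X @ W + b)
        return G * X + (1.0 - G) * O

    rng = np.random.default_rng(3)
    J, D, PRESET_TIMES = 3, 8, 2                     # 预设次数取2仅为示意
    X = rng.normal(size=(J, D))                      # 原始向量集合
    KNOW = rng.normal(size=(5, D))                   # 知识向量集合（保持不变）
    W, b = rng.normal(size=(D, D)) * 0.1, np.zeros(D)

    target = X
    for _ in range(PRESET_TIMES):
        target = memory_block(target, target, W, b)  # 第二记忆层+第二门控层：用上一轮结果更新原始/记忆向量集合
        target = memory_block(target, KNOW, W, b)    # 第一记忆层+第一门控层：结合知识向量集合
    paragraph_vector = target.sum(axis=0)            # 达到预设次数后，由输出层列向求和得到段落向量
    print(paragraph_vector.shape)                    # (8,)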
在一种可能实现方式中,可以重复运行第二记忆层和第二门控层,即在第二门控层中获取到第四目标语句矩阵后,将第四目标语句矩阵作为更新后的原始向量集合和记忆向量集合,第二记忆层和第二门控层重复执行根据更新后的原始向量集合和记忆向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,将当前的目标语句矩阵输入至第一记忆层中。后续过程中在第一记忆层和第一门控层中继续进行处理。
在一种可能实现方式中，可以重复运行第一记忆层和第一门控层，在第一门控层获取到第二目标语句矩阵后，将第二目标语句矩阵作为更新后的第四目标语句矩阵，保持知识向量集合不变，或者也可以对知识向量集合进行更新，得到更新后的知识向量集合，第一记忆层和第一门控层重复执行根据更新后的第四目标语句矩阵和知识向量集合获取目标语句矩阵的步骤，直至重复次数达到预设次数时，将当前的目标语句矩阵输入至输出层中。输出层根据当前的目标语句矩阵，获取目标段落的段落向量。
707、基于段落向量进行处理。
在获取到目标段落的段落向量之后,会对段落向量进行处理。应用场景不同,对段落向量的处理方式也不同,具体采用何种处理方式可以根据需求确定。例如:在智能对话场景下,目标段落为用户输入的文本消息,在获取到目标段落的段落向量之后,会根据该段落向量获取到匹配的答复消息,能够针对用户输入的文本消息给出符合用户需求的答复。
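作为参考，下面给出智能对话场景下的一段示意性代码，用段落向量与候选答复向量的余弦相似度来选取答复消息。候选答复、向量数值以及相似度匹配方式均为示例假设，本申请并未限定具体的匹配方法：

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    # 假设已由记忆编码模型得到用户输入（目标段落）的段落向量，
    # 候选答复及其向量均为示意数据
    paragraph_vector = np.array([0.2, 0.8, 0.1, 0.5])
    candidate_replies = {
        "好的，已经帮你记下来了": np.array([0.1, 0.9, 0.2, 0.4]),
        "今天天气晴，适合出行": np.array([0.9, 0.1, 0.3, 0.2]),
    }

    # 取与段落向量相似度最高的候选作为答复消息
    best_reply = max(candidate_replies,
                     key=lambda r: cosine(paragraph_vector, candidate_replies[r]))
    print(best_reply)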
本申请实施例提供的编码方法，提供了一种记忆编码模型，记忆编码模型包括输入层、第一记忆层和输出层，获取目标段落和预设数据库，将目标段落和预设数据库输入至记忆编码模型，输入层获取目标段落的原始向量集合和预设数据库的知识向量集合；第一记忆层根据原始向量集合和知识向量集合，获取原始向量集合的第一目标语句矩阵；输出层根据第一目标语句矩阵，获取目标段落的段落向量，基于段落向量进行处理。本申请实施例无需采用串行的方式对每个语句分别进行编码，而是以目标段落为单位，应用记忆编码模型对目标段落进行编码，因此提高了编码速度。并且，编码过程中不仅考虑目标段落本身，还考虑到预设数据库中的知识数据，从而使获取到的段落向量不仅能够表达目标段落的含义，还能够从外部知识数据中抽取相关的知识数据，提高了编码准确率。
本申请实施例提供的记忆编码模型具有自注意力性,将自注意力机制应用于段落的语句层面上,根据目标段落、上下文语句和预设数据库中的知识数据进行综合处理,可以保证目标段落的段落向量表达更为丰富,更能准确地描述目标段落的含义。且本申请实施例可以应用于多种场景下,应用范围广泛。
需要说明的是，本申请实施例仅是以记忆编码模型包括第二记忆层、第二门控层、第一记忆层和第一门控层为例进行说明。在一种可能实现方式中，该记忆编码模型还可以采用其他的网络架构。
在一种可能实现方式中,该记忆编码模型包括输入层、第一记忆层和输出层。
输入层将原始向量集合和知识向量集合输入至第一记忆层中,第一记忆层根据原始向量集合和知识向量集合,获取第一目标语句矩阵,并将第一目标语句矩阵输入至输出层中。输出层根据第一目标语句矩阵,获取目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型包括输入层、第一记忆层、第一门控层和输出层。
输入层将原始向量集合和知识向量集合输入至第一记忆层中,第一记忆层根据原始向量集合和知识向量集合,获取第一目标语句矩阵,并将第一目标语句矩阵输入至第一门控层中。第一门控层根据原始向量集合和第一目标语句矩阵,获取第二目标语句矩阵,并将第二目标语句矩阵输入至输出层中。输出层根据第二目标语句矩阵,获取目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型包括输入层、第二记忆层、第一记忆层和输出层。
输入层将原始向量集合和记忆向量集合输入至第二记忆层中。第二记忆层根据输入的原始向量集合和记忆向量集合,获取第三目标语句矩阵,并将第三目标语句矩阵输入至第一记忆层中,且输入层将知识向量集合输入至第一记忆层。第一记忆层根据输入的第三目标语句矩阵和知识向量集合,获取第一目标语句矩阵,并将第一目标语句矩阵输入至输出层,输出层根据第一目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型包括输入层、第二记忆层、第二门控层、第一记忆层和输出层。
输入层将原始向量集合和记忆向量集合输入至第二记忆层中，还将知识向量集合输入至第一记忆层中。第二记忆层根据输入的原始向量集合和记忆向量集合，获取第三目标语句矩阵，输入至第二门控层中。第二门控层对原始向量集合和第三目标语句矩阵进行加权求和，得到第四目标语句矩阵，将第四目标语句矩阵输入至第一记忆层中。第一记忆层根据第四目标语句矩阵和知识向量集合，获取第一目标语句矩阵，将第一目标语句矩阵输入至输出层，输出层根据第一目标语句矩阵，获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型包括输入层、第一记忆层、第三记忆层和输出层。
输入层将原始向量集合和知识向量集合输入至第一记忆层中,将记忆向量集合输入至第三记忆层中。第一记忆层根据原始向量集合和知识向量集合,获取第一目标语句矩阵,输入至第三记忆层,第三记忆层根据记忆向量集合和第一目标语句矩阵,获取第五目标语句矩阵,将第五目标语句矩阵输入至输出层,输出层根据第五目标语句矩阵,获取该目标段落的段落向量。
在一种可能实现方式中,该记忆编码模型包括输入层、第一记忆层、第三记忆层、第三门控层和输出层。
如图16所示,输入层将原始向量集合和知识向量集合输入至第一记忆层中,将记忆向量集合输入至第三记忆层中。第一记忆层根据原始向量集合和知识向量集合,获取第一目标语句矩阵,输入至第三记忆层,第三记忆层根据记忆向量集合和第一目标语句矩阵,获取第五目标语句矩阵,将第五目标语句矩阵输入至第三门控层中,第三门控层对第五目标语句矩阵和第一目标语句矩阵进行加权求和,得到第六目标语句矩阵,将第六目标语句矩阵输入至输出层,输出层根据第六目标语句矩阵,获取该目标段落的段落向量。
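作为参考，下面给出图16所示网络架构的一段简化前向过程示意代码。示例中记忆层内的第一、第二矩阵取同一矩阵，第三门控层的权重由第一目标语句矩阵经线性映射和sigmoid得到，这些均为示例假设，仅用于说明各层之间的数据流向：

    import numpy as np

    def softmax(x, axis=-1):
        x = x - np.max(x, axis=axis, keepdims=True)
        e = np.exp(x)
        return e / np.sum(e, axis=axis, keepdims=True)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def memory_layer(X, M):
        # 简化的记忆层：第一、第二记忆（知识）矩阵取同一矩阵M
        return softmax(X @ M.T, axis=-1) @ M

    def forward(X, KNOW, MEM, W, b):
        # 第一记忆层→第三记忆层→第三门控层→输出层的简化前向过程
        O1 = memory_layer(X, KNOW)                # 第一记忆层：结合知识向量集合
        O5 = memory_layer(O1, MEM)                # 第三记忆层：结合记忆向量集合
        G = sigmoid(O1 @ W + b)                   # 第三门控层的权重
        O6 = G * O1 + (1.0 - G) * O5              # 第六目标语句矩阵
        return O6.sum(axis=0)                     # 输出层：列向求和得到段落向量

    rng = np.random.default_rng(5)
    J, D = 3, 8
    X = rng.normal(size=(J, D))                   # 原始向量集合
    KNOW = rng.normal(size=(5, D))                # 知识向量集合
    MEM = rng.normal(size=(4, D))                 # 记忆向量集合
    W, b = rng.normal(size=(D, D)) * 0.1, np.zeros(D)
    print(forward(X, KNOW, MEM, W, b).shape)      # (8,)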
本申请实施例提供了一种记忆编码模型的网络架构,应用该记忆编码模型可以对目标段落进行编码。且上述实施例提供的编码方法既可以应用于编码过程,也可以应用于训练记忆编码模型的过程。
也即是,在一种可能实现方式中,在训练记忆编码模型的过程中,获取初始化的记忆编码模型,或者获取已经进行过一次或多次训练、但其准确率还未满足要求的记忆编码模型。并且,获取一个或多个样本段落,作为目标段落。应用当前的记忆编码模型对目标段落进行处理,处理过程中执行上述实施例提供的编码方法,即可得到目标段落的段落向量。
之后，将目标段落的段落向量进行解码，得到与该段落向量对应的测试段落，根据目标段落与测试段落之间的误差，对记忆编码模型中的模型参数进行修正。其中，解码方式可以有多种，如可以采用解码算法对段落向量进行解码，或者应用解码模型对段落向量进行解码，该解码模型可以为循环神经网络模型、深度学习网络模型、卷积神经网络模型等。
则采用上述方式进行一次或多次训练之后,即可确定记忆编码模型中的模型参数,得到准确率满足要求的记忆编码模型。
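作为参考，下面给出训练流程的一段极简示意代码：用线性映射代替记忆编码模型与解码模型，根据目标段落与测试段落之间的均方误差对解码参数做梯度更新。encode、decode以及各参数均为示例自拟，仅用于说明"编码—解码—按误差修正参数"的整体流程：

    import numpy as np

    rng = np.random.default_rng(4)
    D, T = 8, 5                                    # 向量维度、样本段落的语句数（示意）

    def encode(sample, theta):
        # 用一个线性映射代替记忆编码模型，仅示意"编码得到段落向量"这一步
        return sample.mean(axis=0) @ theta

    def decode(vec, W_dec):
        # 用线性解码代替解码模型（实际可为循环神经网络、深度学习网络、卷积神经网络等）
        return np.tile(vec @ W_dec, (T, 1))

    theta = rng.normal(size=(D, D))                # 记忆编码模型参数（示意，本例中保持固定）
    W_dec = rng.normal(size=(D, D)) * 0.1          # 解码模型参数（示意）
    sample = rng.normal(size=(T, D))               # 样本段落（目标段落）的向量表示（示意）

    lr = 0.01
    for step in range(200):
        vec = encode(sample, theta)                # 段落向量
        test_paragraph = decode(vec, W_dec)        # 解码得到测试段落
        err = test_paragraph - sample              # 目标段落与测试段落之间的误差
        loss = float(np.mean(err ** 2))
        grad = 2.0 * np.outer(vec, err.mean(axis=0))   # 对解码参数的简化梯度（省略常数因子）
        W_dec -= lr * grad                         # 根据误差修正模型参数（本例仅更新解码参数以示意流程）
    print(round(loss, 4))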
而在另一种可能实现方式中,记忆编码模型已经训练完成,其准确率满足要求。则获取该记忆编码模型,当要对某一目标段落进行编码时,应用该记忆编码模型对目标段落进行处理,处理过程中执行上述实施例提供的编码方法,即可得到目标段落的段落向量。其中,该记忆编码模型可以由编码设备训练,或者由训练设备训练后发送给编码设备,该训练设备也可以为终端或者服务器等。
图17是本申请实施例提供的一种编码装置的结构示意图。参见图17,该装置包括:获取模块1700、输入层模块1701、第一记忆层模块1702、输出层模块1703和处理模块1704;
获取模块1700,用于获取目标段落和预设数据库,将目标段落和预设数据库输入至记忆编码模型,目标段落包括至少一个语句;
输入层模块1701,用于获取目标段落的原始向量集合和预设数据库的知识向量集合,原始向量集合包括目标段落中每个语句的语句向量;知识向量集合包括预设数据库中多条知识数据的知识向量;
第一记忆层模块1702,用于根据原始向量集合和知识向量集合,获取原始向量集合的第一目标语句矩阵,第一目标语句矩阵用于根据原始向量集合与知识向量集合之间的关联关系,对目标段落进行描述;
输出层模块1703,用于根据第一目标语句矩阵,获取目标段落的段落向量;
处理模块1704,用于基于段落向量进行处理。
本申请实施例提供的编码装置，获取模块获取目标段落和预设数据库，将目标段落和预设数据库输入至记忆编码模型，输入层模块获取目标段落的原始向量集合和预设数据库的知识向量集合，第一记忆层模块根据原始向量集合和知识向量集合，获取原始向量集合的第一目标语句矩阵，输出层模块根据第一目标语句矩阵获取目标段落的段落向量，处理模块基于段落向量进行处理。本申请实施例无需采用串行的方式对每个语句分别进行编码，而是以目标段落为单位，应用记忆编码模型对目标段落进行编码，因此提高了编码速度。并且，编码过程中不仅考虑目标段落本身，还考虑到预设数据库中的知识数据和目标段落的上下文语句，从而使获取到的段落向量不仅能够表达目标段落的含义，还能够从外部知识数据中抽取相关的知识数据，提高了编码准确率。
本申请实施例提供的记忆编码模型具有自注意力性,将自注意力机制应用于段落的语句层面上,根据目标段落和上下文语句进行综合处理,可以保证目标段落的段落向量表达更为丰富,更能准确地描述目标段落的含义。且本申请实施例可以应用于多种场景下,应用范围广泛。
在一种可能实现方式中,输入层模块1701包括:
原始获取单元,用于根据目标段落中每个语句中每个词语的词向量,应用语句编码模型,获取每个语句的语句向量,得到原始向量集合;
知识获取单元,用于根据预设数据库中每条知识数据的知识向量,获取知识向量集合。
在一种可能实现方式中,该装置还包括:
知识数据获取模块,用于获取预设数据库中的每条知识数据;
知识向量获取模块,用于对于每条知识数据,对知识数据进行词语划分,得到至少一个词语,获取至少一个词语的词向量,根据至少一个词语的词向量,获取知识数据的知识向量;
存储模块,用于将知识向量与知识数据对应存储于预设数据库中。
在一种可能实现方式中,第一记忆层模块1702包括:
知识矩阵获取单元,用于应用第一记忆模型,获取知识向量集合对应的第一知识矩阵和第二知识矩阵;
第一目标获取单元,用于根据原始向量集合、第一知识矩阵和第二知识矩阵,获取原始向量集合的第一目标语句矩阵。
在一种可能实现方式中,装置还包括第一门控层模块;
第一门控层模块,用于对原始向量集合与第一目标语句矩阵进行加权求和,得到第二目标语句矩阵,使第二目标语句矩阵中的每个数值属于预设数值范围;
输出层模块1703,用于根据第二目标语句矩阵,获取目标段落的段落向量。
在一种可能实现方式中,第一门控层模块包括:
第一权重获取单元,用于应用线性网络模型,获取原始向量集合对应的线性数值,采用预设函数对线性数值进行处理,得到原始向量集合的第一权重,以使第一权重属于预设数值范围;
第二权重获取单元,用于计算1与第一权重的差值,得到第一目标语句矩阵的第二权重;
加权单元,用于按照第一权重和第二权重,对原始向量集合与第一目标语句矩阵进行加权求和,得到第二目标语句矩阵。
在一种可能实现方式中,输出层模块1703,包括:
列向求和单元,用于对第一目标语句矩阵进行列向求和,得到段落向量。
在一种可能实现方式中,第一记忆层模块1702还用于将第一目标语句矩阵作为更新后的原始向量集合和知识向量集合,重复执行根据更新后的原始向量集合和知识向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,触发输出层模块1703;
输出层模块1703,还用于根据当前的目标语句矩阵,获取所述目标段落的段落向量。
在一种可能实现方式中,装置还包括第二记忆层模块;
输入层模块1701,还用于获取目标段落的记忆向量集合,记忆向量集合包括目标段落的上下文语句中每个词语的词向量;
第二记忆层模块,用于根据原始向量集合和记忆向量集合,获取原始向量集合的第三目标语句矩阵,第三目标语句矩阵用于根据原始向量集合与记忆向量集合之间的关联关系,对目标段落进行描述;
第一记忆层模块1702,还用于根据第三目标语句矩阵和知识向量集合,获取原始向量集合的第一目标语句矩阵。
在一种可能实现方式中,装置还包括第二门控层模块;
第二门控层模块,用于对原始向量集合与第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,使第四目标语句矩阵中的每个数值属于预设数值范围;
第一记忆层模块,还用于根据第四目标语句矩阵和知识向量集合,获取原始向量集合的第一目标语句矩阵。
在一种可能实现方式中,装置还包括第三记忆层模块;
输入层模块1701,还用于获取目标段落的记忆向量集合,记忆向量集合包括目标段落的上下文语句中每个词语的词向量;
第三记忆层模块,用于根据第一目标语句矩阵和记忆向量集合,获取原始向量集合的第五目标语句矩阵,第五目标语句矩阵用于根据原始向量集合、知识向量集合与记忆向量集合之间的关联关系,对目标段落进行描述;
输出层模块1703,还用于根据第五目标语句矩阵,获取目标段落的段落向量。
在一种可能实现方式中,装置还包括第三门控层模块;
第三门控层模块,用于对第一目标语句矩阵和第五目标语句矩阵进行加权求和,得到第六目标语句矩阵,使第六目标语句矩阵中的每个数值属于预设数值范围;
输出层模块1703,还用于根据第六目标语句矩阵,获取目标段落的段落向量。
需要说明的是:上述实施例提供的编码装置在对段落进行编码时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将编码设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的编码装置与编码方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图18是本申请实施例提供的一种终端的结构框图。该终端1800用于执行上述实施例中编码设备执行的步骤，可以是便携式移动终端，比如：智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端1800还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端1800包括有:处理器1801和存储器1802。
处理器1801可以包括一个或多个处理核心，比如4核心处理器、8核心处理器等。处理器1801可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1801也可以包括主处理器和协处理器，主处理器是用于对在唤醒状态下的数据进行处理的处理器，也称CPU(Central Processing Unit,中央处理器)；协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中，处理器1801可以集成有GPU(Graphics Processing Unit,图像处理器)，GPU用于负责显示屏所需要显示的内容的渲染和绘制。在一些实施例中，处理器1801还可以包括AI(Artificial Intelligence,人工智能)处理器，该AI处理器用于处理有关机器学习的计算操作。
存储器1802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1802还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1802中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1801所执行以实现本申请中方法实施例提供的编码方法。
在一些实施例中,终端1800还可选包括有:外围设备接口1803和至少一个外围设备。处理器1801、存储器1802和外围设备接口1803之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1803相连。具体地,外围设备包括:射频电路1804、触摸显示屏1805、摄像头1806、音频电路1807、定位组件1808和电源1809中的至少一种。
外围设备接口1803可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1801和存储器1802。在一些实施例中，处理器1801、存储器1802和外围设备接口1803被集成在同一芯片或电路板上；在一些其他实施例中，处理器1801、存储器1802和外围设备接口1803中的任意一个或两个可以在单独的芯片或电路板上实现，本实施例对此不加以限定。
射频电路1804用于接收和发射RF(Radio Frequency,射频)信号，也称电磁信号。射频电路1804通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1804将电信号转换为电磁信号进行发送，或者，将接收到的电磁信号转换为电信号。可选地，射频电路1804包括：天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1804可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于：万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中，射频电路1804还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路，本申请对此不加以限定。
显示屏1805用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1805是触摸显示屏时，显示屏1805还具有采集在显示屏1805的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1801进行处理。此时，显示屏1805还可以用于提供虚拟按钮和/或虚拟键盘，也称软按钮和/或软键盘。在一些实施例中，显示屏1805可以为一个，设置在终端1800的前面板；在另一些实施例中，显示屏1805可以为至少两个，分别设置在终端1800的不同表面或呈折叠设计；在再一些实施例中，显示屏1805可以是柔性显示屏，设置在终端1800的弯曲表面上或折叠面上。甚至，显示屏1805还可以设置成非矩形的不规则图形，也即异形屏。显示屏1805可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件1806用于采集图像或视频。可选地，摄像头组件1806包括前置摄像头和后置摄像头。通常，前置摄像头设置在终端的前面板，后置摄像头设置在终端的背面。在一些实施例中，后置摄像头为至少两个，分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种，以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中，摄像头组件1806还可以包括闪光灯。闪光灯可以是单色温闪光灯，也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合，可以用于不同色温下的光线补偿。
音频电路1807可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器1801进行处理,或者输入至射频电路1804以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端1800的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器1801或射频电路1804的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路1807还可以包括耳机插孔。
定位组件1808用于定位终端1800的当前地理位置，以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件1808可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统或俄罗斯的格洛纳斯系统或欧盟的伽利略系统的定位组件。
电源1809用于为终端1800中的各个组件进行供电。电源1809可以是交流电、直流电、一次性电池或可充电电池。当电源1809包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中,终端1800还包括有一个或多个传感器1810。该一个或多个传感器1810包括但不限于:加速度传感器1811、陀螺仪传感器1812、压力传感器1813、指纹传感器1814、光学传感器1815以及接近传感器1816。
加速度传感器1811可以检测以终端1800建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器1811可以用于检测重力加速度在三个坐标轴上的分量。处理器1801可以根据加速度传感器1811采集的重力加速度信号,控制触摸显示屏1805以横向视图或纵向视图进行用户界面的显示。加速度传感器1811还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器1812可以检测终端1800的机体方向及转动角度,陀螺仪传感器1812可以与加速度传感器1811协同采集用户对终端1800的3D动作。 处理器1801根据陀螺仪传感器1812采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器1813可以设置在终端1800的侧边框和/或触摸显示屏1805的下层。当压力传感器1813设置在终端1800的侧边框时,可以检测用户对终端1800的握持信号,由处理器1801根据压力传感器1813采集的握持信号进行左右手识别或快捷操作。当压力传感器1813设置在触摸显示屏1805的下层时,由处理器1801根据用户对触摸显示屏1805的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器1814用于采集用户的指纹，由处理器1801根据指纹传感器1814采集到的指纹识别用户的身份，或者，由指纹传感器1814根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时，由处理器1801授权该用户执行相关的敏感操作，该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器1814可以被设置在终端1800的正面、背面或侧面。当终端1800上设置有物理按键或厂商Logo时，指纹传感器1814可以与物理按键或厂商标志集成在一起。
光学传感器1815用于采集环境光强度。在一个实施例中,处理器1801可以根据光学传感器1815采集的环境光强度,控制触摸显示屏1805的显示亮度。具体地,当环境光强度较高时,调高触摸显示屏1805的显示亮度;当环境光强度较低时,调低触摸显示屏1805的显示亮度。在另一个实施例中,处理器1801还可以根据光学传感器1815采集的环境光强度,动态调整摄像头组件1806的拍摄参数。
接近传感器1816,也称距离传感器,通常设置在终端1800的前面板。接近传感器1816用于采集用户与终端1800的正面之间的距离。在一个实施例中,当接近传感器1816检测到用户与终端1800的正面之间的距离逐渐变小时,由处理器1801控制触摸显示屏1805从亮屏状态切换为息屏状态;当接近传感器1816检测到用户与终端1800的正面之间的距离逐渐变大时,由处理器1801控制触摸显示屏1805从息屏状态切换为亮屏状态。
本领域技术人员可以理解，图18中示出的结构并不构成对终端1800的限定，可以包括比图示更多或更少的组件，或者组合某些组件，或者采用不同的组件布置。
图19是本申请实施例提供的一种服务器的结构示意图,该服务器1900可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(central processing units,CPU)1901和一个或一个以上的存储器1902,其中,存储器1902中存储有至少一条指令,至少一条指令由处理器1901加载并执行以实现上述各个方法实施例提供的方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器还可以包括其他用于实现设备功能的部件,在此不做赘述。
服务器1900可以用于执行上述编码方法中编码设备所执行的步骤。
本申请实施例还提供了一种编码设备,该编码设备包括一个或多个处理器和一个或多个存储器,一个或多个存储器中存储有至少一条计算机可读指令、代码集或指令集,该计算机可读指令、该代码集或该指令集由该一个或多个处理器加载并执行以实现上述实施例的编码方法中所执行的操作。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条计算机可读指令、代码集或指令集,该计算机可读指令、该代码集或该指令集由一个或多个处理器加载并执行以实现上述实施例的编码方法中所执行的操作。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种编码方法,由编码设备执行,所述方法包括:
    获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句,所述记忆编码模型至少包括输入层、第一记忆层和输出层;
    在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
    在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
    在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
    基于所述段落向量进行处理。
  2. 根据权利要求1所述的方法,其特征在于,所述输入层包括语句编码模型,所述在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,包括:
    根据所述目标段落中每个语句中每个词语的词向量,应用所述语句编码模型,获取所述每个语句的语句向量,得到所述原始向量集合;及
    根据所述预设数据库中每条知识数据的知识向量,获取所述知识向量集合。
  3. 根据权利要求2所述的方法,其特征在于,所述语句编码模型包括第一语句编码子模型和第二语句编码子模型,所述根据所述目标段落中每个语句中每个词语的词向量,应用所述语句编码模型,获取所述每个语句的语句向量,包括:
    根据所述目标段落中每个语句中每个词语的词向量,得到多个词向量;
    应用所述第一语句编码子模型，对所述多个词向量进行正序编码，得到第一向量；
    应用所述第二语句编码子模型,对所述多个词向量进行倒序编码,得到第二向量;及
    根据所述第一向量和所述第二向量,获取所述每个语句的语句向量。
  4. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取所述预设数据库中的每条知识数据;及
    对于每条知识数据,对所述知识数据进行词语划分,得到至少一个词语,获取所述至少一个词语的词向量,根据所述至少一个词语的词向量,获取所述知识数据的知识向量,将所述知识向量与所述知识数据对应存储于所述预设数据库中。
  5. 根据权利要求1所述的方法,其特征在于,所述第一记忆层包括第一记忆模型,所述在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,包括:
    在所述第一记忆层中,应用所述第一记忆模型,获取所述知识向量集合对应的第一知识矩阵和第二知识矩阵;及
    根据所述原始向量集合、所述第一知识矩阵和所述第二知识矩阵,获取所述原始向量集合的第一目标语句矩阵。
  6. 根据权利要求1所述的方法,其特征在于,所述记忆编码模型还包括第一门控层,所述在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量,包括:
    将所述第一目标语句矩阵输入至所述第一门控层,在所述第一门控层中,对所述原始向量集合与所述第一目标语句矩阵进行加权求和,得到第二目标语句矩阵,使所述第二目标语句矩阵中的每个数值属于预设数值范围;及
    在所述输出层中,根据所述第二目标语句矩阵,获取所述目标段落的段落向量。
  7. 根据权利要求6所述的方法，其特征在于，所述第一门控层包括线性网络模型，所述在所述第一门控层中，对所述第一目标语句矩阵与所述原始向量集合进行加权求和，得到第二目标语句矩阵，包括：
    应用所述线性网络模型,获取所述原始向量集合对应的线性数值,采用预设函数对所述线性数值进行处理,得到所述原始向量集合的第一权重,以使所述第一权重属于所述预设数值范围;
    计算1与所述第一权重的差值,得到所述第一目标语句矩阵的第二权重;及
    按照所述第一权重和所述第二权重,对所述原始向量集合与所述第一目标语句矩阵进行加权求和,得到所述第二目标语句矩阵。
  8. 根据权利要求1所述的方法,其特征在于,所述在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量,包括:
    对所述第一目标语句矩阵进行列向求和,得到所述段落向量。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述方法还包括:
    将所述第一目标语句矩阵作为更新后的原始向量集合和知识向量集合,在所述第一记忆层中,重复执行根据更新后的原始向量集合和知识向量集合获取目标语句矩阵的步骤,直至重复次数达到预设次数时,在所述输出层中,根据当前的目标语句矩阵,获取所述目标段落的段落向量。
  10. 根据权利要求1-8任一项所述的方法,其特征在于,所述记忆编码模型还包括位于所述第一记忆层之前的第二记忆层,所述方法还包括:
    在所述输入层中,获取目标段落的记忆向量集合,所述记忆向量集合包括所述目标段落的上下文语句中每个词语的词向量;
    将所述原始向量集合和所述记忆向量集合输入至所述第二记忆层,在所述第二记忆层中,根据所述原始向量集合和所述记忆向量集合,获取所述原始向量集合的第三目标语句矩阵,所述第三目标语句矩阵用于根据所述原始向量集合与所述记忆向量集合之间的关联关系,对所述目标段落进行描述;及
    所述在所述第一记忆层中，根据所述原始向量集合和所述知识向量集合，获取所述原始向量集合的第一目标语句矩阵，包括：
    将所述第三目标语句矩阵输入至所述第一记忆层,在所述第一记忆层中,根据所述第三目标语句矩阵和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵。
  11. 根据权利要求10所述的方法,其特征在于,所述记忆编码模型还包括位于所述第一记忆层之前、所述第二记忆层之后的第二门控层,所述方法还包括:
    将所述第三目标语句矩阵输入至所述第二门控层,在所述第二门控层中,对所述原始向量集合与所述第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,使所述第四目标语句矩阵中的每个数值属于预设数值范围;及
    所述在所述第一记忆层中,根据所述第三目标语句矩阵和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,包括:将所述第四目标语句矩阵输入至所述第一记忆层,在所述第一记忆层中,根据所述第四目标语句矩阵和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵。
  12. 根据权利要求1-8任一项所述的方法,其特征在于,所述记忆编码模型还包括位于所述第一记忆层之后的第三记忆层,所述方法还包括:
    在所述输入层中,获取目标段落的记忆向量集合,所述记忆向量集合包括所述目标段落的上下文语句中每个词语的词向量;
    所述在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量,包括:
    将所述记忆向量集合输入至所述第三记忆层,在所述第三记忆层中,根据所述第一目标语句矩阵和所述记忆向量集合,获取所述原始向量集合的第五目标语句矩阵,所述第五目标语句矩阵用于根据所述原始向量集合、所述知识向量集合与所述记忆向量集合之间的关联关系,对所述目标段落进行描述;及
    在所述输出层中,根据所述第五目标语句矩阵,获取所述目标段落的段落向量。
  13. 根据权利要求12所述的方法,其特征在于,所述记忆编码模型还包括位于所述第三记忆层之后的第三门控层,所述在所述输出层中,根据所述第五目标语句矩阵,获取所述目标段落的段落向量,包括:
    将所述第五目标语句矩阵输入至所述第三门控层,在所述第三门控层中,对所述第一目标语句矩阵和所述第五目标语句矩阵进行加权求和,得到第六目标语句矩阵,使所述第六目标语句矩阵中的每个数值属于预设数值范围;及
    在所述输出层中,根据所述第六目标语句矩阵,获取所述目标段落的段落向量。
  14. 一种编码装置,所述装置包括:
    获取模块,用于获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句;
    输入层模块,用于获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
    第一记忆层模块,用于根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
    输出层模块,用于根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
    处理模块,用于基于所述段落向量进行处理。
  15. 根据权利要求14所述的装置,其特征在于,所述输入层模块包括:
    原始获取单元,用于根据目标段落中每个语句中每个词语的词向量,应用语句编码模型,获取每个语句的语句向量,得到原始向量集合;及
    知识获取单元,用于根据预设数据库中每条知识数据的知识向量,获取知识向量集合。
  16. 根据权利要求14所述的装置,其特征在于,所述装置还包括:
    第一门控层模块,用于对所述原始向量集合与所述第一目标语句矩阵进行加权求和,得到第二目标语句矩阵,使所述第二目标语句矩阵中的每个数值属于预设数值范围;及
    所述输出层模块,还用于根据所述第二目标语句矩阵,获取所述目标段落的段落向量。
  17. 根据权利要求14-16任一项所述的装置,其特征在于,所述装置还包括第二记忆层模块;
    所述输入层模块,还用于获取目标段落的记忆向量集合,所述记忆向量集合包括所述目标段落的上下文语句中每个词语的词向量;
    所述第二记忆层模块,用于根据所述原始向量集合和所述记忆向量集合,获取所述原始向量集合的第三目标语句矩阵,所述第三目标语句矩阵用于根据所述原始向量集合与所述记忆向量集合之间的关联关系,对所述目标段落进行描述;及
    所述第一记忆层模块,还用于根据所述第三目标语句矩阵和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵。
  18. 根据权利要求17所述的装置,其特征在于,所述装置还包括:
    第二门控层模块,用于对所述原始向量集合与所述第三目标语句矩阵进行加权求和,得到第四目标语句矩阵,使所述第四目标语句矩阵中的每个数值属于预设数值范围;及
    所述第一记忆层模块,还用于根据所述第四目标语句矩阵和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵。
  19. 一种编码设备,包括存储器和一个或多个处理器,存储器中储存有计算机可读指令,计算机可读指令被处理器执行时,使得一个或多个处理器执行以下步骤:
    获取目标段落和预设数据库，将所述目标段落和所述预设数据库输入至记忆编码模型，所述目标段落包括至少一个语句，所述记忆编码模型至少包括输入层、第一记忆层和输出层；
    在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
    在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
    在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
    基于所述段落向量进行处理。
  20. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
    获取目标段落和预设数据库,将所述目标段落和所述预设数据库输入至记忆编码模型,所述目标段落包括至少一个语句,所述记忆编码模型至少包括输入层、第一记忆层和输出层;
    在所述输入层中,获取所述目标段落的原始向量集合和所述预设数据库的知识向量集合,所述原始向量集合包括所述目标段落中每个语句的语句向量;所述知识向量集合包括所述预设数据库中多条知识数据的知识向量;
    在所述第一记忆层中,根据所述原始向量集合和所述知识向量集合,获取所述原始向量集合的第一目标语句矩阵,所述第一目标语句矩阵用于根据所述原始向量集合与所述知识向量集合之间的关联关系,对所述目标段落进行描述;
    在所述输出层中,根据所述第一目标语句矩阵,获取所述目标段落的段落向量;及
    基于所述段落向量进行处理。
PCT/CN2020/073360 2019-01-24 2020-01-21 编码方法、装置、设备及存储介质 WO2020151685A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021517331A JP7324838B2 (ja) 2019-01-24 2020-01-21 符号化方法並びにその、装置、機器及びコンピュータプログラム
US17/356,482 US11995406B2 (en) 2019-01-24 2021-06-23 Encoding method, apparatus, and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910069751.1 2019-01-24
CN201910069751.1A CN110147532B (zh) 2019-01-24 2019-01-24 编码方法、装置、设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/356,482 Continuation US11995406B2 (en) 2019-01-24 2021-06-23 Encoding method, apparatus, and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020151685A1 true WO2020151685A1 (zh) 2020-07-30

Family

ID=67588574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073360 WO2020151685A1 (zh) 2019-01-24 2020-01-21 编码方法、装置、设备及存储介质

Country Status (4)

Country Link
US (1) US11995406B2 (zh)
JP (1) JP7324838B2 (zh)
CN (1) CN110147532B (zh)
WO (1) WO2020151685A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410212A (zh) * 2022-11-02 2022-11-29 平安科技(深圳)有限公司 多模态模型的训练方法、装置、计算机设备及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7180132B2 (ja) * 2018-06-12 2022-11-30 富士通株式会社 処理プログラム、処理方法および情報処理装置
CN110147533B (zh) * 2019-01-24 2023-08-29 腾讯科技(深圳)有限公司 编码方法、装置、设备及存储介质
CN110147532B (zh) 2019-01-24 2023-08-25 腾讯科技(深圳)有限公司 编码方法、装置、设备及存储介质
CN114297338B (zh) * 2021-12-02 2024-05-14 腾讯科技(深圳)有限公司 文本匹配方法、装置、存储介质和程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180150743A1 (en) * 2016-11-28 2018-05-31 Xerox Corporation Long-term memory networks for knowledge extraction from text and publications
CN108763567A (zh) * 2018-06-05 2018-11-06 北京玄科技有限公司 应用于智能机器人交互的知识推理方法及装置
CN110032633A (zh) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 多轮对话处理方法、装置和设备
CN110147532A (zh) * 2019-01-24 2019-08-20 腾讯科技(深圳)有限公司 编码方法、装置、设备及存储介质
CN110263324A (zh) * 2019-05-16 2019-09-20 华为技术有限公司 文本处理方法、模型训练方法和装置

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002169834A (ja) * 2000-11-20 2002-06-14 Hewlett Packard Co <Hp> 文書のベクトル解析を行うコンピュータおよび方法
US9336192B1 (en) * 2012-11-28 2016-05-10 Lexalytics, Inc. Methods for analyzing text
US10331782B2 (en) * 2014-11-19 2019-06-25 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for automatic identification of potential material facts in documents
US10318883B2 (en) 2015-03-26 2019-06-11 International Business Machines Corporation Identifying optimum times at which to retrain a logistic regression model
US20160350653A1 (en) * 2015-06-01 2016-12-01 Salesforce.Com, Inc. Dynamic Memory Network
US10664744B2 (en) * 2015-10-13 2020-05-26 Facebook, Inc. End-to-end memory networks
US10332508B1 (en) * 2016-03-31 2019-06-25 Amazon Technologies, Inc. Confidence checking for speech processing and query answering
CN106126596B (zh) * 2016-06-20 2019-08-23 中国科学院自动化研究所 一种基于层次化记忆网络的问答方法
JP7106077B2 (ja) * 2016-09-22 2022-07-26 エヌフェレンス,インコーポレイテッド 意味的情報の可視化およびライフサイエンスエンティティ間の顕著な関連を示す時間的信号の推測のためのシステム、方法、およびコンピュータ可読媒体
WO2018060450A1 (en) * 2016-09-29 2018-04-05 Koninklijke Philips N.V. Question generation
KR20180077691A (ko) * 2016-12-29 2018-07-09 주식회사 엔씨소프트 문장 추상화 장치 및 방법
KR101882906B1 (ko) * 2017-01-17 2018-07-27 경북대학교 산학협력단 복수 문단 텍스트의 추상적 요약문 생성 장치 및 방법, 그 방법을 수행하기 위한 기록 매체
US11729120B2 (en) 2017-03-16 2023-08-15 Microsoft Technology Licensing, Llc Generating responses in automated chatting
US10817650B2 (en) * 2017-05-19 2020-10-27 Salesforce.Com, Inc. Natural language processing using context specific word vectors
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
CN107273487A (zh) * 2017-06-13 2017-10-20 北京百度网讯科技有限公司 基于人工智能的聊天数据的生成方法、装置及计算机设备
CN108304911B (zh) * 2018-01-09 2020-03-13 中国科学院自动化研究所 基于记忆神经网络的知识抽取方法以及系统和设备
CN108491386A (zh) * 2018-03-19 2018-09-04 上海携程国际旅行社有限公司 自然语言理解方法及系统
CN108536679B (zh) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 命名实体识别方法、装置、设备及计算机可读存储介质
CN109086408B (zh) * 2018-08-02 2022-10-28 腾讯科技(深圳)有限公司 文本生成方法、装置、电子设备及计算机可读介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180150743A1 (en) * 2016-11-28 2018-05-31 Xerox Corporation Long-term memory networks for knowledge extraction from text and publications
CN108763567A (zh) * 2018-06-05 2018-11-06 北京玄科技有限公司 应用于智能机器人交互的知识推理方法及装置
CN110147532A (zh) * 2019-01-24 2019-08-20 腾讯科技(深圳)有限公司 编码方法、装置、设备及存储介质
CN110032633A (zh) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 多轮对话处理方法、装置和设备
CN110263324A (zh) * 2019-05-16 2019-09-20 华为技术有限公司 文本处理方法、模型训练方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIWEI WANG: "Research on Question Answering Over Large Scale Knowledge Base with Hierarchical Memory Network", CHINA MASTER'S THESES FULL-TEXT DATABASE , INFORMATION & TECHNOLOGY, 15 January 2019 (2019-01-15), pages 1 - 78, XP055723745, ISSN: 1674-0246 *


Also Published As

Publication number Publication date
JP2022517463A (ja) 2022-03-09
US11995406B2 (en) 2024-05-28
CN110147532B (zh) 2023-08-25
JP7324838B2 (ja) 2023-08-10
CN110147532A (zh) 2019-08-20
US20210319167A1 (en) 2021-10-14

Similar Documents

Publication Publication Date Title
WO2020228519A1 (zh) 字符识别方法、装置、计算机设备以及存储介质
WO2020151685A1 (zh) 编码方法、装置、设备及存储介质
CN110110145B (zh) 描述文本生成方法及装置
CN110147533B (zh) 编码方法、装置、设备及存储介质
CN110544272B (zh) 脸部跟踪方法、装置、计算机设备及存储介质
CN110750992B (zh) 命名实体识别方法、装置、电子设备及介质
CN110162604B (zh) 语句生成方法、装置、设备及存储介质
CN108304506B (zh) 检索方法、装置及设备
CN110209784B (zh) 消息交互方法、计算机设备及存储介质
CN110263131B (zh) 回复信息生成方法、装置及存储介质
JP7431977B2 (ja) 対話モデルの訓練方法、装置、コンピュータ機器及びプログラム
CN111324699A (zh) 语义匹配的方法、装置、电子设备及存储介质
CN111581958A (zh) 对话状态确定方法、装置、计算机设备及存储介质
CN113516143A (zh) 文本图像匹配方法、装置、计算机设备及存储介质
CN110555102A (zh) 媒体标题识别方法、装置及存储介质
CN110991445B (zh) 竖排文字识别方法、装置、设备及介质
CN113836946B (zh) 训练评分模型的方法、装置、终端及存储介质
CN113763931B (zh) 波形特征提取方法、装置、计算机设备及存储介质
CN110990549B (zh) 获取答案的方法、装置、电子设备及存储介质
CN110232417B (zh) 图像识别方法、装置、计算机设备及计算机可读存储介质
CN110837557A (zh) 摘要生成方法、装置、设备及介质
CN112988984B (zh) 特征获取方法、装置、计算机设备及存储介质
CN111310701B (zh) 手势识别方法、装置、设备及存储介质
CN110096707B (zh) 生成自然语言的方法、装置、设备及可读存储介质
CN112487162A (zh) 确定文本语义信息的方法、装置、设备以及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20744673

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021517331

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20744673

Country of ref document: EP

Kind code of ref document: A1