CN111324722B - Method and system for training word weight model - Google Patents

Method and system for training word weight model

Info

Publication number
CN111324722B
Authority
CN
China
Prior art keywords
text
texts
words
word
mark
Prior art date
Legal status
Active
Application number
CN202010409812.7A
Other languages
Chinese (zh)
Other versions
CN111324722A (en)
Inventor
陈晓军
崔恒斌
陈显玲
杨明晖
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010409812.7A priority Critical patent/CN111324722B/en
Publication of CN111324722A publication Critical patent/CN111324722A/en
Application granted granted Critical
Publication of CN111324722B publication Critical patent/CN111324722B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis

Abstract

Embodiments of this specification provide a method and a system for training a word weight model. The method comprises the following steps: acquiring a plurality of text pairs; judging whether the two texts in a text pair match, to obtain a matching result; determining importance identifications of the words of the texts in the text pairs based on the matching result, to obtain a plurality of text pairs containing the importance identifications; and training a word weight model based on a plurality of training data derived from the texts in the plurality of text pairs containing the importance identifications.

Description

Method and system for training word weight model
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a method and a system for training a word weight model.
Background
In intelligent interaction, question-and-answer communication between an intelligent customer service agent and a user is usually realized by configuring a knowledge base. The intelligent interactive system may match answers to the user's questions for the intelligent customer service in a question-and-answer knowledge base based on text semantics and word weights. Word weights may improve the accuracy of answer matching.
Therefore, it is desirable to provide a method and system for training a word weight model, which improves the efficiency of determining word weights.
Disclosure of Invention
One aspect of the present specification provides a method of training a word weight model. The method comprises the following steps: acquiring a plurality of text pairs; judging whether the two texts in a text pair match, to obtain a matching result; determining importance identifications of the words of the texts in the text pairs based on the matching result, to obtain a plurality of text pairs containing the importance identifications; and training a word weight model based on a plurality of training data derived from the texts in the plurality of text pairs containing the importance identifications.
In some embodiments, the determining importance identifications of words of the texts in the text pair based on the matching result, to obtain a plurality of text pairs containing the importance identifications, includes: judging whether the two texts of the text pair match, and performing one or more of the following processes. Process one: if the two texts do not match, making a first mark for the different words in the two texts. Process two: if the two texts do not match, making a second mark for the same words in the two texts. Process three: if the two texts match, making a first mark for the same words in the two texts. Process four: if the two texts match, making a second mark for the different words in the two texts. The first mark and the second mark are the importance identifications, and the importance of the first mark is higher than that of the second mark.
In some embodiments, the word weight model comprises a vectorization model and a weight submodel; the vectorization model performs vector representation on the words in the input text, and the generated vectors include information of the words and context information of the words in the text; the weight submodel generates a weight prediction value based on the vectors.
In some embodiments, the method further comprises: acquiring a text to be retrieved; determining weights for the words in the retrieved text based on the word weight model; determining a retrieval keyword of the retrieved text based on the weights of the words in the retrieved text; and determining at least one retrieval result based on the retrieval keyword.
In some embodiments, the method further comprises: acquiring a first text and a second text; determining weights for words in the first text based on the word weight model, and weights for words in the second text based on the word weight model; calculating a vector distance of the first text and the second text based on weights of words in the first text and the second text; determining whether the first text and the second text match based on the vector distance.
Another aspect of the specification provides a system for training a word weight model. The system comprises: the acquisition module is used for acquiring a plurality of text pairs; the judging module is used for judging whether the two texts in the text pair are matched or not to obtain a matching result; the marking module is used for determining the importance identifications of the words of the texts in the text pairs based on the matching result to obtain a plurality of text pairs containing the importance identifications; and the training module is used for training the word weight model based on a plurality of training data, and the training data is derived from texts in a plurality of text pairs containing the importance identifications.
Another aspect of the present specification provides an apparatus for training a word weight model, comprising a processor for performing the method as set forth above.
Another aspect of the present specification provides a computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the method as described above.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a training word weight model system in accordance with some embodiments of the present description;
FIG. 2 is a schematic diagram of a method of training a word weight model, shown in accordance with some embodiments of the present description;
FIG. 3 is a schematic diagram of a text match determination method according to some embodiments of the present description; and
FIG. 4 is a schematic diagram of a training word weight model, shown in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" are not limited to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprise" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Although various references are made herein to certain modules or units in a system according to embodiments of the present description, any number of different modules or units may be used and run on the client and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
FIG. 1 is a schematic diagram of an application scenario of a training word weight model system according to some embodiments of the present description.
The training word weight model system 100 may be an online platform that may include a server 110, a network 120, a user terminal 130, a database 140, and other data sources 150.
The server 110 may be used to manage resources and process data and/or information from at least one component of the present system or an external data source (e.g., a cloud data center). In some embodiments, the server 110 may be a single server or a server farm. The server farm can be centralized or distributed (e.g., server 110 can be a distributed system). In some embodiments, the server 110 may be local or remote. In some embodiments, the server 110 may be implemented on a cloud platform or provided in a virtual manner. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, across clouds, multiple clouds, the like, or any combination of the above. In some embodiments, server 110 may be implemented on a computing device, which may include one or more components.
In some embodiments, the server 110 may include a processing device 112. The processing device 112 may process information and/or data related to training the word weight model to perform one or more of the functions described in this specification. For example, processing device 112 may determine weights for words in text based on text pair data obtained from user terminal 130. In some embodiments, the processing device 112 may include one or more processors (e.g., a single-chip processor or a multi-chip processor). By way of example only, the processing device 112 may include one or more hardware processors such as a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination of the above.
Network 120 may connect the various components of system 100 and/or connect system 100 with external resource components. Network 120 enables communication between the various components and with other components outside of system 100 to facilitate the exchange of data and/or information. In some embodiments, the network 120 may be any one of, or a combination of, a wired network or a wireless network. Merely by way of example, network 120 may include a cable network, a wired network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, the like, or any combination of the above. In some embodiments, network 120 may include one or more network switching points. For example, the network 120 may include wired or wireless network switching points, such as base stations and/or Internet switching points 120-1, 120-2, ..., through which one or more components of the training word weight model system 100 may connect to the network 120 to exchange data and/or information.
User terminal 130 may be used to input text and/or receive text output. In some embodiments, the user may be a user of the user terminal 130. For example, the user may input query text using the user terminal 130. As another example, the user may receive reply text associated with their query via user terminal 130. In some embodiments, the user terminal 130 may include a mobile device 130-1, a tablet 130-2, a laptop 130-3, the like, or any combination of the above.
Database 140 may be used to store data and/or instructions. In some embodiments, database 140 may be implemented in a single central server, multiple servers connected by communication links, or multiple personal devices. In some embodiments, database 140 may include mass storage, removable storage, volatile read-write memory (e.g., random access memory RAM), read-only memory (ROM), the like, or any combination of the above. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like. In some embodiments, database 140 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, across clouds, multiple clouds, the like, or any combination of the above.
Other data sources 150 may be used to provide one or more sources of other information to system 100. In some embodiments, the other data sources 150 may include one or more devices, one or more application program interfaces, one or more database query interfaces, one or more protocol-based information acquisition interfaces, or other manners in which information may be acquired, or the like, or combinations of two or more of the foregoing. In some embodiments, the information provided by the data source may include information that already exists at the time the information is extracted, information that is temporarily generated at the time the information is extracted, or a combination thereof. In some embodiments, other data sources 150 may be used to provide text pair data and the like to system 100.
In some embodiments, database 140 may be included in server 110, user terminal 130, and possibly other system components. In some embodiments, processing device 112 may be included in server 110, user terminal 130, and possibly other system components.
In some embodiments, server 110 may communicate with other components of training word weight model system 100 (e.g., user terminal 130, database 140, and/or other data sources 150, etc.) via network 120 to obtain information and/or data therein. For example, server 110 may obtain text pair data stored in database 140 via network 120. In some embodiments, database 140 may be connected with network 120 to communicate with one or more components (e.g., server 110, user terminal 130, etc.) in training word weight model system 100. In some embodiments, one or more components in the training word weight model system 100 may access data or instructions stored in the database 140 and/or other data sources 150 via the network 120. In some embodiments, the database 140 and/or other data sources 150 may be directly connected to or in communication with one or more components (e.g., server 110, user terminal 130) in the training word weight model system 100. In some embodiments, database 140 may be part of server 110. In some embodiments, one or more components of the training word weight model system 100 (e.g., server 110, user terminal 130, etc.) may possess permission to access database 140.
The training word weight model system 100 may generate training data for training the word weight model by implementing the methods and/or processes disclosed in this specification. In some embodiments, the system 100 may obtain training data for training the word weight model by analyzing the matches between the texts in a plurality of text pairs obtained from the user terminal 130, the database 140, or other data sources 150, and determining importance identifications for the words in the texts.
FIG. 2 is a schematic diagram of a method of training a word weight model, shown in accordance with some embodiments of the present description.
As shown in FIG. 2, a training word weight model method 200 may be implemented at the processing device 112. The processing device 112 may label words in the text by matching between the text pairs based on the obtained plurality of text pairs to obtain training data for training the word weight model.
Step 210, a plurality of text pairs are obtained. In particular, step 210 may be performed by the acquisition module.
Each text pair may consist of at least two texts. In some embodiments, the text may be a string consisting of ordered sequences of characters. For example, text may include Chinese characters, letters, symbols, numbers, and other words. In some embodiments, the text pairs may be used to train a natural language understanding model. For example, text pairs may be used for natural language understanding model training in smart question-answering scenarios, retrieval scenarios, and the like.
In some embodiments, the processing device may retrieve a plurality of text pairs from a database. For example, the database may include a background database of a cell phone application, a network development database, a platform database, and the like. In some embodiments, the processing device may retrieve a plurality of text pairs from the smart customer service database. For example: the processing device may select any two texts from the question bank of the question and answer robot as a text pair, may select any two texts from the answer bank of the question and answer robot as a text pair, and may select one text from the user history question database and the corresponding question bank of the question and answer robot as a text pair. In some embodiments, the processing device may obtain the text pairs in any other feasible manner, which is not limited by this specification.
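Purely as an illustration (code is not part of the patent text), assembling text pairs from an intelligent customer service database might look like the following minimal Python sketch; the variable names and the pair-sampling strategy are assumptions, not requirements of the embodiments.

    import itertools
    import random

    def build_text_pairs(question_bank, user_history, max_pairs=10000):
        """Assemble candidate text pairs for later match judgment.

        question_bank and user_history are hypothetical lists of strings:
        any two texts from the question bank may form a pair, and a
        historical user question may be paired with a preset question.
        """
        pairs = list(itertools.combinations(question_bank, 2))
        pairs += [(q, preset) for q in user_history for preset in question_bank]
        random.shuffle(pairs)
        return pairs[:max_pairs]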
And step 220, judging whether the two texts in the text pair match, to obtain a matching result. In particular, step 220 may be performed by the judging module.
The text matching result may reflect semantic relevance between the two texts. For example, the two matched texts may be a user question and a corresponding customer service answer in a question and answer scenario, or a user question and a corresponding preset question in an intelligent question and answer system.
In some embodiments, the processing device may determine whether two texts in a text pair match through a matching neural network. For example, the matching neural network may include, but is not limited to, a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), a BERT (Bidirectional Encoder Representations from Transformers) network, and the like.
In some embodiments, the processing device may determine whether the two texts match based on matching labels of the texts. In some embodiments, the matching labels may include a combination of one or more of numeric labels, highlighting, symbolic labels, and the like. For example, text one and text two in the text pair may carry the same or different labels, indicating a match or mismatch between text one and text two. The matching labels may be obtained in any reasonably conventional manner, for example, the matching labels may be obtained by manually or machine labeling of text, or based on user feedback, which is not limited in this specification. In some embodiments, the processing device may determine whether there is a match between the two texts in other manners, which is not limited in this specification.
In some embodiments, the match result may include a match between two texts, or a mismatch between two texts. In some embodiments, the matching results may be represented by words, numbers, symbols, and the like. For example, a match between two texts may be represented by the word "yes", or the number "1", or the symbol "√" or the like, and a mismatch between two texts may be represented by the word "no", or the number "0", or the symbol "×" or the like, respectively.
And step 230, determining the importance identifications of the words of the texts in the text pairs based on the matching result, to obtain a plurality of text pairs containing the importance identifications. In particular, step 230 may be performed by the marking module. The importance identification may reflect how important a word in the text is to the semantic expression of the text. For example, for the text "when this good medical insurance takes effect", based on the semantics of the text, the key words may be "good medical insurance" and "when ... takes effect", and the text may then be marked with these key words as its importance identifications.
In some embodiments, the importance identifications may include one or more combinations of highlighting, annotations, special symbols, and the like. In some embodiments, the importance identifications may include an important identification and an unimportant identification. For example, important words in the text may be tagged with the important identification to indicate that the word is important, and/or unimportant words in the text may be tagged with the unimportant identification to indicate that the word is unimportant. In some embodiments, the processing device may adopt different importance identification modes based on the difference of the matching results between texts. Specifically:
When the two texts do not match, the important identifications are marked on different words of the two texts.
When the two texts in the text pair do not match, the processing device may mark the difference portion in the two texts as important and the same portion as unimportant based on the matching result. For more details, reference may be made to fig. 3 and its related description, which are not repeated herein.
When two texts are matched, the same words of the two texts are marked with important identifications.
When two texts in a text pair match, the processing device may mark the same part of the two texts as important and the different part as unimportant based on the matching result. For more details, reference may be made to FIG. 3 and the related description thereof, which are not repeated herein.
The processing device may derive a plurality of text pairs containing an importance identification based on the importance identifications of the text pairs.
At step 240, a word weight model is trained based on a plurality of training data. In particular, step 240 may be performed by a training module.
The processing device may generate weight identification values from the importance identifications of the texts, and train the word weight model based on the training data. In some embodiments, the training data is derived from the texts in a plurality of text pairs containing the importance identifications. In some embodiments, the processing device may input the training data into an initial weight model for learning to obtain a trained word weight model. In some embodiments, the word weight model may include an LSTM model, a CNN model, a BERT model, and the like.
In some embodiments, the input to the word weight model may be text and the output may be a word weight prediction value for the text. For more details, reference may be made to other parts of this specification (e.g., fig. 4 and its associated description), which are not repeated herein.
Fig. 3 is a schematic diagram of a text matching determination method according to some embodiments of the present description.
As shown in FIG. 3, the text match determination method 300 may be implemented at the processing device 112 (e.g., the marking module). The processing device may perform one or more of process one, process two, process three, and process four on the texts based on the matching result of the two texts, as described in detail below with reference to FIG. 3.
Step 310, determine whether the two texts of the text pair match.
In some embodiments, the processing device may determine whether the two texts match through a matching neural network. In some embodiments, the processing device may determine whether the two texts match based on matching labels of the texts. Specifically, refer to fig. 2 and the related description thereof, which are not repeated herein.
Step 320, if the two texts do not match, process one 323 and/or process two 325 may be performed.
When the two texts do not match, the processing device may perform process one 323: making a first mark for the different words in the two texts; and/or process two 325: making a second mark for the same words in the two texts. The first mark and the second mark are importance identifications, and the importance of the first mark is higher than that of the second mark. For example, if the matching result of text one "when this good medical insurance takes effect" and text two "when this accident risk takes effect" is a mismatch, the processing device may make a first mark for the differing words "good medical insurance" and "accident risk", and a second mark for the same words "this" and "takes effect" in text one and text two. The first mark indicates that the corresponding word is relatively more important to the semantic expression of the text, and the second mark indicates that the corresponding word is less important to the semantic expression of the related text.
In some embodiments, the first mark and/or the second mark may be numerical values. In some embodiments, the numerical values may include 0, 0.25, 0.75, 1, and the like. For example, the processing device may mark the different words in the first text and the second text as the number 1, and mark the same words as the number 0 or leave them unmarked, where the number 1 indicates higher importance than the number 0. In some embodiments, the mark value is associated with the number of differing words in the two texts. For example, if the first text and the second text contain only one differing word, the processing device may mark that word as 1; if they contain two differing words, the processing device may mark each of the two words as 0.75; and so on. The association between the mark value and the number of differing words in the texts can be set in any reasonable way, which is not limited in this specification.
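The marking rule of these processes, together with the numeric value schedule just described, could be sketched as follows. This is a minimal illustration only; the exact value schedule (1 for one differing word, 0.75 each for two, a hypothetical 0.5 beyond that) is taken from the example above rather than prescribed by the embodiments.

    def mark_importance(tokens_a, tokens_b, matched):
        """Assign mark values to the words of two segmented texts.

        On a mismatch the differing words get the first (high) mark; on a
        match the same words get it. The decaying schedule for several
        differing words follows the example in the text and is an assumption.
        """
        set_a, set_b = set(tokens_a), set(tokens_b)
        same = set_a & set_b
        differing = (set_a | set_b) - same
        # Hypothetical schedule: 1 differing word -> 1.0, 2 -> 0.75, more -> 0.5.
        value = {1: 1.0, 2: 0.75}.get(len(differing), 0.5)
        first_marked = differing if not matched else same
        return {tok: (value if tok in first_marked else 0.0)  # second mark = 0
                for tok in set_a | set_b}

    # Example: the unmatched pair from above; "good-medical-insurance" and
    # "accident-risk" receive the first mark, the shared words the second.
    labels = mark_importance(
        ["when", "this", "good-medical-insurance", "takes-effect"],
        ["when", "this", "accident-risk", "takes-effect"],
        matched=False,
    )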
In a question-and-answer scene or a retrieval scene, the system focuses on the semantic expression of texts. For two unmatched texts, the expressed semantics are different, so the differing parts of the texts are more important. For example, the unmatched text one "when this good medical insurance takes effect" and text two "when this accident risk takes effect" are both questions about effective time, but a customer service response must attend to the subject that takes effect. The subjects of text one and text two are the differing words "good medical insurance" and "accident risk", that is, the differing words have higher importance relative to the texts. Therefore, a first mark can be made for the differing words "good medical insurance" and "accident risk", and a second mark for the same words "this" and "takes effect" in the two texts.
Step 330, if the two texts match, process three 333 and/or process four 335 may be performed.
When the two texts match, the processing device may perform process three 333: making a first mark for the same words in the two texts; and/or process four 335: making a second mark for the different words in the two texts. For example, if the matching result of text three "where to go today noon" and text four "where to drive today noon" is a match, the processing device may make a first mark for the same words "today noon" and "where to go" in text three and text four, and a second mark for the differing word "drive". The first mark indicates that the corresponding word is relatively more important to the semantic expression of the text, and the second mark indicates that the corresponding word is less important to the semantic expression of the related text.
In a question-and-answer scene or a retrieval scene, the system focuses on the semantic expression of texts. For two matched texts, the expressed semantics are the same, so the same parts of the texts are more important. For example, the matched text three "payment failure for credit card" and text four "why payment failure for credit card" have the same semantics: both consult about credit card payment failure. A customer service response focuses on the same words "payment failure for credit card" in text three and text four, that is, the same words have higher importance relative to the texts. Therefore, a first mark can be made for the same words "payment failure for credit card" in the two texts, and a second mark for the differing word "why".
In some embodiments, the processing device may perform one or more of process one, process two, process three, and process four on the text based on a result of a match between different texts in the plurality of text pairs. For example, the processing device may make the first mark and/or the second mark for the first text and the second text based on the matching result of the first text and the second text, and may also make the first mark and/or the second mark for the first text (or the second text) and the third text (or the fourth text) based on the matching result of the first text (or the second text) and the third text (or the fourth text). Namely, the first text and the second text can form a text pair to obtain an importance identifier, and the first text and the third text can form a text pair to obtain another importance identifier.
In some alternative embodiments, the importance of the text may be identified in other possible ways, which are not limited by the present description.
It should be noted that the above descriptions of the methods 200 and 300 are for illustration and explanation only and do not limit the application scope. Various modifications and alterations to methods 200 and 300 will be apparent to those skilled in the art in light of the present disclosure. However, such modifications and variations are intended to be within the scope of the present application.
FIG. 4 is a schematic diagram of a training word weight model, shown in accordance with some embodiments of the present description.
In step 410, a vector representation is performed on words in the input text.
In some embodiments, the input text may be a question or query statement entered by a user through a user terminal or the like. In some embodiments, the processing device may generate corresponding text vectors by performing vector representation on the words in the input text through a vectorization model. In some embodiments, the input to the vectorization model may be the input text, and the output may be vectors corresponding to the input text. In some embodiments, the input of the vectorization model may be the input text after word segmentation, and the output may be vector representations corresponding to the segmented words in the input text.
In some embodiments, the output vector of the vectorization model may contain information of the words in the input text and context information of the words in the corresponding text.
In some embodiments, the vectorization model may include, but is not limited to, an LSTM (Long Short-Term Memory) model, a BiLSTM (Bi-directional Long Short-Term Memory) model, a GRU (Gated Recurrent Unit) model, and the like.
Step 420, generating a weight prediction value based on the vector.
In some embodiments, the processing device may generate a text weight predictor for the input text by the weight sub-model based on the output vector of the vectorization model. The weight prediction value may reflect the importance of the corresponding word in the text. In some embodiments, the weight prediction value may be a number, a percentage, or the like. For example, the weight prediction value may be a number "1", "2", "3", etc., where a larger numerical value corresponds to a larger weight prediction value for a word, indicating that the word is more important in the text.
In some embodiments, the input to the weight submodel may be the word vectors output by the vectorization model, and the output may be a weight prediction value for each word.
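As one hedged illustration of this architecture, a vectorization model (here a BiLSTM, one of the options listed above) feeding a per-word weight submodel could be sketched in PyTorch as follows; the layer sizes and the sigmoid output range are assumptions, not prescribed by the embodiments.

    import torch
    import torch.nn as nn

    class WordWeightModel(nn.Module):
        """Vectorization model (BiLSTM) plus weight submodel (linear head)."""

        def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            # Bidirectional LSTM, so each word vector carries the word's
            # context in the text as well as the word itself.
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                                   bidirectional=True)
            # Weight submodel: one prediction value per word, in (0, 1).
            self.weight_head = nn.Linear(2 * hidden_dim, 1)

        def forward(self, token_ids):          # (batch, seq_len) int tensor
            vectors, _ = self.encoder(self.embedding(token_ids))
            return torch.sigmoid(self.weight_head(vectors)).squeeze(-1)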
Step 430, adjusting the prediction result of the word weight model based on the loss function.
In some embodiments, the processing device may optimize the word weight model parameters based on a loss function to make the prediction of the word weight model more accurate. Specifically, the processing device may set the loss function as the learning target of the initial model to improve the accuracy of the model output results. For example, the processing device may set loss(y, y') = y'·log(y) + (1 - y')·log(1 - y) as the learning target of the weight submodel, where y may represent the text weight prediction value output by the weight submodel and y' may represent the text weight identification value. For example, the text weight identification value may take a value of 1 or 0: 1 may indicate that a word is of high importance in the text, and 0 may indicate that a word is of low importance in the text. The text weight identification value may be obtained based on the importance identification of the text.
In some embodiments, the processing device may input training data into the initial word weight model, and obtain a trained word weight model using the loss function as a learning objective of the word weight model.
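Under the assumption that maximizing the stated learning target is equivalent to minimizing the standard binary cross-entropy (its negative), a training step continuing the previous sketch might look like this; the batch tensors are hypothetical.

    import torch
    import torch.nn as nn

    model = WordWeightModel(vocab_size=30000)   # from the previous sketch
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCELoss()  # minimizing BCE maximizes the stated target

    def train_step(token_ids, weight_labels):
        """token_ids: (batch, seq_len) ints; weight_labels: 0/1 floats (y').

        Padding positions are ignored here for brevity.
        """
        optimizer.zero_grad()
        y_pred = model(token_ids)            # per-word weight predictions y
        loss = bce(y_pred, weight_labels)
        loss.backward()
        optimizer.step()
        return loss.item()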
It should be noted that the above description of method 400 is for purposes of example and illustration only and is not intended to limit the scope of applicability of the present application. Various modifications and alterations to method 400 will be apparent to those skilled in the art in light of the present application. However, such modifications and variations are intended to be within the scope of the present application.
In some embodiments, the trained word weight model may be applied in scenarios such as intelligent interaction, intelligent retrieval, and the like.
In some embodiments, the processing device may acquire the retrieved text. The retrieved text may be text in the retrieved data. For example, the retrieved text may be text in an answer database of machine customer service in an intelligent interaction. The processing device can determine the weight of the words in the searched text through the trained word weight model, and determine the search keywords of the searched text based on the weight of the words in the searched text.
In some embodiments, the processing device may determine the word with the largest weight value, or the words ranked in the top N by weight value, as the retrieval keywords of the retrieved text. For example, the processing device may determine the weight of each word in a text in the machine customer service answer database, and determine the word with the highest weight value as the keyword of the retrieved text.
The processing device may determine at least one retrieval result based on the retrieval keywords of the retrieved texts. For example, if the user's question is "when will the accident risk take effect" and the keywords of the question are "accident risk" and "take effect", the processing device may obtain, from the intelligent customer service database, the retrieved texts whose retrieval keywords are similar to "accident risk" and/or "take effect", determine those retrieved texts as the retrieval result, and output the retrieval result to the user as the response of the intelligent customer service.
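Continuing the same sketches, the top-N keyword step could be illustrated as below; the top_n parameter and the tokenization are hypothetical.

    import torch

    def extract_keywords(tokens, token_ids, top_n=2):
        """Pick the top-N words by predicted weight as retrieval keywords."""
        with torch.no_grad():
            weights = model(token_ids.unsqueeze(0)).squeeze(0)  # (seq_len,)
        ranked = sorted(zip(tokens, weights.tolist()),
                        key=lambda pair: pair[1], reverse=True)
        return [tok for tok, _ in ranked[:top_n]]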
In some embodiments, the processing device may obtain the first text and the second text. The first text and the second text may be a search text and a candidate text that may be related to the search text in a search scenario or a smart interaction scenario, respectively. For example, in a smart interaction scenario, the user question may be considered a first text and the second text may be an answer retrieved from a machine customer service answer database that may be relevant to the user question.
The processing device may determine the weights of the words in the first text and the weights of the words in the second text based on the trained word weight model. Specifically, the first text and the second text may each be input into the word weight model separately, and the weights of the words in the first text and in the second text are obtained based on the output of the word weight model.
In some embodiments, the processing device may calculate a vector distance of the first text and the second text based on weights of words in the first text and the second text. For example, the processing device may use the weight value of each word as a coefficient for the word in the calculation process to calculate the vector distance of the two texts. The vector distance may reflect a similarity of the first text and the second text. Wherein the distance is inversely related to the similarity, i.e. the greater the distance, the smaller the similarity. In some embodiments, the vector distance may include, but is not limited to, a cosine distance, an Euclidean distance, a Manhattan distance, a Mahalanobis distance, or a Minkowski distance, among others.
In some embodiments, the processing device may determine whether the first text and the second text match based on the vector distance. In some embodiments, the processing device may determine whether the first text and the second text match by setting a distance threshold. For example, a first threshold may be set, and when the vector distance between the first text and the second text is greater than the first threshold, which indicates that the similarity between the two texts may be low, it is determined that the first text and the second text do not match; conversely, if the vector distance is less than the first threshold, a match between the first text and the second text is determined.
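A hedged sketch of this weighted matching, again continuing the model above: the weighted-average pooling of contextual word vectors and the cosine distance are one assumed instantiation among the options the text lists, and the threshold value is hypothetical.

    import torch
    import torch.nn.functional as F

    def weighted_text_vector(token_ids):       # (1, seq_len) int tensor
        """Pool contextual word vectors, weighting each by its predicted weight."""
        with torch.no_grad():
            vectors, _ = model.encoder(model.embedding(token_ids))  # (1, T, 2H)
            weights = model(token_ids)                              # (1, T)
        return (vectors * weights.unsqueeze(-1)).sum(dim=1) / weights.sum()

    def texts_match(ids_a, ids_b, threshold=0.3):
        """A cosine distance below the (hypothetical) first threshold is a match."""
        distance = 1.0 - F.cosine_similarity(weighted_text_vector(ids_a),
                                             weighted_text_vector(ids_b)).item()
        return distance < threshold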
In some embodiments, the processing device may determine the search result based on a result of the matching of the first text and the second text. For example, if the first text is a question of the user and the second text is a candidate answer of the machine service related to the first text retrieved from the database, the second text matching the first text can be used as an answer of the smart service to answer the user.
It is to be understood that the above description is intended to be illustrative, and not restrictive. In some alternative embodiments, the trained word weight model may be used in any other reasonable scenario, and the search result or the matching result may be obtained in any other feasible manner.
In some embodiments, a training word weight model system (e.g., training word weight model system 100) may include an acquisition module, a judging module, a marking module, a training module, a determining module, and a matching module, among others.
The acquisition module may be configured to obtain a plurality of text pairs. The judging module may be configured to judge whether the two texts in a text pair match, to obtain a matching result.
The marking module may be configured to determine the importance identifications of the words of the texts in the text pairs based on the matching result, to obtain a plurality of text pairs containing the importance identifications. The marking module may be further configured to judge whether the two texts of a text pair match and perform one or more of the following processes. Process one: if the two texts do not match, making a first mark for the different words in the two texts. Process two: if the two texts do not match, making a second mark for the same words in the two texts. Process three: if the two texts match, making a first mark for the same words in the two texts. Process four: if the two texts match, making a second mark for the different words in the two texts.
The training module may be configured to train a word weight model based on a plurality of training data derived from text in a plurality of text pairs containing an identification of importance.
The determining module may be configured to acquire the retrieved text and determine the weights of the words in the retrieved text based on the word weight model; determine a retrieval keyword of the retrieved text based on the weights of the words in the retrieved text; and determine at least one retrieval result based on the retrieval keyword.
The matching module may be configured to acquire a first text and a second text; determine the weights of the words in the first text and the weights of the words in the second text based on the word weight model; calculate the vector distance of the first text and the second text based on the weights of the words in the two texts; and determine whether the first text and the second text match based on the vector distance.
For more descriptions of the acquisition module, the judging module, the marking module, the training module, the determining module, and the matching module, reference may be made to other parts of this specification (for example, FIG. 2, FIG. 3, and their related descriptions), which are not repeated here. It should be noted that the above description of the training word weight model system and its modules is for convenience of description only and should not limit the present disclosure to the scope of the illustrated embodiments.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) the weights of words in text are learned from the text matching results, which can reduce labor cost and improve the efficiency of weight determination; (2) different word importance identification modes are adopted for matched and unmatched texts, which can improve the accuracy of determining the weights of words in text. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or a combination of the above advantages, or any other advantages, may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing quantities of components, attributes, and the like are used in some embodiments. It should be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters used to define the broad scope in some embodiments of this specification are approximations, in specific examples such numerical values are set as precisely as practicable.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification, the entire contents are hereby incorporated by reference into this specification, excluding any application history document that is inconsistent with or conflicts with the contents of this specification, and any document (currently or later appended to this specification) that limits the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of a term in the accompanying materials of this specification is inconsistent with or contrary to what is stated in this specification, the description, definition, and/or use of the term in this specification shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (12)

1. A method of training a word weight model, comprising:
acquiring a plurality of text pairs;
judging whether two texts in the text pair are matched or not to obtain a matching result;
determining importance identifications of words of the texts in the text pairs based on the matching result to obtain a plurality of text pairs containing the importance identifications, wherein if the matching result is a mismatch, the marks of the different words in the two texts have higher importance than the marks of the same words; and if the matching result is a match, the marks of the same words in the two texts have higher importance than the marks of the different words;
training a word weight model based on a plurality of training data derived from text in a plurality of text pairs containing the importance identifications.
2. The method of claim 1, wherein the determining importance identifications of words of the texts in the text pair based on the matching result to obtain a plurality of text pairs containing the importance identifications comprises:
judging whether the two texts of the text pair match, and performing one or more of the following processes:
process one: if the two texts do not match, making a first mark for the different words in the two texts;
process two: if the two texts do not match, making a second mark for the same words in the two texts;
process three: if the two texts match, making a first mark for the same words in the two texts;
process four: if the two texts match, making a second mark for the different words in the two texts;
wherein the first mark and the second mark are the importance identifications, and the importance of the first mark is higher than that of the second mark.
3. The method of claim 1, wherein:
the word weight model comprises a vectorization model and a weight submodel;
the vectorization model carries out vector representation on words in the input text, and the generated vector comprises information of the words and context information of the words in the text;
the weight submodel generates a weight prediction value based on the vector.
4. The method of claim 1, further comprising:
acquiring a text to be retrieved;
determining weights for words in the retrieved text based on the word weight model;
determining a retrieval keyword of the retrieved text based on the weight of the words in the retrieved text;
determining at least one retrieval result based on the retrieval keyword.
5. The method of claim 1, further comprising:
acquiring a first text and a second text;
determining weights for words in the first text based on the word weight model, and weights for words in the second text based on the word weight model;
calculating a vector distance of the first text and the second text based on weights of words in the first text and the second text;
determining whether the first text and the second text match based on the vector distance.
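A NumPy sketch of the weighted-distance comparison: each text is condensed to a weighted average of its per-word vectors, and the two summaries are compared by cosine distance. Both the averaging scheme and the 0.2 threshold are assumptions, not taken from the specification.

    import numpy as np

    def weighted_text_vector(word_vectors, word_weights):
        # word_vectors: one vector per word; word_weights: model outputs.
        w = np.asarray(word_weights, dtype=float)[:, None]
        return (np.asarray(word_vectors, dtype=float) * w).sum(axis=0) / w.sum()

    def cosine_distance(u, v):
        return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # matched = cosine_distance(weighted_text_vector(vecs_a, wts_a),
    #                           weighted_text_vector(vecs_b, wts_b)) < 0.2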
6. A system for training a word weight model, comprising:
the acquisition module is used for acquiring a plurality of text pairs;
the judging module is used for judging whether the two texts in the text pair match to obtain a matching result;
the marking module is used for determining importance identifications for the words of the texts in the text pairs based on the matching result to obtain a plurality of text pairs containing the importance identifications, wherein if the matching result is unmatched, the marks of the words that differ between the two texts indicate higher importance than the marks of the words shared by the two texts; and if the matching result is matched, the marks of the words shared by the two texts indicate higher importance than the marks of the words that differ;
and the training module is used for training the word weight model based on a plurality of training data, and the training data is derived from texts in a plurality of text pairs containing the importance identifications.
7. The system of claim 6, wherein the marking module is further configured to:
judge whether the two texts of the text pair are matched, and perform one or more of the following processes:
process one: if the two texts do not match, make a first mark for the words that differ between the two texts;
process two: if the two texts do not match, make a second mark for the words shared by the two texts;
process three: if the two texts match, make a first mark for the words shared by the two texts;
process four: if the two texts match, make a second mark for the words that differ between the two texts;
wherein the first mark and the second mark are the importance identifications, and the first mark indicates higher importance than the second mark.
8. The system of claim 6, wherein the word weight model comprises a vectorization model and a weight submodel;
the vectorization model produces a vector representation for each word of an input text, the generated vector comprising information of the word itself and context information of the word within the text; and
the weight submodel generates a weight prediction value based on the vector.
9. The system of claim 6, further comprising a determination module configured to:
acquire a text to be retrieved;
determine weights of the words in the text to be retrieved based on the word weight model;
determine retrieval keywords of the text to be retrieved based on the weights of the words in the text to be retrieved;
determine at least one retrieval result based on the retrieval keywords.
10. The system of claim 6, further comprising a matching module configured to:
acquire a first text and a second text;
determine weights of the words in the first text and weights of the words in the second text based on the word weight model;
calculate a vector distance between the first text and the second text based on the weights of the words in the first text and the second text;
determine whether the first text and the second text match based on the vector distance.
11. An apparatus for training a word weight model, comprising a processor configured to perform the method of any one of claims 1 to 5.
12. A computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the method of any one of claims 1 to 5.
CN202010409812.7A 2020-05-15 2020-05-15 Method and system for training word weight model Active CN111324722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010409812.7A CN111324722B (en) 2020-05-15 2020-05-15 Method and system for training word weight model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010409812.7A CN111324722B (en) 2020-05-15 2020-05-15 Method and system for training word weight model

Publications (2)

Publication Number Publication Date
CN111324722A (en) 2020-06-23
CN111324722B (en) 2020-08-14

Family

ID=71168218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010409812.7A Active CN111324722B (en) 2020-05-15 2020-05-15 Method and system for training word weight model

Country Status (1)

Country Link
CN (1) CN111324722B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609248A (en) * 2021-08-20 2021-11-05 北京金山数字娱乐科技有限公司 Word weight generation model training method and device and word weight generation method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100583101C (en) * 2008-06-12 2010-01-20 昆明理工大学 Text categorization feature selection and weight computation method based on field knowledge
CN105975459B (en) * 2016-05-24 2018-09-21 北京奇艺世纪科技有限公司 A kind of the weight mask method and device of lexical item
CN108304424B (en) * 2017-03-30 2021-09-07 腾讯科技(深圳)有限公司 Text keyword extraction method and text keyword extraction device
CN108334533B (en) * 2017-10-20 2021-12-24 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and electronic device
CN108509638B (en) * 2018-04-11 2023-06-27 联想(北京)有限公司 Question extraction method and electronic equipment

Also Published As

Publication number Publication date
CN111324722A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN107491547A (en) Searching method and device based on artificial intelligence
CN109815487A (en) Text quality detecting method, electronic device, computer equipment and storage medium
CN109460457A (en) Text sentence similarity calculating method, intelligent government affairs auxiliary answer system and its working method
CN110704586A (en) Information processing method and system
CN111353033B (en) Method and system for training text similarity model
CN111309887B (en) Method and system for training text key content extraction model
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN111046147A (en) Question answering method and device and terminal equipment
CN111582500A (en) Method and system for improving model training effect
US11461613B2 (en) Method and apparatus for multi-document question answering
CN111767375A (en) Semantic recall method and device, computer equipment and storage medium
CN113377936A (en) Intelligent question and answer method, device and equipment
CN110955766A (en) Method and system for automatically expanding intelligent customer service standard problem pairs
CN117290492A (en) Knowledge base question-answering method and device, electronic equipment and storage medium
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN113821527A (en) Hash code generation method and device, computer equipment and storage medium
CN113821622B (en) Answer retrieval method and device based on artificial intelligence, electronic equipment and medium
CN116467417A (en) Method, device, equipment and storage medium for generating answers to questions
CN111324722B (en) Method and system for training word weight model
CN111198949B (en) Text label determination method and system
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
CN111324738B (en) Method and system for determining text label
CN111611796A (en) Hypernym determination method and device for hyponym, electronic device and storage medium
CN112132269B (en) Model processing method, device, equipment and storage medium
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant