WO2020107765A1 - 语句分析处理方法、装置、设备以及计算机可读存储介质 (Sentence analysis processing method, apparatus, device, and computer-readable storage medium) - Google Patents

语句分析处理方法、装置、设备以及计算机可读存储介质 (Sentence analysis processing method, apparatus, device, and computer-readable storage medium)

Info

Publication number
WO2020107765A1
WO2020107765A1 PCT/CN2019/081282 CN2019081282W WO2020107765A1 WO 2020107765 A1 WO2020107765 A1 WO 2020107765A1 CN 2019081282 W CN2019081282 W CN 2019081282W WO 2020107765 A1 WO2020107765 A1 WO 2020107765A1
Authority
WO
WIPO (PCT)
Prior art keywords
word slot
similarity score
value
intent
vector
Prior art date
Application number
PCT/CN2019/081282
Other languages
English (en)
French (fr)
Inventor
汤耀华
莫凯翔
张超
徐倩
杨强
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2020107765A1 publication Critical patent/WO2020107765A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present application relates to the field of transfer learning technology, and in particular to a sentence analysis processing method, apparatus, device, and computer-readable storage medium.
  • The spoken language understanding model in an artificial intelligence dialogue robot plays a key role in helping the robot understand the user's intention.
  • Artificial intelligence dialogue robots are widely used, for example Amazon's Alexa, Microsoft's XiaoIce (Xiaobing) robot, and Apple's Siri.
  • The robot's ability to understand spoken language is therefore particularly important: it must not only understand the user's common demand scenarios, but also continuously extend its understanding to new user demand scenarios.
  • Support for new user demand scenarios generally requires collecting and annotating data, and the current technical solutions are generally rule matching or adding training data; this process is both time-consuming and expensive and requires a professional labeling team. Therefore, after a spoken language understanding model has been learned in a scenario with abundant data, being unable to quickly learn and perform the spoken language understanding task in a new domain with only a few samples or zero samples has become an urgent technical problem.
  • The main purpose of this application is to provide a sentence analysis processing method, apparatus, device, and computer storage medium, aiming to solve the technical problem that, after a model is migrated to a new domain, it cannot quickly learn and perform the spoken language understanding task because only a small number of samples, or zero samples, are available.
  • the present application also provides a sentence analysis processing device, the sentence analysis processing device includes:
  • a migration module used to obtain a pre-trained model on a large sample data set in the source domain, and transfer the pre-trained model to the target domain;
  • The determining module is used to obtain, in the target domain, the sentence features of the preset question in the pre-training model, and to perform semantic analysis on each sentence feature to determine the different intents corresponding to the preset question;
  • a first obtaining module configured to obtain an intent similarity score of each of the intents in the pre-training model, and determine the highest intent similarity score among each of the intent similarity scores;
  • The second obtaining module is used to obtain each word slot in the pre-training model, determine the word slot similarity score of each word slot in the pre-training model, and determine the highest word slot similarity score among the word slot similarity scores;
  • The output module is configured to obtain the final intent corresponding to the highest intent similarity score and the final word slot corresponding to the highest word slot similarity score, and to output the final intent and the final word slot.
  • the present application also provides a mobile terminal
  • The mobile terminal includes: a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the steps of the sentence analysis processing method described above.
  • The present application also provides a computer-readable storage medium; the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the sentence analysis processing method described above.
  • In this embodiment, the simple classification model of the original model is replaced by computing intent similarity scores and word slot similarity scores, which solves the problem of migrating from the source domain to the target domain well: after the model is migrated from the source domain to the target domain, the user does not need to redesign the scheme, the approach is extensible, and no additional training data needs to be added. This saves labor costs and resolves the technical problem that, after the model is moved to a new domain, it cannot quickly learn and perform the spoken language understanding task because only a small number of samples, or zero samples, are available.
  • FIG. 1 is a schematic diagram of a terminal/device structure of a hardware operating environment involved in an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a sentence analysis processing method of the application
  • FIG. 3 is a schematic flowchart of a second embodiment of a sentence analysis processing method of the application.
  • FIG. 4 is a schematic diagram of functional modules of a sentence analysis processing device of the application.
  • FIG. 5 is a model network structure diagram of the sentence analysis processing method of the present application.
  • FIG. 1 is a schematic diagram of a terminal structure of a hardware operating environment involved in a solution of an embodiment of the present application.
  • the terminal in the embodiment of the present application is a sentence analysis processing device.
  • the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as disk storage.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • Optionally, the terminal may also include a camera, RF (Radio Frequency) circuits, sensors, audio circuits, a WiFi module, and so on.
  • sensors such as light sensors, motion sensors and other sensors.
  • The light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light, and the proximity sensor may turn off the display screen and/or backlight when the terminal device is moved to the ear.
  • the terminal device can also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, which will not be repeated here.
  • Those skilled in the art will understand that the terminal structure shown in FIG. 1 does not constitute a limitation on the terminal; the terminal may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
  • the network interface 1004 is mainly used to connect to the back-end server and perform data communication with the back-end server;
  • the user interface 1003 is mainly used to connect to the client (user end) and perform data communication with the client; and the processor 1001 can be used to call computer-readable instructions stored in the memory 1005 and execute the sentence analysis processing method provided in the embodiments of the present application.
  • the present application provides a sentence analysis processing method.
  • the sentence analysis processing method includes the following steps:
  • Step S10 Obtain the pre-trained model on the large sample data set in the source domain, and transfer the pre-trained model to the target domain;
  • The source domain is a mature application scenario with a large amount of labeled data used to train the various models.
  • The target domain is a new application scenario with little or no labeled data.
  • Transfer learning is to share the model parameters that have been trained in the original domain to the model in the new target domain in some way to help the new model training.
  • A preset number of models are trained on the large sample data set in the source domain, and the model that performs best on that data set is selected as the pre-trained model; this pre-trained model is then migrated to the small-sample scenario in the target domain, where some user questions are collected, an intention/slot frame is designed according to the user questions, and staff are organized to label data according to the frame (a hypothetical example of such a frame is sketched below).
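  • As a concrete illustration, the following sketch shows what a labeled example under a hypothetical intent/slot frame for the flight-booking scenario discussed later in the text might look like; all intent and slot names here are illustrative assumptions, not taken from the patent:

```python
# Hypothetical intent/slot frame for the flight-booking scenario discussed later
# in the text; intent and slot names are illustrative, not taken from the patent.
frame = {
    "intents": ["inform", "request"],
    "slots": {
        "departure time": [],                       # possible values collected from user questions
        "starting point": ["Beijing", "Shanghai"],
        "destination": ["Beijing", "Shanghai"],
    },
}

labeled_example = {
    "utterance": "I want to book a ticket from Beijing to Shanghai",
    "intent": "inform",
    "slots": {"starting point": "Beijing", "destination": "Shanghai"},
}
```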
  • the pre-trained model architecture used in different scenarios is the same, but the pre-trained model is adjusted on the labeled small sample data.
  • During this adjustment, all the parameters of the large-sample model are used to initialize the parameters of the small-sample model, and fine-tuning is then performed on the labeled small-sample data of the new scene, as in the sketch below.
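  • A minimal sketch of this initialize-then-fine-tune step, written in PyTorch with a toy stand-in model and assumed layer sizes; the data loader and loss function are only indicated in comments, since the patent does not specify them:

```python
import torch
import torch.nn as nn

# Toy stand-in for the spoken language understanding network; a fuller sketch
# of the layer layout (embeddings + shared BiLSTM + task layers) appears later.
class SLUModel(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

source_model = SLUModel()                    # assumed already trained on the source domain
target_model = SLUModel()                    # same architecture reused in the new scene
target_model.load_state_dict(source_model.state_dict())   # initialize ALL parameters from source

optimizer = torch.optim.Adam(target_model.parameters(), lr=1e-4)
# Fine-tuning loop on the small labeled target-domain data (names hypothetical):
# for batch in small_sample_loader:
#     loss = intent_and_slot_loss(target_model, batch)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```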
  • Once the small-sample model has been successfully obtained by training the pre-trained model in the small-sample scenario of the target domain, it is put into interaction with actual users; questions are continuously collected during use, the training set is expanded, and the expanded data set is then used to further improve this small-sample model.
  • Step S20 In the target domain, obtain the sentence features of the preset question in the pre-training model, and perform semantic analysis on each sentence feature to determine the different intents corresponding to the preset question;
  • An intent identifies what the user specifically wants to do with an utterance.
  • Specifically, an intent acts as a classifier that assigns the user's need to a certain type. For example, the sentence "I want to book a ticket from Beijing to Shanghai" expresses a user's need and can be defined as an "inform" intent; the sentence "What times are the tickets available?" indicates that the user is asking for ticket information and can be defined as a "request" intent.
  • In the target-domain small-sample scenario, after the preset question is obtained from the pre-training model, the sentence words (or Chinese phrases) that make up the preset question are also obtained; the Embeddings layer (embedding layer) of the pre-training model replaces the input sentence words with the corresponding word embeddings, the bidirectional LSTM network of the common representation layer (common feature extraction layer) extracts the sentence features, and semantic analysis is performed on these features to determine the different intents. It should be noted that, in practical applications, each intent is expressed by several words, such as "confirm purchase".
  • LSTM (Long Short-Term Memory) is a long short-term memory network, a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
  • Step S30 Obtain the intention similarity score of each of the intentions in the pre-training model, and determine the highest intention similarity score among each of the intention similarity scores;
  • In the Intent task layer of the pre-training model, a bidirectional LSTM layer further abstracts the features obtained by the common representation layer, and the last state of each direction of the bidirectional LSTM is then spliced together and denoted h_intent.
  • In the pre-training model, the expression words of each intent name are converted by a semantic network into a fixed-length semantic vector similar to an embedding; a bilinear operation on that semantic vector and h_intent gives the intent similarity score of the intent. Since every intent obtains its intent similarity score in the same way, the intent similarity scores can be compared to find the highest intent similarity score.
  • For example, suppose an intent name sn_i = (w_1, w_2, ..., w_n). The semantic network first replaces each word with the corresponding word embedding E(w_i), then applies a one-layer DNN (Deep Neural Network) to E(w_i) as a nonlinear mapping to obtain the semantic vector of that word, and finally averages the semantic vectors of all n words to obtain the semantic vector of the intent name. A bilinear operation takes two input vectors v_1 and v_2 and computes score = v_1^T W v_2, giving a similarity score between the two vectors.
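  • The following toy numpy sketch illustrates the semantic network (word embedding, one-layer DNN mapping, averaging) and the bilinear scoring described above; all dimensions, parameters, and intent names are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, sem_dim, state_dim = 8, 6, 10            # toy dimensions (assumptions)

# E(w_i): toy word-embedding lookup table
word_embedding = {w: rng.normal(size=emb_dim)
                  for w in ["inform", "request", "confirm", "purchase"]}
W_dnn = rng.normal(size=(sem_dim, emb_dim))       # one-layer DNN of the semantic network
W = rng.normal(size=(sem_dim, state_dim))         # bilinear weight matrix

def semantic_vector(name_words):
    """Semantic vector of an intent/slot name: DNN(E(w_i)) averaged over its words."""
    return np.mean([np.tanh(W_dnn @ word_embedding[w]) for w in name_words], axis=0)

def bilinear_score(r, h):
    """Bilinear similarity score = r^T W h between a semantic vector and a state vector."""
    return float(r @ W @ h)

h_intent = rng.normal(size=state_dim)             # spliced last states of the intent BiLSTM
intent_names = {"inform": ["inform"], "confirm_purchase": ["confirm", "purchase"]}
intent_scores = {name: bilinear_score(semantic_vector(words), h_intent)
                 for name, words in intent_names.items()}
final_intent = max(intent_scores, key=intent_scores.get)   # highest intent similarity score
```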
  • Step S40 Obtain each word slot in the pre-training model, determine the word slot similarity score of each word slot in the pre-training model, and determine the highest word slot similarity score among the word slot similarity scores;
  • A word slot defines a piece of key information in the user's expression. For example, in the expression for booking a flight ticket, the slots are "departure time, starting point, destination"; these three pieces of key information need to be identified.
  • When each word slot in the pre-training model is obtained and its word slot similarity score is determined, the state of the current position must first be determined in the Slot task (word slot task) layer of the pre-training model: at each input position, the states of the bidirectional LSTM of the common representation layer and the bidirectional LSTM of the Intent task layer are spliced together as the state of the current position, and the state at time t is denoted h_t^slot. As with intent names, the semantic network is also used to convert the expression words of each slot name into a semantic vector r_i^slotname.
  • the i-th word slot may have multiple values, and each value can also be converted into a semantic vector through the semantic network.
  • The semantic vector of the j-th value is denoted r_{i,j}^slotvalue. It should be noted that, after the scores of all values are normalized, a weighted average of the corresponding semantic vectors is taken to obtain the semantic vector r_i^slotvalue of the overall word slot value; a bilinear operation on r_i^slotvalue and h_t^slot then gives the similarity score of the word slot's values. The similarity score of the word slot name and the similarity score of the word slot value are added to give the total similarity score between the word slot and the state h_t^slot of the current position, that is, the word slot similarity score, and the highest word slot similarity score is determined among the word slot similarity scores; a toy sketch of this scoring follows.
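  • Continuing the toy numpy sketch above (reusing its definitions), the word slot similarity at one position can be computed as follows; the softmax used to normalize the value scores is an assumption, since the text only says the scores are normalized:

```python
def slot_score(name_vec, value_vecs, h_t_slot):
    """Total slot similarity at one position: slot-name score + weighted value score."""
    name_score = bilinear_score(name_vec, h_t_slot)                     # r_i^slotname vs h_t^slot
    raw = np.array([bilinear_score(v, h_t_slot) for v in value_vecs])   # one score per value
    weights = np.exp(raw) / np.exp(raw).sum()                           # normalized scores (softmax assumed)
    overall_value_vec = (weights[:, None] * np.array(value_vecs)).sum(axis=0)  # r_i^slotvalue
    value_score = bilinear_score(overall_value_vec, h_t_slot)
    return name_score + value_score

h_t_slot = rng.normal(size=state_dim)           # spliced common/intent BiLSTM states at position t
name_vec = semantic_vector(["inform"])          # stand-in for a slot-name semantic vector
value_vecs = [rng.normal(size=sem_dim) for _ in range(3)]   # stand-ins for value vectors ("cake", ...)
total_slot_score = slot_score(name_vec, value_vecs, h_t_slot)
```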
  • Step S50 Acquire the final intention corresponding to the highest intent similarity score and the final word slot corresponding to the highest word slot similarity score, and output the final intention and the final word slot.
  • In the pre-training model, the intent corresponding to the highest intent similarity score is taken as the final intent, the word slot corresponding to the highest word slot similarity score is taken as the final word slot, and the final intent and the final word slot are then output.
  • the model is divided into an Embeddings layer (embedding layer), Common Representation layer (common feature extraction layer), Intent Task (intention task) layer and Slot task (word slot task) layer.
  • The Embeddings layer replaces the input sentence words with the corresponding word embeddings, such as W_0, W_t, W_{T+1}, and so on.
  • the Common Representation layer, Intent Task layer and Slot task layer all use a bidirectional LSTM network architecture.
  • In the Intent Task layer, the bidirectional LSTM layer further abstracts the features obtained by the common representation layer, and the last state of each direction of the bidirectional LSTM is then spliced together and recorded as h_intent; Semantic Similarity (similarity comparison) is performed between h_intent and each intent, such as Intent1 (intent 1), Intent2 (intent 2), and Intent3 (intent 3), the maximum similarity value is obtained via Softmax, and the intent with the maximum similarity is output, shown as τ in the figure.
  • To output the final word slot, the state of the current position is likewise determined first, with the state at time t denoted h_t^slot. Slot Value 1, Slot Value 2, up to Slot Value n are compared with h_t^slot for similarity (the Semantic Similarity and Attention blocks in the figure); the similarity scores of all values are normalized and a weighted average with the corresponding value semantic vectors is taken, giving the semantic vector r_i^slotvalue of the overall word slot value, and a bilinear operation on r_i^slotvalue and h_t^slot gives the similarity score of the word slot's values.
  • At the same time, each slot name also needs to be compared with h_t^slot for similarity to obtain the similarity score of the word slot name.
  • The similarity score of the word slot name and the similarity score of the word slot value are added to obtain the total similarity score between the word slot and the state h_t^slot of the current position. The highest word slot similarity score is then determined among the word slot similarity scores and output, shown as S_t in the figure.
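  • A hedged PyTorch sketch of this FIG. 5 layout is given below; layer sizes, the tanh nonlinearity in the semantic network, and the way intent names are passed in are assumptions, since the text fixes none of these, and the per-position slot scoring over h_t^slot would follow the numpy sketch above:

```python
import torch
import torch.nn as nn

class TransferableSLU(nn.Module):
    """Illustrative sketch of the FIG. 5 layout; all sizes and nonlinearities are assumptions."""
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=64, sem_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)                                           # Embeddings layer
        self.common = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)           # common representation
        self.intent_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)   # Intent Task layer
        self.slot_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)     # Slot task layer
        self.sem_dnn = nn.Sequential(nn.Linear(emb_dim, sem_dim), nn.Tanh())                   # semantic network (one-layer DNN)
        self.W_intent = nn.Parameter(torch.randn(sem_dim, 2 * hidden))                         # bilinear weights for intents

    def name_vector(self, name_token_ids):
        # semantic vector of an intent name: DNN over word embeddings, then average
        return self.sem_dnn(self.emb(name_token_ids)).mean(dim=0)

    def forward(self, token_ids, intent_name_ids):
        x = self.emb(token_ids)                         # (B, T, emb_dim)
        shared, _ = self.common(x)                      # (B, T, 2*hidden)
        intent_out, _ = self.intent_lstm(shared)
        half = intent_out.size(-1) // 2
        # splice the last state of each direction -> h_intent
        h_intent = torch.cat([intent_out[:, -1, :half], intent_out[:, 0, half:]], dim=-1)
        intent_vecs = torch.stack([self.name_vector(ids) for ids in intent_name_ids])
        intent_scores = intent_vecs @ self.W_intent @ h_intent.transpose(0, 1)  # semantic similarity
        slot_out, _ = self.slot_lstm(shared)
        h_slot = torch.cat([shared, slot_out], dim=-1)  # per-position h_t^slot; scored as in the numpy sketch
        return intent_scores.softmax(dim=0), h_slot     # softmax over intents; the arg-max intent is the τ output

# usage with toy inputs:
# model = TransferableSLU()
# probs, h_slot = model(torch.randint(0, 1000, (2, 7)), [torch.tensor([1, 2]), torch.tensor([3])])
```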
  • In this embodiment, the simple classification model of the original model is replaced by computing intent similarity scores and word slot similarity scores, which solves the problem of migrating from the source domain to the target domain well: after the model is migrated from the source domain to the target domain, the user does not need to redesign the scheme, the approach is extensible, and no additional training data needs to be added. This saves labor costs and resolves the technical problem that, after the model is moved to a new domain, it cannot quickly learn and perform the spoken language understanding task because only a small number of samples, or zero samples, are available.
  • Further, on the basis of the first embodiment of the present application, a second embodiment of the sentence analysis processing method is proposed. Referring to FIG. 3, this embodiment is a refinement of step S30 of the first embodiment, namely the step of obtaining the intent similarity score of each intent in the pre-training model and determining the highest intent similarity score among the intent similarity scores, and includes:
  • Step S31 Obtain the first state vector in the pre-training model
  • Step S32 Obtain an intent name semantic vector corresponding to each of the intents, and calculate an intent similarity score between each of the intent name semantic vectors and the first state vector;
  • The first state vector may be obtained in the Intent task layer of the model: the bidirectional LSTM layer further abstracts the features obtained by the common representation layer, and the last states of the two directions of this bidirectional LSTM are then spliced together to form the state vector.
  • The intent name is the expression of the intent. After the first state vector in the pre-trained model has been obtained, the intent name semantic vector corresponding to each intent also needs to be obtained, and a bilinear operation is then performed on the intent name semantic vector and the first state vector to obtain the intent similarity score. Since each intent has an intent similarity score obtained in basically the same way, all the intent similarity scores can be obtained.
  • Step S33 Compare the intent similarity scores to obtain the highest intent similarity score among the intent similarity scores.
  • When the intent similarity score of every intent has been obtained, the intent similarity scores also need to be compared with one another to determine the one with the highest value, which is taken as the highest intent similarity score; it should be noted that each intent similarity score needs to be compared with all the other intent similarity scores.
  • the step of obtaining the semantic vector of the intent name corresponding to each of the intents includes:
  • Step S321 Obtain each piece of sentence information in the intent, and determine the sentence semantic vector corresponding to each piece of sentence information. To obtain the intent name semantic vector of an intent, all the sentence information in that intent is obtained first and the corresponding sentence semantic vectors are determined: for an intent name sn_i = (w_1, w_2, ..., w_n), the semantic network replaces each word with the corresponding word embedding E(w_i) and then applies a one-layer DNN (Deep Neural Network) to E(w_i) as a nonlinear mapping to obtain the semantic vector of that word.
  • Step S322 Obtain the average vector value of the sentence vectors, and use the average vector value as the intent name semantic vector.
  • In this embodiment, the sentence semantic vectors corresponding to all the sentence information in the intent are determined and their average is taken as the intent name semantic vector, which improves the accuracy of detecting intent similarity.
  • Further, on the basis of either of the foregoing embodiments, a third embodiment of the sentence analysis processing method is proposed. This embodiment is a refinement of step S40 of the first embodiment of the present application, namely the step of obtaining each word slot in the pre-training model and determining the word slot similarity score of each word slot in the pre-training model, and includes:
  • Step S41 Acquire each word slot in the pre-training model
  • Step S42 Obtain the word slot name of the word slot and the value of the overall word slot, and determine the first similarity score of the word slot name and the second similarity score of the overall word slot value;
  • the first similarity score may be a similarity score between the word slot name and the current position state.
  • the second similarity score may be a similarity score between the overall word slot value and the current position state.
  • In practical applications, a word slot is generally expressed by one or more words, such as "food", and each word slot generally has some possible values; for the word slot "food", possible values such as "cake", "apple", and "roast leg of lamb" are easily obtained.
  • A bilinear operation on the word slot name semantic vector and the state vector gives the first similarity score corresponding to the word slot name, and a bilinear operation on the value semantic vector and the state vector gives the second similarity score corresponding to the overall word slot value.
  • For example, when a word slot has three value vectors A1, A2, and A3, each of the three vectors is operated with the current state vector to obtain a score; the three scores are normalized to become C1, C2, and C3, and A1*C1 + A2*C2 + A3*C3 is then the semantic vector of the overall word slot value.
  • The word slot name is the name of the slot, that is, the words that express the slot.
  • The overall word slot value is a single value representation that is related to each individual word slot value.
  • Step S43 Determine the word slot similarity score of the word slot according to the sum of the first similarity score and the second similarity score.
  • After the first similarity score and the second similarity score have been obtained, the first similarity score corresponding to the word slot name and the second similarity score corresponding to the overall word slot value are added, and the sum is taken as the word slot similarity score between the word slot and the current position.
  • In this embodiment, the word slot similarity of the word slot is determined from the first similarity of the word slot name and the second similarity of the overall word slot value, which improves the accuracy of determining the word slot similarity.
  • the steps of determining the first similarity score of the word slot name and the second similarity score of the overall word slot value include:
  • Step S421 Obtain the current position state in the pre-training model, and determine the second state vector of the current position state;
  • At each input position in the Intent task layer of the pre-training model, the states of the bidirectional LSTM of the common representation layer and the bidirectional LSTM of the Intent task layer are spliced together as the state of the current position, that is, the second state vector.
  • Step S422 Obtain a word slot name semantic vector corresponding to the word slot name, and determine a first similarity score between the word slot name semantic vector and the second state vector;
  • The word slot name semantic vector can be obtained by applying a one-layer DNN network in the preset model to the word slot name as a nonlinear operation; a bilinear operation on the word slot name semantic vector and the second state vector then gives the first similarity score.
  • Step S423 Obtain a value semantic vector corresponding to the value of the overall word slot, and determine a second similarity score between the value semantic vector and the second state vector.
  • To obtain the value semantic vector corresponding to the overall word slot value, the semantic vector of each word slot value in the word slot is calculated first, the similarity scores of these semantic vectors are determined and normalized, and a weighted average of the corresponding value semantic vectors is taken to obtain the value semantic vector of the overall word slot value; a bilinear operation on this value semantic vector and the second state vector then gives the second similarity score.
  • In this embodiment, the first similarity of the word slot name and the second similarity of the overall word slot value are determined from the current position state in the pre-training model, which helps verify whether the word slots in the system are what the user needs and improves the user experience.
  • the step of obtaining the value semantic vector corresponding to the value of the overall word slot includes:
  • Step A10 Obtain the value of each sub-word slot in the word slot, and determine the sub-value semantic vector corresponding to the value of each sub-word slot;
  • A sub-word slot value can be any word slot value in the word slot. All sub-word slot values in the word slot are obtained, and a one-layer DNN network in the preset model applies a nonlinear operation to each sub-word slot value to obtain the corresponding sub-value semantic vector.
  • Step A11 calculating a third similarity score between the sub-value vector and the second state vector, and obtaining a vector product between the third similarity score and the sub-value vector;
  • the third similarity score may be a similarity score between any word slot value and the current position state.
  • The third similarity score between the sub-value vector and the second state vector is calculated by a bilinear operation, and the product of the third similarity score and the sub-value vector is then determined.
  • Step A12 Obtain a vector product corresponding to each sub-word slot value, and add the vector products to obtain a value semantic vector corresponding to the overall word slot value.
  • the vector product corresponding to each sub-word slot value is obtained, and then all the vector products are added to obtain the sum value, and finally the sum value is used as the value semantic vector corresponding to the overall word slot value.
  • In this embodiment, the value semantic vector corresponding to the overall word slot value is determined from all the sub-word slot values, which ensures that the value semantic vector is related to every word slot value in the word slot, guarantees the accuracy of the value semantic vector, and improves the user experience.
  • the step of obtaining each word slot in the pre-training model includes:
  • Step S411 Obtain the preset question in the pre-training model
  • Step S412 Perform semantic analysis on the preset question in the target field to determine each word slot in the pre-training model.
  • For example, when semantic analysis of the preset question finds that something related to food is needed, the word slot name can be "food", and the values in the word slot can be "cake", "apple", "roast leg of lamb", and so on.
  • In this embodiment, each word slot in the pre-training model is determined according to the preset question in the target domain, which ensures that each word slot is related to the preset question, prevents unrelated word slots from occupying slot space, saves resources, and improves the user experience.
  • an embodiment of the present application further provides a sentence analysis and processing device.
  • the sentence analysis and processing device includes:
  • a migration module used to obtain a pre-trained model on a large sample data set in the source domain, and transfer the pre-trained model to the target domain;
  • The determining module is used to obtain, in the target domain, the sentence features of the preset question in the pre-training model, and to perform semantic analysis on each sentence feature to determine the different intents corresponding to the preset question;
  • a first obtaining module configured to obtain an intent similarity score of each of the intents in the pre-training model, and determine the highest intent similarity score among each of the intent similarity scores;
  • The second obtaining module is used to obtain each word slot in the pre-training model, determine the word slot similarity score of each word slot in the pre-training model, and determine the highest word slot similarity score among the word slot similarity scores;
  • The output module is configured to obtain the final intent corresponding to the highest intent similarity score and the final word slot corresponding to the highest word slot similarity score, and to output the final intent and the final word slot.
  • Optionally, the first obtaining module is also used to perform the refinements of step S30 described above: obtaining the first state vector, obtaining the intent name semantic vector of each intent from the sentence semantic vectors and their average, computing the intent similarity score between each intent name semantic vector and the first state vector, and comparing the intent similarity scores to obtain the highest intent similarity score.
  • Optionally, the second obtaining module is also used to perform the refinements of step S40 described above: obtaining each word slot from the preset question in the target domain, determining the second state vector of the current position state, determining the first similarity score of the word slot name and the second similarity score of the overall word slot value (via the sub-value semantic vectors and their weighted sum), and determining the word slot similarity score from the sum of the two similarity scores.
  • embodiments of the present application also provide a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • the computer-readable storage medium of the present application stores computer-readable instructions, where the computer-readable instructions are executed by a processor to implement the steps of the sentence analysis processing method as described above.
  • the method implemented when the computer-readable instruction is executed can refer to various embodiments of the sentence analysis processing method of the present application, and details are not described herein again.
  • Through the description of the above embodiments, those skilled in the art can clearly understand that the methods in the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) as described above, and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A sentence analysis processing method, apparatus, and computer-readable storage medium. The method includes: obtaining a pre-trained model on a large sample data set in a source domain, and transferring the pre-trained model to a target domain by transfer learning (S10); in the target domain, obtaining the sentence features of a preset question in the pre-trained model, and performing semantic analysis on each sentence feature to determine the different intents corresponding to the preset question (S20); obtaining the intent similarity score of each intent in the pre-trained model, and determining the highest intent similarity score among the intent similarity scores (S30); obtaining each word slot in the pre-trained model, determining the word slot similarity score of each word slot in the pre-trained model, and determining the highest word slot similarity score among the word slot similarity scores (S40); obtaining the final intent corresponding to the highest intent similarity score and the final word slot corresponding to the highest word slot similarity score, and outputting the final intent and the final word slot (S50). The method enables the model, while being migrated to a new domain, to quickly learn and perform the spoken language understanding task.

Description

语句分析处理方法、装置、设备以及计算机可读存储介质
本申请要求于2018年11月30日提交中国专利局、申请号为201811464437.5、发明名称为“语句分析处理方法、装置、设备以及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及迁移学习技术领域,尤其涉及一种语句分析处理方法、装置、设备以及计算机可读存储介质。
背景技术
人工智能对话机器人中的口语理解模型能够起到帮助机器人理解用户意图的关键性作用。随着人工智能对话机器人被广泛的使用,比如亚马逊的Alexa,微软的小冰机器人以及苹果的siri。机器人的口语理解能力显得尤为重要,不仅需要能够理解用户的常见需求场景,还需要不断的扩展机器人的理解能力到新的用户需求场景。对于新的用户需求场景的支持一般需要收集和标注数据,而目前采用的技术方案一般是规则匹配或者是增加训练数据。这个过程既耗时又耗钱,而且需要专业的标注团队。因此,在某个有大量数据的场景下学习了口语理解模型之后,对于新的场景领域,因为只有少量的样本或者零样本而不能快速学习并执行口语理解任务成为目前亟待解决的技术问题。
发明内容
本申请的主要目的在于提供一种激光打标的填充方法、激光打标装置、设备和计算机存储介质,旨在解决模型迁移到新领域后,因为只有少量的样本或者零样本而不能快速学习并执行口语理解任务的技术问题。
为实现上述目的,本申请提供一种语句分析处理方法,所述语句分析处理方法包括以下步骤:
获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
此外,为实现上述目的,本申请还提供一种语句分析处理装置,所述语句分析处理装置包括:
迁移模块,用于获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
确定模块,用于在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
第一获取模块,用于获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
第二获取模块,用于获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
输出模块,用于获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
此外,为实现上述目的,本申请还提供一种移动终端;
所述移动终端包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,其中:所述计算机可读指令被所述处理器执行时实现如上所述的语句分析处理方法的步骤。
此外,为实现上述目的,本申请还提供一种计算机可读存储介质;所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如上述的语句分析处理方法的步骤。
在本实施例中,通过计算意图的相似性分和词槽的相似性分的方式来替代原理模型中的简单分类模型,可以很好的解决从源领域迁移到目标领域的问题,并且当模型从源领域迁移到目标领域后,不需要用户重新设计规划,具有可扩展性,也不需要重新增加训练数据,从而节约了人工成本,解决了模型迁移到新领域后,因为只有少量的样本或者零样本而不能快速学习并执行口语理解任务的技术问题。
附图说明
图1是本申请实施例方案涉及的硬件运行环境的终端\装置结构示意图;
图2为本申请语句分析处理方法第一实施例的流程示意图;
图3为本申请语句分析处理方法第二实施例的流程示意图;
图4为本申请语句分析处理装置的功能模块示意图;
图5为本申请语句分析处理方法的模型网络结构图。
本申请目的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
如图1所示,图1是本申请实施例方案涉及的硬件运行环境的终端结构示意图。本申请实施例终端为语句分析处理设备。
如图1所示,该终端可以包括:处理器1001,例如CPU,网络接口1004,用户接口1003,存储器1005,通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是稳定的存储器(non-volatile memory),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。
可选地,终端还可以包括摄像头、RF(Radio Frequency,射频)电路,传感器、音频电路、WiFi模块等等。其中,传感器比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示屏的亮度,接近传感器可在终端设备移动到耳边时,关闭显示屏和/或背光。当然,终端设备还可配置陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
本领域技术人员可以理解,图1示出的终端结构并不构成对终端的限定,可以包括比图示更多或更少的部件,或组合某些部件,或不同的部件布置。
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及计算机可读指令。
在图1所示的终端中,网络接口1004主要用于连接后台服务器,与后台服务器进行数据通信;用户接口1003主要用于连接客户端(用户端),与客户端进行数据通信;而处理器1001可以用于调用存储器1005中存储的计算机可读指令,并执行本申请实施例提供的语句分析处理方法。
参照图2,本申请提供一种语句分析处理方法,在语句分析处理方法一实施例中,语句分析处理方法包括以下步骤:
步骤S10,获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
源领域是成熟的应用场景,具有大量的标注数据用来训练各个模型。目标领域是新的应用场景,只存在少量或者根本没有标注数据。迁移学习是把在原领域已训练好的模型参数通过某种方式来分享给新的目标领域的模型来帮助新模型训练。在源领域大样本数据集上进行预设数量的模型训练,并从这些模型中选择一个在该数据集上表现最优异的模型作为预训练模型,然后再将此预训练模型迁移到目标领域小样本场景中,并在目标领域小样本场景下,搜集部分用户问句,再根据用户问句设计意图/词槽框架,组织人员根据框架标注数据。其中,在不同的场景下使用的预训练模型架构是一样的,只是将预训练的模型在标注的小样本数据上做调整。而在调整的过程中是将大样本模型的参数全部拿来初始化小样本模型的参数,然后在新场景小样本标注数据上做训练微调。并当在目标领域小样本场景下,对预训练模型训练成功获取到小样本模型后,会将其交互给实际用户使用,在用户使用过程中会不断搜集问句,并扩大训练集,再用扩大的数据集提升此小样本模型。
步骤S20,在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
意图是指我们识别用户这句表达具体是想做什么,具体来说意图是一个分类器,将用户需求划分为某个类型。例如:“我要定北京到上海的机票”这句话是用户表达他的需求,这个可以被定义为“告知”意图;“机票都有几点的?”这句话表示用户在询问机票信息,这个可以被定义为“请求”意图。在目标领域小样本场景下,当从预训练模型中获取到预设问句后,还需要获取组成预设问句的句子单词,或者是中文词组等。然后在预训练模型中的Embeddings层(嵌入层)中将输入的句子单词替换成相应的word embedding(嵌入单词),再通过预训练模型中的common representation层(公用特征提取层)中的双向LSTM网络架构来提取各个语句特征,再对这些语句特征进行语义分析,从而确定各个不同的意图,需要说明的是,在现实应用中,每个意图都是由几个词语表述的,比如“确认购买”。其中,LSTM(Long Short-Term Memory)是长短期记忆网络,是一种时间递归神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。
步骤S30,获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
在预训练模型中的Intent task(意图任务)层中,使用双向LSTM层将common representation层得到的特征作进一步地抽象,然后再将该双向LSTM每个方向的最后一个状态拼接起来,记为hintent 。我们的预训练模型里面将每个意图名(intent name)的表述词语通过semantic network转换成类似embedding一样固定长度的语义向量,然后拿该语义向量与hintent做双线性运算,以得到该意图的意图相似性分,由于每个意图都是采用相同的方法获取到意图对应的意图相似性分,因此,可以通过将各个意图相似性分进行大小比较,以得到分值最高的最高意图相似性分。并为辅助理解本申请的semantic network的架构和双线性运算,下面进行举例说明。
例如,假设有意图名sni=(w1,w2...wn), Semantic network先将每个单词替换成相应的word embedding:E(wi)。然后使用一层DNN(Deep Neural Network, 深度神经网络)网络将E(wi)做非线性映射得到该单词的语义向量,最后把所有n个词的语义向量做平均得到该意图名的语义向量。双线性运算将两个输入向量V1和V2做如下矩阵运算:score=vT 1Wv2,得到两个向量的相似性打分。
步骤S40,获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
词槽是针对用户表达中关键信息的定义,例如在订机票的表达中,我们的槽位有“起飞时间、起始地、目的地”,这三个关键信息需要被识别出来。在获取预训练模型中的各词槽,并确定各词槽对应的词槽相似性分时,需要先在预训练模型中的Slot task(词槽任务)层确定当前位置的状态,具体来说就是在每个输入位置上将common representation层的双向LSTM和Intent task层的双向LSTM的状态拼接起来作为当前位置的状态, 记t时刻的状态为ht slot。同意图名一样,我们将每个词槽名(slot name)的表述词语也使用semantic network转换成语义向量ri slotname。同时第i个词槽可能有多个取值,每个取值同样可以通过semantic network转换成语义向量,记第j个取值的语义向量为ri,jslotvalue。需要说明的是,所有取值的打分做归一化处理之后同对应取值的语义向量做加权平均,得到整个词槽取值的语义向量ri slotvalue。再用ri slotvalue与ht slot做二次线性运算,得到该词槽的取值的相似性打分。词槽名的相似性打分和词槽取值的相似性打分相加得到该词槽和当前位置的状态ht slot总相似性打分,即词槽相似性分。然后在各个词槽相似性分中确定最高词槽相似性分。
步骤S50,获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最终意图和所述最终词槽。
在预训练模型中,将最高意图相似性分对应的意图作为最终意图,将最高词槽相似性分对应的词槽作为最终词槽,然后再输出此最终词槽和最终意图。并为辅助理解本申请的预训练模型结构流程,下面进行举例说明。
例如,如图5所示,该模型分为Embeddings层(嵌入层),Common Representation层(公用特征提取层),Intent Task(意图任务)层和Slot task(词槽任务)层。其中,Embeddings层将输入的句子单词替换成相应的word embedding,如W0,Wt,WT+1等。而Common Representation层,Intent Task层和Slot task层均是采用双向LSTM网络架构。在Intent Task层中使用双向LSTM层将common representation层得到的特征作进一步地抽象,然后再将该双向LSTM每个方向的最后一个状态拼接起来,记为hintent ,再将hintent和各个意图如Intent1(意图1)、Intent2(意图2)、Intent3(意图3)进行Semantic Similarity(相似性比较),获取相似性最大的值,即Softmax,然后再将相似性最大的意图进行输出即图中的τ。而输出最终词槽也是先确定当前位置的状态,记t时刻的状态为ht slot,通过Slot Value 1(槽值1)、Slot Value 2(槽值2),一直到Slot Value n(槽值n)和ht slot进行相似性比较,即图中的Semantic Similarity(相似性比较),Attention(注意力),需要对所有取值的相似性打分做归一化处理之后同对应取值的语义向量做加权平均,得到整个词槽取值的语义向量ri slotvalue。再用ri slotvalue与ht slot做二次线性运算,得到该词槽的取值的相似性打分。与此同时也需要将各个slot name(词槽名)和ht slot进行相似性比较,以获取词槽名的相似性打分。词槽名的相似性打分和词槽取值的相似性打分相加得到该词槽和当前位置的状态ht slot总相似性打分。然后在各个词槽相似性分中确定最高词槽相似性分并进行输出即到图中的St。
在本实施例中,通过计算意图的相似性分和词槽的相似性分的方式来替代原理模型中的简单分类模型,可以很好的解决从源领域迁移到目标领域的问题,并且当模型从源领域迁移到目标领域后,不需要用户重新设计规划,具有可扩展性,也不需要重新增加训练数据,从而节约了人工成本,解决了模型迁移到新领域后,因为只有少量的样本或者零样本而不能快速学习并执行口语理解任务的技术问题。
进一步地,在本申请第一实施例的基础上,提出了本申请语句分析处理方法的第二实施例,本实施例是本申请第一实施例的步骤S30,获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分的步骤的细化,参照图3,包括:
步骤S31,获取所述预训练模型中的第一状态向量;
步骤S32,获取各所述意图对应的意图名语义向量,并计算各所述意图名语义向量和第一状态向量之间的意图相似性分;
第一状态向量可以是在模型中的Intent task层,使用双向LSTM层将common representation层得到的特征作进一步地抽象,然后再将该双向LSTM每个方向的最后一个状态拼接起来后的状态向量。意图名即意图的表述词语。当获取到预训练模型中的第一状态向量后,还需要再次获取各个意图对应的意图名语义向量,然后再对意图名语义向量和第一状态向量做二次线性运算,从而得到该意图相似性分。并且由于每个意图都有一个与该意图对应的意图相似性分,获取的方法也基本相同,因此,可以所有的意图相似性分。
步骤S33,对各所述意图相似性分进行比较,以获取各所述意图相似性分中的最高意图相似性分。
当获取到每个意图的意图相似性分时,还需要对每个意图相似性分进行大小比较,已确定分数最高的意图相似性分,并将其作为最高意图相似性分。需要说明的是,每个意图相似性分都需要和其它的意图相似性分进行比较。
在本实施例中,通过确定各个意图名语义向量和第一状态向量之间的相似性分来确定哪个意图的相似性分最高,从而保证了确定用户意图的准确性。
具体地,获取各所述意图对应的意图名语义向量的步骤,包括:
步骤S321,获取所述意图中的各语句信息,并确定各所述语句信息对应的语句语义向量;
获取意图对应的意图名语义向量需要先获取该意图中的所有语句信息,并确定各个语句信息对应的语句语义向量。例如,假设有假设有意图名sni=(w1,w2...wn), Semantic network先将每个单词替换成相应的word embedding: E(wi)。然后使用一层DNN(Deep Neural Network, 深度神经网络)网络将E(wi)做非线性映射得到该单词的语义向量。
步骤S322,获取各所述语句向量的平均向量值,并将所述平均向量值作为所述意图名语义向量。
当在模型中获取各个语句向量后,还需要确定各个语句向量的平均值,即平均向量值,并将此平均向量值作为意图名语义向量。
在本实施例中,通过确定意图中所有的语句信息对应的语句语义向量,并取其平均值作为意图名语义向量,从而提高了检测意图相似性的准确性。
进一步地,在本申请第一实施例至第二实施例任意一个的基础上,提出了本申请语句分析处理方法的第三实施例,本实施例是本申请第一实施例的步骤S40,获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分的步骤的细化,包括:
步骤S41,获取所述预训练模型中的各词槽;
步骤S42,获取所述词槽的词槽名和整体词槽取值,并确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分;
第一相似性分可以是词槽名和当前位置状态之间的相似性分。第二相似性分可以是整体词槽取值和当前位置状态之间的相似性分。在现实应用中,词槽一般是由一个或者多个词语表述的,比如“食物”,而且一般每个词槽都会有一些可能的取值,比如“食物”这个词槽,可以很容易的得到可能出现的取值:“蛋糕”,“苹果”,“烤羊腿”等。在预训练模型中通过对预设问句进行分析,来确定可能出现的各个词槽,然后确定词槽的词槽名和整体词槽取值,并确定词槽名对应的词槽名语义向量和整体词槽取值对应的取值语义向量,并在Intent task层中的每个输入位置上将common representation层的双向LSTM和Intent task层的双向LSTM的状态拼接起来作为当前位置的状态,即状态向量,然后在用词槽名语义向量和状态向量做二次线性运算,得到词槽名对应的第一相似性分,在用取值语义向量和状态向量做二次线性运算,得到整体词槽取值对应的第二相似性分。例如,当词槽中有三个词槽向量A1,A2,A3时,这三个向量分别跟当前状态向量做运算分别得到一个分值,然后三个分值归一化之后变成C1,C2,C3,然后A1*C1+A2*C2+A3*C3就是整个词槽取值的语义向量。其中,词槽名即是槽位的名字,槽位的表述词语。整体词槽取值可以是与各个词槽取值值均相关的一个词槽取值。
步骤S43,并根据所述第一相似性分和所述第二相似性分的和值确定所述词槽的词槽相似性分。
当获取到第一相似性分和第二相似性分后,还需要将词槽名对应的第一相似性分和整体词槽取值对应的第二相似性分相加以得到其和值,并将其和值作为该词槽和当前位置的词槽相似性分。
在本实施例中,通过确定词槽名的第一相似性和整体词槽取值的第二相似性,来确定词槽的词槽相似性,从而提高了确定词槽相似性的准确性。
具体地,确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分的步骤,包括:
步骤S421,获取所述预训练模型中的当前位置状态,并确定所述当前位置状态的第二状态向量;
在预训练模型中的Intent task层中的每个输入位置上将common representation层的双向LSTM和Intent task层的双向LSTM的状态拼接起来作为当前位置的状态,即第二状态向量。
步骤S422,获取所述词槽名对应的词槽名语义向量,并确定所述词槽名语义向量和所述第二状态向量之间的第一相似性分;
对于词槽名的词槽名语义向量可以通过该预设模型中的一层DNN网络将词槽名做非线性运算来得到该词槽名语义向量,然后再将词槽名语义向量和第二状态向量做二次线性运算得到第一相似性分。
步骤S423,获取所述整体词槽取值对应的取值语义向量,并确定所述取值语义向量和所述第二状态向量之间的第二相似性分。
获取整体词槽取值对应的语义向量可以先计算词槽中的每个词槽取值的语义向量,再确定这些语义向量的相似性分,并对这些相似性分做归一化处理之后同对应的词槽取值的语义向量做加权平均,从而得到整体词槽取值对应的取值语义向量,再将取值语义向量和第二状态向量做二次线性运算以得到第二相似性分。
在本实施例中,通过确定预训练模型中的当前位置状态,来确定词槽名的第一相似性和整体词槽取值的第二相似性,从而保证了系统中的词槽是否为用户所需要的,提高了用户的使用体验感。
具体地,获取所述整体词槽取值对应的取值语义向量的步骤,包括:
步骤A10,获取所述词槽中的各子词槽取值,并确定所述各子词槽取值对应的子取值语义向量;
子词槽取值可以是词槽中的任意一个词槽取值。获取词槽中的所有子词槽取值,并通过该预设模型中的一层DNN网络将子词槽取值做非线性运算来得到子词槽取值对应的子取值语义向量。
步骤A11,计算所述子取值向量和所述第二状态向量之间的第三相似性分,并获取所述第三相似性分和所述子取值向量之间的向量乘积;
第三相似性分可以是任意一个词槽取值和当前位置状态之间的相似性分。通过二次线性运算来计算子取值向量和状态向量之间的第三相似性分,再确定第三相似性分和子取值向量之间的向量乘积。
步骤A12,获取各所述子词槽取值对应的向量乘积,并将各所述向量乘积相加以获取所述整体词槽取值对应的取值语义向量。
获取各个子词槽取值对应的向量乘积,然后再将所有的向量乘积相加以得到其和值,最后将和值作为整体词槽取值对应的取值语义向量。
在本实施例中,通过根据所有子词槽取值来确定整体词槽取值对应的取值语义向量,从而保证了取值语义向量和词槽中的所有词槽取值都相关,保证了取值语义向量的准确性,提高了用户的体验感。
具体地,获取所述预训练模型中的各词槽的步骤,包括:
步骤S411,获取所述预训练模型中的预设问句;
步骤S412,在所述目标领域内对所述预设问句进行语义分析,以确定所述预训练模型中的各词槽。
在预训练模型中,由于每个预设问句需要用到的词槽都不相同,因此需要获取预训练模型中的预设问句,并对此预设问句进行语义分析,从而来确定预训练模型中的各个词槽。例如,当对预设问句进行语义分析时,发现需要与食物相关的东西时,此时词槽名即可以为食物,而词槽中的各个词槽则可以为蛋糕、苹果、烤羊腿等。
在本实施例中,通过根据目标领域下的预设问句来确定预训练模型中的各词槽,从而保证了各个词槽和预设问句相关,避免了无关词槽占据词槽空间,节约了资源,提高了用户的使用体验感。
此外,参照图4,本申请实施例还提出一种语句分析处理装置,所述语句分析处理装置包括:
迁移模块,用于获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
确定模块,用于在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
第一获取模块,用于获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
第二获取模块,用于获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
输出模块,用于获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
可选地,所述第一获取模块,还用于:
获取所述预训练模型中的第一状态向量;
获取各所述意图对应的意图名语义向量,并计算各所述意图名语义向量和第一状态向量之间的意图相似性分;
对各所述意图相似性分进行比较,以获取各所述意图相似性分中的最高意图相似性分。
可选地,所述第一获取模块,还用于:
获取所述意图中的各语句信息,并确定各所述语句信息对应的语句语义向量;
获取各所述语句向量的平均向量值,并将所述平均向量值作为所述意图名语义向量。
可选地,所述第二获取模块,还用于:
获取所述预训练模型中的各词槽;
获取所述词槽的词槽名和整体词槽取值,并确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分;
并根据所述第一相似性分和所述第二相似性分的和值确定所述词槽的词槽相似性分。
可选地,所述第二获取模块,还用于:
获取所述预训练模型中的当前位置状态,并确定所述当前位置状态的第二状态向量;
获取所述词槽名对应的词槽名语义向量,并确定所述词槽名语义向量和所述第二状态向量之间的第一相似性分;
获取所述整体词槽取值对应的取值语义向量,并确定所述取值语义向量和所述第二状态向量之间的第二相似性分。
可选地,所述第二获取模块,还用于:
获取所述词槽中的各子词槽取值,并确定所述各子词槽取值对应的子取值语义向量;
计算所述子取值向量和所述第二状态向量之间的第三相似性分,并获取所述第三相似性分和所述子取值向量之间的向量乘积;
获取各所述子词槽取值对应的向量乘积,并将各所述向量乘积相加以获取所述整体词槽取值对应的取值语义向量。
可选地,所述第二获取模块,还用于:
获取所述预训练模型中的预设问句;
在所述目标领域内对所述预设问句进行语义分析,以确定所述预训练模型中的各词槽。
其中,语句分析处理装置的各个功能模块实现的步骤可参照本申请语句分析处理方法的各个实施例,此处不再赘述。
此外,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质可以为非易失性可读存储介质。
本申请计算机可读存储介质上存储有计算机可读指令,其中所述计算机可读指令被处理器执行时,实现如上述的语句分析处理方法的步骤。
其中,该计算机可读指令被执行时所实现的方法可参照本申请语句分析处理方法的各个实施例,此处不再赘述。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (20)

  1. 一种语句分析处理方法,其特征在于,所述语句分析处理方法包括以下步骤:
    获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
    在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
    获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
    获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
    获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
  2. 如权利要求1所述的语句分析处理方法,其特征在于,所述获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分的步骤,包括:
    获取所述预训练模型中的第一状态向量;
    获取各所述意图对应的意图名语义向量,并计算各所述意图名语义向量和第一状态向量之间的意图相似性分;
    对各所述意图相似性分进行比较,以获取各所述意图相似性分中的最高意图相似性分。
  3. 如权利要求2所述的语句分析处理方法,其特征在于,所述获取各所述意图对应的意图名语义向量的步骤,包括:
    获取所述意图中的各语句信息,并确定各所述语句信息对应的语句语义向量;
    获取各所述语句向量的平均向量值,并将所述平均向量值作为所述意图名语义向量。
  4. 如权利要求1所述的语句分析处理方法,其特征在于,所述获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分的步骤,包括:
    获取所述预训练模型中的各词槽;
    获取所述词槽的词槽名和整体词槽取值,并确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分;
    根据所述第一相似性分和所述第二相似性分的和值确定所述词槽的词槽相似性分。
  5. 如权利要求4所述的语句分析处理方法,其特征在于,所述确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分的步骤,包括:
    获取所述预训练模型中的当前位置状态,并确定所述当前位置状态的第二状态向量;
    获取所述词槽名对应的词槽名语义向量,并确定所述词槽名语义向量和所述第二状态向量之间的第一相似性分;
    获取所述整体词槽取值对应的取值语义向量,并确定所述取值语义向量和所述第二状态向量之间的第二相似性分。
  6. 如权利要求5所述的语句分析处理方法,其特征在于,所述获取所述整体词槽取值对应的取值语义向量的步骤,包括:
    获取所述词槽中的各子词槽取值,并确定所述各子词槽取值对应的子取值语义向量;
    计算所述子取值向量和所述第二状态向量之间的第三相似性分,并获取所述第三相似性分和所述子取值向量之间的向量乘积;
    获取各所述子词槽取值对应的向量乘积,并将各所述向量乘积相加以获取所述整体词槽取值对应的取值语义向量。
  7. 如权利要求4所述的语句分析处理方法,其特征在于,所述获取所述预训练模型中的各词槽的步骤,包括:
    获取所述预训练模型中的预设问句;
    在所述目标领域内对所述预设问句进行语义分析,以确定所述预训练模型中的各词槽。
  8. 一种语句分析处理装置,其特征在于,所述语句分析处理装置包括:
    迁移模块,用于获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
    确定模块,用于在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
    第一获取模块,用于获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
    第二获取模块,用于获取所述预训练模型中的各词槽,确定各所述词槽在预训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
    输出模块,用于获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
  9. 一种语句分析处理设备,其特征在于,所述语句分析处理设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,其中所述计算机可读指令被所述处理器执行时实现如下步骤:
    获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
    在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
    获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
    获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
    获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
  10. 如权利要求9所述的语句分析处理设备,其特征在于,所述获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分的步骤,包括:
    获取所述预训练模型中的第一状态向量;
    获取各所述意图对应的意图名语义向量,并计算各所述意图名语义向量和第一状态向量之间的意图相似性分;
    对各所述意图相似性分进行比较,以获取各所述意图相似性分中的最高意图相似性分。
  11. 如权利要求10所述的语句分析处理设备,其特征在于,所述获取各所述意图对应的意图名语义向量的步骤,包括:
    获取所述意图中的各语句信息,并确定各所述语句信息对应的语句语义向量;
    获取各所述语句向量的平均向量值,并将所述平均向量值作为所述意图名语义向量。
  12. 如权利要求9所述的语句分析处理设备,其特征在于,所述获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分的步骤,包括:
    获取所述预训练模型中的各词槽;
    获取所述词槽的词槽名和整体词槽取值,并确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分;
    根据所述第一相似性分和所述第二相似性分的和值确定所述词槽的词槽相似性分。
  13. 如权利要求12所述的语句分析处理设备,其特征在于,所述确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分的步骤,包括:
    获取所述预训练模型中的当前位置状态,并确定所述当前位置状态的第二状态向量;
    获取所述词槽名对应的词槽名语义向量,并确定所述词槽名语义向量和所述第二状态向量之间的第一相似性分;
    获取所述整体词槽取值对应的取值语义向量,并确定所述取值语义向量和所述第二状态向量之间的第二相似性分。
  14. 如权利要求13所述的语句分析处理设备,其特征在于,所述获取所述整体词槽取值对应的取值语义向量的步骤,包括:
    获取所述词槽中的各子词槽取值,并确定所述各子词槽取值对应的子取值语义向量;
    计算所述子取值向量和所述第二状态向量之间的第三相似性分,并获取所述第三相似性分和所述子取值向量之间的向量乘积;
    获取各所述子词槽取值对应的向量乘积,并将各所述向量乘积相加以获取所述整体词槽取值对应的取值语义向量。
  15. 如权利要求12所述的语句分析处理设备,其特征在于,所述获取所述预训练模型中的各词槽的步骤,包括:
    获取所述预训练模型中的预设问句;
    在所述目标领域内对所述预设问句进行语义分析,以确定所述预训练模型中的各词槽。
  16. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时,实现如下步骤:
    获取源领域大样本数据集上的预训练模型,并将所述预训练模型迁移学习到目标领域;
    在所述目标领域内,获取所述预训练模型中预设问句的各语句特征,并对各所述语句特征进行语义分析,以确定所述预设问句对应的各不同意图;
    获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分;
    获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分,并在各所述词槽相似性分中确定最高词槽相似性分;
    获取所述最高意图相似性分对应的最终意图和所述最高词槽相似性分对应的最终词槽,并输出所述最高意图和所述最终词槽。
  17. 如权利要求16所述的计算机可读存储介质,其特征在于,所述获取各所述意图在预训练模型中的意图相似性分,并在各所述意图相似性分中确定最高意图相似性分的步骤,包括:
    获取所述预训练模型中的第一状态向量;
    获取各所述意图对应的意图名语义向量,并计算各所述意图名语义向量和第一状态向量之间的意图相似性分;
    对各所述意图相似性分进行比较,以获取各所述意图相似性分中的最高意图相似性分。
  18. 如权利要求17所述的计算机可读存储介质,其特征在于,所述获取各所述意图对应的意图名语义向量的步骤,包括:
    获取所述意图中的各语句信息,并确定各所述语句信息对应的语句语义向量;
    获取各所述语句向量的平均向量值,并将所述平均向量值作为所述意图名语义向量。
  19. 如权利要求16所述的计算机可读存储介质,其特征在于,所述获取所述预训练模型中的各词槽,确定各所述词槽在训练模型中的词槽相似性分的步骤,包括:
    获取所述预训练模型中的各词槽;
    获取所述词槽的词槽名和整体词槽取值,并确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分;
    根据所述第一相似性分和所述第二相似性分的和值确定所述词槽的词槽相似性分。
  20. 如权利要求19所述的计算机可读存储介质,其特征在于,所述确定所述词槽名的第一相似性分和所述整体词槽取值的第二相似性分的步骤,包括:
    获取所述预训练模型中的当前位置状态,并确定所述当前位置状态的第二状态向量;
    获取所述词槽名对应的词槽名语义向量,并确定所述词槽名语义向量和所述第二状态向量之间的第一相似性分;
    获取所述整体词槽取值对应的取值语义向量,并确定所述取值语义向量和所述第二状态向量之间的第二相似性分。
PCT/CN2019/081282 2018-11-30 2019-04-03 语句分析处理方法、装置、设备以及计算机可读存储介质 WO2020107765A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811464437.5 2018-11-30
CN201811464437.5A CN109597993B (zh) 2018-11-30 2018-11-30 语句分析处理方法、装置、设备以及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2020107765A1 true WO2020107765A1 (zh) 2020-06-04

Family

ID=65959469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081282 WO2020107765A1 (zh) 2018-11-30 2019-04-03 语句分析处理方法、装置、设备以及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN109597993B (zh)
WO (1) WO2020107765A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859909A (zh) * 2020-07-10 2020-10-30 山西大学 一种语义场景一致性识别阅读机器人
CN112016300A (zh) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 预训练模型处理、下游任务处理方法、装置及存储介质
CN112214998A (zh) * 2020-11-16 2021-01-12 中国平安财产保险股份有限公司 意图与实体的联合识别方法、装置、设备和存储介质
CN112507712A (zh) * 2020-12-11 2021-03-16 北京百度网讯科技有限公司 建立槽位识别模型与槽位识别的方法、装置
CN112926313A (zh) * 2021-03-10 2021-06-08 新华智云科技有限公司 一种槽位信息的提取方法与系统
CN113378970A (zh) * 2021-06-28 2021-09-10 平安普惠企业管理有限公司 语句相似性检测方法、装置、电子设备及存储介质
CN117574878A (zh) * 2024-01-15 2024-02-20 西湖大学 用于混合领域的成分句法分析方法、装置及介质

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188182B (zh) * 2019-05-31 2023-10-27 中国科学院深圳先进技术研究院 模型训练方法、对话生成方法、装置、设备及介质
CN110309875A (zh) * 2019-06-28 2019-10-08 哈尔滨工程大学 一种基于伪样本特征合成的零样本目标分类方法
CN110399492A (zh) * 2019-07-22 2019-11-01 阿里巴巴集团控股有限公司 针对用户问句的问题分类模型的训练方法和装置
CN110674648B (zh) * 2019-09-29 2021-04-27 厦门大学 基于迭代式双向迁移的神经网络机器翻译模型
CN110909541A (zh) * 2019-11-08 2020-03-24 杭州依图医疗技术有限公司 指令生成方法、系统、设备和介质
CN111563144B (zh) * 2020-02-25 2023-10-20 升智信息科技(南京)有限公司 基于语句前后关系预测的用户意图识别方法及装置
CN111460118B (zh) * 2020-03-26 2023-10-20 聚好看科技股份有限公司 一种人工智能冲突语义识别方法及装置
CN111767377B (zh) * 2020-06-22 2024-05-28 湖北马斯特谱科技有限公司 一种面向低资源环境的高效口语理解识别方法
CN111738016B (zh) * 2020-06-28 2023-09-05 中国平安财产保险股份有限公司 多意图识别方法及相关设备
CN111931512A (zh) * 2020-07-01 2020-11-13 联想(北京)有限公司 语句意图的确定方法及装置、存储介质
CN112883180A (zh) * 2021-02-24 2021-06-01 挂号网(杭州)科技有限公司 模型训练方法、装置、电子设备和存储介质
CN113326360B (zh) * 2021-04-25 2022-12-13 哈尔滨工业大学 一种小样本场景下的自然语言理解方法
CN114444462B (zh) * 2022-01-26 2022-11-29 北京百度网讯科技有限公司 模型训练方法及人机交互方法、装置
CN117709394A (zh) * 2024-02-06 2024-03-15 华侨大学 车辆轨迹预测模型训练方法、多模型迁移预测方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169013A1 (en) * 2015-12-11 2017-06-15 Microsoft Technology Licensing, Llc Personalizing Natural Language Understanding Systems
CN107341146A (zh) * 2017-06-23 2017-11-10 上海交通大学 基于语义槽内部结构的可迁移口语语义解析系统及其实现方法
CN107832476A (zh) * 2017-12-01 2018-03-23 北京百度网讯科技有限公司 一种搜索序列的理解方法、装置、设备和存储介质
CN108681585A (zh) * 2018-05-14 2018-10-19 浙江工业大学 一种基于NetSim-TL的多源迁移学习标签流行性预测模型的构建方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156003B (zh) * 2016-06-30 2018-08-28 北京大学 一种问答系统中的问句理解方法
CN107330120B (zh) * 2017-07-14 2018-09-18 三角兽(北京)科技有限公司 询问应答方法、询问应答装置及计算机可读存储介质
CN107688614B (zh) * 2017-08-04 2018-08-10 平安科技(深圳)有限公司 意图获取方法、电子装置及计算机可读存储介质
CN108305612B (zh) * 2017-11-21 2020-07-31 腾讯科技(深圳)有限公司 文本处理、模型训练方法、装置、存储介质和计算机设备
CN108021660B (zh) * 2017-12-04 2020-05-22 中国人民解放军国防科技大学 一种基于迁移学习的话题自适应的微博情感分析方法
CN108197167A (zh) * 2017-12-18 2018-06-22 深圳前海微众银行股份有限公司 人机对话处理方法、设备及可读存储介质
CN108182264B (zh) * 2018-01-09 2022-04-01 武汉大学 一种基于跨领域排名推荐模型的排名推荐方法
CN108334496B (zh) * 2018-01-30 2020-06-12 中国科学院自动化研究所 用于特定领域的人机对话理解方法与系统及相关设备
CN108874779B (zh) * 2018-06-21 2021-09-21 东北大学 基于K8s集群建立的依图写诗系统的控制方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169013A1 (en) * 2015-12-11 2017-06-15 Microsoft Technology Licensing, Llc Personalizing Natural Language Understanding Systems
CN107341146A (zh) * 2017-06-23 2017-11-10 上海交通大学 基于语义槽内部结构的可迁移口语语义解析系统及其实现方法
CN107832476A (zh) * 2017-12-01 2018-03-23 北京百度网讯科技有限公司 一种搜索序列的理解方法、装置、设备和存储介质
CN108681585A (zh) * 2018-05-14 2018-10-19 浙江工业大学 一种基于NetSim-TL的多源迁移学习标签流行性预测模型的构建方法

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859909B (zh) * 2020-07-10 2022-05-31 山西大学 一种语义场景一致性识别阅读机器人
CN111859909A (zh) * 2020-07-10 2020-10-30 山西大学 一种语义场景一致性识别阅读机器人
CN112016300A (zh) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 预训练模型处理、下游任务处理方法、装置及存储介质
CN112016300B (zh) * 2020-09-09 2022-10-14 平安科技(深圳)有限公司 预训练模型处理、下游任务处理方法、装置及存储介质
CN112214998A (zh) * 2020-11-16 2021-01-12 中国平安财产保险股份有限公司 意图与实体的联合识别方法、装置、设备和存储介质
CN112214998B (zh) * 2020-11-16 2023-08-22 中国平安财产保险股份有限公司 意图与实体的联合识别方法、装置、设备和存储介质
CN112507712A (zh) * 2020-12-11 2021-03-16 北京百度网讯科技有限公司 建立槽位识别模型与槽位识别的方法、装置
CN112507712B (zh) * 2020-12-11 2024-01-26 北京百度网讯科技有限公司 建立槽位识别模型与槽位识别的方法、装置
CN112926313B (zh) * 2021-03-10 2023-08-15 新华智云科技有限公司 一种槽位信息的提取方法与系统
CN112926313A (zh) * 2021-03-10 2021-06-08 新华智云科技有限公司 一种槽位信息的提取方法与系统
CN113378970A (zh) * 2021-06-28 2021-09-10 平安普惠企业管理有限公司 语句相似性检测方法、装置、电子设备及存储介质
CN113378970B (zh) * 2021-06-28 2023-08-22 山东浪潮成方数字服务有限公司 语句相似性检测方法、装置、电子设备及存储介质
CN117574878A (zh) * 2024-01-15 2024-02-20 西湖大学 用于混合领域的成分句法分析方法、装置及介质
CN117574878B (zh) * 2024-01-15 2024-05-17 西湖大学 用于混合领域的成分句法分析方法、装置及介质

Also Published As

Publication number Publication date
CN109597993B (zh) 2021-11-05
CN109597993A (zh) 2019-04-09

Similar Documents

Publication Publication Date Title
WO2020107765A1 (zh) 语句分析处理方法、装置、设备以及计算机可读存储介质
WO2020180013A1 (en) Apparatus for vision and language-assisted smartphone task automation and method thereof
WO2020034526A1 (zh) 保险录音的质检方法、装置、设备和计算机存储介质
WO2020107761A1 (zh) 广告文案处理方法、装置、设备及计算机可读存储介质
WO2020164281A1 (zh) 基于文字定位识别的表格解析方法、介质及计算机设备
WO2018074681A1 (ko) 전자 장치 및 그 제어 방법
WO2020107762A1 (zh) Ctr预估方法、装置及计算机可读存储介质
WO2018164378A1 (en) Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
WO2020119069A1 (zh) 基于自编码神经网络的文本生成方法、装置、终端及介质
WO2020159288A1 (ko) 전자 장치 및 그 제어 방법
WO2020253115A1 (zh) 基于语音识别的产品推荐方法、装置、设备和存储介质
WO2021051558A1 (zh) 基于知识图谱的问答方法、装置和存储介质
WO2018182201A1 (ko) 사용자의 음성 입력에 대한 답변을 제공하는 방법 및 장치
WO2019125054A1 (en) Method for content search and electronic device therefor
WO2020071854A1 (en) Electronic apparatus and control method thereof
WO2021071155A1 (en) Electronic apparatus and control method thereof
EP3577571A1 (en) Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
WO2021107449A1 (ko) 음역 전환 신조어를 이용한 지식 그래프 기반 마케팅 정보 분석 서비스 제공 방법 및 그 장치
WO2016182393A1 (ko) 사용자의 감성을 분석하는 방법 및 디바이스
WO2021085811A1 (ko) 키보드 매크로 기능을 활용한 자동 음성 인식기 및 음성 인식 방법
WO2021107445A1 (ko) 지식 그래프 및 국가별 음역 전환 기반 신조어 정보 서비스 제공 방법 및 그 장치
WO2018182072A1 (ko) 가상현실 및 증강현실 콘텐츠에서 학습 데이터를 추출하는 시스템 및 방법
WO2022244997A1 (en) Method and apparatus for processing data
WO2022177091A1 (ko) 전자 장치 및 이의 제어 방법
WO2019103518A1 (ko) 전자 장치 및 그 제어 방법

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19888784

Country of ref document: EP

Kind code of ref document: A1