WO2024114186A1 - Intent recognition method and related device - Google Patents

Intent recognition method and related device

Info

Publication number
WO2024114186A1
WO2024114186A1 (PCT/CN2023/126388)
Authority
WO
WIPO (PCT)
Prior art keywords
text
target
intent
model
initial
Prior art date
Application number
PCT/CN2023/126388
Other languages
English (en)
French (fr)
Inventor
姚望 (Yao Wang)
宁义双 (Ning Yishuang)
Original Assignee
金蝶软件(中国)有限公司 (Kingdee Software (China) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 金蝶软件(中国)有限公司 (Kingdee Software (China) Co., Ltd.)
Publication of WO2024114186A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor, of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • The embodiments of the present application relate to the field of machine learning, and in particular to an intent recognition method and related device.
  • Text is an important medium through which users interact with Internet services.
  • On the basis of understanding the user's text, a conversational robot replies to it by retrieval or generation. The most important part of this process is understanding the user's text, i.e., intent recognition.
  • Intent recognition is the primary task of a conversational robot: only by correctly recognizing the intent of the user's language can it subsequently reply to the user's text reasonably.
  • Intent recognition can be implemented based on a traditional neural network model. For example, multiple sentences of user text that users with different intent types might send are labeled as training data. A neural network model is then trained on the training data labeled with the different intent types. Finally, the neural network model can be used to recognize and classify the intent of user text.
  • The embodiments of the present application provide an intent recognition method and related device, involving artificial intelligence technology, for balancing model training cost and model performance in multi-user training scenarios.
  • A first aspect of the embodiments of the present application provides an intent recognition method, including:
  • the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
  • the target intent type with the highest corresponding confidence is determined as the predicted intent type of the text to be recognized.
  • In a specific implementation, the method further includes:
  • the adaptive parameters of the initial intent recognition model are adjusted according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameter.
  • In a specific implementation, before the text to be recognized is input into the target intent recognition model, the method further includes:
  • the model parameters of the initial text model are adjusted according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain the initial intent recognition model.
  • In a specific implementation, the method further includes:
  • the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
  • alternatively, the initial text model is constructed such that the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an attention layer and an adaptive layer.
  • Inputting the text to be recognized into the target intent recognition model includes:
  • the preprocessed text to be recognized is input into the target intent recognition model.
  • In a specific implementation, before the target adaptive parameter corresponding to the target user is determined, the method further includes:
  • determining the target adaptive parameter corresponding to the target user and using the target adaptive parameter to update the adaptive layer of the initial intent recognition model includes:
  • A second aspect of the embodiments of the present application provides an intent recognition apparatus, including:
  • an acquisition unit, configured to acquire a text to be recognized sent by a target user;
  • a determination unit, configured to determine a target adaptive parameter corresponding to the target user and use the target adaptive parameter to update the adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user;
  • the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
  • an input unit, configured to input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type;
  • In a specific implementation, the apparatus further includes a calculation unit and a training unit;
  • the acquisition unit is further configured to acquire the historical recognition texts sent by the target user and the intent type label corresponding to each of the historical recognition texts;
  • the input unit is further configured to input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
  • the calculation unit is configured to calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text;
  • In a specific implementation, the apparatus further includes a calculation unit and a training unit;
  • the calculation unit is configured to calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text;
  • the construction unit is configured to construct the initial text model on the basis of a pre-trained text model, where the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
  • the construction unit is alternatively configured to construct the initial text model on the basis of a pre-trained text model, where the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an attention layer and an adaptive layer.
  • the input unit is specifically configured to perform text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized;
  • the preprocessed text to be recognized is input into the target intent recognition model.
  • In a specific implementation, before the target adaptive parameter corresponding to the target user is determined, the apparatus further includes a deployment unit;
  • the deployment unit is configured to deploy the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
  • the determination unit is specifically configured to determine, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
  • A third aspect of the embodiments of the present application provides an intent recognition device, including:
  • the memory is a volatile memory or a persistent memory;
  • the central processing unit is configured to communicate with the memory and execute the instructions in the memory to perform the method described in the first aspect.
  • A fourth aspect of the embodiments of the present application provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the method described in the first aspect.
  • A fifth aspect of the embodiments of the present application provides a computer storage medium storing instructions which, when executed on a computer, cause the computer to perform the method described in the first aspect.
  • The embodiments of the present application have the following advantages: providing a corresponding intent recognition model for each user ensures that the model maintains good performance in recognizing the at least one intent type corresponding to each user. At the same time, when training the target intent recognition model corresponding to the target user, only the adaptive parameters of the adaptive layer of the initial intent recognition model are adjusted, so the training cost is far lower than performing global training for each user to obtain a corresponding neural network model. It can thus be seen that the intent recognition model of the embodiments of the present application balances model training cost and model performance.
  • FIG. 1 is a system architecture diagram of an intent recognition system disclosed in an embodiment of the present application.
  • FIG. 2 is a flow chart of an intent recognition method disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an intent recognition apparatus disclosed in an embodiment of the present application.
  • FIG. 4 is another schematic structural diagram of the intent recognition apparatus disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an adaptive layer disclosed in an embodiment of the present application.
  • Intent recognition is the primary task of a conversational robot: only by correctly recognizing the intent of the user's language can it reply to the user's text reasonably.
  • The current mainstream intent recognition methods are rule-based methods and methods based on traditional neural networks.
  • A rule-based method formulates keywords or specific sentence structures corresponding to the various intent types based on a person's understanding of each intent type.
  • When the text to be recognized input by the user contains a corresponding keyword or specific sentence structure,
  • the text to be recognized is considered to be of the corresponding intent type.
  • However, this intent recognition method depends heavily on the accuracy of the person's understanding of the intent types,
  • and keywords and specific sentence structures cannot adapt to highly variable user texts.
  • To overcome these defects, this technical solution proposes an intent recognition method based on a large-scale pre-trained language model under limited computing resources.
  • The method first uses a large-scale pre-trained language model (including but not limited to BERT, RoBERTa, ERNIE, etc.) as the encoder layer to encode the input text and obtain the semantic features of the text to be recognized. It then uses an attention mechanism (the attention layer) to make the model focus on the parts of the text to be recognized that better highlight its intent.
  • Finally, a softmax layer (the output layer) is used to recognize the intent type of the text to be recognized.
  • In addition, to reduce the model's size and improve its concurrency performance in multi-tenant scenarios, an adapter tuning method is used to freeze most of the parameters of the encoder layer and fine-tune only the top-layer parameters of the encoder, reducing the model's consumption of computing resources such as memory, CPU, and GPU.
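  • A minimal sketch of this freezing step, assuming a Hugging Face BERT encoder; the model name and the choice of which layer stays trainable are illustrative assumptions, not the patent's code:

```python
from transformers import BertModel

encoder = BertModel.from_pretrained("hfl/chinese-bert-wwm")

for name, param in encoder.named_parameters():
    # Freeze everything except the top encoder layer (layer 11 of 12),
    # mirroring "fine-tune only the parameters of the top layer".
    param.requires_grad = name.startswith("encoder.layer.11")
```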
  • The embodiments of the present application provide an intent recognition method and related device, involving artificial intelligence technology, for obtaining, at a low training cost, an intent recognition model that maintains good performance across different intent types.
  • An embodiment of the present application provides an intent recognition system, which is used to execute the intent recognition method provided by the present application.
  • The intent recognition system can be packaged into a Docker image so that it can be deployed quickly and conveniently on any machine with a Docker environment; that is, a terminal or server on which the intent recognition system is deployed can perform intent recognition on a text to be recognized.
  • A user can input a text to be recognized through a terminal 101.
  • The terminal 101 can send the text to be recognized to any server 102 on which the intent recognition system is deployed.
  • The server 102 can input the text to be recognized into a target intent recognition model to obtain the confidence of the text to be recognized under each intent type.
  • The target intent type with the highest corresponding confidence is determined as the predicted intent type of the text to be recognized.
  • A reply can then be made according to the predicted intent type, or related services can be pushed to the target user.
  • An embodiment of the present application further provides an intent recognition method, which can be executed by the above-mentioned server 102 and includes the following steps:
  • To recognize the intent with which the target user sent the text to be recognized, the first step of intent recognition is to acquire the text to be recognized sent by the target user.
  • The target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user.
  • The adaptive parameters of the adaptive layer may be some or all of the model parameters corresponding to the adaptive layer, which is not specifically limited here.
  • The text to be recognized acquired in step 201 can be input into the target intent recognition model, and the confidence output by the target intent recognition model for the text to be recognized under each intent type can be obtained. On the basis of the confidence of the text to be recognized under each intent type, the intent type of the text to be recognized can be judged.
  • The confidence of the text to be recognized under different intent types reflects the likelihood that the text to be recognized belongs to the corresponding intent type. The higher the confidence, the more likely the text belongs to that intent type; therefore, the target intent type with the highest corresponding confidence can be directly determined as the predicted intent type of the text to be recognized.
  • Providing a corresponding intent recognition model for each user ensures that the model maintains good performance in recognizing the at least one intent type corresponding to each user.
  • The intent recognition model of the embodiments of the present application can thus balance model training cost and model performance.
  • The target adaptive parameters needed to construct the target intent recognition model in the aforementioned step 202 can be obtained by training the adaptive layer of the initial intent recognition model, specifically as follows (see the training-loop sketch below): acquire the historical recognition texts sent by the target user and the intent type label corresponding to each historical recognition text; determine each historical recognition text in turn as the target historical text; input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type; calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text; and adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, thereby obtaining the target adaptive parameters.
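  • A minimal sketch of this per-user adapter training loop, assuming a PyTorch model that maps preprocessed token-id tensors to logits over the intent types; the function name, the cross-entropy loss, and the loss-threshold convergence test are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_target_adapter(model, adapter_params, encoded_texts, labels,
                         epsilon=1e-3, lr=1e-4, max_epochs=100):
    # Only the adaptive parameters are handed to the optimizer, so all
    # other model parameters stay fixed during this training.
    optimizer = torch.optim.Adam(adapter_params, lr=lr)
    for _ in range(max_epochs):
        for input_ids, label in zip(encoded_texts, labels):
            logits = model(input_ids)              # confidence under each intent type
            loss = F.cross_entropy(logits, label)  # the "first loss value"
            if loss.item() < epsilon:              # first preset convergence condition
                return [p.detach().clone() for p in adapter_params]
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return [p.detach().clone() for p in adapter_params]
```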
  • At least one historical recognition text previously sent by the target user can be acquired (it should be noted that, to ensure good model performance, multiple historical recognition texts are usually used), where each historical recognition text has a corresponding intent type label identifying its intent type. The initial intent recognition model is then trained with each historical recognition text in turn until it converges, yielding the target intent recognition model.
  • The first loss value can be calculated from the confidence of the target historical text under each intent type, the intent type label corresponding to the target historical text, and the loss function.
  • If the first loss value does not satisfy the preset convergence condition, the adaptive parameters of the initial intent recognition model are adjusted on the basis of the first loss value; if the first loss value satisfies the preset convergence condition, the adaptive parameters of the initial intent recognition model at that point are determined to be the target adaptive parameters of the target user.
  • It should be noted that when the object is a user,
  • the historical recognition texts and other related data involved in the embodiments of this application are all obtained with the user's authorization.
  • Moreover, the data used requires the user's permission or consent, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • The initial text model can first be preliminarily fine-tuned in the classification task (i.e., intent recognition) scenario; then, on the basis of the preliminarily fine-tuned initial intent recognition model that can perform the classification task, the intent recognition model corresponding to each user is trained.
  • The specific implementation is as follows: acquire at least one preset simulated text and the intent type label corresponding to each simulated text; determine each simulated text in turn as the target simulated text; input the target simulated text into the initial text model to obtain the confidence of the target simulated text under each intent type; calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text; and adjust the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, thereby obtaining the initial intent recognition model.
  • The initial text model is trained on each simulated text to obtain an initial intent recognition model that can be used for the intent classification task.
  • The training procedure for the initial text model is similar to the aforementioned procedure for training the target intent recognition model from the initial intent recognition model and is not repeated here.
  • The simulated texts here may be data from any existing open-source text classification dataset, or simulated texts written by business and/or technical personnel based on their understanding of the various intent types; this is not specifically limited here.
  • The aforementioned manually written simulated texts can be produced as follows. Suppose the simulated samples need to cover intent types such as business trip application (bus trip), financial indicator inquiry (enquire financial indicators), data service (data request), making a phone call (phonecall dzh), sending a message (sendMsg dzh), what can you do (sys what can you do), and/or finding an application (appInvoke dzh). Under the guidance of business personnel, 20 text-intent sample pairs are provided for each intent type, each pair containing a simulated text and its corresponding intent type. The simulated text is text that a user might plausibly send, and the intent type is the real user need reflected in that text; a hypothetical illustration of this format follows.
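  • A hypothetical illustration of the sample-pair format: the example texts below are invented for illustration and are not the patent's actual samples; only the intent labels come from the passage above.

```python
# Hypothetical text-intent sample pairs; the Chinese texts are invented.
sample_pairs = [
    ("帮我提一个去北京的出差申请", "bus trip"),
    ("查一下本季度的净利润率", "enquire financial indicators"),
    ("给张三打个电话", "phonecall dzh"),
    ("给李四发条消息说会议改到三点", "sendMsg dzh"),
    ("你都能做些什么", "sys what can you do"),
    ("我想找审核助手", "appInvoke dzh"),
]
```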
  • The writing of text-intent sample pairs can refer to the following table (not reproduced in this extract):
  • The above table is only an example of how text-intent sample pairs may be written; sample pairs can also be written without using a table; the number of text-intent sample pairs required for each intent type is not necessarily 20; and the possible intent types are not limited to the above seven intent types. This embodiment places no specific limitation on the above.
  • The pre-trained text model serves as the encoder layer of the initial text model.
  • The initial text model further includes an attention layer, which is the adaptive layer; alternatively, the initial text model is constructed on the basis of a pre-trained text model that serves as its encoder layer, and the initial text model further includes an adaptive layer.
  • The constructed initial text model may include an embedding layer, an encoder layer, an attention layer, and an output layer, where the encoder layer can serve as the adaptive layer and some parameters in the encoder layer can serve as the adaptive parameters. Alternatively, the constructed initial text model may include an embedding layer, an encoder layer, an attention layer, an output layer, and an adaptive layer, where the adaptive layer is located after the attention layer; the adaptive layer first maps the input of dimension H (i.e., the vector length of the embedding layer) to a lower dimension K, then applies the nonlinear function relu, and then restores the dimension to H through a linear layer.
  • The initial text model of the embodiments of the present application may be any structure that includes an adaptive layer, which is not limited here.
  • Both the encoder layer and the attention layer extract semantic features from the vectors, but the encoder layer is a fully connected network, a different network form from that used by the attention layer.
  • The encoder layer of the initial text model in the embodiments of the present application can be any pre-trained text model, such as GPT (Generative Pre-Training) or BERT (Bidirectional Encoder Representations from Transformers); this is not limited here.
  • Before a text corpus is delivered to the model, it generally requires a series of preprocessing steps to meet the model's input requirements, such as converting the text into the tensors the model requires and standardizing tensor sizes; scientific text preprocessing can also effectively guide the selection of model hyperparameters and improve the model's evaluation metrics.
  • Before the text to be recognized is input into the target intent recognition model, text preprocessing should also be performed on it to obtain the preprocessed text to be recognized, and only then is the preprocessed text to be recognized input into the target intent recognition model.
  • Likewise, corresponding text preprocessing should be performed on the target historical text and the target simulated text, and the preprocessed texts (such as the preprocessed target historical text and the preprocessed target simulated text) are used as model inputs.
  • Text preprocessing can be performed as follows. For a piece of text (such as a text to be recognized, a historical recognition text, or a simulated text), first convert all letters in the text to lowercase, then use regular expressions to remove invalid characters such as brackets, spaces, and special characters, to eliminate their negative impact on the model. Next, add [cls] and [seq] at the beginning and end of the text, respectively; the [cls] character not only marks the beginning of the sentence but also represents the meaning of the entire sentence. Finally, convert the processed text into the form of indices in the vocabulary of the large-scale pre-trained language model to obtain the corresponding preprocessed text.
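  • A minimal sketch of this preprocessing pipeline, assuming a Hugging Face BERT tokenizer; the model name is an assumption, and note that the passage writes [cls]/[seq] where the usual BERT special tokens are [CLS]/[SEP]:

```python
import re
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm")

def preprocess(text: str) -> list[int]:
    text = text.lower()                                 # lowercase all letters
    text = re.sub(r"[^0-9a-z\u4e00-\u9fff]", "", text)  # drop brackets, spaces, special characters
    tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]  # sentence delimiters
    return tokenizer.convert_tokens_to_ids(tokens)      # indices in the model vocabulary
```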
  • Embedding layer: this layer converts characters (i.e., text) into vectors for subsequent mathematical computation. Since the initial intent recognition model has been pre-trained, the character vectors already carry some semantic information in their initial state.
  • Given the input $X=\{x_1,x_2,\dots,x_n\}$, where $x_i$ is the index of a single character,
  • after the embedding layer the input becomes a two-dimensional array $X_e$ of dimension $L \times H$, where $L$ is the sentence length and $H$ is the vector length, 768.
  • Encoder layer: the pre-trained Chinese BERT-WWM model can be used as the encoder to provide features for the subsequent intent recognition task.
  • The attention layer computes the degree of attention each character pays to the other characters and reconstructs the vector representation of the entire sentence to highlight the parts that deserve focus. For example, in the sentence "I want to find the audit assistant", "audit assistant" is the basis for judging the intent type.
  • The attention computation is as follows: $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{D}}\right)V$.
  • Q, K, and V are all generated from $X_e$ through linear layers, with dimensions unchanged.
  • The softmax yields the attention weight of each character with respect to the other characters, which is multiplied by V to obtain the reconstructed $X_e$.
  • D is the variance of $QK^{\top}$.
  • Output layer: take the vector corresponding to [cls] and pass it through a linear layer.
  • The output dimension is n, where n equals the number of intent types.
  • The softmax formula is: $P(y=i \mid d_k)=\dfrac{\exp(\theta_i^{\top} d_k)}{\sum_{j=1}^{K}\exp(\theta_j^{\top} d_k)}$,
  • where $\theta$ is the model parameter,
  • $K$ is the number of intent types,
  • and $d_k$ is the [cls] vector.
  • The target intent recognition model is trained with reference to the model constructed in 1) and the model training procedures of the related embodiments described above and below.
  • The input text to be recognized is first preprocessed and then fed into the model to obtain the softmax-normalized intent type vector.
  • The intent type with the largest corresponding probability value is taken as the predicted intent type of the text to be recognized.
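  • A minimal sketch of this prediction step, assuming `model` returns logits over the n intent types, `preprocess` is the routine sketched earlier, and `intent_types` is an illustrative label list:

```python
import torch

def predict_intent(model, text: str, intent_types: list[str]) -> str:
    input_ids = torch.tensor([preprocess(text)])         # preprocess first
    with torch.no_grad():
        probs = torch.softmax(model(input_ids), dim=-1)  # softmax-normalized intent vector
    return intent_types[int(probs.argmax())]             # highest-probability intent type
```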
  • An intent recognition apparatus is provided, including:
  • a determination unit 302, configured to determine a target adaptive parameter corresponding to the target user and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model to obtain a target intent recognition model corresponding to the target user, where the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
  • an input unit 303, configured to input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type;
  • the determination unit 302 being further configured to determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
  • In a specific implementation, the apparatus further includes a calculation unit and a training unit;
  • the acquisition unit 301 is further configured to acquire the historical recognition texts sent by the target user and the intent type label corresponding to each historical recognition text;
  • the determination unit 302 is further configured to determine each historical recognition text in turn as a target historical text;
  • the input unit 303 is further configured to input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
  • a calculation unit, configured to calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text;
  • a training unit, configured to adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameters.
  • In a specific implementation, the apparatus further includes a calculation unit and a training unit;
  • the acquisition unit 301 is further configured to acquire at least one preset simulated text and the intent type label corresponding to each simulated text;
  • the determination unit 302 is further configured to determine each simulated text in turn as a target simulated text;
  • the input unit 303 is further configured to input the target simulated text into the initial text model to obtain the confidence of the target simulated text under each intent type;
  • a calculation unit, configured to calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text;
  • a training unit, configured to adjust the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain an initial intent recognition model.
  • The apparatus further includes a construction unit;
  • a construction unit, configured to construct an initial text model on the basis of a pre-trained text model, where the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
  • the construction unit is alternatively configured to construct an initial text model on the basis of a pre-trained text model.
  • The pre-trained text model serves as the encoder layer of the initial text model.
  • The initial text model further includes an attention layer and an adaptive layer.
  • the input unit 303 is specifically configured to perform text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized;
  • a deployment unit, configured to deploy the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
  • the determination unit is specifically configured to determine, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
  • FIG. 4 is a schematic structural diagram of an intent recognition device provided in an embodiment of the present application.
  • The intent recognition device 400 may include one or more central processing units (CPUs) 401 and a memory 405, and the memory 405 stores one or more applications or data.
  • The memory 405 may be volatile storage or persistent storage.
  • The program stored in the memory 405 may include one or more modules, each of which may include a series of instruction operations on the intent recognition device.
  • Further, the central processing unit 401 may be configured to communicate with the memory 405 and execute, on the intent recognition device 400, the series of instruction operations in the memory 405.
  • The intent recognition device 400 may also include one or more power supplies 402, one or more wired or wireless network interfaces 403, one or more input/output interfaces 404, and/or one or more operating systems such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • The central processing unit 401 can perform the operations performed by the intent recognition apparatus in the embodiments shown in FIG. 1 to FIG. 3 above, and details are not repeated here.
  • The disclosed systems, apparatuses, and methods can be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative.
  • The division of the units is only a division by logical function; there may be other divisions in actual implementation.
  • Multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • The mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the above-mentioned intent recognition method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

An intent recognition method and related device, relating to artificial intelligence technology, for balancing model training cost and model performance in multi-user training scenarios. The method in the embodiments of the present application includes: acquiring a text to be recognized sent by a target user; determining a target adaptive parameter corresponding to the target user, and using the target adaptive parameter to update the adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user; inputting the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type; and determining the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.

Description

Intent recognition method and related device
This application claims priority to Chinese patent application No. 202211520363.9, entitled "Intent recognition method and related device" and filed with the Chinese Patent Office on November 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of machine learning, and in particular to an intent recognition method and related device.
Background
To improve the efficiency of Internet services, conversational robots that can recognize user text and respond quickly have emerged. Text is an important medium through which users interact with Internet services. On the basis of understanding the user's text, a conversational robot replies to it by retrieval or generation. The most important part of this process is understanding the user's text, i.e., intent recognition. Intent recognition is the primary task of a conversational robot: only by correctly recognizing the intent of the user's language can it subsequently reply to the user's text reasonably.
In existing technical solutions, intent recognition can be implemented based on a traditional neural network model. For example, multiple sentences of user text that users with different intent types might send are labeled as training data. A neural network model is then trained on the training data labeled with the different intent types. Finally, the neural network model can be used to recognize and classify the intent of user text.
In multi-user scenarios, different users need to recognize different intent types (such as corporate finance and banking). As intent types increase, the task becomes more complex, and a single neural network model can hardly maintain good performance in intent recognition across multiple intent types, while training a dedicated neural network model for each user is too costly to meet practical application requirements.
Summary
The embodiments of the present application provide an intent recognition method and related device, involving artificial intelligence technology, for balancing model training cost and model performance in multi-user training scenarios.
A first aspect of the embodiments of the present application provides an intent recognition method, including:
acquiring a text to be recognized sent by a target user;
determining a target adaptive parameter corresponding to the target user, and using the target adaptive parameter to update an adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
inputting the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type; and
determining the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
In a specific implementation, the method further includes:
acquiring the historical recognition texts sent by the target user and an intent type label corresponding to each of the historical recognition texts;
determining each of the historical recognition texts in turn as a target historical text;
inputting the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
calculating a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text; and
adjusting the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameter.
In a specific implementation, before the text to be recognized is input into the target intent recognition model, the method further includes:
acquiring at least one preset simulated text and an intent type label corresponding to each of the simulated texts;
determining each of the simulated texts in turn as a target simulated text;
inputting the target simulated text into an initial text model to obtain the confidence of the target simulated text under each intent type;
calculating a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text; and
adjusting the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain the initial intent recognition model.
In a specific implementation, the method further includes:
constructing the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
or,
constructing the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an attention layer and an adaptive layer.
In a specific implementation, inputting the text to be recognized into the target intent recognition model includes:
performing text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized; and
inputting the preprocessed text to be recognized into the target intent recognition model.
In a specific implementation, before the target adaptive parameter corresponding to the target user is determined, the method further includes:
deploying the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
wherein determining the target adaptive parameter corresponding to the target user and using the target adaptive parameter to update the adaptive layer of the initial intent recognition model includes:
determining, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and using the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
A second aspect of the embodiments of the present application provides an intent recognition apparatus, including:
an acquisition unit, configured to acquire a text to be recognized sent by a target user;
a determination unit, configured to determine a target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update an adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
an input unit, configured to input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type;
the determination unit being further configured to determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
In a specific implementation, the apparatus further includes a calculation unit and a training unit;
the acquisition unit is further configured to acquire the historical recognition texts sent by the target user and the intent type label corresponding to each of the historical recognition texts;
the determination unit is further configured to determine each of the historical recognition texts in turn as a target historical text;
the input unit is further configured to input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
the calculation unit is configured to calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text;
the training unit is configured to adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameter.
In a specific implementation, the apparatus further includes a calculation unit and a training unit;
the acquisition unit is further configured to acquire at least one preset simulated text and the intent type label corresponding to each of the simulated texts;
the determination unit is further configured to determine each of the simulated texts in turn as a target simulated text;
the input unit is further configured to input the target simulated text into an initial text model to obtain the confidence of the target simulated text under each intent type;
the calculation unit is configured to calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text;
the training unit is configured to adjust the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain the initial intent recognition model.
In a specific implementation, the apparatus further includes a construction unit;
the construction unit is configured to construct the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
or,
the construction unit is further configured to construct the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an attention layer and an adaptive layer.
In a specific implementation, the input unit is specifically configured to perform text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized,
and input the preprocessed text to be recognized into the target intent recognition model.
In a specific implementation, before the target adaptive parameter corresponding to the target user is determined, the apparatus further includes a deployment unit;
the deployment unit is configured to deploy the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
the determination unit is specifically configured to determine, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
A third aspect of the embodiments of the present application provides an intent recognition device, including:
a central processing unit, a memory, and an input/output interface, wherein
the memory is a volatile memory or a persistent memory; and
the central processing unit is configured to communicate with the memory and execute the instructions in the memory to perform the method described in the first aspect.
A fourth aspect of the embodiments of the present application provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the method described in the first aspect.
A fifth aspect of the embodiments of the present application provides a computer storage medium storing instructions which, when executed on a computer, cause the computer to perform the method described in the first aspect.
As can be seen from the above technical solutions, the embodiments of the present application have the following advantages: providing a corresponding intent recognition model for each user ensures that the model maintains good performance in recognizing the at least one intent type corresponding to each user. Meanwhile, when training the target intent recognition model corresponding to the target user, only the adaptive parameters of the adaptive layer of the initial intent recognition model are adjusted, so the training cost is far lower than performing global training for each user to obtain a corresponding neural network model. It can thus be seen that the intent recognition model of the embodiments of the present application balances model training cost and model performance.
Brief Description of the Drawings
FIG. 1 is a system architecture diagram of an intent recognition system disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of an intent recognition method disclosed in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an intent recognition apparatus disclosed in an embodiment of the present application;
FIG. 4 is another schematic structural diagram of the intent recognition apparatus disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an adaptive layer disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Intent recognition is the primary task of a conversational robot: only by correctly recognizing the intent of the user's language can it subsequently reply to the user's text reasonably. The current mainstream intent recognition methods are rule-based methods and methods based on traditional neural networks.
A rule-based method formulates keywords or specific sentence structures corresponding to the various intent types based on a person's understanding of each intent type. When the text to be recognized input by the user contains a corresponding keyword or a specific sentence structure, the text to be recognized is considered to be of the corresponding intent type. However, this intent recognition method depends heavily on the accuracy of the person's understanding of the intent types, and keywords and specific sentence structures cannot adapt to highly variable user texts.
To overcome the above defects, this technical solution proposes an intent recognition method based on a large-scale pre-trained language model under limited computing resources. The method first uses a large-scale pre-trained language model (including but not limited to BERT, RoBERTa, ERNIE, etc.) as the encoder layer to encode the input text and obtain the semantic features of the text to be recognized; it then uses an attention mechanism (i.e., the attention layer) to make the model focus on the parts of the text to be recognized that better highlight its intent; and it finally uses a softmax layer (i.e., the output layer) to recognize the intent type of the text to be recognized. In addition, to reduce the model's size and improve its concurrency performance in multi-tenant scenarios, an adapter tuning method is used to freeze most of the parameters of the encoder layer and fine-tune only the top-layer parameters of the encoder, thereby reducing the model's consumption of computing resources such as memory, CPU, and GPU.
The embodiments of the present application provide an intent recognition method and related device, involving artificial intelligence technology, for obtaining, at a low training cost, an intent recognition model that maintains good performance across different intent types.
To better implement the intent recognition method of the embodiments of the present application, an embodiment of the present application provides an intent recognition system for executing the intent recognition method provided by the present application. For ease of deployment, the intent recognition system can be packaged into a Docker image so that it can be deployed quickly and conveniently on any machine with a Docker environment; that is, a terminal or server on which the intent recognition system is deployed can perform intent recognition on a text to be recognized.
Referring to FIG. 1, in practical applications, a user can input a text to be recognized through a terminal 101, and the terminal 101 can send the text to be recognized to any server 102 on which the intent recognition system is deployed; the server 102 can then input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type, and determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized. After the predicted intent type of the text to be recognized sent by the target user is determined, a reply can be made according to the predicted intent type, or related services can be pushed to the target user.
On the basis of the aforementioned intent recognition system, and referring to FIG. 2, an embodiment of the present application further provides an intent recognition method, which can be executed by the aforementioned server 102 and includes the following steps:
201. Acquire a text to be recognized sent by a target user.
To recognize the intent with which the target user sent the text to be recognized, the text sent by the user must be analyzed; that is to say, the first step of intent recognition is to acquire the text to be recognized sent by the target user.
202. Determine a target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model to obtain a target intent recognition model corresponding to the target user. The target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user.
Considering that in multi-tenant scenarios (or scenarios where multiple users share one cloud service) different tenants (or users) perform intent recognition for different business scenarios, the intent types that different users need to recognize also differ accordingly. To provide a well-performing intent recognition model for every tenant, before the text to be recognized is analyzed, a target intent recognition model corresponding to the target user needs to be constructed, i.e., an intent recognition model trained on the basis of historical recognition texts sent by the target user.
In practical applications, to balance training efficiency, storage resources, and model performance, when training the target intent recognition model corresponding to the target user, only the adaptive parameters of the adaptive layer of the initial intent recognition model may be adjusted, and the trained adaptive parameters are saved as the adaptive parameters corresponding to the target user. The parameters of the initial intent recognition model other than the adaptive parameters can be shared among all tenants; therefore, only the shared model parameters and the target adaptive parameters corresponding to the target user need to be stored in order to rapidly construct the target intent recognition model corresponding to the target user. Training only the adaptive parameters of the adaptive layer in this way can effectively alleviate the problem of insufficient hardware resources in multi-tenant scenarios.
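A minimal sketch of this per-tenant parameter scheme, in which the shared backbone weights are stored once and only each tenant's small adapter state dict is stored and loaded separately; the file names and tenant ids are illustrative assumptions:

```python
import torch

shared_state = torch.load("initial_intent_model.pt")      # parameters shared by all tenants
adapter_store = {"tenant_a": "adapters/tenant_a.pt"}       # per-tenant adaptive parameters

def build_target_model(model, tenant_id: str):
    model.load_state_dict(shared_state)                    # restore the shared weights
    adapter_state = torch.load(adapter_store[tenant_id])   # the target adaptive parameters
    model.load_state_dict(adapter_state, strict=False)     # overwrite only the adaptive layer
    return model
```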
It can be understood that the adaptive parameters of the adaptive layer may be some or all of the model parameters corresponding to the adaptive layer, which is not specifically limited here.
203. Input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type.
After the target intent recognition model corresponding to the target user has been constructed in the aforementioned step 202, the text to be recognized acquired in step 201 can be input into this target intent recognition model, and the confidence output by the target intent recognition model for the text to be recognized under each intent type can be obtained. On the basis of the confidence of the text to be recognized under each intent type, the intent type of the text to be recognized can be judged.
204. Determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
It can be understood that the confidence of the text to be recognized under different intent types reflects the likelihood that the text to be recognized belongs to the corresponding intent type. The higher the confidence, the more likely the text to be recognized belongs to that intent type; therefore, the target intent type with the highest corresponding confidence can be directly determined as the predicted intent type of the text to be recognized.
In the embodiments of the present application, providing a corresponding intent recognition model for each user ensures that the model maintains good performance in recognizing the at least one intent type corresponding to each user. Meanwhile, when training the target intent recognition model corresponding to the target user, only the adaptive parameters of the adaptive layer of the initial intent recognition model are adjusted, so the training cost is far lower than performing global training for each user to obtain a corresponding neural network model. It can thus be seen that the intent recognition model of the embodiments of the present application balances model training cost and model performance.
In some specific implementations, the target adaptive parameters needed to construct the target intent recognition model in the aforementioned step 202 can be obtained by training the adaptive layer of the initial intent recognition model, specifically as follows: acquire the historical recognition texts sent by the target user and the intent type label corresponding to each historical recognition text; determine each historical recognition text in turn as the target historical text; input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type; calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text; and adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, thereby obtaining the target adaptive parameters.
Specifically, at least one historical recognition text previously sent by the target user can be acquired (it should be noted that, to ensure good model performance, multiple historical recognition texts are usually used), where each historical recognition text has a corresponding intent type label identifying the intent type of that historical recognition text. The initial intent recognition model is then trained with each historical recognition text in turn until it converges, yielding the target intent recognition model. In each training round, the first loss value can be computed from the confidence of the target historical text under each intent type, the intent type label corresponding to the target historical text, and the loss function; then, if the first loss value does not satisfy the preset convergence condition, the adaptive parameters of the initial intent recognition model are adjusted on the basis of the first loss value; if the first loss value satisfies the preset convergence condition, the adaptive parameters of the initial intent recognition model at that point are determined to be the target adaptive parameters of the target user.
It should be noted that when the object is a user, the historical recognition texts and other related data involved in the embodiments of this application are all obtained with the user's authorization. Moreover, when the embodiments of this application are applied to specific products or technologies, the data involved requires the user's permission or consent, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
To further reduce the cost of training a corresponding intent recognition model for each user, the initial text model can first be preliminarily fine-tuned in the classification task (i.e., intent recognition) scenario; then, on the basis of the preliminarily fine-tuned initial intent recognition model that can perform the classification task, the intent recognition model corresponding to each user is trained. The specific implementation is as follows: acquire at least one preset simulated text and the intent type label corresponding to each simulated text; determine each simulated text in turn as the target simulated text; input the target simulated text into the initial text model to obtain the confidence of the target simulated text under each intent type; calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text; and adjust the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, thereby obtaining the initial intent recognition model.
Specifically, a dataset needs to be constructed before training, namely the simulated texts and the intent type label corresponding to each simulated text. The initial text model is then trained on each simulated text to obtain an initial intent recognition model that can be used for the intent classification task. The training procedure for the initial text model is similar to the aforementioned procedure for training the target intent recognition model from the initial intent recognition model and is not repeated here. It should be noted that the simulated texts here may be data from any existing open-source text classification dataset, or simulated texts written by business and/or technical personnel based on their understanding of the various intent types; this is not specifically limited here.
In some specific implementations, the aforementioned manually written simulated texts can be produced as follows. Suppose the simulated samples need to cover intent types such as business trip application (bus trip), financial indicator inquiry (enquire financial indicators), data service (data request), making a phone call (phonecall dzh), sending a message (sendMsg dzh), what can you do (sys what can you do), and/or finding an application (appInvoke dzh). Under the guidance of business personnel, 20 text-intent sample pairs are provided for each intent type, each pair containing a simulated text and its corresponding intent type. The simulated text is text that a user might plausibly send, and the intent type is the real user need reflected in that text. The writing of text-intent sample pairs can refer to the following table (not reproduced in this extract):
It should be noted that the above table is only an example of how text-intent sample pairs may be written; sample pairs can also be written without using a table; the number of text-intent sample pairs required for each intent type is not necessarily 20; and the possible intent types are not limited to the above seven intent types. This embodiment places no specific limitation on the above.
Further, before the initial text model is trained, the initial text model needs to be constructed on the basis of a pre-trained text model, to further save training cost. This can specifically be achieved as follows: construct the initial text model on the basis of a pre-trained text model, where the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer; or construct the initial text model on the basis of a pre-trained text model, where the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an adaptive layer.
In practical applications, the constructed initial text model may include an embedding layer, an encoder layer, an attention layer, and an output layer, where the encoder layer may serve as the adaptive layer and some of the parameters in the encoder layer may serve as the adaptive parameters. Alternatively, the constructed initial text model may include an embedding layer, an encoder layer, an attention layer, an output layer, and an adaptive layer, where the adaptive layer is located after the attention layer; the adaptive layer first maps the input of dimension H (i.e., the vector length of the embedding layer) to a lower dimension K, then applies the nonlinear function relu, and then restores the dimension to H through a linear layer. The initial text model of the embodiments of the present application may be any structure that includes an adaptive layer, which is not limited here. Both the encoder layer and the attention layer extract semantic features from the vectors; the difference is that the encoder layer is a fully connected network, a different network form from that used by the attention layer.
It should be noted that regardless of which of the two aforementioned network structures is used, the encoder layer of the initial text model in the embodiments of the present application can be any pre-trained text model, such as GPT (Generative Pre-Training) or BERT (Bidirectional Encoder Representations from Transformers); this is not limited here.
It is known that a text corpus generally requires a series of preprocessing steps before being fed to the model so as to meet the model's input requirements, such as converting the text into the tensors the model requires and standardizing tensor sizes; scientific text preprocessing can also effectively guide the selection of model hyperparameters and improve the model's evaluation metrics. On the basis of the foregoing embodiments, in some specific implementations, before the text to be recognized is input into the target intent recognition model, text preprocessing should be performed on it to obtain the preprocessed text to be recognized, and only then is the preprocessed text to be recognized input into the target intent recognition model. Likewise, before the target historical text is input into the initial intent recognition model and before the target simulated text is input into the initial text model, corresponding text preprocessing should be performed, and the preprocessed texts (such as the preprocessed target historical text and the preprocessed target simulated text) are used as model inputs.
In practical applications, text preprocessing can be performed as follows. For a piece of text (such as a text to be recognized, a historical recognition text, or a simulated text), first convert all letters in the text to lowercase, then use regular expressions to remove invalid characters such as brackets, spaces, and special characters, to eliminate their negative impact on the model. Next, add the [cls] and [seq] characters at the beginning and end of the text, respectively; [cls] not only marks the beginning of the sentence but also represents the meaning of the entire sentence. Finally, convert the processed text into the form of indices in the vocabulary of the large-scale pre-trained language model to obtain the corresponding preprocessed text.
On the basis of the foregoing embodiments, the following describes, in a specific scenario, the model construction, model training, model prediction, and service deployment workflows of the embodiments of the present application.
1) Model construction
a) Embedding layer. This layer converts characters (i.e., text) into vectors for subsequent mathematical computation. Since the initial intent recognition model has been pre-trained, the character vectors already carry some semantic information in their initial state. Given the input $X=\{x_1,x_2,\dots,x_n\}$, where $x_i$ is the index of a single character, after the embedding layer the input becomes a two-dimensional array $X_e$ of dimension $L \times H$, where $L$ is the sentence length and $H$ is the vector length, 768.
b) Encoder layer. The pre-trained Chinese BERT-WWM model can be used as the encoder to provide features for the subsequent intent recognition task.
c) Attention layer. The attention layer computes the degree of attention each character pays to every other character and reconstructs the vector representation of the entire sentence to highlight the parts of the sentence that deserve focus. For example, in the sentence "我想找审核助手" ("I want to find the audit assistant"), "审核助手" ("audit assistant") is the basis for judging the intent type. The attention computation is as follows:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{D}}\right)V$$
Q, K, and V are all generated from $X_e$ through linear layers, with dimensions unchanged; the softmax yields the attention weight of each character with respect to the other characters, which is multiplied by V to obtain the reconstructed $X_e$; D is the variance of $QK^{\top}$.
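A minimal NumPy sketch of this attention computation; taking the scaling denominator $\sqrt{D}$ to be the feature dimension of Q and K is an assumption consistent with common practice:

```python
import numpy as np

def attention(Xe, Wq, Wk, Wv):
    Q, K, V = Xe @ Wq, Xe @ Wk, Xe @ Wv             # linear layers, dimensions unchanged (L x H)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each character's attention to the others
    return weights @ V                              # the reconstructed Xe
```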
d) Output layer. Take the vector corresponding to [cls]; after a linear layer, the output dimension is n, where n equals the number of intent types; softmax normalization then gives the probability that the text belongs to each class. The softmax formula is:
$$P(y=i \mid d_k)=\frac{\exp\left(\theta_i^{\top} d_k\right)}{\sum_{j=1}^{K}\exp\left(\theta_j^{\top} d_k\right)}$$
where $\theta$ is the model parameter, $K$ is the number of intent types, and $d_k$ is the [cls] vector.
e) Adapter fine-tuning layer. In a multi-tenant scenario, having each user independently use a full pre-trained model would lead to insufficient memory. To solve this problem, this technical solution uses an adapter fine-tuning method to freeze most of the parameters of the large-scale pre-trained language model, fine-tune only specific parameters in the encoder, and save that part as a very small adapter module. The adapter module is placed after the attention layer in the model; its specific structure is shown in FIG. 5.
The adapter module first maps the input of dimension H to a lower dimension K, then applies the nonlinear function relu (defined as $f(x)=\max(0,x)$), and finally restores the dimension to H through a linear layer.
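A minimal PyTorch sketch of this adapter module (H to K, relu, back to H); the class name and the bottleneck size K = 64 are illustrative assumptions:

```python
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, H: int = 768, K: int = 64):
        super().__init__()
        self.down = nn.Linear(H, K)  # map dimension H down to the lower dimension K
        self.act = nn.ReLU()         # nonlinear function relu: f(x) = max(0, x)
        self.up = nn.Linear(K, H)    # linear layer restoring the dimension to H

    def forward(self, x):
        return self.up(self.act(self.down(x)))
```

Only these adapter weights need to be stored and trained per tenant, which is what keeps the per-user footprint small.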
2) Model training
The target intent recognition model is trained with reference to the model constructed in 1) and the model training procedures of the related embodiments described above and below.
3) Model prediction
The input text to be recognized is first preprocessed and then fed into the model to obtain the softmax-normalized intent type vector; the intent type with the largest corresponding probability value is taken as the predicted intent type of the text to be recognized.
4) Service deployment
After model training and code development are completed, the entire service can be packaged into a Docker image for ease of deployment, so that the intent recognition service can be deployed quickly and conveniently on any machine with a Docker environment.
Referring to FIG. 3, an embodiment of the present application provides an intent recognition apparatus, including:
an acquisition unit 301, configured to acquire a text to be recognized sent by a target user;
a determination unit 302, configured to determine a target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
an input unit 303, configured to input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type;
the determination unit 302 being further configured to determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
In a specific implementation, the apparatus further includes a calculation unit and a training unit;
the acquisition unit 301 is further configured to acquire the historical recognition texts sent by the target user and the intent type label corresponding to each historical recognition text;
the determination unit 302 is further configured to determine each historical recognition text in turn as a target historical text;
the input unit 303 is further configured to input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
the calculation unit is configured to calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text;
the training unit is configured to adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameters.
In a specific implementation, the apparatus further includes a calculation unit and a training unit;
the acquisition unit 301 is further configured to acquire at least one preset simulated text and the intent type label corresponding to each simulated text;
the determination unit 302 is further configured to determine each simulated text in turn as a target simulated text;
the input unit 303 is further configured to input the target simulated text into the initial text model to obtain the confidence of the target simulated text under each intent type;
the calculation unit is configured to calculate a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text;
the training unit is configured to adjust the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain the initial intent recognition model.
In a specific implementation, the apparatus further includes a construction unit;
the construction unit is configured to construct the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further includes an attention layer, and the attention layer is the adaptive layer;
or,
the construction unit is further configured to construct the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further includes an attention layer and an adaptive layer.
In a specific implementation, the input unit 303 is specifically configured to perform text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized,
and input the preprocessed text to be recognized into the target intent recognition model.
In a specific implementation, before the target adaptive parameter corresponding to the target user is determined, the apparatus further includes a deployment unit;
the deployment unit is configured to deploy the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
the determination unit is specifically configured to determine, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
FIG. 4 is a schematic structural diagram of an intent recognition device provided in an embodiment of the present application. The intent recognition device 400 may include one or more central processing units (CPUs) 401 and a memory 405, where one or more applications or data are stored in the memory 405.
The memory 405 may be volatile storage or persistent storage. The program stored in the memory 405 may include one or more modules, each of which may include a series of instruction operations on the intent recognition device. Further, the central processing unit 401 may be configured to communicate with the memory 405 and execute, on the intent recognition device 400, the series of instruction operations in the memory 405.
The intent recognition device 400 may also include one or more power supplies 402, one or more wired or wireless network interfaces 403, one or more input/output interfaces 404, and/or one or more operating systems such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
The central processing unit 401 can perform the operations performed by the intent recognition apparatus in the embodiments shown in FIG. 1 to FIG. 3 above, and details are not repeated here.
Persons skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On the basis of such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the intent recognition method described above.

Claims (10)

1. An intent recognition method, comprising:
    acquiring a text to be recognized sent by a target user;
    determining a target adaptive parameter corresponding to the target user, and using the target adaptive parameter to update an adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
    inputting the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type; and
    determining the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
2. The method according to claim 1, wherein the method further comprises:
    acquiring the historical recognition texts sent by the target user and an intent type label corresponding to each of the historical recognition texts;
    determining each of the historical recognition texts in turn as a target historical text;
    inputting the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
    calculating a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text; and
    adjusting the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameter.
3. The method according to claim 1, wherein before the inputting the text to be recognized into the target intent recognition model, the method further comprises:
    acquiring at least one preset simulated text and an intent type label corresponding to each of the simulated texts;
    determining each of the simulated texts in turn as a target simulated text;
    inputting the target simulated text into an initial text model to obtain the confidence of the target simulated text under each intent type;
    calculating a second loss value on the basis of the confidence of the target simulated text under each intent type and the intent type label corresponding to the target simulated text; and
    adjusting the model parameters of the initial text model according to the second loss value until the second loss value satisfies a second preset convergence condition, so as to obtain the initial intent recognition model.
4. The method according to claim 3, wherein the method further comprises:
    constructing the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, the initial text model further comprises an attention layer, and the attention layer is the adaptive layer;
    or,
    constructing the initial text model on the basis of a pre-trained text model, wherein the pre-trained text model serves as the encoder layer of the initial text model, and the initial text model further comprises an attention layer and an adaptive layer.
5. The method according to claim 1, wherein the inputting the text to be recognized into the target intent recognition model comprises:
    performing text preprocessing on the text to be recognized to obtain a preprocessed text to be recognized; and
    inputting the preprocessed text to be recognized into the target intent recognition model.
6. The method according to claim 1, wherein before the determining the target adaptive parameter corresponding to the target user, the method further comprises:
    deploying the initial intent recognition model and the adaptive parameters respectively corresponding to multiple users;
    wherein the determining the target adaptive parameter corresponding to the target user and using the target adaptive parameter to update the adaptive layer of the initial intent recognition model comprises:
    determining, from the adaptive parameters respectively corresponding to the multiple users, the target adaptive parameter corresponding to the target user, and using the target adaptive parameter to update the adaptive layer of the initial intent recognition model.
7. An intent recognition apparatus, comprising:
    an acquisition unit, configured to acquire a text to be recognized sent by a target user;
    a determination unit, configured to determine a target adaptive parameter corresponding to the target user, and use the target adaptive parameter to update an adaptive layer of an initial intent recognition model to obtain a target intent recognition model corresponding to the target user, wherein the target adaptive parameter is obtained by training the adaptive layer of the initial intent recognition model on the basis of historical recognition texts sent by the target user;
    an input unit, configured to input the text to be recognized into the target intent recognition model to obtain the confidence of the text to be recognized under each intent type;
    the determination unit being further configured to determine the target intent type with the highest corresponding confidence as the predicted intent type of the text to be recognized.
8. The apparatus according to claim 7, wherein the apparatus further comprises a calculation unit and a training unit;
    the acquisition unit is further configured to acquire the historical recognition texts sent by the target user and the intent type label corresponding to each of the historical recognition texts;
    the determination unit is further configured to determine each of the historical recognition texts in turn as a target historical text;
    the input unit is further configured to input the target historical text into the initial intent recognition model to obtain the confidence of the target historical text under each intent type;
    the calculation unit is configured to calculate a first loss value on the basis of the confidence of the target historical text under each intent type and the intent type label corresponding to the target historical text; and
    the training unit is configured to adjust the adaptive parameters of the initial intent recognition model according to the first loss value until the first loss value satisfies a first preset convergence condition, so as to obtain the target adaptive parameter.
9. An intent recognition device, comprising:
    a central processing unit, a memory, and an input/output interface, wherein
    the memory is a volatile memory or a persistent memory; and
    the central processing unit is configured to communicate with the memory and execute the instructions in the memory to perform the method according to any one of claims 1 to 6.
10. A computer storage medium, wherein the computer storage medium stores instructions which, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
PCT/CN2023/126388 2022-11-30 2023-10-25 意图识别方法以及相关设备 WO2024114186A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211520363.9A CN115730590A (zh) 2022-11-30 2022-11-30 意图识别方法以及相关设备
CN202211520363.9 2022-11-30

Publications (1)

Publication Number Publication Date
WO2024114186A1 true WO2024114186A1 (zh) 2024-06-06

Family

ID=85299519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/126388 WO2024114186A1 (zh) 2022-11-30 2023-10-25 意图识别方法以及相关设备

Country Status (2)

Country Link
CN (1) CN115730590A (zh)
WO (1) WO2024114186A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730590A (zh) * 2022-11-30 2023-03-03 金蝶软件(中国)有限公司 意图识别方法以及相关设备
CN117235241A (zh) * 2023-11-15 2023-12-15 安徽省立医院(中国科学技术大学附属第一医院) 一种面向高血压问诊随访场景人机交互方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104064179A (zh) * 2014-06-20 2014-09-24 哈尔滨工业大学深圳研究生院 一种基于动态hmm事件数的提高语音识别准确率的方法
CN110414005A (zh) * 2019-07-31 2019-11-05 深圳前海达闼云端智能科技有限公司 意图识别方法、电子设备及存储介质
CN112907309A (zh) * 2019-11-19 2021-06-04 阿里巴巴集团控股有限公司 模型更新方法、资源推荐方法、装置、设备及系统
CN113112312A (zh) * 2021-05-13 2021-07-13 支付宝(杭州)信息技术有限公司 针对用户生成模型的方法、装置和计算机可读存储介质
CN113901880A (zh) * 2021-09-13 2022-01-07 中国科学院自动化研究所 一种实时事件流识别方法及系统
CN114139551A (zh) * 2021-10-29 2022-03-04 苏宁易购集团股份有限公司 意图识别模型的训练方法及装置、意图识别的方法及装置
KR20220131807A (ko) * 2021-03-22 2022-09-29 재단법인대구경북과학기술원 메타 학습 기반 이미지 정합 모델 생성 방법 및 장치
CN115730590A (zh) * 2022-11-30 2023-03-03 金蝶软件(中国)有限公司 意图识别方法以及相关设备


Also Published As

Publication number Publication date
CN115730590A (zh) 2023-03-03

Similar Documents

Publication Publication Date Title
CN113657399B (zh) 文字识别模型的训练方法、文字识别方法及装置
WO2024114186A1 (zh) 意图识别方法以及相关设备
US20210312139A1 (en) Method and apparatus of generating semantic feature, method and apparatus of training model, electronic device, and storage medium
KR20220005416A (ko) 다항 관계 생성 모델의 트레이닝 방법, 장치, 전자 기기 및 매체
WO2020155619A1 (zh) 带情感的机器聊天方法、装置、计算机设备及存储介质
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
WO2021072863A1 (zh) 文本相似度计算方法、装置、电子设备及计算机可读存储介质
JP2023541742A (ja) ソートモデルのトレーニング方法及び装置、電子機器、コンピュータ可読記憶媒体、コンピュータプログラム
CN117114063A (zh) 用于训练生成式大语言模型和用于处理图像任务的方法
US20240177506A1 (en) Method and Apparatus for Generating Captioning Device, and Method and Apparatus for Outputting Caption
KR102608867B1 (ko) 업계 텍스트를 증분하는 방법, 관련 장치 및 매체에 저장된 컴퓨터 프로그램
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
Lhasiw et al. A bidirectional LSTM model for classifying Chatbot messages
CN116701734B (zh) 地址文本的处理方法、设备及计算机可读存储介质
CN117370524A (zh) 答复生成模型的训练方法、答复语句生成方法和装置
CN116089586B (zh) 基于文本的问题生成方法及问题生成模型的训练方法
WO2021072864A1 (zh) 文本相似度获取方法、装置、电子设备及计算机可读存储介质
US20230070966A1 (en) Method for processing question, electronic device and storage medium
CN114925681B (zh) 知识图谱问答问句实体链接方法、装置、设备及介质
CN114118049B (zh) 信息获取方法、装置、电子设备及存储介质
CN115630652A (zh) 客服会话情感分析系统、方法及计算机系统
CN115510193A (zh) 查询结果向量化方法、查询结果确定方法及相关装置
CN114416941A (zh) 融合知识图谱的对话知识点确定模型的生成方法及装置
CN113011177B (zh) 模型训练和词向量确定方法、装置、设备、介质和产品
CN114330345B (zh) 命名实体识别方法、训练方法、装置、电子设备及介质

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 23896341
Country of ref document: EP
Kind code of ref document: A1