CN114282513A - Matching method, system, intelligent terminal and storage medium for text semantic similarity - Google Patents
- Publication number: CN114282513A (application CN202111620100.0A)
- Authority: CN (China)
- Prior art keywords: sample, similarity, semantic similarity, true, text semantic
- Prior art date: 2021-12-27
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classification
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application relates to natural language processing technology in the field of artificial intelligence, and in particular to a matching method, system, intelligent terminal and storage medium for text semantic similarity. The method includes: acquiring historical data as a training sample set, where the training sample set includes true samples, positive samples and negative samples; calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation results; deploying the text semantic similarity matching model to an online platform; and matching standard questions based on the text semantic similarity matching model and feeding them back to the online platform. The present application can alleviate the problem of low accuracy in customer-service question matching.
Description
Technical Field
The present application relates to natural language processing technology in the field of artificial intelligence, and in particular to a matching method, system, intelligent terminal and storage medium for text semantic similarity.
Background Art
With the rapid development of the computer Internet, text similarity calculation has been widely applied in many fields, especially in the current customer-service question matching scenario. The process in this scenario is as follows: for a question raised by a user, customer service judges text similarity, retrieves similar questions from a database, and feeds the retrieved questions back to the user. At present, the main method for judging text similarity in customer-service question matching is to evaluate similarity based on word frequency: count the number of occurrences of each word in the two texts, construct a text vector from these counts, and then compute the cosine similarity between the two text vectors to reflect the similarity between the texts.
In the process of realizing the present application, the inventors found that the above technology has at least the following problem: in the current customer-service question matching scenario, evaluating text similarity based on word frequency ignores the semantic changes brought by the language environment and the user's language habits, which easily affects the judgment of text similarity and results in a low accuracy rate of customer-service question matching.
Summary of the Invention
In order to improve the low accuracy of customer-service question matching, the present application provides a matching method, system, intelligent terminal and storage medium for text semantic similarity.
In a first aspect, the present application provides a matching method for text semantic similarity, which adopts the following technical solution:
A matching method for text semantic similarity, comprising the following steps:
acquiring historical data as a training sample set, the training sample set including true samples, positive samples and negative samples;
calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation results;
deploying the text semantic similarity matching model to an online platform;
matching standard questions based on the text semantic similarity matching model and feeding them back to the online platform.
By adopting the above technical solution, historical data is acquired as a training sample set that includes true samples, positive samples and negative samples; the text semantic similarity matching model is then trained based on the cosine similarity between the true samples and the positive samples and the cosine similarity between the true samples and the negative samples; after training, the model is deployed to the online platform and standard questions are fed back to it. Training the text semantic similarity matching model increases the similarity between the question the user actually inputs and the standard question fed back to the user, thereby improving the accuracy of customer-service question matching.
In a specific implementation, the true samples include questions actually input by users online; the positive samples include the standard questions selected by users and the standard questions configured by customer service for users' real inputs; the negative samples include the standard questions that users did not select.
By adopting the above technical solution, sufficient training samples are constructed and divided in detail, which facilitates model training and improves the accuracy of customer-service question matching.
In a specific implementation, calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training the preset text semantic similarity matching model based on the calculation results, includes:
calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample respectively, with the cosine similarity calculated as follows:
C0 = Cosine(T, P);
C1 = Cosine(T, N1);
…
Ck = Cosine(T, Nk);
where T denotes the true sample, P denotes the positive sample, N denotes a negative sample, and k denotes the number of negative samples;
constraining the cosine similarity between the true sample and the positive sample to be greater than or equal to the cosine similarity between the true sample and the negative samples, with the constraint formula as follows:
C0 = Max(C0, C1, …, Ck).
By adopting the above technical solution: since, when the semantic context is ignored, the cosine similarity between the true sample and a negative sample can sometimes exceed the cosine similarity between the true sample and the positive sample, the training process of the text semantic similarity matching model must always satisfy that the cosine similarity between the true sample and the positive sample is greater than or equal to the cosine similarity between the true sample and the negative samples.
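The similarity computation above can be sketched as follows; this is an illustrative example rather than code from the patent, and the 128-dimensional random vectors stand in for the high-dimensional semantic features that the model would actually produce.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

t = np.random.rand(128)                              # true sample T: the user's real question
p = np.random.rand(128)                              # positive sample P: the selected standard question
negatives = [np.random.rand(128) for _ in range(4)]  # negative samples N1..Nk (k = 4 here)

c0 = cosine(t, p)                                    # C0 = Cosine(T, P)
c_neg = [cosine(t, n) for n in negatives]            # C1..Ck = Cosine(T, Nk)
```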
In a specific implementation, the Softmax function is selected to apply the constraint formula to the cosine similarity calculation formula, obtaining Softmax(C0):
Softmax(C0) = Max(Softmax(C0), Softmax(C1), …, Softmax(Ck));
the error between the real question input by the user and the standard question selected by the user is defined as Loss; during the calculation of Loss, the cosine similarity between the true sample and the positive sample is constrained to always be greater than or equal to the cosine similarity between the true sample and the negative samples, and Loss is calculated as follows:
Loss = - log(Softmax(C0)).
By adopting the above technical solution, Loss expresses more concretely that the cosine similarity between the true sample and the positive sample is higher than that between the true sample and the negative samples. The smaller the Loss of the text semantic similarity matching model, the more accurate its predictions are considered to be; training the model drives Loss to its minimum, which is the final goal of model training.
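As a concrete illustration of this loss, the following sketch (assuming PyTorch is used for training, which the patent does not state) implements Loss = -log(Softmax(C0)) by treating [C0, C1, …, Ck] as logits whose correct class is the positive pair at index 0.

```python
import torch
import torch.nn.functional as F

def matching_loss(c0: torch.Tensor, c_neg: torch.Tensor) -> torch.Tensor:
    """c0: (batch,) true/positive similarities; c_neg: (batch, k) true/negative similarities."""
    logits = torch.cat([c0.unsqueeze(1), c_neg], dim=1)  # (batch, 1 + k), positive pair at index 0
    log_probs = F.log_softmax(logits, dim=1)
    return -log_probs[:, 0].mean()                       # Loss = -log(Softmax(C0)), averaged over the batch

# Example with a batch of 8 true samples and k = 4 negatives each.
loss = matching_loss(torch.rand(8), torch.rand(8, 4))
```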
In a specific implementation, the text semantic similarity matching model is trained in a supervised manner based on labeled data, where the labeled data include the standard questions actually clicked by users and the questions actually input by users.
By adopting the above technical solution, supervised training of the text semantic similarity matching model with labeled data gives the model the ability to predict and classify unknown data.
In a specific implementation, labeled data are randomly sampled as negative samples opposed to the true sample and the positive sample.
By adopting the above technical solution, randomly sampling labeled data as negative samples opposed to the true sample and the positive sample raises the similarity between the negative samples and the true sample; since the similarity between the true sample and the positive sample is always greater than or equal to the similarity between the true sample and the negative samples, the similarity between the positive sample and the true sample is pushed further upward, which enhances the training effect of the text semantic similarity model.
In a specific implementation, the text semantic similarity matching model includes a true-sample/positive-sample calculation module and a true-sample/negative-sample calculation module;
before deploying the text semantic similarity matching model to the online platform, the method further includes:
cutting the text semantic similarity matching model and retaining the true-sample/positive-sample calculation module.
By adopting the above technical solution: the text semantic similarity matching model is composed of a true-sample/positive-sample calculation module and a true-sample/negative-sample calculation module; removing the true-sample/negative-sample calculation module makes it possible to call the true-sample/positive-sample calculation module directly and perform text similarity matching within the positive-sample set, which effectively shortens text similarity matching and retrieval time and improves matching efficiency.
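A minimal sketch of this cutting step is given below, under the assumption that the model is a PyTorch module with a shared text encoder; the branch structure and export function are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

class MatchingModel(torch.nn.Module):
    """Training-time model with a true/positive branch and a true/negative branch."""

    def __init__(self, encoder: torch.nn.Module):
        super().__init__()
        self.encoder = encoder  # shared text encoder producing semantic feature vectors

    def score(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return F.cosine_similarity(self.encoder(a), self.encoder(b), dim=-1)

    def forward(self, true_x, pos_x, neg_xs):
        c0 = self.score(true_x, pos_x)                                   # true/positive calculation module
        c_neg = torch.stack([self.score(true_x, n) for n in neg_xs], 1)  # true/negative calculation module
        return c0, c_neg

def cut_for_deployment(model: MatchingModel, path: str) -> None:
    # Only the true/positive computation path is needed online, so only the
    # shared encoder is exported; the negative-sample branch is discarded.
    torch.save(model.encoder.state_dict(), path)
```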
In a second aspect, the present application provides a matching system for text semantic similarity, which adopts the following technical solution:
A matching system for text semantic similarity, comprising:
a data acquisition module, configured to acquire historical data as a training sample set, the training sample set including true samples, positive samples and negative samples;
a model training module, configured to calculate the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and to train a preset text semantic similarity matching model based on the calculation results;
a model deployment module, configured to deploy the text semantic similarity matching model to an online platform;
a data feedback module, configured to match standard questions based on the text semantic similarity matching model and feed them back to the online platform.
By adopting the above technical solution, historical data is acquired as a training sample set that includes true samples, positive samples and negative samples; the text semantic similarity matching model is then trained based on the cosine similarity between the true samples and the positive samples and the cosine similarity between the true samples and the negative samples; after training, the model is deployed to the online platform and standard questions are fed back to it. Training the text semantic similarity matching model increases the similarity between the question the user actually inputs and the standard question fed back to the user, thereby improving the accuracy of customer-service question matching.
In a third aspect, the present application provides an intelligent terminal, which adopts the following technical solution:
An intelligent terminal, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the matching method for text semantic similarity according to any one of the first aspect.
By adopting the above technical solution, the processor in the intelligent terminal can implement the above matching method for text semantic similarity according to the relevant computer program stored in the memory, thereby improving the precision of text semantic similarity and, in turn, the accuracy of customer-service question matching.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, where the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the matching method for text semantic similarity according to any one of the first aspect.
By adopting the above technical solution, the corresponding program can be stored, thereby improving the precision of text semantic similarity and, in turn, the accuracy of customer-service question matching.
To sum up, the present application provides at least one of the following beneficial technical effects:
1. Historical data is acquired as a training sample set that includes true samples, positive samples and negative samples; the text semantic similarity matching model is then trained based on the cosine similarity between the true samples and the positive samples and the cosine similarity between the true samples and the negative samples; after training, the model is deployed to the online platform and standard questions are fed back to it. Training the model increases the similarity between the question the user actually inputs and the standard question fed back to the user, thereby improving the accuracy of customer-service question matching;
2. Labeled data are randomly sampled as negative samples opposed to the true sample and the positive sample, which raises the similarity between the negative samples and the true sample; since the similarity between the true sample and the positive sample is always greater than or equal to the similarity between the true sample and the negative samples, the similarity between the positive sample and the true sample is pushed further upward, enhancing the training effect of the text semantic similarity model;
3. The text semantic similarity matching model is composed of a true-sample/positive-sample calculation module and a true-sample/negative-sample calculation module; removing the true-sample/negative-sample calculation module makes it possible to call the true-sample/positive-sample calculation module directly and perform text similarity matching within the positive-sample set, which effectively shortens text similarity matching and retrieval time and improves matching efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a matching method for text semantic similarity in an embodiment of the present application.
FIG. 2 is a structural block diagram of a matching system for text semantic similarity in an embodiment of the present application.
FIG. 3 is a schematic flowchart of a matching method for text semantic similarity in an embodiment of the present application.
Description of reference numerals: 100, data acquisition module; 200, model training module; 300, model deployment module; 400, data feedback module.
Detailed Description of the Embodiments
The present application is described in further detail below with reference to the accompanying drawings.
An embodiment of the present application discloses a matching method for text semantic similarity. The method can be applied to an intelligent terminal, with the intelligent terminal as the execution subject, and is used in the customer-service question matching scenario to take the actual question a user inputs online, extract its text semantic features, retrieve similar standard questions from a standard question library, judge the most similar standard questions by text semantic similarity, and feed the several standard questions with the highest similarity back to the user for selection. Here, text semantic similarity refers to extracting high-dimensional semantic features of a text on the basis of its words and then, through similarity calculation, measuring the degree of similarity between different texts.
Referring to FIG. 1, the matching method for text semantic similarity includes the following steps:
S101. Acquire historical data as a training sample set, where the training sample set includes true samples, positive samples and negative samples.
In implementation, historical data is first acquired as a training sample set that includes true samples, positive samples and negative samples. A true sample is a question actually input by a user on the online platform; a positive sample is a standard question selected by the user online or a standard question configured by customer-service staff for the user's real input; a negative sample is a standard question, among those fed back by the online platform, that the user did not select.
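A sketch of how such a training sample set could be assembled from historical logs is shown below; the record fields are hypothetical names chosen for illustration and are not specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    user_question: str          # true sample: what the user actually typed online
    shown_questions: list[str]  # standard questions fed back by the platform
    selected_question: str      # the standard question the user (or an agent) chose

def build_training_samples(records: list[LogRecord]):
    """Return (true, positives, negatives) triples for model training."""
    samples = []
    for r in records:
        positives = [r.selected_question]
        negatives = [q for q in r.shown_questions if q != r.selected_question]
        samples.append((r.user_question, positives, negatives))
    return samples
```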
S102. Calculate the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and train a preset text semantic similarity matching model based on the calculation results.
In implementation, high-dimensional semantic features are first extracted from the true samples, positive samples and negative samples by the preset text semantic similarity matching model. The model uses Albert as the basic structure for extracting the high-dimensional semantic features of the training sample set. Albert is a deep pre-trained model for extracting text features; compared with other commonly used training models, Albert applies parameter-reduction techniques to reduce memory consumption and thereby speed up model training.
In implementation, the model is trained with a positive-to-negative sample ratio of 1:4. After the text semantic similarity matching model has extracted the semantic features of the true, positive and negative samples, the cosine similarity formula is used to compute the cosine similarity between the true sample and the positive sample and between the true sample and the negative samples. Cosine similarity is a standard measure of the similarity between texts: the closer it is to 1, the more similar the two texts are; the closer it is to 0, the more independent they are.
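The feature-extraction step might look like the sketch below, which assumes the Hugging Face transformers library and a Chinese Albert checkpoint (the checkpoint path is a placeholder, not named in the patent); each text is represented by its [CLS] hidden state, and one training example pairs a true sample with one positive and four negatives in line with the 1:4 ratio. The positive and negative question texts are illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "path/to/chinese-albert-checkpoint"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def encode(texts: list[str]) -> torch.Tensor:
    """Map texts to high-dimensional semantic features ([CLS] hidden states)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch)              # during training the encoder parameters are fine-tuned
    return out.last_hidden_state[:, 0]

# One training example at the 1:4 positive:negative ratio.
t_vec = encode(["怎么退飞机票"])                                            # true sample
p_vec = encode(["如何办理机票退票"])                                         # positive sample
n_vec = encode(["如何改签机票", "行李托运规定", "如何开发票", "航班延误怎么办"])  # negative samples
```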
Since, when the semantic context is ignored, the cosine similarity between the true sample and a negative sample can sometimes exceed the cosine similarity between the true sample and the positive sample, the training process of the text semantic similarity matching model must always satisfy that the cosine similarity between the true sample and the positive sample is greater than or equal to the cosine similarity between the true sample and the negative samples.
To express more concretely that the cosine similarity between the true sample and the positive sample is higher than that between the true sample and the negative samples, the error between the real question input by the user and the standard question selected by the user is defined, as is customary in deep learning, as Loss. The smaller the Loss of the text semantic similarity matching model, the more accurate its predictions are considered to be, and training drives Loss to its minimum. When Loss is calculated, the cosine similarity between the true sample and the positive sample is always greater than or equal to the cosine similarity between the true sample and the negative samples.
Specifically, Loss is calculated as follows. First, the cosine similarity between the true sample (T) and the positive sample (P) and the cosine similarities between the true sample (T) and the negative samples (N) are computed separately, where k is the number of negative samples (N); the cosine similarity formulas are:
C0 = Cosine(T, P);
C1 = Cosine(T, N1);
…
Ck = Cosine(T, Nk);
To ensure that the similarity between the true sample and the positive sample is greater than or equal to the similarity between the true sample and the negative samples, the cosine similarity calculation is constrained as follows:
C0 = Max(C0, C1, …, Ck);
The C0 obtained through the constraint is the largest value in the set formed by the computed C0, C1, …, Ck, where the C0 in the constraint formula represents the cosine similarity between the true sample and the positive sample; this guarantees that the cosine similarity between the true sample and the positive sample is always greater than or equal to the cosine similarity between the true sample and the negative samples. For example, if the maximum of C1, …, Ck is 0.8 and the computed value of C0 is 0.6, the constraint formula assigns C0 the maximum value, 0.8, so that the cosine similarity between the true sample and the positive sample remains greater than or equal to the cosine similarity between the true sample and the negative samples.
Specifically, since cosine similarity takes values in the range 0 to 1, the Softmax function, whose output also lies in the range 0 to 1, is selected to apply the constraint formula to the cosine similarity calculation of C0, yielding Softmax(C0); the previously computed C0 is then replaced by Softmax(C0) as the cosine similarity between the true sample (T) and the positive sample (P). Softmax(C0) is calculated as follows:
Softmax(C0) = Max(Softmax(C0), Softmax(C1), …, Softmax(Ck));
Since the maximum value of Softmax is 1, Softmax(C0) ultimately needs to approach 1 as closely as possible, so that the cosine similarity between the true sample (T) and the positive sample (P) is the maximum and is always greater than or equal to the cosine similarity between the true sample (T) and the negative samples (N).
Since the final goal of model training is to minimize Loss, at which point the cosine similarity between the true sample (T) and the positive sample (P), i.e., Softmax(C0), is at its maximum, Loss is calculated as follows:
Loss = - log(Softmax(C0));
In one embodiment, the process of training the text semantic similarity matching model on the training sample set is supervised training, and supervised training relies on labeled data; the labeled data include the standard questions actually clicked by users and the questions actually input by users. For example, after a user enters the question "how do I refund a plane ticket" on the intelligent terminal, the terminal feeds back the five standard questions with the highest similarity for the user to choose from; if the user selects one of them, the selected standard question and the question actually input by the user are labeled together as one group of labeled data. Supervised training of the text semantic similarity matching model with labeled data gives the model the ability to predict and classify unknown data.
In one embodiment, to improve the training effect of the text semantic similarity model and thus raise the accuracy of customer-service question matching, labeled data can be randomly sampled as negative samples opposed to the true sample and the positive sample. Randomly sampling labeled data as negative samples raises the similarity between the negative samples and the true sample; since the similarity between the true sample and the positive sample is always greater than or equal to the similarity between the true sample and the negative samples, the similarity between the positive sample and the true sample is pushed further upward, enhancing the training effect of the text semantic similarity model.
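A sketch of this random negative sampling is shown below; the function name and parameters are illustrative only.

```python
import random

def sample_extra_negatives(labeled_standard_questions: list[str],
                           positive: str,
                           k: int = 4) -> list[str]:
    """Randomly draw k standard questions from the labeled data, excluding the
    positive of the current true sample, to serve as additional negatives."""
    pool = [q for q in labeled_standard_questions if q != positive]
    return random.sample(pool, min(k, len(pool)))
```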
S103. Deploy the text semantic similarity matching model to the online platform.
Specifically, the text semantic similarity matching model includes a true-sample/positive-sample calculation module and a true-sample/negative-sample calculation module: the former is the set of computed cosine similarity values between true samples and positive samples, and the latter is the set of computed cosine similarity values between true samples and negative samples. In implementation, the true-sample/negative-sample calculation module is removed from the text semantic similarity matching model and the true-sample/positive-sample calculation module is retained, so that the latter can be called directly to perform text similarity matching within the positive-sample set; this effectively shortens text similarity matching and retrieval time and improves matching efficiency.
S104. Match standard questions based on the text semantic similarity matching model and feed them back to the online platform.
Specifically, after the user enters a question on the online platform, the text semantic similarity matching model first extracts the semantic features of the user's question, then extracts the semantic features of the positive samples in the positive-sample set and matches them against the semantic features of the user's question, and finally the five standard questions with the highest similarity are matched and fed back to the user for selection.
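The online matching step could be sketched as follows, assuming the standard-question (positive-sample) embeddings have been precomputed with the retained encoder; the names below are illustrative.

```python
import numpy as np

def top5_standard_questions(query_embedding: np.ndarray,
                            standard_questions: list[str],
                            standard_embeddings: np.ndarray) -> list[str]:
    """query_embedding: (dim,); standard_embeddings: (num_questions, dim)."""
    q = query_embedding / (np.linalg.norm(query_embedding) + 1e-12)
    s = standard_embeddings / (np.linalg.norm(standard_embeddings, axis=1, keepdims=True) + 1e-12)
    sims = s @ q                                 # cosine similarity with every standard question
    best = np.argsort(-sims)[:5]                 # indices of the five most similar questions
    return [standard_questions[i] for i in best]
```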
FIG. 3 shows a schematic flowchart of the matching method for text semantic similarity. The text semantic similarity model first obtains true samples, positive samples and negative samples as the training sample set; the model then uses Albert as the basic structure to extract the high-dimensional semantic features of the true, positive and negative samples; next, the cosine similarity between the true samples and the positive samples and the cosine similarity between the true samples and the negative samples are computed and, under the constraint that the former is greater than or equal to the latter, the text semantic similarity matching model is trained in a supervised manner. After training, the values computed among the true, positive and negative samples are back-propagated and stored in the model. When the text semantic similarity matching model is applied to the online platform, only the true-sample/positive-sample calculation module is retained and deployed. After the user enters a question on the online platform, the text semantic features of the question are extracted and the most similar positive-sample semantic features are automatically searched in the positive-sample set; the five most similar positive samples, i.e., five standard questions, are obtained and fed back to the user for selection, completing the text semantic similarity matching process.
An embodiment of the present application also discloses a matching system for text semantic similarity. Referring to FIG. 2, the matching system for text semantic similarity includes:
a data acquisition module 100, configured to acquire historical data as a training sample set including true samples, positive samples and negative samples; a true sample is a question actually input by a user on the online platform, a positive sample is a standard question selected by the user online or a standard question configured by customer-service staff for the user's real input, and a negative sample is a standard question, among those fed back by the online platform, that the user did not select;
a model training module 200, configured to calculate the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and to train a preset text semantic similarity matching model based on the calculation results; calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training the preset text semantic similarity matching model based on the calculation results, includes:
calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample respectively, with the cosine similarity calculated as follows:
C0 = Cosine(T, P);
C1 = Cosine(T, N1);
…
Ck = Cosine(T, Nk);
where T denotes the true sample, P denotes the positive sample, N denotes a negative sample, and k denotes the number of negative samples;
constraining the cosine similarity between the true sample and the positive sample to be greater than or equal to the cosine similarity between the true sample and the negative samples, with the constraint formula as follows:
C0 = Max(C0, C1, …, Ck);
selecting the Softmax function to apply the constraint formula to the cosine similarity calculation formula, obtaining Softmax(C0):
Softmax(C0) = Max(Softmax(C0), Softmax(C1), …, Softmax(Ck));
defining the error between the real question input by the user and the standard question selected by the user as Loss, where during the calculation of Loss the cosine similarity between the true sample and the positive sample is constrained to always be greater than or equal to the cosine similarity between the true sample and the negative samples, and Loss is calculated as follows:
Loss = - log(Softmax(C0));
a model deployment module 300, configured to deploy the text semantic similarity matching model to an online platform;
a data feedback module 400, configured to match standard questions based on the text semantic similarity matching model and feed them back to the online platform.
Optionally, before the model deployment module 300, the system includes:
a model cutting module, configured to cut the text semantic similarity matching model and retain the true-sample/positive-sample calculation module.
Optionally, the matching system for text semantic similarity further includes:
a supervised training module, configured to perform supervised training of the text semantic similarity matching model based on labeled data, where the labeled data include the standard questions actually clicked by users and the questions actually input by users.
Optionally, the supervised training module includes:
a data enhancement sub-module, configured to randomly sample labeled data as negative samples opposed to the true sample and the positive sample.
An embodiment of the present application also discloses an intelligent terminal, including a memory and a processor; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above matching method for text semantic similarity. The steps of the matching method for text semantic similarity here may be the steps in the matching method for text semantic similarity described above.
An embodiment of the present application also discloses a computer-readable storage medium storing a program which, when loaded and executed by a processor, implements the steps in the flow of the above matching method for text semantic similarity.
The computer-readable storage medium includes, for example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The above embodiments are only intended to describe the technical solutions of the present application in detail; the description of the above embodiments is merely meant to help understand the method of the present application and its core idea, and should not be construed as limiting the present application. Changes or substitutions readily conceivable by those skilled in the art within the technical scope disclosed in the present application shall all fall within the protection scope of the present application.
Claims (10)
Priority Applications (1)
- CN202111620100.0A (priority date 2021-12-27, filed 2021-12-27): Matching method, system, intelligent terminal and storage medium for text semantic similarity
Publications (1)
- CN114282513A, published 2022-04-05
Family
- ID: 80876687
Family Applications (1)
- CN202111620100.0A (priority date 2021-12-27, filed 2021-12-27): Matching method, system, intelligent terminal and storage medium for text semantic similarity
Country Status (1)
- CN: CN114282513A (en)
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination