WO2020258506A1 - Text information matching degree detection method and apparatus, computer device and storage medium

Info

Publication number
WO2020258506A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/103650
Inventor
Jin Ge (金戈)
Xu Liang (徐亮)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

A text information matching degree detection method and apparatus, a computer device and a storage medium. The method comprises: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquiring a logistic regression model according to the object text information and a preset keyword, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

Description

Text information matching degree detection method, apparatus, computer device and storage medium
This application claims priority to Chinese Patent Application No. 2019105694717, filed with the Chinese Patent Office on June 27, 2019 and entitled "Text information matching degree detection method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a text information matching degree detection method, apparatus, computer device and non-volatile storage medium.
Background
Text matching degree refers to the degree of semantic relevance between different texts. Determining it is one of the core tasks of text mining and text retrieval, so how to detect text matching degree more effectively has long been a major concern for those skilled in the art.
In the prior art, text matching degree is mainly detected by mapping each text to a vector in word space and calculating the Euclidean distance or cosine distance between the vectors. The inventors realized that such methods determine text similarity only in word space and take into account neither the associations between text features nor their semantic information, so the detected matching degree is not accurate enough.
Summary
The purpose of this application is to provide a text information matching degree detection method, apparatus, computer device and readable non-volatile storage medium that make text information matching degree detection more accurate.
To solve the above technical problem, this application provides a text information matching degree detection method, including: acquiring object text information and its corresponding reference text information; converting, according to a preset autoencoder structure, the object text information into a first implicit feature vector and the reference text information into a second implicit feature vector, where the first implicit feature vector represents the feature information of the object text information and the second implicit feature vector represents the feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquiring a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
To solve the above technical problem, this application also provides a text information matching degree detection apparatus, including: a text information acquisition module, configured to acquire object text information and its corresponding reference text information; a text information conversion module, configured to convert, according to a preset autoencoder structure, the object text information into a first implicit feature vector and the reference text information into a second implicit feature vector, where the first implicit feature vector represents the feature information of the object text information and the second implicit feature vector represents the feature information of the reference text information; a vector similarity acquisition module, configured to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector; and a matching degree detection module, configured to acquire a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
To solve the above technical problem, this application also provides a computer device, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements a text information matching degree detection method including: acquiring object text information and its corresponding reference text information; converting, according to a preset autoencoder structure, the object text information into a first implicit feature vector and the reference text information into a second implicit feature vector, where the first implicit feature vector represents the feature information of the object text information and the second implicit feature vector represents the feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquiring a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
To solve the above technical problem, this application also provides a computer-readable non-volatile storage medium storing a computer program, where the computer program, when executed by a processor, implements a text information matching degree detection method including: acquiring object text information and its corresponding reference text information; converting, according to a preset autoencoder structure, the object text information into a first implicit feature vector and the reference text information into a second implicit feature vector, where the first implicit feature vector represents the feature information of the object text information and the second implicit feature vector represents the feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquiring a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
With the text information matching degree detection method, apparatus, computer device and non-volatile storage medium provided by this application, the vector similarity between the implicit semantic features of the object text information and those of the reference text information is input into the logistic regression model corresponding to the object text information, which can effectively improve the accuracy of text information matching degree detection.
Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of the text information matching degree detection method in one embodiment;
FIG. 2 is a schematic flowchart of the text information matching degree detection method in one embodiment;
FIG. 3 is a schematic flowchart of the text information matching degree detection method in another embodiment;
FIG. 4 is a structural block diagram of the text information matching degree detection apparatus in one embodiment;
FIG. 5 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
The embodiments of this application are described in detail below, and examples of them are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain this application, and shall not be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the specification of this application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
The text information matching degree detection method provided by this application can be applied in the application environment shown in FIG. 1. The server in the figure can be implemented by a computer device that includes a processor, a memory, a network interface and a database connected through a bus. The processor of the computer device provides computing and control capabilities; the database stores the data involved in text information matching degree detection; and the network interface communicates with external terminals over a network connection. Specifically, the server acquires the object text information and its corresponding reference text information; converts the object text information into a first implicit feature vector and the reference text information into a second implicit feature vector; calculates the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquires a logistic regression model according to the object text information and preset keywords, then inputs the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
Those skilled in the art will understand that the "server" used herein can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a text information matching degree detection method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
Step S201: acquire object text information and its corresponding reference text information.
In this step, the object text information may be an answer text whose matching degree is to be detected, and the reference text information may be the question text and standard text corresponding to the answer text.
Taking text review as an example, the answer a user gives to a question is the object text information, and the reference text information is the question together with its standard answer. Detecting the matching degree between the object text information and the reference text information is thus the process of judging the degree of semantic relevance between the answer on the one hand and the question and standard answer on the other.
In one embodiment, after the step of acquiring the object text information and its corresponding reference text information in step S201, the method further includes:
A1: acquire training feature vectors associated with the object text information;
A2: train a plurality of pre-stored autoencoder structures according to the training feature vectors to obtain a plurality of trained autoencoder structures;
In this step, text information can be converted into implicit feature vectors through an autoencoder structure. An autoencoder is a neural network that encodes the features fed into it and then decodes them, and it is trained so that the difference between its input and output is minimized.
A3: calculate the information loss of each trained autoencoder structure, and select the trained autoencoder structure with the smallest information loss as the preset autoencoder structure.
In a specific implementation, training an autoencoder structure means minimizing the difference between its input and output. The training feature vectors are input into several different autoencoder structures, which differ in the number of hidden layers and the number of units per hidden layer. The parameters of each structure are adjusted to minimize the difference between its output and the training feature vectors, and the target autoencoder structure is then selected from the trained structures according to the input-output difference of each.
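Steps A1 to A3 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the applicant's implementation: each candidate self-encoding (autoencoder) structure is a fully connected network with tanh hidden layers and a linear output layer, candidates differ only in their hidden-layer sizes, and the "information loss" of step A3 is taken to be the mean squared reconstruction error on the training feature vectors. The candidate list, learning rate, epoch count and toy training data are hypothetical.

```python
import numpy as np


def forward(X, Ws, bs):
    """Return the activations of every layer: tanh hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = acts[-1] @ W + b
        acts.append(z if i == len(Ws) - 1 else np.tanh(z))
    return acts


def train_autoencoder(X, hidden_sizes, epochs=500, lr=0.05, seed=0):
    """Step A2: fit X -> hidden layers -> X by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden_sizes, X.shape[1]]
    Ws = [rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    for _ in range(epochs):
        acts = forward(X, Ws, bs)
        delta = 2.0 * (acts[-1] - X) / X.size  # d(MSE)/d(linear output)
        for i in reversed(range(len(Ws))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:  # propagate through tanh before Ws[i] is updated
                delta = (delta @ Ws[i].T) * (1.0 - acts[i] ** 2)
            Ws[i] -= lr * grad_W
            bs[i] -= lr * grad_b
    return Ws, bs


def reconstruction_loss(X, Ws, bs):
    """Step A3 (assumed metric): mean squared difference between input and output."""
    return float(np.mean((forward(X, Ws, bs)[-1] - X) ** 2))


def select_autoencoder(X, candidates):
    """Train every candidate and return the hidden sizes with the smallest loss."""
    results = {}
    for hidden in candidates:
        Ws, bs = train_autoencoder(X, hidden)
        results[hidden] = (reconstruction_loss(X, Ws, bs), Ws, bs)
    best = min(results, key=lambda h: results[h][0])
    return best, results


# Toy training feature vectors: 50 bag-of-words-like count vectors of length 7.
X_train = np.random.default_rng(42).poisson(1.0, size=(50, 7)).astype(float)
best, results = select_autoencoder(X_train, [(3,), (5,), (6, 3, 6)])
for hidden, (loss, _, _) in results.items():
    print(hidden, round(loss, 4))
print("selected structure:", best)
```

Because the comparison here is made on the training vectors themselves, as the text describes, wider bottlenecks tend to win; scoring the candidates on held-out vectors would be one way to keep the selection from simply favouring the largest structure.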
Step S202: convert the object text information into a first implicit feature vector, and convert the reference text information into a second implicit feature vector.
In this step, an implicit feature vector is the feature vector obtained when the autoencoder structure encodes its input features. It retains much of the information in the original input vector and is used to represent the feature information of the object text information or reference text information fed into the autoencoder structure; the autoencoder structure then decodes the implicit feature vector to reconstruct the output features.
In one embodiment, converting the object text information into the first implicit feature vector in step S202 may include:
B1: input the object text information into a preset learning algorithm to obtain an object input vector;
B2: input the object input vector into the preset autoencoder structure, and extract from it the first implicit feature vector corresponding to the object input vector.
In this embodiment, the preset learning algorithm is an algorithm that converts text into a corresponding vector. For example, the sklearn library in Python can convert the object text information into an object input vector in the form of bag-of-words features. Python is a computer programming language; sklearn, also known as scikit-learn, is a Python-based machine learning library that facilitates the implementation of machine learning algorithms, including classification, regression, clustering, dimensionality reduction, model selection, preprocessing and other data mining algorithms.
For example, given text one "我喜欢吃苹果,苹果营养丰富" ("I like to eat apples; apples are nutritious") and text two "我喜欢吃梨" ("I like to eat pears"), the jieba library in Python is first used to segment the sentences into words, and the sklearn library is then used to build bag-of-words features (the features are "我" (I), "喜欢" (like), "吃" (eat), "苹果" (apple), "营养" (nutrition), "丰富" (rich) and "梨" (pear)). The feature value of each sample is the number of times each word occurs, giving the feature vector (1,1,1,2,1,1,0) for text one and (1,1,1,0,0,0,1) for text two. jieba is a Python library for Chinese word segmentation.
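The bag-of-words counts in this example can be reproduced with a few lines of standard-library Python. In this sketch the segmented tokens are hard-coded rather than produced by jieba (whose actual segmentation may differ), and the vocabulary is kept in first-occurrence order so the columns match the order listed above; the helper name `bag_of_words` is illustrative. Note that scikit-learn's `CountVectorizer` with its default `token_pattern` keeps only tokens of two or more characters, so it would drop single-character words such as 我, 吃 and 梨, and it sorts the vocabulary rather than preserving first-occurrence order.

```python
from collections import Counter


def bag_of_words(token_lists):
    """Build a vocabulary in first-occurrence order and return count vectors."""
    vocab = []
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append(tuple(counts[w] for w in vocab))
    return vocab, vectors


# Segmented tokens of the two example texts, punctuation dropped (hard-coded;
# a real pipeline would obtain them from a segmenter such as jieba).
text1 = ["我", "喜欢", "吃", "苹果", "苹果", "营养", "丰富"]
text2 = ["我", "喜欢", "吃", "梨"]

vocab, (v1, v2) = bag_of_words([text1, text2])
print(vocab)  # ['我', '喜欢', '吃', '苹果', '营养', '丰富', '梨']
print(v1)     # (1, 1, 1, 2, 1, 1, 0)
print(v2)     # (1, 1, 1, 0, 0, 0, 1)
```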
Further, the reference text information includes question text information and standard text information corresponding to the object text information, and the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector. Converting the reference text information into the second implicit feature vector in step S202 includes:
B3: input the question text information into the preset learning algorithm to obtain a question input vector; input the question input vector into the preset autoencoder structure, and extract from it the question implicit feature vector corresponding to the question input vector;
B4: input the standard text information into the preset learning algorithm to obtain a standard input vector; input the standard input vector into the preset autoencoder structure, and extract from it the standard implicit feature vector corresponding to the standard input vector.
In this embodiment, the object text information and the reference text information are converted by the preset learning algorithm into an object input vector and a reference input vector, respectively. Both are then input into the preset autoencoder structure, from which the first implicit feature vector corresponding to the object input vector and the second implicit feature vector corresponding to the reference input vector are extracted. In this way, the implicit semantic features of the object text information and the reference text information can be extracted effectively.
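Steps B1 to B4 apply the same two operations to the answer, question and standard texts: vectorize the text, then run only the encoder half of the trained self-encoding (autoencoder) structure and read out its narrowest (bottleneck) layer. The NumPy sketch below assumes tanh hidden layers and a symmetric 7-6-3-6-7 architecture; the randomly initialized weights are placeholders for a trained model, and the three input vectors are hypothetical bag-of-words counts over the seven-word vocabulary of the earlier jieba/sklearn example.

```python
import numpy as np


def encode(x, Ws, bs, bottleneck_index):
    """Return the implicit feature vector: the activation of hidden layer
    `bottleneck_index`, computed with the encoder layers only."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(Ws[: bottleneck_index + 1], bs[: bottleneck_index + 1]):
        a = np.tanh(a @ W + b)
    return a


# Placeholder weights for a 7-6-3-6-7 autoencoder (a trained model would be used in practice).
rng = np.random.default_rng(0)
sizes = [7, 6, 3, 6, 7]
Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

answer_vec = [1, 1, 1, 2, 1, 1, 0]    # B1: object input vector
question_vec = [1, 1, 1, 0, 0, 0, 1]  # B3: question input vector
standard_vec = [1, 1, 1, 1, 1, 1, 0]  # B4: standard input vector

# Hidden layer 1 (3 units) is the bottleneck of this architecture.
h_answer = encode(answer_vec, Ws, bs, bottleneck_index=1)      # first implicit feature vector
h_question = encode(question_vec, Ws, bs, bottleneck_index=1)  # question implicit feature vector
h_standard = encode(standard_vec, Ws, bs, bottleneck_index=1)  # standard implicit feature vector
print(h_answer.shape, h_question.shape, h_standard.shape)  # (3,) (3,) (3,)
```

The decoder weights (the last two matrices) are not used at inference time; they only matter while the structure is being trained to reconstruct its input.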
Step S203: calculate the vector similarity between the first implicit feature vector and the second implicit feature vector.
In this step, vector similarity is usually judged by how close the two vectors are: the closer they are, the greater the similarity. The cosine similarity method can be used to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector.
In one embodiment, the vector similarity includes a question similarity and a standard similarity, and calculating the vector similarity between the first implicit feature vector and the second implicit feature vector in step S203 includes:
C1: calculate the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity;
C2: calculate the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
Cosine similarity evaluates how similar two vectors are by calculating the cosine of the angle between them. The cosine of a 0° angle is 1, the cosine of any other angle is no greater than 1, and the minimum value is -1, so the cosine of the angle between two vectors indicates whether they point in roughly the same direction. When two vectors point in the same direction, the cosine similarity is 1; when they are at 90° to each other, it is 0; and when they point in exactly opposite directions, it is -1. Cosine similarity is often used in positive space, that is, for vectors whose components are all non-negative, in which case its value lies between 0 and 1.
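Steps C1 and C2 both reduce to the same cosine computation, sketched below in standard-library Python (the function name is illustrative). Returning 0.0 when either vector is all zeros is an assumed convention, since the angle is undefined there. Applied to the two bag-of-words vectors of the jieba/sklearn example, it gives 3 / (3 * 2) = 0.5.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: the angle is undefined for a zero vector
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (90 degrees apart)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite directions)
print(cosine_similarity((1, 1, 1, 2, 1, 1, 0), (1, 1, 1, 0, 0, 0, 1)))  # 0.5
```

In the method, the question similarity would be `cosine_similarity(first_vector, question_vector)` and the standard similarity `cosine_similarity(first_vector, standard_vector)`. Implicit feature vectors produced by tanh units can have negative components, so these values are not restricted to [0, 1].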
Step S204: acquire a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In this step, the parameters of the logistic regression model are calculated from the object text information and the preset keywords; the vector similarity is then input into the logistic regression model, which outputs a matching degree value.
Taking text scoring as an example, a series of parameters is calculated from the answer text written by the user and the preset keywords, a corresponding logistic regression model is built from these parameters, and the similarity between the answer text and the reference text is input into the logistic regression model to obtain a matching score.
The process of acquiring the logistic regression model in this application is described below with reference to FIG. 3 and specific embodiments. In one embodiment, acquiring the logistic regression model according to the object text information and preset keywords in step S204 includes:
S410: acquire the keyword similarity between the preset keywords and the object text information;
S420: set the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
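The description leaves open exactly how the keyword similarities and vector similarities parameterize the model. One common reading, sketched below as an assumption rather than the applicant's formula, treats them all as input features of a logistic function, so the matching degree is 1 / (1 + exp(-(b + w·x))) and always lies between 0 and 1. The weights and bias are hypothetical; in practice they would be fitted on previously scored answers, for example with scikit-learn's `LogisticRegression`.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matching_degree(vector_sims, keyword_sims, weights, bias):
    """Matching degree between 0 and 1 from the similarity features.

    vector_sims: e.g. [question_similarity, standard_similarity] from step S203
    keyword_sims: one keyword similarity per preset keyword from step S410
    weights, bias: model coefficients (hypothetical values below)
    """
    features = list(vector_sims) + list(keyword_sims)
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


score = matching_degree(
    vector_sims=[0.62, 0.81],
    keyword_sims=[0.9, 0.4, 0.7],
    weights=[1.5, 2.5, 1.0, 1.0, 1.0],
    bias=-3.0,
)
print(round(score, 3))
```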
In one embodiment, acquiring the keyword similarity between the preset keywords and the object text information in step S410 includes:
D1: calculate the information value of each keyword in a preset keyword library, and take the keywords whose information value exceeds a preset threshold as the preset keywords;
D2: split the object text information into a plurality of object words, and calculate the similarity between each preset keyword and the object words;
D3: take the maximum of these similarities as the keyword similarity.
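The patent does not define how the "information value" in step D1 is computed. One plausible interpretation, sketched here as an assumption, is the weight-of-evidence information value (IV) commonly used for feature screening. It requires a labeled set of previously graded answers (hypothetical here), treats "the keyword occurs in the answer" as a binary feature, and sums (p_good - p_bad) * ln(p_good / p_bad) over the two bins. The smoothing constant `eps`, which avoids taking the logarithm of zero, and the threshold value are also illustrative.

```python
import math


def information_value(docs, labels, keyword, eps=0.5):
    """IV of the binary feature 'keyword occurs in the answer' for labels
    1 (acceptable answer) and 0 (unacceptable answer)."""
    n_good = sum(labels)
    n_bad = len(labels) - n_good
    iv = 0.0
    for present in (True, False):
        good = sum(1 for d, y in zip(docs, labels) if (keyword in d) == present and y == 1)
        bad = sum(1 for d, y in zip(docs, labels) if (keyword in d) == present and y == 0)
        # additive smoothing keeps both shares strictly positive
        p_good = (good + eps) / (n_good + 2 * eps)
        p_bad = (bad + eps) / (n_bad + 2 * eps)
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def select_keywords(keyword_library, docs, labels, threshold):
    """Step D1: keep the keywords whose information value exceeds the threshold."""
    return [k for k in keyword_library if information_value(docs, labels, k) > threshold]


# Hypothetical graded answers, each given as its set of segmented words.
docs = [{"苹果", "营养"}, {"苹果", "丰富"}, {"梨"}, {"吃"}]
labels = [1, 1, 0, 0]
print(select_keywords(["苹果", "营养", "吃", "梨"], docs, labels, threshold=1.0))  # ['苹果']
```

Here 苹果 appears only in the acceptable answers, so it separates the two groups best and is the only keyword whose IV (about 2.15) clears the threshold; the others score about 0.54.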
When selecting keywords, a keyword with a higher information value is better able to judge the semantic relevance of the object text information. For example, the ten keywords with the highest information value in the preset keyword library are determined, and the similarity between each of these ten keywords and each object word is calculated. For each keyword, the object word in the object text with the highest similarity is selected, which yields ten final similarity values; these ten values are used together with the vector similarity as parameters of the logistic regression model.
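Steps D2 and D3, and the assembly of the model inputs described in this example, can be sketched as follows. Word-to-word similarity is assumed to be the cosine similarity of word embeddings; the tiny two-dimensional embedding table is hypothetical (a real system would use pretrained word vectors), object words without an embedding are skipped, and a keyword with no comparable word gets 0.0. Two keywords are used instead of ten to keep the example small, and the vector similarity values are placeholders.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_similarities(keywords, object_words, embed):
    """Steps D2-D3: for each preset keyword, its highest similarity to any object word."""
    sims = []
    for k in keywords:
        if k not in embed:
            sims.append(0.0)
            continue
        candidates = [cosine(embed[k], embed[w]) for w in object_words if w in embed]
        sims.append(max(candidates, default=0.0))
    return sims


# Hypothetical word embeddings.
embed = {
    "苹果": (1.0, 0.0),
    "营养": (0.0, 1.0),
    "梨": (0.8, 0.6),
    "喜欢": (0.28, 0.96),
}
object_words = ["我", "喜欢", "吃", "梨"]  # segmented answer text (D2)
keyword_sims = keyword_similarities(["苹果", "营养"], object_words, embed)
vector_sims = [0.5, 0.7]                   # placeholder question/standard similarities
features = vector_sims + keyword_sims      # inputs to the logistic regression model
print([round(s, 2) for s in keyword_sims])  # [0.8, 0.96]
```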
上述文本信息匹配度检测方法,通过获取对象文本信息及其对应的参考文本信息;将所述对象文本信息转换为第一隐含特征向量,以及将所述参考文本信息转换为第二隐含特征向量;计算所述第一隐含特征向量与所述第二隐含特征向量之间的向量相似度,可以有效提取对象文本信息和参考文本信息之间的隐含语义特征并进行匹配;根据所述对象文本信息以及预设的关键词获取逻辑回归模型,将所述向量相似度输入所述逻辑回归模型,得到所述对象文本信息与所述参考文本信息之间对象文本信息的匹配度,通过将对象文本信息和参考文本信息之间的隐含语义特征之间的向量相似度输入与对象文本信息对应的逻辑回归模型,可以有效提高文本信息匹配度检测的准确度。The above method for detecting matching degree of text information is to obtain object text information and its corresponding reference text information; convert the object text information into a first implicit feature vector, and convert the reference text information into a second implicit feature Vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector, which can effectively extract and match the implicit semantic features between the target text information and the reference text information; The object text information and preset keywords are used to obtain a logistic regression model, and the vector similarity is input into the logistic regression model to obtain the degree of matching of the object text information between the object text information and the reference text information. Inputting the vector similarity between the implicit semantic features between the object text information and the reference text information into the logistic regression model corresponding to the object text information can effectively improve the accuracy of the text information matching degree detection.
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are displayed in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily executed one after another: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, a text information matching degree detection apparatus is provided. The apparatus includes:
a text information obtaining module 401, configured to obtain object text information and its corresponding reference text information;
a text information conversion module 402, configured to convert the object text information into a first implicit feature vector and convert the reference text information into a second implicit feature vector;
a vector similarity obtaining module 403, configured to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector; and
a matching degree detection module 404, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
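The four-module apparatus (modules 401 to 404) can be pictured as a small pipeline object. The Python sketch below is illustrative only. Each module is modelled as an injected function. The class name, the `run` method, the `request_id` argument and the toy stand-ins in the usage lines are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]

@dataclass
class MatchingDegreeDetector:
    # Module 401: fetch (object text, reference text); signature is assumed.
    obtain: Callable[[str], Tuple[str, str]]
    # Module 402: convert a text into an implicit feature vector.
    convert: Callable[[str], Vector]
    # Module 403: similarity between two implicit feature vectors.
    similarity: Callable[[Vector, Vector], float]
    # Module 404: map (object text, vector similarity) to a matching degree.
    detect: Callable[[str, float], float]

    def run(self, request_id: str) -> float:
        object_text, reference_text = self.obtain(request_id)
        first = self.convert(object_text)
        second = self.convert(reference_text)
        return self.detect(object_text, self.similarity(first, second))

# Toy wiring: every stand-in below is hypothetical.
detector = MatchingDegreeDetector(
    obtain=lambda rid: ("late order", "order late"),
    convert=lambda text: [float(len(text))],
    similarity=lambda u, v: 1.0 if u == v else 0.0,
    detect=lambda text, sim: sim,
)
print(detector.run("q1"))  # 1.0
```

Because each module is injected, any one of them can be swapped without touching the others. This fits the statement that the modules may be realised in software, hardware, or a combination of the two.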
For the specific limitations of the text information matching degree detection apparatus, refer to the limitations of the text information matching degree detection method above; they are not repeated here. Each module in the apparatus may be implemented wholly or partly by software, hardware, or a combination of the two. The modules may be embedded, in hardware form, in a processor of a computer device or be independent of it. They may also be stored, in software form, in a memory of the computer device, so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a server is provided. The server may be implemented by a computer device whose internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the data involved in text information matching degree detection. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a text information matching degree detection method.
Those skilled in the art will understand that the structure shown in FIG. 5 is merely a block diagram of the part of the structure related to the solution of this application. It does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided. It includes a memory storing a computer program and a processor. When executing the computer program, the processor implements the following steps: obtaining object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one embodiment, when the processor executes the computer program, converting the object text information into the first implicit feature vector includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; and inputting the object input vector into a preset autoencoder structure and extracting, from the preset autoencoder structure, the first implicit feature vector corresponding to the object input vector.
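The two conversion steps can be sketched as follows. The text does not name the "preset learning algorithm" or describe the autoencoder's architecture, so this Python example makes two assumptions:

1. The input vector is a bag-of-words count vector over a fixed vocabulary, and the text is already segmented into space-separated words.
2. The encoder is a single layer, and its hidden activations h = sigmoid(Wx + b) serve as the implicit feature vector.

The weights are placeholders standing in for a trained model, and `bag_of_words` and `encode` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

def bag_of_words(text: str, vocab: Dict[str, int]) -> List[float]:
    """Assumed stand-in for the preset learning algorithm: term counts."""
    vector = [0.0] * len(vocab)
    for token in text.split():  # assumes pre-segmented, space-separated words
        if token in vocab:
            vector[vocab[token]] += 1.0
    return vector

def encode(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Hidden layer of a one-layer autoencoder encoder: sigmoid(W x + b)."""
    hidden = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    return hidden

vocab = {"refund": 0, "late": 1, "order": 2}
x = bag_of_words("refund order order", vocab)   # [1.0, 0.0, 2.0]
W = [[0.5, -0.5, 0.0], [0.0, 0.0, 0.0]]          # placeholder "trained" weights
h = encode(x, W, [0.0, 0.0])                     # first implicit feature vector
print(x, h)
```

Applying the same two calls to the question text and the standard text would give the question and standard implicit feature vectors described in the following embodiment.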
In one embodiment, when the processor executes the computer program, the reference text information includes question text information and standard text information corresponding to the object text information, and the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector. Converting the reference text information into the second implicit feature vector includes: inputting the question text information into the preset learning algorithm to obtain a question input vector; inputting the question input vector into the preset autoencoder structure and extracting the question implicit feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and inputting the standard input vector into the preset autoencoder structure and extracting the standard implicit feature vector corresponding to the standard input vector.
In one embodiment, when the processor executes the computer program, the following steps are further included after the step of obtaining the object text information and its corresponding reference text information: obtaining training feature vectors associated with the object text information; training multiple pre-stored autoencoder structures on the training feature vectors to obtain multiple trained autoencoder structures; and calculating the information loss of each trained autoencoder structure and selecting the one with the smallest information loss as the preset autoencoder structure.
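Selecting the preset autoencoder by information loss can be sketched as below. The text does not define "information loss", so this Python example assumes it is the mean squared reconstruction error on the training feature vectors. The example takes already-trained candidates as (encode, decode) function pairs, so the training step itself is omitted. `reconstruction_loss`, `select_autoencoder` and the two toy candidates are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Codec = Tuple[Callable[[Vector], Vector], Callable[[Vector], Vector]]

def reconstruction_loss(codec: Codec, samples: Sequence[Vector]) -> float:
    """Mean squared error between each sample and its reconstruction."""
    encode, decode = codec
    total, count = 0.0, 0
    for x in samples:
        x_hat = decode(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        count += len(x)
    return total / count if count else 0.0

def select_autoencoder(candidates: Dict[str, Codec], samples: Sequence[Vector]) -> str:
    """Name of the trained candidate with the smallest information loss."""
    return min(candidates, key=lambda name: reconstruction_loss(candidates[name], samples))

samples = [[1.0, 2.0], [3.0, 4.0]]
candidates = {
    "keep_all": (lambda x: list(x), lambda h: list(h)),    # lossless toy codec
    "drop_last": (lambda x: x[:-1], lambda h: h + [0.0]),  # discards one dimension
}
print(select_autoencoder(candidates, samples))  # keep_all
```

A real candidate would compress its input, so no candidate would reach zero loss as the toy `keep_all` does. The selection rule itself stays the same.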
In one embodiment, when the processor executes the computer program, the vector similarity includes a question similarity and a standard similarity. Calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes: calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity; and calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
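Both the question similarity and the standard similarity are plain cosine similarities, cos θ = (u · v) / (‖u‖ ‖v‖). Below is a minimal Python sketch; the function names are hypothetical. The sketch rejects zero vectors, because the cosine is undefined there; that is a design choice, not something the text specifies.

```python
import math
from typing import Sequence, Tuple

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def question_and_standard_similarity(
    first: Sequence[float], question: Sequence[float], standard: Sequence[float]
) -> Tuple[float, float]:
    """(question similarity, standard similarity) for the first implicit vector."""
    return cosine_similarity(first, question), cosine_similarity(first, standard)

print(question_and_standard_similarity([3.0, 4.0], [3.0, 4.0], [4.0, -3.0]))  # (1.0, 0.0)
```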
In one embodiment, when the processor executes the computer program, obtaining the logistic regression model according to the object text information and the preset keywords includes: obtaining the keyword similarity between the preset keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
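One reading of this step treats the keyword similarities and vector similarities as the input feature vector of a logistic regression model, whose weights are then learned. The Python sketch below assumes that reading. It uses plain batch gradient descent on log-loss and trains on a hypothetical labelled set (1 = match, 0 = no match). The learning rate, the epoch count and all names are illustrative choices, not values from the text.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Matching degree in (0, 1) for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def fit(samples: Sequence[Sequence[float]], labels: Sequence[int],
        lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss."""
    n_features, m = len(samples[0]), len(samples)
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # d(log-loss)/dz for this sample
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

# Features: [keyword similarity, vector similarity]; values are hypothetical.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = fit(X, y)
print(predict(w, b, [0.85, 0.85]), predict(w, b, [0.15, 0.15]))
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written `fit`. The loop is spelled out here only to show what the model computes.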
In one embodiment, when the processor executes the computer program, obtaining the keyword similarity between the preset keywords and the object text information includes: calculating the information value of each keyword in a preset keyword library and selecting the keywords whose information value exceeds a preset threshold as the preset keywords; splitting the object text information into multiple object words and calculating the similarity between each preset keyword and the object words; and taking the maximum of these similarities as the keyword similarity.
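The text does not say how a keyword's information value is computed. Purely as an assumption, the Python sketch below uses inverse document frequency, log(N / df), over a small tokenised corpus as that value. It then keeps the keywords whose value exceeds the preset threshold. Only this keyword-selection step is shown. `idf_scores`, `select_preset_keywords` and the corpus are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def idf_scores(documents: Sequence[List[str]]) -> Dict[str, float]:
    """Assumed information value: log(N / df) for every word in the corpus."""
    n_docs = len(documents)
    doc_freq: Dict[str, int] = {}
    for doc in documents:
        for word in set(doc):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def select_preset_keywords(info_value: Dict[str, float], threshold: float) -> List[str]:
    """Keywords whose information value is strictly greater than the threshold."""
    return sorted(word for word, value in info_value.items() if value > threshold)

corpus = [["refund", "order"], ["order", "late"], ["order", "refund"], ["hello", "order"]]
values = idf_scores(corpus)
print(select_preset_keywords(values, threshold=1.0))  # ['hello', 'late']
```

A word that appears in every document gets an information value of zero under this assumption, so such a word is never selected.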
In one embodiment, a computer-readable non-volatile storage medium is provided, storing a computer program. When executed by a processor, the computer program implements the following steps: obtaining object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
In one embodiment, when the computer program is executed by the processor, converting the object text information into the first implicit feature vector includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; and inputting the object input vector into a preset autoencoder structure and extracting, from the preset autoencoder structure, the first implicit feature vector corresponding to the object input vector.
In one embodiment, when the computer program is executed by the processor, the reference text information includes question text information and standard text information corresponding to the object text information, and the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector. Converting the reference text information into the second implicit feature vector includes: inputting the question text information into the preset learning algorithm to obtain a question input vector; inputting the question input vector into the preset autoencoder structure and extracting the question implicit feature vector corresponding to the question input vector; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and inputting the standard input vector into the preset autoencoder structure and extracting the standard implicit feature vector corresponding to the standard input vector.
In one embodiment, when the computer program is executed by the processor, the following steps are further included after the step of obtaining the object text information and its corresponding reference text information: obtaining training feature vectors associated with the object text information; training multiple pre-stored autoencoder structures on the training feature vectors to obtain multiple trained autoencoder structures; and calculating the information loss of each trained autoencoder structure and selecting the one with the smallest information loss as the preset autoencoder structure.
In one embodiment, when the computer program is executed by the processor, the vector similarity includes a question similarity and a standard similarity. Calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes: calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity; and calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
In one embodiment, when the computer program is executed by the processor, obtaining the logistic regression model according to the object text information and the preset keywords includes: obtaining the keyword similarity between the preset keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
In one embodiment, when the computer program is executed by the processor, obtaining the keyword similarity between the preset keywords and the object text information includes: calculating the information value of each keyword in a preset keyword library and selecting the keywords whose information value exceeds a preset threshold as the preset keywords; splitting the object text information into multiple object words and calculating the similarity between each preset keyword and the object words; and taking the maximum of these similarities as the keyword similarity.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described. However, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Although they are described in relative detail, they should not be construed as limiting the scope of the patent. Those of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, and all of these fall within its protection scope. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (20)

  1. A text information matching degree detection method, the method comprising:
    obtaining object text information and its corresponding reference text information;
    converting the object text information into a first implicit feature vector according to a preset autoencoder structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector represents feature information of the object text information and the second implicit feature vector represents feature information of the reference text information;
    calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and
    obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
  2. The method according to claim 1, wherein converting the object text information into the first implicit feature vector according to the preset autoencoder structure comprises:
    inputting the object text information into a preset learning algorithm to obtain an object input vector; and
    inputting the object input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the first implicit feature vector corresponding to the object input vector.
  3. The method according to claim 1, wherein the reference text information comprises question text information and standard text information corresponding to the object text information; the second implicit feature vector comprises a question implicit feature vector and a standard implicit feature vector; and converting the reference text information into the second implicit feature vector comprises:
    inputting the question text information into a preset learning algorithm to obtain a question input vector;
    inputting the question input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the question implicit feature vector corresponding to the question input vector;
    inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and
    inputting the standard input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the standard implicit feature vector corresponding to the standard input vector.
  4. The method according to claim 1, further comprising, after the step of obtaining the object text information and its corresponding reference text information:
    obtaining training feature vectors associated with the object text information;
    training a plurality of pre-stored autoencoder structures according to the training feature vectors to obtain a plurality of trained autoencoder structures; and
    calculating the information loss of each trained autoencoder structure, and selecting the trained autoencoder structure with the smallest information loss as the preset autoencoder structure.
  5. The method according to claim 3, wherein the vector similarity comprises a question similarity and a standard similarity, and calculating the vector similarity between the first implicit feature vector and the second implicit feature vector comprises:
    calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity; and
    calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
  6. The method according to claim 1, wherein obtaining the logistic regression model according to the object text information and the preset keywords comprises:
    obtaining a keyword similarity between the preset keywords and the object text information; and
    setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
  7. The method according to claim 6, wherein obtaining the keyword similarity between the preset keywords and the object text information comprises:
    calculating the information value of each keyword in a preset keyword library, and selecting the keywords whose information value is greater than a preset threshold as the preset keywords;
    splitting the object text information into a plurality of object words, and calculating the similarity between the preset keywords and the object words; and
    selecting the maximum of the similarities as the keyword similarity.
  8. A text information matching degree detection apparatus, the apparatus comprising:
    a text information obtaining module, configured to obtain object text information and its corresponding reference text information;
    a text information conversion module, configured to convert the object text information into a first implicit feature vector according to a preset autoencoder structure, and to convert the reference text information into a second implicit feature vector, wherein the first implicit feature vector represents feature information of the object text information and the second implicit feature vector represents feature information of the reference text information;
    a vector similarity obtaining module, configured to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector; and
    a matching degree detection module, configured to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein when executing the computer program the processor implements a text information matching degree detection method comprising the following steps:
    obtaining object text information and its corresponding reference text information;
    converting the object text information into a first implicit feature vector according to a preset autoencoder structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector represents feature information of the object text information and the second implicit feature vector represents feature information of the reference text information;
    calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and
    obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
  10. The computer device according to claim 9, wherein converting the object text information into the first implicit feature vector according to the preset autoencoder structure comprises:
    inputting the object text information into a preset learning algorithm to obtain an object input vector; and
    inputting the object input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the first implicit feature vector corresponding to the object input vector.
  11. The computer device according to claim 10, wherein the reference text information comprises question text information and standard text information corresponding to the object text information; the second implicit feature vector comprises a question implicit feature vector and a standard implicit feature vector; and converting the reference text information into the second implicit feature vector comprises:
    inputting the question text information into a preset learning algorithm to obtain a question input vector;
    inputting the question input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the question implicit feature vector corresponding to the question input vector;
    inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and
    inputting the standard input vector into the preset autoencoder structure, and extracting from the preset autoencoder structure the standard implicit feature vector corresponding to the standard input vector.
  12. The computer device according to claim 9, wherein the method further comprises, after the step of obtaining the object text information and its corresponding reference text information:
    obtaining training feature vectors associated with the object text information;
    training a plurality of pre-stored autoencoder structures according to the training feature vectors to obtain a plurality of trained autoencoder structures; and
    calculating the information loss of each trained autoencoder structure, and selecting the trained autoencoder structure with the smallest information loss as the preset autoencoder structure.
  13. The computer device according to claim 11, wherein the vector similarity includes a question similarity and a standard similarity; and calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:
    calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity;
    calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
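Claim 13 reduces the vector similarity to two cosine values. A minimal sketch follows; the function names are hypothetical, and returning 0.0 for an all-zero vector is an assumed convention, since the cosine is undefined there.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vector_similarity(first_vec, question_vec, standard_vec):
    """Return (question similarity, standard similarity)."""
    return (cosine_similarity(first_vec, question_vec),
            cosine_similarity(first_vec, standard_vec))


q_sim, s_sim = vector_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0])
# q_sim is 1.0 up to floating-point rounding (same direction); s_sim is 0.0 (orthogonal).
```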
  14. The computer device according to claim 9, wherein obtaining a logistic regression model according to the object text information and preset keywords includes:
    acquiring a keyword similarity between a preset keyword and the object text information;
    setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
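Claim 14 does not say how the similarities become "parameters" of the initial regression model. One plausible reading, assumed in the sketch below, is that the question similarity, standard similarity and keyword similarity serve as the input features of an ordinary logistic regression whose weights are fitted on labeled matched/unmatched pairs; the sigmoid output is then read as the matching degree. The training data and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    dim, m = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i] / m
            grad_b += err / m
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def matching_degree(question_sim, standard_sim, keyword_sim, weights, bias):
    """Matching degree in [0, 1] from the three similarity features."""
    x = [question_sim, standard_sim, keyword_sim]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical labeled examples: [question_sim, standard_sim, keyword_sim] -> matched?
train_x = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
train_y = [1, 1, 0, 0]
weights, bias = fit_logistic_regression(train_x, train_y)
high = matching_degree(0.85, 0.85, 0.8, weights, bias)
low = matching_degree(0.15, 0.2, 0.25, weights, bias)
```

Keeping the keyword similarity as a separate feature lets the fitted weights decide how much exact keyword evidence should count relative to the autoencoder-based similarities.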
  15. The computer device according to claim 14, wherein acquiring the keyword similarity between the preset keyword and the object text information includes:
    calculating the information value of each keyword in a preset keyword library, and selecting the keywords whose information value is greater than a preset threshold as the preset keywords;
    splitting the object text information into a plurality of object words, and calculating the similarity between each preset keyword and each object word;
    selecting the maximum of these similarities as the keyword similarity.
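Claim 15 does not define "information value" or the word-level similarity measure. The sketch below assumes the information value (IV) familiar from credit scoring, computed for a binary "keyword present" feature over hypothetical texts labeled as matched (1) or unmatched (0), with additive smoothing to avoid division by zero. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in character-level similarity and splits text on whitespace; Chinese text would need a word segmenter instead. All function names are hypothetical.

```python
import math
from difflib import SequenceMatcher


def split_words(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def information_value(keyword, texts, labels, smoothing=0.5):
    """IV of the binary feature 'keyword occurs in text' against 0/1 labels:
    sum over the two bins of (p_pos - p_neg) * ln(p_pos / p_neg)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    present = [keyword in split_words(t) for t in texts]
    iv = 0.0
    for bin_value in (True, False):
        pos = sum(1 for p, y in zip(present, labels) if p == bin_value and y == 1)
        neg = sum(1 for p, y in zip(present, labels) if p == bin_value and y == 0)
        # Additive smoothing keeps both proportions strictly positive.
        p_pos = (pos + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg + smoothing) / (n_neg + 2 * smoothing)
        iv += (p_pos - p_neg) * math.log(p_pos / p_neg)
    return iv


def select_preset_keywords(keyword_library, texts, labels, threshold):
    """Keep the keywords whose information value exceeds the threshold."""
    return [k for k in keyword_library if information_value(k, texts, labels) > threshold]


def keyword_similarity(object_text, preset_keywords):
    """Maximum similarity between any preset keyword and any object word."""
    words = split_words(object_text)
    scores = [SequenceMatcher(None, k, w).ratio() for k in preset_keywords for w in words]
    return max(scores, default=0.0)


texts = ["reset password link", "password reset failed", "weather today sunny", "lunch menu today"]
labels = [1, 1, 0, 0]
presets = select_preset_keywords(["password", "link", "the"], texts, labels, threshold=1.0)
sim = keyword_similarity("how to reset my passwrd", presets)
```

With a threshold of 1.0 only "password" survives: it occurs in both matched texts and in no unmatched text, whereas "link" (IV about 0.54) and "the" (IV 0) are filtered out. The misspelled object word "passwrd" still scores 14/15 against it.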
  16. A computer-readable non-volatile storage medium storing a computer program which, when executed by a processor, implements a text information matching degree detection method comprising the following steps:
    obtaining object text information and its corresponding reference text information;
    converting the object text information into a first implicit feature vector according to a preset self-encoding structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector represents feature information of the object text information and the second implicit feature vector represents feature information of the reference text information;
    calculating a vector similarity between the first implicit feature vector and the second implicit feature vector;
    obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
  17. The non-volatile storage medium according to claim 16, wherein converting the object text information into the first implicit feature vector according to the preset self-encoding structure includes:
    inputting the object text information into a preset learning algorithm to obtain an object input vector;
    inputting the object input vector into the preset self-encoding structure, and extracting, from the preset self-encoding structure, the first implicit feature vector corresponding to the object input vector.
  18. The non-volatile storage medium according to claim 16, wherein the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and converting the reference text information into the second implicit feature vector includes:
    inputting the question text information into a preset learning algorithm to obtain a question input vector;
    inputting the question input vector into a preset self-encoding structure, and extracting, from the preset self-encoding structure, the question implicit feature vector corresponding to the question input vector;
    inputting the standard text information into the preset learning algorithm to obtain a standard input vector;
    inputting the standard input vector into the preset self-encoding structure, and extracting, from the preset self-encoding structure, the standard implicit feature vector corresponding to the standard input vector.
  19. The non-volatile storage medium according to claim 16, further comprising, after the step of obtaining the object text information and the corresponding reference text information:
    acquiring a training feature vector associated with the object text information;
    training a plurality of pre-stored self-encoding structures according to the training feature vector to obtain a plurality of trained self-encoding structures;
    calculating the information loss of each trained self-encoding structure, and selecting the trained self-encoding structure with the smallest information loss as the preset self-encoding structure.
  20. The non-volatile storage medium according to claim 18, wherein the vector similarity includes a question similarity and a standard similarity; and calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:
    calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity;
    calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
PCT/CN2019/103650 2019-06-27 2019-08-30 Text information matching degree detection method and apparatus, computer device and storage medium WO2020258506A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910569471.7 2019-06-27
CN201910569471.7A CN110413730A (en) 2019-06-27 2019-06-27 Text information matching degree detection method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020258506A1 (en) 2020-12-30

Family

ID=68359982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103650 WO2020258506A1 (en) 2019-06-27 2019-08-30 Text information matching degree detection method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110413730A (en)
WO (1) WO2020258506A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111180086B (en) * 2019-12-12 2023-04-25 平安医疗健康管理股份有限公司 Data matching method, device, computer equipment and storage medium
CN111191457B (en) * 2019-12-16 2023-09-15 浙江大搜车软件技术有限公司 Natural language semantic recognition method, device, computer equipment and storage medium
CN111401076B (en) * 2020-04-09 2023-04-25 支付宝(杭州)信息技术有限公司 Text similarity determination method and device and electronic equipment
CN113672694A (en) * 2020-05-13 2021-11-19 武汉Tcl集团工业研究院有限公司 Text processing method, terminal and storage medium
CN111737975A (en) * 2020-05-14 2020-10-02 平安科技(深圳)有限公司 Text connotation quality evaluation method, device, equipment and storage medium
CN111639161A (en) * 2020-05-29 2020-09-08 中国工商银行股份有限公司 System information processing method, apparatus, computer system and medium
CN112749252B (en) * 2020-07-14 2023-11-03 腾讯科技(深圳)有限公司 Text matching method and related device based on artificial intelligence
CN112597281A (en) * 2020-12-28 2021-04-02 中国农业银行股份有限公司 Information acquisition method and device
CN113836942B (en) * 2021-02-08 2022-09-20 宏龙科技(杭州)有限公司 Text matching method based on hidden keywords
CN112989784A (en) * 2021-03-04 2021-06-18 广州汇才创智科技有限公司 Text automatic scoring method and device based on twin neural network and electronic equipment
CN113157871B (en) * 2021-05-27 2021-12-21 宿迁硅基智能科技有限公司 News public opinion text processing method, server and medium applying artificial intelligence
CN113989859B (en) * 2021-12-28 2022-05-06 江苏苏宁银行股份有限公司 Fingerprint similarity identification method and device for anti-flashing equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870440A (en) * 2012-12-12 2014-06-18 中国移动通信集团广西有限公司 Text data processing method and device
CN108920654A * 2018-06-29 2018-11-30 泰康保险集团股份有限公司 Method and apparatus for semantic matching of question-and-answer text
CN109189931A * 2018-09-05 2019-01-11 腾讯科技(深圳)有限公司 Method and device for screening target sentences
CN109918663A * 2019-03-04 2019-06-21 腾讯科技(深圳)有限公司 Semantic matching method, device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829299B * 2018-11-29 2022-05-10 电子科技大学 Unknown attack identification method based on deep autoencoder
CN109871531A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Hidden feature extracting method, device, computer equipment and storage medium
CN109766428B (en) * 2019-02-02 2021-05-28 中国银行股份有限公司 Data query method and equipment and data processing method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343987A (en) * 2021-06-30 2021-09-03 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN113343987B (en) * 2021-06-30 2023-08-22 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN114003305A (en) * 2021-10-22 2022-02-01 济南浪潮数据技术有限公司 Device similarity calculation method, computer device, and storage medium
CN114003305B (en) * 2021-10-22 2024-03-15 济南浪潮数据技术有限公司 Device similarity calculation method, computer device, and storage medium
CN116188091A (en) * 2023-05-04 2023-05-30 品茗科技股份有限公司 Method, device, equipment and medium for automatic matching unit price reference of cost list
CN117195860A (en) * 2023-11-07 2023-12-08 品茗科技股份有限公司 Intelligent inspection method, system, electronic equipment and computer readable storage medium
CN117195860B (en) * 2023-11-07 2024-03-26 品茗科技股份有限公司 Intelligent inspection method, system, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110413730A (en) 2019-11-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19935477

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19935477

Country of ref document: EP

Kind code of ref document: A1