WO2020258506A1

WO2020258506A1 - Text information matching degree detection method and apparatus, computer device and storage medium

Info

Publication number: WO2020258506A1
Application number: PCT/CN2019/103650
Authority: WO
Inventors: 金戈; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-06-27
Filing date: 2019-08-30
Publication date: 2020-12-30
Also published as: CN110413730A

Abstract

A text information matching degree detection method and apparatus, a computer device and a storage medium. The method comprises: acquiring object text information and corresponding reference text information thereof; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and acquiring a logistic regression model according to the object text information and a preset keyword, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

Description

Text information matching degree detection method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 2019105694717, filed with the Chinese Patent Office on June 27, 2019 and entitled "Text information matching degree detection method, device, computer equipment and storage medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of computer technology, and in particular to a method, device, computer equipment and non-volatile storage medium for detecting matching degree of text information.

Background

Text matching degree refers to the degree of semantic relevance between different texts. Determining text matching degree is one of the core tasks of text mining and text retrieval. Therefore, how to better detect text matching degree has always been a problem of great concern to those skilled in the art.

The main method for detecting text matching degree in the prior art is: mapping the text to a vector in the word space, and calculating the Euclidean distance or the cosine distance between the vectors. The inventor realized that the existing text matching degree detection method only determines the text similarity in the word space, and does not consider the association and semantic information between text features, so the matching degree detection is not accurate enough.

Summary of the invention

The purpose of this application is to provide a text information matching degree detection method, device, computer equipment and readable non-volatile storage medium, so that the text information matching degree detection is more accurate.

In order to solve the above technical problems, the present application provides a method for detecting matching degree of text information. The method includes: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector according to a preset self-encoding structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In order to solve the above technical problems, this application also provides a text information matching degree detection device. The device includes: a text information acquisition module, used to acquire object text information and its corresponding reference text information; a text information conversion module, used to convert the object text information into a first implicit feature vector according to a preset self-encoding structure, and to convert the reference text information into a second implicit feature vector, wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information; a vector similarity acquisition module, used to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector; and a matching degree detection module, used to obtain a logistic regression model according to the object text information and preset keywords, and to input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In order to solve the above technical problems, the present application also provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor implements a method for detecting the matching degree of text information when executing the computer program. The method includes: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector according to a preset self-encoding structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In order to solve the above technical problems, the present application also provides a computer-readable non-volatile storage medium on which a computer program is stored. When the computer program is executed by a processor, a method for detecting matching degree of text information is implemented. The method includes: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector according to a preset self-encoding structure, and converting the reference text information into a second implicit feature vector, wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

The present application provides a text information matching degree detection method, device, computer equipment, and non-volatile storage medium. Inputting the vector similarity between the implicit semantic features of the object text information and the reference text information into the logistic regression model corresponding to the object text information can effectively improve the accuracy of text information matching degree detection.

Description of the drawings

FIG. 1 is an application environment diagram of a method for detecting matching degree of text information in an embodiment;

FIG. 2 is a schematic flowchart of a method for detecting matching degree of text information in an embodiment;

FIG. 3 is a schematic flowchart of a method for detecting matching degree of text information in another embodiment;

FIG. 4 is a structural block diagram of a text information matching degree detection device in an embodiment;

FIG. 5 is an internal structure diagram of a computer device in an embodiment.

Detailed description

The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals indicate the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, and are only used to explain the present application, and cannot be construed as a limitation to the present application.

Those skilled in the art can understand that, unless specifically stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of this application refers to the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Those skilled in the art can understand that, unless otherwise defined, all terms (including technical terms and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art, and unless specifically defined as here, they will not be interpreted in an idealized or overly formal sense.

The text information matching degree detection method provided in this application can be applied to the application environment shown in FIG. 1. The server in the figure can be implemented by a computer device. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide calculation and control capabilities. The database of the computer device is used to store the data involved in the detection of the matching degree of text information. The network interface of the computer device is used to communicate with an external terminal through a network connection. Specifically, the server obtains the object text information and its corresponding reference text information; the server converts the object text information into a first implicit feature vector, and converts the reference text information into a second implicit feature vector; the server calculates the vector similarity between the first implicit feature vector and the second implicit feature vector; the server obtains a logistic regression model according to the object text information and preset keywords, and inputs the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information. Those skilled in the art can understand that the "server" used herein can be implemented by an independent server or a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a method for detecting the matching degree of text information is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:

Step S201: Obtain object text information and its corresponding reference text information.

In this step, the object text information may be the answer text whose matching degree is to be detected; the reference text information may be the question text and the standard text corresponding to the answer text.

Taking text review as an example, the user’s answer to a question is the object text information, and the reference text information is the question and the standard answer corresponding to the question. Detecting the matching degree between the object text information and the reference text information is the process of determining the semantic relevance between the answer and both the question and the standard answer.

In an embodiment, after the step of obtaining the object text information and the corresponding reference text information in step S201, the method further includes:

A1. Obtain a training feature vector associated with the object text information.

A2. Train multiple pre-stored self-encoding structures according to the training feature vector to obtain multiple trained self-encoding structures.

In this step, text information can be transformed into implicit feature vectors through the self-encoding structure. The self-encoding structure (an autoencoder) is a kind of neural network that encodes the features input to it and then decodes them, such that the difference between the input and the output is minimized.

A3. Calculate the information loss of each trained self-encoding structure, and select the trained self-encoding structure with the smallest information loss as the preset self-encoding structure.

In a specific implementation, training a self-encoding structure is the process of minimizing the difference between its input and its output. The training feature vector is input into multiple different self-encoding structures, which differ in the number of hidden layers and in the number of units per hidden layer. The parameters of each self-encoding structure are adjusted to minimize the difference between its output and the training feature vector. The target self-encoding structure is then selected from the trained self-encoding structures according to the difference between the input and the output of each.
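The selection in steps A1 to A3 can be illustrated with a minimal sketch. It assumes NumPy, a single tanh hidden layer, mean squared reconstruction error as the "information loss", and candidates that differ only in the number of hidden units; the function names, hyperparameters and training data are hypothetical and are not specified in this application.

```python
import numpy as np


def train_autoencoder(X, hidden_units, epochs=1000, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder by plain gradient descent.

    Returns the learned parameters and the final mean squared
    reconstruction error (used here as the information loss).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, hidden_units))
    b_enc = np.zeros(hidden_units)
    W_dec = rng.normal(scale=0.1, size=(hidden_units, n_features))
    b_dec = np.zeros(n_features)

    for _ in range(epochs):
        H = np.tanh(X @ W_enc + b_enc)       # encode: implicit features
        X_hat = H @ W_dec + b_dec            # decode: linear reconstruction
        grad_out = 2.0 * (X_hat - X) / X.size
        # Compute every gradient before updating any parameter.
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc

    H = np.tanh(X @ W_enc + b_enc)
    loss = float(np.mean((H @ W_dec + b_dec - X) ** 2))
    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, loss


def select_autoencoder(X, candidate_hidden_units):
    """Train each candidate and keep the one with the smallest loss."""
    trained = {k: train_autoencoder(X, k) for k in candidate_hidden_units}
    best = min(trained, key=lambda k: trained[k][1])
    return best, trained


def encode(params, x):
    """Return the implicit feature vector of an input vector."""
    return np.tanh(x @ params["W_enc"] + params["b_enc"])


# Hypothetical training data: 12 count-like vectors with 7 features each.
X = np.random.default_rng(42).integers(0, 3, size=(12, 7)).astype(float)
best, trained = select_autoencoder(X, candidate_hidden_units=(2, 3, 4))
implicit_vector = encode(trained[best][0], X[0])
```

In practice the candidates could also differ in depth, and the loss would normally be measured on held-out data so that the largest structure does not win by default.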

Step S202: Convert the object text information into a first implicit feature vector, and convert the reference text information into a second implicit feature vector.

In this step, an implicit feature vector is the feature vector obtained when the self-encoding structure encodes its input. It retains a large amount of the information in the original input vector and is used to represent the feature information of the object text information or the reference text information that was input. The self-encoding structure decodes the implicit feature vector to restore the output features.

In an embodiment, for step S202, converting the object text information into a first implicit feature vector may include:

B1. Input the object text information into a preset learning algorithm to obtain an object input vector.

B2. Input the object input vector into a preset self-encoding structure, and extract the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure.

In this embodiment, the preset learning algorithm is an algorithm for converting text into a corresponding vector. For example, the object text information is converted into an object input vector in the form of bag-of-words features through the sklearn library in Python. Python is a computer programming language; sklearn, also known as scikit-learn, is a Python-based machine learning library that facilitates the implementation of machine learning and data mining algorithms, including classification, regression, clustering, dimensionality reduction, model selection and preprocessing.

For example, given text 1: "I like to eat apples, apples are rich in nutrition" and text 2: "I like to eat pears", the jieba library in Python is first used to segment the sentences into words. The sklearn library is then used to build the bag-of-words features (the features include "I", "like", "eat", "apple", "nutrition", "rich", and "pear"), and the feature values of each sample are determined by word frequency. The feature vector of text 1 is (1,1,1,2,1,1,0), and the feature vector of text 2 is (1,1,1,0,0,0,1). The jieba library is a Chinese word segmentation library for Python.
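The bag-of-words example above can be reproduced with a short sketch. To keep it self-contained it uses only the Python standard library and starts from already segmented tokens, which is an assumption made here; in the embodiment the segmentation is done by jieba and the feature construction by sklearn (for instance, scikit-learn's CountVectorizer could fill the same role). The helper names are hypothetical.

```python
from collections import Counter

# Tokens as a word segmenter might return them. They are written out by
# hand so that the example runs without a segmentation library.
text_1_tokens = ["I", "like", "eat", "apple", "apple", "nutrition", "rich"]
text_2_tokens = ["I", "like", "eat", "pear"]


def build_vocabulary(documents):
    """Collect the distinct tokens in order of first appearance."""
    vocabulary = []
    seen = set()
    for tokens in documents:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vocabulary = build_vocabulary([text_1_tokens, text_2_tokens])
vector_1 = bag_of_words(text_1_tokens, vocabulary)
vector_2 = bag_of_words(text_2_tokens, vocabulary)

print(vocabulary)  # ['I', 'like', 'eat', 'apple', 'nutrition', 'rich', 'pear']
print(vector_1)    # [1, 1, 1, 2, 1, 1, 0]
print(vector_2)    # [1, 1, 1, 0, 0, 0, 1]
```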

Further, the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector. In step S202, converting the reference text information into the second implicit feature vector includes:

B3. Input the question text information into a preset learning algorithm to obtain a question input vector; input the question input vector into the preset self-encoding structure, and extract the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure.

B4. Input the standard text information into a preset learning algorithm to obtain a standard input vector; input the standard input vector into the preset self-encoding structure, and extract the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure.

In this embodiment, the object text information and the reference text information are converted into an object input vector and a reference input vector, respectively, through a preset learning algorithm. The object input vector and the reference input vector are then input into the preset self-encoding structure, and the first implicit feature vector corresponding to the object input vector and the second implicit feature vector corresponding to the reference input vector are extracted from the self-encoding structure. In this way, the implicit semantic features of the object text information and the reference text information can be extracted effectively.

Step S203: Calculate the vector similarity between the first implicit feature vector and the second implicit feature vector.

In this step, vector similarity is usually calculated from the distance between two vectors: the closer the distance, the greater the similarity. The cosine similarity method can be used to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector.

In one embodiment, the vector similarity includes question similarity and standard similarity; in step S203, calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:

C1. Calculate the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity.

C2. Calculate the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.

The cosine similarity method evaluates the similarity of two vectors by calculating the cosine of the angle between them. The cosine of an angle of 0 degrees is 1, the cosine of any other angle is no greater than 1, and the minimum value is -1, so the cosine of the angle between two vectors indicates whether they point in roughly the same direction. When two vectors have the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; when they point in exactly opposite directions, it is -1. Cosine similarity is usually used in positive space, in which case the value lies between 0 and 1.
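A minimal sketch of steps C1 and C2 follows, using only the Python standard library. The three vectors are hypothetical stand-ins for the implicit feature vectors, and returning 0.0 for a zero vector is a convention chosen here, not something this application specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical implicit feature vectors.
first_implicit = [0.2, 0.7, 0.1]
question_implicit = [0.3, 0.6, 0.0]
standard_implicit = [0.1, 0.8, 0.2]

question_similarity = cosine_similarity(first_implicit, question_implicit)  # C1
standard_similarity = cosine_similarity(first_implicit, standard_implicit)  # C2
```

For vectors with only non-negative components, such as these, the result always falls between 0 and 1, which matches the positive-space case described above.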

Step S204: Obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In this step, the parameters of the logistic regression model are calculated from the object text information and the preset keywords; the vector similarity is then input into the logistic regression model, which outputs a matching degree value.

Taking text scoring as an example, a series of parameters are calculated according to the answer text given by the user and the preset keywords, and the corresponding logistic regression model is established based on the obtained parameters. The similarity between the answer text and the reference text is then input into the logistic regression model to obtain a matching score.

In the following, the process of obtaining the logistic regression model in this application is described with reference to FIG. 3 and specific embodiments. In one embodiment, obtaining a logistic regression model according to the object text information and preset keywords in step S204 includes:

S410: Acquire keyword similarity between a preset keyword and the object text information;

S420: Set the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
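One way to read steps S410 and S420 is sketched below. It assumes that the keyword similarities and the vector similarities are combined as inputs of a logistic function whose weights and bias come from the preset initial regression model; this interpretation, and every numeric value shown, is an assumption made for illustration and is not stated in this application.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matching_degree(features, weights, bias):
    """Logistic regression output for one feature vector, in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical inputs: two vector similarities and three keyword similarities.
question_similarity = 0.62
standard_similarity = 0.81
keyword_similarities = [0.90, 0.40, 0.75]
features = [question_similarity, standard_similarity] + keyword_similarities

# Hypothetical weights and bias of the preset initial regression model.
weights = [1.0, 2.0, 0.5, 0.5, 0.5]
bias = -2.0

score = matching_degree(features, weights, bias)
```

The weights would normally be fitted on answers that already carry a matching label; fixed values are used here only to keep the sketch short.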

In an embodiment, in step S410, acquiring the keyword similarity between the preset keyword and the object text information includes:

D1: Calculate the information value of each keyword in the preset keyword library, and select keywords whose information value is greater than a preset threshold as the preset keywords;

D2: Split the object text information to obtain multiple object words, and calculate the similarity between the preset keywords and the object words;

D3. Select the maximum value of the similarity as the keyword similarity.

When selecting keywords, a greater information value indicates that the keyword is better able to discriminate the semantic relevance of the object text information. For example, the ten keywords with the highest information value in the preset keyword library are selected. The similarity between each of these ten keywords and each of the object words is calculated, and for each keyword the highest similarity over the object words is kept, giving ten similarity values. These ten similarity values are used together with the vector similarity as parameters of the logistic regression model.
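Steps D1 to D3 can be sketched as follows. The information values are taken as given, because this application does not state how they are computed, and the word-level similarity uses the character-based ratio of Python's standard difflib module purely as a stand-in; a semantic measure such as word-embedding similarity could be substituted. All names and values are hypothetical.

```python
from difflib import SequenceMatcher


def select_keywords(information_values, threshold):
    """D1: keep the keywords whose information value exceeds the threshold."""
    return [kw for kw, iv in information_values.items() if iv > threshold]


def word_similarity(a, b):
    """Stand-in similarity between two words, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def keyword_similarities(keywords, object_words, similarity=word_similarity):
    """D2 and D3: for each keyword, the maximum similarity to any object word."""
    return [
        max((similarity(kw, word) for word in object_words), default=0.0)
        for kw in keywords
    ]


# Hypothetical keyword library with precomputed information values.
information_values = {"nutrition": 0.8, "apple": 0.6, "the": 0.01}
keywords = select_keywords(information_values, threshold=0.1)

# Object words obtained by splitting the object text information.
object_words = ["I", "like", "eat", "apple"]
similarities = keyword_similarities(keywords, object_words)
```

Each entry of `similarities` corresponds to one selected keyword, so ten selected keywords would yield the ten similarity values mentioned above.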

In the above method for detecting matching degree of text information, object text information and its corresponding reference text information are obtained; the object text information is converted into a first implicit feature vector, and the reference text information is converted into a second implicit feature vector; the vector similarity between the first implicit feature vector and the second implicit feature vector is calculated, which effectively extracts and matches the implicit semantic features of the object text information and the reference text information; a logistic regression model is obtained according to the object text information and preset keywords, and the vector similarity is input into the logistic regression model to obtain the matching degree between the object text information and the reference text information. Inputting the vector similarity between the implicit semantic features of the object text information and the reference text information into the logistic regression model corresponding to the object text information can effectively improve the accuracy of text information matching degree detection.

It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. They are also not necessarily executed sequentially, but may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.

In one of the embodiments, as shown in FIG. 4, a text information matching degree detection device is provided, and the device includes:

The text information obtaining module 401 is used to obtain object text information and its corresponding reference text information;

The text information conversion module 402 is configured to convert the object text information into a first implicit feature vector, and convert the reference text information into a second implicit feature vector;

The vector similarity acquisition module 403 is configured to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector;

The matching degree detection module 404 is configured to obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

For the specific definition of the text information matching degree detection device, please refer to the above definition of the text information matching degree detection method, which will not be repeated here. Each module in the above text information matching degree detection device can be implemented in whole or in part by software, hardware, or a combination thereof. The foregoing modules may be embedded in or independent of the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
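When the modules are implemented in software, their composition might look like the following sketch. The class name, the callable signatures and the toy stand-in functions are all hypothetical; the sketch only shows how modules 401 to 404 could hand data to one another and does not reproduce any particular encoder or regression model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class MatchingDegreeDetector:
    # Role of module 402: text -> implicit feature vector.
    encode: Callable[[str], Vector]
    # Role of module 403: similarity between two implicit feature vectors.
    similarity: Callable[[Vector, Vector], float]
    # Role of module 404: object text and similarities -> matching degree.
    score: Callable[[str, List[float]], float]

    def detect(self, object_text: str, reference_texts: Sequence[str]) -> float:
        """Role of module 401 is played by the caller supplying the texts."""
        object_vector = self.encode(object_text)
        similarities = [
            self.similarity(object_vector, self.encode(reference))
            for reference in reference_texts
        ]
        return self.score(object_text, similarities)


# Toy stand-ins so that the sketch runs end to end.
detector = MatchingDegreeDetector(
    encode=lambda text: [float(len(text))],
    similarity=lambda u, v: 1.0 if list(u) == list(v) else 0.0,
    score=lambda text, sims: sum(sims) / len(sims),
)
result = detector.detect("ab", ["cd", "efg"])
```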

In one embodiment, a server is provided. The server may be implemented by a computer device, and its internal structure diagram may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data involved in the detection of the matching degree of text information. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to realize a text information matching degree detection method.

Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.

In one embodiment, a computer device is provided, including a memory and a processor, and a computer program is stored in the memory. When the processor executes the computer program, the following steps are implemented: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In one of the embodiments, when the processor executes the computer program, acquiring the target self-encoding structure includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; inputting the object input vector into the preset self-encoding structure, and extracting the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure.

In one of the embodiments, when the processor executes the computer program, the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and converting the reference text information into a second implicit feature vector includes: inputting the question text information into a preset learning algorithm to obtain a question input vector; inputting the question input vector into the preset self-encoding structure, and extracting the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure; inputting the standard text information into a preset learning algorithm to obtain a standard input vector; inputting the standard input vector into the preset self-encoding structure, and extracting the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure.

In one of the embodiments, when the processor executes the computer program, after the step of obtaining the object text information and the corresponding reference text information, the method further includes: obtaining a training feature vector associated with the object text information; training multiple pre-stored self-encoding structures according to the training feature vector to obtain multiple trained self-encoding structures; calculating the information loss of each trained self-encoding structure, and selecting the trained self-encoding structure with the smallest information loss as the preset self-encoding structure.

In one of the embodiments, when the processor executes the computer program, the vector similarity includes question similarity and standard similarity; and calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes: calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity; calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.

In one of the embodiments, when the processor executes the computer program, said obtaining a logistic regression model according to the object text information and preset keywords includes: acquiring the keyword similarity between the preset keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
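One possible reading of this step, offered only as a sketch, is that the similarities serve as inputs to a logistic function whose output is the matching degree. The weights and bias below are hypothetical placeholders; the specification does not state how the preset initial regression model is parameterized or fitted.

```python
import math

def matching_degree(features, weights, bias):
    """Logistic function applied to a weighted sum of similarity features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature order: question similarity, standard similarity, keyword similarity
features = [0.35, 0.80, 0.90]
weights = [1.0, 2.0, 1.5]   # assumed values, for illustration only
bias = -2.0                 # assumed value, for illustration only
score = matching_degree(features, weights, bias)
```

The output lies strictly between 0 and 1, which makes it convenient to interpret as a degree of match.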

In one of the embodiments, when the processor executes the computer program, said acquiring the keyword similarity between the preset keywords and the object text information includes: calculating the information value of each keyword in a preset keyword library, and selecting keywords whose information value is greater than a preset threshold as the preset keywords; splitting the object text information to obtain multiple object words, and calculating the similarity between the preset keywords and the object words; and selecting the maximum value of the similarities as the keyword similarity.
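The keyword steps above can be sketched as follows. Several assumptions are made for illustration: the information values are taken as precomputed inputs, the text is split on whitespace (Chinese text would need a word segmenter), and word similarity is approximated with the standard library's `difflib.SequenceMatcher` ratio rather than whatever measure the applicant uses. All function names are hypothetical.

```python
from difflib import SequenceMatcher

def select_keywords(information_values, threshold):
    """Keep keywords whose information value exceeds the preset threshold."""
    return [kw for kw, iv in information_values.items() if iv > threshold]

def word_similarity(a, b):
    """Stand-in similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()

def keyword_similarity(object_text, preset_keywords):
    """Maximum similarity over all (preset keyword, object word) pairs."""
    object_words = object_text.split()  # assumption: whitespace tokenization
    scores = [
        word_similarity(kw, word)
        for kw in preset_keywords
        for word in object_words
    ]
    return max(scores, default=0.0)

# Usage with a hypothetical keyword library
library = {"refund": 0.8, "invoice": 0.4, "the": 0.01}
keywords = select_keywords(library, threshold=0.1)
similarity = keyword_similarity("please send my refund today", keywords)
```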

In one embodiment, a computer-readable non-volatile storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: acquiring object text information and its corresponding reference text information; converting the object text information into a first implicit feature vector, and converting the reference text information into a second implicit feature vector; calculating the vector similarity between the first implicit feature vector and the second implicit feature vector; and obtaining a logistic regression model according to the object text information and preset keywords, and inputting the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.

In one of the embodiments, when the computer program is executed by the processor, said converting the object text information into a first implicit feature vector includes: inputting the object text information into a preset learning algorithm to obtain an object input vector; and inputting the object input vector into a preset self-encoding structure, and extracting the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure.

In one of the embodiments, when the computer program is executed by the processor, the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and said converting the reference text information into a second implicit feature vector includes: inputting the question text information into a preset learning algorithm to obtain a question input vector; inputting the question input vector into a preset self-encoding structure, and extracting the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure; inputting the standard text information into the preset learning algorithm to obtain a standard input vector; and inputting the standard input vector into the preset self-encoding structure, and extracting the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure.

In one of the embodiments, when the computer program is executed by the processor, after the step of obtaining the object text information and its corresponding reference text information, the method further includes: obtaining a training feature vector associated with the object text information; training multiple pre-stored self-encoding structures according to the training feature vector to obtain multiple training self-encoding structures; and calculating the information loss of each training self-encoding structure, and selecting the training self-encoding structure with the smallest information loss as the preset self-encoding structure.

In one of the embodiments, when the computer program is executed by the processor, the vector similarity includes question similarity and standard similarity; and said calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes: calculating the cosine of the angle between the first implicit feature vector and the question implicit feature vector to obtain the question similarity; and calculating the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.

In one of the embodiments, when the computer program is executed by the processor, said obtaining a logistic regression model according to the object text information and preset keywords includes: acquiring the keyword similarity between the preset keywords and the object text information; and setting the keyword similarity and the vector similarity as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.

In one of the embodiments, when the computer program is executed by the processor, said acquiring the keyword similarity between the preset keywords and the object text information includes: calculating the information value of each keyword in a preset keyword library, and selecting keywords whose information value is greater than a preset threshold as the preset keywords; splitting the object text information to obtain multiple object words, and calculating the similarity between the preset keywords and the object words; and selecting the maximum value of the similarities as the keyword similarity.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when the computer program is executed, it may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For conciseness, not all possible combinations of the technical features in the above embodiments are described. However, as long as the combinations of these technical features do not contradict each other, they should be considered to be within the scope of this specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

A method for detecting matching degree of text information, the method comprising:

Obtain object text information and its corresponding reference text information;

The object text information is converted into a first implicit feature vector according to a preset self-encoding structure, and the reference text information is converted into a second implicit feature vector; wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information;

Calculating the vector similarity between the first implicit feature vector and the second implicit feature vector;

Obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
The method according to claim 1, wherein the converting the object text information into a first implicit feature vector according to a preset self-encoding structure comprises:

Input the object text information into a preset learning algorithm to obtain an object input vector;

The object input vector is input into the preset self-encoding structure, and the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure is extracted.
The method according to claim 1, wherein the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and said converting the reference text information into a second implicit feature vector includes:

Input the question text information into a preset learning algorithm to obtain a question input vector;

Inputting the question input vector into a preset self-encoding structure, and extracting the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure;

Input the standard text information into a preset learning algorithm to obtain a standard input vector;

The standard input vector is input into the preset self-encoding structure, and the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure is extracted.
The method according to claim 1, after the step of obtaining object text information and its corresponding reference text information, further comprising:

Acquiring a training feature vector associated with the object text information;

Training multiple pre-stored auto-encoding structures according to the training feature vector to obtain multiple training auto-encoding structures;

The information loss amount of each training self-encoding structure is calculated, and the training self-encoding structure with the smallest amount of information loss is selected as the preset self-encoding structure.
The method according to claim 3, wherein the vector similarity includes question similarity and standard similarity; and said calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:

Calculating the cosine value of the angle between the first implicit feature vector and the implicit feature vector of the question to obtain the similarity of the question;

Calculate the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
The method according to claim 1, wherein said obtaining a logistic regression model according to the object text information and preset keywords comprises:

Acquiring the keyword similarity between the preset keyword and the object text information;

The keyword similarity and the vector similarity are set as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
The method according to claim 6, wherein said obtaining the keyword similarity between the preset keyword and the object text information comprises:

Calculate the information value of each keyword in the preset keyword library, and select keywords with an information value greater than a preset threshold as the preset keyword;

Splitting the object text information to obtain multiple object words, and calculating the similarity between the preset keywords and the object words;

The maximum value of the similarity is selected as the keyword similarity.
A text information matching degree detection device, the device comprising:

The text information acquisition module is used to acquire object text information and its corresponding reference text information;

The text information conversion module is used to convert the object text information into a first implicit feature vector according to a preset self-encoding structure, and convert the reference text information into a second implicit feature vector; wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information;

A vector similarity acquisition module, configured to calculate the vector similarity between the first implicit feature vector and the second implicit feature vector;

The matching degree detection module is used to obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, a text information matching degree detection method is implemented, the text information matching degree detection method comprising the following steps:

Obtain object text information and its corresponding reference text information;

The object text information is converted into a first implicit feature vector according to a preset self-encoding structure, and the reference text information is converted into a second implicit feature vector; wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information;

Calculating the vector similarity between the first implicit feature vector and the second implicit feature vector;

Obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
The computer device according to claim 9, wherein the converting the object text information into a first implicit feature vector according to a preset self-encoding structure comprises:

Input the object text information into a preset learning algorithm to obtain an object input vector;

The object input vector is input into the preset self-encoding structure, and the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure is extracted.
The computer device according to claim 10, wherein the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and said converting the reference text information into a second implicit feature vector includes:

Input the question text information into a preset learning algorithm to obtain a question input vector;

Inputting the question input vector into a preset self-encoding structure, and extracting the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure;

Input the standard text information into a preset learning algorithm to obtain a standard input vector;

The standard input vector is input into the preset self-encoding structure, and the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure is extracted.
The computer device according to claim 9, after the step of obtaining the object text information and the corresponding reference text information, further comprising:

Acquiring a training feature vector associated with the object text information;

Training multiple pre-stored auto-encoding structures according to the training feature vector to obtain multiple training auto-encoding structures;

The information loss amount of each training self-encoding structure is calculated, and the training self-encoding structure with the smallest amount of information loss is selected as the preset self-encoding structure.
The computer device according to claim 11, wherein the vector similarity includes question similarity and standard similarity; and said calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:

Calculating the cosine value of the angle between the first implicit feature vector and the implicit feature vector of the question to obtain the similarity of the question;

Calculate the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.
The computer device according to claim 9, wherein said obtaining a logistic regression model according to the object text information and preset keywords comprises:

Acquiring the keyword similarity between the preset keyword and the object text information;

The keyword similarity and the vector similarity are set as parameters of a preset initial regression model to obtain the logistic regression model corresponding to the object text information.
The computer device according to claim 14, wherein said obtaining the keyword similarity between the preset keyword and the object text information comprises:

Calculate the information value of each keyword in the preset keyword library, and select keywords with an information value greater than a preset threshold as the preset keyword;

Splitting the object text information to obtain multiple object words, and calculating the similarity between the preset keywords and the object words;

The maximum value of the similarity is selected as the keyword similarity.
A computer-readable non-volatile storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, a text information matching degree detection method is implemented, the text information matching degree detection method comprising the following steps:

Obtain object text information and its corresponding reference text information;

The object text information is converted into a first implicit feature vector according to a preset self-encoding structure, and the reference text information is converted into a second implicit feature vector; wherein the first implicit feature vector is used to represent feature information of the object text information, and the second implicit feature vector is used to represent feature information of the reference text information;

Calculating the vector similarity between the first implicit feature vector and the second implicit feature vector;

Obtain a logistic regression model according to the object text information and preset keywords, and input the vector similarity into the logistic regression model to obtain the matching degree between the object text information and the reference text information.
The non-volatile storage medium according to claim 16, wherein said converting said object text information into a first implicit feature vector according to a preset self-encoding structure comprises:

Input the object text information into a preset learning algorithm to obtain an object input vector;

The object input vector is input into the preset self-encoding structure, and the first implicit feature vector corresponding to the object input vector in the preset self-encoding structure is extracted.
The non-volatile storage medium according to claim 16, wherein the reference text information includes question text information and standard text information corresponding to the object text information; the second implicit feature vector includes a question implicit feature vector and a standard implicit feature vector; and said converting the reference text information into a second implicit feature vector includes:

Input the question text information into a preset learning algorithm to obtain a question input vector;

Inputting the question input vector into a preset self-encoding structure, and extracting the question implicit feature vector corresponding to the question input vector in the preset self-encoding structure;

Input the standard text information into a preset learning algorithm to obtain a standard input vector;

The standard input vector is input into the preset self-encoding structure, and the standard implicit feature vector corresponding to the standard input vector in the preset self-encoding structure is extracted.
The non-volatile storage medium according to claim 16, after the step of obtaining the object text information and the corresponding reference text information, further comprising:

Acquiring a training feature vector associated with the object text information;

Training multiple pre-stored auto-encoding structures according to the training feature vector to obtain multiple training auto-encoding structures;

The information loss amount of each training self-encoding structure is calculated, and the training self-encoding structure with the smallest amount of information loss is selected as the preset self-encoding structure.
The non-volatile storage medium according to claim 18, wherein the vector similarity includes question similarity and standard similarity; and said calculating the vector similarity between the first implicit feature vector and the second implicit feature vector includes:

Calculating the cosine value of the angle between the first implicit feature vector and the implicit feature vector of the question to obtain the similarity of the question;

Calculate the cosine of the angle between the first implicit feature vector and the standard implicit feature vector to obtain the standard similarity.