CN112445900A

CN112445900A - Quick retrieval method and system

Info

Publication number: CN112445900A
Application number: CN201910809541.1A
Authority: CN
Inventors: 李霞; 陈怡�; 刘凤余; 王驹冬
Original assignee: Shanghai Zhuofan Information Technology Co ltd
Current assignee: Shanghai Zhuofan Information Technology Co ltd
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2021-03-05

Abstract

A quick retrieval method and system, comprising: S1, preprocessing the user question texts to be retrieved and performing text quantization; S2, constructing an n-dimensional space according to the quantization result; S3, randomly selecting n points in the n-dimensional space, and constructing, based on the n points, a segmentation hyperplane of the n-dimensional space and a plane perpendicular to the segmentation hyperplane; S4, repeating step S3 until m points remain in the segmented region; S5, constructing a binary tree and establishing a binary tree index; S6, performing word segmentation, stop-word removal and text quantization on the search text; S7, traversing the binary tree down to a leaf node to obtain text data similar to the search text, calculating the similarity between the search text and each item of similar text data, and selecting the similar text data with the highest similarity; and S8, searching the database based on the similar text data with the highest similarity to obtain its final answer, which is used as the answer to the search text.

Description

Quick retrieval method and system

Technical Field

The invention belongs to the field of natural language processing information retrieval, and particularly relates to a quick retrieval method and a quick retrieval system.

Background

With the rapid development of internet technology, data has become an important carrier for information dissemination. In the field of man-machine conversation, traditional methods for retrieving from massive data have high time complexity and clearly cannot meet the needs of scenarios with strict real-time requirements, so constructing a rapid retrieval method is very important.

Disclosure of Invention

In view of the problems and shortcomings of the prior art, the invention provides a novel rapid retrieval method and system.

The invention solves the technical problems through the following technical scheme:

The invention provides a quick retrieval method, characterized by comprising the following steps:

S1, preprocessing massive user question texts to be retrieved, and performing text quantization representation on the preprocessed user question texts, wherein the preprocessing comprises word segmentation and stop-word removal;

S2, constructing an n-dimensional space according to the quantization result after text quantization, wherein n ≥ 100;

S3, randomly selecting n points in the n-dimensional space, and constructing, based on the n points, a segmentation hyperplane of the n-dimensional space and a plane perpendicular to the segmentation hyperplane;

S4, repeating step S3 until m points are left in the divided region, wherein m is less than or equal to n;

S5, constructing a binary tree and establishing a binary tree index;

S6, performing word segmentation, stop-word removal and text quantization representation on the search text;

S7, traversing the binary tree down to a leaf node to obtain text data similar to the search text, calculating the similarity between the search text and each item of similar text data, and selecting the similar text data with the highest similarity;

S8, searching the database based on the similar text data with the highest similarity to obtain the final answer of that similar text data, wherein the final answer is used as the answer to the search text.
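The splitting and traversal described in steps S3 to S5 and S7 can be illustrated with a minimal Python sketch. It is not the construction claimed here: for brevity it assumes a common random-projection-tree simplification in which each hyperplane is the perpendicular bisector of two randomly chosen points, rather than a hyperplane built from n points. The names `build_tree`, `find_leaf` and `leaf_size` (standing in for m) are hypothetical, and the demo uses 8 dimensions instead of n ≥ 100 to stay small.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_tree(points, ids, leaf_size, rng):
    """Recursively split `ids` until a region holds at most `leaf_size` points."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Assumption: split on the perpendicular bisector of two random points.
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, points[i]) < offset]
    right = [i for i in ids if dot(normal, points[i]) >= offset]
    if not left or not right:  # degenerate split (e.g. duplicate points)
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, leaf_size, rng),
        "right": build_tree(points, right, leaf_size, rng),
    }


def find_leaf(tree, query):
    """Walk from the root to a leaf, choosing a side at every hyperplane."""
    while "leaf" not in tree:
        side = "left" if dot(tree["normal"], query) < tree["offset"] else "right"
        tree = tree[side]
    return tree["leaf"]


# Demo with synthetic data (not from the patent).
rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
tree = build_tree(points, list(range(200)), leaf_size=8, rng=rng)
candidates = find_leaf(tree, points[42])  # ids sharing a leaf with point 42
```

Only the ids returned by `find_leaf` need a full similarity computation, which is where the speed-up over scanning every stored vector would come from.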

Preferably, in step S1, Word2vec is used to perform text quantization on the preprocessed user question text.
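The text does not state how per-word Word2vec vectors are combined into one vector per question. A minimal sketch, assuming the frequently used approach of averaging the word vectors, might look as follows; the function `sentence_vector` and the three-dimensional `toy_embeddings` table are hypothetical stand-ins for a trained Word2vec model.

```python
def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy table standing in for a trained Word2vec model (assumption).
toy_embeddings = {
    "reset": [1.0, 0.0, 0.0],
    "password": [0.0, 1.0, 0.0],
    "account": [0.0, 0.0, 1.0],
}

vec = sentence_vector(["reset", "password"], toy_embeddings, dim=3)
```

In a real system the table would have n ≥ 100 dimensions, matching the space constructed in step S2.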

Preferably, in step S8, the database is a MySQL database.
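The final lookup of step S8 can be sketched as a single keyed query. To keep the example self-contained, it uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table name `qa`, its columns and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qa (question TEXT PRIMARY KEY, answer TEXT)")
conn.executemany(
    "INSERT INTO qa VALUES (?, ?)",
    [
        ("how do i reset my password", "Use the 'Forgot password' link."),
        ("how do i close my account", "Contact customer support."),
    ],
)


def lookup_answer(conn, question):
    """Return the stored answer for the best-matching question, or None."""
    row = conn.execute(
        "SELECT answer FROM qa WHERE question = ?", (question,)
    ).fetchone()
    return row[0] if row else None


answer = lookup_answer(conn, "how do i reset my password")
```

With a MySQL driver the query itself would be the same, although the parameter placeholder is typically `%s` rather than `?`.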

The invention also provides a quick retrieval system, characterized by comprising a processing quantization module, a space construction module, a plane construction module, a calling module, an index construction module, a preprocessing module, a calculation module and a retrieval module;

the processing quantization module is used for preprocessing massive user question texts to be retrieved and performing text quantization representation on the preprocessed user question texts, wherein the preprocessing comprises word segmentation and stop-word removal;

the space construction module is used for constructing an n-dimensional space according to the quantization result after text quantization, wherein n ≥ 100;

the plane construction module is used for randomly selecting n points in the n-dimensional space and constructing, based on the n points, a segmentation hyperplane of the n-dimensional space and a plane perpendicular to the segmentation hyperplane;

the calling module is used for repeatedly calling the plane construction module until m points are left in the divided region, wherein m is less than or equal to n;

the index construction module is used for constructing a binary tree and establishing a binary tree index;

the preprocessing module is used for performing word segmentation, stop-word removal and text quantization representation on the search text;

the calculation module is used for traversing the binary tree down to a leaf node to obtain text data similar to the search text, calculating the similarity between the search text and each item of similar text data, and selecting the similar text data with the highest similarity;

the retrieval module is used for searching the database based on the similar text data with the highest similarity to obtain the final answer of that similar text data, and the final answer is used as the answer to the search text.

Preferably, the processing quantization module is configured to perform text quantization on the preprocessed user question text by using Word2vec.

Preferably, the database is a MySQL database.

On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the invention.

The beneficial effects of the invention are as follows:

For massive data retrieval, the invention improves retrieval speed by constructing a tree structure.
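A rough cost comparison shows where the speed-up would come from. The estimate below is not given in the source; it assumes N stored vectors of dimension n, leaves holding at most m points, and roughly balanced splits, so that the tree has about log2(N/m) levels. A linear scan compares the query with every stored vector, whereas a tree search performs one hyperplane test per level and then compares the query only with the points in one leaf.

```latex
T_{\text{scan}} = O(N \cdot n), \qquad
T_{\text{tree}} = O\!\left(n \log_2 \frac{N}{m} + m \cdot n\right)
```

For example, with N = 10^6 and n = m = 100, log2(N/m) is about 13.3, so a query would need roughly 14 hyperplane tests and 100 similarity computations instead of 10^6 similarity computations.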

Drawings

FIG. 1 is a flowchart illustrating a fast search method according to a preferred embodiment of the present invention.

FIG. 2 is a block diagram of a fast search system according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

As shown in fig. 1, the embodiment provides a fast search method, which includes the following steps:

Step 101, preprocessing massive user question texts to be retrieved, and performing text quantization representation on the preprocessed user question texts by using Word2vec, wherein the preprocessing comprises word segmentation and stop-word removal.

Step 102, constructing an n-dimensional space according to the quantization result after text quantization, wherein n ≥ 100.

Step 103, randomly selecting n points in the n-dimensional space, and constructing, based on the n points, a segmentation hyperplane of the n-dimensional space and a plane perpendicular to the segmentation hyperplane.

Step 104, repeatedly executing step 103 until m points are left in the divided region, wherein m is less than or equal to n.

Step 105, constructing a binary tree and establishing a binary tree index.

Step 106, performing word segmentation, stop-word removal and text quantization representation on the search text.

Step 107, traversing the binary tree down to a leaf node to obtain text data similar to the search text, calculating the similarity between the search text and each item of similar text data, and selecting the similar text data with the highest similarity.

Step 108, searching the MySQL database based on the similar text data with the highest similarity to obtain the final answer of that similar text data, wherein the final answer is used as the answer to the search text.
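Steps 106 and 107 can be sketched in a few lines of Python. The source names neither a segmenter nor a similarity measure, so this sketch assumes whitespace tokenization with a small hypothetical stop-word list (Chinese text would need a dedicated word segmenter) and cosine similarity for ranking the candidates found in the leaf. All names and the two-dimensional vectors are illustrative only.

```python
import math

# Hypothetical stop-word list; a real system would load a full one.
STOP_WORDS = {"how", "do", "i", "my", "the", "a", "to"}


def preprocess(text):
    """Lower-case, split on whitespace and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def best_match(query_vec, candidates):
    """candidates maps each stored question in the leaf to its vector."""
    return max(candidates, key=lambda text: cosine(query_vec, candidates[text]))


tokens = preprocess("How do I reset my password")
leaf = {"reset password": [0.9, 0.2], "close account": [0.1, 1.0]}
winner = best_match([1.0, 0.1], leaf)
```

The text returned by `best_match` is what step 108 would then use as the key for the database lookup.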

As shown in fig. 2, the embodiment further provides a fast retrieval system, which includes a processing quantization module 1, a space construction module 2, a plane construction module 3, a calling module 4, an index construction module 5, a preprocessing module 6, a calculation module 7, and a retrieval module 8.

The processing quantization module 1 is used for preprocessing massive user question texts to be retrieved, and performing text quantization representation on the preprocessed user question texts by using Word2vec, wherein the preprocessing comprises word segmentation and stop-word removal.

The space construction module 2 is configured to construct an n-dimensional space according to the quantization result after text quantization, where n ≥ 100.

The plane construction module 3 is used for randomly selecting n points in the n-dimensional space, and constructing, based on the n points, a segmentation hyperplane of the n-dimensional space and a plane perpendicular to the segmentation hyperplane.

The calling module 4 is used for repeatedly calling the plane construction module 3 until m points are left in the divided region, wherein m is less than or equal to n.

The index construction module 5 is used for constructing a binary tree and establishing a binary tree index.

The preprocessing module 6 is used for performing word segmentation, stop-word removal and text quantization representation on the search text.

The calculation module 7 is used for traversing the binary tree down to a leaf node to obtain text data similar to the search text, calculating the similarity between the search text and each item of similar text data, and selecting the similar text data with the highest similarity.

The retrieval module 8 is configured to search the MySQL database based on the similar text data with the highest similarity to obtain the final answer of that similar text data, and the final answer is used as the answer to the search text.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

1. A quick retrieval method, characterized by comprising the following steps:

S5, constructing a binary tree and establishing a binary tree index;

2. The quick retrieval method of claim 1, wherein in step S1, the preprocessed user question text is text-quantized using Word2vec.

3. The quick retrieval method of claim 1, wherein in step S8, the database is a MySQL database.

4. A quick retrieval system is characterized by comprising a processing quantization module, a space construction module, a plane construction module, a calling module, an index construction module, a preprocessing module, a calculation module and a retrieval module;

5. The quick retrieval system of claim 4, wherein the processing quantization module is adapted to perform text quantization on the preprocessed user question text using Word2vec.

6. The quick retrieval system of claim 4, wherein the database is a MySQL database.