CN109977334B

CN109977334B - Search speed optimization method

Info

Publication number: CN109977334B
Application number: CN201910231353.5A
Authority: CN
Inventors: 潘杰; 曹建军
Original assignee: Zhejiang Duyan Information Technology Co ltd
Current assignee: Zhejiang Duyan Information Technology Co ltd
Priority date: 2019-03-26
Filing date: 2019-03-26
Publication date: 2023-10-20
Anticipated expiration: 2039-03-26
Also published as: CN109977334A

Abstract

The application provides a search speed optimization method in which a plurality of corresponding storage files is established for each search item, and each piece of data under the search item is written into the corresponding storage file by a plurality of corresponding threads; for search items whose data follow a regular pattern, the corresponding storage files are combined into one storage file in which the data are arranged in order; and searching is performed on the combined storage file according to the input search condition. Writing data with multiple threads and multiple files improves writing efficiency, while combining the storage files into one ordered file for such search items, and searching on the combined file, preserves search efficiency.

Description

Search speed optimization method

Technical Field

The application belongs to the field of information retrieval, and particularly relates to a retrieval speed optimization method.

Background

With the advent of the big data age, how to use search engines to quickly and accurately retrieve the data needed from huge databases has become a research hotspot. Currently, a search engine generally comprises four main parts, namely capturing, recording, screening and displaying, and conventional search engines suffer from both low writing efficiency and low searching efficiency.

Disclosure of Invention

The application provides a search speed optimization method, which aims to solve the problem of low writing and searching efficiency of the existing search engine.

According to a first aspect of an embodiment of the present application, there is provided a search speed optimization method, including:

for each search item, establishing a plurality of corresponding storage files, and writing each piece of data under the search item into the corresponding storage files by adopting a plurality of corresponding threads;

for search items whose data follow a regular pattern, combining the plurality of corresponding storage files into one storage file and arranging the data in order in the combined storage file; and searching according to the input search condition based on the combined storage file.

In an optional implementation manner, for a search term in text format, word segmentation processing is performed on first text data stored under the search term to generate a plurality of first words, word segmentation processing is performed on second text data in an input search condition to generate a plurality of second words, and for each second word, matching search is performed on the second word and the plurality of first words, so that first text data matched with the second text data in the search condition is obtained.

In another alternative implementation, after the search condition is input, field condition combination or field logic value combination is performed on the search condition;

and searching according to the combined searching conditions.

In another alternative implementation, for a search item that occupies little space, the data under the search item is preloaded into memory prior to searching.

In another alternative implementation, the storage format type of the data under each search term is set according to the characteristics of the data under the search term.

In another optional implementation manner, when writing a complete piece of data information, for the data of that piece of data information under each search item, it is first judged whether an idle thread exists under the search item; if so, any idle thread is called, the data of the data information under the search item is written into the storage file corresponding to the called thread, a delimiter is written after the data writing is completed, and the called thread is set to idle.

In another alternative implementation manner, for each search item allowing a longer input field, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to each standby thread is established; the total number of threads is larger than for a search item allowing a shorter input field length;

when writing a complete piece of data information, for the data of the data information under each search item, it is first judged whether an idle main thread exists under the search item; if so, any idle main thread is called, the data of the data information under the search item is written into the storage file corresponding to the called main thread, a delimiter is written after the data writing is finished, and the called main thread is set to idle; if no idle main thread exists under the search item, a write request is inserted at the tail of the corresponding queue and a corresponding timer is started; it is judged whether the timer exceeds the corresponding duration, and if so, an idle standby thread is started, the standby thread writes the data of the data information under the search item into the corresponding temporary storage file, the delimiter is written after the data is written, and the standby thread is set to idle.

In another alternative implementation manner, the number of standby threads and/or the corresponding duration are/is adjusted to regulate the writing duration of the whole piece of data information, so that the updating speed of the database is regulated.

In another alternative implementation manner, an address association table is established for each piece of data information, the address association table comprises storage address information of data of the piece of data information under each search item, and an index table is established for each search item, the index table comprises storage address information of data of each piece of data information under the search item and a pointing address corresponding to each storage address information, the storage address information comprises a storage file address of corresponding data and an address of the data under a storage file of the data, and the pointing address is used for pointing to a position of a first storage address information in the address association table corresponding to the storage address information of the data;

when searching under the corresponding search item, searching according to the input search condition based on the data stored in the storage file under the search item; when the search result is displayed, firstly, the storage address information of the data matched with the input search condition is determined, then the pointing address corresponding to the determined storage address information is searched from the index table corresponding to the search item, the position of the first storage address information in the address association table corresponding to the determined storage address information is located according to the searched pointing address, the address association table corresponding to the determined storage address information is obtained, and the storage address of the data information under each search item is obtained, so that complete data information is obtained.

In another optional implementation manner, the search items include a name, an age, a self-introduction, a province, a deposit and a birth date, wherein the storage format type of the name is set as a text type, the storage format type of the age is set as an int type, the storage format type of the self-introduction is set as a text type, the storage format type of the province is set as a keyword type, the storage format type of the deposit is set as a long type, and the storage format type of the birth date is set as a date type.

The beneficial effects of the application are as follows:

1. according to the application, by adopting a multithreading multi-file writing mode, data corresponding to the search item in the data information is written into the corresponding storage file, and even if the field length occupied by the data corresponding to the search item in each piece of data information is uncertain, parallel writing of the corresponding data in a plurality of pieces of data information can be realized, so that the writing efficiency can be improved; by combining the corresponding storage files into one storage file for search items whose data follow a regular pattern, and arranging the data in order in the combined storage file, the writing efficiency gained from the multi-thread multi-file writing mode is kept while the search efficiency is ensured;

2. according to the application, word segmentation processing is carried out on the text data stored under a text-format search item and on the text data input in the search condition, and the two segmented texts are matched word by word rather than character by character, so that the search efficiency can be greatly improved;

3. the application can improve the searching efficiency by carrying out the field condition combination or the field logic value combination on the searching condition and then searching;

4. according to the application, aiming at the retrieval items with less space occupation, the data under the retrieval items are preloaded into the memory before retrieval, so that the retrieval efficiency can be further improved;

5. the application can ensure the accuracy of the search by accurately setting the format type of the data of each search item;

6. according to the application, aiming at the search items with longer allowed input field length in each search item, a standby thread and a temporary storage file are additionally arranged, when the writing data does not have an idle main thread, a timer is additionally arranged, and when the writing request exceeds the corresponding duration, the idle standby thread is called to write the data into the temporary storage file, so that the data writing time length of the search items with longer allowed input field length can be dynamically adjusted, the writing time length of the whole piece of data information is controllable, and a designer can conveniently regulate and control the updating speed of a database;

7. according to the application, the address association table is established for each piece of data information, and the index table is established for each search item, so that the data of the data information stored under each search item can be accurately associated, and the complete data information can be quickly found out when the data information is displayed, thereby improving the accuracy and speed of search.

Drawings

FIG. 1 is a flow chart of one embodiment of the search speed optimization method of the present application.

Detailed Description

In order to better understand the technical solution in the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solution in the embodiments of the present application is described in further detail below with reference to the accompanying drawings.

In the description of the present application, unless otherwise specified and defined, it should be noted that the term "connected" should be interpreted broadly, and for example, it may be a mechanical connection or an electrical connection, or may be a connection between two elements, or may be a direct connection or may be an indirect connection through an intermediary, and it will be understood to those skilled in the art that the specific meaning of the term may be interpreted according to the specific circumstances.

Referring to FIG. 1, a flowchart of one embodiment of the search speed optimization method of the present application is shown. The method is applied to a processing device (such as a computer or a server), the processing device carries out the search speed optimization, and the method can comprise the following steps:

step S101, establishing a plurality of corresponding storage files for each search item, and writing each data under the search item into the corresponding storage file by adopting a plurality of corresponding threads.

In a conventional search engine, when the acquired data information is stored item by item according to the search items, generally only one thread is used and only one storage file is built for each search item. The data corresponding to the search item in the next piece of data information can be written only after the thread has written the data corresponding to the search item in the previous piece of data information into the storage file, so the writing efficiency of the data information is low. In addition, since the field length of the data corresponding to a given search item may differ from one piece of data information to another, even if multi-thread writing is adopted with a single file, the next thread must wait until the previous thread has finished writing the corresponding data of the previous piece of data information before it can write the corresponding data of the next piece; if writing of the next piece started before the previous piece was completely written, the data would be scrambled. Multi-thread writing alone therefore cannot improve the writing efficiency of the data information.

Therefore, the application proposes a multi-thread multi-file writing mode. Taking the search item age as an example, the data information comprises age data and 5 storage files are established for the search item age; correspondingly, 5 threads can each store the age data of one of 5 pieces of data information into its corresponding storage file, so that the age data of 5 pieces of data information is stored in parallel and the writing efficiency is improved. By writing the data corresponding to the search item in the data information into the corresponding storage file in a multithreading multi-file writing mode, parallel writing of the corresponding data in a plurality of pieces of data information can be realized even if the field length of the data corresponding to the search item in each piece of data information is uncertain, thereby improving the writing efficiency.
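The multi-thread multi-file writing mode can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the file naming scheme and the newline delimiter are assumptions made for illustration, and the values are assigned to threads statically rather than by calling whichever thread is idle.

```python
import os
import tempfile
import threading

def write_item_parallel(values, num_files, directory, item="age", sep="\n"):
    """Write the values of one search item using one thread per storage file."""
    paths = [os.path.join(directory, f"{item}_{i}.dat") for i in range(num_files)]

    def worker(path, chunk):
        # Each thread owns exactly one file, so writes never interleave.
        with open(path, "w", encoding="utf-8") as f:
            for v in chunk:
                f.write(str(v) + sep)  # a delimiter follows every value

    threads = [
        threading.Thread(target=worker, args=(p, values[i::num_files]))
        for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths

def read_values(path, sep="\n"):
    """Read back the integer values stored in one storage file."""
    with open(path, encoding="utf-8") as f:
        return [int(x) for x in f.read().split(sep) if x]

# Hypothetical age data written into 5 storage files by 5 threads.
with tempfile.TemporaryDirectory() as d:
    paths = write_item_parallel([2, 3, 1, 5, 5, 6, 2, 7, 1, 2], 5, d)
    stored = sorted(v for p in paths for v in read_values(p))
```

Because every thread writes to its own file, no thread has to wait for another to finish, whatever the length of the value it is writing.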

Step S102, for search items whose data follow a regular pattern, combining the plurality of corresponding storage files into one storage file and arranging the data in order in the combined storage file; and searching according to the input search condition based on the combined storage file.

While multithreaded multi-file writing may improve writing efficiency, it may reduce retrieval efficiency. For example, suppose the 5 storage files holding ages contain "2,3", "1,5", "5,6", "2,7" and "1,2" respectively. To query the maximum age, the maximum age in each of the 5 storage files must first be found, and the 5 maxima must then be compared to obtain the required maximum age. For this reason, for search items whose data follow a regular pattern, the application combines the storage files into one storage file at search time and arranges the data in order in the combined storage file. Such search items can include name, age, province, deposit, birth date and the like: names and provinces can be ordered by the pinyin initial of the first character, ages and deposits can be ordered from small to large, and birth dates can be ordered by date. Taking the 5 age storage files "2,3", "1,5", "5,6", "2,7" and "1,2" as an example, when searching for the maximum age the application first combines the 5 storage files into one storage file "1,1,2,2,2,3,5,5,6,7" and can then search directly from the end of the combined storage file. By combining the corresponding storage files into one storage file for such search items, so that the data are arranged in order in the combined storage file, the writing efficiency gained from the multi-thread multi-file writing mode is kept while the search efficiency is ensured.
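The merge step can be sketched as follows, using in-memory lists in place of storage files. This is an illustrative assumption, not the patented procedure: `merge_storage_files` is a hypothetical helper, and the standard library's `heapq.merge` stands in for whatever merge routine an implementation would use.

```python
import heapq

def merge_storage_files(files):
    """Merge the per-thread value lists of one search item into one ascending list."""
    # heapq.merge expects each input to be sorted already, so sort each file first.
    return list(heapq.merge(*(sorted(f) for f in files)))

# The five age storage files from the example, held as lists for illustration.
files = [[2, 3], [1, 5], [5, 6], [2, 7], [1, 2]]
merged = merge_storage_files(files)

# With the data in ascending order, the maximum age is simply the last entry.
max_age = merged[-1]
```

Once the data are ordered, a maximum or minimum query reads one end of the merged file instead of scanning every storage file.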

Specifically, when writing a complete piece of data information, for the data of the data information under each search item, it is first judged whether an idle thread exists under the search item. If so, any idle thread is called, the data of the data information under the search item is written into the storage file corresponding to the called thread, a delimiter is written after the data writing is finished, and the called thread is set to idle. If not, a write request is inserted at the tail of the corresponding queue and waits until it reaches the head of the queue and an idle thread exists. The field length each search item allows as input usually differs. For example, among the search items name, age, self-introduction, province, deposit and date of birth, the field length allowed for self-introduction is far longer than that of the other five search items, so its writing time is relatively long. If self-introduction used the same number of threads and storage files as the other five search items, the data corresponding to the other five search items in a piece of data information could already be completely written while the data corresponding to self-introduction was still waiting to be written, and the waiting time would be very long.

In order to improve the overall writing efficiency of the data information, for the search items with a longer allowed input field length (for example, when the difference between the allowed input field length of a search item and the average allowed input field length of the remaining search items is greater than a corresponding preset length, that search item can be determined to be one with a longer allowed input field length), the threads are divided into main threads and standby threads (there may be a plurality of main threads and at least one standby thread, and the total number of threads is larger than for a search item with a shorter allowed input field length), and a temporary storage file corresponding to each standby thread is established. When writing a complete piece of data information, for the data of the data information under each search item, it is first judged whether an idle main thread exists under the search item; if so, any idle main thread is called, the data of the data information under the search item is written into the storage file corresponding to the called main thread, a delimiter is written after the data writing is finished, and the called main thread is set to idle; if no idle main thread exists under the search item, a write request is inserted at the tail of the corresponding queue and a corresponding timer is started; it is judged whether the timer exceeds the corresponding duration, and if so, an idle standby thread is started, the standby thread writes the data of the data information under the search item into the corresponding temporary storage file, the delimiter is written after the data is written, and the standby thread is set to idle.

For the search items with a longer allowed input field length, the application adds standby threads and temporary storage files, starts a timer when no idle main thread is available for a write, and calls an idle standby thread to write the data into the temporary storage file when the write request has waited longer than the corresponding duration. The data writing time of the search items with a longer allowed input field length can thus be adjusted dynamically, the writing time of the whole piece of data information becomes controllable, and a designer can conveniently regulate the updating speed of the database. The application regulates the writing time of the whole piece of data information by adjusting the number of standby threads and/or the corresponding duration, thereby regulating the updating speed of the database. In addition, after a standby thread has written the data of the data information under the search item into the corresponding temporary storage file, it is judged whether an idle main thread has been idle for a long time (for example, idle for longer than a preset time period); if so, that idle main thread is called, the data in the temporary storage file is written into the corresponding storage file, the delimiter is written after the data is written, and the temporary storage file is emptied. Writing the data in the temporary storage file into the storage file corresponding to the main thread when the main thread has been idle for a long time means, on the one hand, that retrieval only needs to look in the storage files corresponding to the main threads, which keeps the data together, and on the other hand, that the temporary storage file is emptied and ready for the next data writing.
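The main/standby arrangement can be sketched as follows. This is a simplified model under stated assumptions, not the patented implementation: the class `ItemWriter` and its method names are hypothetical, semaphore slots stand in for the pools of main and standby threads, in-memory lists stand in for the storage file and the temporary storage file, and `flush_temp` simply waits for an idle main slot rather than checking how long it has been idle.

```python
import threading

class ItemWriter:
    """Main slots write to storage; standby slots write to a temporary store."""

    def __init__(self, main_slots, standby_slots, timeout):
        self.main = threading.Semaphore(main_slots)        # idle main threads
        self.standby = threading.Semaphore(standby_slots)  # idle standby threads
        self.timeout = timeout                             # the timer duration
        self.storage = []                                  # stands in for the storage file
        self.temp = []                                     # stands in for the temporary file
        self.lock = threading.Lock()

    def write(self, value):
        # Wait up to `timeout` seconds for an idle main slot (the timer).
        if self.main.acquire(timeout=self.timeout):
            try:
                with self.lock:
                    self.storage.append(value)
                return "main"
            finally:
                self.main.release()  # set the main slot back to idle
        # Timer expired: fall back to a standby slot and the temporary store.
        with self.standby:
            with self.lock:
                self.temp.append(value)
        return "standby"

    def flush_temp(self):
        """Move temporary data into main storage once a main slot is idle."""
        with self.main:
            with self.lock:
                self.storage.extend(self.temp)
                self.temp.clear()

# Usage: one main slot, one standby slot, a 50 ms timer.
writer = ItemWriter(main_slots=1, standby_slots=1, timeout=0.05)
first = writer.write("a")    # a main slot is idle
writer.main.acquire()        # simulate the main slot being busy
second = writer.write("b")   # timer expires, standby takes over
writer.main.release()
writer.flush_temp()          # temporary data moves to main storage
```

Raising `standby_slots` or lowering `timeout` shortens the worst-case wait for a long field, which is the tuning knob the description refers to.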

Because the data information is written in a multithreading multi-file mode, the storage positions of the data of a piece of data information under the various search items are unordered and follow no regular pattern. In order to present the corresponding data of a piece of data information accurately, an address association table is established for each piece of data information, and an index table is established for each search item. The address association table comprises the storage address information of the data of that piece of data information under each search item. The index table comprises the storage address information of the data of each piece of data information under the search item, together with a pointing address corresponding to each item of storage address information. The storage address information comprises the address of the storage file holding the corresponding data and the address of the data within that storage file, and the pointing address points to the position of the first item of storage address information in the address association table to which the storage address information belongs.

When searching under the corresponding search item, searching according to the input search condition based on the data stored in the storage file under the search item; when the search result is displayed, firstly, the storage address information of the data matched with the input search condition is determined, then the pointing address corresponding to the determined storage address information is searched from the index table corresponding to the search item, the position of the first storage address information in the address association table corresponding to the determined storage address information is located according to the searched pointing address, the address association table corresponding to the determined storage address information is obtained, and the storage address of the data information under each search item is obtained, so that complete data information is obtained. According to the application, the address association table is established for each piece of data information, and the index table is established for each search item, so that the data of the data information stored under each search item can be accurately associated, and the complete data information can be quickly found out when the data information is displayed, thereby improving the accuracy and speed of search.
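The lookup path through the two tables can be sketched with plain Python structures. All names and sample records here are hypothetical: an address is modelled as a (file number, offset) pair, and the pointing address is modelled as the row number of the record's address association table.

```python
# Storage files per search item: item -> list of files -> list of values.
storage = {
    "name": [["Zhang San"], ["Li Si"]],
    "age": [[30], [25]],
}

# Address association table: one row per piece of data information,
# holding (item, file_no, offset) for its data under every search item.
association = [
    [("name", 0, 0), ("age", 1, 0)],  # record 0: Zhang San, 25
    [("name", 1, 0), ("age", 0, 0)],  # record 1: Li Si, 30
]

def build_index(association):
    """Index table per item: (file_no, offset) -> row in the association table."""
    index = {}
    for row_no, row in enumerate(association):
        for item, file_no, offset in row:
            index.setdefault(item, {})[(file_no, offset)] = row_no
    return index

def fetch_record(item, file_no, offset, index, association, storage):
    """From one matched address, recover the complete piece of data information."""
    row = association[index[item][(file_no, offset)]]
    return {i: storage[i][f][o] for i, f, o in row}

index = build_index(association)
# A search on age matched the value stored in age file 0 at offset 0.
record = fetch_record("age", 0, 0, index, association, storage)
```

A match found in any one storage file thus leads, in two lookups, to the addresses of the same record's data under every other search item.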

As can be seen from the above embodiments, by adopting the multithreading multi-file writing mode, the application writes the data corresponding to the search item in the data information into the corresponding storage file, and parallel writing of the corresponding data in a plurality of pieces of data information can be realized even if the field length occupied by the data corresponding to the search item in each piece of data information is uncertain, thereby improving the writing efficiency; for search items whose data follow a regular pattern, the application combines the corresponding storage files into one storage file in which the data are arranged in order, so that the writing efficiency gained from the multi-thread multi-file writing mode is kept while the search efficiency is ensured.

In addition, text under a text-format search item is usually split before it is searched. Take the self-introduction "I am Zhang San, I like to play basketball, I am a lively little boy." By default the search engine stores this sentence one character at a time; in the original Chinese it occupies nineteen characters, glossed roughly as "I/am/Zhang/San/I/li/ke/play/basket/ball/I/am/a/live/ly/(particle)/little/bo/y". Suppose a query requires the self-introduction field to match "basketball". The condition is likewise split into the two characters 'basket/ball'. The character 'basket' is matched first, scanning from the first 'I' until it is found at the ninth character; the character 'ball' is then matched, again scanning from the first 'I' until it is found at the tenth character. Nineteen comparisons are made in total, so matching efficiency is low.

Therefore, the application introduces a word segmentation system: word segmentation processing is carried out on first text data stored under a text-format search item to generate a plurality of first words, word segmentation processing is carried out on second text data in an input search condition to generate a plurality of second words, and each second word is matched against the plurality of first words, thereby obtaining the first text data that matches the second text data in the search condition. For example, the self-introduction becomes "I/am/Zhang San/I/like/play/basketball/I/am a/lively/little/boy" and the query condition becomes the single word "basketball", so the result is found after only 7 comparisons, from 'I' to 'basketball'. In this case word segmentation makes the query more than 2 times as efficient, and the longer the text, the more obvious the improvement. By segmenting both the text data stored under the text-format search item and the text data input in the search condition, and matching the two segmented texts word by word instead of character by character, the application can greatly improve the search efficiency.
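The comparison counts in this example can be reproduced with a short sketch. No real word segmenter is used: the token lists are hand-written English stand-ins for the character-level and word-level splits of the original sentence, and `comparisons_to_match` is a hypothetical helper that restarts its scan from the first token for every query token, as the description does.

```python
def comparisons_to_match(stored_tokens, query_tokens):
    """Count token comparisons when each query token scans from the start."""
    total = 0
    for q in query_tokens:
        for tok in stored_tokens:
            total += 1
            if tok == q:
                break  # this query token is found; move on to the next one
    return total

# Character-level storage (19 tokens) versus word-level storage (12 tokens).
chars = ["I", "am", "Zhang", "San", "I", "li", "ke", "play", "basket", "ball",
         "I", "am", "a", "live", "ly", "de", "little", "bo", "y"]
words = ["I", "am", "Zhang San", "I", "like", "play", "basketball",
         "I", "am a", "lively", "little", "boy"]

per_char = comparisons_to_match(chars, ["basket", "ball"])
per_word = comparisons_to_match(words, ["basketball"])
```

Here `per_char` is 19 and `per_word` is 7, matching the counts given in the description.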

Three logical relationships are defined in the search: MUST, SHOULD and MUST_NOT. The two most representative, MUST and SHOULD, are used here to explain the combination of search conditions. MUST: the conditions must all be satisfied, whether there are multiple conditions or a single one. SHOULD: a single condition must be satisfied; when there are several conditions, it is enough to satisfy one of them. A situation often encountered during searching is that the user enters search conditions that are relatively complex and follow no particular order. For example, the search conditions are:

MUST: the name contains 'Zhang';

SHOULD: the age is greater than or equal to 20 and less than or equal to 25;

MUST: the province is Zhejiang;

MUST: the name contains 'Guo';

SHOULD: the age is greater than or equal to 25 and less than or equal to 30;

In these conditions, matching on name and matching on age each occur more than once. If searching is performed in the order given, the name is matched first, then the age, then the province, then the name again, and then the age again. In fact, these conditions can be combined into a more compact set. The conditions after merging are:

MUST: the name contains 'Zhang' and 'Guo';

MUST: the age is greater than or equal to 20 and less than or equal to 30;

MUST: the province is Zhejiang;

With the merged conditions, the name and age fields are each matched only once before the remaining conditions are matched. It should be noted that the condition merging here includes two kinds of merging: one is simple field condition merging, as for the name; the other is merging of the logical values of a field, as for the age, where 'greater than or equal to 20 and less than or equal to 25' and 'greater than or equal to 25 and less than or equal to 30' are merged into 'greater than or equal to 20 and less than or equal to 30'. After the search condition is input, the application performs field condition combination or field logic value combination on the search condition and searches according to the combined search conditions, whereby the search efficiency can be further improved.
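The two kinds of merging can be sketched as follows. This is a simplified illustration, not the patented algorithm: conditions are modelled as hypothetical (field, operator, value) tuples, the MUST and SHOULD labels are dropped, "contains" terms on the same field are collected together, and ranges on the same field are merged when they overlap or touch.

```python
def merge_conditions(conditions):
    """Merge 'contains' terms per field and overlapping ranges per field."""
    contains, ranges, equals = {}, {}, {}
    for field, op, value in conditions:
        if op == "contains":
            contains.setdefault(field, []).append(value)
        elif op == "range":
            ranges.setdefault(field, []).append(value)
        else:  # "equals"
            equals[field] = value

    # Field condition merging: one combined 'contains' condition per field.
    merged = [(f, "contains", tuple(v)) for f, v in contains.items()]

    # Field logic value merging: union of overlapping or touching ranges.
    for field, spans in ranges.items():
        spans = sorted(spans)
        out = [list(spans[0])]
        for lo, hi in spans[1:]:
            if lo <= out[-1][1]:
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged += [(field, "range", tuple(s)) for s in out]

    merged += [(f, "equals", v) for f, v in equals.items()]
    return merged

conditions = [
    ("name", "contains", "Zhang"),
    ("age", "range", (20, 25)),
    ("province", "equals", "Zhejiang"),
    ("name", "contains", "Guo"),
    ("age", "range", (25, 30)),
]
merged = merge_conditions(conditions)
```

After merging, each field appears once, so each storage file is consulted once per query instead of once per original condition.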

Memory is clearly superior to disk in file read-write efficiency: even a solid-state disk reaches only about 500 MB/s, whereas the read-write speed of ordinary memory can reach 7000 MB/s, so making reasonable use of memory can improve the search speed. Search engines search in files by default because the amount of data in a search engine is often very large, as large as several hundred GB. A server with several hundred GB of memory differs greatly in price from a server with several hundred GB of disk, and for even larger data volumes a server with that much memory may not be available on the market at all. However, some fields, such as age and province, occupy relatively little space and can be loaded into memory in advance; with these data held in memory, the search speed is greatly improved when these two fields are matched. That is, for search items that occupy little space, the application preloads the data under the search item into memory before searching, so as to improve the search efficiency.
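Preloading can be sketched as follows. The function name, the one-value-per-line file layout and the size threshold are assumptions made for illustration; the description does not state how "little space" is decided.

```python
import os
import tempfile

def preload_small_items(item_paths, max_bytes):
    """Load into memory every search item whose storage file is small enough."""
    cache = {}
    for item, path in item_paths.items():
        if os.path.getsize(path) <= max_bytes:
            with open(path, encoding="utf-8") as f:
                cache[item] = f.read().splitlines()
    return cache

# Hypothetical storage files: a small age file and a large self-introduction file.
with tempfile.TemporaryDirectory() as d:
    paths = {}
    for item, text in {"age": "20\n25\n", "self_intro": "x" * 10_000}.items():
        paths[item] = os.path.join(d, item + ".dat")
        with open(paths[item], "w", encoding="utf-8") as f:
            f.write(text)
    cache = preload_small_items(paths, max_bytes=1024)
```

Only the age data ends up in `cache`; the large self-introduction file stays on disk and is searched there.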

The data features of different search items in the search are different, the format types of the different feature data are accurately defined, and the method is a basis for realizing accurate search. Therefore, the application sets the format type of the data storage under each search item according to the characteristics of the data under the search item. For example, the search items include name, age, self-introduction, province, deposit, and date of birth, and the common format types of the fields are:

int type (used to represent integers, with the advantage of fast search ordering),

long type (used to represent integer types longer than int, search ordering is fast, volume is larger than int),

date type (used to represent dates),

text type (a text type; sorting and searching are slow; the text is split for matching; word segmentation matching is supported; aggregate queries are not supported),

keyword type (a text type; searching is fast; the text is matched as a whole and is not split; word segmentation matching is not supported; aggregate queries are supported).

Next, each field of the batch of data mentioned in the precondition is analyzed.

The name is in text format, so this field must be set to the text or keyword type. Analyzing this field, a search may need to find all people with a given surname, or all people whose name contains a given character. The text therefore needs to support matching after splitting, so the type of this field is text.

Age is an integer, so either the int or the long type can be selected. The sizes of int and long differ across machine word sizes and programming languages, but in general int suits shorter integers and saves space. The type of this field is int.

Deposit, like the age field, is an integer, but its value may grow beyond the upper limit of int. The type of this field is long.

Self-introduction is a large block of text, and split-text matching is needed during searching, for example to match records whose self-introduction contains the word 'playing'. This field is set to the text type.

Province is also text content, but it is a special case: the province field value does not need to be split for matching. 'Zhejiang' should match 'Zhejiang' and 'Jiangsu' should match 'Jiangsu'; entering 'Jiang' alone should not return every province containing 'Jiang'. The field is therefore set to keyword.

The date of birth is set to date type.
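The field analysis above can be summarized as a schema, together with a small illustration of how text and keyword matching differ. This is a sketch under stated assumptions: the names `SCHEMA` and `matches` are hypothetical, the sample values are invented, and whitespace splitting stands in for a real word segmenter, which Chinese text would require.

```python
# Storage format type chosen for each search item, as analyzed above.
SCHEMA = {
    "name": "text",
    "age": "int",
    "deposit": "long",
    "self_introduction": "text",
    "province": "keyword",
    "date_of_birth": "date",
}


def matches(field: str, stored: str, query: str) -> bool:
    """Compare a stored value with a query according to the field's type."""
    kind = SCHEMA[field]
    if kind == "text":
        # text: the stored value is split, and the query may match any one part.
        # Whitespace splitting is an assumed stand-in for word segmentation.
        return query in stored.split()
    if kind == "keyword":
        # keyword: the stored value is matched as a whole, never split.
        return stored == query
    raise ValueError(f"matching is not sketched for type {kind!r}")


# A surname matches inside a text field, but a partial province name does not.
surname_hit = matches("name", "Zhang Wei", "Zhang")
partial_province_hit = matches("province", "Jiangsu", "Jiang")
```

Here `surname_hit` is true and `partial_province_hit` is false, which mirrors the reason the name field is set to text and the province field to keyword.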

In addition, search engines use an operation called an aggregate query, similar to COUNT combined with GROUP BY in MySQL: for one or more fields, records are counted per distinct value of the field or fields. Examples include querying how many people have age = 20, how many people have the province 'Zhejiang', or even the ten ages with the most people and the ten provinces with the most people. As mentioned in the field type descriptions, the text type does not support aggregate queries, while keyword does. A business may well need statistics on the number of people whose province is Zhejiang, but has no use for counting how many people share the same self-introduction, which further confirms that setting self-introduction to the text type and province to keyword is reasonable. By accurately setting the format type of the data of each search item, the application can ensure the accuracy of the search.
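The aggregate query described above can be illustrated with an in-memory count per distinct field value. This is only a sketch of the concept, not the application's storage-level implementation; the function name `aggregate` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple


def aggregate(records: List[Dict[str, Any]], field: str,
              top: Optional[int] = None) -> List[Tuple[Any, int]]:
    """Count records per distinct value of `field`, most frequent first.

    Comparable to SELECT field, COUNT(*) ... GROUP BY field ORDER BY COUNT(*) DESC,
    with `top` acting as an optional LIMIT.
    """
    counts = Counter(record[field] for record in records)
    return counts.most_common(top)


# Invented sample data for illustration only.
records = [
    {"age": 20, "province": "Zhejiang"},
    {"age": 20, "province": "Zhejiang"},
    {"age": 25, "province": "Jiangsu"},
]

people_per_age = dict(aggregate(records, "age"))
top_province = aggregate(records, "province", top=1)
```

Passing `top=10` would give the ten most common ages or provinces, as in the examples in the text.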

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A search speed optimization method, comprising:

for each search item, a plurality of corresponding storage files are respectively established, and a plurality of corresponding threads are adopted to write each data under the search item into the corresponding storage files, so that when the field of the data corresponding to the search item in each piece of data information is uncertain, the parallel writing of the corresponding data in the plurality of pieces of data information is realized;

for the search items with longer allowed input fields among the search items, the threads are divided into a main thread and a standby thread, a temporary storage file corresponding to the standby thread is established, and the total number of threads is larger than for the search items with shorter allowed input fields, wherein the search items with longer allowed input fields comprise self-introduction, and the search items with shorter allowed input fields comprise name and age;

when writing a complete piece of data information, for the data of the data information under each search item, first judging whether an idle main thread exists under the search item; if so, calling any idle main thread, writing the data of the data information under the search item into the storage file corresponding to the called main thread, writing a segmenter after the data writing is finished, and setting the called main thread back to idle; if no idle main thread exists under the search item, inserting a write request at the rear end of the corresponding queue and starting a corresponding timer; judging whether the timer exceeds the corresponding duration, and if so, starting an idle standby thread, using the standby thread to write the data of the data information under the search item into the corresponding temporary storage file, writing a segmenter after the data is written, and setting the standby thread to idle; after the standby thread has written the data of the data information under the search item into the corresponding temporary storage file, judging whether an idle main thread is in a long-term idle state, and if so, calling the idle main thread, writing the data in the temporary storage file into the corresponding storage file, writing the segmenter after the data is written, and emptying the temporary storage file;

the writing time length of the whole piece of data information is regulated and controlled by regulating the number of the standby threads and the corresponding time length, so that the updating speed of the database is regulated and controlled;

2. The search speed optimization method according to claim 1, characterized by further comprising:

for a search term in text format, performing word segmentation processing on first text data stored under the search term to generate a plurality of first words, performing word segmentation processing on second text data in an input search condition to generate a plurality of second words, and for each second word, performing matching search on the second word and the plurality of first words, so as to obtain first text data matched with the second text data in the search condition.

3. The search speed optimization method according to claim 1 or 2, characterized by further comprising:

after the search condition is input, carrying out field condition combination or field logic value combination on the search condition;

and searching according to the combined searching conditions.

4. The search speed optimization method according to claim 3, characterized by further comprising:

for a search term with small space occupation, the data under the search term is preloaded into the memory before searching.

5. The search speed optimization method according to claim 4, characterized by further comprising:

and setting the storage format type of the data under each search item according to the characteristics of the data under the search item.

6. The search speed optimizing method according to claim 1, wherein for each piece of data information, an address association table is established, the address association table including storage address information of data of the piece of data information under each search item, and for each search item, an index table is established, the index table including storage address information of data of each piece of data information under the search item and a pointing address corresponding to each storage address information, the storage address information including a storage file address where the corresponding data is located and an address where the data is located under its storage file, the pointing address being used to point to a location where first storage address information is located in the address association table corresponding to the storage address information thereof;

7. The search speed optimizing method according to claim 5, wherein the search items include a name, an age, a self-introduction, a province, a deposit, and a date of birth, wherein a storage format type of the name is set to a text type, a storage format type of the age is set to an int type, a storage format type of the self-introduction is set to a text type, a storage format type of the province is set to a keyword type, a storage format type of the deposit is set to a long type, and a storage format type of the date of birth is set to a date type.