CN109977334A - Retrieval rate optimization method - Google Patents
- Publication number
- CN109977334A (application CN201910231353.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- retrieval
- storage
- search terms
- now
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a retrieval rate optimization method. For each retrieval item, multiple corresponding storage files are established, and the data under that retrieval item is written into those storage files by multiple corresponding threads. For each retrieval item whose data follows an ordering rule, the corresponding storage files are merged into a single storage file in which the data is arranged in order; at retrieval time, the search is performed on the merged storage file according to the input search condition. Writing data with multiple threads into multiple files improves write efficiency, and merging the files of orderable retrieval items into one sorted file preserves retrieval efficiency while keeping that write-side gain.
Description
Technical field
The invention belongs to the field of information retrieval, and in particular relates to a retrieval rate optimization method.
Background
With the arrival of the big data era, how to use a search engine to retrieve the data one needs quickly and accurately from a huge database has become a focus of research. Current search engines generally comprise four major parts: crawling, indexing, filtering and presentation. Traditional search engines commonly suffer from low write efficiency and low retrieval efficiency.
Summary of the invention
The present invention provides a retrieval rate optimization method to solve the problem that current search engines have low write efficiency and low retrieval efficiency.
According to a first aspect of the embodiments of the present invention, a retrieval rate optimization method is provided, comprising:
for each retrieval item, establishing multiple corresponding storage files, and using multiple corresponding threads to write the data under that retrieval item into the corresponding storage files;
for each retrieval item whose data follows an ordering rule, merging the corresponding storage files into a single storage file in which the data is arranged in order; and, at retrieval time, searching the merged storage file according to the input search condition.
In an optional implementation, for a retrieval item in text format, word segmentation is performed on the first text data stored under that retrieval item to generate multiple first words, and word segmentation is performed on the second text data in the input search condition to generate multiple second words. Each second word is then matched against the first words, so as to obtain the first text data that matches the second text data in the search condition.
In another optional implementation, after the search condition is input, the field conditions in the search condition are merged, or the logical value ranges of a field are merged, and retrieval is performed according to the merged search condition.
In another optional implementation, for a retrieval item that occupies little space, the data under that retrieval item is preloaded into memory before retrieval.
In another optional implementation, the storage format type of the data under each retrieval item is set according to the characteristics of that data.
In another optional implementation, when one complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle thread. If so, any idle thread is called, the data of the record under that retrieval item is written into the storage file corresponding to the called thread, a delimiter is written after the data has been written, and the called thread is set back to idle.
In another optional implementation, for a retrieval item whose permitted input field length is comparatively long, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to the standby threads is established. Such a retrieval item has a larger total number of threads than retrieval items with a shorter permitted input field length.
When one complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle main thread. If so, any idle main thread is called, the data of the record under that retrieval item is written into the storage file corresponding to the called main thread, a delimiter is written after the data has been written, and the called main thread is set back to idle. If the retrieval item has no idle main thread, the write request is inserted at the rear of the corresponding queue and a corresponding timer is started. It is then determined whether the timer has exceeded the corresponding duration; if so, an idle standby thread is enabled and used to write the data of the record under that retrieval item into the corresponding temporary storage file, a delimiter is written after the data has been written, and the standby thread is set back to idle.
In another optional implementation, the write duration of a whole data record is regulated by adjusting the number of standby threads and/or the corresponding duration, thereby regulating the update speed of the database.
In another optional implementation, an address information table is established for every data record. The address information table contains the storage address information of the data of that record under each retrieval item. In addition, an index table is established for each retrieval item. The index table contains the storage address information of the data of every data record under that retrieval item, together with a pointing address for each piece of storage address information. The storage address information comprises the address of the storage file that holds the data and the address of the data within that storage file. The pointing address points to the position of the first piece of storage address information in the address information table that corresponds to the given storage address information.
When a search is performed under a retrieval item, it is performed on the data stored in the storage files of that retrieval item according to the input search condition. When the search result is displayed, the storage address information of the data matching the input search condition is determined first. The pointing address corresponding to that storage address information is then looked up in the index table of the retrieval item. Following the pointing address leads to the position of the first piece of storage address information in the corresponding address information table, which yields that address information table and hence the storage addresses of the data of the record under every retrieval item, so that the complete data record is obtained.
In another optional implementation, the retrieval items include name, age, self-introduction, native province, deposit and date of birth. The storage format type of name is set to the text type, that of age to the int type, that of self-introduction to the text type, that of native province to the keyword type, that of deposit to the long type, and that of date of birth to the date type.
The beneficial effects of the present invention are:
1. By writing with multiple threads into multiple files, the data in each data record that corresponds to a retrieval item is written into a corresponding storage file. Even when the field length occupied by that data varies from record to record, the corresponding data of several records can be written in parallel, which improves write efficiency. By merging the storage files of each orderable retrieval item into one storage file in which the data is arranged in order, retrieval efficiency is preserved while write efficiency is improved by the multi-thread, multi-file write mode.
2. Word segmentation is applied both to the text data stored under a text-format retrieval item and to the text data input in the search condition. The two sets of segmented text are matched word by word rather than character by character, which greatly improves retrieval efficiency.
3. Retrieval is performed after the field conditions of the search condition, or the logical value ranges of a field, have been merged, which improves retrieval efficiency.
4. For retrieval items that occupy little space, the data under the retrieval item is preloaded into memory before retrieval, which further improves retrieval efficiency.
5. The format type of the data of each retrieval item is set precisely, which ensures retrieval accuracy.
6. For retrieval items with a comparatively long permitted input field length, standby threads and a temporary storage file are added. When data is to be written and no main thread is idle, a timer is started; once the write request has waited longer than the corresponding duration, an idle standby thread is called and the data is written into the temporary storage file. The write duration of such retrieval items can thus be adjusted dynamically, which keeps the write duration of a whole data record controllable and makes it easier for the designer to regulate the update speed of the database.
7. An address information table is established for every data record and an index table for every retrieval item. The data of a record stored under the various retrieval items can thus be associated accurately, and the complete data record can be found quickly when it is displayed, which improves both the accuracy and the speed of retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the retrieval rate optimization method of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments more apparent, the technical solutions are described in further detail below with reference to the accompanying drawing.
In the description of the present invention, unless otherwise specified and limited, the term "connection" should be understood broadly. For example, it may be a mechanical connection or an electrical connection, or an internal connection between two elements; it may be a direct connection or an indirect connection through an intermediary. Those of ordinary skill in the art can understand the specific meaning of the term according to the circumstances.
Fig. 1 is a flow chart of one embodiment of the retrieval rate optimization method of the present invention. The method is applied on a processing device (such as a computer or a server), which carries out the retrieval rate optimization. The method may include:
Step S101: for each retrieval item, multiple corresponding storage files are established, and multiple corresponding threads are used to write the data under that retrieval item into the corresponding storage files.
When a traditional search engine stores acquired data records item by item according to the retrieval items, each retrieval item usually uses only one thread and only one storage file. Only after the thread has written the data of the previous record for that retrieval item into the storage file can the data of the next record for that retrieval item be written, so the write efficiency of data records is low. Moreover, the field length occupied by the data for a given retrieval item may differ from record to record. Even if multiple threads write to the one file, a later thread can write the corresponding data of the next record only after the previous thread has finished writing the corresponding data of the previous record; if the next record's data were written before the previous record's data had finished, the data would become interleaved. Multi-threaded writing alone therefore cannot improve the write efficiency of data records. For this reason, the present invention proposes writing with multiple threads into multiple files. Take the retrieval item "age" as an example, with every data record containing age data: 5 storage files are established for the retrieval item "age", and 5 threads simultaneously store the age data of 5 data records into their respective storage files, so that the age data of 5 records is stored in parallel and write efficiency is improved. By writing with multiple threads into multiple files, the data in each data record that corresponds to a retrieval item is written into a corresponding storage file; even when the field length occupied by that data varies from record to record, the corresponding data of several records can be written in parallel, which improves write efficiency.
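The multi-thread, multi-file write of step S101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `write_item_parallel` and `read_all`, the file naming scheme, the round-robin distribution of values to threads and the use of a newline as the delimiter are all assumptions made for the example.

```python
import os
import tempfile
import threading


def write_item_parallel(values, directory, item="age", n_files=5):
    """Write the values of one retrieval item with n_files threads, one file per thread."""
    # Assumption: values are handed out round-robin; thread i owns file i,
    # so no two threads ever write to the same file.
    batches = [values[i::n_files] for i in range(n_files)]
    paths = [os.path.join(directory, f"{item}_{i}.txt") for i in range(n_files)]

    def worker(path, batch):
        with open(path, "w", encoding="utf-8") as f:
            for value in batch:
                f.write(f"{value}\n")  # the newline plays the role of the delimiter

    threads = [threading.Thread(target=worker, args=(p, b))
               for p, b in zip(paths, batches)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


def read_all(paths):
    """Read every stored value back, file by file."""
    values = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            values.extend(int(line) for line in f if line.strip())
    return values


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        age_paths = write_item_parallel([2, 3, 1, 5, 5, 6, 2, 7, 1, 2], directory)
        print(sorted(read_all(age_paths)))
```

Because each thread owns its own file, records of different lengths cannot interleave, which is the property the paragraph above relies on.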
Step S102: for each retrieval item whose data follows an ordering rule, the corresponding storage files are merged into a single storage file in which the data is arranged in order; at retrieval time, the search is performed on the merged storage file according to the input search condition.
Although the multi-thread, multi-file write mode improves write efficiency, it lowers efficiency at retrieval time. For example, suppose the 5 storage files that store ages contain "2,3", "1,5", "5,6", "2,7" and "1,2" respectively. To query the maximum age, the maximum must first be found in each of the 5 storage files, and the 5 maxima must then be compared to obtain the required value. For this reason, for each retrieval item whose data follows an ordering rule, the present invention merges the storage files into one storage file at retrieval time, with the data arranged in order in the merged file. Such orderable retrieval items may include name, age, native province, deposit and date of birth: name and native province can be sorted by the first pinyin letter of the first character, age and deposit can be arranged in ascending order, and date of birth can be arranged by date. Taking the same 5 age files containing "2,3", "1,5", "5,6", "2,7" and "1,2", the present invention first merges them at retrieval time into one storage file '1,1,2,2,2,3,5,5,6,7'; to find the maximum age, the query simply starts from the end of the merged file. By merging the storage files of each orderable retrieval item into one storage file in which the data is arranged in order, retrieval efficiency is preserved while write efficiency is improved by the multi-thread, multi-file write mode.
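The merge of step S102 can be sketched with a k-way merge. The snippet below is only an illustration: it represents each age storage file as an in-memory list and uses Python's standard `heapq.merge`; the function name `merge_storage_files` is an assumption, not a name from the source.

```python
import heapq


def merge_storage_files(files):
    """Merge the per-thread value lists of one retrieval item into one ascending list."""
    # Each file is sorted on its own first, then all files are merged in one pass.
    return list(heapq.merge(*(sorted(values) for values in files)))


# The five age files from the example in the text.
age_files = [[2, 3], [1, 5], [5, 6], [2, 7], [1, 2]]
merged = merge_storage_files(age_files)
max_age = merged[-1]  # in a sorted file the maximum is simply the last entry
```

With the merged file in ascending order, a maximum or minimum query reads one end of the file instead of scanning all five files.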
Specifically, when one complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle thread. If so, any idle thread is called, the data of the record under that retrieval item is written into the storage file corresponding to the called thread, a delimiter is written after the data has been written, and the called thread is set back to idle. If there is no idle thread, the write request is inserted at the rear of the corresponding queue and waits until it reaches the head of the queue and an idle thread exists. The permitted input field length usually differs between retrieval items. For example, among the retrieval items name, age, self-introduction, native province, deposit and date of birth, self-introduction permits a longer input than the other five, so writing it also takes longer. If self-introduction and the other five retrieval items all used the same number of threads and storage files, the data of a record for the other five retrieval items might all have been written while the data for self-introduction was still waiting to be written, possibly for a long time.
To improve the overall write efficiency of data records, for a retrieval item with a comparatively long permitted input field length, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to the standby threads is established. (A retrieval item may be judged to have a comparatively long permitted input field length when the difference between its permitted input field length and the average permitted input field length of the remaining retrieval items exceeds a preset length. There may be several main threads and at least one standby thread, and the total number of threads is larger than for retrieval items with a shorter permitted input field length.) When one complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle main thread. If so, any idle main thread is called, the data of the record under that retrieval item is written into the storage file corresponding to the called main thread, a delimiter is written after the data has been written, and the called main thread is set back to idle. If the retrieval item has no idle main thread, the write request is inserted at the rear of the corresponding queue and a corresponding timer is started. It is then determined whether the timer has exceeded the corresponding duration; if so, an idle standby thread is enabled and used to write the data of the record under that retrieval item into the corresponding temporary storage file, a delimiter is written after the data has been written, and the standby thread is set back to idle.
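The main-thread and standby-thread dispatch can be sketched as below. This is a simplified model under stated assumptions, not the patented implementation: main and standby threads are represented as semaphore slots, the calling thread performs the write itself, storage files are in-memory lists, and the FIFO write queue is not modelled. The class name `ItemWriter` and its attributes are hypothetical.

```python
import threading


class ItemWriter:
    """Dispatch writes for one long-field retrieval item to main or standby slots."""

    def __init__(self, n_main, n_standby, wait_seconds):
        self.main_slots = threading.Semaphore(n_main)        # idle main threads
        self.standby_slots = threading.Semaphore(n_standby)  # idle standby threads
        self.wait_seconds = wait_seconds                     # the "corresponding duration"
        self.main_file = []   # stands in for the main storage file
        self.temp_file = []   # stands in for the temporary storage file
        self.lock = threading.Lock()

    def write(self, data):
        # Wait for an idle main slot; the timeout plays the role of the timer.
        if self.main_slots.acquire(timeout=self.wait_seconds):
            try:
                with self.lock:
                    self.main_file.append(data)
                return "main"
            finally:
                self.main_slots.release()  # set the main slot back to idle
        # The timer expired: fall back to an idle standby slot and the temporary file.
        with self.standby_slots:
            with self.lock:
                self.temp_file.append(data)
        return "standby"
```

A short usage example: with one main slot held busy, a write waits for `wait_seconds` and then lands in the temporary file; once the slot is released, later writes go to the main file again. Raising `n_standby` or lowering `wait_seconds` shortens the worst-case wait, which is the tuning knob the text describes.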
For retrieval items with a comparatively long permitted input field length, the present invention adds standby threads and a temporary storage file. When data is to be written and no main thread is idle, a timer is started; once the write request has waited longer than the corresponding duration, an idle standby thread is called and the data is written into the temporary storage file. The write duration of such retrieval items can thus be adjusted dynamically, which keeps the write duration of a whole data record controllable and makes it easier for the designer to regulate the update speed of the database. The present invention regulates the write duration of a whole data record, and thereby the update speed of the database, by adjusting the number of standby threads and/or the corresponding duration. In addition, after a standby thread has written the data of a record under the retrieval item into the corresponding temporary storage file, it is determined whether an idle main thread has been idle for a long time (for example, idle for longer than a preset duration). If so, the idle main thread is called to write the data in the temporary storage file into the corresponding storage file, a delimiter is written after the data has been written, and the temporary storage file is emptied. Writing the data of the temporary storage file into the storage file of a main thread while that main thread is idle has two benefits: retrieval can be performed in the storage files of the main threads, which keeps the data together, and emptying the temporary storage file prepares it for the next write.
Because the present invention writes data records with multiple threads into multiple files, the positions at which the data of a record is stored under the various retrieval items are scattered and follow no rule. To display the corresponding data of a record accurately, the present invention establishes an address information table for every data record. The address information table contains the storage address information of the data of that record under each retrieval item. In addition, an index table is established for each retrieval item. The index table contains the storage address information of the data of every data record under that retrieval item, together with a pointing address for each piece of storage address information. The storage address information comprises the address of the storage file that holds the data and the address of the data within that storage file. The pointing address points to the position of the first piece of storage address information in the address information table that corresponds to the given storage address information.
When a search is performed under a retrieval item, it is performed on the data stored in the storage files of that retrieval item according to the input search condition. When the search result is displayed, the storage address information of the data matching the input search condition is determined first. The pointing address corresponding to that storage address information is then looked up in the index table of the retrieval item. Following the pointing address leads to the position of the first piece of storage address information in the corresponding address information table, which yields that address information table and hence the storage addresses of the data of the record under every retrieval item, so that the complete data record is obtained. By establishing an address information table for every data record and an index table for every retrieval item, the data of a record stored under the various retrieval items can be associated accurately, and the complete data record can be found quickly when it is displayed, which improves both the accuracy and the speed of retrieval.
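The interplay of the address information tables and the index tables can be sketched with plain dictionaries. The layout below is an assumption made for illustration: a storage address is a `(file_no, offset)` pair, the pointing address is simply the position of the record's address information table in a list, and the file a value lands in is chosen by record number rather than by whichever thread happens to be idle. The function names `store` and `search` and the sample records are hypothetical.

```python
def store(records, n_files=2):
    """Scatter each record's fields over per-item files and build both tables."""
    files = {}            # (item, file_no) -> list of stored values
    address_tables = []   # one per record: item -> (file_no, offset)
    index = {}            # item -> {(file_no, offset): position in address_tables}
    for rec_no, record in enumerate(records):
        table = {}
        for item, value in record.items():
            file_no = rec_no % n_files  # stand-in for "the file of the idle thread"
            slot = files.setdefault((item, file_no), [])
            address = (file_no, len(slot))
            slot.append(value)
            table[item] = address
            index.setdefault(item, {})[address] = rec_no  # the pointing address
        address_tables.append(table)
    return files, address_tables, index


def search(files, address_tables, index, item, predicate):
    """Match on one item, then rebuild each full record via its address table."""
    results = []
    for (stored_item, file_no), values in files.items():
        if stored_item != item:
            continue
        for offset, value in enumerate(values):
            if predicate(value):
                table = address_tables[index[item][(file_no, offset)]]
                results.append({i: files[(i, f)][o] for i, (f, o) in table.items()})
    return results


records = [
    {"name": "Zhang San", "age": 23},
    {"name": "Li Si", "age": 30},
    {"name": "Wang Wu", "age": 23},
]
files, address_tables, index = store(records)
matches = search(files, address_tables, index, "age", lambda age: age == 23)
```

The search touches only the files of the queried item; the other fields of each hit are fetched directly by address instead of by scanning.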
As can be seen from the above embodiment, by writing with multiple threads into multiple files, the data in each data record that corresponds to a retrieval item is written into a corresponding storage file; even when the field length occupied by that data varies from record to record, the corresponding data of several records can be written in parallel, which improves write efficiency. By merging the storage files of each orderable retrieval item into one storage file in which the data is arranged in order, retrieval efficiency is preserved while write efficiency is improved by the multi-thread, multi-file write mode.
In addition, for retrieval items in text format, text queries usually rely on splitting the text. Take a self-introduction that reads "I am Zhang San, I like playing basketball, I am a lively little boy". By default a search engine stores this sentence with each character of the original Chinese text as one unit, which gives 20 single-character units, with "basket" as the 9th unit and "ball" as the 10th. Suppose a query requires the self-introduction field to match "basketball". The condition is likewise split into the two characters "basket" and "ball". "Basket" is compared character by character starting from the first unit and is matched at the 9th; "ball" is then compared character by character starting from the first unit and is matched at the 10th. A total of 19 comparisons is made, so matching efficiency is low. For this reason, the present invention introduces a word segmentation system. For a retrieval item in text format, word segmentation is performed on the first text data stored under that retrieval item to generate multiple first words, and on the second text data in the input search condition to generate multiple second words; each second word is then matched against the first words, so as to obtain the first text data that matches the second text data in the search condition. For example, after segmentation the self-introduction becomes "I / am / Zhang San / I / like / play / basketball / I / am / a / lively / little / boy" and the query condition becomes the single word "basketball"; scanning from "I" to "basketball" finds the result after only 7 comparisons. In this case word segmentation makes the query more than twice as efficient, and the longer the text, the more pronounced the improvement. By applying word segmentation both to the text data stored under a text-format retrieval item and to the text data input in the search condition, and matching the two word by word rather than character by character, the present invention greatly improves retrieval efficiency.
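The comparison counts in this example can be reproduced with a short sketch. It does not run a real word segmenter: both segmentations are written out by hand, and English glosses stand in for the single characters of the original Chinese sentence, so only the positions of the units matter. The function name `comparisons_to_match` is an assumption.

```python
def comparisons_to_match(stored_units, query_units):
    """Count comparisons when every query unit scans the stored units from the start."""
    total = 0
    for query in query_units:
        for unit in stored_units:
            total += 1
            if unit == query:
                break  # this query unit is matched; move on to the next one
    return total


# Character-level storage: 20 units, "basket" is the 9th and "ball" the 10th.
char_units = ["I", "am", "Zhang", "San", "I", "li", "ke", "play", "basket", "ball",
              "I", "am", "one", "(cl)", "live", "ly", "(of)", "little", "male", "child"]
# Word-level storage after segmentation: "basketball" is the 7th unit.
word_units = ["I", "am", "Zhang San", "I", "like", "play", "basketball",
              "I", "am", "a", "lively", "little", "boy"]

char_cost = comparisons_to_match(char_units, ["basket", "ball"])  # 9 + 10 comparisons
word_cost = comparisons_to_match(word_units, ["basketball"])      # 7 comparisons
```

The saving comes from two sources: the stored sequence is shorter, and the query shrinks from two units to one, so the text is scanned once instead of twice.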
Three logical relations are defined in search: MUST, SHOULD and MUST_NOT. The two most representative, MUST and SHOULD, are used here to illustrate the merging of search conditions. MUST: whether there are several conditions or a single one, all of them must be satisfied. SHOULD: a single condition must be satisfied; with several conditions, satisfying one of them is enough. One situation that arises in search is that the user inputs a fairly complicated search condition whose parts are in no particular order. For example, the search condition is:
MUST name contains 'Zhang';
SHOULD age is greater than or equal to 20 and less than or equal to 25;
MUST native province is Zhejiang;
MUST name contains 'Guo';
SHOULD age is greater than or equal to 25 and less than or equal to 30;
In this condition the name match and the age match each occur more than once. If the search were executed in the order given, it would match name first, then age, then native province, then name again, then age again. In fact the condition can be merged into a more concise one. After merging, the condition is:
MUST name contains 'Zhang' and 'Guo';
MUST age is greater than or equal to 20 and less than or equal to 30;
MUST native province is Zhejiang;
With this form, the name field and the age field are each matched only once, and there is no need to return to them after the other conditions have been matched. Note that two kinds of merging are involved here. One is the simple merging of field conditions, as for name. The other is the merging of the logical value ranges of a field, as for age, where 'greater than or equal to 20 and less than or equal to 25' and 'greater than or equal to 25 and less than or equal to 30' are merged into 'greater than or equal to 20 and less than or equal to 30'. After the search condition is input, the present invention merges the field conditions or the logical value ranges of a field in the search condition, and retrieval is performed according to the merged search condition, which further improves retrieval efficiency.
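The two kinds of merging can be sketched as follows. The representation is an assumption made for the example: each condition is a `(field, kind, payload)` tuple, ranges are closed integer intervals, and the MUST/SHOULD operators are left out, so the sketch simply follows the worked example in which the two age ranges collapse into one. The function name `merge_conditions` is hypothetical.

```python
def merge_conditions(conditions):
    """Group conditions by field and merge overlapping closed ranges.

    conditions: list of (field, kind, payload), kind in {'contains', 'equals', 'range'}.
    """
    contains, equals, ranges = {}, {}, {}
    for field, kind, payload in conditions:
        if kind == "contains":
            contains.setdefault(field, []).append(payload)  # field-condition merge
        elif kind == "equals":
            equals[field] = payload
        elif kind == "range":
            ranges.setdefault(field, []).append(payload)

    merged_ranges = {}
    for field, spans in ranges.items():
        spans = sorted(spans)
        merged = [list(spans[0])]
        for low, high in spans[1:]:
            if low <= merged[-1][1]:  # overlaps or touches the previous range
                merged[-1][1] = max(merged[-1][1], high)
            else:
                merged.append([low, high])
        merged_ranges[field] = [tuple(span) for span in merged]
    return contains, equals, merged_ranges


query = [
    ("name", "contains", "Zhang"),
    ("age", "range", (20, 25)),
    ("province", "equals", "Zhejiang"),
    ("name", "contains", "Guo"),
    ("age", "range", (25, 30)),
]
contains, equals, merged_ranges = merge_conditions(query)
```

After merging, each field appears once, so the engine can evaluate name, age and native province in a single pass each.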
Memory is far better than disk in terms of file read and write efficiency. Even a solid state disk reaches only 500 MB/s, whereas ordinary memory can reach a read and write speed of 7000 MB/s, so making good use of memory can raise search speed. By default a search engine searches from files, because its data volume is often very large, up to several hundred GB. A server with several hundred GB of memory costs far more than a server with several hundred GB of disk, and for even larger data volumes no server with that much memory can be found on the market. Some fields, however, occupy little space, such as age and native province. These can be loaded into memory in advance and kept there, so that search speed is greatly increased when these two fields are matched. That is, for retrieval items that occupy little space, the present invention preloads the data under the retrieval item into memory before retrieval in order to improve retrieval efficiency.
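Preloading the small retrieval items can be sketched as below. The size threshold, the one-value-per-line file format and the function name `preload_small_items` are assumptions made for illustration; the source does not specify how "little space" is decided.

```python
import os


def preload_small_items(item_paths, max_bytes):
    """Load into memory every item file whose size does not exceed max_bytes.

    item_paths: mapping of retrieval item name -> path of its storage file.
    Returns a mapping of item name -> list of stored lines for the preloaded items.
    """
    cache = {}
    for item, path in item_paths.items():
        if os.path.getsize(path) <= max_bytes:  # small enough to keep in memory
            with open(path, encoding="utf-8") as f:
                cache[item] = f.read().splitlines()
    return cache
```

At query time a lookup would first check whether the item is in the returned cache and fall back to the on-disk storage file otherwise, so large fields such as self-introduction stay on disk.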
The data under different retrieval items has different characteristics, and precisely defining the format type of data with different characteristics is the basis of accurate retrieval. For this reason, the present invention can set the storage format type of the data under each retrieval item according to the characteristics of that data. For example, the retrieval items include name, age, self-introduction, native province, deposit and date of birth, and the common format types of a field are:
int type (represents integers; sorting and retrieval are fast),
long type (represents integers longer than int; sorting and retrieval are fast; occupies more space than int),
date type (represents dates),
text type (text; sorting and retrieval are slow; the text can be split for matching; supports word segmentation matching; does not support aggregation queries),
keyword type (text; retrieval is fast; the text cannot be split for matching; does not support word segmentation matching; supports aggregation queries).
Next, each field of the batch of data mentioned above is analyzed.
Name is text, so this field must be set to either the text type or the keyword type. Consider how the field is searched: we may want to find everyone with the surname 'Zhang', or everyone whose name contains the character 'Guo'. The text must therefore support matching after it is split, so the type of this field is text.
Age is an integer, so the choice is between int and long. The sizes of int and long differ across machine word sizes and programming languages, but int suits shorter integers and so saves space. The type of this field is therefore int.
Deposit, like age, is an integer, but its value may exceed the upper limit of int. The type of this field is therefore long.
Self-introduction is a long passage of text, and searches need segmented matching, for example to match records whose self-introduction contains the words 'playing ball'. This field is set to the text type.
Province of origin is also text, but its value does not need to be segmented for matching: 'Zhejiang' matches only 'Zhejiang' and 'Jiangsu' matches only 'Jiangsu', and entering the single character 'Jiang' should not match every value that contains 'Jiang'. This field is therefore set to keyword.
Date of birth is set to the date type.
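The field analysis above can be summarized as a small schema, together with a sketch of how text and keyword matching differ. The dictionary keys, the per-character tokenizer, and the sample name are assumptions made for illustration; the patent does not specify a tokenizer or a schema syntax.

```python
from typing import Dict, List

# Field-to-type assignment taken from the analysis above (key names are illustrative).
FIELD_TYPES: Dict[str, str] = {
    "name": "text",
    "age": "int",
    "deposit": "long",
    "self_introduction": "text",
    "province": "keyword",
    "date_of_birth": "date",
}


def tokenize(value: str) -> List[str]:
    """Hypothetical tokenizer: treats every non-space character as a token."""
    return [ch for ch in value if not ch.isspace()]


def matches(field: str, stored: str, query: str) -> bool:
    """keyword needs an exact match; text matches when every query token occurs."""
    if FIELD_TYPES[field] == "keyword":
        return stored == query
    if FIELD_TYPES[field] == "text":
        stored_tokens = set(tokenize(stored))
        return all(token in stored_tokens for token in tokenize(query))
    raise ValueError(f"{field} is not a string field")


print(matches("name", "张建国", "国"))      # text: a single character can match
print(matches("province", "浙江", "江"))    # keyword: a partial value does not match
print(matches("province", "浙江", "浙江"))  # keyword: the full value matches
```

A real word segmenter would replace `tokenize`; the point of the sketch is only that the declared type decides whether partial matches are allowed.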
In addition, search engines support an operation called an aggregate query, similar to COUNT(*) + GROUP BY in MySQL: for one or several fields, it counts the records that share the same value of that field or those fields. Examples include querying how many people have age = 20, how many people have 'Zhejiang' as their province of origin, or even which ten ages or which ten provinces of origin have the most people. As mentioned in the field descriptions, among the text formats the text type does not support aggregate queries, while the keyword type does. A business may need to count the people whose province of origin is 'Zhejiang', but it will not need to count how many people share the same self-introduction. This confirms that setting self-introduction to the text type and province of origin to keyword is reasonable. By accurately setting the format type of the data of each retrieval item, the present invention ensures the accuracy of retrieval.
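The aggregate query described above can be sketched with the standard-library `Counter`. The records below are invented for illustration, and the sketch shows only the counting semantics, not how a search engine implements aggregation internally.

```python
from collections import Counter

# Hypothetical records; only the aggregated fields are shown.
records = [
    {"age": 20, "province": "Zhejiang"},
    {"age": 20, "province": "Jiangsu"},
    {"age": 31, "province": "Zhejiang"},
]

# Equivalent in spirit to: SELECT province, COUNT(*) ... GROUP BY province
by_province = Counter(record["province"] for record in records)
by_age = Counter(record["age"] for record in records)

print(by_province["Zhejiang"])  # number of people whose province is Zhejiang
print(by_age.most_common(10))   # the (up to) ten ages with the most people
```

Grouping works here because each keyword or integer value is compared as a whole, which is the property the text type gives up when its content is segmented.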
Other embodiments of the invention will be readily apparent to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be considered illustrative only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (10)
1. A retrieval rate optimization method, characterized by comprising:
for each retrieval item, establishing a plurality of corresponding storage files, and using a plurality of corresponding threads to write each piece of data under the retrieval item into the corresponding storage files;
for a retrieval item whose data follows a regular pattern, merging the corresponding storage files into one storage file in which the data is arranged in order; and, at retrieval time, retrieving according to the input search condition on the basis of the merged storage file.
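The write-then-merge flow of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: each thread sorts its own batch before writing, values are integers stored one per line, and the file names are hypothetical. The patent does not prescribe these details.

```python
import heapq
import os
import tempfile
import threading
from typing import List


def write_part(path: str, values: List[int]) -> None:
    """One writer thread owns one storage file, so no lock is needed."""
    with open(path, "w", encoding="utf-8") as handle:
        for value in sorted(values):
            handle.write(f"{value}\n")


def merge_parts(part_paths: List[str], merged_path: str) -> None:
    """K-way merge of already sorted part files into one sorted file."""
    handles = [open(path, encoding="utf-8") for path in part_paths]
    try:
        streams = [(int(line) for line in handle) for handle in handles]
        with open(merged_path, "w", encoding="utf-8") as out:
            for value in heapq.merge(*streams):
                out.write(f"{value}\n")
    finally:
        for handle in handles:
            handle.close()


with tempfile.TemporaryDirectory() as workdir:
    batches = [[31, 20, 45], [27, 20, 52]]  # hypothetical "age" values
    part_paths = [os.path.join(workdir, f"age_{i}.part") for i in range(len(batches))]
    threads = [
        threading.Thread(target=write_part, args=(path, batch))
        for path, batch in zip(part_paths, batches)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    merged_path = os.path.join(workdir, "age.merged")
    merge_parts(part_paths, merged_path)
    with open(merged_path, encoding="utf-8") as handle:
        merged_ages = [int(line) for line in handle]

print(merged_ages)
```

Because the merged file is ordered, a range condition can be answered by locating its two boundaries rather than scanning every part file.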
2. The retrieval rate optimization method according to claim 1, characterized by further comprising:
for a retrieval item in text format, performing word segmentation on the first text data stored under the retrieval item to generate a plurality of first words; performing word segmentation on the second text data in the input search condition to generate a plurality of second words; and, for each second word, matching that second word against the plurality of first words, thereby obtaining the first text data that matches the second text data in the search condition.
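The segmentation and matching steps of claim 2 can be sketched with a small inverted index. The whitespace segmenter stands in for a real word segmenter, and requiring every second word to match is an assumption; the claim does not state whether matches are combined with AND or OR.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def segment(text: str) -> List[str]:
    """Stand-in word segmenter: lowercases and splits on whitespace."""
    return text.lower().split()


def build_index(documents: List[str]) -> Dict[str, Set[int]]:
    """Map every first word to the ids of the stored texts that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in segment(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return the ids of stored texts that contain every second word."""
    result: Optional[Set[int]] = None
    for word in segment(query):
        postings = index.get(word, set())
        result = postings if result is None else result & postings
    return set(result) if result else set()


introductions = [
    "I like playing ball and reading",
    "I like hiking",
    "playing chess on weekends",
]
intro_index = build_index(introductions)
print(search(intro_index, "playing ball"))
```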
3. The retrieval rate optimization method according to claim 1 or 2, characterized by further comprising:
after a search condition is input, merging the field conditions or the logical values of the fields in the search condition; and
retrieving according to the merged search condition.
4. The retrieval rate optimization method according to claim 3, characterized by further comprising:
for a retrieval item that occupies little space, preloading the data under the retrieval item into memory before retrieval.
5. The retrieval rate optimization method according to claim 4, characterized by further comprising:
setting the storage format type of the data under each retrieval item according to the characteristics of that data.
6. The retrieval rate optimization method according to claim 1, characterized in that:
when a complete piece of data information is written, for the data of that data information under each retrieval item, it is first determined whether the retrieval item has an idle thread; if an idle thread exists, any idle thread is called, the data of the data information under the retrieval item is written into the storage file corresponding to the called thread, a delimiter is written after the data has been written, and the called thread is set to idle.
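The idle-thread dispatch of claim 6 can be approximated with a standard thread pool, which hands each task to an idle worker and returns the worker to the idle state when the task ends. In this sketch each worker is bound to its own storage file by an initializer; the delimiter character, file names, and sample values are assumptions, since the claim does not specify them.

```python
import os
import queue
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

DELIMITER = "\x1f"  # assumed delimiter character; the source does not name one
_local = threading.local()


def _claim_file(free_paths: queue.Queue) -> None:
    """Runs once in each worker thread: bind the thread to its own storage file."""
    _local.path = free_paths.get_nowait()


def _write(value: str) -> str:
    """Append one value plus the delimiter to the calling thread's file."""
    with open(_local.path, "a", encoding="utf-8") as handle:
        handle.write(value + DELIMITER)
    return _local.path


with tempfile.TemporaryDirectory() as workdir:
    thread_count = 3
    free_paths: queue.Queue = queue.Queue()
    for i in range(thread_count):
        free_paths.put(os.path.join(workdir, f"name_{i}.dat"))

    # The executor gives each task to an idle worker; the worker is idle again
    # as soon as _write returns.
    with ThreadPoolExecutor(
        max_workers=thread_count, initializer=_claim_file, initargs=(free_paths,)
    ) as pool:
        names = ["Ann", "Bob", "Cid", "Dee", "Eve"]
        list(pool.map(_write, names))

    written = []
    for file_name in sorted(os.listdir(workdir)):
        with open(os.path.join(workdir, file_name), encoding="utf-8") as handle:
            written.extend(v for v in handle.read().split(DELIMITER) if v)

print(sorted(written))
```

Since every worker appends only to its own file, the writes need no locking, which is the efficiency argument behind the multi-thread, multi-file layout.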
7. The retrieval rate optimization method according to claim 1 or 6, characterized in that:
for a retrieval item that, among the retrieval items, allows a longer input field, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to each standby thread is established; compared with a retrieval item whose allowed input field length is shorter, its total number of threads is larger;
when a complete piece of data information is written, for the data of that data information under each retrieval item, it is first determined whether the retrieval item has an idle main thread; if an idle main thread exists, any idle main thread is called, the data of the data information under the retrieval item is written into the storage file corresponding to the called main thread, a delimiter is written after the data has been written, and the called main thread is set to idle; if the retrieval item has no idle main thread, the write request is inserted at the rear of the corresponding queue and a corresponding timer is started; it is determined whether the timer has exceeded the corresponding duration, and if so, an idle standby thread is enabled, the standby thread writes the data of the data information under the retrieval item into the corresponding temporary storage file, a delimiter is written after the data has been written, and the standby thread is set to idle.
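The fallback from main threads to standby threads in claim 7 can be simulated in a few lines. This is a simplified sketch, not the claimed implementation: writer threads are modelled as slots, storage files as in-memory lists, and the queued request with its timer as a blocking wait with a timeout. The class `ItemWriters` and all parameter names are hypothetical.

```python
import queue
from typing import List

DELIMITER = "\x1f"  # assumed delimiter; the source does not name one


class ItemWriters:
    """Main and standby writer slots for one retrieval item (simulation only)."""

    def __init__(self, main_count: int, standby_count: int, wait_seconds: float):
        self.wait_seconds = wait_seconds
        self.main_files: List[List[str]] = [[] for _ in range(main_count)]
        self.temp_files: List[List[str]] = [[] for _ in range(standby_count)]
        self.idle_main: queue.Queue = queue.Queue()
        self.idle_standby: queue.Queue = queue.Queue()
        for target in self.main_files:
            self.idle_main.put(target)
        for target in self.temp_files:
            self.idle_standby.put(target)

    def write(self, value: str) -> str:
        """Write one value; return which kind of slot handled it."""
        try:
            # Waiting here plays the role of the queued request and its timer.
            target = self.idle_main.get(timeout=self.wait_seconds)
            pool, kind = self.idle_main, "main"
        except queue.Empty:
            # Timer expired: fall back to a standby slot and its temporary file.
            target = self.idle_standby.get()
            pool, kind = self.idle_standby, "standby"
        try:
            target.append(value + DELIMITER)
        finally:
            pool.put(target)  # the slot is idle again
        return kind


writers = ItemWriters(main_count=1, standby_count=1, wait_seconds=0.05)
busy_slot = writers.idle_main.get()   # simulate the only main thread being busy
first = writers.write("a long self-introduction")
writers.idle_main.put(busy_slot)      # the main thread finishes its work
second = writers.write("another one")
print(first, second)
```

Raising `standby_count` or lowering `wait_seconds` shortens the time a write can wait, which corresponds to the tuning knobs named in claim 8.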
8. The retrieval rate optimization method according to claim 7, characterized in that the write duration of the whole data information is regulated by adjusting the number of standby threads and/or the corresponding duration, thereby regulating the update speed of the database.
9. The retrieval rate optimization method according to claim 8, characterized in that, for each piece of data information, an address information table is established, the address information table including the storage address information of the data of that data information under each retrieval item; and, for each retrieval item, an index table is established, the index table including the storage address information of the data of each piece of data information under the retrieval item and a pointing address corresponding to each piece of storage address information; the storage address information includes the address of the storage file where the data is located and the address of the data within that storage file; and the pointing address points to the position of the first piece of storage address information in the address information table corresponding to that storage address information;
when retrieval is performed under a retrieval item, retrieval is performed according to the input search condition on the basis of the data stored in the storage file of the retrieval item; when the search result is displayed, the storage address information of the data matching the input search condition is first determined; the pointing address corresponding to the determined storage address information is then found in the index table of the retrieval item; according to the pointing address found, the position of the first piece of storage address information in the address information table corresponding to the determined storage address information is located, and that address information table is obtained; the storage addresses of the data of the data information under each retrieval item are thereby obtained, so as to obtain the complete data information.
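The two-table lookup of claim 9 can be sketched with in-memory structures. This is an illustrative model only: storage files are represented as lists, a storage address is a (file, position) pair, and all address information tables are laid out back to back in one list. None of these representations is specified by the claim.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (storage file, position of the value in that file)

# Hypothetical storage files, modelled as lists so a position is a list index.
storage: Dict[str, List[str]] = {
    "name_0.dat": ["Ann", "Bob"],
    "age_0.dat": ["31", "20"],
    "province_0.dat": ["Zhejiang", "Jiangsu"],
}

# One address information table per record, laid out back to back.
address_tables: List[Address] = [
    ("name_0.dat", 0), ("age_0.dat", 0), ("province_0.dat", 0),  # record 0
    ("name_0.dat", 1), ("age_0.dat", 1), ("province_0.dat", 1),  # record 1
]
FIELDS_PER_RECORD = 3

# Index table of the "age" item: storage address -> pointing address, i.e. the
# position of the first entry of that record's address information table.
age_index: Dict[Address, int] = {("age_0.dat", 0): 0, ("age_0.dat", 1): 3}


def fetch_record(hit: Address, index: Dict[Address, int]) -> List[str]:
    """Follow the pointing address to the record's table, then read every field."""
    start = index[hit]
    table = address_tables[start:start + FIELDS_PER_RECORD]
    return [storage[file_name][position] for file_name, position in table]


# Retrieval under "age" for the value "20" hits ("age_0.dat", 1).
hit = next(
    (file_name, position)
    for file_name, position in age_index
    if storage[file_name][position] == "20"
)
print(fetch_record(hit, age_index))
```

The search itself touches only the storage file of one retrieval item; the other fields are read only for the records that matched.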
10. The retrieval rate optimization method according to claim 5, characterized in that the retrieval items include name, age, self-introduction, province of origin, deposit, and date of birth, wherein the storage format type of name is set to the text type, that of age to the int type, that of self-introduction to the text type, that of province of origin to the keyword type, that of deposit to the long type, and that of date of birth to the date type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231353.5A CN109977334B (en) | 2019-03-26 | 2019-03-26 | Search speed optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977334A true CN109977334A (en) | 2019-07-05 |
CN109977334B CN109977334B (en) | 2023-10-20 |
Family
ID=67080577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910231353.5A Active CN109977334B (en) | 2019-03-26 | 2019-03-26 | Search speed optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977334B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109977334B (en) | 2023-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||