CN109977334A - Retrieval rate optimization method - Google Patents

Info

Publication number
CN109977334A
Authority
CN
China
Prior art keywords
data
retrieval
storage
search terms
now
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910231353.5A
Other languages
Chinese (zh)
Other versions
CN109977334B (en)
Inventor
潘杰
曹建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Profile Information Technology Co Ltd
Original Assignee
Zhejiang Profile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Profile Information Technology Co Ltd filed Critical Zhejiang Profile Information Technology Co Ltd
Priority to CN201910231353.5A priority Critical patent/CN109977334B/en
Publication of CN109977334A publication Critical patent/CN109977334A/en
Application granted granted Critical
Publication of CN109977334B publication Critical patent/CN109977334B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a retrieval rate optimization method. For each retrieval item, multiple corresponding storage files are created, and multiple corresponding threads write each piece of data under that retrieval item into the corresponding storage file. For retrieval items whose data follow a regular order, the corresponding storage files are merged into a single storage file in which the data are arranged in order, and retrieval is performed on the merged storage file according to the input search condition. Writing data with multiple threads into multiple files improves write efficiency. Merging the storage files of orderable retrieval items into one ordered storage file, and retrieving from that merged file, preserves retrieval efficiency while the multithreaded, multi-file writing mode improves write efficiency.

Description

Retrieval rate optimization method
Technical field
The invention belongs to the field of information retrieval and specifically relates to a retrieval rate optimization method.
Background
With the arrival of the big data era, how to use a search engine to quickly and accurately retrieve the data one needs from a huge database has become a focus of research. A search engine currently comprises four main parts: crawling, indexing, filtering and displaying. Traditional search engines commonly suffer from low write efficiency and low retrieval efficiency.
Summary of the invention
The present invention provides a retrieval rate optimization method to solve the problem of low write and retrieval efficiency in current search engines.
According to a first aspect of the embodiments of the present invention, a retrieval rate optimization method is provided, comprising:
For each retrieval item, creating multiple corresponding storage files, and using multiple corresponding threads to write each piece of data under the retrieval item into the corresponding storage file;
For retrieval items whose data follow a regular order, merging the corresponding storage files into one storage file in which the data are arranged in order; and, at retrieval time, retrieving from the merged storage file according to the input search condition.
In an optional implementation, for a retrieval item in text format, word segmentation is performed on the first text data stored under the retrieval item to generate multiple first words, and on the second text data in the input search condition to generate multiple second words; each second word is then matched against the first words, so as to obtain the first text data that matches the second text data in the search condition.
In another optional implementation, after a search condition is input, the search condition is merged, either by merging field conditions or by merging the logical values of a field;
Retrieval is then performed according to the merged search condition.
In another optional implementation, for a retrieval item that occupies little space, the data under that retrieval item are preloaded into memory before retrieval.
In another optional implementation, the storage format type of the data under each retrieval item is set according to the characteristics of those data.
In another optional implementation, when a complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle thread; if so, any idle thread is called to write the data into the storage file corresponding to the called thread, a separator is written after the data, and the called thread is set to idle.
In another optional implementation, for a retrieval item that allows a longer input field than the other retrieval items, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to the standby threads is created; such a retrieval item has a larger total number of threads than retrieval items that allow shorter input fields;
When a complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle main thread. If so, any idle main thread is called to write the data into the storage file corresponding to the called main thread, a separator is written after the data, and the called main thread is set to idle. If not, the write request is inserted at the tail of the corresponding queue and a corresponding timer is started; once the timer exceeds the corresponding duration, an idle standby thread is enabled to write the data into the corresponding temporary storage file, a separator is written after the data, and the standby thread is set to idle.
In another optional implementation, the write duration of a whole data record is regulated by adjusting the number of standby threads and/or the corresponding duration, thereby regulating the update speed of the database.
In another optional implementation, an address information table is created for every data record; it contains the storage address information of the record's data under each retrieval item. An index table is also created for each retrieval item; it contains the storage address information of every record's data under that retrieval item, together with a pointer address corresponding to each piece of storage address information. The storage address information comprises the address of the storage file holding the data and the address of the data within that storage file. The pointer address points to the position of the first piece of storage address information in the address information table that corresponds to its storage address information;
When retrieving under a given retrieval item, retrieval is performed on the data stored in the storage files of that retrieval item according to the input search condition. When the search result is displayed, the storage address information of the data matching the input search condition is determined first; the pointer address corresponding to that storage address information is then looked up in the index table of the retrieval item; the pointer address is followed to the position of the first piece of storage address information in the corresponding address information table; and the address information table thus obtained yields the storage addresses of the record's data under every retrieval item, from which the complete data record is obtained.
In another optional implementation, the retrieval items include name, age, self-introduction, native province, deposit and date of birth, where the storage format type of name is set to the text type, that of age to the int type, that of self-introduction to the text type, that of native province to the keyword type, that of deposit to the long type, and that of date of birth to the date type.
The beneficial effects of the present invention are:
1. By writing with multiple threads into multiple files, the present invention writes the data of each data record corresponding to a retrieval item into the corresponding storage files. Even when the field length of that data varies from record to record, the corresponding data of multiple records can be written in parallel, which improves write efficiency. For retrieval items whose data follow a regular order, the corresponding storage files are merged into one storage file with the data arranged in order, so that retrieval efficiency is preserved while the multithreaded, multi-file writing mode improves write efficiency;
2. For retrieval items in text format, the present invention applies word segmentation to both the text data stored under the retrieval item and the text data in the input search condition, and matches the two word by word rather than character by character, which can greatly improve retrieval efficiency;
3. The present invention retrieves only after merging the search condition, either by merging field conditions or by merging the logical values of a field, which can improve retrieval efficiency;
4. For retrieval items that occupy little space, the present invention preloads the data under the retrieval item into memory before retrieval, which can further improve retrieval efficiency;
5. The present invention sets the format type of the data of each retrieval item precisely, which ensures the accuracy of retrieval;
6. For retrieval items that allow a longer input field than the others, the present invention adds standby threads and a temporary storage file, and adds a timer for writes that find no idle main thread. When a write request has waited longer than the corresponding duration, an idle standby thread is called to write the data into the temporary storage file. The write duration of data for long-field retrieval items can thus be adjusted dynamically, keeping the write duration of a whole data record controllable and making it easier for the designer to regulate the update speed of the database;
7. By creating an address information table for every data record and an index table for every retrieval item, the present invention accurately associates the data that a record stores under each retrieval item and quickly locates the complete data record when it is displayed, which improves both the accuracy and the speed of retrieval.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the retrieval rate optimization method of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments clearer, the technical solutions are described in further detail below with reference to the accompanying drawing.
In the description of the present invention, unless otherwise specified and limited, the term "connection" should be understood broadly: it may be a mechanical or an electrical connection, it may be a connection inside two elements, and it may be direct or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meaning of the term according to the circumstances.
Referring to Fig. 1, which is a flow chart of one embodiment of the retrieval rate optimization method of the present invention: the method is applied on a processing device (such as a computer or a server), which carries out the retrieval rate optimization. The method may include:
Step S101: for each retrieval item, create multiple corresponding storage files, and use multiple corresponding threads to write each piece of data under the retrieval item into the corresponding storage file.
In a traditional search engine, when acquired data records are stored item by item according to retrieval item, each retrieval item usually uses only one thread and only one storage file. Only after that thread has written the data of the previous data record corresponding to the retrieval item into the storage file can the corresponding data of the next data record be written, which makes writing data records inefficient. Moreover, because the field length of the data corresponding to a given retrieval item may differ from record to record, using multiple threads on a single file does not help: a later thread can write the corresponding data of the next record into the storage file only after the earlier thread has finished writing the corresponding data of the previous record, since starting the next write before the previous one has finished would interleave and corrupt the data. Multithreaded writing alone therefore cannot improve write efficiency. For this reason, the present invention writes with multiple threads into multiple files. Taking the retrieval item "age" as an example, where each data record includes age data, 5 storage files are created for the retrieval item "age", and 5 threads correspondingly store the age data of 5 data records into their respective storage files at the same time. The age data of the 5 records are thus stored in parallel, which improves write efficiency. By writing the data corresponding to a retrieval item into the corresponding storage files in this multithreaded, multi-file manner, the corresponding data of multiple data records can be written in parallel even when the field length of that data varies from record to record.
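The multithreaded, multi-file write for the "age" example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `write_parallel` and `read_file`, the file naming scheme and the `;` separator are all assumptions made for the example.

```python
import os
import tempfile
import threading


def write_parallel(values, directory, item="age"):
    """Write each value into its own storage file, one thread per file.

    Hypothetical helper: one file and one thread per value, as in the
    5-thread, 5-file "age" example.
    """
    paths = [os.path.join(directory, f"{item}_{i}.dat") for i in range(len(values))]

    def write_one(path, value):
        # Each thread owns exactly one file, so writes never interleave.
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{value};")  # ';' stands in for the separator

    threads = [
        threading.Thread(target=write_one, args=(p, v))
        for p, v in zip(paths, values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


def read_file(path):
    """Return the full content of one storage file."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Because every thread has a private file, no lock is needed and a long value in one record cannot delay or corrupt the write of another.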
Step S102: for retrieval items whose data follow a regular order, merge the corresponding storage files into one storage file in which the data are arranged in order; at retrieval time, retrieve from the merged storage file according to the input search condition.
Although multithreaded, multi-file writing improves write efficiency, it lowers retrieval efficiency. For example, suppose the 5 storage files holding ages contain "2, 3", "1, 5", "5, 6", "2, 7" and "1, 2". To find the maximum age, the maximum must first be found in each of the 5 storage files, and the 5 maxima must then be compared to obtain the result. For this reason, for retrieval items whose data follow a regular order, the present invention merges the multiple storage files into one storage file at retrieval time, with the data arranged in order in the merged file. Such retrieval items may include name, age, native province, deposit, date of birth and so on: name and native province can be sorted by the pinyin initial of the first character, age and deposit can be arranged in ascending order, and date of birth can be arranged chronologically. Taking the same 5 age storage files, the present invention first merges them at retrieval time into the single storage file "1, 1, 2, 2, 2, 3, 5, 5, 6, 7"; to find the maximum age, the query simply starts from the end of the merged file. Merging the storage files of orderable retrieval items into one ordered storage file thus preserves retrieval efficiency while the multithreaded, multi-file writing mode improves write efficiency.
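A minimal sketch of the merge step, using the five age files from the example. The function name `merge_storage_files` is hypothetical, and in-memory lists stand in for the storage files; the standard-library `heapq.merge` does the ordered merge.

```python
import heapq


def merge_storage_files(files):
    """Merge per-thread value lists into one ascending list."""
    # heapq.merge requires sorted inputs, so each file is sorted first.
    return list(heapq.merge(*(sorted(f) for f in files)))


files = [[2, 3], [1, 5], [5, 6], [2, 7], [1, 2]]
merged = merge_storage_files(files)

# With the data in ascending order, the maximum is simply the last entry.
oldest = merged[-1]
```

Without the merge, a maximum-age query has to scan all five files and then compare five candidates; with it, the query reads one position.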
Specifically, when a complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle thread. If so, any idle thread is called to write the data into the storage file corresponding to the called thread; a separator is written after the data, and the called thread is set to idle. If not, the write request is inserted at the tail of the corresponding queue and waits until it reaches the head of the queue and an idle thread exists. The allowed input field length usually differs between retrieval items. For example, among the retrieval items name, age, self-introduction, native province, deposit and date of birth, self-introduction allows a longer input than the other five, so writing it takes correspondingly longer. If self-introduction used the same number of threads and storage files as the other five retrieval items, the data of a record corresponding to those five items might all have been written while the data corresponding to self-introduction was still waiting to be written, possibly for a long time.
To improve the overall write efficiency of data records, for a retrieval item that allows a longer input field than the others, the threads are divided into main threads and standby threads, and a temporary storage file corresponding to the standby threads is created. A retrieval item can be classified this way when, for example, its allowed input field length exceeds the average allowed input field length of the remaining retrieval items by more than a preset length. There may be multiple main threads and there is at least one standby thread, and the total number of threads is larger than for retrieval items that allow shorter input fields. When a complete data record is written, for the data of that record under each retrieval item, it is first determined whether the retrieval item has an idle main thread. If so, any idle main thread is called to write the data into the storage file corresponding to the called main thread, a separator is written after the data, and the called main thread is set to idle. If not, the write request is inserted at the tail of the corresponding queue and a corresponding timer is started; once the timer exceeds the corresponding duration, an idle standby thread is enabled to write the data into the corresponding temporary storage file, a separator is written after the data, and the standby thread is set to idle.
For retrieval items that allow a longer input field than the others, the present invention adds standby threads and a temporary storage file, and adds a timer for writes that find no idle main thread. When a write request has waited longer than the corresponding duration, an idle standby thread is called to write the data into the temporary storage file. The write duration of data for long-field retrieval items can thus be adjusted dynamically, keeping the write duration of a whole data record controllable and making it easier for the designer to regulate the update speed of the database. The write duration of a whole data record is regulated by adjusting the number of standby threads and/or the corresponding duration, which in turn regulates the update speed of the database. In addition, after a standby thread has written a record's data under the retrieval item into the temporary storage file, it is determined whether an idle main thread has been idle for a long time (for example, idle for longer than a preset duration). If so, the idle main thread is called to write the data in the temporary storage file into the corresponding storage file, a separator is written after the data, and the temporary storage file is emptied. Moving the data from the temporary storage file into the main thread's storage file while the main thread is idle keeps the data collected in the storage files that retrieval reads, and emptying the temporary storage file prepares it for the next write.
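The main/standby fallback can be sketched as below. This is a simplified model under stated assumptions, not the patent's implementation: the class `ItemWriter` is hypothetical, a semaphore counts idle main threads instead of managing real worker threads, a timed `acquire` plays the role of the queue plus timer, and in-memory lists stand in for the storage file and the temporary storage file.

```python
import threading


class ItemWriter:
    """Hypothetical writer for one long-field retrieval item."""

    def __init__(self, main_slots, timeout):
        # Each semaphore permit represents one idle main thread.
        self.main_slots = threading.Semaphore(main_slots)
        self.timeout = timeout      # the "corresponding duration" of the timer
        self.storage = []           # stands in for the main storage files
        self.temp = []              # stands in for the temporary storage file
        self._lock = threading.Lock()

    def write(self, value):
        """Write via a main thread, or via the standby path after the timer expires."""
        if self.main_slots.acquire(timeout=self.timeout):
            try:
                with self._lock:
                    self.storage.append(value)
            finally:
                self.main_slots.release()  # the main thread is idle again
            return "main"
        # No main thread became idle in time: use the standby path.
        with self._lock:
            self.temp.append(value)
        return "standby"

    def flush_temp(self):
        """Move temporary data into main storage, then empty the temporary file."""
        with self._lock:
            self.storage.extend(self.temp)
            self.temp.clear()
```

In this sketch `timeout` and the number of standby paths are the two knobs the text describes: shortening the timeout or adding standby capacity bounds how long a long-field write can hold up the whole record.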
Because the present invention writes data records with multiple threads into multiple files, the positions at which a record's data are stored under each retrieval item are unordered and follow no pattern. To display the corresponding data of a record accurately, the present invention creates an address information table for every data record; it contains the storage address information of the record's data under each retrieval item. An index table is also created for each retrieval item; it contains the storage address information of every record's data under that retrieval item, together with a pointer address corresponding to each piece of storage address information. The storage address information comprises the address of the storage file holding the data and the address of the data within that storage file. The pointer address points to the position of the first piece of storage address information in the address information table that corresponds to its storage address information.
When retrieving under a given retrieval item, retrieval is performed on the data stored in the storage files of that retrieval item according to the input search condition. When the search result is displayed, the storage address information of the data matching the input search condition is determined first; the pointer address corresponding to that storage address information is then looked up in the index table of the retrieval item; the pointer address is followed to the position of the first piece of storage address information in the corresponding address information table; and the address information table thus obtained yields the storage addresses of the record's data under every retrieval item, from which the complete data record is obtained. By creating an address information table for every data record and an index table for every retrieval item, the present invention accurately associates the data that a record stores under each retrieval item and quickly locates the complete data record when it is displayed, which improves both the accuracy and the speed of retrieval.
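The two-table lookup can be sketched as follows. The layout is an assumption made for illustration: a storage address is modelled as a `(file_no, offset)` pair, the pointer address as the position of a record's address information table in a list, and all names (`storage`, `address_tables`, `index_tables`, `fetch_record`, `search`) are hypothetical.

```python
# storage[item][file_no][offset] -> value; the two records are deliberately
# scattered across different files, as multi-file writing would leave them.
storage = {
    "name": {0: ["Zhang San"], 1: ["Li Si"]},
    "age": {0: [31], 1: [24]},
}

# One address information table per record: item -> (file_no, offset).
address_tables = [
    {"name": (0, 0), "age": (1, 0)},
    {"name": (1, 0), "age": (0, 0)},
]

# One index table per retrieval item: storage address -> pointer to the
# address information table of the record that owns that address.
index_tables = {
    item: {table[item]: pos for pos, table in enumerate(address_tables)}
    for item in storage
}


def fetch_record(item, address):
    """Follow index table -> address information table -> every field's data."""
    table = address_tables[index_tables[item][address]]
    return {name: storage[name][f][o] for name, (f, o) in table.items()}


def search(item, predicate):
    """Scan one item's storage files and rebuild each matching record."""
    hits = []
    for file_no, values in storage[item].items():
        for offset, value in enumerate(values):
            if predicate(value):
                hits.append(fetch_record(item, (file_no, offset)))
    return hits
```

For example, `search("age", lambda a: a > 30)` matches the value 31 at address `(0, 0)` of the age files and follows the pointer to rebuild the full record for "Li Si".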
As the above embodiment shows, by writing with multiple threads into multiple files, the present invention writes the data of each data record corresponding to a retrieval item into the corresponding storage files. Even when the field length of that data varies from record to record, the corresponding data of multiple records can be written in parallel, which improves write efficiency. For retrieval items whose data follow a regular order, the corresponding storage files are merged into one storage file with the data arranged in order, so that retrieval efficiency is preserved while the multithreaded, multi-file writing mode improves write efficiency.
In addition, text queries on retrieval items in text format usually rely on splitting the text. Take the self-introduction "I am Zhang San, I like playing basketball, I am a lively little boy" (in Chinese, 我是张三,我喜欢打篮球,我是一个活泼的小男孩). By default a search engine stores this sentence with each character as one unit: 我/是/张/三/我/喜/欢/打/篮/球/我/是/一/个/活/泼/的/小/男/孩. Suppose a query requires the self-introduction field to match "basketball" (篮球). The condition is likewise split into the two characters 篮/球. Matching 篮 character by character from the first 我 succeeds at the 9th character, and matching 球 in the same way succeeds at the 10th character, for 19 comparisons in total, which is inefficient. For this reason, the present invention introduces a word segmentation system. For a retrieval item in text format, word segmentation is performed on the first text data stored under the retrieval item to generate multiple first words, and on the second text data in the input search condition to generate multiple second words; each second word is then matched against the first words, so as to obtain the first text data that matches the second text data in the search condition. In the example, the self-introduction becomes 我/是/张三/我/喜欢/打/篮球/我/是/一个/活泼/的/小/男孩 and the query condition becomes the single word 篮球, so scanning from the first 我 reaches 篮球 after only 7 comparisons. In this case segmentation makes the query more than twice as efficient, and the longer the text, the more pronounced the improvement. By segmenting both the text data stored under a text-format retrieval item and the text data in the input search condition, and matching the two word by word rather than character by character, the present invention can greatly improve retrieval efficiency.
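The comparison counts in the basketball example can be checked with a short sketch. Two assumptions are made here: the Chinese sentence is a reconstruction of the example from its translation, and the word list is segmented by hand rather than by a real tokenizer. The function name `comparisons_until_found` is hypothetical.

```python
def comparisons_until_found(units, queries):
    """Count unit-by-unit comparisons, scanning from the start for each query unit."""
    total = 0
    for q in queries:
        # index() is 0-based, so finding the unit at position i costs i + 1 comparisons.
        total += units.index(q) + 1
    return total


# Character-level storage: every character is its own unit.
chars = list("我是张三我喜欢打篮球我是一个活泼的小男孩")

# Word-level storage after (hand-written) segmentation.
words = ["我", "是", "张三", "我", "喜欢", "打", "篮球",
         "我", "是", "一个", "活泼", "的", "小", "男孩"]

by_char = comparisons_until_found(chars, ["篮", "球"])
by_word = comparisons_until_found(words, ["篮球"])
```

Character-level matching costs 9 comparisons for 篮 plus 10 for 球, while word-level matching finds 篮球 at the 7th unit, which is where the "more than twice as efficient" figure comes from.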
Search defines three logical relations: MUST, SHOULD and MUST_NOT. The two most representative, MUST and SHOULD, are used here to illustrate the merging of search conditions. MUST: whether there are multiple conditions or a single one, all must be satisfied. SHOULD: a single condition must be satisfied; with multiple conditions, one of them must be satisfied. A search may receive a fairly complex search condition whose clauses are entered in no particular order. For example, the search condition is:
MUST: name contains "Zhang" (张);
SHOULD: age is greater than or equal to 20 and less than or equal to 25;
MUST: native province is Zhejiang;
MUST: name contains "Guo" (国);
SHOULD: age is greater than or equal to 25 and less than or equal to 30.
The name and the age are each matched more than once in this condition. Executed in order, the search matches the name, then the age, then the native province, then the name again, then the age again. The condition can, however, be merged into a more concise one:
MUST: name contains "Zhang" and "Guo";
MUST: age is greater than or equal to 20 and less than or equal to 30;
MUST: native province is Zhejiang.
With the merged condition, the name and age fields are each matched once and need not be matched again before the remaining conditions are evaluated. Note that condition merging here covers two kinds of merge. One is the simple merging of field conditions, as for the name. The other is the merging of a field's logical values, as for the age, where "greater than or equal to 20 and less than or equal to 25" and "greater than or equal to 25 and less than or equal to 30" merge into "greater than or equal to 20 and less than or equal to 30". After a search condition is input, the present invention merges it in one of these two ways and retrieves according to the merged search condition, which further improves retrieval efficiency.
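Both kinds of merge can be sketched as below. This is an illustration under assumptions, not the patent's implementation: conditions are modelled as hypothetical `(field, kind, payload)` tuples, the MUST/SHOULD labels are left out for brevity, and the function names `merge_conditions` and `union_ranges` are invented for the example.

```python
from collections import defaultdict


def union_ranges(ranges):
    """Merge overlapping or touching closed intervals (merging of logical values)."""
    out = []
    for lo, hi in sorted(ranges):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def merge_conditions(conditions):
    """Group conditions by field so that each field is matched only once."""
    terms = defaultdict(list)
    ranges = defaultdict(list)
    merged = {}
    for field, kind, payload in conditions:
        if kind == "contains":
            terms[field].append(payload)    # merging of field conditions
        elif kind == "range":
            ranges[field].append(payload)   # merging of logical values
        else:
            merged[field] = ("equals", payload)
    for field, ts in terms.items():
        merged[field] = ("contains_all", ts)
    for field, rs in ranges.items():
        merged[field] = ("ranges", union_ranges(rs))
    return merged


conditions = [
    ("name", "contains", "Zhang"),
    ("age", "range", (20, 25)),
    ("province", "equals", "Zhejiang"),
    ("name", "contains", "Guo"),
    ("age", "range", (25, 30)),
]
merged = merge_conditions(conditions)
```

The five input clauses collapse to three, one per field, so each field's storage is scanned once instead of twice.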
Memory is far faster than disk for reading and writing files. Even a solid-state drive reaches only 500 MB/s, whereas ordinary memory can reach 7000 MB/s, so making good use of memory can speed up searching. By default a search engine searches from files, because its data volume is often very large, up to several hundred GB. A server with several hundred GB of memory costs far more than a server with several hundred GB of disk, and for even larger data sets a server with that much memory may not be available on the market at all. Some fields, however, occupy little space, such as age and native province. They can be loaded into memory in advance and kept there, so that matching on these two fields is much faster. That is, for retrieval items that occupy little space, the present invention preloads the data under the retrieval item into memory before retrieval to improve retrieval efficiency.
The data under different retrieval projects have different characteristics, and accurately defining the format type for data with different characteristics is the basis for accurate retrieval. For this reason, the present invention sets the storage format type of the data under each retrieval project according to the characteristics of that data. For example, the retrieval projects include name, age, self-introduction, province of origin, deposit, and date of birth. Common field format types are:
int type (represents an integer; sorting and retrieval are fast),
long type (represents an integer longer than int; sorting and retrieval are fast, but it occupies more space than int),
date type (represents a date),
text type (text; sorting and retrieval are slow; the text can be split for matching; word-segmentation matching is supported; aggregate queries are not supported),
keyword type (text; retrieval is fast; the text cannot be split for matching; word-segmentation matching is not supported; aggregate queries are supported).
Next, each field of the batch of data mentioned above is analyzed.
Name is in text format, so this field must be set to either the text type or the keyword type. Analyzing this field: when searching, we may want to find everyone with the surname 'Zhang', or everyone whose name contains the character 'Guo'. The text must therefore support matching after being split, so the type of this field is text.
Age is an integer, so the choice is between int and long. The widths of int and long differ between computers of different word sizes and between programming languages, but the general difference is that int suits shorter integers, which saves space. So the type of this field is int.
Deposit, like the age field, is also an integer, but its value may exceed the upper limit of int. So the type of this field is long.
Self-introduction is a long passage of text, and searching it requires matching on split characters, for example finding the records whose self-introduction contains the words 'playing ball'. This field is set to the text type.
Province of origin is also in text format, but what is special is that the value of this field does not need to be split for matching: 'Zhejiang' matches only 'Zhejiang' and 'Jiangsu' matches only 'Jiangsu'; entering the single character 'Jiang' should not match every value that contains 'Jiang'. This field is therefore set to keyword.
Date of birth is set to the date type.
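The field analysis above can be summarized as a type mapping. The following Python sketch is illustrative only: the English field names and the `matches` helper are assumptions, and plain substring matching stands in for the word-segmentation matching that a text field would actually use.

```python
# Storage format type chosen for each retrieval project (field).
FIELD_TYPES = {
    "name": "text",
    "age": "int",
    "deposit": "long",
    "self_introduction": "text",
    "province": "keyword",
    "date_of_birth": "date",
}


def matches(field, stored, query):
    """Match a query against a stored value according to the field's type."""
    kind = FIELD_TYPES[field]
    if kind == "text":
        # text: the value may be split, so a fragment of it can match.
        return query in stored
    # keyword, int, long and date are compared as whole values.
    return stored == query


print(matches("name", "Zhang Jianguo", "Zhang"))    # True: text allows partial match
print(matches("province", "Zhejiang", "Zhe"))       # False: keyword needs the whole value
print(matches("province", "Zhejiang", "Zhejiang"))  # True
```

The two province calls show why keyword suits that field: a fragment of the value never matches, only the complete value does.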
In addition, search engines provide an operation called an aggregate query, similar to COUNT(*) + GROUP BY in MySQL: for one field or several fields, it counts the records that share the same value of that field or of those fields. Examples are querying how many people have age = 20, querying how many people have the province of origin 'Zhejiang', or even querying the ten ages with the most people or the ten provinces of origin with the most people. This kind of operation is an aggregate query. As mentioned in the field descriptions, among the text formats the text type has no aggregate query capability, whereas keyword does. A business may well need to count the people whose province of origin is 'Zhejiang', but it will not need to count how many people share the same self-introduction. This also confirms that setting self-introduction to the text type and province of origin to keyword is reasonable. By accurately setting the format type of the data of each retrieval project, the present invention ensures the accuracy of retrieval.
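An aggregate query of this kind can be sketched in a few lines of Python using `collections.Counter`. The sample records and the `aggregate` helper are made up for illustration and are not part of the patent.

```python
from collections import Counter

# Made-up sample records; only the aggregated fields are shown.
people = [
    {"age": 20, "province": "Zhejiang"},
    {"age": 20, "province": "Jiangsu"},
    {"age": 25, "province": "Zhejiang"},
    {"age": 20, "province": "Zhejiang"},
]


def aggregate(rows, field):
    """Count records per distinct value of one field (COUNT(*) + GROUP BY)."""
    return Counter(row[field] for row in rows)


by_province = aggregate(people, "province")
by_age = aggregate(people, "age")

print(by_province["Zhejiang"])  # 3 people whose province is 'Zhejiang'
print(by_age[20])               # 3 people with age = 20
print(by_age.most_common(10))   # the ages with the most people, at most ten
```

Grouping works here because each value is treated as one indivisible key, which is the property the keyword type provides and the text type does not.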
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or conventional techniques in the art that are not disclosed herein. The description and examples are to be regarded as illustrative only, and the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (10)

1. A retrieval rate optimization method, characterized by comprising:
For each retrieval project, establishing a plurality of corresponding storage files, and using a plurality of corresponding threads to write each piece of data under that retrieval project into the corresponding storage files;
For retrieval projects whose data follow a regular pattern, merging the corresponding plurality of storage files into one storage file, with the data arranged in order in the merged storage file; and, at retrieval time, retrieving according to the input search condition on the basis of the merged storage file.
2. The retrieval rate optimization method according to claim 1, characterized by further comprising:
For a retrieval project in the text format, performing word segmentation on the first text data stored under that retrieval project to generate a plurality of first words; performing word segmentation on the second text data in the input search condition to generate a plurality of second words; and, for each second word, matching that second word against the plurality of first words, thereby obtaining the first text data that matches the second text data in the search condition.
3. The retrieval rate optimization method according to claim 1 or 2, characterized by further comprising:
After the search condition is input, performing field-condition merging or field logical-value merging on the search condition;
Retrieving according to the merged search condition.
4. The retrieval rate optimization method according to claim 3, characterized by further comprising:
For retrieval projects that occupy less space, preloading the data under that retrieval project into memory before retrieval.
5. The retrieval rate optimization method according to claim 4, characterized by further comprising:
Setting the storage format type of the data under each retrieval project according to the characteristics of the data under that retrieval project.
6. The retrieval rate optimization method according to claim 1, characterized in that:
When a complete piece of data information is written, for the data of that data information under each retrieval project, first determining whether an idle thread exists under that retrieval project; if an idle thread exists, calling any idle thread to write the data of that data information under that retrieval project into the storage file corresponding to the called thread, writing a delimiter after the data has been written, and setting the called thread to idle.
7. The retrieval rate optimization method according to claim 1 or 6, characterized in that:
For a retrieval project that allows longer input fields among the retrieval projects, dividing the threads into main threads and standby threads and establishing temporary storage files corresponding to the standby threads, the total number of threads being larger than for a retrieval project that allows shorter input fields;
When a complete piece of data information is written, for the data of that data information under each retrieval project, first determining whether an idle main thread exists under that retrieval project; if an idle main thread exists, calling any idle main thread to write the data of that data information under that retrieval project into the storage file corresponding to the called main thread, writing a delimiter after the data has been written, and setting the called main thread to idle; if no idle main thread exists under that retrieval project, inserting the write request at the rear of the corresponding queue and starting a corresponding timer; determining whether the timer has exceeded the corresponding duration and, if so, enabling an idle standby thread, using the standby thread to write the data of that data information under that retrieval project into the corresponding temporary storage file, writing a delimiter after the data has been written, and setting the standby thread to idle.
8. The retrieval rate optimization method according to claim 7, characterized in that the write duration of the data information as a whole is regulated by adjusting the number of standby threads and/or the corresponding duration, thereby regulating the update speed of the database.
9. The retrieval rate optimization method according to claim 8, characterized in that, for each piece of data information, an address information table is established, the address information table including the storage address information of the data of that data information under each retrieval project; and, for each retrieval project, an index table is established, the index table including the storage address information of the data of each piece of data information under that retrieval project and a pointing address corresponding to each piece of storage address information; the storage address information includes the address of the storage file in which the data is located and the address of the data within that storage file; the pointing address points to the position of the first piece of storage address information in the address information table corresponding to that storage address information;
When retrieval is performed under a retrieval project, retrieving according to the input search condition on the basis of the data stored in the storage file under that retrieval project; when displaying search results, first determining the storage address information of the data that matches the input search condition, then finding, from the index table corresponding to the retrieval project, the pointing address corresponding to the determined storage address information, locating, according to the pointing address found, the position of the first piece of storage address information in the address information table corresponding to the determined storage address information, obtaining that address information table, and obtaining the storage addresses of the data of that data information under each retrieval project, thereby obtaining the complete data information.
10. The retrieval rate optimization method according to claim 5, characterized in that the retrieval projects include name, age, self-introduction, province of origin, deposit, and date of birth, wherein the storage format type of name is set to the text type, the storage format type of age is set to the int type, the storage format type of self-introduction is set to the text type, the storage format type of province of origin is set to the keyword type, the storage format type of deposit is set to the long type, and the storage format type of date of birth is set to the date type.
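The multi-thread, multi-file write and the ordered merge recited in claim 1 can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the data is split across threads in advance rather than dispatched to whichever thread is idle, a newline plays the role of the delimiter, and all file and function names are hypothetical.

```python
import bisect
import os
import tempfile
import threading


def write_part(path, values):
    """One thread writes its share of the data to its own storage file."""
    with open(path, "w", encoding="utf-8") as f:
        for v in values:
            f.write(f"{v}\n")  # newline plays the role of the delimiter


def write_with_threads(directory, values, n_threads):
    """Split values across n_threads, each writing to a separate file."""
    paths = [os.path.join(directory, f"age_{i}.dat") for i in range(n_threads)]
    threads = [
        threading.Thread(target=write_part, args=(paths[i], values[i::n_threads]))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


def merge_sorted(paths, merged_path):
    """Merge the per-thread files into one file with the values in order."""
    values = []
    for p in paths:
        with open(p, encoding="utf-8") as f:
            values.extend(int(line) for line in f)
    values.sort()
    with open(merged_path, "w", encoding="utf-8") as f:
        f.writelines(f"{v}\n" for v in values)
    return values


def count_in_range(sorted_values, low, high):
    """Binary search on the ordered data: count values in [low, high]."""
    return bisect.bisect_right(sorted_values, high) - bisect.bisect_left(
        sorted_values, low
    )


ages = [31, 22, 45, 20, 27, 25, 19, 30]  # made-up sample data
with tempfile.TemporaryDirectory() as d:
    parts = write_with_threads(d, ages, n_threads=3)
    ordered = merge_sorted(parts, os.path.join(d, "age_merged.dat"))

print(ordered)                          # [19, 20, 22, 25, 27, 30, 31, 45]
print(count_in_range(ordered, 20, 30))  # 5
```

Because the merged data is ordered, a range condition such as age between 20 and 30 needs only two binary searches instead of a scan over every per-thread file.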
CN201910231353.5A 2019-03-26 2019-03-26 Search speed optimization method Active CN109977334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910231353.5A CN109977334B (en) 2019-03-26 2019-03-26 Search speed optimization method

Publications (2)

Publication Number Publication Date
CN109977334A true CN109977334A (en) 2019-07-05
CN109977334B CN109977334B (en) 2023-10-20

Family

ID=67080577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910231353.5A Active CN109977334B (en) 2019-03-26 2019-03-26 Search speed optimization method

Country Status (1)

Country Link
CN (1) CN109977334B (en)

Also Published As

Publication number Publication date
CN109977334B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant